Closed (moved)
Opened Jul 11, 2015 by Karsten Loesing@karsten

Make user statistics more robust against outliers

tl;dr: From June 11 to 13, 2015, the number of bridge users briefly went up from around 20k to 140k. A closer investigation of the underlying data revealed that the aggregate statistics reported by a single bridge were responsible for this major spike. The estimation method used for user statistics should be made robust against outliers, possibly by applying the more recently developed techniques used to extrapolate hidden-service statistics.
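One way to read "the techniques used to extrapolate hidden-service statistics" is sketched below, under stated assumptions: each bridge's report is paired with an estimated fraction of all bridge activity it observes, each bridge extrapolates the network total on its own (users divided by fraction), and the per-bridge extrapolations are combined with a weighted interquartile mean that ignores the top and bottom quarters of the total weight. The `BridgeReport` type, its `fraction` field, and all function names are hypothetical, and treating this as the exact hidden-service method is an assumption, not something this ticket specifies.

```python
from dataclasses import dataclass


@dataclass
class BridgeReport:
    # Hypothetical per-bridge daily report, for illustration only.
    nickname: str
    users: float     # users the bridge claims to have served that day
    fraction: float  # assumed: bridge's estimated share of all bridge activity (0..1]


def weighted_interquartile_mean(pairs):
    """Weighted mean over the middle 50% of total weight.

    pairs: (value, weight) tuples. Entries are sorted by value; each entry
    contributes only the part of its weight that falls between the 25% and
    75% marks of the cumulative weight.
    """
    items = sorted(pairs)
    if not items:
        raise ValueError("no reports")
    total = sum(w for _, w in items)
    lo, hi = 0.25 * total, 0.75 * total
    cum = num = den = 0.0
    for value, weight in items:
        start, end = cum, cum + weight
        cum = end
        overlap = max(0.0, min(end, hi) - max(start, lo))
        num += value * overlap
        den += overlap
    return num / den


def extrapolate_total_users(reports):
    # Each bridge extrapolates the network total on its own: users / fraction.
    pairs = [(r.users / r.fraction, r.fraction) for r in reports]
    return weighted_interquartile_mean(pairs)


def pooled_estimate(reports):
    # Naive comparison: total reported users over total observed fraction.
    return sum(r.users for r in reports) / sum(r.fraction for r in reports)


# Toy data (not real measurements): nine ordinary bridges plus one that
# claims 40 times as many users.
reports = [BridgeReport(f"bridge{i}", 2000, 0.1) for i in range(9)]
reports.append(BridgeReport("outlier", 80000, 0.1))
print(round(pooled_estimate(reports)))          # 98000
print(round(extrapolate_total_users(reports)))  # 20000
```

In this toy example the single outlier pushes the naive pooled estimate from 20k to 98k, while the interquartile mean stays at 20k because the outlier's weight falls entirely in the discarded top quarter.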

Here are more details about the single bridge that reported almost unbelievably high statistics: it's the bridge with nickname "solemnizersfiaun" and hashed fingerprint 420C39C86B0E71F653E18552B28B9189DA2F1377, which reported having served up to 80k users. But its bandwidth statistics suggest that the bridge actually answered a huge number of consensus requests during those days in June. It pushed up to 20 MB/s, which is probably rather unusual for a bridge. A closer look at the descriptor shows that most of these bytes were used to answer directory requests. (I didn't do the math on whether such a burst over a few hours would be sufficient to write 800k compressed consensuses.) So either the bridge is telling us the truth, or it's lying to us in a very sophisticated way.
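The open question above is simple arithmetic once a compressed consensus size is fixed. The sketch below computes how many hours of sustained upload 800k consensuses would take at 20 MB/s. The consensus size is not given in this ticket, so `ASSUMED_CONSENSUS_BYTES` is a hypothetical placeholder to be replaced with a measured value, and reading "MB" as 10^6 bytes is also an assumption.

```python
SECONDS_PER_HOUR = 3600


def hours_to_serve(n_consensuses: int, consensus_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours of sustained upload needed to serve n compressed consensuses."""
    total_bytes = n_consensuses * consensus_bytes
    return total_bytes / rate_bytes_per_s / SECONDS_PER_HOUR


# 20 MB/s and 800k consensuses come from the report above; the compressed
# consensus size is a HYPOTHETICAL placeholder, not a measured value.
RATE = 20 * 10**6                # bytes per second, reading "MB" as 10**6 bytes
N_CONSENSUSES = 800_000
ASSUMED_CONSENSUS_BYTES = 500_000

print(f"{hours_to_serve(N_CONSENSUSES, ASSUMED_CONSENSUS_BYTES, RATE):.1f} h")  # 5.6 h
```

With the placeholder size this comes out to about 5.6 hours; the result scales linearly with the real consensus size, so the comparison with "a few hours" depends entirely on that number.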

And it's not only that bridge that reported very high statistics in June. Another bridge, with nickname "Unnamed" and hashed fingerprint 82F37B9A8400A1E0C0730D8E4639150AE11AC640, reported having served around 10k users on June 18 and 22. That bridge, too, reported extremely high traffic on those days. I didn't look for more bridges, but there may have been others reporting unusual numbers that didn't stand out as much as these two.
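To look for further bridges like these, one could screen all reports with a generic robust outlier test. The sketch below uses the modified z-score based on the median absolute deviation (Iglewicz and Hoaglin's rule, threshold 3.5); this is a swapped-in technique, not one proposed in this ticket. The input format, a dict from bridge name to reported daily users, and the function name are hypothetical. Because legitimate bridges differ a lot in size, applying the test to per-bridge extrapolated totals rather than raw counts may be more meaningful.

```python
from statistics import median


def flag_outliers(reported, threshold=3.5):
    """Return names whose modified z-score exceeds threshold.

    reported: dict mapping bridge name -> reported daily users (hypothetical input).
    """
    values = list(reported.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the bridges report exactly the median, so any
        # deviation gives an unbounded z-score; flag every differing bridge.
        return sorted(name for name, v in reported.items() if v != med)
    return sorted(
        name for name, v in reported.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Toy data, not real bridge statistics.
sample = {"a": 100, "b": 120, "c": 90, "d": 110, "e": 105, "big": 10000}
print(flag_outliers(sample))  # ['big']
```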

So, I'm not sure we'll find out what exactly happened there, but it seems very unlikely that these directory requests were generated by actual human users. That's why I think our estimation method should remove outliers like these.

Reference: legacy/trac#16555