We are collecting overload data and should do an analysis of what we currently have, answering questions like: Are there relays that are permanently overloaded? How is the overload distributed between the different overload categories we have? Are relays with particular flags (like Exit) especially affected by overload? Etc.
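As a starting point, here is a minimal sketch of how the flag question could be checked against Onionoo's details documents, which carry an `overload_general_timestamp` field for relays running tor >= 0.4.6. The script itself is illustrative and not part of our tooling:

```python
# Illustrative sketch: count relays currently reporting general overload
# and check how many of them carry the Exit flag. Assumes Onionoo's
# details documents expose overload_general_timestamp (only set for
# relays on tor >= 0.4.6 that reported overload).
import requests

ONIONOO = "https://onionoo.torproject.org/details"

def overloaded_relays():
    """Fetch relays that currently report general overload."""
    relays = requests.get(
        ONIONOO,
        params={"type": "relay",
                "fields": "fingerprint,flags,overload_general_timestamp"},
    ).json()["relays"]
    return [r for r in relays if r.get("overload_general_timestamp") is not None]

overloaded = overloaded_relays()
print(f"{len(overloaded)} relays currently report overload-general")
exit_count = sum(1 for r in overloaded if "Exit" in r.get("flags", []))
print(f"{exit_count} of them carry the Exit flag")
```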
I think I am almost done with all the tooling for now, so I have some graphs to share. They start from last week, even though I have data from earlier. We've been hunting Onionoo bugs for the past two weeks, which might affect the numbers, so I am not sure whether I want to take the earlier data into account. Maybe it's okay to just go with what we have right now.
Attached graphs:

- overload-general for relays
- overload-fd-exhausted for relays
- overload-ratelimits for relays
- bridge overload
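For reference, the plotting side of graphs like these is straightforward once the daily counts are collected. A minimal sketch, assuming the collection tooling writes one CSV with a count per overload category per day (the file name and column names are made up for this example):

```python
# Illustrative plotting sketch; assumes a CSV like:
#   date,overload_general,overload_fd_exhausted,overload_ratelimits
# produced by whatever collects the daily Onionoo snapshots.
import csv
import matplotlib.pyplot as plt

categories = ["overload_general", "overload_fd_exhausted", "overload_ratelimits"]
dates = []
series = {c: [] for c in categories}

with open("relay-overload-counts.csv") as f:
    for row in csv.DictReader(f):
        dates.append(row["date"])
        for c in categories:
            series[c].append(int(row[c]))

for c in categories:
    plt.plot(dates, series[c], label=c)
plt.legend()
plt.ylabel("relays reporting overload")
plt.savefig("relay-overload.png")
```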
There seems to be a pretty tight coupling between exit node general overload and relay general overload, which is really visible in the other two relay-related graphs. It's not clear why that's happening. Maybe it's due to tpo/core/tor#40491 (closed).
One thing I want to investigate a bit further is the significant overload-general drop on 10/15. Maybe that's caused by another issue with our Onionoo-based data.
Another thing I might add to our graphs to put things into perspective is the number of relays that are actually on Tor >= 0.4.6, given that overload is not reported by earlier versions. For now this boils down to:
which means roughly 1160 relays. Given the numbers in the graphs, that means between 20% and 25% of those relays are reporting overload right now.
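For reference, one way such a count could be derived from Onionoo, filtering client-side on the `version` field of the details documents (the version parsing below is illustrative):

```python
# Illustrative: count relays whose tor version is >= 0.4.6, i.e. relays
# new enough to report overload at all.
import requests

details = requests.get(
    "https://onionoo.torproject.org/details",
    params={"type": "relay", "fields": "version"},
).json()

def at_least_0_4_6(version):
    # Versions look like "0.4.6.8" or "0.4.7.1-alpha"; compare the first
    # three numeric components and skip anything unparseable.
    try:
        parts = tuple(int(p) for p in version.split(".")[:3])
    except (AttributeError, ValueError):
        return False
    return parts >= (0, 4, 6)

count = sum(1 for r in details["relays"] if at_least_0_4_6(r.get("version")))
print(f"{count} relays run tor >= 0.4.6")
```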
For bridges that's a bit harder to figure out currently due to a bug in our helper script (helper-scripts#13 (closed)).
> One thing I want to investigate a bit further is the significant overload-general drop on 10/15. Maybe that's caused by another issue with our Onionoo-based data.
This does not look like an Onionoo issue. It's mainly caused by the Emerald Onion folks restarting their exit fleet from time to time: they get overloaded fast (maybe due to tpo/core/tor#40491 (closed)) and then restart their relays when adding new servers, updating family settings, etc.