
Improve overload-general wrt ntor onionskin drops

We need to make a couple of small tweaks to how overload-general reacts to dropped ntor handshakes. Basically:

  • overload-general should not be listed unless X% of ntor handshakes are dropped over Y seconds (X is a consensus parameter expressed as a percentage, i.e. a fraction of 100; Y is also a consensus parameter)
  • Add checks to mark_my_descriptor_dirty_if_too_old() so that we republish our descriptor if overload-general appears, disappears, or changes timestamp.
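A minimal sketch of the first point, assuming a simple fixed window: count ntor onionskins received and dropped over Y seconds, and when the window closes, flag overload-general only if the drop fraction reached X%. The struct, the function names, and the `threshold_pct` (X) and `window_secs` (Y) arguments are hypothetical illustrations, not tor's actual code or parameter names. The comparison uses integer math (`dropped * 100 >= X * requested`) to avoid floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-window counters; not tor's real data structures. */
typedef struct {
  time_t window_start;   /* when the current Y-second window began */
  uint64_t n_requested;  /* ntor onionskins received in this window */
  uint64_t n_dropped;    /* ntor onionskins dropped in this window */
} ntor_drop_window_t;

/* True if n_dropped / n_requested >= threshold_pct / 100.
 * Requires at least one drop, so X = 0 does not flag a healthy relay. */
static bool
ntor_drop_rate_exceeds(uint64_t n_dropped, uint64_t n_requested,
                       uint32_t threshold_pct)
{
  if (n_requested == 0 || n_dropped == 0)
    return false;
  return n_dropped * 100 >= (uint64_t)threshold_pct * n_requested;
}

/* Record one ntor onionskin (dropped or not) at time `now`.
 * If at least window_secs have elapsed since the window started, the
 * closed window is evaluated and a new one begins with this request.
 * Returns true if the closed window exceeded the threshold, i.e. the
 * caller would set overload-general (timestamped `now`). */
static bool
ntor_window_note_request(ntor_drop_window_t *w, bool dropped, time_t now,
                         uint32_t threshold_pct, uint32_t window_secs)
{
  bool overloaded = false;
  assert(w != NULL);
  if (now - w->window_start >= (time_t)window_secs) {
    overloaded = ntor_drop_rate_exceeds(w->n_dropped, w->n_requested,
                                        threshold_pct);
    w->window_start = now;
    w->n_requested = 0;
    w->n_dropped = 0;
  }
  w->n_requested++;
  if (dropped)
    w->n_dropped++;
  return overloaded;
}
```

In this sketch a window is only evaluated when the next onionskin arrives after it closes; a real implementation might instead evaluate on a periodic timer so an idle relay does not hold a stale window open.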

For the first point, the choice of X% and Y seconds matters, but it matters less if we update our descriptor immediately whenever the overload state or timestamp changes.
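A minimal sketch of the republish check from the second point, assuming we keep a snapshot of the overload-general line as it appeared in the last published descriptor and compare it against the current state. The type and function names here are hypothetical; in tor, a check like this would presumably be consulted from mark_my_descriptor_dirty_if_too_old(), but how it is wired in there is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical snapshot of the overload-general line: whether it is
 * present and, if so, the timestamp it carries. */
typedef struct {
  bool present;
  time_t timestamp;
} overload_general_state_t;

/* True if the descriptor should be republished because overload-general
 * appeared, disappeared, or changed timestamp compared with the state in
 * the last published descriptor. */
static bool
overload_general_needs_republish(const overload_general_state_t *published,
                                 const overload_general_state_t *current)
{
  assert(published != NULL && current != NULL);
  if (published->present != current->present)
    return true;  /* line appeared or disappeared */
  if (current->present && published->timestamp != current->timestamp)
    return true;  /* still present, but with a new timestamp */
  return false;   /* nothing changed; timestamps ignored when absent */
}
```

Comparing the timestamp as well as presence means a relay that keeps being re-flagged republishes each time the timestamp moves, which is why the exact X%/Y-second window matters less once this check exists.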

Remember that as soon as a relay is so overloaded that it is dropping ntors, traffic is already being biased away from it, because those circuits fail. So a percentage tolerance, which can be related to the percentage of backoff applied by sbws, seems to make the most sense, but I'd be open to other ideas. Favoring implementation simplicity seems important here, too.

This is not super urgent, since we are not reacting to overload-general in terms of relay weights and won't until after tpo/network-health/sbws#40125 (closed), but we should fix this before 0.4.7.x-stable.


Cc: @gk, @dgoulet

Edited by Georg Koppen