Specify overload relay descriptors for load balancing and network health
We need to specify descriptor fields to signify when relays:
- Are at or near CPU overload
- Are at or near OOM killer invocation
- Are at or near connection count limits
- Are at or near their token bucket limit
- Are accumulating too many cells in queues (circuitmux, TLS outbuf, AES)
- Are failing too many onionskins, TLS handshakes, other things?
- Are on the same machine as other relays (a flag or check to signify this)
The specified fields should expose only enough information to determine whether a relay is at or near various forms of overload. Relays should not report detailed statistics, as these may aid DoS attacks and traffic analysis.
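To make the "coarse signal only" requirement concrete, here is a minimal sketch of how detailed counters could be collapsed into boolean flags before anything is published. Everything in it is an assumption for illustration: the `overload` keyword, the flag names, the counter fields, and both thresholds are hypothetical and are not part of any existing descriptor format.

```python
from dataclasses import dataclass

# Hypothetical thresholds; this ticket does not define any values.
NEAR_LIMIT_FRACTION = 0.9
MAX_ONIONSKIN_FAILURE_RATE = 0.1


@dataclass
class RelayCounters:
    """Detailed internal counters that should never leave the relay."""
    cpu_utilization: float  # fraction between 0.0 and 1.0
    memory_used_bytes: int
    memory_limit_bytes: int
    open_connections: int
    connection_limit: int
    onionskins_failed: int
    onionskins_total: int


def near(used, limit):
    """True when usage is at or near a limit (a limit of 0 means unknown)."""
    return limit > 0 and used >= NEAR_LIMIT_FRACTION * limit


def overload_flags(c):
    """Collapse detailed counters into coarse flag names, sorted by name."""
    flags = {
        "cpu": c.cpu_utilization >= NEAR_LIMIT_FRACTION,
        "memory": near(c.memory_used_bytes, c.memory_limit_bytes),
        "connections": near(c.open_connections, c.connection_limit),
        "onionskins": (
            c.onionskins_total > 0
            and c.onionskins_failed / c.onionskins_total
            > MAX_ONIONSKIN_FAILURE_RATE
        ),
    }
    return sorted(name for name, overloaded in flags.items() if overloaded)


def descriptor_line(c):
    """Render a hypothetical descriptor line, or None if nothing is overloaded."""
    flags = overload_flags(c)
    return "overload " + " ".join(flags) if flags else None
```

Only the flag names appear in the output; the raw counts and ratios stay local, which is the property the paragraph above asks for.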
With this information, sbws can avoid allocating extra load to these relays. The same fields can also be used to report unhealthy relays on the metrics portal and to investigate other misbehavior.
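One possible reading of "avoid allocating extra load" is that a scanner never raises the weight of a relay that reports overload. The sketch below illustrates that policy only; it is not sbws code, and the function name, the dict-based inputs, and the capping rule itself are assumptions.

```python
def next_weights(previous, measured, overloaded):
    """Pick the next weight for each measured relay.

    previous, measured: dicts mapping a relay fingerprint to a weight.
    overloaded: set of fingerprints whose descriptor reports overload.
    """
    result = {}
    for fingerprint, new_weight in measured.items():
        if fingerprint in overloaded:
            # Hypothetical policy: an overloaded relay may keep or lose
            # weight, but never gains any over its previous value.
            new_weight = min(new_weight, previous.get(fingerprint, new_weight))
        result[fingerprint] = new_weight
    return result


# Usage: relay "B" reports overload, so its weight stays at 100.
weights = next_weights(
    previous={"A": 100, "B": 100},
    measured={"A": 150, "B": 150},
    overloaded={"B"},
)
```

A relay with no previous weight keeps its measured value under this rule, so whether new relays that already report overload need separate handling is left open here.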
I can work on this spec, but I will need a lot of input from @dgoulet.
Edited by Mike Perry