Document for the relay operator community how to debug relays that are slower than the operator expects
This idea originates from a conversation between @beth, @gk, and me on #tor-dev today.
We often release new C Tor features to relay operators, and these releases spark discussions about whether Tor has become faster or slower, uses more or less memory, crashes more or less often, etc. For many of these reports it is hard to give a definitive "yes, the cause of this is X" answer, and it is very time-consuming for the Network Team to debug each one individually with the operator.
It would be very useful to have a document that tells relay operators about the different situations that can affect performance and explains how to collect performance measurements. Operators could then compare against those measurements to see whether performance has truly regressed. The document could also help persuade more operators to enable MetricsPort.
We can expand the document over time as we discover new ways to do this analysis and as we receive feedback from the relay operator community.
This is related to:
- https://lists.torproject.org/pipermail/tor-relays/2023-December/021409.html
- https://lists.torproject.org/pipermail/tor-relays/2023-December/021407.html
This may be relevant to Arti Relay too.
CC @mikeperry for awareness.