Dealing with non-compliant protocol messages
Summary
While discussing #40400 (closed) on IRC/Matrix with @nickm, we thought it might be good to have a ticket discussing potential protocol extensions that would help track down the reasons behind non-compliant protocol messages.
Currently, #40400 (closed) and the related merge request deal with improving the logging system:
- making ProtocolWarnings less noisy (e.g., rate-limiting or downgrading to info or debug)
- adding a periodic Heartbeat message that logs further information about the misbehaving circuits (e.g., the circuit lifetime)
Protocol-level extensions could be useful to fetch or share each node's view of the circuit, and could complement the periodic Heartbeat message. The main benefit would be to help distinguish between 1) bugs, 2) buggy Byzantine clients/relays, and 3) actively malicious clients/relays, using information reported from one endpoint. The main difficulty would be to avoid sharing overly sensitive information.
A first simple idea
An echo/echo-response type of protocol may help to exchange or obtain more information when an abnormal protocol state is detected. Upon receiving a RELAY-level protocol echo, the node (endpoint) may log information and/or echo some information back to the other endpoint. These questions remain open:
- What to exchange/get from the peer?
- What to log on the endpoints?
- What to do if we get a non-compliant protocol message in answer to an echo message?
- Among the information that we think could help, what is safe to exchange?
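
To make the discussion more concrete, here is a minimal sketch of what an echo/echo-response payload could look like. Everything in it is an assumption for illustration only: the wire layout, the field names (`reason`, `age_bucket`, `nonce`), the bucket limits, and the `handle_echo` helper are hypothetical and are not part of any existing Tor specification or code. It illustrates two possible answers to the questions above: share only a coarsened circuit age rather than the exact lifetime, and stay silent when the echo itself is malformed.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical wire layout (NOT from any Tor spec):
#   version (1 byte) | reason code (1 byte) | coarse age bucket (1 byte) | nonce (4 bytes)
_ECHO_FORMAT = "!BBB4s"
ECHO_LEN = struct.calcsize(_ECHO_FORMAT)


@dataclass(frozen=True)
class EchoPayload:
    version: int
    reason: int       # hypothetical code for the detected abnormal state
    age_bucket: int   # coarsened circuit lifetime, see age_bucket()
    nonce: bytes      # lets the sender match a response to its echo

    def encode(self) -> bytes:
        return struct.pack(_ECHO_FORMAT, self.version, self.reason,
                           self.age_bucket, self.nonce)

    @classmethod
    def decode(cls, data: bytes) -> "EchoPayload":
        if len(data) != ECHO_LEN:
            raise ValueError("unexpected echo payload length")
        return cls(*struct.unpack(_ECHO_FORMAT, data))


def age_bucket(age_seconds: float) -> int:
    """Map a circuit lifetime to a small bucket index so the exact age is not shared."""
    limits = (60, 600, 3600)  # assumed limits: <1 min, <10 min, <1 h, else longer
    for index, limit in enumerate(limits):
        if age_seconds < limit:
            return index
    return len(limits)


def handle_echo(data: bytes, local_age_seconds: float) -> Optional[bytes]:
    """Build an echo-response, or return None if the echo itself is malformed."""
    try:
        echo = EchoPayload.decode(data)
    except ValueError:
        # One possible policy: never answer a malformed echo, to avoid echo loops.
        return None
    reply = EchoPayload(echo.version, echo.reason,
                        age_bucket(local_age_seconds), echo.nonce)
    return reply.encode()
```

In this sketch the responder copies back the nonce and reason code and reports only its own bucketed view of the circuit age. Whether even that much is safe to share is exactly the open question listed above.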