Dealing with non-compliant protocol messages

Summary

While discussing #40400 (closed) on IRC/Matrix with @nickm, we thought it might be good to have a ticket discussing potential protocol extensions that would help track down the reasons behind non-compliant protocol messages.

Currently, #40400 (closed) and the related merge request focus on improving the logging system by:

making ProtocolWarnings less noisy (e.g., by rate-limiting them or downgrading them to info or debug)
adding a periodic Heartbeat message that logs further information about the misbehaving circuits (e.g., the circuit lifetime)

Protocol-level extensions could be useful for fetching or sharing each node's view of the circuit, and could complement the periodic Heartbeat message. The main benefit would be to help distinguish between 1) bugs, 2) buggy (Byzantine) clients/relays, and 3) actively malicious clients/relays, based on information reported by one endpoint. The main challenge would be to avoid sharing overly sensitive information.

A first simple idea

An echo/echo-response type of protocol may help in exchanging or obtaining more information when an abnormal protocol state is detected. Upon receiving a RELAY-level protocol echo, the node (endpoint) may log information and/or echo some information back to the other endpoint. The following questions remain open:

What to exchange/get from the peer?
What to log on the endpoints?
What to do if we get a non-compliant protocol message in answer to an echo message?
Among the information that we think could help, what is safe to exchange?
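
To make the discussion more concrete, here is a rough Python sketch of what an echo/echo-response exchange could look like. Everything in it is hypothetical and not part of any Tor specification: the message layout, the field names (`reason`, `age_bucket`, `nonce`), and the helper `handle_echo` are assumptions chosen only for illustration. It shares a coarsened circuit age rather than the exact lifetime, and it never replies to a malformed echo, which is just one possible answer to the questions above.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical payload layout (an assumption, not a Tor cell format):
#   1 byte  message type (0 = echo, 1 = echo-response)
#   1 byte  reason code for the detected anomaly
#   1 byte  coarse circuit-age bucket
#   4 bytes nonce used to match a response to its request
ECHO, ECHO_RESPONSE = 0, 1
_FMT = "!BBBI"


@dataclass(frozen=True)
class EchoMsg:
    msg_type: int
    reason: int
    age_bucket: int
    nonce: int

    def encode(self) -> bytes:
        return struct.pack(_FMT, self.msg_type, self.reason,
                           self.age_bucket, self.nonce)

    @classmethod
    def decode(cls, payload: bytes) -> "EchoMsg":
        if len(payload) != struct.calcsize(_FMT):
            raise ValueError("malformed echo payload")
        msg = cls(*struct.unpack(_FMT, payload))
        if msg.msg_type not in (ECHO, ECHO_RESPONSE):
            raise ValueError("unknown echo message type")
        return msg


def age_bucket(age_seconds: float) -> int:
    """Coarsen the circuit age to a power-of-two bucket so that the
    exact lifetime is never shared with the peer."""
    return min(int(max(age_seconds, 0)).bit_length(), 255)


def handle_echo(payload: bytes, local_age_seconds: float,
                log: List[str]) -> Optional[bytes]:
    """Log the peer's view and build a reply carrying our own view.
    Returns None (no reply) when the input is itself non-compliant."""
    try:
        msg = EchoMsg.decode(payload)
    except ValueError as exc:
        log.append(f"non-compliant echo: {exc}")
        return None
    if msg.msg_type != ECHO:
        log.append("unexpected echo-response")
        return None
    log.append(f"peer reported reason={msg.reason} "
               f"age_bucket={msg.age_bucket}")
    reply = EchoMsg(ECHO_RESPONSE, msg.reason,
                    age_bucket(local_age_seconds), msg.nonce)
    return reply.encode()
```

In this sketch, an endpoint that detects an abnormal state would send `EchoMsg(ECHO, reason, age_bucket(age), nonce).encode()` and compare the two age buckets once the response arrives. Whether even a bucketed age is safe to share is exactly the kind of question this ticket is meant to settle.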
