Confusing "Your relay has a very large number of connections to other relays" relay message
A new relay operator reports this complaint in their logs, showing up hourly:
Your relay has a very large number of connections to other relays. Is your
outbound address the same as your relay address? Found 13 connections to 8
relays. Found 13 current canonical connections, in 0 of which we were a
non-canonical peer. 5 relays had more than 1 connection, 0 had more than 2, and
0 had more than 4 connections.
I checked, and their outbound address was the same as their relay address.
Upon investigation, it looks like the redundant connections are to directory authorities.
My theory is that the change from #17592 (moved) (which went into 0.3.1.1-alpha, commit d5a151a0) is responsible: while before that canonical relay-to-relay connections would expire after either side first reached its randomized "15 to 22.5 minute" timeout, now they expire after either side reaches its "45 to 75 minute" timeout. And since directory authorities test reachability every 1280 seconds (around 21.3 minutes), that means it is expected that most relays will have duplicate canonical connections with directory authorities.
Possible fixes:
(A) Change the notice-level log to make it clearer that it's not scary, or at least it's not actionable. Maybe that means making it info-level so nobody will see it. Probably not the best option, assuming there are cases where we do want relay operators to hear it.
(B) In channel_check_for_duplicates(), change MIN_RELAY_CONNECTIONS_TO_WARN 5
to a high enough number that even if we have 2 canonical conns per authority, plus a bit more, the log message still doesn't trigger.
(C) In channel_check_for_duplicates(), skip over connections to directory authorities in some way, since we know they will be special.
(D) Make connections to or from directory authorities expire quicker, on the theory that they don't really need the same level of padding protection as other connections.
(E) Your idea here?
I'd be fine with any of B,C,D. Whichever one can be done with an easy, short, and non-invasive patch is my favorite. Maybe that's "B, raise it to 30"? That would make the message trigger when we have connections to more than 30 relays and also we have more than 45 connections open. Or we could pick the more conservative "raise it to 40", on the theory that small numbers are more likely to have edge cases and less indicative of major network problems anyway.
And while we're at it, it might be smart to say in the log message what action we want the relay operator take, e.g. "Please report:".