hs-v3: Service circuit retry limit should not close a valid circuit
Context: Lets say a service has 3 intro circuits opened and stable.
Now, imagine one circuit collapses, like for instance the intro point restarted "tor" after an update. Our code is designed to retry 3 times that is once every second and then give up on the intro point.
What it looks like:
Every second, run_build_circuit_event()
is run and launches intro circuit(s) if we are missing any. For each IP, it will increment the circuit_retries
counter but does not actually look at it to decide to launch or not.
Before that event, also every 1 second, cleanup_intro_points()
checks that every intro point has not expired, fell off the consensus or that circuit_retries
is greater than (>) MAX_INTRO_POINT_CIRCUIT_RETRIES = 3
.
Putting this together, imagine now that the first 3 attempts failed for whatever reasons and then we launch a 4th one because circuit_retries = 3
, it does pass validation of > MAX_INTRO_POINT_CIRCUIT_RETRIES
so then a circuit is launched and that very one succeeds.
Because circuit_retries
is now 4 then the next second, cleanup_intro_points()
removes the IP and closes the valid open established circuit...
I've observed the above a surprising amount of time because booting a tor relay takes some seconds mostly due to the diff-cache parsing.
In a nutshell, we should NOT launch a circuit if we reached the max retries for an intro point. This back and forth of opening a circuit and then deciding that we went over the limit makes no sense in the first place.