hs: Cleanup race between circuit close and free with the HS circuitmap
Here is the scenario that has been encountered by asn. The following is in chronological order:
-
[02:21:27] An intro point (IP) is picked and circuit A is launched.
-
[02:23:47] Circuit A never completes and thus is marked for close which is NOT free thus still in our HS circuitmap.
-
[02:23:48] A second later, the HS housekeeping kicks in (every 1 sec) and we notice that we are missing a circuit for this IP so we launch a new circuit B. What happens then is this new circuit B takes the place of circuit A in the HS circuitmap with the same hs_token (the token in this case is the IP authentication public key).
At this point, we have circuit A marked for close and we have circuit B that was launched and replaced circuit A with the same hs_token which identify the IP. So far so good.
- [02:23:48] At the same second, we get a
circuit_build_failed()
which is only called fromcircuit_about_to_free()
->circuit_close_all_marked()
which means it is freeing circuit A leading to ahs_circuitmap_remove_circuit()
which will use itshs_token
to remove circuit B because it is the same token.
We end up with no circuit for the intro point IP in our circuitmap.
-
[02:23:49] A second after our circuit build failed which removed circuit B from the circuitmap, we cleanup the IP from our list through the house keeping and we launch a new circuit to a new intro point. The reason why we removed the IP is either because it has expire, no
node_t
could be found or we retried too many times on that circuit which the allowed number of retries is 3 right now. When we do remove an IP from our service list, we also try to close the circuit but because circuit B got removed from the circuitmap, we can't find it thus it is not closed. -
[02:24:54] Circuit B completes but we are unable to find the IP in our service list. And boom we get the warning:
In summary, we end up with an intro circuit being established but not in our circuitmap leading to the failed lookup when we remove that IP from our service list for whatever reason.
Seems like one thing we could do is actually remove a circuit from the HS circuitmap when we mark for close because at that point, the circuit is good as dead. We avoid this race between mark for close and actually being freed where in between a new circuit to the same destination with the same hs_token can be launched.