Tor has extra guard connections

added Backlog label

Some tinkering suggests that this can happen at startup if a guard connection is delayed due to networking issues. When this happened, I ended up with a guard with confirmed_idx=20 (very low priority) as a third guard once connectivity resumed.

This third guard connection then stayed around, because Tor would still happily use it for more circuits, instead of avoiding it for all future use, once we had enough connected guards with higher priority.

Worse, even when I killed the connection, Tor immediately respawned two more connection attempts to this guard, even though it already had 2 other guards open at that point. The most bizarre thing to me is that it kept trying for a while.

I am also wondering if we can add more checks so that we do not launch connections to more guards when we already have enough connected. I am guessing that such checks exist, but some other part of the maze is overriding them, or they are checking in the wrong place in terms of circuit construction/retry.

added Next label and removed Backlog label

guard-n-primary-guards-to-use=2 * cfx_num_legs_set=2 = 4, no?

It's not that simple, but good guess. It appears to be conflux related.

Conflux applies Guard restrictions so that the same guard cannot end up in both legs of the conflux set.

However, it applies these restrictions to exclude guards before the number of primary guards is considered (in select_primary_guard_for_circuit()). Guards that are not excluded are added to the list of usable guards, and this loop stops once the primary guard limit is reached.

So for each conflux set with 1 leg, it will build a list of two more "primary guards" to choose from, for the second leg. Thus, a third guard can be used for the second leg of some conflux sets.
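To make the ordering problem concrete, here is a toy model of the loop described above. It is only a sketch, not Tor's C code: the function name `usable_primary_guards`, the guard names, and the `excluded` set are all made up for illustration, and the limit of 2 stands in for guard-n-primary-guards-to-use.

```python
# Toy model of the selection loop described above; not Tor's actual code.
# All names here are hypothetical and chosen for illustration only.

def usable_primary_guards(guards_by_priority, excluded, n_primary_to_use=2):
    """Skip restricted guards first, then stop once the limit is reached."""
    usable = []
    for guard in guards_by_priority:
        if guard in excluded:
            # The restriction is applied before the limit is counted,
            # so an excluded guard does not use up a slot.
            continue
        usable.append(guard)
        if len(usable) >= n_primary_to_use:
            break
    return usable


guards = ["G1", "G2", "G3", "G4"]  # sorted by priority, best first

# No restriction: only the top two guards are usable.
first_leg = usable_primary_guards(guards, excluded=set())

# Second leg of a conflux set whose first leg already uses G1:
# G1 is skipped, so the loop reaches down to G3 to fill two slots.
second_leg = usable_primary_guards(guards, excluded={"G1"})

print(first_leg)   # ['G1', 'G2']
print(second_leg)  # ['G2', 'G3']
```

In this model, the second call returns G3 as a usable "primary" guard even though two higher-priority guards exist, which matches the third guard seen in practice.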

changed milestone to %Tor: 0.4.8.x-post-stable

added Backport label

added Doing label and removed Backport label

added Backport label

Ugh, this one is going to need an annoying amount of refactoring to fix. We were using the restrictions so that we did not have to redo all the guard filtering logic, but it appears we can't use them, for this reason...

Unless we set some kind of flag on guard restrictions to make them "temporary" or something, and set a bool in-param if a temporary restriction is hit, so that the list counter does not count temporarily-excluded primary guards.

This has fragility if the list can somehow become empty from temporary restrictions, so we'll have to make sure we always end up with at least one guard before exiting the loop.
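A rough sketch of that idea, in toy Python rather than Tor's C: temporarily-excluded guards still count toward the primary guard limit, and the loop only stops once it has at least one usable guard. The names (`usable_primary_guards_fixed`, `temp_excluded`) are hypothetical, and this is not the patch that was eventually merged.

```python
# Sketch of the "temporary restriction" idea; hypothetical names throughout.

def usable_primary_guards_fixed(guards_by_priority, temp_excluded,
                                n_primary_to_use=2):
    """Count temporarily-excluded guards toward the primary guard limit."""
    usable = []
    n_considered = 0  # counts every guard looked at, excluded or not
    for guard in guards_by_priority:
        # Stop at the limit, but never exit with an empty list while
        # there are still guards left to try.
        if n_considered >= n_primary_to_use and usable:
            break
        n_considered += 1
        if guard in temp_excluded:
            continue  # skipped for this circuit, but it still used a slot
        usable.append(guard)
    return usable


guards = ["G1", "G2", "G3", "G4"]  # sorted by priority, best first

# Second conflux leg with G1 excluded: only G2 remains, no third guard.
second_leg = usable_primary_guards_fixed(guards, temp_excluded={"G1"})

# Both top guards excluded: fall through to G3 rather than return nothing.
fallback = usable_primary_guards_fixed(guards, temp_excluded={"G1", "G2"})

print(second_leg)  # ['G2']
print(fallback)    # ['G3']
```

The second case is the fragile one mentioned above: if every guard within the limit is temporarily excluded, the loop has to keep going until it finds one usable guard.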

I realized a third guard can also be used if the user's Guard is also an Exit, and it is chosen as an Exit. Then, the list of primary guards can also have a third node.

Here's a stab at fixing both issues, but this breaks tons of guard unit tests: https://gitlab.torproject.org/mikeperry/tor/-/commits/bug40876

There are also similar issues with the confirmed list, secondary list, and filtered list.

I think the startup problem I noticed in #40876 (comment 2957143) is also separate. But I believe with this fix, it might stop using those extra startup guards now? I will have to dig to make sure, though.

More work needed.

~~This seems to help, but it is not enough. Sometimes it would converge to two guards, but I also got it to use three after a restart once.~~

Edit: It was skipping a list length check. That seems to have gotten most cases. Testing to uncover more.

mentioned in merge request torspec!182 (merged)

Ok, I am just going to focus on the immediate issue with this bug, rather than dig into the infinite rabbit hole of ways Tor can end up using different guards sometimes. I added extra logs to try to diagnose the other cases later.

Code MR: !778 (merged)
Spec MR: torspec!182 (merged)

mentioned in issue arti#1091

This was merged into 0.4.8

closed

mentioned in issue #40892
