Resolve spec issues found while implementing Arti guard manager code.

I have two proposed changes and one open question stemming from my Arti guard implementation.

Proposed change 1: exponential backoff for retrying guards

Right now, Tor and Arti retry guards on a fixed schedule based on PRIMARY_GUARDS_RETRY_SCHED and GUARDS_RETRY_SCHED in guard-spec.txt. I've updated those schedules to match what Tor and Arti actually do...

...but wouldn't it be better for them to do an exponential backoff with decorrelated jitter, like we do for directory retries?
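
For concreteness, here is a minimal sketch of one common formulation of decorrelated jitter: each delay is drawn uniformly between a base delay and three times the previous delay, then capped. The names `next_delay` and `retry_schedule` and the `base` and `cap` values are assumptions for illustration. They are not taken from guard-spec.txt, and this is not necessarily the exact algorithm used for directory retries.

```python
import random

def next_delay(prev_delay, base=1.0, cap=3600.0, rng=random):
    """Return the next retry delay (seconds) using decorrelated jitter.

    `base` and `cap` are hypothetical parameters, not values from the spec.
    """
    # The upper bound grows with the previous delay, so delays back off
    # roughly exponentially while staying randomized.
    upper = max(base, prev_delay * 3.0)
    return min(cap, rng.uniform(base, upper))

def retry_schedule(n, base=1.0, cap=3600.0, seed=None):
    """Generate n successive retry delays for one guard."""
    rng = random.Random(seed)
    delays = []
    delay = base
    for _ in range(n):
        delay = next_delay(delay, base, cap, rng)
        delays.append(delay)
    return delays
```

Unlike a fixed schedule, two clients that fail at the same moment would not retry in lockstep under a scheme like this.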

Proposed change 2: Make recorded dates consistent with recorded orderings.

Right now, Tor and Arti record an "added_on_date" and an optional "confirmed_date" for each guard. These are randomized to be at some time in the past before the guards were truly added or truly confirmed. [See guard-spec.txt section A.2] Additionally, Tor and Arti record the actual order in which the guards were added and confirmed.

These randomized dates can be inconsistent with the recorded orderings. Taken together, the dates and the orderings can be used by an attacker with access to the list of guards to narrow down the actual times at which the guards were added and confirmed.

Can we do better here?

One possibility is to randomize the times such that they always reflect the true order.
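
A rough sketch of what that could look like, done incrementally: when a guard is added, its recorded date is drawn at random from the past, but never earlier than the date recorded for the previously added guard. The function name `randomized_added_date` and the `max_backdate` window are hypothetical and not taken from guard-spec.txt. Whether the narrowed range leaks information of its own is a separate question that this sketch does not answer.

```python
import random

def randomized_added_date(now, prev_recorded, max_backdate, rng=random):
    """Pick a randomized "added on" timestamp that respects the true order.

    now           -- real time at which the guard is being added
    prev_recorded -- recorded date of the previously added guard, or None
    max_backdate  -- hypothetical limit on how far back a date may be pushed
    """
    earliest = now - max_backdate
    if prev_recorded is not None:
        # Never record a date earlier than the previous guard's date.
        earliest = max(earliest, prev_recorded)
    # Guard against a clock that moved backwards since the last addition.
    earliest = min(earliest, now)
    return rng.uniform(earliest, now)
```

The same idea would apply to "confirmed_date", using the previously confirmed guard's recorded date as the lower bound.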

Open question

Arti differs from Tor when it comes to deciding whether or not an exploratory circuit (one using a non-primary guard) can be used.

Tor behaves as described in guard-spec section 4.9: if an exploratory circuit is "waiting_for_better_guard", then we advance it (or not) depending on the status of all other circuits using guards that we'd rather be using.

Arti is simpler: it decides whether to advance such a circuit based only on the status of the circuit's guard relative to the status of other guards, not on the status of other circuits.

Specifically, Arti does something like this:

After completing a circuit, the implementation checks whether its guard is usable.

The decision is reached according to these rules:

Primary guards are always usable.

Non-primary guards are usable for a given circuit if every "better" guard is either unsuitable for that circuit (e.g. because of family restrictions), or marked with {is_reachable} = <no>, or has been pending for at least {NONPRIMARY_GUARD_CONNECT_TIMEOUT}.

If a circuit's guard is not usable immediately, the circuit is not discarded. Instead, it is kept for up to {NONPRIMARY_GUARD_IDLE_TIMEOUT} to see if it becomes usable.
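
The rules above can be sketched roughly as follows. This is an illustration only, not Arti's actual code: the `Guard` fields, the `suitable` callback, and the function name are all assumptions, and `better_guards` is assumed to hold the guards that would be preferred over the one being checked.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Guard:
    name: str
    is_primary: bool
    is_reachable: str                      # "yes", "no", or "maybe"
    pending_since: Optional[float] = None  # when a pending attempt began, if any

def guard_is_usable(guard, better_guards, now, connect_timeout,
                    suitable=lambda g: True):
    """Decide whether a completed circuit's guard may be used right away."""
    if guard.is_primary:
        return True  # primary guards are always usable
    for g in better_guards:
        if not suitable(g):
            continue  # e.g. excluded by family restrictions for this circuit
        if g.is_reachable == "no":
            continue  # known to be unreachable
        if g.pending_since is not None and now - g.pending_since >= connect_timeout:
            continue  # has been pending for long enough
        return False  # a better guard might still work; keep waiting
    return True
```

A circuit for which this returns false would be held, for up to {NONPRIMARY_GUARD_IDLE_TIMEOUT}, and rechecked as guard statuses change.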

Is this behavior okay? Should I specify it as an alternative to what Tor does?

Edited Oct 21, 2021 by Nick Mathewson
