Conflicting logic about whether bridges need descriptors for fetching dir info from them
If you start your Tor with a pile of configured bridges but nothing cached, your Tor will sample the configured bridges to pick its ordered list of primary entry guards, and launch descriptor fetches to each of them.
But if a bridge's descriptor hasn't arrived yet when Tor tries to bootstrap dir info, you get confusing messages like this in your logs:
Jan 31 18:56:44.928 [notice] Ignoring directory request, since no bridge nodes are available yet.
Things do bootstrap eventually, but it takes longer than it should, and the pile of log messages looks scary.
What's going on here?
The way the log message comes about is that directory_get_from_dirserver() calls
  const node_t *node = guards_choose_dirguard(dir_purpose, &guard_state);
  if (node && node->ri) {
    [...]
  } else {
    [...]
    log_notice(LD_DIR, "Ignoring directory request, since no bridge "
                       "nodes are available yet.");
  }
i.e. guards_choose_dirguard had better return a bridge for which we have the descriptor, or we're going to log a complaint and abort the directory fetch attempt.
But in select_primary_guard_for_circuit(), we do
  const int need_descriptor = (usage == GUARD_USAGE_TRAFFIC);
  [...]
  SMARTLIST_FOREACH_BEGIN(gs->primary_entry_guards, entry_guard_t *, guard) {
    [...]
    if (guard->is_reachable != GUARD_REACHABLE_NO) {
      if (need_descriptor && !guard_has_descriptor(guard)) {
        log_info(LD_GUARD, "Guard %s does not have a descriptor",
                 entry_guard_describe(guard));
        continue;
      }
That is, in select_primary_guard_for_circuit() we require that the bridge have a descriptor only for the GUARD_USAGE_TRAFFIC case, but then in directory_get_from_dirserver() we expect that the bridge will always have a descriptor, even in the GUARD_USAGE_DIRGUARD case.
In normal operation this bug isn't a big deal, because it only shows up when we lose a race: the descriptor fetch has to finish before we happen to pick that bridge for a directory request. But with the #40578 fix, where we defer fetching the descriptor if we won't use the bridge for the GUARD_USAGE_TRAFFIC case, the bug becomes more obvious.
I believe the fix is simply to always set need_descriptor in select_primary_guard_for_circuit(), meaning that when we're going to launch a directory fetch, we always choose among the primary guards that already have descriptors.
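To make the mismatch and the proposed change concrete, here is a small standalone model. It is only a sketch, not Tor code: the toy_* types, fields, and functions are made up for illustration, and the always_need_descriptor flag exists only so the current and proposed behavior can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Tor's guard types; names are illustrative. */
typedef enum { TOY_USAGE_TRAFFIC, TOY_USAGE_DIRGUARD } toy_usage_t;

typedef struct {
  const char *nickname;
  bool reachable;      /* models is_reachable != GUARD_REACHABLE_NO */
  bool has_descriptor; /* models guard_has_descriptor(guard) */
} toy_guard_t;

/* Models select_primary_guard_for_circuit(): return the first usable
 * primary guard, or NULL if none qualifies. With always_need_descriptor
 * false, a descriptor is required only for traffic (current behavior);
 * with it true, a descriptor is required for every usage (proposed). */
const toy_guard_t *
toy_select_primary_guard(const toy_guard_t *guards, size_t n,
                         toy_usage_t usage, bool always_need_descriptor)
{
  assert(guards != NULL || n == 0);
  const bool need_descriptor =
    always_need_descriptor || (usage == TOY_USAGE_TRAFFIC);
  for (size_t i = 0; i < n; i++) {
    if (!guards[i].reachable)
      continue;
    if (need_descriptor && !guards[i].has_descriptor)
      continue; /* skip guards whose descriptor hasn't arrived */
    return &guards[i];
  }
  return NULL;
}

/* Models the "node && node->ri" check in directory_get_from_dirserver():
 * the request goes ahead only if the chosen guard has a descriptor. */
bool
toy_dir_request_proceeds(const toy_guard_t *chosen)
{
  return chosen != NULL && chosen->has_descriptor;
}
```

In this model, with a first primary guard that is reachable but has no descriptor yet and a second one that has its descriptor, a TOY_USAGE_DIRGUARD selection under the current rule returns the first guard and the directory request is then dropped. With always_need_descriptor set, the selection skips to the second guard and the request goes ahead.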