When an onion service lookup has failed at the first k HSDirs we tried, what are the chances it will still succeed?
Right now onion services publish their descriptor to 8 HSDirs every 1-2 hours (see upload_descriptor_to_all()), and clients fetch from any of the first 6 of those 8 (3 at each of the 2 replica positions on the hashring).
Right now a double-digit percentage of the onion service lookups in the network result in failure, i.e. no onion descriptor found. (See upcoming FOCI 2021 paper for data and graph.)
So the question is: if a client has gotten a "404 never heard of it" from five of the HSDirs, does asking the sixth ever help?
If it turns out that it doesn't, we should stop at five: skipping that sixth circuit saves time for the user, load for the network, and privacy for the user (fewer circuits, less surface area).
More generally, is there a cutoff of request attempts after which it very likely won't help so we shouldn't bother?
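One way to build intuition for why a cutoff could exist: failures across HSDirs are probably correlated (the common case being that the descriptor was never published, has expired, or the client's view of the hashring is stale). Here is a toy mixture model, purely an assumption and not measured data, where with probability q the descriptor is on no HSDir at all, and otherwise each HSDir independently holds it with probability p. Under that model the chance that the next request helps collapses quickly as 404s accumulate:

```python
def p_next_success(k, p=0.9, q=0.1):
    """P(request k+1 succeeds | first k requests all returned 404).

    Toy model (assumed, not measured):
      - with probability q the descriptor is on no HSDir at all
        (never published, expired, or stale hashring view);
      - otherwise each HSDir independently holds it with probability p.
    """
    dead = q                                   # all k failures explained by "not there"
    alive_but_unlucky = (1 - q) * (1 - p) ** k # alive, yet k misses in a row
    return alive_but_unlucky * p / (dead + alive_but_unlucky)

# The conditional probability drops steeply: after two or three 404s,
# the "descriptor isn't there" hypothesis dominates.
for k in range(6):
    print(k, round(p_next_success(k), 4))
```

The exact p and q are made up; the point is the shape of the curve, which is what the measurement work below would pin down.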
(Even if there is a clear cutoff today, it could change tomorrow, so if we add this feature we'd want to have a consensus param, and continue measuring to know if it should change.)
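The client-side change itself would be small. A minimal sketch, in Python rather than Tor's C and with a hypothetical parameter name, of capping the number of HSDir requests at a consensus-tunable value:

```python
def fetch_descriptor(hsdirs, try_fetch, max_attempts=6):
    """Query the responsible HSDirs in order, giving up after max_attempts.

    Sketch only: in a real implementation max_attempts would come from a
    consensus parameter (name hypothetical, e.g. hs_client_max_hsdir_attempts)
    so the cutoff can be retuned network-wide if measurements change.
    try_fetch is a stand-in for the actual directory request; it returns
    the descriptor or None on a 404/failure.
    """
    for hsdir in hsdirs[:max_attempts]:
        desc = try_fetch(hsdir)
        if desc is not None:
            return desc
    return None  # give up: further attempts deemed not worth the circuit
```

The consensus param is what makes this safe to ship: lowering the cap is reversible without a new release.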
Useful building blocks for this ticket, from earlier tickets that started on something similar but got closed before we got there:
tpo/network-health/metrics/analysis#13209 (closed)
tpo/core/tor#13208 (closed)
In those tickets, @dgoulet found that 3% of the time it helps to try a second HSDir, but it never helps to try a third. But I'm not sure whether his experiment at the time was broad enough to conclude that we should change Tor's behavior to try only two HSDirs and then give up.
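To redo that measurement, the quantity we want per attempt index is the marginal gain: among lookups whose first k requests all failed, how often did request k+1 succeed? A small sketch, assuming a hypothetical log format where each lookup records the 0-based index of the first successful HSDir request (or None if all failed):

```python
def marginal_gain(lookups, max_attempts=6):
    """gains[k] = fraction of lookups that, having failed their first k
    requests, succeeded on request k+1.

    `lookups` is a hypothetical log format: one entry per lookup, holding
    the 0-based index of the first HSDir request that succeeded, or None.
    """
    gains = []
    for k in range(max_attempts):
        # Lookups that actually made a (k+1)th request, i.e. failed the first k.
        reached = [s for s in lookups if s is None or s >= k]
        if not reached:
            gains.append(0.0)
            continue
        won = sum(1 for s in reached if s == k)
        gains.append(won / len(reached))
    return gains

# Made-up numbers loosely echoing the earlier finding: most lookups succeed
# immediately, a few on the second HSDir, none later, some never.
log = [0] * 90 + [1] * 3 + [None] * 7
print(marginal_gain(log))  # gains for attempts 1..6
```

Run over real network-health data, a curve like this (with confidence intervals, and tracked over time) is what would justify picking a cutoff and revisiting it later.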