Recover from bootstrap failures, or at least don't misbehave
We should make sure Arti behaves sensibly under several cases that simulate typical bootstrapping problems, including:
- System clock set wrong, no directory
- System clock set wrong, cached directory is live (and live according to clock)
- System clock set wrong, cached directory is live according to clock only.
- Fallbacks don't respond
- Fallbacks serve ancient consensus
- Fallbacks time out
- Fallbacks give 404
- Unable to connect to the network (see also #319 (closed))
- All addresses are MITM'd, as if by a captive portal
- IPv6-only (see #92 (closed))
- All ports but 443 are blocked.
- Guard refuses all circuits
- Guard has wrong identity
- All relays have wrong identity
- All fallbacks have wrong identity
- TLS fails
- Consensus proves not to be well-signed after getting authority certs.
- Consensus claims to be signed with keys that don't exist.
- (...what else?)
- (... can we get into a redirect loop? ...)
- Later: pt failure, bridge failure, outbound proxy failure.
We don't necessarily need to recover from every one of these cases, but we should make sure that we don't run wild. Specifically, we shouldn't make super-frequent attempts to connect to the network, we shouldn't eat up CPU or RAM, we shouldn't flood the logs, and so on.
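As a rough illustration of "no super-frequent attempts", here is a minimal sketch of a capped exponential backoff schedule. It is not Arti's actual retry logic: the `Backoff` type and its methods are hypothetical names made up for this example, and a real implementation would probably also want to randomize the delays.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of Arti): a capped exponential
/// backoff schedule for retrying a failed bootstrap step.
#[derive(Debug, Clone)]
pub struct Backoff {
    /// Delay used for the first retry, and after a reset.
    initial: Duration,
    /// Delay that the next call to `next_delay` will return.
    next: Duration,
    /// Upper bound on any single delay.
    max: Duration,
}

impl Backoff {
    /// Create a schedule starting at `initial` and never exceeding `max`.
    pub fn new(initial: Duration, max: Duration) -> Self {
        let initial = initial.min(max);
        Backoff { initial, next: initial, max }
    }

    /// Return how long to wait before the next attempt, then double
    /// the following delay, saturating at `max`.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.next;
        self.next = self.next.saturating_mul(2).min(self.max);
        delay
    }

    /// Go back to the initial delay, e.g. after an attempt succeeds.
    pub fn reset(&mut self) {
        self.next = self.initial;
    }
}
```

A caller would sleep for `next_delay()` after each failed attempt and call `reset()` once an attempt succeeds, so that a persistent failure (a dead network, a captive portal) settles at one attempt per `max` interval instead of a tight loop.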
Followup:
- Use coverage tool to see if every error type is actually constructed
Subtasks:
- #397 (closed): Implement a directory-munger for arti-testing
Cases to fix:
- #403 (closed): Always send If-Modified-Since for a consensus
- #404 (closed): Back off from directory caches that give us bad information
- #405 (closed): Track reported skew from NETINFO cells (!450 (merged), ...)
- #406 (closed): Whatever the heck is going on with the all-fallbacks-have-bad-keys case.
- #407 (closed): Rate-limit channel (or circuit?) retries?
- #412 (closed): Use older consensuses when new ones are not available.
- #437 (closed): Backoff on retrying predicted circuits when they fail.
- #438 (closed): Consider handling of future consensuses from the cache.
- #439 (closed): Retry when consensus signatures are bad.
- #466 (closed): Don't fetch a consensus from very-skewed directory.
- #467 (closed): Always send if-modified-since.
Not yet fixed, but addressed well enough for now:
- #440: Better handling for case when certificates don't exist.
Other ideas for improvements:
Edited by Nick Mathewson