Recover from bootstrap failures, or at least don't misbehave

We should make sure Arti behaves sensibly under several cases that simulate typical bootstrapping problems, including:

  • System clock set wrong, no directory
  • System clock set wrong, cached directory is actually live (and also live according to the clock)
  • System clock set wrong, cached directory is live according to the clock only
  • Fallbacks don't respond
  • Fallbacks serve ancient consensus
  • Fallbacks time out
  • Fallbacks give 404
  • Unable to connect to the network (see also #319 (closed))
  • All addresses are MITM'd, as if by a captive portal
  • IPv6-only (see #92 (closed))
  • All ports but 443 are blocked
  • Guard refuses all circuits
  • Guard has wrong identity
  • All relays have wrong identity
  • All fallbacks have wrong identity
  • TLS fails
  • Consensus proves not to be well-signed after getting authority certs
  • Consensus claims to be signed with keys that don't exist
  • (...what else?)
  • (... can we get into a redirect loop? ...)
  • Later: pluggable transport (PT) failure, bridge failure, outbound proxy failure.
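
To make the clock-related cases above concrete, here is a minimal sketch of how a directory's lifetime could be classified against a given notion of "now". The names `Lifetime`, `Timeliness`, and `check`, and the `tolerance` parameter, are hypothetical and are not taken from Arti's code. A directory that is "live according to the clock only" is one where `check` returns `Live` for the local (wrong) clock but not for the true time.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical classification of a directory relative to some clock reading.
#[derive(Debug, PartialEq, Eq)]
enum Timeliness {
    /// The clock reading is before the directory's validity window.
    NotYetValid,
    /// The clock reading falls inside the validity window (plus tolerance).
    Live,
    /// The clock reading is after the directory's validity window.
    Expired,
}

/// Hypothetical validity window of a cached directory.
struct Lifetime {
    valid_after: SystemTime,
    valid_until: SystemTime,
}

impl Lifetime {
    /// Classify this lifetime against `now`, allowing `tolerance` of skew
    /// on either side of the window.
    fn check(&self, now: SystemTime, tolerance: Duration) -> Timeliness {
        if now + tolerance < self.valid_after {
            Timeliness::NotYetValid
        } else if now > self.valid_until + tolerance {
            Timeliness::Expired
        } else {
            Timeliness::Live
        }
    }
}
```

A test harness for these cases could call `check` twice, once with the simulated wrong clock and once with the real time, and assert on the combination it expects to be exercising.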

We don't necessarily need to recover from every one of these cases, but we should make sure that we don't run wild. Specifically, we shouldn't make super-frequent attempts to connect to the network, we shouldn't eat up CPU or RAM, we shouldn't flood the logs, and so on.
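
As an illustration of "don't run wild", here is a minimal sketch of a capped exponential retry schedule that also thins out log messages as failures accumulate. This is not Arti's actual retry code: `RetrySchedule` and its methods are hypothetical names, and the sketch omits the random jitter that a real client would likely want so that many clients don't retry in lockstep.

```rust
use std::time::Duration;

/// Hypothetical retry schedule: the delay doubles after each failure,
/// up to a fixed maximum.
#[derive(Debug, Clone)]
struct RetrySchedule {
    initial: Duration,
    max: Duration,
    failures: u32,
}

impl RetrySchedule {
    fn new(initial: Duration, max: Duration) -> Self {
        Self { initial, max, failures: 0 }
    }

    /// Record a failure and return how long to wait before the next attempt.
    fn next_delay(&mut self) -> Duration {
        // 2^failures, saturating so that a very long outage cannot overflow.
        let factor = 2u32.saturating_pow(self.failures);
        self.failures = self.failures.saturating_add(1);
        self.initial.saturating_mul(factor).min(self.max)
    }

    /// Whether the most recent failure is worth a log line: only the
    /// 1st, 2nd, 4th, 8th, ... failure in a row is reported.
    fn should_log(&self) -> bool {
        self.failures.is_power_of_two()
    }

    /// Call after a successful attempt to start over from the initial delay.
    fn reset(&mut self) {
        self.failures = 0;
    }
}
```

With an initial delay of one second and a cap of one minute, the delays run 1s, 2s, 4s, and so on, then stay at 60s, so a permanently failing fallback costs at most one attempt per minute and a logarithmic number of log lines.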

  • Followup:
    • Use a coverage tool to check whether every error type is actually constructed

Subtasks:

Cases to fix:

Not yet fixed, but addressed well enough for now:

  • #440: Better handling for case when certificates don't exist.

Other ideas for improvements:

  • #401: A future consensus indicates a skewed clock.
  • #402: Always be willing to cache a consensus?
  • #433: Smarter directory timer retry interleaving.
  • #436: Reject untimely consensuses early.
Edited by Nick Mathewson