The Tor Project / Core / Arti · Issue #329 (Closed)
Created Feb 10, 2022 by Nick Mathewson (@nickm), Owner · 12 of 17 tasks completed

Recover from bootstrap failures, or at least don't misbehave

We should make sure Arti behaves sensibly under several cases that simulate typical bootstrapping problems, including:

  • System clock set wrong, no cached directory
  • System clock set wrong, cached directory is live (both in reality and according to the wrong clock)
  • System clock set wrong, cached directory is live only according to the wrong clock
  • Fallbacks don't respond
  • Fallbacks serve an ancient consensus
  • Fallbacks time out
  • Fallbacks give 404
  • Unable to connect to the network (see also #319 (closed))
  • All addresses are MITM'd, as if by a captive portal
  • IPv6-only (see #92 (closed))
  • All ports but 443 are blocked
  • Guard refuses all circuits
  • Guard has wrong identity
  • All relays have wrong identity
  • All fallbacks have wrong identity
  • TLS fails
  • Consensus turns out not to be properly signed once we have the authority certificates
  • Consensus claims to be signed with keys that don't exist
  • (...what else?)
  • (... can we get into a redirect loop? ...)
  • Later: pluggable transport (PT) failure, bridge failure, outbound proxy failure
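Several of the cases above begin with a wrong system clock. One way to notice that (the approach tracked below in #405) is to compare the time a relay reports in its NETINFO cell with our own clock. The following Rust sketch is hypothetical: `Skew`, `SkewEstimator`, the 32-sample window, and the use of a median are illustrative assumptions, not Arti's actual code.

```rust
use std::collections::VecDeque;
use std::time::{Duration, SystemTime};

/// Signed difference between a peer's reported clock and ours.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Skew {
    /// The peer's clock is ahead of ours by this much.
    Ahead(Duration),
    /// The peer's clock is behind ours by this much.
    Behind(Duration),
}

impl Skew {
    /// Compare a peer-reported time with our local time when the report arrived.
    pub fn from_times(peer: SystemTime, ours: SystemTime) -> Skew {
        match peer.duration_since(ours) {
            Ok(d) => Skew::Ahead(d),
            Err(e) => Skew::Behind(e.duration()),
        }
    }

    /// Skew in whole seconds; positive means the peer is ahead of us.
    pub fn as_secs_signed(self) -> i64 {
        match self {
            Skew::Ahead(d) => d.as_secs() as i64,
            Skew::Behind(d) => -(d.as_secs() as i64),
        }
    }
}

/// Hypothetical estimator: keeps the most recent MAX_SAMPLES reports and
/// uses their median, so one broken or lying peer barely moves the estimate.
#[derive(Debug, Default)]
pub struct SkewEstimator {
    samples_secs: VecDeque<i64>,
}

impl SkewEstimator {
    const MAX_SAMPLES: usize = 32;

    /// Record one peer report, evicting the oldest once the window is full.
    pub fn record(&mut self, skew: Skew) {
        if self.samples_secs.len() == Self::MAX_SAMPLES {
            self.samples_secs.pop_front();
        }
        self.samples_secs.push_back(skew.as_secs_signed());
    }

    /// Median skew in seconds (upper median for an even count), if any.
    pub fn median_secs(&self) -> Option<i64> {
        if self.samples_secs.is_empty() {
            return None;
        }
        let mut sorted: Vec<i64> = self.samples_secs.iter().copied().collect();
        sorted.sort_unstable();
        Some(sorted[sorted.len() / 2])
    }

    /// True if the estimated skew exceeds `tolerance` in either direction.
    pub fn clock_looks_wrong(&self, tolerance: Duration) -> bool {
        self.median_secs()
            .map_or(false, |s| s.unsigned_abs() > tolerance.as_secs())
    }
}
```

Taking the median rather than the mean means a single peer with a badly wrong clock cannot pull the estimate far. A positive median suggests that peers are ahead of us, i.e. that our own clock is probably behind.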

We don't necessarily need to recover from every one of these cases, but we should make sure that we don't run wild. Specifically, we shouldn't make very frequent attempts to connect to the network, we shouldn't consume excessive CPU or RAM, we shouldn't flood the logs, and so on.
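One common way to avoid very frequent connection attempts is a capped exponential backoff between retries. This is a minimal std-only Rust sketch; `RetryBackoff` and its parameters are hypothetical and do not describe Arti's actual retry logic.

```rust
use std::time::Duration;

/// Hypothetical retry schedule: the delay doubles after each consecutive
/// failure, up to a fixed ceiling, so repeated bootstrap failures never
/// turn into a tight reconnect loop.
#[derive(Debug, Clone)]
pub struct RetryBackoff {
    initial: Duration,
    max: Duration,
    failures: u32,
}

impl RetryBackoff {
    pub fn new(initial: Duration, max: Duration) -> Self {
        assert!(initial <= max, "initial delay must not exceed the ceiling");
        RetryBackoff { initial, max, failures: 0 }
    }

    /// Record a failed attempt and return how long to wait before the next one.
    pub fn next_delay(&mut self) -> Duration {
        // Cap the exponent so the shift below cannot overflow a u32.
        let exp = self.failures.min(16);
        self.failures = self.failures.saturating_add(1);
        let factor = 1u32 << exp;
        // If the multiplication overflows, fall back to the ceiling.
        self.initial
            .checked_mul(factor)
            .map_or(self.max, |d| d.min(self.max))
    }

    /// Reset after a success so the next failure starts from `initial` again.
    pub fn reset(&mut self) {
        self.failures = 0;
    }
}
```

A production schedule would normally also add random jitter so that many clients that fail at the same moment do not retry in lockstep; it is left out here to avoid depending on a random-number crate.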

  • Follow-up:
    • Use a coverage tool to check whether every error type is actually constructed

Subtasks:

  • #397 (closed): Implement a directory-munger for arti-testing

Cases to fix:

  • #403 (closed): Always send If-Modified-Since for a consensus
  • #404 (closed): Back off from directory caches that give us bad information
  • #405 (closed): Track reported skew from NETINFO cells (!450 (merged), ...)
  • #406 (closed): Whatever the heck is going on with the all-fallbacks-have-bad-keys case.
  • #407 (closed): Rate-limit channel (or circuit?) retries?
  • #412 (closed): Use older consensuses when new ones are not available.
  • #437 (closed): Backoff on retrying predicted circuits when they fail.
  • #438 (closed): Consider handling of future consensuses from the cache.
  • #439 (closed): Retry when consensus signatures are bad.
  • #466 (closed): Don't fetch a consensus from a very-skewed directory.
  • #467 (closed): Always send If-Modified-Since.
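#403 and #467 both ask that consensus requests always carry an `If-Modified-Since` header, so that a cache with nothing newer can answer `304 Not Modified` instead of resending a consensus we already hold. A rough std-only Rust sketch of building such a request follows. `http_date` and `consensus_request` are hypothetical helpers; the date uses the RFC 9110 IMF-fixdate format, and which timestamp to pass (for example, one derived from the `valid-after` time of the newest consensus we have) is an assumption left to the caller.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Convert days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar (Howard Hinnant's `civil_from_days` algorithm).
fn civil_from_days(days: i64) -> (i64, u64, u64) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    // January and February belong to the following civil year.
    let year = yoe as i64 + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Format a time as an IMF-fixdate, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
pub fn http_date(t: SystemTime) -> String {
    // 1970-01-01 was a Thursday, so index 0 is "Thu".
    const WEEKDAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
    const MONTHS: [&str; 12] = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
    ];
    let secs = t
        .duration_since(UNIX_EPOCH)
        .expect("times before 1970 are not supported in this sketch")
        .as_secs();
    let days = secs / 86_400;
    let rem = secs % 86_400;
    let (year, month, day) = civil_from_days(days as i64);
    format!(
        "{}, {:02} {} {:04} {:02}:{:02}:{:02} GMT",
        WEEKDAYS[(days % 7) as usize],
        day,
        MONTHS[(month - 1) as usize],
        year,
        rem / 3_600,
        (rem % 3_600) / 60,
        rem % 60
    )
}

/// Hypothetical helper: a minimal HTTP/1.0 GET that always includes
/// If-Modified-Since, as #403/#467 request.
pub fn consensus_request(path: &str, have_since: SystemTime) -> String {
    format!(
        "GET {} HTTP/1.0\r\nIf-Modified-Since: {}\r\n\r\n",
        path,
        http_date(have_since)
    )
}
```

Real code would more likely build the request with an HTTP library than by hand; the sketch only shows that the header is always present and how its date is formatted.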

Not yet fixed, but addressed well enough for now:

  • #440: Better handling for the case when certificates don't exist.

Other ideas for improvements:

  • #401: A future consensus indicates a skewed clock.
  • #402: Always be willing to cache a consensus?
  • #433: Smarter directory timer retry interleaving.
  • #436: Reject untimely consensuses early.
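#401, #436, and #438 all come down to comparing a consensus's validity interval with our clock while allowing some tolerance for skew. A minimal Rust sketch of such a classification follows; `ConsensusLifetime`, `Timeliness`, `classify`, and the tolerance value are hypothetical names and parameters, and the sketch ignores the consensus's `fresh-until` time.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical lifetime of a consensus. Real consensuses carry `valid-after`,
/// `fresh-until`, and `valid-until`; this sketch ignores `fresh-until`.
#[derive(Debug, Clone, Copy)]
pub struct ConsensusLifetime {
    pub valid_after: SystemTime,
    pub valid_until: SystemTime,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Timeliness {
    /// Not valid yet even after allowing for `tolerance`: our clock may be behind.
    TooNew,
    /// Usable now, allowing `tolerance` of skew in either direction.
    Timely,
    /// Expired even after allowing for `tolerance`: our clock may be ahead,
    /// or the source may be serving stale data.
    TooOld,
}

/// Classify a consensus cheaply, before doing expensive work on it.
pub fn classify(life: &ConsensusLifetime, now: SystemTime, tolerance: Duration) -> Timeliness {
    if now + tolerance < life.valid_after {
        Timeliness::TooNew
    } else if now > life.valid_until + tolerance {
        Timeliness::TooOld
    } else {
        Timeliness::Timely
    }
}
```

A `TooNew` result for a consensus that is otherwise well signed is a hint that our clock is behind (#401), while a `TooOld` consensus might still be kept as a fallback when nothing newer is available (#412) rather than discarded outright.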
Edited May 20, 2022 by Nick Mathewson