Skip to content

tor-relays do not recover from 15-20 minute network outage + broken ipv6 without restart

Summary

After a network/routing outage that leaves network connectivity on-link but no traffic flows the tor-relays will seem like they recover by pushing traffic again but the traffic will slowly fizzle out to zero over the next hours and the nodes will be shown as offline subsequently losing Guard/Stable flags and consensus.

Steps to reproduce:

  1. Setup a few Exit relays (this might also be the case for middle but we do not operate those)
  2. Turn off the router of the network segment for 15-20 minutes
  3. Recover the router and observe tor processes resume pushing traffic (1-2Gbps in our case)
  4. Observe tor processes to reduce their traffic until hitting 0 after ~6 hours
  5. All flags are now gone and recovery of the relays requires a restart as well as days in recovery to re-obtain flags and resume pushing original traffic levels (it's been 5 days and we're just at 50% of the original traffic levels at the time of writing)

What is the current bug behavior?

Tor seems to soft-lock and stops operations

What is the expected behavior?

Tor should be able to recover fully without manual restarts to minimize consensus slashing and flag losses

Environment

  • Tor version 0.4.8.13 / tor 0.4.8.13-2~jammy+1
  • Ubuntu/22.04
  • APT

Relevant logs and/or screenshots

https://metrics.torproject.org/rs.html#search/family:CFAFB2E0CBB00E067B83E3216AD49EF338E045E3 family in question outage occurred 18th Dec 14:00 UTC

Possible fixes

Perhaps implement a watchdog that automatically recovers the tor relay on connectivity failure/s

Edited by r0cket
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information