stateful firewalling on relays
Let me suggest a documentation enhancement to raise awareness about operating Tor relays behind stateful firewalls/packet filters.
Recently, I enabled a stateful packet filter on my relays. Afterwards, consensus weight, guard probability and observed bandwidth started to decrease, and the average number of open connections dropped as well. After some debugging I found that the stateful tracking algorithm in the firewall had started to drop packets due to TCP sequence number mismatches. I saw approx. 3 to 7 mismatches per second, with the relays serving an average of 8 MB/s and holding an average of 13k open connections. Once I reconfigured the ruleset to not perform sequence number verification, the drops no longer occurred, and consensus weight and observed bandwidth are now slowly increasing again.
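For anyone who wants to check whether their own relay is affected, here is a minimal sketch for measuring the mismatch rate. It assumes PF, that the drops show up in the "state-mismatch" counter printed by `pfctl -si`, and that the script runs with enough privileges to call pfctl. The function names and the 10-second sampling interval are my own choices for illustration, not anything taken from Tor or PF.

```python
import re
import subprocess
import time


def state_mismatch_count(pfctl_info: str) -> int:
    """Extract the cumulative state-mismatch counter from `pfctl -si` text."""
    match = re.search(r"^\s*state-mismatch\s+(\d+)", pfctl_info, re.MULTILINE)
    if match is None:
        raise ValueError("no state-mismatch counter found in input")
    return int(match.group(1))


def mismatch_rate(before: str, after: str, seconds: float) -> float:
    """Mismatches per second between two `pfctl -si` samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = state_mismatch_count(after) - state_mismatch_count(before)
    return delta / seconds


def sample_pfctl() -> str:
    """Run `pfctl -si` and return its output (assumes sufficient privileges)."""
    result = subprocess.run(
        ["pfctl", "-si"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    interval = 10.0  # seconds between samples; arbitrary choice
    first = sample_pfctl()
    time.sleep(interval)
    second = sample_pfctl()
    print(f"state mismatches per second: {mismatch_rate(first, second, interval):.1f}")
```

A sustained non-zero rate on a busy relay would be a hint that the sequence number tracking is discarding traffic, though the counter alone does not say which remote hosts are responsible.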
Of course, this can’t be fixed in Tor, as it is either a fault in the TCP/IP stack on the remote end or a side effect of NAT operations in residential CPEs. It may also be that residential equipment is usually not designed to carry long-lasting TCP sessions and has issues with sequence number wraps. Nevertheless, the documentation should cover this and raise awareness:
1.) If possible, TCP sequence number verification should be disabled in the firewall, as a Tor relay must expect to receive packets that might not pass such verification.
2.) Even if the sequence number verification recovers after the remote end retransmits the packets, the drops might still trigger congestion avoidance in the TCP/IP stacks, resulting in performance degradation for the end-user and resource starvation on the relay side.
3.) Firewalls should be configured to not silently drop invalid packets, but to send an RST instead. Even if some fault with stateful firewalling happens, this prevents TCP sessions from stalling and connections established by end-users from running into timeouts.
4.) With the increasing shortage of IPv4 addresses, a lot of ISPs will start placing their customers behind CGNAT or NAT444. This might cause more issues on Tor nodes behind firewalls with security-cautious stateful filtering.
5.) Some areas of the world might already place their users behind gateways that perform unexpected TCP header modifications during NAT. This might also cause security-cautious stateful firewalling on a relay to fail.
Another issue might be the following: Some stateful firewalling implementations don’t allow either side of the connection to increase/decrease its MSS after the TCP connection is established. Maybe some low-end and residential devices change the MSS mid-connection, which would also result in packets being dropped.
From my point of view, the overall risk is as follows: Due to the shortage of IPv4 addresses, more and more ISPs put their customers behind NAT. As such residential implementations often don’t implement the RFCs properly, stateful firewalling on the relays results in increasing instability in the Tor network.
Maybe it’s also a good idea to have Tor generate a warning message if it sees repeated bursts of timeouts in TCP connections.
This happened with Tor 0.2.7.6 on FreeBSD 10.1, using PF as the firewall. The relays have public IP addresses, so no NAT is performed by the packet filter. TCP sequence number verification can be disabled with the “sloppy” option on the relevant rules. Most likely this can also happen with other firewalls, such as iptables or commercial products, but I haven’t verified this.
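As an illustration of the “sloppy” option, a pf.conf fragment could look roughly like the following. This is a sketch, not my exact ruleset: the interface name em0, the macro names and the ports 9001/9030 are placeholders that need to be adapted to the relay’s actual ORPort/DirPort and network setup.

```
# Placeholder values, adjust to your own setup.
ext_if = "em0"
tor_ports = "{ 9001, 9030 }"

# Inbound connections to the relay: keep state, but use the sloppy
# tracker, which does not check TCP sequence numbers.
pass in quick on $ext_if inet proto tcp from any to ($ext_if) port $tor_ports flags S/SA keep state (sloppy)

# Outbound connections to other relays, tracked the same way.
pass out quick on $ext_if inet proto tcp from ($ext_if) to any flags S/SA keep state (sloppy)
```

Note that, according to the pf.conf man page, the sloppy tracker makes insertion and ICMP teardown attacks easier and cannot be combined with modulate state or synproxy state, so it is a trade-off rather than a free fix.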