toralf and steinex and others on IRC report that their Tor relays currently see many dozens of IP addresses with more than 50 concurrent connections open to the relay.
When I run your command, I'm seeing numbers that range from 64-75. If the consensus parameter is set to 50, that doesn't seem too far off, considering the OS needs to clean those connections up after the application has told them to stop.
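For reference, a hypothetical one-liner in that spirit (not necessarily the exact command referred to above; it assumes the ORPort is 443 and counts established connections grouped by peer address):

```
# count established connections to the ORPort per peer address, busiest first
ss -ntH state established '( sport = :443 )' \
  | awk '{print $NF}' | sed 's/:[0-9]*$//' | sort | uniq -c | sort -rn | head
```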
I decided to see what happens when I add a netfilter rule to only allow 60 connections per IP. It turns out it blocked ~500K connection attempts within 5 minutes.
```
nft -f - <<"EOF"
table inet per-conn-limit
delete table inet per-conn-limit
define ports = { 80, 443 }
table inet per-conn-limit {
  set conns_v4 {
    type ipv4_addr . inet_service
    size 65535
    flags dynamic
  }
  set conns_v6 {
    type ipv6_addr . inet_service
    size 65535
    flags dynamic
  }
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport $ports ct state new add @conns_v4 { ip saddr . tcp dport ct count over 60 } counter drop
    tcp dport $ports ct state new add @conns_v6 { ip6 saddr . tcp dport ct count over 60 } counter drop
  }
}
EOF
```
Well, Tor itself stops accepting new connections, but the underlying OS needs time to clean up those established TCP connections.
I would usually expect a connection to get rejected before it reached the established state. There also appears to be quite a discrepancy between the number of connections rejected by the above NFT rules and the number of rejected concurrent connections reported in Tor's Heartbeat messages. Tor reports far fewer rejected connections.
My best theory actually is that Tor is obeying the limit: every time it starts processing a new connection, if it's over the limit it closes it.
But if the rate of incoming connections is high enough, then there will be some dozens of extra connections in the 'established' state that Tor hasn't gotten around to closing yet.
I asked toralf for clarification on whether the remote ports of the established connections change each time he runs ss, and he confirmed that "the ports of the addresses with > 50 connections rotates very fast, there no long-term connections made"
So: mystery solved, Tor is obeying its DoSConnectionMaxConcurrentCount, but also this reinforces what @pege is saying above, that doing it at the Tor application layer is a crappier approach than doing it in iptables.
We might be wise to come up with a guide of recommended iptables rules for big relay operators to run alongside their Tor. It wouldn't be mandatory (after all, Tor has some adequate-ish defenses here at the application layer) but it seems clear that it would do the job better at scale.
@arma Thanks for mentioning this. Anyone who's ever been in the business of mitigating DDoS attacks will tell you that trying to mitigate them at the application level is a futile task. It would be nice to apply a patch to Apache and make DDoS attacks go poof, but it just doesn't work that way. That's why there are companies out there like Cloudflare etc. making millions protecting networks.
It should really be mitigated at the edge of the network but in the case of Tor, that's not possible because it's decentralized. The closest we can get to fixing this would be using firewalls and iptables.
I would love to see people like weasel and others who have been successful at it share their iptables rules and methods here (or anywhere really), so that a bunch of people like me can run and test them, and we can finally come up with a standard ruleset anyone can use. Once we settle on acceptable rules, applying them becomes a simple copy-and-paste job for operators at any level of expertise.
I have a question though. Where did the number 50 in `DoSConnectionMaxConcurrentCount=50` come from? Does anyone legitimately need to make 50 concurrent connections to use Tor? Why not 20, 10, or 3?
I'll also note that weasel has been running some iptables rules in front of tor26 for at least a decade now, to protect against over-eager internet residents, and though they have produced some frustration (e.g. the consensus-health checker thinks it's down often) they have also served him well overall.
Yes, and because I have 2 ORPorts at the same IP address, I distinguish between them. Otherwise the port match would not be necessary, as long as no other service runs on that machine.
Metrics show a high number of dropped ntor onionskins. Hardware is a Raspberry Pi 4 with 4GB of RAM, running Raspberry Pi OS (arm64). Adiantum (for cryptographic work) is enabled in the kernel.
Is there anything else I could do? Should I reload or restart it when this happens, or just keep it like that?
UPDATE: it's no longer overloaded. I'll keep watching.
UPDATE 2: it's entering/leaving overloaded state very often.
@pege Thanks for mentioning the Snowflake. I checked my blacklist and sure enough it was caught in it. I've since added it to the whitelist. Are there any other IP addresses for it that I should be aware of?
Update: the solution with recent was not robust enough; the final solution, as seen in Git, just uses connlimit.
The approach with connlimit reduces the pain but doesn't fully address the threat vector. The following approach tries to achieve that by using 2 rules:
Allow max X *new* connection attempts to the ORPort within Y min
Allow max Z connections from the same IP address to the ORPort
The implementation uses the iptables features recent and ipset. The ruleset looks basically like this:
oraddr="65.21.94.13"orport=443seconds=300hitcount=12 # both tries 1x per minuteconnlimit=4 # 2 Tor relays at 1 ip address allowedblacklist=tor-ddosname=$blacklist-$orportiptables -A INPUT -p tcp --destination $oraddr --destination-port $orport -m recent --name $name --setiptables -A INPUT -p tcp --destination $oraddr --destination-port $orport -m recent --name $name --update --seconds $seconds --hitcount $hitcount --rttl -j SET --add-set $blacklist srciptables -A INPUT -p tcp --destination $oraddr --destination-port $orport -m connlimit --connlimit-mask 32 --connlimit-above $connlimit -j SET --add-set $blacklist srciptables -A INPUT -m set --match-set $blacklist src -j DROPiptables -A INPUT -p tcp --destination $oraddr --destination-port $orport -j ACCEPT
Caveat: Do not run these iptables commands as-is! They only show the idea.
To release blacklisted IP addresses (which may no longer be owned by the attacker), you should either use a daily cronjob (ipset flush) or use the ipset timeout feature itself.
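For illustration, a minimal sketch of the second variant (assuming the set name tor-ddos from the rules above; entries then expire on their own instead of needing a cron flush):

```
# create the blacklist set with a default per-entry timeout of one day
ipset create tor-ddos hash:ip timeout 86400

# alternative without timeouts: flush the whole set once a day, e.g. in /etc/crontab
# 0 3 * * * root /usr/sbin/ipset flush tor-ddos
```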
If ipset isn't available on your system, then the action SET --add-set $blacklist src can be replaced with DROP.
The xt_recent list size default of 100 is too small. Adding xt_recent.ip_list_tot=10000 to the kernel command line or putting options xt_recent ip_list_tot=10000 into /etc/modprobe.conf.d/xt_recent is recommended.
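If in doubt, you can check the value currently in effect (this assumes the xt_recent module is already loaded):

```
cat /sys/module/xt_recent/parameters/ip_list_tot
```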
I'd like to share a few numbers here. For 2 relays at the same IP address (ports 443 and 9001), each having over 10,000 connections, about 700 addresses are now blacklisted. Those produce 50x more (blocked) new connection attempts to an ORPort than the remaining 10K systems:
One more thing: in the #stats section you call crontab with the non-existent users torproject and tinderbox. It seems to be trying to grep some port number from it; I don't know yet what its purpose is.
Oh well, yes, those lines are not needed; they are specific to my local system.
And enp8s0 has to be replaced by e.g. eth0 or whatever the local network card is named.
Commenting in "set +x" at start of the script shows the failing command. Just a guess: maybe " -m comment" doesn't work (then just omit it, it is only decorative).
I'm not familiar with using iptables directly like this, but I'll try to fix this myself later. It's good for you to know, should this ever be documented for other relay operators.
UPDATE: I got it working, but I'm not sure if it's a proper solution for everyone.
ip6tables -A INPUT --source fd00::/7 --destination $ulaaddr -j ACCEPT
fd00::/7 is the ULA range. $ulaaddr is the complete ULA address that points to your server (e.g. fd17:3cb2:d446:0:dea6:32ff:fe9c:c7ca).
@toralf I see that you've been constantly updating your scripts to fine-tune them. One thing I noticed is that you removed the iptables rule regarding --hitcount per 300 seconds. Did you do that because it had undesirable side effects, or because it didn't do what you were looking for?
The other question: why aren't we processing the "connlimit" rules in PREROUTING? That way no connections are made to begin with, which saves memory and the time spent waiting for local ports to close. Maybe something like this?
```
iptables -t raw -A PREROUTING -p tcp --dport 443 -j ACCEPT
iptables -t raw -A PREROUTING -p tcp --dport 443 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -t raw -A PREROUTING -p tcp --dport 443 --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 2 --connlimit-mask 32 --connlimit-saddr -j DROP
```
Actually I'm now fine with both the filter logic and the logic to define the relay address and port. The point IMO to discuss further is the upper limit for connlimit: 2, 3 or something else?
The recent logic indeed gave no additional value compared to the simple connlimit rule. Instead, Tor relays landed in the blocklist and the Tor authorities had to be excluded from blocking. So I kicked that part out.
The simple truth was: "Do not be too clever at layer 3." More sophisticated logic belongs at layer 7, the Tor application itself. E.g. the question of whether to allow a relay 2 connections whilst clients are only allowed 1, and the authorities even more than 2 - all of that should belong to layer 7 IMO.
I have no experience with PREROUTING so far.
@toralf Thanks for the reply. I guess the recent logic will also greatly depend on whether you're a Guard or a middle relay. A middle relay gets a lot of incoming connections from other relays or guards but a Guard is the first point of entry so practically most if not all incoming connections are coming from clients.
Another approach would be using --hashlimit
iptables -I INPUT -p tcp -m tcp --dport 443 -m state --state NEW -m hashlimit --hashlimit 2/min --hashlimit-mode srcip --hashlimit-name tor -j ACCEPT
This will accept only 2 connections per IP per minute, and since the hashlimit refreshes constantly we won't have to blacklist an IP for a day - just enough to stop what they're trying to do until they give up. The hashlimit timeout can be adjusted to whatever you like.
This is just an example and of course needs fine tuning. But it's an idea for those of you who know a lot more than I do to think about and play with.
Here's an example of a harmless test I found on the Internet to see how flexible it can be:
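The rule itself isn't quoted here, but a minimal reconstruction that would produce the behaviour described below (an initial burst of two pings, then one token refilled every 12 seconds, i.e. 5/minute) could look like this:

```
# allow ICMP echo requests at 5/minute (one every 12 s) with an initial burst of 2,
# drop everything above that rate
iptables -A INPUT -p icmp --icmp-type echo-request \
  -m hashlimit --hashlimit 5/minute --hashlimit-burst 2 \
  --hashlimit-mode srcip --hashlimit-name ping-test -j ACCEPT
iptables -A INPUT -p icmp --icmp-type echo-request -j DROP
```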
This will allow two pings, then block for 12 seconds, then let one ping through. And below is the result:
```
PING 192.168.168.3 (192.168.168.3): 56 data bytes
64 bytes from 192.168.168.3: icmp_seq=0 ttl=63 time=0.432 ms
64 bytes from 192.168.168.3: icmp_seq=1 ttl=63 time=0.435 ms
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8
Request timeout for icmp_seq 9
Request timeout for icmp_seq 10
Request timeout for icmp_seq 11
64 bytes from 192.168.168.3: icmp_seq=12 ttl=63 time=0.460 ms
Request timeout for icmp_seq 13
Request timeout for icmp_seq 14
Request timeout for icmp_seq 15
Request timeout for icmp_seq 16
Request timeout for icmp_seq 17
Request timeout for icmp_seq 18
Request timeout for icmp_seq 19
Request timeout for icmp_seq 20
Request timeout for icmp_seq 21
Request timeout for icmp_seq 22
Request timeout for icmp_seq 23
64 bytes from 192.168.168.3: icmp_seq=24 ttl=63 time=0.445 ms
```
A middle relay gets a lot of incoming connections from other relays or guards but a Guard is the first point of entry so practically most if not all incoming connections are coming from clients.
Well, not really; these numbers are from a middle/guard, collected with [1]:
```
+------------------------------+-------+-------+
| Type                         | IPv4  | IPv6  |
+------------------------------+-------+-------+
| Inbound to our OR from relay |  2033 |   869 |
| Inbound to our OR from other |  3843 |    52 |
| Inbound to our ControlPort   |     2 |       |
| Outbound to relay OR         |  3839 |    70 |
| Outbound to relay non-OR     |     1 |       |
| Outbound exit traffic        |       |       |
| Outbound unknown             |    21 |     1 |
+------------------------------+-------+-------+
| Total                        |  9739 |   992 |
+------------------------------+-------+-------+
```
@toralf My apologies, I stand corrected. I didn't realize that a guard could also act as a middle relay.
I guess the question is: if a node on your network is infected by a worm, do you isolate the node and fix it, or do you let it continue infecting the whole network? By the same token, if a guard or relay is just relaying those concurrent connections down the road, would it be reasonable to block the connections? I'm not sure how it would affect the Tor network as a whole if a few relays end up in my block list, but if it means the difference between keeping my relay stable and having to reboot all the time, I'd say so be it.
The other approach would be to throttle the connections instead of completely blocking them. The following code will give you an idea of what I'm talking about:
iptables -I INPUT -p tcp --dport 443 --tcp-flags FIN,SYN,RST,ACK SYN -m state --state NEW -m hashlimit --hashlimit-name TOR --hashlimit-mode srcip --hashlimit-srcmask 32 --hashlimit-above 1/minute --hashlimit-burst 3 --hashlimit-htable-expire 120000
Please note that this rule has no jump action, so it will just monitor the packets and let them all through. Adding it will have no effect on your network, except that it allows you to monitor the connections live by typing:
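(The exact command isn't quoted here; given the --hashlimit-name TOR in the rule above, my guess is the xt_hashlimit proc interface, something like the following.)

```
# watch the hashlimit bucket live; the first column is the remaining expiry,
# counting down from --hashlimit-htable-expire
watch -n1 cat /proc/net/ipt_hashlimit/TOR
```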
You'll see a counter on the left starting at 120 seconds and counting down. Each time a connection comes in, it resets to the beginning; once the counter reaches zero, the IP is released from the table.
If you want it to actually do anything, you should add -j DROP to the end. In that case it allows three new connections initially and then only one new connection per minute after that.
P.S.
Again I'm assuming your OR port is 443. If not, change 443 to whatever you need.
Yes, I noticed that yesterday, although the limitations you've set are a bit confusing. Right now you've set the hashlimit to 10/minute with a burst of 9, so they get 9 connections before any action is taken; then they're added to the blocklist. Up to this point it seems okay, but if they're added to the blocklist, why are we wasting resources to keep track of them for 16.5 hours? Then I noticed that your blocklist expires in 30 minutes, and now it makes sense why you're keeping track. But then you confused me again: after your blocklist expires and they're out of jail, you still allow them to connect once every 6 seconds for a whole minute, and that doesn't reach the 11 concurrent connections they need (based on the --connlimit) to go back to jail.
I'm not as generous as you are.
I would concentrate on holding them off with the hashlimit from connecting in the first place rather than relying on conntrack. Remember, the conntrack module doesn't begin tracking a connection until the third step of the 3-way handshake, and by then it's too late; cleaning up after those connections takes time and resources too.
I'm going to attach two files to give a visual reference for what's happening. This is from my middle relay under attack:
Notice the NetIn and NetOut. A lot comes in and not as much goes out. Some of that is because of the --conntrack rules and the blocklist, but most of it is because the connection is already made and Tor is now struggling to process them; then you get the ntor drops and the server overload messages. But as you can see, the server is not actually overloaded: the RAM is well below 50% and the CPU runs at 40%.
Now this image is from my Guard Relay after my iptables rules:
Now the NetOut is higher than the NetIn (as it should be), and notice the data transfer rate. I'm still pulling about 16-17 M, which is my max limit, except this time I'm relaying legitimate traffic and not wasting resources. And touch wood, it's been running for 11 days and it shows green.
Just an update on my comments and the rules I've implemented. I've been using your rules with a few modifications and additions. To make them more effective while using fewer resources, I've moved most of them to PREROUTING. I humbly suggest that you try it. Like most people, you're using the default table (filter) for your rules. The problem with the filter table is that it has no PREROUTING chain. The better approach is to move the rules to mangle: the mangle table supports the same matches as filter, but it also has a PREROUTING chain and more.
I've also added a "connections per second" rule. I think that if an IP is trying to connect to your ORPort 5, 10, or 20 times per second, it's up to no good. I catch most of them in PREROUTING and won't even let them try. An ipset with a lifetime of one hour gathers about 250-300 IP addresses at any given time. It goes something like this:
```
ipset create tor hash:ip family inet hashsize 4096 timeout 3600
iptables -t mangle -I PREROUTING -p tcp --syn --dport 443 -m recent --name tor --set
iptables -t mangle -I PREROUTING -p tcp --syn --dport 443 -m conntrack --ctstate NEW -m hashlimit --hashlimit-name tor --hashlimit-mode srcip --hashlimit-srcmask 32 --hashlimit-above 8/sec --hashlimit-htable-expire 3600 -j SET --add-set tor src
iptables -t mangle -I PREROUTING -p tcp --dport 443 -m set --match-set tor src -j DROP
```
It measures connections per second, in 3.6-second intervals. I've been testing this for about 2 weeks now, and I frequently compare the list against a list of current Tor relays found at https://bgp-tools.com/tor-lists/tor_relays.txt. None of the blocked addresses are relay nodes, except on a couple of occasions when 2 of them belonged to two relay operators, one running over 100 relays and the other over 80. Even then, only one IP out of the 100 or 80 relays they're running made the list.
I'm running Tor on a VM and have access to the host, so I run these rules on the host and no resources are used at the VM level. I just add --destination and the IP address of the VM before --dport. And if Tor is not on a VM, or you don't have access to the host, the rules run just fine on the server or the VM itself at the PREROUTING level.
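For example (with a hypothetical VM address of 192.0.2.10 and the ipset from the rules above), the DROP rule on the host would look like this:

```
iptables -t mangle -I PREROUTING -p tcp --destination 192.0.2.10 --dport 443 \
  -m set --match-set tor src -j DROP
```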
All your other filter rules can also be moved to iptables -t mangle with PREROUTING option, or without, for uniformity.
P.S.
Just to clarify the hashlimit rule above in case you want to modify it to your liking: I have not set --hashlimit-burst in the rule, therefore the default applies, which is 5. Basically the rule has two conditions, and if at least one of them is met, a match is made and the IP is added to the list. Condition one: the IP makes more than 8 connection attempts per second to your ORPort. Condition two: it tries to connect more than 5 times within a 3.6-second period.
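If you'd rather set the burst explicitly instead of relying on the default, the same hashlimit rule from above with the burst spelled out would be:

```
iptables -t mangle -I PREROUTING -p tcp --syn --dport 443 -m conntrack --ctstate NEW \
  -m hashlimit --hashlimit-name tor --hashlimit-mode srcip --hashlimit-srcmask 32 \
  --hashlimit-above 8/sec --hashlimit-burst 5 --hashlimit-htable-expire 3600 \
  -j SET --add-set tor src
```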
My best theory actually is that Tor is obeying the limit: every time it starts processing a new connection, if it's over the limit it closes it.
If I understand networking correctly, there is no need to establish a connection only to drop it afterwards.
Software should be able to look at the IP address before accepting the connection and, if the IP fails some criteria, reject it.
If I understand networking APIs correctly, that's why the iptables trick works. But a usermode application is notified only after a full TCP handshake - that's why Nmap considers SYN+RST to be a "stealth scan".
Sadly, you are right, generally. It looks like it is possible only with WSAAccept. It is strange that such an obvious feature is not implemented in the original APIs.
IMO being an exit or not shouldn't make a difference - the OUTPUT chain of table filter has default policy ACCEPT and no filter rule is in place.
The question is rather how likely it is that 2 Tor users behind a NAT connect to the same Guard. In that (rare) case - maybe 1:4,000 - the first will make it, but for the second Tor client the SYN packets will be dropped, and therefore that client has to switch to another Guard.
It shouldn't make a difference when it comes to impacting people exiting, but knowing whether we can get a positive net result on an exit with these rules in place would be useful.
In case of multiple devices behind the same NAT, even though they share the same IP, they don't necessarily use the same guard or circuit. Each device acts independently and chooses its own guard and circuits. If one fails, tor simply picks another one and life goes on. Unless they all hard-code their EntryNode to the same guard in torrc, there shouldn't be any noticeable problems.
So let me put in some past perspective from when we designed the DoS subsystem back in 2018. One of the main questions to answer was "How many connections from a single IP do we allow in parallel before applying defense measures?"
Essentially, the "airport" or "Starbucks" effect is that you get, let's say, 200 people connecting to Tor from there, and so what is the probability of N of them choosing the same Guard? I can't find the results of that but the probability is actually pretty low but if someone wants to run this, beer on me!
And so what is the N value here that we consider close to impossible? Because if this situation happens, then it is indeed likely that the client would switch to another primary Guard, and this is a situation we want to avoid.
Now, let's throw more madness into the mix: fallback directories. That is a list of 200 relays that most, if not all, clients these days will poke for directory information (bootstrap phase). So what if 200 people start Tor Browser at an airport for the first time - what is the N value now?
All in all, I think we can guess an N value based on math and current consensus weights, and we could simply double that value; it should still be good enough to fight the kind of DDoS scale we are seeing. And we could monitor it as the network evolves to learn whether the value we suggest to relay operators is too low or too high.
The DoSConnectionMaxConcurrentCount default value was chosen based on the Starbucks effect and the "what if" of 100 people ending up connecting to 1 single relay because their corporate firewall only allows 1 relay IP. Far-fetched, but on the "Internet" it can be possible. Imagine a country with 1 or 2 ISPs where entire regions are filtered and NATed through a single IP... then things become weird on our relays ;).
I can't find the results of that but the probability is actually pretty low but if someone wants to run this, beer on me!
As a first approximation, we can consider that people either pick the biggest guard (and count toward the DoSConnectionMaxConcurrentCount threshold) or any other (and do not count at all). This turns our problem into a binomial distribution, which makes things easier to compute. This is a lower-bound approximation; a very conservative upper bound would be to multiply the resulting probability by the number of guards considered.
The current biggest guard has a guard_probability of 0.0034682825. If we have a list of 200 fallback directories, which I assume are picked at random (not weighted), this gives us 0.005. I'll only consider P=0.005, as it's the worse case of the two.
Notation: <number when watching a single fallback> (<conservative estimate watching all>). P(B(500, 0.05) > 5) * 200 is over 1 - as I said, a "conservative upper bound" (in practice it can be interpreted as it being likely that multiple fallbacks get unhappy at the same time).
I will assume people stay one hour behind the airport IP, and that there are always 200 people using Tor, at every hour of the day, every day, for a decade (that's 87600 samples).
Doing (1 - P)^87600 gives us the probability that no problem happens with the considered parameters. For 200 people, DoSConnectionMaxConcurrentCount=20, over a decade, and using the upper bound, the probability of no issue is 1 - 4.907e-14. For DoSConnectionMaxConcurrentCount=10, the probability is 0.870743 (i.e., 13% that during at least one hour there will be issues). With 20, the chance of having an issue is infinitesimal. With 10, having at least one error is not unlikely, but that's over a decade and with overly conservative assumptions, so even that parameter seems decent to me.
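Spelling out my reading of the computation above (with X the number of users behind the one IP that pick the watched relay, and C the DoSConnectionMaxConcurrentCount):

\[
P_{\text{hour}} \;=\; \Pr[X > C] \;=\; \sum_{k=C+1}^{N} \binom{N}{k}\, p^{k}\,(1-p)^{N-k},
\qquad X \sim \mathrm{Binomial}(N{=}200,\ p{=}0.005)
\]

\[
P_{\text{no issue over a decade}} \;\approx\; \bigl(1 - 200\,P_{\text{hour}}\bigr)^{87600}
\]

Here the factor 200 is the conservative "multiply by the number of fallbacks" upper bound, and 87600 is the number of hourly samples in a decade.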
I neglected the existence of multiple "airports" or "Starbucks". I consider that a single failure once in a while isn't actually that bad, so if the failure probability for a single "airport" is low enough, collectively "airports" may have a yearly or monthly hour of problems, but it would go mostly unnoticed.
Someone should verify I did not make obvious mistakes. I believe I've made over-conservative assumptions overall, so results are skewed toward "there are issues", not the other way around.
For the large CG-NAT case, the simplification/additional constraint of using only fallbacks falls short and gives results which are not helpful.
In case of multiple devices behind the same NAT, even though they share the same IP, they don't necessarily use the same guard or circuit. Each device acts independently and chooses its own guard and circuits. If one fails, tor simply picks another one and life goes on. Unless they all hard-code their EntryNode to the same guard in torrc, there shouldn't be any noticeable problems.
I tried to make an R script to compute more accurately the probability of a non-DoS being counted as a DoS. Turns out my upper bound is a lot closer to the actual value than I expected. Sadly the precision is limited to approx 4.5e-13, but I think anything hitting that is fine anyway.
To get live Tor network stats you can use this command:
```
# you can also use NODE_COUNT=X instead of GUARD_DISTRIBUTION if you want to
# assume X nodes of equal probability, like fallback relays.
$> MAX_CONCURENT=13 GUARD_DISTRIBUTION=proba.csv USER_PER_IP=1000 ./compute.r
Loading required package: pmultinom
max connection: 13 node count: 2133 user per ip: 1000 prob_ko: 4.653784e-05 prob ok full year: 0.6651911 average interuption per year (hour): 0.4076715
```
Assuming current network parameters, a thousand Tor users behind a single IP, and a DoSConnectionMaxConcurrentCount of 13, there is a 67% probability that no incident happens over a year (i.e. about a 33% chance of at least one). On average there would be 0.40 hours of "too many users from the same IP" per year.