A number of DoS attacks have been essentially ongoing since June 2022, and we have discussed several potential solutions to improve things for our users. One thing folks started to experiment with is coming up with good iptables rules to help fight the ongoing attacks.
This ticket is for collecting all the information we gathered so far and coming up with some rules we can recommend to our relay operators (and updating our support guidelines accordingly).
Here is a short summary of where we currently are. The discussion started over at tpo/core/tor#40636 and community members implemented tools for generating proper iptables rules:
Going over the feedback on our tor-relays@ mailing list (see for instance the https://lists.torproject.org/pipermail/tor-relays/2022-September/020793.html thread) gives the impression that those iptables rules do work against at least part of the DoS. Getting them into shape and deployed by a large part of our relays might be a good step in the DoS arms race from our side.
Things we should think about for next steps and writing proper support articles could include:
Which of those iptables rules should we recommend?
Should we create a repository for those rules and scripts ourselves or should we point to a community-maintained one?
Should we recommend those for all relay types?
Should we recommend activating those rules only in case of a DoS, or should we advocate for running them in general?
If we recommend running them in general, should we try to get them integrated into our Tor packaging work?
One thing to keep in mind is the copy-paste problem: if we publish firewall rules, people will copy and paste them, and then never change them. We need to make it clear to relay operators that they need to stay in the loop, as these rules may need to be updated (subscribe to the mailing list, participate in relay operator meetups, other mechanisms?).
I agree. In fact I'm testing a few things and I'll most likely remove a line from the rules. The motto "Tor can't help you if you use it wrong!" applies to everything. But whether it's copy and paste or other kinds of scripts, there are always people who are going to apply it and forget it.
The idea of pointing people to a repository and asking them to subscribe for updates is probably the simplest and most efficient way, especially if we get other people contributing and bringing additional options to the table.
I'm afraid it's not enough to say "watch this space and apply the changes to your configuration". Remember that 459 relays had to be rejected recently because they were running EOL versions - many people are not actively involved. This needs to be as close to automatic as possible.
Could Tor itself (or a separate package managed by the Tor organization) contain the firewall rules and provide a way to generate iptables/nftables rules based on the current torrc? You would then only have to explain how to load these rules on boot / Tor upgrade. In fact, Tor could also apply those rules automatically if configured (see libnftnl for nftables).
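To make the torrc-driven generation idea more concrete, here is a minimal, hypothetical sketch (the torrc path, the tor_dos table name and the 8/second threshold are my own placeholders, not an agreed-upon design):

#!/usr/bin/env bash
# Hypothetical sketch: read the ORPort from torrc and emit a matching
# nftables rate-limit rule. Not a recommendation, just an illustration.
set -eu
TORRC=/etc/tor/torrc

# Handles both "ORPort 9001" and "ORPort 1.2.3.4:9001"
ORPORT=$(awk '/^ORPort/ {print $2; exit}' "$TORRC" | awk -F: '{print $NF}')

nft add table inet tor_dos 2>/dev/null || true
nft add chain inet tor_dos input '{ type filter hook input priority -10; policy accept; }'
# Drop SYNs to the ORPort above a global rate (per-IP metering left out for brevity)
nft add rule inet tor_dos input tcp dport "$ORPORT" tcp flags == syn \
  limit rate over 8/second counter drop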
A crazy thought: Since a lot of network parameters are part of the consensus, could firewall rules be as well? That way Tor could quickly react to attacks.
If we knew that everyone running a relay had no other bespoke configuration (existing firewall rules, hardening configurations, disabling kernel modules, etc.) then maybe we could consider doing something automatic. Unfortunately, this is not something we can count on and could have very unexpected consequences.
A crazy thought: Since a lot of network parameters are part of the consensus, could firewall rules be as well? That way Tor could quickly react to attacks.
Plenty of packages and applications out there apply automatic routes and iptables rules (e.g. open ports, add NAT, masquerade, etc.) upon installation. Tor is no different. The most efficient way would be for Tor to add a few lines of iptables rules during installation. The rules are ORPort-specific and won't interfere with any other rules people might have. Not to mention that they would be applied to a few thousand relays within a few days with a simple update release.
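To make the "ORPort-specific and won't interfere" point concrete, a minimal sketch of what such rules could look like (the chain name TOR_ORPORT, port 443 and the limits are illustrative, not an official recommendation):

# A dedicated chain that sees nothing but ORPort traffic, so existing rules stay untouched.
iptables -N TOR_ORPORT 2>/dev/null || true
iptables -F TOR_ORPORT
# Let well-behaved sources continue with normal INPUT processing
iptables -A TOR_ORPORT -p tcp --syn -m hashlimit \
  --hashlimit-name tor-syn --hashlimit-mode srcip \
  --hashlimit-upto 4/minute --hashlimit-burst 4 -j RETURN
# Everything above the limit gets dropped
iptables -A TOR_ORPORT -p tcp --syn -j DROP
# Only ORPort traffic is diverted into the new chain
iptables -I INPUT -p tcp --dport 443 -j TOR_ORPORT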
I don't think bridges were directly affected by the DDoS.
I do have 2 dozen bridges. None of them shows a DDoS in its metrics. (OTOH, 50% of my bridges - those with "email" or "moat" distribution - do not have any significant traffic at all.)
(OTOH, 50% of my bridges - those with "email" or "moat" distribution - do not have any significant traffic at all.)
I suspect that something broke during the BridgeDB to rdsys migration, and some bridges are not being distributed even though they are functional.
Could you open a ticket here https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues so we can start investigating this issue? TY! <3
Thanks, this thread was very much needed. Based on my observations, the nature of the DDoS attacks has changed within the past 4 or 5 days. I haven't had a chance to investigate it, but it has changed. I used to have anywhere between 500-600 IP addresses in my block list on a rotating 12-hour cycle, only 1% of which were relays.
As of right now, I have 1024 IPs, 98 of which are relays. That's 9%, give or take. This tells me that the focus of the attack has moved from Guards and servers with the HSDir flag to all relays. This makes implementation of some sort of iptables rules on all relays of the utmost importance, because as is, the relays are passing all these packets and concurrent connections on to the rest of the relays in their circuits. My relays had been running for 27 days with green status using 2.6 GB of the allocated RAM. They started to get overloaded 4 days ago, and today one of them was pushed all the way to 7 GB of RAM and 3.5 CPU, and somehow Tor restarted with a PID of 1816801.
As for bridges, they don't seem to be the focus of the attack yet. I checked about 200 top bridges and found only 3 of them showing the overload status, which tells me they're running okay. I have no problems with my relays either. At least not yet.
There's also another suggestion. I found that one of the main goals of the attack was to shut the systems down and/or force them to reboot so they lose their HSDir status. Frankly, if I wanted to control a good portion of HSDir servers, I'd set up a bunch of servers and make sure they are stable, then I would prevent others from getting the flag. That way a good portion of people would come to me.
Currently, after each reboot it takes 4 days to get the HSDir flag, and most servers under this attack hardly last a week, which means most people can hold the flag for maybe 3-4 days a week. I think the HSDir flag should be granted faster, and frankly it shouldn't be forever. We need to find out how many HSDir servers are optimal and grant them at random, like a lottery, then have a period of, let's say, a week, and then take the flag away and give it to another group. This avoids centralization. There are 2 ways to stop an attack: one is defense, the other is to find the reason for it and stop the attacker from reaching their goals.
The iptables rules that I posted are based on @toralf's scripts. I've made some tweaks though and made them a bit easier on the CPU. But most importantly - I think - I made them easy. If we want everyone to use this, we shouldn't ask them to take a course before doing it. Copy and paste makes everything popular. I'll be glad to add or remove anything based on suggestions, and I don't care if you want to move the whole repository somewhere else either. Take it, use it, make it popular so we can all breathe easier.
There's also another suggestion. I found that one of the main goals of the attack was to shut the systems down and/or force them to reboot so they lose their HSDir status.
Can you tell us how you came to the conclusion that an attack has moved to "HSDir flag relays"?
Frankly if I wanted to control a good portion of HSDir servers, I'd set up a bunch of servers and make sure they are stable, then I would prevent others from getting the flag. That way a good portion of people would come to me.
What is the end goal of the attacker here in doing this?
We need to find out how many HSDir servers are optimal and grant them at random, like a lottery, then have a period of, let's say, a week, and then take the flag away and give it to another group.
This can have dangerous consequences. The "uptime" requirement for HSDir was mostly set so we have some assurance that the relay is "stable". In an ideal world, all our relays would be stable and all of them would be HSDirs. This expands the hashring, which is overall a good thing to avoid targeted attacks.
But since onion service v3, the position on the hashring can't be predicted, due to the shared random value changing every 24 hours. Thus it becomes important for the uptime requirement to be at least 48h (enough to live through 24h without a consensus).
Relays losing the HSDir flag isn't problematic per se, as the client/service will just continue along the hashring. What can be problematic is if only a small number of relays compose the hashring, which is what you are proposing, in a sense.
All in all, from our analysis of the multiple DoS attacks going on, none of them had the goal of making relays lose the HSDir flag, as HS directory attacks are way less potent than other, much simpler ones.
Can you tell us how you came to the conclusion that an attack has moved to "HSDir flag relays"?
I didn't say it's moved that way. Destabilizing the servers has always been the goal and losing the HSDir flag is a side effect.
I said the current attack has changed in nature, starting from Sep. 29. This current attack is most likely not even aimed at us. My guess is that someone, somewhere is being attacked and we're just zombies relaying it.
What is the end goal of the attacker here in doing this?
I'm not claiming to be very knowledgeable in all things Tor, but correct me if I'm way off base here please. Isn't the job of the HSDir to point the client to the location of the onion service? Wouldn't it be beneficial for an adversary to have control of the directory? Again, I may be wrong in my assumptions, which is why I'm asking.
The "uptime" requirement for HSDir was mostly set so we have some assurance that the relay is "stable".
The problem is that with these attacks, even if a server starts out stable, by day 4 it certainly is not, and it is very close to the end of its life, especially if it's running on 2 CPUs and 2 or 4 GB of RAM, which describes a very good portion of our relays.
from our analysis of the multiple DoS attacks going on, none of them had the goal of making relays lose the HSDir flag
Understood, and I take your word for it because I don't have access to those analyses, so I have to reach conclusions based on my limited observations. Perhaps knowing what those analyses indicate could help us develop better mitigation methods, at least for our own servers.
Nevertheless, I apologize for derailing this thread because this was supposed to be about iptables and how to deal with promoting them or not.
nftables is the successor to iptables; it allows for much more flexible, scalable and performant packet classification. This is where all the fancy new features are developed.
nftables interprets all the iptables rules just fine so the provided scripts will work regardless of which one you have.
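For anyone unsure which backend their system uses, two quick checks (the example rule is just an illustration; iptables-translate only prints the nft equivalent, it doesn't apply anything):

# Shows whether the iptables binary is the nft-based one or the legacy one,
# e.g. "iptables v1.8.7 (nf_tables)" vs "(legacy)"
iptables -V

# Prints the nftables equivalent of a legacy iptables rule without loading it
iptables-translate -A INPUT -p tcp --dport 443 --syn -j DROP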
By the way, the first two options in the above chart will tell you everything you need to know about why Tor's built-in DoS mitigation is not working well.
The next one is XDP, but that's beyond the skills and resources of the average Joe and the not-so-average Joe.
Yes, at least since Debian buster. That's why I've been trying to learn nft since bullseye.
But I think that the Cloudflare package masters used the appropriate kernels for their benchmarks. In 2018, Debian stretch (with iptables in the kernel) was stable.
Damn, I'm amazed to realize how fast the development of GNU/Linux systems is (see Debian releases).
Georg Koppen changed title from "Provide a recommended set of iptables rules to help in case of DoS attacks" to "Provide a recommended set of iptables/nftables rules to help in case of DoS attacks"
My relays are currently running with a variant of these nftables rules (inspired by @toralf's iptables scripts). It doesn't stop the attack though, only slows it down - I'm still getting the infamous "Your computer is too slow" message from time to time. It does, however, catch 200M+ packets a month, which helps with the load considerably. Suggestions welcome.
#!/usr/bin/nft -f

flush ruleset

define ssh_ips = { ... }
define ports = { 443 }

table netdev firewall {
  set tor_authorities {
    type ipv4_addr
    flags constant
    elements = { 128.31.0.34, 131.188.40.189, 154.35.175.225, 171.25.193.9,
                 193.23.244.244, 194.13.81.26, 199.58.81.140, 204.13.164.118,
                 45.66.33.45, 66.111.2.131, 86.59.21.38, 193.187.88.42 }
  }

  set bogons {
    type ipv4_addr
    flags constant, interval
    elements = { 0.0.0.0/8, 10.0.0.0/8, 100.64.0.0/10, 127.0.0.0/8,
                 169.254.0.0/16, 172.16.0.0/12, 192.0.0.0/24, 192.0.2.0/24,
                 192.168.0.0/16, 198.18.0.0/15, 198.51.100.0/24,
                 203.0.113.0/24, 224.0.0.0/3 }
  }

  set meter_tor_authorities_v4 {
    type ipv4_addr
    size 65535
    timeout 2d
    flags dynamic
  }

  set meter_flood_v4 {
    type ipv4_addr
    size 65535
    timeout 2d
    flags dynamic
  }

  set meter_flood_v6 {
    type ipv6_addr
    size 65535
    timeout 2d
    flags dynamic
  }

  set blacklist_v4 {
    type ipv4_addr
    size 65535
    timeout 3d
    flags dynamic
  }

  set blacklist_v6 {
    type ipv6_addr
    size 65535
    timeout 3d
    flags dynamic
  }

  chain tor_authorities {
    # Limit packets per minute
    add @meter_tor_authorities_v4 { ip saddr limit rate over 3/minute } counter log prefix "Tor authorities drop:" drop
  }

  chain ingress {
    type filter hook ingress device eth0 priority -500; policy accept;

    # Drop all fragments
    ip frag-off & 0x1fff != 0 counter drop

    # Drop bogons
    ip saddr @bogons counter drop

    # Drop Christmas tree packets
    tcp flags & (fin | syn | rst | psh | ack | urg) == { fin | syn | rst | psh | ack | urg } counter drop

    # Drop packets with invalid flags
    tcp flags & (fin | syn | rst | psh | ack | urg) == 0 counter drop

    # Drop packets originating from blacklisted IPs
    ip saddr @blacklist_v4 counter drop
    ip6 saddr @blacklist_v6 counter drop

    # Handle Tor authorities in a separate chain to avoid further processing
    tcp flags == syn tcp dport $ports ip saddr @tor_authorities counter goto tor_authorities

    # Limit SYN packets per minute, add offenders to blacklist
    tcp flags == syn tcp dport $ports add @meter_flood_v4 { ip saddr limit rate over 3/minute burst 1 packets } add @blacklist_v4 { ip saddr } counter drop
    tcp flags == syn tcp dport $ports add @meter_flood_v6 { ip6 saddr limit rate over 3/minute burst 1 packets } add @blacklist_v6 { ip6 saddr } counter drop
  }
}

table inet firewall {
  chain input {
    type filter hook input priority 0; policy drop;

    # Accept loopback traffic
    iifname lo accept

    # Drop invalid packets
    ct state invalid drop

    # Accept established, related (TODO: Do we need related?)
    ct state established,related accept

    # Drop new non-SYN packets
    ct state new tcp flags != syn counter drop

    # Accept IPv6 ICMP traffic
    icmpv6 type { nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert } accept

    # Accept SSH
    ip saddr $ssh_ips tcp dport 22 accept

    # Accept traffic to Tor ports
    tcp dport $ports counter accept
  }

  chain forward {
    type filter hook forward priority 0; policy drop;
  }
}
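If anyone wants to try the above, this is roughly how I load it and keep an eye on it (adjust the file path, the eth0 interface, $ssh_ips and $ports to your setup first):

nft -f /etc/nftables.conf                    # load the ruleset (or whichever file you saved it to)
nft list ruleset | less                      # sanity-check what was actually loaded
nft list set netdev firewall blacklist_v4    # IPs currently caught by the SYN limit
nft list chain netdev firewall ingress       # per-rule counters show what gets dropped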
I've nearly the same netdev ingress rules, inspired by nftables-hardening-rules-and-good-practices.
Seems the IP FRAGMENTS rule is very useful against the ixgbe driver attacks.
You have policy accept on the ingress chain, like me. The Tor dir authorities set makes no sense to me, though. Hmm, why should I limit them?
I tried out conntrack last time, but conntrack_max=500k or 1M reduces my network throughput by 40-50% :-(
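For reference, this is the knob I mean; the value is only an example and, as said, bigger tables aren't automatically better:

sysctl -w net.netfilter.nf_conntrack_max=262144              # runtime change
echo 'net.netfilter.nf_conntrack_max = 262144' \
  > /etc/sysctl.d/90-conntrack.conf                          # persist across reboots
cat /proc/sys/net/netfilter/nf_conntrack_count               # how full the table currently is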
On my small servers (only a 1G interface) I have a dynamic nft blocklist:
table inet mangle {
  # List of ipv4 addresses to block. (~# nft list set inet mangle blocklist_v4)
  set blocklist_v4 {
    type ipv4_addr
    size 65535
    flags dynamic
  }

  # List of ipv6 addresses to block. (~# nft list set inet mangle blocklist_v6)
  set blocklist_v6 {
    type ipv6_addr
    size 65535
    flags dynamic
  }

  chain prerouting {
    type filter hook prerouting priority -150;

    # CT INVALID
    ct state invalid counter drop

    # TCP SYN (CT NEW)
    tcp flags & (fin|syn|rst|ack) != syn ct state new counter drop

    # For the first packet of each connection (ie. packets matching ct state new), this adds an entry into the blocklist set
    # https://wiki.nftables.org/wiki-nftables/index.php/Meters#Doing_connlimit_with_nft
    ct state new add @blocklist_v4 { ip saddr ct count over 50 } counter packets 0 bytes 0 drop
    ct state new add @blocklist_v6 { ip6 saddr ct count over 50 } counter packets 0 bytes 0 drop
  }
}
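In case it helps: entries in such a blocklist can be inspected and, for example for a wrongly caught relay, removed by hand (the address below is just a placeholder):

nft list set inet mangle blocklist_v4                            # show everything currently blocked
nft delete element inet mangle blocklist_v4 '{ 192.0.2.10 }'     # drop a single false positive
nft flush set inet mangle blocklist_v4                           # or clear the whole list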
Hhm, 50 conns per IP seems too high IMO.
Does the 50 come from the DDoS limit for circuits? Because circuits are not TLS connections.
May I ask if https://github.com/toralf/torutils/blob/main/ddos-inbound.sh runs out of the box on your system, and how many IPs you have with 50 connections?
Hhm, 3 min seems a long time, but well, for a lot of connections ...
But cool that the DDoS script works and helps you.
BTW I reduced the limit further from 10 to 4 and am now playing with this rule set [1].
Just a visual update. The attacks seem to have calmed down, at least for now.
This is a chart of the iptables rules at work during the attack's peak. Notice the netin/netout ratio. My MaxAdvertisedBandwidth is 18 MiB, but obviously the attackers don't care. The netin at the height of it reached 25.75 M, but the iptables rules blocked most of the brunt.
This is how the system is running after the attack. Both my relays are back to green status, at least for now. The RAM recovered back to 1.92 GB on one system and 2.68 GB on the other, and the systems are running just fine.
I did have one ntor drop of 1.21% on one system and none on the other, which generally recovers within one or two heartbeats, after which the system goes back to green status. Before the iptables rules, my ntor drops ranged between 26% and 38%.
What came to my mind:
DDoS protection at layer 3 shouldn't invalidate any (heuristic?) DDoS rules at layer 7.
That said, if malicious traffic gets blocked "well", then the remaining traffic might not generate "enough" noise for Tor to detect it.
Glad it's working for you. Would you be able to provide any stats on how many IPs are being blocked by this rule? The advantage of an ipset is that you can see the blocked IP addresses and figure out whether any legitimate connections are blocked, so you can fine-tune the rules. So any kind of stats you can provide would be very helpful.
Another reason I'm interested is that I have a hashlimit rule that traps IPs connecting at above 3/s, and it barely holds 10 IPs at any given time. At 7/s I'm assuming the number would be even lower. Usually when a flood of DDoS starts, within the first 5-10 minutes the conntrack rules block more than 4000 IP addresses, so I'm not sure how blocking 10 out of 4000 could have such an effect. Perhaps I didn't interpret your rules properly, or maybe I'm not implementing my rules correctly, which is why I asked if you could provide some kind of stats.
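For reference, the kind of hashlimit + ipset rule I have in mind looks roughly like this (a sketch only; the set name tor-ddos, port 443 and the 12-hour timeout are examples, not my exact rules):

# Source IPs sending SYNs to the ORPort faster than 3/s get added to an ipset,
# where they stay visible and can be reviewed later.
ipset create tor-ddos hash:ip timeout 43200 2>/dev/null || true
iptables -I INPUT -p tcp --dport 443 --syn \
  -m hashlimit --hashlimit-name tor-flood --hashlimit-mode srcip \
  --hashlimit-above 3/second --hashlimit-burst 3 \
  -j SET --add-set tor-ddos src
iptables -I INPUT -p tcp -m set --match-set tor-ddos src -j DROP
ipset list tor-ddos | head      # see who got caught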
Also, IMHO, for people to use your rulesets it would be nice to provide the relevant iptables rules individually, so they can be applied on top of existing rules, as opposed to having to import a whole iptables-save dump, which wipes and replaces all existing rules.
I had logging enabled for a short timeframe (it is disabled by now). Some IPs/subnets really stood out, so I am quite confident I don't do any overblocking. Here is some insight into how many unique source IPs got blocked (note, however, that I didn't log them again if they sent another SYN while they were blocked):
As noted in the repo, I indeed only block after 7 SYNs in one second and that seems to be effective enough. The IP is then remembered and blocked for 60 seconds. The attacker will reset the timer should they send another SYN in that timeframe.
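In case it's useful to others, that behaviour can be sketched with a dynamic nftables set roughly like this (table/set names, the priority and port 443 are made up; this is not necessarily how the repo implements it):

nft add table inet syncheck
nft add set inet syncheck banned '{ type ipv4_addr; flags dynamic; timeout 60s; size 65535; }'
nft add set inet syncheck synmeter '{ type ipv4_addr; flags dynamic; timeout 1m; size 65535; }'
nft add chain inet syncheck incoming '{ type filter hook input priority -5; policy accept; }'
# already-banned IPs: refresh their 60s timer and drop the packet
nft add rule inet syncheck incoming ip saddr @banned update @banned '{ ip saddr timeout 60s }' counter drop
# more than 7 SYNs per second towards the ORPort gets an IP banned for 60s
nft add rule inet syncheck incoming tcp dport 443 tcp flags == syn \
  add @synmeter '{ ip saddr limit rate over 7/second }' \
  add @banned '{ ip saddr timeout 60s }' counter drop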
I will implement your feedback about converting the iptables-save rules to actual iptables rules later today.
Just an update:
Version 3.0.0 of the scripts is now ready. I've made some modifications, especially due to the new nature of the attacks.
Added some throttling to reduce the danger of SYN floods against the conntrack table. It will also reduce the pressure on the system.
The chances of relays, especially the ones with two ORPorts, getting caught in the block list are greatly reduced.
Added scripts specifically for cron jobs to regularly check the Tor relay list and remove any relay caught in the block list (a rough illustration of the idea is sketched below).
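Purely as an illustration of that cron-job idea (not the actual script from the repo; the Onionoo query, the jq filter and the set name tor-ddos are my own assumptions):

#!/usr/bin/env bash
# Fetch the addresses of currently running relays from Onionoo and remove any
# of them that ended up in the local block list.
set -eu
curl -s 'https://onionoo.torproject.org/summary?running=true' \
  | jq -r '.relays[].a[0]' | sort -u > /tmp/relay-ips.txt

# ipset members are printed one per line after the header; keep only IPv4 entries
for ip in $(ipset list tor-ddos | grep -Eo '^[0-9]+(\.[0-9]+){3}'); do
  if grep -qx "$ip" /tmp/relay-ips.txt; then
    ipset del tor-ddos "$ip"
  fi
done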
The new rules have been tested on two of my relays for about 20 days with no ill effects. The systems run smoothly at very steady RAM usage, with RAM spikes during major attacks that recover back to the previous level in about 10 minutes.
The systems ran for about 27 days with green status only to be restarted for system updates.
If you're already using the scripts, please update to the new version. You only need to run one script to update, without rebooting or restarting Tor, and you'll keep your existing block list intact.
If you're not using the scripts, please give them a test run if you can and provide some feedback so they can be improved.
Now, while the ipset & iptables solution here works fine at OSI layer 3, I do observe within the Tor application spikes in the ntor_v3 values of the 3 relays running here at the same IP. Those spikes correlate with a CPU usage > 100% for a single Tor process. I do wonder whether this behavior is common, or a DDoS?