A paper and corresponding video were published in ACM in May of this year, titled "HSDirSniper: A New Attack Exploiting Vulnerabilities in Tor's Hidden Service Directories".
Here is a summary:
Tor hidden services (HSs) are used to provide anonymous services to users on the Internet without revealing the location of the servers. However, existing approaches have proven ineffective in mitigating the misuse of hidden services. Our investigation reveals that the latest iteration of Tor hidden services still exhibits vulnerabilities related to Hidden Service Directories (HSDirs). Building upon this identified weakness, we introduce the HSDirSniper attack, which leverages a substantial volume of descriptors to inundate the HSDir's descriptor cache. This results in the HSDir purging all stored descriptors, thereby blocking arbitrary hidden services. Notably, our attack represents the most practical means of blocking hidden services within the current high-adversarial context. The advantage of the HSDirSniper attack lies in its covert nature, as the targeted hidden service remains unaware of the attack. Additionally, the successful execution of this attack does not require the introduction of a colluding routing node within the Tor Network. We conducted comprehensive experiments in the real-world Tor Network, and the experimental results show that an attacker equipped with a certain quantity of hidden servers can render arbitrary hidden services inaccessible up to 90% of the time. To ascertain the potential scope of damage that the HSDirSniper attack can inflict upon hidden services, we provide a formal analytical framework for quantifying the cost of the HSDirSniper attack. Finally, we discuss the ethical concerns and countermeasures.
HSDirSniper appears to work by exploiting two weaknesses in HSDirs to block onions:
Indiscriminate Descriptor Acceptance: HSDirs do not verify if they should receive a particular hidden service's descriptor. Attackers exploit this flaw to flood the HSDir's cache with malicious descriptors.
Simplistic Descriptor Aging Mechanism: HSDirs purge old descriptors hourly to manage memory. If attackers overload the cache within an hour, the HSDir purges all descriptors, including legitimate ones, making the targeted hidden service inaccessible.
The attack floods the cache by creating a large number of hidden services (and thus a large number of descriptors).
In their PoC, they had an HS upload descriptors to 16 controlled HSDirs, each with 2GB of RAM. They used 5 malicious hidden services to launch the attack, sustaining 60MB/s of bandwidth uploading descriptors. By optimizing with circuit multiplexing and high-quality guards, they achieved an average attack duration (time to overload an HSDir and trigger the purge) of 4.865 minutes.
It appears the median time interval for descriptor re-uploads is 68.9 minutes, meaning that it takes about 68.9 minutes for a hidden service to upload a fresh descriptor after the previous one is purged.
Combining the attack duration and the re-upload interval, the study indicated that the targeted hidden service remained inaccessible about 92.9% of the time during the attack. (This is based only on the test setup they used; they speculate that an attack setup with more resources could be more effective.)
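A back-of-envelope check (mine, not from the paper): if it takes ~4.865 minutes to trigger a purge and the service only re-uploads ~68.9 minutes later, then under sustained attack the service is reachable only in the short window before each purge:

```python
# Rough availability model under sustained attack (my own sketch,
# not the paper's analysis): each cycle is one purge trigger plus
# one re-upload wait.
t_attack = 4.865      # minutes to overload an HSDir and trigger a purge
t_reupload = 68.9     # median minutes until the service re-uploads
unavailable = t_reupload / (t_attack + t_reupload)
print(f"{unavailable:.1%}")   # ~93.4%, close to the reported 92.9%
```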
Yeah... this is something that we knew from the get-go :(...
The mitigation they propose is plausible, except that, afaict, their evaluation of the mitigation assumes the HSDir and the service share the same consensus, so that both compute the very same hashring. That is unfortunately not true. How relay churn affects their results is a whole new ball game.
I guess the question now is what to do next... We could ask them to incorporate the above into their calculation, maybe (?), but since their paper is already published, I wonder how much more time they will put in :).
tldr: I think that their mitigation isn't the best choice, but for different reasons. I suggest other strategies below.
Indiscriminate Descriptor Acceptance: HSDirs do not verify if they should receive a particular hidden service's descriptor. Attackers exploit this flaw to flood the HSDir's cache with malicious descriptors.
I don't understand how mitigating this particular vector would necessarily be helpful:
Let's suppose that we can magically make it so that an HsDir never accepts any descriptors that are not at the right position in the hash ring.
The attacker then could just adjust by constructing a bunch of descriptors that are in the right position for that HsDir, right? They analyze this in section 8.2, but they seem to assume that the attacker needs to generate full descriptors and see if they are rejected... when in reality, the attacker just needs to generate a bunch of supposedly-blinded Ed25519 keys. This would take some effort, but they could optimize, parallelize, and precompute it pretty well, and get a faster result than they report in 8.2. (They can make it go even faster if they use techniques like the ones we recommend for vanity onions.)
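To illustrate why that's cheap, here is a rough sketch (my own, not their method) of mining keys whose index lands in a target HSDir's slice of the ring; the loop is embarrassingly parallel. The hs_index construction follows rend-spec-v3; the lo/hi bounds and the random-bytes stand-in for freshly derived blinded keys are assumptions for illustration.

```python
import hashlib
import os
import struct

def hs_index(blinded_pubkey: bytes, replica: int,
             period_length: int, period_num: int) -> bytes:
    # hs_index per rend-spec-v3: SHA3-256("store-at-idx" | blinded key |
    # INT_8(replica) | INT_8(period_length) | INT_8(period_num)).
    h = hashlib.sha3_256()
    h.update(b"store-at-idx")
    h.update(blinded_pubkey)
    h.update(struct.pack(">Q", replica))
    h.update(struct.pack(">Q", period_length))
    h.update(struct.pack(">Q", period_num))
    return h.digest()

def mine_key_for_slice(lo: bytes, hi: bytes, replica: int = 1,
                       period_length: int = 1440, period_num: int = 19720):
    # lo/hi bound the target HSDir's region of the hash ring, assumed
    # known from the attacker's own consensus; the period values are
    # placeholders. A real attacker would derive candidates from actual
    # keypairs (so descriptors can be signed); os.urandom is a stand-in
    # that just shows the shape of the loop.
    while True:
        candidate = os.urandom(32)
        if lo <= hs_index(candidate, replica, period_length, period_num) < hi:
            return candidate
```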
On the whole, I would think there are better approaches than this one.
their evaluation of the mitigation assumes the HSDir and the service share the same consensus, so that both compute the very same hashring. That is unfortunately not true. How relay churn affects their results is a whole new ball game.
I think we could salvage this part of the mitigation if we decide it's useful (but I am not sure that it is). As they discuss in their section 6.2, you can make this mitigation work (probabilistically) if the HsDirs allow a larger upload spread than the amount actually used.
That is, if the HS uses a spread of 4, they suggest that the HsDir should tolerate a spread of 8. (They analyze this approach in section 8.5)
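As I read their 6.2, the HsDir-side check would look something like this (a sketch under my reading; the names and the ring representation are invented):

```python
import bisect

def should_accept(ring: list[bytes], my_hsdir_index: bytes,
                  desc_hs_index: bytes, tolerated_spread: int = 8) -> bool:
    # ring: sorted hsdir indexes of all relays, from *this relay's* consensus.
    # Accept the upload only if we are among the first `tolerated_spread`
    # relays following the descriptor's hs_index on the ring, where
    # tolerated_spread is larger than the spread services actually use
    # (e.g. 8 vs 4), to absorb consensus skew between relay and service.
    start = bisect.bisect_right(ring, desc_hs_index)
    window = {ring[(start + i) % len(ring)] for i in range(tolerated_spread)}
    return my_hsdir_index in window
```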
I think we could do the following mitigations very easily, if you think they'd be useful:
Impose a maximum descriptor size, via the consensus, that is smaller than the 50k they say they are using.
If we currently allow multiple descriptors to be uploaded per circuit, we should stop doing that. (If I'm reading their section 8.2 correctly, they think that we do allow this, but they didn't do it for their attack, which is surprising to me.)
We could look for more efficient ways to store descriptors. (Right now, they say, we store them with something like 70% overhead.)
These mitigations would take more effort:
We could implement their proposed mitigation of 6.2, and investigate a safe value of D. As above, I am not convinced that this would help.
We could look for a replacement for our current cache ejection algorithm.
Maybe instead of purging everything, we could eject a random fraction.
Maybe we could purge the entries that have been downloaded the fewest number of times.
Maybe we could design and introduce a new POW mechanism: when uploading, an onion service could include a POW in the HTTP POST request. When purging the cache, relays could prefer to keep entries that had included a POW.
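For concreteness, a gentler ejection policy combining the last three ideas might look like this (a sketch only; the field names are invented, and the real cache lives in C in tor / Rust in arti):

```python
import random
from dataclasses import dataclass

@dataclass
class CachedDesc:
    blob: bytes
    downloads: int = 0     # how often clients fetched this descriptor
    has_pow: bool = False  # whether the upload carried a valid POW

def evict_until_under(cache: dict, max_bytes: int) -> None:
    used = sum(len(d.blob) for d in cache.values())
    # Evict POW-less, rarely-downloaded entries first, with a random
    # tiebreak so an attacker cannot fully predict which entries survive.
    victims = sorted(cache.items(),
                     key=lambda kv: (kv[1].has_pow, kv[1].downloads,
                                     random.random()))
    for key, desc in victims:
        if used <= max_bytes:
            break
        used -= len(desc.blob)
        del cache[key]
```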
Other, private/temporary Onionprobe instances can also be used if there are specific addresses to monitor.
How to address this?
We can't drop everything else to work on this. Maybe in March.
Decreasing the descriptor size limit will maybe increase the attack cost a bit, but the attack can still be mounted.
Sidenote
I also happen to have been thinking about HSDir DoS issues for a while, and suggested it as a topic for Project 161 or upcoming grants, among many other ideas.
Sadly, it is not possible to remotely learn how many descriptors a relay holds.
It is something we can export on the MetricsPort for sure, so a spike on the graph could tell the operator something is up, but not much more than that, I think.
It is something we can export on the MetricsPort for sure, so a spike on the graph could tell the operator something is up, but not much more than that, I think.
Yep, I was referring exactly to the MetricsPort. If it were detected that a descriptor for a given service is consistently absent, that would allow contacting the corresponding relay operator and trying to correlate with some descriptor metrics.
First, the DDOS metrics reported by the Tor MetricsPort did not show anything indicating an attack, so the attack was not using a well-known method. [...]
Second, the number of active circuits tracked via the Tor MetricsPort did not increase on our attacked relays. [...]
In contrast, the data logged via Tor’s MetricsPort did not even show that an attack was taking place. Consequently, our relays never reported themselves as overloaded to the Tor network, making this attack invisible to observers relying on the information gathered by the Tor network [...]
Currently, for HSDir relays, the available metrics in MetricsPort are limited to:
Would it be possible to introduce additional metrics to MetricsPort that could provide operators with better indicators of an ongoing HSDir-specific attack?
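For illustration, such additions could look like the following in the MetricsPort's Prometheus output (metric names invented here; they do not exist in tor today):

```
# HELP tor_relay_hsdir_stored_descriptors Number of onion service descriptors currently cached
# TYPE tor_relay_hsdir_stored_descriptors gauge
tor_relay_hsdir_stored_descriptors 12345
# HELP tor_relay_hsdir_cache_bytes Bytes currently used by the descriptor cache
# TYPE tor_relay_hsdir_cache_bytes gauge
tor_relay_hsdir_cache_bytes 104857600
```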
How would you do that with the MetricsPort on relays reporting the number of descriptors they have?
Suppose one suspects that specific services are under an HSDirSniper attack.
This list of services could be fed into Onionprobe to let it compile metrics on descriptor availability.
If a pattern is found, it could be possible to contact the operator of the relay that hosted the descriptor during the attack. If that relay stored descriptor metrics, the attack's existence could be confirmed.
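A minimal sketch of such a watch loop (assuming a local control port on 9051 and that HSFETCH via stem works for the watched v3 addresses; Onionprobe itself does something more elaborate):

```python
import time
from stem.control import Controller

WATCHED = ["replace-with-56-char-v3-onion-address"]  # services suspected to be under attack

with Controller.from_port(port=9051) as ctl:
    ctl.authenticate()
    while True:
        for addr in WATCHED:
            try:
                # Issues an HSFETCH and waits for the descriptor to arrive.
                ctl.get_hidden_service_descriptor(addr)
                print(int(time.time()), addr, "descriptor OK")
            except Exception as exc:
                print(int(time.time()), addr, "descriptor MISSING:", exc)
        time.sleep(600)  # probe every 10 minutes; gaps reveal purge patterns
```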
I understand that this analysis might be difficult to make, since it's full of "ifs" and depends on whether a given relay happens to have a responsive/trusted operator who was collecting metrics in advance.
In this sense, the use case Gus just mentioned sounds more practical, as it would allow relay operators to spot whether this attack is happening (but would not allow relating it to specific services).
I think this is good to merge, but after release we should update the spec and/or make an arti ticket so that arti's hsdir impl does not lose track of this defense.