Add a flag to disable Tor's DNS cache
Improving DNS privacy on Tor exit relays
TL;DR: we would like to disable Tor's internal exit relay DNS cache to mitigate correlation and timeless timing attacks. With this post we would like to start the conversation on how this can be done.
Preface
DNS caching has many advantages, such as decreasing DNS response latency, decreasing the number of outgoing connections and increasing system efficiency. Aside from performance and speed, DNS caching also has a significant privacy advantage. By reducing the number of outgoing DNS queries, fewer domain lookups are leaked outside the DNS server. This increases user privacy and adds some resistance to correlation attacks, in turn making it harder for adversaries to de-pseudonymize Tor users.
On the other hand, DNS caching (even a cache that contained every domain in use worldwide) adds another attack vector that must be considered: timeless timing attacks [1]. By repeatedly querying for a DNS record, an adversary can determine when a domain is no longer cached. This makes it possible to find out, with some granularity, when the DNS record was first requested by a Tor user [2].
In a world where:
- The NSA targets Tor operators, the Tor network infrastructure/relays and Tor users [3] on a structural basis.
- Global surveillance systems like XKeyscore [4] and its sibling programs [5] provide detailed insight (when needed) into most of the internet's traffic, no matter where your autonomous system is situated.
- Developments and improvements on global surveillance systems have been mostly unknown to the public for more than ten years now (so their current scope, capabilities and retention have most likely improved significantly).
Anyone sufficiently paranoid can reasonably assume that the vast majority of outgoing DNS queries can be monitored in real time, which adds unnecessary risk for Tor users. To mitigate these risks, a DNS setup that resists both correlation attacks and timeless timing attacks would be ideal.
1 Improved DNS setup
The ideal solution would be a DNS implementation that utilizes DNS caching extensively and is 100% resistant to both correlation and timeless timing attacks. But ideal doesn't exist, so we have to settle for suboptimal. In this case the targets would be "the cache hit rate is as close to 100% as possible" or, put differently, "the number of cache misses is as close to 0 as possible", while removing the relation between Tor users' DNS lookup requests and outgoing DNS queries. For this, Tobias and I came up with a setup that:
- Increases and randomizes TTL values.
- Uses extensive preloading of DNS records.
- Utilizes automatic refreshes for a long period of time.
This setup implements the DNS cache defenses at the DNS layer rather than in Tor. Having the option to implement these defenses independently of Tor means they will work with Arti in the future too, and it makes it possible to improve existing defense layers or add new ones with relatively low effort.
1.1 Longer and random TTL
One of the objectives is to make the cached DNS records less predictable. TTL values are public by nature, so the idea is to ignore public values and set our own TTL values. As an example, this can look something like:
- if ORIGINAL_TTL < 300 seconds then TTL = RANDOM(600-900 seconds)
- if ORIGINAL_TTL >= 300 seconds then TTL = RANDOM(3000-4000 seconds)
Assigning a random TTL value is actually somewhat similar to the inner workings of Tor's internal DNS cache [6][7]. This way the original TTL value and the TTL value in our DNS servers don't correspond, making it significantly harder for adversaries to correlate a DNS query with a user's DNS lookup request. This is especially important in scenarios where a person (e.g. a dissident) visits an obscure domain that's not already part of the cache.
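As a rough illustration, the example rules above could be written down as follows. This is only a sketch: the function name `randomized_ttl` and the constant are made up for this post, and treating an original TTL of exactly 300 seconds as part of the "long" bucket is an assumption on my side.

```python
import random

# Boundary between "short" and "long" original TTLs, taken from the example above.
SHORT_TTL_THRESHOLD = 300  # seconds


def randomized_ttl(original_ttl, rng=random):
    """Pick a cache TTL that only depends on the bucket of the published TTL.

    Hypothetical helper: the ranges are the example values from this post,
    not tuned or tested values.
    """
    if original_ttl < SHORT_TTL_THRESHOLD:
        return rng.randint(600, 900)    # short-lived records
    return rng.randint(3000, 4000)      # everything else (assumption: includes 300)
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible for testing. A real deployment would probably want a stronger source of randomness.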
1.2 Extensive preloading
Every day, a list of domains will be created based on the Cisco Umbrella 1 Million, Cloudflare Radar Top 1,000,000 Domains and Tranco Top Sites 1,000,000 Ranking. These lists are merged, duplicates are removed and www subdomain counterparts are added. This list is then preloaded for both IPv4 A and IPv6 AAAA records in the DNS cache.
These domains get visited very frequently and by preloading them, the user's DNS lookup requests for these domains are completely disconnected from the outgoing DNS queries. Preloading this many domain names should increase the DNS cache hit rate considerably.
It's important to preload this list every time the DNS cache starts out empty (e.g. after a server reboot), otherwise millions of DNS queries (the first for each record) would be exposed.
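The merge step could look roughly like the sketch below. The function names are hypothetical and the input is assumed to be plain lists of domain names that were already extracted from the three sources. Downloading and parsing the lists is left out.

```python
def build_preload_list(*domain_lists):
    """Merge domain lists, drop duplicates and add www counterparts."""
    merged = set()
    for domains in domain_lists:
        for raw in domains:
            # Normalize case and strip a trailing root dot.
            name = raw.strip().lower().rstrip(".")
            if not name:
                continue
            merged.add(name)
            # Add the counterpart with or without the www label.
            if name.startswith("www."):
                merged.add(name[4:])
            else:
                merged.add("www." + name)
    return sorted(merged)


def preload_queries(domains):
    """Expand each domain into the A and AAAA lookups to preload."""
    return [(name, rtype) for name in domains for rtype in ("A", "AAAA")]
```

For example, `build_preload_list(["Example.com"], ["example.com"])` yields `["example.com", "www.example.com"]`, and `preload_queries` turns that into four lookups.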
1.3 Automatic refreshes
From the moment a domain is looked up once, it will be continuously refreshed close to the end of its TTL cycle (with a randomized offset) for a long retention period (e.g. 7 or even 30 days). Every time a domain's record is requested, the 'purge timer' is reset to zero. If a domain's record has not been requested for the whole retention period, it gets purged from the cache so that it doesn't linger around indefinitely. The same mechanics also apply to the daily preload lists.
Many domains have short TTL values, meaning that a DNS cache will only cache them for a limited amount of time (e.g. 5 minutes). A DNS lookup request after the TTL value expired will result in another outgoing DNS query. But with automatic refreshing of DNS records at a random time before they expire, this won't happen again (at least not until they are purged after the retention period).
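To make the refresh and purge mechanics a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the class and method names are invented, refreshing somewhere between 70% and 90% of the TTL is an arbitrary choice, and the actual upstream queries are left to the caller. It is not how PowerDNS works today.

```python
import random
from dataclasses import dataclass

RETENTION = 7 * 24 * 3600  # seconds; example retention period of 7 days


@dataclass
class Entry:
    ttl: int
    refresh_at: float      # when to re-query upstream, before the TTL runs out
    last_requested: float  # last user lookup; drives the purge timer


class RefreshingCache:
    def __init__(self, retention=RETENTION, rng=random):
        self.retention = retention
        self.rng = rng
        self.entries = {}

    def store(self, name, ttl, now):
        """Record a fresh upstream answer and schedule the next refresh."""
        old = self.entries.get(name)
        # A refresh is not a user request, so it must not reset the purge timer.
        last = old.last_requested if old else now
        # Randomized offset: refresh at 70-90% of the TTL (assumed range).
        refresh_at = now + ttl * self.rng.uniform(0.7, 0.9)
        self.entries[name] = Entry(ttl, refresh_at, last)

    def lookup(self, name, now):
        """User lookup: report a cache hit and reset the purge timer."""
        entry = self.entries.get(name)
        if entry is None:
            return False
        entry.last_requested = now
        return True

    def due(self, now):
        """Purge idle entries and return the names that need a refresh."""
        to_refresh = []
        for name, entry in list(self.entries.items()):
            if now - entry.last_requested >= self.retention:
                del self.entries[name]
            elif now >= entry.refresh_at:
                to_refresh.append(name)
        return to_refresh
```

A scheduler would call `due()` periodically, re-query every returned name upstream and feed the result back through `store()`.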
2 Shortcomings
There may be considerable advantages to such a DNS setup, but that doesn't mean there aren't also shortcomings:
- DNS cache misses still exist.
- Domain groupings.
- Required infrastructure.
- It doesn't exist yet.
- Only A and AAAA records.
- No visited domains in preload list.
2.1 DNS cache misses
DNS lookup requests that result in DNS cache misses still exist. This is especially true for obscure domains, and it might be exactly these obscure domains that pose the greatest risk for Tor users. This risk is partly mitigated by the longer and random TTLs though. It's impossible to preload all domains in use worldwide because this information is only known to the TLD registries (and sometimes a handful of researchers). Also see 2.6.
2.2 Domain groupings
Long live the modern day internet, where one visit to a website could result in DNS lookup requests for 100+ other domains. And each of those domains could be detrimental to the user's privacy, if they weren't preloaded and/or cached already. This is a problem that is not easy to fix, but one method would be to visit the preload domain list with a headless browser, log all the domains requested and add them to the preload list. This might be a good 'phase 2' project ;-).
2.3 Required infrastructure
One big drawback is that this DNS setup requires considerable system resources. This might not be a problem for large scale relay operators, but may be difficult to manage for smaller scale relay operators. That said, the 6 biggest exit relay operators together provide almost 50% of the exit relay consensus weight. Assuming DNS queries are proportionate to this percentage, a large part of the Tor network could benefit from such a setup. Smaller scale relay operators could also use the DNS servers of large scale relay operators, as this is actually preferable according to a research paper about the effect of DNS on Tor's anonymity [8].
The impact on the world wide DNS infrastructure should be negligible. Even the whole of the Tor network's DNS queries combined is probably not much more than a rounding error in the DNS.
2.4 It doesn't exist yet
The improved DNS setup doesn't exist yet and currently there is no piece of software that's able to do this out of the box. Fortunately PowerDNS (the DNS frontend and backend we use) is highly extensible so this is something we want to invest in if we get the green light from the Tor Project about an option to disable Tor's internal cache. There also is a chance we can get a small subsidy for the changes to PowerDNS, which would help out greatly.
2.5 Only A and AAAA records
In this setup only A and AAAA records are preloaded to the DNS cache. It may be beneficial to also add other DNS record types such as MX records in the future.
2.6 No visited domains in preload list
Extensive preloading based on top 1,000,000 domain lists will provide a good preloading base, but obscure domains that were recently visited won't be part of the preload list. When the DNS cache is emptied (for example after a server reboot or service restart), all the obscure domains are gone as well and new DNS lookup requests will result in DNS leaks again.
A potential solution for this is to make frequent 'snapshots' or backups of the DNS cache and 'insert' them into the DNS cache after it is emptied. This might also be a good addition to tackle in a 'phase 2' project. In any case, it's paramount to address the privacy and security risks that come with creating such DNS cache exports.
3 Changes to Tor
In order for this DNS setup to work, Tor's internal DNS cache must be disabled. This cache is hardcoded, so it currently cannot be disabled with a flag. Let's start a conversation on the best method to disable this cache.
I already spoke with Arma and Tobias about this and they came up with a starting point:
Arma:
"I think there's a data structure that tracks both pending (launched but not yet received a response) and successful (received a response and will use it again for a while) DNS answers. So for example, if Tor gets a stream begin request for foo.com, and sends a request to its resolver about foo.com, and then it gets another stream begin request before it has received that response, should it generate two? Or just wait until the first answers and then use that answer for both. If 'wait until the first answers' is enough, I think that a patch might be pretty simple, just, once we get an answer from the resolver, use it and then remove it from the data structure."
Tobias:
"Yes, this is a great suggestion. Actually, this part of the cache is really important, because FF/TB tends to build many concurrent connections to the same destination. As Roger suggests: after resolve, when the TTL of the entry in the cache is determined (check where https://gitlab.torproject.org/tpo/core/tor/-/blob/main/src/core/or/connection_edge.c#L500 is used), set the TTL to like 1s, or just delete it, and done."
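To illustrate the 'wait until the first answers' idea, here is a small sketch of a table that only tracks pending lookups. It is not Tor's code and does not mirror Tor's actual data structures. The names `PendingResolves`, `resolve` and `on_answer` are invented for this example.

```python
class PendingResolves:
    """Tracks in-flight lookups only; answers are never kept for reuse."""

    def __init__(self, send_query):
        # send_query(name) launches exactly one query to the local resolver.
        self.send_query = send_query
        self.waiters = {}  # name -> callbacks of streams waiting for the answer

    def resolve(self, name, callback):
        if name in self.waiters:
            # A query for this name is already in flight: just wait for it.
            self.waiters[name].append(callback)
            return
        self.waiters[name] = [callback]
        self.send_query(name)

    def on_answer(self, name, answer):
        # Hand the answer to every waiting stream, then forget it, so the
        # next request for this name goes to the resolver again.
        for callback in self.waiters.pop(name, []):
            callback(answer)
```

With this, two stream begin requests for foo.com that arrive before the response share one resolver query, while a third request arriving after the response triggers a new one.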
Is this indeed a viable approach? Let's discuss!
It might also be a good idea to evaluate the performance impact of such a change on busy exit relays. Better safe than sorry. I suspect the impact will be small because 1) modern DNS caches add latency in the milliseconds and 2) the limited per-relay bandwidth (and as a result the limited number of DNS lookup requests) that can be achieved with Tor's current architecture severely limits the size of the internal cache and, as a result, its cache hit rate.
Thanks in advance for reading/replying,
Nothing to hide
[1] https://www.usenix.org/system/files/sec23summer_458-dahlberg-prepub.pdf
[2] Tor has some defenses in place here, but it's a can of worms
[3] https://daserste.ndr.de/panorama/aktuell/nsa230_page-1.html
[4] https://en.wikipedia.org/wiki/XKeyscore#cite_note-NDR-5
[5] https://en.wikipedia.org/wiki/Global_surveillance
[6] https://gitlab.torproject.org/tpo/core/tor/-/blob/main/src/core/or/connection_edge.c#L500
[7] https://gitlab.torproject.org/tpo/core/tor/-/blob/main/src/core/or/connection_edge.h#L193
[8] https://nymity.ch/tor-dns/tor-dns.pdf