Reach a security decision about version-fingerprinting on network documents
Often we would like to limit which network directory documents we process. If we want to reject currently generated documents, that's a compatibility issue. But sometimes we would like to reject documents that no current well-behaved supported Tor implementation will generate.
The question is: how safe is it to do so?
Let's consider the long-deprecated "opt" keyword line prefix. Right now, our implementations accept "opt" in directory documents, but nobody generates it. If we were to start rejecting "opt" tomorrow, then there would be possible network documents that our current implementations reject, but which older versions do not.
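To make the divergence concrete, here is a minimal sketch (not real Tor code; the function names are invented) of how two parser generations could treat the same "opt"-prefixed line differently:

```python
# Hypothetical sketch of two parser generations handling the
# deprecated "opt" keyword-line prefix.

def parse_line_old(line: str) -> tuple[str, str]:
    """Older behavior: silently strip a leading "opt " prefix."""
    if line.startswith("opt "):
        line = line[len("opt "):]
    keyword, _, args = line.partition(" ")
    return keyword, args

def parse_line_new(line: str) -> tuple[str, str]:
    """Hypothetical newer behavior: reject "opt" outright."""
    if line.startswith("opt "):
        raise ValueError('deprecated "opt" prefix is no longer accepted')
    keyword, _, args = line.partition(" ")
    return keyword, args

# The same document line is accepted by one version and rejected by the other:
line = "opt hidden-service-dir"
print(parse_line_old(line))   # ('hidden-service-dir', '')
try:
    parse_line_new(line)
except ValueError:
    print("rejected by new parser")
```

Any document containing such a line is accepted by the old implementation and rejected by the new one, which is exactly the observable difference an adversary could probe for.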
An adversary could use this difference to generate a document that only some parties would accept, and use this property to partition the client set by implementation version.
Let's consider an even harder-to-avoid situation. Suppose we add a new keyword "serenity" to some network document, and say that it takes a boolean argument. After this change, implementations will start rejecting "serenity foo", since "foo" is not a boolean. Older implementations will still accept "serenity foo", however, since unrecognized keywords are ignored.
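The asymmetry can be sketched as follows (a toy model, not real Tor code; "serenity" and the boolean encoding are the hypothetical example from the text):

```python
# Toy model: older parsers ignore unrecognized keywords entirely, while
# newer parsers know "serenity" and validate its boolean argument.

KNOWN_BOOL_KEYWORDS = {"serenity"}  # hypothetical new keyword

def accepts_old(document_lines) -> bool:
    # Old rule: unrecognized keywords are ignored, so these lines
    # never cause a rejection.
    return True

def accepts_new(document_lines) -> bool:
    for line in document_lines:
        keyword, _, args = line.partition(" ")
        if keyword in KNOWN_BOOL_KEYWORDS and args not in ("0", "1"):
            return False  # known keyword with a malformed argument
    return True

doc = ["serenity foo"]
print(accepts_old(doc), accepts_new(doc))  # True False
```

The same document parses cleanly on old implementations and fails on new ones, so each keyword added this way creates another observable version boundary.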
Thus, our current usual practice for adding keywords means that every time we do so, we introduce another version partitioning.
Now let's analyze the impact of these attacks.
As far as I can see (argument omitted), the only documents where these attacks matter are a bridge's relay descriptor and an HsDesc.
With a bridge's relay descriptor, a bridge can, over time, use these attacks to learn the exact versions of each of their clients. This is probably not a super fast attack. I'm not sure how bad the impact is, either: the bridge can already partition their clients by IP.
With an HsDesc, the attack seems stronger, but I'm having a hard time analyzing it. In the strongest version I can think of, a hostile operator can upload a different HsDesc to each of their 6 HsDirs, and put different introduction points in each. They can then partition their clients into up to 6 sets based on implementation versions.
Mitigations
-
We can mitigate these attacks somewhat by ensuring that they do not fail silently. If a bridge operator or HSDesc operator knows that the clients that do reject their netdocs will get loud errors, and identify the party that's doing these attacks, then their ability to mount these attacks silently will decrease.
-
Network health and Anticensorship could probably do some kind of scanning to help with the bridge descsriptors; scanning isn't possible for HsDescs.
-
We could continue to encourage more rapid updating by clients and relays.
-
If we want, we can use consensus parameters to manage when implementations change their behavior. (This approach adds complexity and delays changes.)
-
We could do a two-stage process where at first we warn on newly invalid netdocs, and only later do we reject them. (The switch-over could be based on a version, or on a consensus parameter.)
Conclusion
So, what do we think here?
The first thing to decide is, "how bad is this kind of attack?" To my current analysis it seems "not so bad", but I don't think I really have a good intuition for how bad the contemplated HsDesc partitioning is.
The second thing to do is make recommendations based on what we decide.