First, allowing parties to send unexpected ignored traffic opens side channels, so we must do something about it. And secondly, you must be Jon Postel for his Law to apply, and you're not. And thirdly, Postel's Law is more what you'd call "guidelines" than actual rules.
To constrain side channels, @mikeperry thinks, and I agree, that we should be more restrictive about accepting unrecognized cell types and formats.
We should figure out how this interacts with proposal 325 (packed cells), and we should make sure that we don't write something that precludes future extensions to the protocol.
My intuition favors an approach something like this, though of course it might be wrong:
Exits accept any well-authenticated cell that the client sends.
Clients and onion services reject anything that they do not recognize.
To add new relay message types in the future, we can declare that the exit may only send them in response to the client sending them first (or some other signal). We can declare that onion services must advertise that they accept them in their descriptors.
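A minimal sketch of the acceptance policy above, in Python for brevity (tor itself is C). The `Role` enum, the `BASELINE` command set, and the `accept_cell()` helper are hypothetical illustrations, not tor code. "NEW_FEATURE" stands in for any future relay message type.

```python
from enum import Enum, auto


class Role(Enum):
    EXIT = auto()
    CLIENT = auto()
    ONION_SERVICE = auto()


# Hypothetical baseline: relay commands every endpoint is assumed to recognize.
BASELINE = frozenset({"BEGIN", "CONNECTED", "DATA", "END", "SENDME"})


def accept_cell(role, command, authenticated,
                sent_by_client=frozenset(), advertised=frozenset()):
    """Return True if an endpoint in `role` should accept `command`.

    sent_by_client: new message types the client already sent on this circuit.
    advertised: new message types an onion service listed in its descriptor.
    """
    if not authenticated:
        return False
    if role is Role.EXIT:
        # Exits accept any well-authenticated cell that the client sends.
        return True
    if command in BASELINE:
        return True
    if role is Role.CLIENT:
        # The exit may only send a new type after the client sent it first.
        return command in sent_by_client
    # Onion services accept new types only if they advertised them.
    return command in advertised
```

The asymmetry is deliberate: the permissive side is the exit, where the thread sees little attack surface, while clients and onion services fall back to a closed set unless the new type was explicitly negotiated.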
Additionally, I think we can go much harder in the anti-Postel direction than @nickm outlines, now that we have Protover and versioning to signal support for new features.
For example: while I do not see any major attacks with respect to being permissive in the exit direction, there seems to be no reason why an exit SHOULD accept a cell type that is not supported by its advertised ProtoVer lines. That is an error. At the very least, refusing cells not advertised by ProtoVer will help us find bugs faster. It will also improve UX, since sending an unsupported cell to an exit will probably cause unexpected behavior eventually.
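A sketch of that ProtoVer check, assuming protocol lines of the form `Link=1-5 Relay=1-4` (space-separated `Name=versions` entries, with comma-separated versions or ranges). The command-to-subprotocol table `REQUIRED_PROTOCOL` is made up for illustration; tor's real mapping is different and lives in the specs.

```python
def parse_protover(line):
    """Parse 'Link=1-5 Relay=1-4,6' into {'Link': {1..5}, 'Relay': {1,2,3,4,6}}."""
    supported = {}
    for entry in line.split():
        name, _, ranges = entry.partition("=")
        versions = set()
        for part in ranges.split(","):
            if not part:
                continue
            low, _, high = part.partition("-")
            versions.update(range(int(low), int(high or low) + 1))
        supported[name] = versions
    return supported


# Hypothetical mapping from relay command to the (subprotocol, version) it needs.
REQUIRED_PROTOCOL = {
    "BEGIN": ("Relay", 1),
    "DATA": ("Relay", 1),
    "NEW_FEATURE": ("Relay", 5),
}


def exit_accepts(command, advertised):
    """Accept a cell only if its command is covered by the exit's ProtoVer."""
    required = REQUIRED_PROTOCOL.get(command)
    if required is None:
        return False  # unknown command: treat as a protocol error
    subprotocol, version = required
    return version in advertised.get(subprotocol, set())
```

For example, `exit_accepts("NEW_FEATURE", parse_protover("Link=1-5 Relay=1-4"))` returns False, because that exit never advertised `Relay=5`.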
But I agree that this needs to be specified a little more carefully. If you are the only exit that accepts a new experimental ProtoVer, and nobody else does yet, you suddenly get to capture a good chunk of the traffic of experimental clients, for use in attacks against developers and testers. This could be bad.
Also, allowing dropped cells toward the exit may be fine, but that inherently allows them toward an onion service as well, where they can be used as an invisible vector for "flood and check" attacks: the adversary floods a service and then tries to notice the flood via extra-info descriptors or netflow logs.
I'm hijacking this thread to let you know that I recently resumed working on this problem with Tariq Elahi. I am happy to see you're taking seriously the side channels built on abuses of forward compatibility. The few attacks published in the dropmark paper are just the tip of the iceberg; I remember listing a few other things an active attacker could abuse from Postel's principle as implemented in Tor to guarantee end-to-end correlation. I never investigated this list further, since the root of the issue is the same, and it is far better to treat the disease itself than the symptoms.
So, I was hoping you could keep us in the loop on your progress on this specific topic. I am planning to investigate an even harder anti-Postel direction than Mike's suggestion, and I would be happy to keep you informed as progress is made. However, that work won't be very compatible with your codebase.
Basically, any instance SHOULD NOT accept anything that is not part of its valid protocol operations. Obviously, being so conservative while maintaining a flexible and maintainable distributed network seems a lost cause to fight for. I think this assessment is correct for our current software architecture and lifecycle, i.e., the way we currently specify, implement, and then deploy new versions of the protocol. Specification and implementation are under our control, and we can iterate between these two steps. We have less control over deployment, as it is mostly in the hands of relay operators and their OS policies. I plan to work on an architecture that gives developers full, fine-grained control over all three steps (specification, implementation, and deployment), and see how it goes.
The best-case scenario: we come up with a new design and implementation in a few years, and learn something from it. I expect it to blend into the Tor network as an independent implementation.
The worst-case scenario: a bunch of not-so-useful academic papers.
I completely agree, @florentin. Nick's outline is looser than what I think we should do. I do not think we should add any semantically empty cells to the Tor protocol, and all code should verify that when a cell is received, it is valid and actually does something.
Forward compatibility is no longer an issue for us, at least not via Postel's maxim. Tor now supports protocol versions for specific feature support at relays. There is no reason why a relay has to accept anything not listed in its protocol versions, or send anything not listed in them, other than sloppiness.
This concept is actually already implemented in the vanguards addon, with some help from C Tor's CIRC_BW event. In the C tor implementation, I added calls to circuit_read_valid_data() only where relay cells are processed without an error condition and are otherwise semantically valid and accepted for that situation, circuit purpose, and circuit state. Since command_process_relay_cell() processes every relay cell, if a relay cell passes through the relay command handling subroutines without changing these valid-data byte counts, I immediately emit a CIRC_BW event that reflects this fact. Vanguards can then detect this condition from the CIRC_BW field values and close the circuit.
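A rough sketch of the detection side, using the CIRC_BW fields that tor's control protocol exposes (READ, DELIVERED_READ, OVERHEAD_READ). The constants are assumptions: the sketch assumes READ advances by 509 bytes (one cell payload) per cell, while DELIVERED_READ plus OVERHEAD_READ together advance by 498 bytes (one relay payload) only for relay cells counted as valid. This follows the general approach of the vanguards addon as described above, but it is not its code, and `CircuitTracker` is a hypothetical name.

```python
CELL_PAYLOAD_SIZE = 509   # assumed READ bytes per cell received
RELAY_PAYLOAD_SIZE = 498  # assumed DELIVERED+OVERHEAD bytes per valid relay cell


def parse_circ_bw(line):
    """Turn '650 CIRC_BW ID=5 READ=1018 ...' into a dict of integer fields."""
    fields = {}
    for token in line.split()[2:]:  # skip the "650" status and event name
        key, sep, value = token.partition("=")
        if sep and value.isdigit():  # non-numeric fields such as TIME are skipped
            fields[key] = int(value)
    return fields


class CircuitTracker:
    """Accumulates CIRC_BW counters for one circuit."""

    def __init__(self):
        self.read = self.delivered = self.overhead = 0

    def update(self, fields):
        self.read += fields.get("READ", 0)
        self.delivered += fields.get("DELIVERED_READ", 0)
        self.overhead += fields.get("OVERHEAD_READ", 0)

    def dropped_cells(self):
        # Cells that arrived but were never credited as valid data.
        cells_read = self.read // CELL_PAYLOAD_SIZE
        cells_valid = (self.delivered + self.overhead) // RELAY_PAYLOAD_SIZE
        return cells_read - cells_valid

    def should_close(self):
        return self.dropped_cells() > 0
```

Counters are accumulated across events before dividing, so a DATA cell that carries only a few bytes still counts as one valid cell once its overhead is added in.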
In this way, only cells that were expected, valid, and actually semantically meaningful are allowed, when the vanguards addon is used.
You should also read https://github.com/mikeperry-tor/vanguards/issues/67, make sure you're testing with a vanguards addon using pypy and that fix, and be aware of what the race condition means with respect to different adversary positions.
Even in the worst case, merely detecting this attack class is worthwhile, as it serves as a deterrent and can help inform the user that they are at risk. But most likely, we can also add circuit padding for these cases.
Forward compatibility is no longer an issue for us, at least not via Postel's maxim. Tor now supports protocol versions for specific feature support at relays. There is no reason why a relay has to accept anything not listed in its protocol versions, or send anything not listed in them, other than sloppiness.
Yes, I've seen your vanguards addon and your work to detect and react to invalid protocol messages. This is indeed much needed. But, if I understand correctly, it only answers the first part of a complex problem, i.e., we need to be both conservative AND super flexible:
Basically, any instance SHOULD NOT accept anything that is not part of its valid protocol operations.
Clearly, what you did is a step forward, but I think we should be careful not to sacrifice too much ease of deployment; to me, that is the difficult problem:
Obviously, being so conservative while maintaining a flexible and maintainable distributed network seems a lost cause to fight for
I believe that protocol negotiation is not enough. I believe we need something quite crazy to eventually call the problem solved: 1) we need the full network to be conservative. 2) We need the ability to 'hit' a deploy button that makes the whole network compatible with some new protocol feature at once. 3) Finally, we need to be able to show cryptographic evidence of misbehavior when some relay or client sends invalid protocol messages.
I think 1) is ok, 2) is ongoing, and 3) needs some thought.
Regarding the vanguards addon (I am totally supportive of such a design): I see you're expecting some tests of your defenses, and that some thinking is ongoing regarding different threat models (i.e., malicious Guard vs. malicious ISP, and race conditions between dropmarks and destroy cells). That's great. The reasoning there is worth digging into further, and we need to test the defenses to move forward.
We would be happy to help, but we need to play the academic game and somehow produce a paper at the end of the process. That's how I can land a job in the future x) Vanguards could make an interesting paper, and we could rebase my old attack code onto a more recent Tor version, test the design, and help elaborate on the various questions.
Here's a potential paper topic, @florentin: First, see if the race can still be won by the attacker, to get enough information through before circuit close, while observing at various vantage points. Then, if it can be won (and it can always be won if the guard is malicious and refuses to close the circuit), to what degree can circuit padding from the middle node help? And at what points of circuit use would we need to pad, and how much?
@mikeperry sounds good! I'll try experimenting with circuit padding inside our framework. We're working on a proof-of-concept paper explaining our methodology and demonstrating the capabilities of an anonymous network built upon 1), 2) and 3) described in my previous comment.
Regarding circpad, here's the plan: I will soon work on making this framework 'remotely' reprogrammable, so that one can inject circpad machines into a peer (experimenting with alternative deployment methods) to protect one's own traffic. We will try demonstrating this capability while protecting against dropmarks. That should help you get your work tested at the same time.
I am thinking of a scenario such as the client reprogramming its middle relay at circuit establishment so that it sends padding cells until the client tells it to stop because it has successfully connected to some destination.
Man, this is great. Are you planning to eventually add the ability to receive padding machines from the network as well, i.e., loading the bitstring representation of a machine's configuration and applying it?
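A toy model of that scenario as an event-driven state machine on the middle relay. It is not tor's circpad API (real circpad machines are C structures with their own event types and delay histograms); the class name, the event names `circuit_built` and `client_stop`, and the padding budget are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PADDING = auto()
    DONE = auto()


class PadUntilStopMachine:
    """Hypothetical middle-relay machine: pad until the client says stop."""

    def __init__(self, max_padding_cells=100):
        self.state = State.IDLE
        self.sent = 0
        self.max_padding_cells = max_padding_cells  # safety cap on overhead

    def on_event(self, event):
        if self.state is State.IDLE and event == "circuit_built":
            self.state = State.PADDING
        elif self.state is State.PADDING and event == "client_stop":
            self.state = State.DONE

    def tick(self):
        """Called on each timer tick; returns True if a padding cell is sent."""
        if self.state is not State.PADDING:
            return False
        self.sent += 1
        if self.sent >= self.max_padding_cells:
            self.state = State.DONE  # budget exhausted, stop padding
        return True
```

The budget is a design choice worth keeping in any real version: without it, a client that never sends the stop signal would leave the relay padding indefinitely.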
This is neat stuff! Now, I assume this is somewhat linked to the very concern of this thread. Do you plan to model the Tor routing protocol as a finite state machine, and get this machine updated through the network via something like a signed bitstring representation of the new protocol state machine?
I guess that should effectively replace Postel's principle. The Tor routing protocol could stay conservative (and up to date when it receives a new bitstring from the authorities) by ensuring a circuit's state machine is always in a valid state. Otherwise, it kills the circuit.
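A toy version of that idea: the valid protocol is a transition table, and any message with no valid transition from the current state kills the circuit. The states, messages, and table are hypothetical and heavily simplified; real tor circuits have far more states and message types, and nothing here models how such a table would be signed or distributed.

```python
class CircuitFSM:
    # Hypothetical table: (current state, received message) -> next state.
    TRANSITIONS = {
        ("AWAIT_CREATED", "CREATED"): "OPEN",
        ("OPEN", "CONNECTED"): "STREAM_OPEN",
        ("STREAM_OPEN", "DATA"): "STREAM_OPEN",
        ("STREAM_OPEN", "SENDME"): "STREAM_OPEN",
        ("STREAM_OPEN", "END"): "OPEN",
    }

    def __init__(self):
        self.state = "AWAIT_CREATED"
        self.closed = False

    def receive(self, message):
        """Apply a message; return False (and kill the circuit) if invalid."""
        if self.closed:
            return False
        next_state = self.TRANSITIONS.get((self.state, message))
        if next_state is None:
            # No valid transition: conservative behavior is to close.
            self.closed = True
            self.state = "CLOSED"
            return False
        self.state = next_state
        return True
```

Because the table is plain data, swapping in a new one is the "update" step described above; everything else about such an update (authentication, rollout) is outside this sketch.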
That looks complicated though :) Could you confirm these thoughts?
@florentin - close but not quite. The padding machines have events that trigger various behavior or state changes within themselves. New kinds of events can be added if the need arises. We did not design them to mirror the entire Tor protocol, but instead to allow the machines to react to important activity within Tor. They are complicated, and primarily meant to be programmed by a GA or similar optimization process. This is exactly what Tobias did in the paper I linked on arXiv, to great success against website traffic fingerprinting. The padding machines could in theory be updated from the consensus, but we have not implemented that yet. They can also be programmed by hand, of course, and George and I have done this, too.
This doesn't have much to do with Postel's maxim. We no longer need Postel's maxim because we have protovers and negotiation as the policy for every new protocol feature in Tor. This is a separate concern from the need to upgrade stuff dynamically. While dynamic upgrades are interesting in a theoretical sense, they also pose a security and stability risk in a practical sense. You might get an interesting paper out of that topic, but it is unlikely we would deploy such a thing.
I think the most interesting practically deployable paper in this area remains: to what degree can padding help with respect to the race condition between closing the circuit, and side channel cells still in flight.
At this point, we should probably continue this conversation over email, as we're polluting this ticket. I'm glad you're excited though. You can reach me at this username at torproject.org.