TROVE-2021-003: layer hint not validated on half-open streams.
[This issue is confidential. It was reported by Jann Horn at Google. Here is the text of his report.]

```
##### half-closed connection tracking ignores layer_hint #####
(one-sentence summary: entry/middle relays can spoof RELAY_END cells on
half-closed streams, which can lead to stream confusion between OP and exit)

When an OP receives a RELAY_END cell on some stream from the exit, the OP knows
that it can't receive any more cells for that stream (the spec says "Upon
receiving a RELAY_END cell, the recipient may be sure that no further cells will
arrive on that stream"); and so it can immediately reuse the StreamID.

But if the OP sends RELAY_END to the exit, it can not immediately reuse the
StreamID without some kind of acknowledgement from the exit that the RELAY_END
cell was received: Until the exit receives the RELAY_END, it might still send
cells with that StreamID.
It looks like this wasn't really addressed in the original protocol, and
especially until
https://gitweb.torproject.org/tor.git/commit/?h=144647031aa9e7eacc6f7cdd8fed663c7229b2aa
("Ticket #25573: Check half-opened stream ids when choosing a new one") the OP
would just reuse StreamIDs in that situation, which, as the commit message
notes, can lead to data corruption because streams get mixed up.

Now, when the OP sends a RELAY_END, it stops tracking the stream normally, but
instead tracks it as a "half-closed" stream. It continues doing so until it
receives a RELAY_END from the exit; at that point, the half-closed stream is
freed and its ID is available for reuse again.

What's a bit weird here is that the exit-side code doesn't have code for
acknowledging RELAY_END with a RELAY_END response; it just silently drops the
connection. This means that when the client closes a stream, the resulting
half-closed stream continues to occupy ID space and OP memory until the entire
circuit goes away; the OP-side code that removes half-closed streams when
RELAY_END is received only runs if both sides of the stream simultaneously send
RELAY_END before receiving each other's cells.

The security bug is that the OP-side logic for handling RELAY_END cells on
half-closed streams (the calls from handle_relay_cell_command() to
connection_half_edge_is_valid_end() and to
connection_half_edge_is_valid_resolved()) ignores the layer_hint, which
specifies which relay on the circuit the cell came from.
This means that entry/middle nodes can spoof RELAY_END cells, causing
connection_half_edge_is_valid_end() to prematurely make StreamIDs available for
reuse, effectively restoring the pre-2018 stream confusion protocol issue.

To actually make a stream confusion happen, an attacker with the goal of
injecting a crafted reply into a connection by the client to some endpoint
would probably need both:

 - a lot of control over when the client opens and closes connections
 - control over the middle (or entry) node

I haven't tested whether this can be triggered from a browser; I've only tested
it with a custom client that goes through the following steps, against a
chutney network with a modified Tor on the entry node:

[This assumes that no streams have been allocated on the circuit yet!]

1. on the client, pow(2,16)-3 times (with parallelism):
  1.1: client: open a connection A through tor to some server
       [allocates a stream ID]
  1.2: client: wait for connection A to be established
  1.3: client: close connection A
       [places the stream ID in half-closed state; Tor never cleans this up]
[at this point, only two consecutive stream IDs ID_a and ID_b are still
available; ID_a will be used next]
2. client: open a connection B through tor to some server
   [allocates ID_a]
3. server: close connection B
   [frees up ID_a again when received by OP]
4. client: wait for connection B to be closed
[at this point, ID_a and ID_b are again available, but now ID_b will be used
next]
5. evil entry node: from now on, capture inward cells (from exit to OP)
   instead of forwarding them
6. client: open connection C through tor to evil injecting server
   [allocates ID_b]
7. evil injecting server: accept connection C
   [RELAY_CONNECTED cell is captured]
8. evil injecting server: reply on connection C with malicious data
   [RELAY_DATA cell is captured]
9. client: close connection C
   [RELAY_END cell is captured]
   [ID_b is placed in half-closed state]
10. client: open connection D through tor to some server
    [allocates ID_a]
11. client: close connection D
    [places ID_a in half-closed state]
[for the next stream, the OP will first try to use ID_b if it is free]
12. evil entry node: stop capturing cells, discard inward cells from now on
13. evil entry node: for each possible stream ID from 1 to pow(2,16)-1, send a
    fake RELAY_END cell
[before, all IDs were marked half-closed; now all IDs are free again]
14. client: open connection E through tor to victim server
    [allocates ID_b]
15. evil entry node: replay captured cells
16. client: connection E receives RELAY_CONNECTED, RELAY_DATA, RELAY_END that
    were intended for connection C

Attached are:

 - 0001-relay-code-for-stream-confusion-attack.patch: a patch on top of Tor
   0.4.5.7 to implement extra control commands that can be used to perform
   attack steps on the attacking entry/middle relay
 - stream_confusion_server.c: code for an attacking TCP server that the client
   contacts through the Tor network
 - confused_client.c: code for the SOCKS client that opens and closes streams
   in the right order to trigger the bug

A successful run looks like this on the client:

$ ./confused_client
assuming that the circuit is pristine (no stream IDs allocated yet)!
consuming most stream IDs...
65280/65533
hopefully there are now 2 consecutive stream IDs remaining
using and freeing one ID...
server closed socket, ID was hopefully reused
please run TRAP_NEXT_CELL control command, then press enter:
opening connection to injecting server...
waiting for packet to go out...
resetting next-ID hint...
hopefully reset next-ID hint?
please run BRUTE_DROP_STREAMS control command three times (with some time in between), then press enter:
opening victim connection...
waiting for stuff to settle...
please *quickly* run REPLAY_CELLS control command *now* (or after a few seconds the OP will time out)
victim_sock apparently connected?
got string: 'injected text '
victim connection closed

On the control connection to the entry node:

$ (echo "authenticate $(hexdump -e '32/1 "%02x""\n"' chutney/net/nodes/003r/control_auth_cookie)"; cat) | nc localhost 8003 -vv
Connection to localhost (127.0.0.1) 8003 port [tcp/*] succeeded!
250 OK
TRAP_NEXT_CELL
200 targeting circuit with next relay cell
BRUTE_DROP_STREAMS
200 done, dropped half-open streams up to 24577, call me again
BRUTE_DROP_STREAMS
200 done, dropped half-open streams up to 49153, call me again
BRUTE_DROP_STREAMS
200 done, dropped all half-open streams
REPLAY_CELLS
200 done, replayed 5

Things I fiddled with in chutney's torrc files for testing:

 - configured fixed nodes for building circuits on the OP to simplify testing
   (EntryNodes, MiddleNodes, ExitNodes)
 - bumped the V3AuthVotingInterval on the authorities to 600 to reduce log spam

This bug is subject to a 90-day disclosure deadline. If a fix for this
issue is made available to users before the end of the 90-day deadline,
this bug report will become public 30 days after the fix was made
available. Otherwise, this bug report will become public at the deadline.
The scheduled deadline is 2021-08-12.
```
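
The core of the report, half-closed stream IDs being freed by a RELAY_END regardless of which hop sent it, can be illustrated with a toy model. The sketch below is not Tor's code: `ToyCircuit`, `EXIT_HOP`, `check_origin`, and `spoof_all_from_entry` are hypothetical names invented for illustration, and the hop comparison is only one plausible shape for a layer_hint check, not necessarily the fix that was applied.

```python
from dataclasses import dataclass, field

# Hypothetical hop numbering for a 3-hop circuit: 0 = entry, 1 = middle, 2 = exit.
EXIT_HOP = 2
MAX_STREAM_ID = 2**16 - 1  # stream IDs 1..65535, as in step 13 of the report


@dataclass
class ToyCircuit:
    """Toy client-side view of one circuit (not Tor's data structures)."""
    check_origin: bool                      # True models validating the layer hint
    half_closed: set = field(default_factory=set)

    def client_sends_end(self, stream_id: int) -> None:
        # The OP sent RELAY_END: keep the ID reserved as "half-closed".
        self.half_closed.add(stream_id)

    def receive_end(self, stream_id: int, from_hop: int) -> bool:
        """Handle an incoming RELAY_END; return True if the ID was freed."""
        if stream_id not in self.half_closed:
            return False
        if self.check_origin and from_hop != EXIT_HOP:
            # Cell did not come from the hop the stream was attached to: ignore.
            return False
        self.half_closed.discard(stream_id)
        return True

    def id_is_free(self, stream_id: int) -> bool:
        return stream_id not in self.half_closed


def spoof_all_from_entry(check_origin: bool) -> int:
    """Half-close every ID, then let the entry hop send a fake END for each.

    Returns how many IDs the spoofed cells managed to free.
    """
    circ = ToyCircuit(check_origin=check_origin)
    for sid in range(1, MAX_STREAM_ID + 1):
        circ.client_sends_end(sid)
    return sum(circ.receive_end(sid, from_hop=0)
               for sid in range(1, MAX_STREAM_ID + 1))
```

With `check_origin=False` the spoofed cells free every half-closed ID, which corresponds to the state after step 13; with `check_origin=True` they free none, while a RELAY_END that does arrive from the exit hop is still accepted.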