TROVE-2021-003: layer hint not validated on half-open streams.

[This issue is confidential. It was reported by Jann Horn at Google. Here is the text of his report.]
##### half-closed connection tracking ignores layer_hint #####
(one-sentence summary: entry/middle relays can spoof RELAY_END cells
on half-closed streams, which can lead to stream confusion between OP
and exit)

When an OP receives a RELAY_END cell on some stream from the exit, the
OP knows that it can't receive any more cells for that stream (the
spec says "Upon receiving a RELAY_END cell, the recipient may be sure
that no further cells will arrive on that stream"); and so it can
immediately reuse the StreamID. But if the OP sends RELAY_END to the
exit, it cannot immediately reuse the StreamID without some kind of
acknowledgement from the exit that the RELAY_END cell was received:
Until the exit receives the RELAY_END, it might still send cells with
that StreamID.

It looks like this wasn't really addressed in the original protocol,
and especially until
https://gitweb.torproject.org/tor.git/commit/?h=144647031aa9e7eacc6f7cdd8fed663c7229b2aa
("Ticket #25573: Check half-opened stream ids when choosing a new
one") the OP would just reuse StreamIDs in that situation, which, as
the commit message notes, can lead to data corruption because streams
get mixed up.

Now, when the OP sends a RELAY_END, it stops tracking the stream
normally, but instead tracks it as a "half-closed" stream. It
continues doing so until it receives a RELAY_END from the exit; at
that point, the half-closed stream is freed and its ID is available
for reuse again. What's a bit weird here is that the exit-side code
doesn't have code for acknowledging RELAY_END with a RELAY_END
response; it just silently drops the connection. This means that when
the client closes a stream, the resulting half-closed stream continues
to occupy ID space and OP memory until the entire circuit goes away;
the OP-side code that removes half-closed streams when RELAY_END is
received only runs if both sides of the stream simultaneously send
RELAY_END before receiving each other's cells.
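
The bookkeeping described above can be illustrated with a small toy model.
This is only a sketch, not Tor's code: the class and method names are
hypothetical, and the allocator is assumed to be a simple counter that wraps
around and skips any ID that is still open or half-closed.

```python
class CircuitStreamIds:
    """Toy model of OP-side StreamID bookkeeping for a single circuit."""

    def __init__(self, max_id=0xFFFF):
        # StreamIDs are 16 bits wide; this model never hands out ID 0.
        self.max_id = max_id
        self.next_id = 1
        self.open = set()
        self.half_closed = set()

    def allocate(self):
        """Return the next free ID, or None if every ID is still occupied."""
        for _ in range(self.max_id):
            candidate = self.next_id
            self.next_id = candidate % self.max_id + 1  # wrap max_id back to 1
            if candidate not in self.open and candidate not in self.half_closed:
                self.open.add(candidate)
                return candidate
        return None

    def client_close(self, stream_id):
        """OP sent RELAY_END: the ID stays reserved as half-closed."""
        self.open.discard(stream_id)
        self.half_closed.add(stream_id)

    def end_received(self, stream_id):
        """RELAY_END arrived from the other side: the ID becomes reusable."""
        self.open.discard(stream_id)
        self.half_closed.discard(stream_id)


if __name__ == "__main__":
    ids = CircuitStreamIds(max_id=3)   # tiny ID space to keep the demo short
    for _ in range(3):
        ids.client_close(ids.allocate())
    print(ids.allocate())              # None: all IDs are stuck half-closed
    ids.end_received(2)
    print(ids.allocate())              # 2: freed only by an incoming RELAY_END
```

In this model, an ID closed by the client is never released unless a
RELAY_END arrives for it, which mirrors the ID-space exhaustion described
above.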

The security bug is that the OP-side logic for handling RELAY_END
cells on half-closed streams (the calls from
handle_relay_cell_command() to connection_half_edge_is_valid_end() and
to connection_half_edge_is_valid_resolved()) ignores the layer_hint,
which specifies which relay on the circuit the cell came from. This
means that entry/middle nodes can spoof RELAY_END cells, causing
connection_half_edge_is_valid_end() to prematurely make StreamIDs
available for reuse, effectively restoring the pre-2018 stream
confusion protocol issue.
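
The difference between ignoring and honoring the layer_hint can be sketched
as follows. This is an illustrative model only and not the actual Tor code or
its fix: the names RelayEnd, from_hop, handle_end_unchecked and
handle_end_checked are hypothetical, and from_hop stands in for the
layer_hint by recording the index of the hop a cell was recognized at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayEnd:
    stream_id: int
    from_hop: int  # stand-in for layer_hint: hop index the cell originated at


def handle_end_unchecked(half_closed, cell):
    """Models the reported behavior: the originating hop is never consulted."""
    if cell.stream_id in half_closed:
        del half_closed[cell.stream_id]
        return True
    return False


def handle_end_checked(half_closed, cell):
    """Accept the RELAY_END only if it came from the hop the stream ended at."""
    expected_hop = half_closed.get(cell.stream_id)
    if expected_hop is None or cell.from_hop != expected_hop:
        return False
    del half_closed[cell.stream_id]
    return True


if __name__ == "__main__":
    EXIT_HOP, ENTRY_HOP = 2, 0
    spoofed = RelayEnd(stream_id=7, from_hop=ENTRY_HOP)

    table = {7: EXIT_HOP}  # half-closed stream 7 was attached to the exit
    print(handle_end_unchecked(table, spoofed), table)  # True {}

    table = {7: EXIT_HOP}
    print(handle_end_checked(table, spoofed), table)    # False {7: 2}
```

Here the half-closed table maps each StreamID to the hop it was opened
through, so a cell injected by the entry node cannot release an ID that
belongs to a stream terminating at the exit.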

To actually make a stream confusion happen, an attacker with the goal
of injecting a crafted reply into a connection by the client to some
endpoint would probably need both:

 - a lot of control over when the client opens and closes connections
 - control over the middle (or entry) node

I haven't tested whether this can be triggered from a browser; I've
only tested it with a custom client that goes through the following
steps, against a chutney network with a modified Tor on the entry
node:

[This assumes that no streams have been allocated on the circuit yet!]
1. on the client, pow(2,16)-3 times (with parallelism):
  1.1: client: open a connection A through tor to some server
[allocates a stream ID]
  1.2: client: wait for connection A to be established
  1.3: client: close connection A [places the stream ID in half-closed
state; Tor never cleans this up]
[at this point, only two consecutive stream IDs ID_a and ID_b are
still available; ID_a will be used next]

2. client: open a connection B through tor to some server [allocates ID_a]
3. server: close connection B [frees up ID_a again when received by OP]
4. client: wait for connection B to be closed
[at this point, ID_a and ID_b are again available, but now ID_b will
be used next]

5. evil entry node: from now on, capture inward cells (from exit to
OP) instead of forwarding them

6. client: open connection C through tor to evil injecting server
[allocates ID_b]
7. evil injecting server: accept connection C [RELAY_CONNECTED cell is captured]
8. evil injecting server: reply on connection C with malicious data
[RELAY_DATA cell is captured]
9. client: close connection C [RELAY_END cell is captured] [ID_b is
placed in half-closed state]

10. client: open connection D through tor to some server [allocates ID_a]
11. client: close connection D [places ID_a in half-closed state]
[for the next stream, the OP will first try to use ID_b if it is free]

12. evil entry node: stop capturing cells, discard inward cells from now on
13. evil entry node: for each possible stream ID from 1 to
pow(2,16)-1, send a fake RELAY_END cell
[before, all IDs were marked half-closed; now all IDs are free again]

14. client: open connection E through tor to victim server [allocates ID_b]

15. evil entry node: replay captured cells
16. client: connection E receives RELAY_CONNECTED, RELAY_DATA,
RELAY_END that were intended for connection C
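
The StreamID arithmetic behind the steps above can be checked with a short
simulation. This is a sketch under stated assumptions rather than a
reproduction of the attack: it assumes the OP allocates IDs from a wrapping
counter over 1..65535 and skips occupied IDs, it does not model cells or
cryptography, and the function name simulate is hypothetical.

```python
MAX_ID = 0xFFFF  # StreamIDs are 16 bits; ID 0 is never handed out here


def simulate(max_id=MAX_ID):
    busy = set()            # open or half-closed IDs; both block reuse
    state = {"next": 1}     # next candidate ID, wrapping from max_id to 1

    def allocate():
        for _ in range(max_id):
            sid = state["next"]
            state["next"] = sid % max_id + 1
            if sid not in busy:
                busy.add(sid)
                return sid
        raise RuntimeError("no free stream ID")

    # Step 1: open and close max_id - 2 streams (pow(2,16)-3 by default).
    # Each one stays half-closed, so it remains in `busy`.
    for _ in range(max_id - 2):
        allocate()

    # Steps 2-4: B takes ID_a; the server's genuine RELAY_END frees it again.
    id_b_conn = allocate()
    busy.discard(id_b_conn)

    # Steps 6-9: C takes ID_b; the client closes it, leaving it half-closed.
    id_c_conn = allocate()

    # Steps 10-11: D wraps around and lands on ID_a, then is half-closed too.
    id_d_conn = allocate()

    # Step 13: a spoofed RELAY_END for every ID clears all half-closed state.
    busy.clear()

    # Step 14: E is opened to the victim server.
    id_e_conn = allocate()

    return {"B": id_b_conn, "C": id_c_conn, "D": id_d_conn, "E": id_e_conn}


if __name__ == "__main__":
    print(simulate())
    # {'B': 65534, 'C': 65535, 'D': 65534, 'E': 65535}
```

Under these assumptions, connection E ends up with the same StreamID as
connection C, which is why the replayed cells captured for C are delivered
to E.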

Attached are:
 - 0001-relay-code-for-stream-confusion-attack.patch: a patch on top
of Tor 0.4.5.7 to implement extra control commands that can be used to
perform attack steps on the attacking entry/middle relay
 - stream_confusion_server.c: code for an attacking TCP server that
the client contacts through the Tor network
 - confused_client.c: code for the SOCKS client that opens and closes
streams in the right order to trigger the bug

A successful run looks like this on the client:

$ ./confused_client
assuming that the circuit is pristine (no stream IDs allocated yet)!
consuming most stream IDs...
65280/65533
hopefully there are now 2 consecutive stream IDs remaining
using and freeing one ID...
server closed socket, ID was hopefully reused
please run TRAP_NEXT_CELL control command, then press enter:
opening connection to injecting server...
waiting for packet to go out...
resetting next-ID hint...
hopefully reset next-ID hint?
please run BRUTE_DROP_STREAMS control command three times (with some
time in between), then press enter:
opening victim connection...
waiting for stuff to settle...
please *quickly* run REPLAY_CELLS control command *now* (or after a
few seconds the OP will time out)
victim_sock apparently connected?
got string: 'injected text
'
victim connection closed

On the control connection to the entry node:

$ (echo "authenticate $(hexdump -e '32/1 "%02x""\n"' chutney/net/nodes/003r/control_auth_cookie)"; cat) | nc localhost 8003 -vv
Connection to localhost (127.0.0.1) 8003 port [tcp/*] succeeded!
250 OK
TRAP_NEXT_CELL
200 targeting circuit with next relay cell
BRUTE_DROP_STREAMS
200 done, dropped half-open streams up to 24577, call me again
BRUTE_DROP_STREAMS
200 done, dropped half-open streams up to 49153, call me again
BRUTE_DROP_STREAMS
200 done, dropped all half-open streams
REPLAY_CELLS
200 done, replayed 5


Things I fiddled with in chutney's torrc files for testing:
 - configured fixed nodes for building circuits on the OP to simplify
testing (EntryNodes, MiddleNodes, ExitNodes)
 - bumped the V3AuthVotingInterval on the authorities to 600 to reduce log spam


This bug is subject to a 90-day disclosure deadline. If a fix for this
issue is made available to users before the end of the 90-day deadline,
this bug report will become public 30 days after the fix was made
available. Otherwise, this bug report will become public at the deadline.
The scheduled deadline is 2021-08-12.
Edited May 14, 2021 by Nick Mathewson