Right now, circuit_n_chan_done() checks rsa identity digest, but not ed25519 identity or is_canonical.
This is probably not a high-security issue, so I'm going to use it to experiment with confidential issues. Here's why I think this is not high-security: If we're a client, we already filled in the intended ed25519 identity and address for the channel when we launched it, and rejected the channel if it was wrong.
If we're a relay, then exploiting this issue would at worst require an attacker to jump through a lot of hoops (impersonating RSA identity, sending bogus EXTEND cell at exactly the right time to de-rail other pending circuits), and also either require the attacker to steal onion keys, or limit the attacker's capability to encrypted traffic sniffing.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Attacker asks relay1 for a circuit to relay2, at <IP-Evil, RSA, no-ed-key-given>. Relay1 doesn't have a channel to relay2 yet.
Before that channel opens, a legit user asks relay1 for a circuit to relay2 <IP-real, RSA, Ed>.
The channel completes, to IP-Evil. Without these checks, both circuits can get attached to this channel. The channel has the wrong IP, though, and the wrong Ed key.
Subsequent circuits through relay1 to relay2 at <RSA,Ed> are not affected and won't get misdirected, since this bug is only in circuit_n_chan_done().
What can the attacker do with this attack?
They can't actually impersonate the target relay in order to decrypt traffic, unless they also have stolen or cracked its onion keys.
They can open a channel of their own to the target relay, and pass traffic along. This lets the attacker do MITM for the purposes of traffic analysis from relay1->relay2.
What's the fix?
channel_get_for_extend() and circuit_n_chan_done() need to check the same propoerties, including ed25519 identity and is_canonical.
What's the severity?
Per our security policy, I don't think this fits well into any of the existing categories. It might be "impersonation of a relay", which would be high severity ... except that it doesn't let the attacker impersonate the relay. I think that the best summary is "it makes some kinds of traffic analysis easier" -- which is not on the list, as far as I can see.
Hey Nick. I think your logic in #40080 (comment 2704703) is sound and I agree with you that the security risk is actually very low for an attack that requires such precise coordination (since it only targets pending circuits).
The patch looks reasonable but I think you need !ed25519_public_key_is_zero() there.
Other than that, has this been tested on chutney?
This is my initial review here, but please wait for another one tomorrow since I'd like to give it a bit more thought.
Good stuff. Here are five thoughts, to see if I am understanding things correctly:
(1) This ticket started off looking like a "if they can crack the RSA key, they don't need to crack the Ed key" vulnerability, but I think it isn't one of those. That is, I don't see any attacks that become possible here by being able to break the rsa key but not the ed key. Even if the attacker has factored the rsa identity key, to make use of that, they would need to get their target to use an onion key they control, and that requires publishing a new descriptor with that onion key, and the dir auths will check the ed signature on that descriptor and refuse it. That is, our 'permanent greedy' binding between rsa and ed keys, and the fact that the dir auths are gatekeepers of directory info, are what stopped this from turning into a practical attack.
(2) Now, there is an attack that's possible even if you didn't break the rsa key. The bug is that circuit_n_chan_done() assumes that any completed channel to the right identity key is suitable for pending create cells. That is, it doesn't check whether the channel is to a canonical address for the destination relay (or if it isn't canonical, whether the channel is to the specific address:port that the extend cell requested). That's a real bug, and I bet it dates back to when we implemented the is_canonical stuff.
(3) I agree that the impact of that bug is limited here, first because it can only be applied on channels that aren't yet established (and probably two large relays will spend most of their time with a channel already established between them), and second because you need to win the race to get your extend request in at just the right time compared to the target's extend request.
(4) One way to compare the severity of the attack is whether it's any worse than having one relay be malicious. I pondered for a while if there are worse variants of the attack, e.g. where I spend my whole day spraying extend cells at many relays and some fraction of my attempts get lucky. But the fact that any legit extend cell will "innoculate" the relay pair really limits the damage from this broader-scale attack.
(5) Now, what exactly is the window of opportunity? It depends how long an intercepting attacker can hold the connection in the not-yet-opened state before the initiating relay decides it has failed. I wonder if there are edge cases in our state machine where we happily wait as long as it takes, with the conn in a not-yet-open state? On first glance, this looks fine: run_connection_housekeeping() checks
int past_keepalive = now >= conn->timestamp_last_write_allowed + options->KeepalivePeriod;[...] } else if (!connection_state_is_open(conn)) { if (past_keepalive) { /* We never managed to actually get this connection open and happy. */ log_info(LD_OR,"Expiring non-open OR connection to fd %d (%s:%d).", (int)conn->s,conn->address, conn->port); connection_or_close_normally(TO_OR_CONN(conn), 0); }
which looks like "at most five minutes of lack of progress until it gets closed". Maaaybe there are mechanisms where you can make it think it's continuing to make progress, yet it never completes? But even then, you'd need to not just be holding the not-yet-finished connection in limbo, but you'd need to race to make it complete before the target's other conn completes.
So: am I understanding things correctly so far? If so I think in terms of security severity, this falls into the category of "bug whose impact was mostly mitigated by other protections."
Hm -- unless "timing side-channel attacks" meant timing attacks to recover key material? In that case there's an argument that the thing we're talking about here is less bad than that, and could maybe count as low-severity. I'm fine with calling it either low or medium.
So here's why it matters whether we pick "low" or "medium" severity: if this is a medium-severity issue, we should embargo it until we're ready to put out stable releases. If this is a low-severity issue, we should just treat it as a regular bug.
(If it were high-severity, we'd need to put out new stable releases right now.)
I've walked the attack in the code here in order to fully understand. The window is super tight and correct me if I'm wrong:
When an EXTEND arrives from a legit client (in circuit_extend()), the function channel_get_for_extend() is the one responsible for finding the right channel.
It will match the RSA key but won't match the relay2 address because the function channel_matches_target_addr_for_extend() will attempt to match IP-Real vs IP-Evil resulting in a no match. And thus relay1 will end up creating a new channel to <IP-Real, RSA, ED>.
Then, our code path brings us in circuit_open_connection_for_extend() which is the function opening a new channel to the IP-Real relay2 but also puts the circuit in the circuits_pending_chans list due to the call to circuit_set_state(circ, CIRCUIT_STATE_CHAN_WAIT);.
And now, just after, lets assume the channel to IP-Evil opens, and because that legit client circuit is in the "pending chan" list waiting for relay2 IP-Real to open, it will be selected by circuit_n_chan_done() which only match the RSA identity ... bad.
The race is quite tight to be honest... An attacker would need to constantly do that at high rate all over the network in order to catch this race and possibly re-route traffic of some clients through their malicious relays.
However, the simple fact that "it is possible" that one single legit circuit can be re-routed to a malicious relay instead of the legit one picked by the client, I would at least mention it as medium since just the consequence of that is very bad even though it is hard (?) to pull off.
At the most recent network team thursday meeting, I got goahead to apply this publicly to 0.4.5.1-alpha, and then put it in stable releases a week or two later.