In channel_tls_process_netinfo_cell() we have a section that says
if (!(chan->conn->handshake_state->authenticated)) { tor_assert(tor_digest_is_zero( (const char*)(chan->conn->handshake_state-> authenticated_rsa_peer_id))); tor_assert(tor_mem_is_zero( (const char*)(chan->conn->handshake_state-> authenticated_ed25519_peer_id.pubkey), 32)); /* If the client never authenticated, it's a tor client or bridge * relay, and we must not use it for EXTEND requests (nor could we, as * there are no authenticated peer IDs) */ channel_mark_client(TLS_CHAN_TO_BASE(chan));
That's great and awesome -- it's a clear solid definition of client: a client is somebody who did the client handshake with us, not the relay handshake.
But then in command.c we have this weird clause:
if (connection_or_digest_is_known_relay(chan->identity_digest)) { rep_hist_note_circuit_handshake_requested(create_cell->handshake_type); // Needed for chutney: Sometimes relays aren't in the consensus yet, and // get marked as clients. This resets their channels once they appear. // Probably useful for normal operation wrt relay flapping, too. channel_clear_client(chan); } else { channel_mark_client(chan); }
This definition of client is much murkier: a client is anybody who, at this moment of receiving a create cell, had earlier presented an identity digest that isn't in our current consensus and that we don't have a cached descriptor for.
I think we should standardize on one definition, and I think we should pick the first one: it's clean and not full of false positives.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
I discovered this bug while working on the current circuit / conn overload patches, considering blacklisting client IP addresses that misbehave in certain ways. I found that my fast relay has a lot of conns that are marked as channel_is_client() that are very clearly connections to/from relays.
I assume the conns started off not marked as clients, and then the clause in command.c declared them as clients.
Right now, in channel_tls_process_netinfo_cell(), for the OR_CONN_STATE_OR_HANDSHAKING_V3 case, we call channel_mark_client() properly.
I propose that we call everything a client that handshakes with tor_tls_used_v1_handshake() or OR_CONN_STATE_OR_HANDSHAKING_V2, since relays should never use those legacy handshakes with other relays at this point.
And then we can happily remove the dangerous clause from command.c.
Comment written in parallel to teor's above comment:
I pushed a bug24898 branch that I think fixes everything.
It's on master though, and we might want to backport this. I'm not sure how far back to backport it -- we could go back to maint-0.3.1 pretty easily (there's a conflict but it's just because of a comment), but going back to 0.2.9 would be quite tricky because #21406 (moved) went into 0.3.1.
The question of whether to backport more depends on how much we want these fixes in 0.2.9 -- this one is a cornerstone of some of the DoS mitigations that might come next.
I am happy with the latest branch, let's merge it to master and consider back to 0.3.1.
I added #22805 (moved) as a child, because those changes are related as well.
But 0.2.9 and 0.3.0 have been fine without any of this other code.
If we want to go back earlier than 0.3.1, let's only backport the parts of channel_tls_process_netinfo_cell() in #21406 (moved) that we need to get this working, and leave out everything else.
(This code sets chan->is_client = 0; directly in 0.3.0, and it does the rep_hist thing in another place, so we'll need another branch if we want to backport there. The good news is that the create cell bug was introduced in 0.3.1 with the netflow padding code, so we only need to backport the v1/2 fix to 0.3.0 and earlier.)
Commit f5ff9f23 in my bug24898-more branch has a bunch of fixes to replace connection_or_digest_is_known_relay() with channel_is_client().
Feel free to merge the commit straight-up, or feel free to strip it down for parts and take pieces of it for your own commits for all these related tickets.
Commit f5ff9f23 in my bug24898-more branch has a bunch of fixes to replace connection_or_digest_is_known_relay() with channel_is_client().
Feel free to merge the commit straight-up, or feel free to strip it down for parts and take pieces of it for your own commits for all these related tickets.
Let's deal with that in #23423 (moved), because all the details are there. It's not something we're going to backport.
If we want to backport arma's bug24898-031 to 0.2.9 and 0.3.0, we need to backport:
the channel_mark_client() in connection_or_set_state_open() from commit af8cadf3a9 in branch bug24898-031 in this ticket
the channel_mark_client() in channel_tls_process_netinfo_cell() from commit 46fe353f25 in master in #21406 (moved)
I am still happy with merging arma's bug24898-031 to 0.3.1 and arma's bug24898-2 to master.
Why (did we introduce the create-cell check in the first place? ISTR that we thought we had a reason to do so. It looks like Mike introduced it in b0e92634d85a3bf7612a6ce0339b96e4aad1e0bb, and moved it in 02a5835c27780e45f705fc1c044b9c471b929dbe. Adding mike to cc.
We could skip the 0.3.0 backport, I think, since 0.3.0 is EOL at the end of this month. But we should do an 0.2.9 backport if we think this is important: 0.2.9 is supported till 2020.
Why (did we introduce the create-cell check in the first place? ISTR that we thought we had a reason to do so. It looks like Mike introduced it in b0e92634d85a3bf7612a6ce0339b96e4aad1e0bb, and moved it in 02a5835c27780e45f705fc1c044b9c471b929dbe. Adding mike to cc.
Mike's comments say that it led to chutney failures.
I haven't observed any failures, but I have observed warnings in 0.3.2.9 when we apply this patch to master in a mixed network.
That really is a bug in 0.3.1 and 0.3.2, because Mike's 0.3.1 fix is identifying clients incorrectly.
There's nothing we can do about these warnings, except backport the fix.
We could skip the 0.3.0 backport, I think, since 0.3.0 is EOL at the end of this month. But we should do an 0.2.9 backport if we think this is important: 0.2.9 is supported till 2020.
It does appear to me that if we use the handshake patch from #21406 (moved) that we're going to be much better about identifying well-behaved clients correctly, and this should be better for the netflow code too (and not break it). I am a little worried about misbehaving clients - for those it feels smarter to check the digest in cases where that matters for DoS. But that should be a separate bug.
I have made a bug24898-029 branch, designed for inclusion in 029 and 030. It will conflict with 031 because stuff is already fixed there, so you'll want to do the right version of 'merge -ours' after 030.
We will want this branch merged when we put the #24902 (moved) DoS patch into 029.
Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?
We still do it. That is, anybody who sends you a create-fast cell when you're running 029 or 030 now gets counted as a client, meaning the DoS defenses will be triggered against them (once #24902 (moved) gets backported), and meaning when we consider whether a channel can satisfy an extend request, this one can't.
Why, are we going to regret that? :) See also the discussions in #22805 (moved).
Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?
We still do it. That is, anybody who sends you a create-fast cell when you're running 029 or 030 now gets counted as a client, meaning the DoS defenses will be triggered against them (once #24902 (moved) gets backported), and meaning when we consider whether a channel can satisfy an extend request, this one can't.
Why, are we going to regret that? :) See also the discussions in #22805 (moved).
I think we are going to regret that, and we should backport the parts of #22805 (moved) that remove the CREATE_FAST mark client.
We should also take them out on general principles, because it reduces the number of connections between relays.
Ok, I added a second commit to my bug24898-029 branch that backports that part from #22805 (moved). I think we are good to go now.
To be clear, that means commit 4bd208b wants to make it into 0.2.9 and 0.3.0, and its changes are already present in 0.3.1. Whereas commit 2076db4 wants to make it into 0.2.9 and 0.3.0 and 0.3.1, and its changes are already present in 0.3.2.
Sorry in advance for the mess Nick. I would just merge it all myself but I think I would screw up the "ours" aspects of the conflict handling. :)
Ok, I added a second commit to my bug24898-029 branch that backports that part from #22805 (moved). I think we are good to go now.
To be clear, that means commit 4bd208b wants to make it into 0.2.9 and 0.3.0, and its changes are already present in 0.3.1. Whereas commit 2076db4 wants to make it into 0.2.9 and 0.3.0 and 0.3.1, and its changes are already present in 0.3.2.
Sorry in advance for the mess Nick. I would just merge it all myself but I think I would screw up the "ours" aspects of the conflict handling. :)
Yes, we must backport 4bd208b and 2076db4 to 0.2.9, and backport 2076db4 to 0.3.1, so that the DoS backport identifies clients correctly.