However, we do not need to negotiate all of those, as some only affect an endpoint's behavior in ways that don't matter to the other endpoint. I will need to review everything and think for a bit to produce a list of parameters that do matter if they differ between endpoints.
(This is also a function of how much cheating detection we want to implement via algorithm checks vs just relying on the oomkiller).
Okay, here are a few options. I'm setting some design principles to begin
with that I hope will help us keep things simple:
These parameters are constant for a given circuit hop. Once the hop
is negotiated, they can't change.
It's probably best to stick with things that won't mess our state machine
too badly. State machines are easy to mess up in C-tor, and I don't want to
complexify them even further.
This won't be the last thing we ever want to negotiate.
The client has a good idea whether the relay supports these options, but
the relay could always downgrade. The relay, on the other hand, has no idea
whether the client supports these options.
We don't want to add any more round trips to circuit setup than we already
have.
Question 1: Where do we put the negotiation?
Option 1: Part of the CREATE/CREATED handshake.
Right now we have a variant of our ntor handshake that we use in onion
service rendezvous. One of the features of this variant is that the client
can encode additional encrypted data in its CREATE cell.
The client's additional encrypted data does not receive forward secrecy.
We would need to define a variant of this that allows the responder
to send additional encrypted data in its CREATED cell. The responder's
additional encrypted data could get PFS.
For details see rend-spec-v3.txt section 3.3.2, "NTOR-WITH-EXTRA-DATA".
We might not want to adopt all aspects of this handshake: We'd want different
fixed-string parameters, we'd not want to change our AES key length, and we
might choose to leave the hash/mac choice alone.
If we negotiate in this way, then the client's extra data should be a
key-value list of the parameters it wants to set. The relay's response would
be the actual values it chooses for those parameters.
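To make the key-value idea concrete, here is a rough sketch of what the two extra-data payloads could carry. The text encoding (`key=value` lines) and the function names are hypothetical and chosen only for illustration; nothing here is a wire format from any Tor spec, and the relay's "choose" policy is a placeholder.

```python
from typing import Dict

# Hypothetical encoding: one "key=value" line per parameter, sorted by key.
# This is NOT a Tor wire format; it only illustrates the request/response shape.

def encode_params(params: Dict[str, int]) -> bytes:
    """Serialize parameters deterministically as newline-separated key=value."""
    return "\n".join(f"{k}={v}" for k, v in sorted(params.items())).encode("ascii")

def decode_params(data: bytes) -> Dict[str, int]:
    """Parse the encoding above; raise ValueError on malformed or duplicate keys."""
    out: Dict[str, int] = {}
    text = data.decode("ascii")
    if not text:
        return out
    for line in text.split("\n"):
        key, sep, value = line.partition("=")
        if not sep or not key or key in out:
            raise ValueError(f"malformed or duplicate entry: {line!r}")
        out[key] = int(value)  # int() raises ValueError on non-numeric input
    return out

def relay_choose(requested: Dict[str, int], relay_values: Dict[str, int]) -> Dict[str, int]:
    """Relay side: answer with the value it will actually use for each
    parameter it recognizes, and silently omit parameters it does not know.
    Placeholder policy: the relay always answers with its own value."""
    return {k: relay_values[k] for k in requested if k in relay_values}

# Client's extra data in CREATE, relay's extra data in CREATED.
request = encode_params({"sendme_inc": 100})
response = encode_params(relay_choose(decode_params(request), {"sendme_inc": 100}))
```

Omitting unknown keys from the response is one way for a client to notice that the relay did not understand a parameter, without adding a separate error path.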
Good points:
Every circuit would have negotiated parameters at creation time; nothing
further would need to be added to the state machine.
This is fairly simple to build and test. I would estimate about 6-12 hours
for me.
If we design this well it could share some code with our hs-ntor
implementation. Eventually we could converge on one handshake implementation
for both purposes.
Bad points:
The preceding relay would be able to tell that this negotiation was
happening because of the change in the EXTEND cell.
This will move us towards a future where we need fragmented cells. (But we
think we're going there anyway in order to support PQ and other stuff.)
No PFS on the client's negotiation message.
Option 2: Separate cell immediately on startup
We could define a CIRCUIT-NEGOTIATE message type that is sent immediately after
receiving the CREATED/EXTENDED cell for a circuit hop. We'd have to say that
it has to be the first message on the circuit, or it isn't valid.
(In theory we could allow this message at any point before the first BEGIN
cell on the circuit, but that seems more fragile.)
In response to CIRCUIT-NEGOTIATE, a relay that recognizes the type would have
to send a CIRCUIT-NEGOTIATE cell to complete the negotiation.
Good points:
Doesn't trivially leak the use of the protocol to the previous relay.
(But does leak it through timing/volume).
Doesn't require CREATE handshake changes.
Gets PFS on the client's negotiation mechanism.
Bad points:
Changes the state machine for circuit setup. We'd need a separate bit for
"have we received any non-negotiate cells" and we'd need to make sure to
check it and set it in a bunch of cases. On the client side we'd need a
separate bit for "after we get an EXTENDED/CREATED cell do we send
negotiation?" and for "did we send negotiation to this hop?" There are a
lot more corner cases to think about.
Adds an extra round-trip for the last hop at least, where we can't batch this
message with an EXTEND (but maybe we can batch it with a BEGIN, if we have
one?)
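For a sense of the extra state that Option 2 implies, here is a minimal sketch of the per-hop bits described above. The class names, method names, and string return values are hypothetical; they are not taken from C-tor.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RelayHopState:
    # "have we received any non-negotiate cells" on this circuit?
    seen_non_negotiate: bool = False
    # Has a CIRCUIT-NEGOTIATE already been handled?
    negotiated: bool = False

    def on_cell(self, cell_type: str) -> str:
        """Return what the relay does with an incoming message."""
        if cell_type == "CIRCUIT-NEGOTIATE":
            # Only valid as the very first message on the circuit.
            if self.seen_non_negotiate or self.negotiated:
                return "reject"
            self.negotiated = True
            return "reply-negotiate"
        self.seen_non_negotiate = True
        return "process"

@dataclass
class ClientHopState:
    # "after we get an EXTENDED/CREATED cell do we send negotiation?"
    want_negotiation: bool
    # "did we send negotiation to this hop?"
    sent_negotiation: bool = False

    def on_hop_created(self) -> List[str]:
        """Called when CREATED/EXTENDED arrives; returns messages to send."""
        if self.want_negotiation and not self.sent_negotiation:
            self.sent_negotiation = True
            return ["CIRCUIT-NEGOTIATE"]
        return []
```

Even in this toy form, every cell handler has to touch the flags, which is the corner-case cost noted in the bad points.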
Question 2: How do we actually negotiate?
Here are options for how to "negotiate" a parameter that is set in
the consensus. We want to allow the consensus's value for this parameter to
change, but we don't want people picking "just any value".
We'll assume that the parameter is called "sendme_inc", but these options
apply generally.
Design: Range and preference
With this design the consensus should list three related parameters:
sendme_inc, sendme_inc_min, and sendme_inc_max.
When the relay operators want to experiment with different values for
sendme_inc, they set sendme_inc_min and sendme_inc_max to the range in
which they want to experiment, making sure that the existing sendme_inc is
within that range.
Once everybody should have a consensus with the new sendme_inc_min and
sendme_inc_max, the authorities can change sendme_inc to be anywhere
within that range.
To do negotiation, the client sends its view of sendme_inc. The relay must
accept this value if it lies between min and max, and reject it
otherwise.
If the authorities decide that they want to stick with a given value
long-term, they set all three parameters to the same value.
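A minimal sketch of the relay-side check under this design, assuming each endpoint reads the three values from its own copy of the consensus. The `RangedParam` type and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RangedParam:
    """One endpoint's consensus view of sendme_inc, sendme_inc_min, sendme_inc_max."""
    value: int
    lo: int
    hi: int

    def __post_init__(self) -> None:
        # The preferred value is expected to sit inside the range.
        if not (self.lo <= self.value <= self.hi):
            raise ValueError("value must lie within [lo, hi]")

def client_proposal(view: RangedParam) -> int:
    """The client sends its own view of the preferred value."""
    return view.value

def relay_accepts(view: RangedParam, proposed: int) -> bool:
    """The relay accepts anything inside its own min/max and rejects the rest."""
    return view.lo <= proposed <= view.hi

# The client already has the newer consensus; the relay still has the older one.
client_view = RangedParam(value=110, lo=80, hi=120)
relay_view = RangedParam(value=100, lo=80, hi=120)
accepted = relay_accepts(relay_view, client_proposal(client_view))
```

Because the relay only compares against min and max, the two endpoints can disagree about the preferred value during a transition without any request being rejected.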
Good points:
Simple to explain
Allows minimal variation when not in an experimentation period.
Once min and max are defined, changing the value can happen in a single
consensus vote.
Bad points:
One party gets to pick any value between min and max. (We could consider
using a list of allowable values instead, but that's harder to vote on.)
The authorities need to wait for min and max to propagate before they
can change the value.
Question 3: What about onion services?
For parameters like these, I suggest that we can have the onion service
simply advertise whether or not it supports the required handshake/protover
in the encrypted part of its descriptor. This would potentially leak when it
upgraded, but no more.
If we want, we could condition such advertisement on a consensus parameter.
@mikeperry I've left a few thoughts above; let me know what you think. I'm inclined to go with "option 1" to question 1. If you think these approaches are sensible I'll turn it into a proposal or a patch to prop328.
I also agree that the Option 1 CREATE mechanism is simplest and best in this case. PFS is not necessary here; we do not expect these options to be security sensitive, nor should these parameters be changing frequently, and certainly not long-term once we find what look to be good values. Thus they are not a huge anonymity risk, nor do they create an adversarial incentive for people to compromise onion keys to break the encryption.
However, for Proposal 329 (conflux), CREATE is not optimal. PFS is much more important there, as obtaining the conflux circuit join nonce allows an adversary to inject or steal data on a circuit. Hence, there will be an adversarial incentive to compromise onion keys in that case. This may be fine; in prop329 we have other requirements too, such as the need to measure RTT in each direction from the start. And trying to combine all that with this is a lot of state machine complexity, especially if we also try to optimize for Prop325 stacking in all cases to reduce RTT. For this reason, we can treat the conflux handshake as orthogonal, and that is likely the best approach.
So I also believe all of this suggests that Option 1 is simpler and better here, but I wanted to cover this other aspect wrt Prop329, so we consider it. I could also be persuaded that we do want to combine them, but that can also be deferred to when we start working on conflux.
For question 2 (range values):
I like the range approach. The range can even be hardcoded in the implementation, and then the authorities can just make sure that a consensus change never moves the value by more than this range at a time. It will also be OK if we have to change consensus params over a few consensuses in the event that we need a change larger than the range -- not the end of the world. These experiments will likely span a week or so each. Hardcoding this change range also simplifies voting. Of course, we could also hardcode default ranges and allow consensus override, just in case.
For question 3 (onion services):
I think that onion services should mirror node protover advertisements in their descriptors, but otherwise obey consensus parameter values just like relays. This allows us to treat protocol support for them in a very similar way as relays, and use the same kinds of negotiation as relays with respect to protocol consensus parameters.
I looked over the set of consensus parameters and the only one we definitely need to negotiate is 'circwindow_inc'. That is the SENDME cell count interval (100). I don't see it changing more than 50% at a time from the consensus parameter value, if we want to do ranges on safe values to negotiate.
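As a rough illustration of combining a hardcoded range with the 50% figure, the acceptable window could be derived from the consensus value itself. This is only one possible reading; the constant and function names are hypothetical.

```python
from typing import Tuple

# Hypothetical hardcoded limit: how far (in percent) a negotiated value may
# sit from the endpoint's consensus value. 50 mirrors the figure above.
MAX_CHANGE_PERCENT = 50

def allowed_range(consensus_value: int, percent: int = MAX_CHANGE_PERCENT) -> Tuple[int, int]:
    """Derive (min, max) from the consensus value using integer arithmetic."""
    delta = consensus_value * percent // 100
    return consensus_value - delta, consensus_value + delta

def within_range(consensus_value: int, proposed: int) -> bool:
    """True if the proposed value is inside the derived window."""
    lo, hi = allowed_range(consensus_value)
    return lo <= proposed <= hi
```

With a consensus value of 100, this accepts proposals from 50 through 150, so no extra min/max parameters need to be voted on.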
For the rest, functionally, it does not matter if the endpoints use completely different algorithms. It won't break anything.
We would need to negotiate more if we wanted to do cheating enforcement via algorithm checks on the receiving endpoint, but I believe we should rely on the OOMkiller and EWMA for that instead. (Cheating will necessarily imply too much queuing).
We of course still also need to check if FlowCtrl=2 is in the protover list of the exit, or present in the onion service descriptor, before marking a circuit as eligible for this negotiation.
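The eligibility check could look roughly like the sketch below. It assumes the usual protover string layout (space-separated `Name=versions` entries, with comma-separated numbers or `low-high` ranges); the function name is hypothetical and malformed entries are simply skipped rather than treated as errors.

```python
def supports(protover: str, name: str, version: int) -> bool:
    """Return True if `protover` lists `version` for subprotocol `name`."""
    for entry in protover.split():
        key, sep, ranges = entry.partition("=")
        if not sep or key != name:
            continue
        for part in ranges.split(","):
            low_s, dash, high_s = part.partition("-")
            # Skip anything that is not a plain number or a low-high range.
            if not low_s.isdigit() or (dash and not high_s.isdigit()):
                continue
            low = int(low_s)
            high = int(high_s) if dash else low
            if low <= version <= high:
                return True
    return False

# Only circuits whose far endpoint advertises FlowCtrl=2 are eligible.
eligible = supports("Cons=1-2 FlowCtrl=1-2 Link=1-5", "FlowCtrl", 2)
```

For onion services, the same function could be applied to whatever protover-style string the descriptor carries.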
Oh, the onionskin negotiation must send a failure notification flag to the other endpoint in the event that we disable congestion control from the consensus by setting 'cc_alg=0'. In this case, we would use the existing flow control (which will break if only one endpoint uses it).
Update: I've written out a spec and reference implementation for "ntor3", a version of ntor that includes encrypted and authenticated data as part of its handshake. Next steps are to do an editing pass to make sure that the two versions actually match, and to add the prop324-specific parts in an addendum.
Quick feedback that I mentioned on IRC: This looks good for the circwindow_inc negotiation, but it needs a way for us to also negotiate if congestion control is disabled by the consensus or not. If one endpoint sees a consensus where cc_alg=0, then the end result should be that it is disabled on that circuit.
For the client side, this is easy. If the client sees 'cc_alg=0', it just does not use this negotiation, and the circuit will use the old sendme system.
For the exit side, if it sees 'cc_alg=0', then it needs to send a nack field back to instruct the client not to use congestion control or the 'circwindow_inc' field value, and use the old sendme system instead. I suppose we can just use a fixed message string for this, as the message back from the exit? So maybe this does not need anything extra in the actual spec here?
Other than that, the two endpoints do not need to agree on the specific congestion control algorithm in use, so we should not need a priority list or agreement or anything beyond this. So maybe it is all good as-is?
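The client-side and exit-side rules for 'cc_alg=0' described above can be sketched as follows. The dictionary fields (`cc_enabled`, and the exact shape of the nack) are hypothetical stand-ins for whatever fixed message the spec ends up defining.

```python
from typing import Dict, Optional

# Hypothetical fixed reply meaning "do not use congestion control here".
NACK: Dict[str, int] = {"cc_enabled": 0}

def client_request(cc_alg: int, circwindow_inc: int) -> Optional[Dict[str, int]]:
    """Client: if its consensus says cc_alg=0, skip the negotiation entirely."""
    if cc_alg == 0:
        return None
    return {"circwindow_inc": circwindow_inc}

def exit_reply(cc_alg: int, request: Dict[str, int]) -> Dict[str, int]:
    """Exit: if its consensus says cc_alg=0, answer with the fixed nack."""
    if cc_alg == 0:
        return dict(NACK)
    return {"cc_enabled": 1, "circwindow_inc": request["circwindow_inc"]}

def circuit_uses_cc(request: Optional[Dict[str, int]],
                    reply: Optional[Dict[str, int]]) -> bool:
    """Congestion control is used only if the client asked and the exit agreed;
    otherwise both sides fall back to the old sendme system."""
    return request is not None and reply is not None and reply.get("cc_enabled") == 1
```

The outcome is that congestion control is off on the circuit whenever either endpoint sees cc_alg=0.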
Hm. Maybe we should use an approach similar to the one we use for the windows: have a range of allowed cc_alg values, and have a preferred one. The client selects the preferred one, and the relay rejects only if it is not on the list. That way, we can decisively turn the algorithm on or off, but we can do so in a way where a small desynchronization doesn't cause anybody to reject a request from a client. What do you think?
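A small sketch of that idea, assuming each endpoint's consensus carries a preferred cc_alg plus a set of allowed values (both names are hypothetical, as is the use of a set rather than a numeric range):

```python
from typing import FrozenSet

def client_pick_alg(preferred: int) -> int:
    """The client always requests the preferred value from its own consensus."""
    return preferred

def relay_accepts_alg(allowed: FrozenSet[int], requested: int) -> bool:
    """The relay rejects only if the requested value is not on its allowed list."""
    return requested in allowed

# Transition period: the client's consensus already prefers 2, while the
# relay's older consensus may still prefer 1 but allows both values.
transition_ok = relay_accepts_alg(frozenset({1, 2}), client_pick_alg(2))

# Decisive switch-off: once only 0 is allowed, anything else is rejected.
switched_off = not relay_accepts_alg(frozenset({0}), client_pick_alg(2))
```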