A couple minor spelling/grammar corrections.
Hi @nickm - I was reading over your new prop355, really interesting and impressive effort! I enjoyed reading about the handshake design choices for a PQ tor and I like the transitional vs. "next-generation environment" distinction. The performance tables and the short round trip examples are really helpful to see also!
While reading, I found a couple of minor typos that are addressed in this MR.
NB: I am not certain about one of these changes, specifically this one:
- * R2->Client: "Extra data reply", "EXTENDED2 (from R23."
+ * R2->Client: "Extra data reply", "EXTENDED2 (from R3)."
I also made a few notes while reading. They aren't necessarily actionable, but I'm including them here because I wasn't sure where else to put them. Feel free to ignore anything I've got fundamentally wrong if correcting it would take substantial explanation; that is likely, given my limited depth in both PQ and lower-level Tor fundamentals.
-
You mention the NIST standardized algorithms by calling them "ML-KEM, ML-DSA, SLH-DSA," but I kept mentally translating them to what I think are the 'standard' names from NIST (although I'm having trouble confirming this). From what I can tell, NIST's official standardization documents and most cryptographic libraries refer to CRYSTALS-Kyber (the 'module-lattice' KEM), CRYSTALS-Dilithium (the 'module-lattice' digital signature), SPHINCS+ (the stateless hash-based signature), and possibly FALCON (another lattice-based signature). So I was mentally substituting 'Kyber' for 'ML-KEM' and 'Dilithium' for 'ML-DSA', and I was presuming that SLH-DSA was SPHINCS. I fully understand why you would want to use the short acronyms. Maybe a footnote such as 'These are references to NIST’s CRYSTALS-Kyber and CRYSTALS-Dilithium, plus SPHINCS-like.' would help future readers who do not recall the older(?) naming.
-
For large KEM parameters would we need to split or fragment handshakes due to payload size? It seems the largest parameter sets in the table (e.g., ML-KEM-1024) are already in the ballpark of 1–1.5 KB for the pubkey or ciphertext.
-
I was wondering how older Tor relays/clients would handle these new handshake message types. Maybe it's out of scope for an exploration like this, but some kind of version negotiation approach, including a fallback, might be necessary for deployment transitions.
-
If the extra data in ntor v3 is not forward-secure, would that mean that in transitional handshakes, an attacker with a future CRQC can potentially decrypt those fields? Maybe that is acceptable for transitions, unless other data that should have forward secrecy is included in extension data.
-
Really interesting to see the numbers in your tables! I was thinking about large relays that build thousands/tens of thousands of circuits/second and if adding 40–60 microseconds per handshake might be significant. Some of the approaches (PQ-KEM-DSA) are comparatively quite large which seems out of scope for typical relays.
-
You mention that next‐generation "will require relays to have PQ identity keys", and that is out of scope (which makes sense) - I was just wondering if this requirement is because a CRQC could forge the relay identity as the circuit handshake alone can't provide that property, and that would be required to resolve MiTMs? That wasn't obvious to me, but probably would be to other readers who are more knowledgeable in this space.
-
In implementation, should we be considering "KEM agility" so there is a generalized KEM-based approach that allows for future PQ KEM standardization?
Activity
requested review from @nickm
assigned to @micah
You mention the NIST standardized algorithms by calling them "ML-KEM, ML-DSA, SLH-DSA," but I kept mentally translating them to what I think are the 'standard' names from NIST (although I'm having trouble confirming this).
AFAICT the standard names are ML-KEM etc. Have a look at the standardization documents linked at https://csrc.nist.gov/projects/post-quantum-cryptography. Now, these standards are based on CRYSTALS-{Kyber,Dilithium} and SPHINCS+, but they are not the same.
In fact, each of the standards has a section (e.g. appendix C.2 in FIPS 203) explaining how they are different from the submitted algorithm.
Other standards that are using these algorithms (like https://datatracker.ietf.org/doc/draft-kwiatkowski-tls-ecdhe-mlkem/) are using the versions of the algorithms from NIST, and the names from NIST. This also goes with our own previous usage, like how we refer to "SHA-3" and "AES" rather than "Keccak" and "Rijndael".
Action - add a note explaining all of this in the proposal.
For large KEM parameters would we need to split or fragment handshakes due to payload size?
Yes; we have an open proposal for this that needs to be implemented.
Action - note prop340 in the proposal.
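For a sense of scale, here is a rough sketch of the size arithmetic in Python. The key and ciphertext sizes are the ones listed in FIPS 203, and 509 bytes is the payload of a fixed-length Tor cell. The `fragments_needed` helper is hypothetical and ignores any per-fragment framing, so it gives only a lower bound on the number of cells and says nothing about the actual prop340 encoding.

```python
import math

# Encapsulation key ("public key") and ciphertext sizes in bytes, per FIPS 203.
ML_KEM_SIZES = {
    "ML-KEM-512": {"public_key": 800, "ciphertext": 768},
    "ML-KEM-768": {"public_key": 1184, "ciphertext": 1088},
    "ML-KEM-1024": {"public_key": 1568, "ciphertext": 1568},
}

CELL_PAYLOAD_LEN = 509  # payload bytes in a fixed-length Tor cell


def fragments_needed(message_len, payload_len=CELL_PAYLOAD_LEN):
    """Lower bound on cells needed (hypothetical helper; ignores framing overhead)."""
    return math.ceil(message_len / payload_len)


for name, sizes in ML_KEM_SIZES.items():
    print(
        name,
        "public key cells:", fragments_needed(sizes["public_key"]),
        "ciphertext cells:", fragments_needed(sizes["ciphertext"]),
    )
```

Even the smallest parameter set does not fit in a single cell under these assumptions, which is why some fragmentation mechanism is needed.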
I was wondering how older Tor relays/clients would handle these new handshake message types
We've had circuit handshake negotiation since we introduced CREATE2 cells and EXTEND2 messages for ntor: when telling a relay to extend a circuit, the client chooses a "handshake type" to use.
To advertise which protocols are supported, relays use "subprotocol versioning" in their descriptors and microdescriptors. (Also see proposal 346 for refinements to the current system.)
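As a minimal sketch of how a client might use this, assuming a descriptor "proto" line of the form `Relay=1-4 Link=1-5`: the code below parses the line and picks the most preferred handshake type the relay advertises. The handshake type numbers for ntor (0x0002) and ntor v3 (0x0003), and their pairing with Relay=2 and Relay=4, are my reading of the current spec. The PQ entry is entirely hypothetical: both its handshake type and its Relay version are placeholders, and `choose_handshake` is an illustrative name, not a function from C tor or Arti.

```python
def parse_protocols(proto_line):
    """Parse 'Relay=1-4 Link=1-3,5' into {'Relay': {1, 2, 3, 4}, 'Link': {1, 2, 3, 5}}."""
    supported = {}
    for entry in proto_line.split():
        name, _, versions = entry.partition("=")
        found = set()
        for part in versions.split(","):
            if not part:
                continue
            low, _, high = part.partition("-")
            found.update(range(int(low), int(high or low) + 1))
        supported[name] = found
    return supported


# (handshake type, required Relay subprotocol version), most preferred first.
HANDSHAKE_PREFERENCE = [
    (0x0099, 99),  # HYPOTHETICAL PQ handshake: both numbers are placeholders
    (0x0003, 4),   # ntor v3
    (0x0002, 2),   # ntor
]


def choose_handshake(proto_line):
    """Return the preferred handshake type the relay advertises, or None."""
    relay_versions = parse_protocols(proto_line).get("Relay", set())
    for handshake_type, required_version in HANDSHAKE_PREFERENCE:
        if required_version in relay_versions:
            return handshake_type
    return None


print(hex(choose_handshake("Link=1-5 Relay=1-4")))
```

A relay that never advertises the new subprotocol version is simply never asked for the new handshake, which is the fallback behavior asked about above.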
If the extra data in ntor v3 is not forward-secure would that mean that in transitional handshakes, an attacker with a future CRQC can potentially decrypt those fields?
"Not forward secure" in this context means that if an attacker steals a key after the session is over, they can decrypt something. So in this context it means that an attacker who steals an ntor onion key can decrypt all of the extra data that was sent to that key in the past.
With respect to quantum computing, ntorv3 (Tor's current handshake) is completely vulnerable: An attacker with a future CRQC can decrypt everything sent over ntorv3.
For the variants marked PQ:everything and PQ:Decrypt-XD, an attacker with a future CRQC can decrypt all extra-data fields. For the variant marked PQ:NoFs-XD, an attacker who steals a PQ onion key can decrypt past extra data.
I was thinking about large relays that build thousands/tens of thousands of circuits/second and if adding 40–60 microseconds per handshake might be significant.
Yeah; this is an area that needs more analysis. I don't think that circuit handshakes are currently the top CPU-user for Tor relays, since (unlike the rest of C tor) they are very well parallelized, but we'll need to see how things look with Arti relays.
If the cost turns out to be significant, we'll need to look into ways to safely lower the frequency with which clients change circuits.
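A back-of-the-envelope sketch of that cost, in Python. The 40 to 60 microsecond range comes from the question above; the handshake rates are illustrative values, not measurements, and `handshake_cpu_cores` is a hypothetical helper name.

```python
def handshake_cpu_cores(handshakes_per_second, extra_microseconds):
    """Extra CPU load, in cores kept fully busy, from added per-handshake cost."""
    return handshakes_per_second * extra_microseconds / 1_000_000


# Illustrative rates only; real relay load would need to be measured.
for rate in (1_000, 10_000, 50_000):
    for extra_us in (40, 60):
        cores = handshake_cpu_cores(rate, extra_us)
        print(f"{rate} handshakes/s at +{extra_us} us each: {cores} extra cores")
```

Under these assumptions, tens of thousands of handshakes per second would cost on the order of one to a few cores, and because handshakes parallelize well that load can be spread across threads.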
You mention that next‐generation "will require relays to have PQ identity keys", and that is out of scope (which makes sense) - I was just wondering if this requirement is because a CRQC could forge the relay identity as the circuit handshake alone can't provide that property, and that would be required to resolve MiTMs
That's right: if you can forge/steal an identity key (as a CRQC attacker could do with a non-PQ identity key), you can make your own descriptors, and impersonate a relay. Some kind of certificate-transparency at the directory authorities might mitigate this, but it's better to aim for a complete solution.
Fortunately, we have experience with ID key migration, and Arti is set up very nicely to handle it. C tor isn't, though, and it would be really nice to not have to solve the ID key problem until after C tor is long gone.
In implementation, should we be considering "KEM agility" so there is a generalized KEM-based approach that allows for future PQ KEM standardization?
There are several kinds of KEM agility we might aim for:
- The ability to swap in a new handshake as needed. (We have that today.)
- A design based on an arbitrary KEM, where we can write a new handshake using a new KEM as needed. (I hope that will be the result of any new proposal.)
- Implementing multiple KEMs in clients and relays, so that we can rapidly switch to a stronger one if needed. (This isn't the trade-off we've made in the past, but we could think about it. It depends on how long we think it would take us to deploy something new in an emergency, and how much notice we think we'd get of a need to switch. Historically, having unused crypto options has led to vulnerabilities in other programs, so we need to be careful here.)
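To illustrate the second kind (a design written against an arbitrary KEM), here is a minimal Python sketch. Everything in it is hypothetical: the `Kem` interface, the `run_handshake` function, and especially `ToyKem`, which is a deliberately INSECURE stand-in (its public key equals its secret key) used only so the example runs with the standard library. It is not the handshake from the proposal; it only shows the shape of code that does not care which KEM it is given.

```python
import hashlib
import secrets
from typing import Protocol, Tuple


class Kem(Protocol):
    """Hypothetical generic KEM interface that a handshake could be written against."""

    name: str

    def keygen(self) -> Tuple[bytes, bytes]: ...
    def encapsulate(self, public_key: bytes) -> Tuple[bytes, bytes]: ...
    def decapsulate(self, secret_key: bytes, ciphertext: bytes) -> bytes: ...


class ToyKem:
    """INSECURE placeholder: the public key equals the secret key. Illustration only."""

    name = "toy-insecure"

    def keygen(self):
        key = secrets.token_bytes(32)
        return key, key  # (public_key, secret_key)

    def encapsulate(self, public_key):
        ciphertext = secrets.token_bytes(32)
        shared = hashlib.sha3_256(public_key + ciphertext).digest()
        return ciphertext, shared

    def decapsulate(self, secret_key, ciphertext):
        return hashlib.sha3_256(secret_key + ciphertext).digest()


def run_handshake(kem: Kem):
    """KEM-agnostic exchange: the relay publishes a key, the client encapsulates to it."""
    public_key, secret_key = kem.keygen()                    # relay side
    ciphertext, client_secret = kem.encapsulate(public_key)  # client side
    relay_secret = kem.decapsulate(secret_key, ciphertext)   # relay side
    return client_secret, relay_secret


client_secret, relay_secret = run_handshake(ToyKem())
print("shared secrets match:", client_secret == relay_secret)
```

Swapping in a real KEM would mean providing another class with the same three methods. Shipping several such classes at once is the third kind of agility, with the caveats noted in the last bullet.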
mentioned in commit 33ef7615
mentioned in merge request !357 (merged)
I've added !357 (merged) for followups in the spec to clarify the Action points above.
Thanks for the follow-ups @nickm !