As of an hour or so ago, our main website (www.torproject.org) fails to load, with the message:
An error occurred during a connection to www.torproject.org. The server uses key pinning (HPKP) but no trusted certificate chain could be constructed that matches the pinset. Key pinning violations cannot be overridden. Error code: MOZILLA_PKIX_ERROR_KEY_PINNING_FAILURE
The new cert appears to be Valid Not Before Mon, 08 Jul 2024 00:33:24 GMT, i.e. we just got it.
The theory is that we are pinned to some earlier LE cert that is no longer used in today's chain.
Some more hints:
<PieroV> issuer=C=US, O=Let's Encrypt, CN=R11
<PieroV> It seems something not on the hardcoded list
<Peng> Let's Encrypt started using new intermediate certificates June 6 or so
<Peng> Same root certificates, new intermediates
Suggested short term workaround is to move back to yesterday's cert, because it should still be valid for some weeks, and then you can breathe more easily while figuring out what the new cert ought to actually be.
If our let's encrypt install doesn't keep the old cert around, peng provides this hint, since our new cert is apparently still using the same private key as yesterday:
<Peng> You can redownload certs from Let's Encrypt's API or something like https://crt.sh/?id=12996645452 FWIW
those are actual public/private keypairs that can presumably be used to sign x509 keys in a contingency. they correspond to the file ssl-contingency-keys in TPA's password manager
I've reinstalled the previous certificates still present on nevii to /srv/puppet.torproject.org/from-letsencrypt on pauli, then ran Puppet on our static website hosts. This seems to have fixed the problem in the short term.
According to the actual HTTP headers and #33592 (closed), we dropped our deployment of HPKP some time ago. It's unclear to me where Firefox is seeing any certificate pinning going on.
There are two types of pinning: the HTTP header, which you stopped using, and small static lists built into browsers, used by a small number of orgs including Tor Project, Facebook, and Google and Mozilla themselves. The latter is at issue here.
In terms of LE-issued certificates, it seems we've currently pinned R3, R4, and X3. While X3 has been EOL for some time now, R3 (and its backup, R4) are expiring next year and have been deprecated in favor of R11, which is now being used for certificate issuance.
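To check whether a given certificate (for example the new R11 intermediate) is on the static list, one can compute its pin and compare it by hand. This is a minimal sketch, assuming the list stores the base64-encoded SHA-256 hash of the certificate's SubjectPublicKeyInfo and that `openssl` and `base64` are available; `spki_pin` is a hypothetical helper name, not something from our tooling.

```shell
# spki_pin CERT_PEM
# Print base64(SHA-256(SubjectPublicKeyInfo)) for a PEM certificate.
# Assumption: this is the form in which static pins are stored.
spki_pin() {
    openssl x509 -in "$1" -noout -pubkey \
        | openssl pkey -pubin -outform DER \
        | openssl dgst -sha256 -binary \
        | base64
}
```

Running `spki_pin` on each certificate of the served chain and looking for the result in the pin list shows which element of the chain, if any, still satisfies the pinset.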
Besides www.torproject.org we have several websites pinned to these old intermediates, and likely to break once their LE certs come up for renewal:
blog.torproject.org (Not After: Fri Aug 09 00:57:34 UTC 2024)
check.torproject.org (Not After: Fri Aug 09 00:57:54 UTC 2024)
dist.torproject.org (Not After: Fri Aug 09 00:58:52 UTC 2024)
torproject.org (Not After: Thu Aug 22 00:53:51 UTC 2024)
The first three are expected to renew automatically in a few hours, and as such are expected to break rather soon unless we put a hold on those renewals.
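For keeping an eye on which of these certificates is about to roll over, something like the following could help. It is only a sketch: `expires_within` is a hypothetical helper, and it works on a PEM file saved locally rather than on whatever dehydrated has on disk.

```shell
# expires_within CERT_PEM SECONDS
# Exit 0 if the certificate expires within the next SECONDS seconds,
# 1 if it stays valid past that window, 2 if the file is unreadable.
expires_within() {
    [ -r "$1" ] || return 2
    if openssl x509 -in "$1" -noout -checkend "$2" >/dev/null; then
        return 1    # still valid after the window
    fi
    return 0
}

# One way to save the certificate a site currently serves (needs network):
#   openssl s_client -connect blog.torproject.org:443 \
#       -servername blog.torproject.org </dev/null 2>/dev/null \
#       | openssl x509 > blog.pem
```

For instance, `expires_within blog.pem $((32 * 86400))` succeeds once the certificate is within 32 days of its Not After date.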
Once that's done, we have a month to figure out what to do on the longer term. Ideally, I'd like to avoid the need to issue certificates using the contingency ssl keys: as far as I know we've never ever issued any public facing certificates using these keys.
According to the Mozilla release calendar, there are releases being made tomorrow, July 9th, and on August 6th. @micah would it be possible to look into whether we can squeeze a certificate pinning update into either of those releases? If we can, we might want to consider pinning the ISRG Root instead of the faster-moving intermediates?
Even if that succeeds, it'll still leave users with an older version of the browser unable to access our most critical websites, but I'm not sure if we can really do anything about that. I guess this is one big downside to having these pins in the first place...
Out-of-date browsers automatically stop enforcing pinning -- the last line of Mozilla's file is the expiration timestamp. It appears they run a cron job every few days to update it to 98 days in the future in Unix timestamp microseconds.
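To make the arithmetic concrete, here is a small sketch of that expiration logic as described above. The 98-day window and the microsecond unit are taken from the comment, not verified against Mozilla's update script, and both function names are hypothetical. It also assumes enforcement stops as soon as the current time reaches the stamp.

```shell
# pin_expiry_micros NOW_SECONDS
# Expiration stamp 98 days after NOW_SECONDS, in Unix microseconds.
pin_expiry_micros() {
    echo $(( ($1 + 98 * 86400) * 1000000 ))
}

# pins_enforced EXPIRY_MICROS NOW_SECONDS
# Exit 0 while the pin list has not yet expired, 1 afterwards.
pins_enforced() {
    [ "$(( $2 * 1000000 < $1 ))" -eq 1 ]
}
```

So a browser built today and never updated would, under these assumptions, stop enforcing the static pins a little over three months from its build date: `pins_enforced "$(pin_expiry_micros "$build_time")" "$(date +%s)"` starts failing at that point, where `build_time` is the build date in Unix seconds.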
In terms of LE-issued certificates, it seems we've currently pinned R3, R4, and X3. While X3 has been EOL for some time now, R3 (and its backup, R4) are expiring next year and have been deprecated in favor of R11, which is now being used for certificate issuance.
However, when @anarcat contacted them, the email asked that kISRG_Root_X1Fingerprint be added, as @ma1 had identified that it was a more stable pin. I don't see this in the code; are we sure we are looking at the right code there?
@lavamind wrote, "If we can, we might want to consider pinning the ISRG Root instead of the faster-moving intermediates?" -- which seems like what we tried to do last year.
Indeed, that seems to be the case. However, the ticket dealing with this last year doesn't say what came out of that email exchange: whether Mozilla confirmed the requested changes would happen, whether they requested any further clarifications, whether it just dropped off the radar, or what... I've asked @anarcat to look into his emails and let us know what he finds.
I think we could feasibly keep the pins in Tor Browser, considering we've got a greater handle on updates and release for that project. One of the main issues here is getting the Mozilla people to move fast on this. In any event, since Tor Browser pins the ISRG X1 Root already, there's no immediate danger of anything breaking over there.
I've removed the affected domains from our list of automatically renewed certificates and re-enabled Puppet on the web hosts, such that the regular renewal of other certificates isn't disrupted.
Since it doesn't seem possible to have Let's Encrypt issue new R3-signed certificates, I've been looking into whether we might be able to deploy certificates which satisfy the current pinning. RapidSSL was identified as a potential candidate, however the specific certificate pinned looks like an intermediate cert that has been expired for some time:
$ base64 -d < transport_security_state_static.pins | sed -n '/^RapidSSL/,/-----END/p' | certtool --certificate-info
X.509 Certificate Information:
    Version: 3
    Serial Number (hex): 0236d1
    Issuer: CN=GeoTrust Global CA,O=GeoTrust Inc.,C=US
    Validity:
        Not Before: Fri Feb 19 22:45:05 UTC 2010
        Not After: Tue Feb 18 22:45:05 UTC 2020
    Subject: CN=RapidSSL CA,O=GeoTrust\, Inc.,C=US
[...]
Getting a certificate signed by another pinned root, the "DigiCert High Assurance EV Root CA", might be a feasible option, as this one is valid until 2031 so DigiCert is likely still issuing certificates in this chain.
$ base64 -d < transport_security_state_static.pins | sed -n '/^DigiCertEVRoot/,/-----END/p' | certtool --certificate-info
X.509 Certificate Information:
    Version: 3
    Serial Number (hex): 02ac5c266a0b409b8f0b79f2ae462577
    Issuer: CN=DigiCert High Assurance EV Root CA,OU=www.digicert.com,O=DigiCert Inc,C=US
    Validity:
        Not Before: Fri Nov 10 00:00:00 UTC 2006
        Not After: Mon Nov 10 00:00:00 UTC 2031
    Subject: CN=DigiCert High Assurance EV Root CA,OU=www.digicert.com,O=DigiCert Inc,C=US
However, EV certificates are a little more involved to obtain and definitely on the more pricey side...
I checked with DigiCert support and they confirmed to me that the EV certificates they currently issue are part of a trust chain that includes the pinned DigiCert High Assurance EV Root CA cert.
Mozilla got back to me and said that there is no difference in terms of how updates are delivered to users, and so sec-critical updates would not be able to update older browsers.
So, they will do a point release on both Firefox and ESR channels, and will follow-up with dates and a confirmation soon.
Thought: Since you're asking browsers to disable PKP (#41175 (closed)), maybe now's the time to break out one of the break-glass keypairs? Won't need 'em again, unless 4 different Heartbleeds happen in the next 3 months. :-)
I suppose we could, but I'm not at all clear on exactly how to use them, or on what the consequences would be of replacing the current, publicly-trusted LE certificate with a certificate signed by one of the contingency keys.
I'm assuming that because these keys are only trusted in Chrome/Chromium and Firefox via pinning, it would break a lot of other things. I'm thinking for example of Atom/RSS clients parsing the feeds on blog.torproject.org, or API clients that pull data from check.torproject.org. Unless I'm missing something here, I would much prefer to temporarily deploy EV certificates from DigiCert.
I have not checked how the Chrome or Firefox implementations actually work, so I may be wrong, but it may work if you get a publicly-trusted certificate with one of those keys. Pinning clients would trust them because of the pin; other clients would trust them because of the CA.
That sounds interesting, unfortunately we have zero documentation on how to go about creating such a certificate. I suppose @weasel would know more about this, but I've not heard back from him so far.
@mnordhoff may be correct that if we have the private key material for these keys, we can use them to generate a CSR that a "trusted" CA would sign and it should be then trusted by these pin lists.
The EV certificates from DigiCert would be the preferable way to go about this, but we will need to get one that covers every domain that is pinned.
Maybe there's a way I could request a certificate from LE in such a way, I guess it doesn't hurt to try?
The EV certificates would not only be expensive, over 2000 USD for 5 certs, but also time consuming because of the verification process inherent to EV certs.
It's definitely theoretically possible; the question is how. And, as I said, I haven't confirmed it would really work in Chrome or Firefox.
At a glance, dehydrated supports specifying a CSR via the --signcsr option. But the documented Puppet module doesn't appear to have a way to pass that option.
(You can generate a CSR for a specified keypair using openssl req or something, although I don't know the syntax off-hand.)
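For the record, a rough sketch of that `openssl req` invocation, assuming an unencrypted PEM private key and OpenSSL 1.1.1 or later (for `-addext`). `make_csr` and its arguments are hypothetical names, and the single-name SAN is an assumption about what each certificate needs to cover.

```shell
# make_csr KEY_PEM DOMAIN OUT_CSR
# Build a CSR for DOMAIN from an existing private key, so that the issued
# certificate carries that key's public half (and therefore its pin).
make_csr() {
    key=$1
    domain=$2
    out=$3
    openssl req -new -key "$key" \
        -subj "/CN=$domain" \
        -addext "subjectAltName=DNS:$domain" \
        -out "$out"
}
```

The resulting file could then be handed to `dehydrated --signcsr`, or to any other CA's request form; how that fits into our Puppet setup is a separate question.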
Terrible, terrible hack idea: Since you apparently have dehydrated configured to reuse keys (at least for www.torproject.org), it's possible you could plunk a different keypair in its configuration directory and it might just use it. :-) Or it might error out if it compares it to the old cert.
I was thinking just that: testing with https://blog.torproject.org, I replaced this domain's private key with the contingency key, requested a new LE certificate via dehydrated and deployed it. The new certificate appears to work in both Chromium and Firefox, and is signed using LE's new R11 intermediate.
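Before deploying a certificate obtained this way, it is worth confirming that it really carries the contingency key and not a freshly generated one. A minimal sketch, assuming both files are PEM and the key is unencrypted; `spki_sha256` and `cert_matches_key` are hypothetical helper names.

```shell
# spki_sha256: read a PEM public key on stdin, print base64(SHA-256(DER SPKI)).
spki_sha256() {
    openssl pkey -pubin -outform DER | openssl dgst -sha256 -binary | base64
}

# cert_matches_key CERT_PEM KEY_PEM
# Exit 0 if the certificate's public key is the public half of KEY_PEM.
cert_matches_key() {
    cert_pin=$(openssl x509 -in "$1" -noout -pubkey | spki_sha256)
    key_pin=$(openssl pkey -in "$2" -pubout | spki_sha256)
    [ -n "$cert_pin" ] && [ "$cert_pin" = "$key_pin" ]
}
```

If `cert_matches_key` succeeds, the issuing intermediate (R11 or otherwise) should not matter to pinning clients, provided the key itself is what the pin list references.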
Now, the thing is we have 5 domains and only 3 different contingency keys. Would it be very bad to use the same private key on more than one domain for the three-month period needed to bridge the expiry of the pins in older browser versions, or just, uh, not great?
We would also need to confirm how the browser's cert pinning expiry actually works, and if a similar mechanism is in place in Chrome/Chromium.
One idea to avoid reusing private keys would be to deploy all 3 contingency keys on different domains, and buy 2 EV certificates for the remaining two.
Consider the threat model: what kind of compromise would we protect against by ensuring a different private key was used to generate a certificate for each site? Presumably, the answer is if that private key was compromised, then all the sites that are served by the certificate that is generated from that private key are at a certain kind of "risk".
Is there any way that one key could be compromised, where the other keys are not also compromised? I think that the only compromise we are talking about here is a compromise of the private key material, and that is probably all stored in the same place where one compromise will jeopardize all the keys. If that is the case, and private key material isn't exposed in any other way, I don't see any gain from having distinct private keys for each site, except that it is "best practice"?
That private key material is indeed all stored in the same place, but contrary to all other TLS keys, they also live in our password manager, so that's an additional attack surface we'd need to take into account. And even if private key material leaking out isn't a significant risk factor, @mnordhoff mentioned on IRC that you are the type of org people might use novel, complex TLS exploits against, which isn't false...
Of all the private key material, those backup keys are probably the most at risk because they've been around for a long time and the longer private keys have been around, the more possibility of exposure/leaking.
@lavamind Mozilla has the change in Firefox Nightly. Could we set up a test scenario with a new cert and try Firefox Nightly against it to confirm that their changes work?
i just wanted to chime in here to say that i've reviewed the threads here and i think everyone is doing a fantastic job considering how messed up everything is. i do agree going with digicert might be our best bet as a stopgap measure, and would make for a fantastic disaster recovery test to see "what happens when we switch away from LE". having a procedure for this would be pretty nice, so that the next time this happens (oh yes, you know it will), we're not in such a panic.
so we have a few weeks, let's bite that bullet and document the hell out of it, so we know how to recover next time we rotate root CAs.
that, of course, is if you can make it. if you don't have time and everything is on fire, just do it and dump as much as you can in this ticket without docs, it's better to fix the issue, of course...
again, good job everyone, and thanks for holding the fort!
anarcat changed the incident status to Acknowledged
DigiCert got back to me, and it appears they will provide these. We need a set of CSRs so that we can get certificates issued and confirm that they are signed by the right intermediate.
DigiCert provided us with new certificates, one per domain, and after confirming they were indeed signed by the hoped-for DigiCert CA that was part of Tor's pinned certificate list, I installed them manually on pauli and adjusted our Puppet configuration to deploy them to the web hosts.
I created #41694 (closed) to follow-up on eventually switching these certificates back to Let's Encrypt.
@arma pointed out today that 2019.www.torproject.org is now also showing a pinning error.
Looking back into Firefox's pinning code, it seems like all the domains with the exception of torproject.org have an "include subdomains" property enabled, which is likely the cause here.
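As an illustration of why 2019.www.torproject.org is caught, here is a simplified sketch of the matching rule. It is not Firefox's actual code: `pin_applies` is a hypothetical helper and the true/false flag stands in for the "include subdomains" property.

```shell
# pin_applies HOST PINNED_DOMAIN INCLUDE_SUBDOMAINS
# Exit 0 if a pin entry for PINNED_DOMAIN covers HOST.
# INCLUDE_SUBDOMAINS is the literal string "true" or "false".
pin_applies() {
    host=$1
    pinned=$2
    include_subdomains=$3
    # An exact match is always covered.
    [ "$host" = "$pinned" ] && return 0
    # Otherwise only names ending in ".PINNED_DOMAIN" are covered,
    # and only when the entry includes subdomains.
    if [ "$include_subdomains" = "true" ]; then
        case "$host" in
            *."$pinned") return 0 ;;
        esac
    fi
    return 1
}
```

Under this rule `pin_applies 2019.www.torproject.org www.torproject.org true` succeeds, while `pin_applies www.torproject.org torproject.org false` does not, which would be consistent with what we are seeing.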
I looked at our zone file and it doesn't look like we have any active subdomains on the others.