certificate issuance and renewal is broken

in #41613 (closed), i tried to get a new cert through letsencrypt-domains.git, and it failed:

$ git … push -v origin refs/heads/master\:refs/heads/master
Pushing to nevii.torproject.org:/srv/letsencrypt.torproject.org/repositories/letsencrypt-domains
Writing objects: 100% (3/3), 523 bytes | 523.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: in post-receive hook        
remote: # INFO: Using main config file /srv/letsencrypt.torproject.org/etc/dehydrated-config        
remote: Processing torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug 22 00:53:51 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing www.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug  7 00:13:49 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing anonticket.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Jul 29 01:01:27 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing archive.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Jul 30 00:49:07 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing arti.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Jul 17 00:29:54 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing atlas.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Jul 30 00:49:50 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing aus1.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug  1 00:28:51 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing aus2.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug  1 00:29:11 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing blog.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug  9 00:57:34 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing bridges.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug 26 00:18:47 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing bridges-test.torproject.org        
remote:  + Checking domain name(s) of existing cert... unchanged.        
remote:  + Checking expire date of existing cert...        
remote:  + Valid till Aug  7 00:14:00 2024 GMT (Longer than 30 days). Skipping renew!        
remote: Processing bridges-email.torproject.org        
remote:  + Creating new directory /srv/letsencrypt.torproject.org/etc/certs/bridges-email.torproject.org ...        
remote:  + Signing domains...        
remote:  + Generating private key...        
remote:  + Generating signing request...        
remote:  + Requesting new certificate order from CA...        
remote:  + Received 1 authorizations URLs from the CA        
remote:  + Handling authorization for bridges-email.torproject.org        
remote:  + 1 pending challenge(s)        
remote:  + Deploying challenge tokens...        
remote: Adding challenge '_acme-challenge.bridges-email.torproject.org. 60 IN TXT "Xe_dA6xV3qGnBbnRiIwpEZ9Jo_5zkhxW8fDWZI6JH6M"' for bridges-email.torproject.org.        
remote: 2024-06-12 19:24:03 /srv/dns.torproject.org/bin/update: ***** start of script *****        
remote: 2024-06-12 19:24:03 /srv/dns.torproject.org/bin/update: pre flock        
remote: 2024-06-12 19:24:03 /srv/dns.torproject.org/bin/update: pre update-keys        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre build-services        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre for loop        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for 0.0.0.0.2.0.0.6.7.0.0.0.0.2.6.2.ip6.arpa        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for 30.172.in-addr.arpa        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for 99.8.204.in-addr.arpa        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for onion-router.net        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for rev        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for torproject.com        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for torproject.net        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre write_zonefile for torproject.org        
remote: 2024-06-12 19:24:04 /srv/dns.torproject.org/bin/update: pre dns-update        
remote: 2024-06-12 19:24:05 /srv/dns.torproject.org/bin/update: done!        
remote: 2024-06-12 19:24:05 /srv/dns.torproject.org/bin/update: ***** end of script *****        
remote: Waiting for master to update torproject.org (for _acme-challenge.bridges-email.torproject.org) from 2024061201.  Currently at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote:  SOA nevii.torproject.org. hostmaster.torproject.org. 2024061202 10800 3600 1814400 3601 from server 49.12.57.135 in 0 ms.        
remote:  SOA nevii.torproject.org. hostmaster.torproject.org. 2024061201 10800 3600 1814400 3601 from server 194.58.198.32 in 27 ms.        
remote:  SOA nevii.torproject.org. hostmaster.torproject.org. 2024061202 10800 3600 1814400 3601 from server 89.47.185.6 in 31 ms.        
remote:  SOA nevii.torproject.org. hostmaster.torproject.org. 2024061202 10800 3600 1814400 3601 from server 204.8.99.145 in 131 ms.        
remote:  SOA nevii.torproject.org. hostmaster.torproject.org. 2024061202 10800 3600 1814400 3601 from server 204.8.99.145 in 131 ms.        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote: Waiting for secondaries to update to match master at 2024061202..        
remote:  + Responding to challenge for bridges-email.torproject.org authorization...        
remote:  + Challenge is valid!        
remote:  + Cleaning challenge tokens...        
remote:  + Requesting certificate...        
remote: ERROR: Problem connecting to server (post for ; curl returned with 3)        
To nevii.torproject.org:/srv/letsencrypt.torproject.org/repositories/letsencrypt-domains
   e6762f5..ab7e53f  master -> master
updating local tracking ref 'refs/remotes/origin/master'

rerunning this by hand:

root@nevii:~# /srv/letsencrypt.torproject.org/bin/dehydrated-wrap --cron
[...]
Processing bridges-email.torproject.org
 + Signing domains...
 + Generating private key...
 + Generating signing request...
 + Requesting new certificate order from CA...
 + Received 1 authorizations URLs from the CA
 + Handling authorization for bridges-email.torproject.org
 + Found valid authorization for bridges-email.torproject.org
 + 0 pending challenge(s)
 + Requesting certificate...
ERROR: Problem connecting to server (post for ; curl returned with 3)

i had to actually edit the dehydrated script and remove the -s flag from the curl call to figure out what was going on (!!!), and it looks like we got banned from let's encrypt for 48 hours:

++ /srv/letsencrypt.torproject.org/bin/le-hook request_failure 429 '{
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: bridges-email.torproject.org, retry after 2024-06-14T04:23:08Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/",
  "status": 429
}' post 'HTTP/2 429 
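
For reference, here is a minimal sketch of how the retry time could be pulled out of a problem document like the one above. The function name `parse_retry_after` and the regular expression are my own assumptions: the "retry after" wording sits inside the free-form `detail` string, so the pattern only matches messages shaped like this one.

```python
import json
import re
from datetime import datetime, timezone

# The problem document as passed to le-hook above, copied verbatim.
PROBLEM = r'''{
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error creating new order :: too many certificates (5) already issued for this exact set of domains in the last 168 hours: bridges-email.torproject.org, retry after 2024-06-14T04:23:08Z: see https://letsencrypt.org/docs/duplicate-certificate-limit/",
  "status": 429
}'''

# Assumption: the timestamp follows the words "retry after" in the detail text.
RETRY_AFTER = re.compile(r"retry after (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)")


def parse_retry_after(problem_json):
    """Return the retry time as an aware UTC datetime, or None if absent."""
    problem = json.loads(problem_json)
    if problem.get("type") != "urn:ietf:params:acme:error:rateLimited":
        return None
    match = RETRY_AFTER.search(problem.get("detail", ""))
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return stamp.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_retry_after(PROBLEM))
```

A hook could use something like this to log a readable "do not retry before ..." message instead of failing with only the curl exit code.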

and it indeed seems like 10 certs were issued today:

https://crt.sh/?q=bridges-email.torproject.org
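
A rough sketch of how the entries on that page could be counted against the 168-hour window from the error message. It assumes the rows come from crt.sh's JSON output and carry `serial_number` and `not_before` fields, which I have not verified against this particular query. The sample rows are made up for illustration and are not the real log entries. crt.sh usually lists a precertificate and the final certificate as separate rows with the same serial, so the sketch counts distinct serials rather than rows.

```python
from datetime import datetime, timedelta, timezone

# Window from the rate limit error above ("in the last 168 hours").
WINDOW = timedelta(hours=168)


def distinct_certs_in_window(entries, now):
    """Count distinct serial numbers whose not_before falls within WINDOW of now."""
    serials = set()
    for entry in entries:
        # Assumed field format: ISO 8601 without a timezone, in UTC.
        issued = datetime.fromisoformat(entry["not_before"]).replace(tzinfo=timezone.utc)
        if now - WINDOW <= issued <= now:
            serials.add(entry["serial_number"])
    return len(serials)


# Hypothetical rows: two certificates logged twice each, plus one old one.
SAMPLE = [
    {"serial_number": "aa", "not_before": "2024-06-12T18:24:04"},
    {"serial_number": "aa", "not_before": "2024-06-12T18:24:04"},
    {"serial_number": "bb", "not_before": "2024-06-12T18:27:10"},
    {"serial_number": "bb", "not_before": "2024-06-12T18:27:10"},
    {"serial_number": "cc", "not_before": "2024-03-01T00:00:00"},
]
NOW = datetime(2024, 6, 12, 20, 0, 0, tzinfo=timezone.utc)

if __name__ == "__main__":
    print(distinct_certs_in_window(SAMPLE, NOW))
```

If the row count on crt.sh is roughly double the limit quoted in the error, duplicate rows per certificate would be one possible explanation.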

amazingly, i managed to rebuild the cert for that service by hand on nevii, using the transparency log above, and to deploy the service. specifically, i used:

https://crt.sh/?id=13370289139

but right now, the forum cert is due to renew and the cron job is failing with the same error.

the timeline is something like this:

  1. 19:24: code added to puppet and letsencrypt-domains.git to deploy new cert, first failure documented above, first two certs issued according to the transparency logs
  2. 19:27: two more certs issued, possibly manual retries by @anarcat
  3. 19:38: attempt at issuing the cert through puppet
  4. 19:45-19:47: 6 more certs issued

Right now I'm worried our certs will just expire and we'll have no way to renew. I suspect this is related to the transition to hosting TLS certs in Puppet (#41610).
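
To get a sense of how much time is left, here is a minimal sketch that turns the "Valid till" lines from the dehydrated output above into days remaining per domain. The function names are hypothetical. It relies on `ssl.cert_time_to_seconds` from the Python standard library, which parses this date format, and it assumes the log keeps the "Processing <domain>" / "Valid till <date> GMT" layout shown above.

```python
import re
import ssl
from datetime import datetime, timezone

PROCESSING = re.compile(r"Processing (\S+)")
VALID_TILL = re.compile(r"Valid till (.+? GMT)")


def days_left(valid_till, now):
    """Whole days between now and a 'Aug 22 00:53:51 2024 GMT' style date."""
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(valid_till), tz=timezone.utc)
    return (expiry - now).days


def expiry_report(log_text, now):
    """Map each domain in a dehydrated log to the days left on its cert."""
    report = {}
    domain = None
    for line in log_text.splitlines():
        match = PROCESSING.search(line)
        if match:
            domain = match.group(1)
            continue
        match = VALID_TILL.search(line)
        if match and domain is not None:
            report[domain] = days_left(match.group(1), now)
    return report


# Excerpt copied from the push output above.
LOG = """remote: Processing arti.torproject.org
remote:  + Checking domain name(s) of existing cert... unchanged.
remote:  + Checking expire date of existing cert...
remote:  + Valid till Jul 17 00:29:54 2024 GMT (Longer than 30 days). Skipping renew!
"""

if __name__ == "__main__":
    # Time of the failed push, taken from the log timestamps above.
    now = datetime(2024, 6, 12, 19, 24, 3, tzinfo=timezone.utc)
    for name, days in sorted(expiry_report(LOG, now).items(), key=lambda kv: kv[1]):
        print(f"{days:4d} days  {name}")
```

Sorting by days left shows which certs would be the first to expire if renewal stays broken.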

@weasel any idea what's going on?

Edited by anarcat