internal DNSSEC failures
First diagnostic
Some servers (ssh-dal-01, presumably prom1) are failing to resolve DNS.
Current status
Probably caused by changes related to retiring the tor-nagios-checks package (#41671 - closed), combined with unbound having a pinned key in Puppet.
Fixed on ssh-dal-01, but prometheus1 is likely still down, at least.
Roles
Next steps
- check all hosts for resolution
- test the new key on a couple hosts
- update the key in Puppet (or remove pinning?)
- deploy new keys everywhere by hand (@groente)
- recheck resolution on all hosts (@anarcat, 32 hosts failing)
- redeploy keys
- wait a few minutes
- recheck resolution
- post-mortem
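The "recheck resolution" steps above can be scripted. A minimal sketch; the helper function, hostnames file, and query target are hypothetical, not taken from the actual runbook:

```shell
#!/bin/sh
# Hypothetical helper: classify the status line of a `dig` run as PASS/FAIL.
check_status() {
    case "$1" in
        *"status: NOERROR"*) echo PASS ;;
        *) echo FAIL ;;
    esac
}

# In practice, one might loop over the fleet and query through each host's
# local unbound, e.g. (commented out here since it needs SSH access):
# while read -r host; do
#     out=$(ssh "$host" dig +dnssec torproject.org SOA)
#     printf '%s: %s\n' "$host" "$(check_status "$out")"
# done < hosts.txt
```

Querying a DNSSEC-signed name through the local resolver is what matters here: a plain ping or an `/etc/hosts` lookup would not catch a validation failure.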
Post-mortem
Obscure legacy infrastructure led to an internal DNSSEC outage, distinct from the Saturday 20th DNSSEC outage (#42297 (closed)).
- Affected users: mostly internal staff
- Duration: about two hours, from some time before 2025-09-25 ~15:00UTC to 16:44UTC
- Report Status: finished
Timeline
A GitLab timeline was constructed.
Root cause analysis
Unbound has a mechanism (RFC 5011) to automatically pick up new DNSSEC keys. However, unlike the relatively short TTL of our DS records, this requires a 30-day hold-down period before the new key is actually trusted. If you are unaware of this, do a manual rollover, and remove the old keys before this 30-day period elapses, unbound will no longer trust the signatures from our nameservers.
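The distinction shows up directly in unbound's configuration: a statically pinned anchor (as Puppet was deploying) breaks as soon as the old key disappears from the zone, while an RFC 5011 managed anchor survives a rollover at the cost of the 30-day hold-down. A sketch, with hypothetical file paths:

```
server:
    # Statically pinned key: validation breaks the moment the zone
    # stops signing with this exact key.
    # trust-anchor-file: "/etc/unbound/torproject.org.key"

    # RFC 5011 managed anchor: unbound tracks rollovers itself, but a
    # newly observed key is only trusted after the 30-day hold-down.
    auto-trust-anchor-file: "/var/lib/unbound/torproject.org.key"
```

Note that `auto-trust-anchor-file` requires the file to be writable by unbound, since it records key state (PENDING, VALID, etc.) in it.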
What went well?
- outage was detected quickly and resolved as soon as possible
- @anarcat got to experiment with the enhanced incident response procedures (TPA-RFC-91) (#40421 - closed)
What could have gone better?
- this was our first incident managed with the new incident response framework; as a test of the procedure it seems to work well, but staff still needs to agree on it and receive training
- @groente couldn't reach `dal-rescue-02` because it's on a special port
- the outage triggered a storm of alerts in Prometheus: it seems the `EntireHosterDown` alert didn't entirely fulfill its role:
11:43:56 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://wiki.torproject.org/ is unreachable via HTTPS CRITICAL!
11:43:59 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://karma2.torproject.org/ is unreachable via HTTPS CRITICAL!
11:44:02 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://review.torproject.net/ is unreachable via HTTPS CRITICAL!
11:44:07 -ALERTOR1:#tor-alerts- EntireHosterDown [firing] All probes towards hoster hetzner-fsn1 are failing
11:44:10 -ALERTOR1:#tor-alerts- JobDown [firing] Exporter job "mtail" on srs-dal-01.torproject.org:3903 is down
11:44:36 -ALERTOR1:#tor-alerts- JobDown [firing] Exporter job "minio-bucket" on minio-01.torproject.org:9000 is down
11:45:21 -ALERTOR1:#tor-alerts- JobDown [firing] Exporter job "minio-cluster" on minio-01.torproject.org:9000 is down
11:45:53 -ALERTOR1:#tor-alerts- SystemdFailedUnits [firing] Some systemd units are in failed state on dal-node-03.torproject.org
11:45:53 -ALERTOR1:#tor-alerts- SystemdFailedUnits [firing] Some systemd units are in failed state on metricsdb-01.torproject.org
11:46:25 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://bridges-email.torproject.org/ is unreachable via HTTPS CRITICAL!
11:47:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://pages.torproject.net/ is unreachable via HTTPS CRITICAL!
11:47:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://rdsys-frontend-01.torproject.org/ is unreachable via HTTPS CRITICAL!
11:47:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://tb-build-02.torproject.org/ is unreachable via HTTPS CRITICAL!
11:47:46 <anarchat> i'm going to silence issues
11:47:55 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://dockerhub-mirror.torproject.org/ is unreachable via HTTPS CRITICAL!
11:47:55 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://karma2.torproject.org/ is unreachable via HTTPS CRITICAL!
11:49:06 -ALERTOR1:#tor-alerts- JobDown [firing] Exporter job "mtail" on rdsys-frontend-01.torproject.org:3903 is down
11:49:06 -ALERTOR1:#tor-alerts- JobDown [firing] Exporter job "mtail" on srs-dal-01.torproject.org:3903 is down
11:50:53 -ALERTOR1:#tor-alerts- SystemdFailedUnits [firing] Some systemd units are in failed state on bungei.torproject.org
11:50:53 -ALERTOR1:#tor-alerts- SystemdFailedUnits [firing] Some systemd units are in failed state on dal-node-03.torproject.org
11:50:53 -ALERTOR1:#tor-alerts- SystemdFailedUnits [firing] Some systemd units are in failed state on metricsdb-01.torproject.org
11:52:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://pages.torproject.net/ is unreachable via HTTPS CRITICAL!
11:52:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://rdsys-frontend-01.torproject.org/ is unreachable via HTTPS CRITICAL!
11:52:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://survey.torproject.org/ is unreachable via HTTPS CRITICAL!
11:52:28 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://tb-build-02.torproject.org/ is unreachable via HTTPS CRITICAL!
11:56:25 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://metrics-api.torproject.org/ is unreachable via HTTPS CRITICAL!
11:56:25 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://bridges-email.torproject.org/ is unreachable via HTTPS CRITICAL!
11:57:55 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://test.crm.torproject.org/ is unreachable via HTTPS CRITICAL!
11:57:55 -ALERTOR1:#tor-alerts- HTTPSUnreachable [firing] Website https://dockerhub-mirror.torproject.org/ is unreachable via HTTPS CRITICAL!
11:58:48 -ALERTOR1:#tor-alerts- EntireHosterDown [firing] All probes towards hoster hetzner-fsn1 are failing
11:58:48 -ALERTOR1:#tor-alerts- EntireHosterDown [firing] All probes towards hoster hetzner-nbg1 are failing
11:58:48 -ALERTOR1:#tor-alerts- EntireHosterDown [firing] All probes towards hoster safespring are failing
Notice how the `EntireHosterDown` alert never fired for the quintex point of presence, even though lots of alerts emanated from there? Perhaps the check period and tolerance for `EntireHosterDown` could be lowered to avoid this situation.
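One way to make the hoster-level alert win the race against the per-host noise is to shorten its `for:` hold time. A hypothetical Prometheus rule sketch; the expression, group name, and threshold are illustrative, not our actual rule:

```yaml
groups:
  - name: hoster
    rules:
      - alert: EntireHosterDown
        # Illustrative expression: every blackbox probe for a hoster failing.
        expr: |
          count by (hoster) (probe_success == 0)
            == count by (hoster) (probe_success)
        # Shorter hold time so this fires before the per-host alert storm.
        for: 2m
        annotations:
          summary: "All probes towards hoster {{ $labels.hoster }} are failing"
```

A shorter `for:` trades a bit of flap resistance for earlier detection; pairing it with inhibition rules in Alertmanager would then suppress the individual `JobDown`/`HTTPSUnreachable` alerts once the hoster-level one fires.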
Recommendations and related issues
- document the key rotation procedure and execute an actual rollover to test the procedure, see #42309
- tune the `EntireHosterDown` alert, see #42222 (comment 3263786)
- consider a simpler ICMP check for `HostDown` (see #42313)
- correctly document dal-rescue-02's setup or fix the port forwarding issue, see #42310
- hold off on further DNSSEC changes over the weekend
- consider not having custom trust-anchors in unbound: #42311