prometheus2 ran out of disk space
Date: Mon, 24 Oct 2022 11:30:23 +0000
From: nagios@hetzner-hel1-01.torproject.org
To: anarcat+rapports@orangeseeds.org
Subject: ** PROBLEM Service Alert: hetzner-nbg1-02/disk usage - all is CRITICAL **
***** Icinga *****
Notification Type: PROBLEM
Service: disk usage - all
Host: hetzner-nbg1-02
Address: 116.203.55.206
State: CRITICAL
Date/Time: Mon Oct 24 11:30:23 UTC 2022
Additional Info:
DISK CRITICAL - free space: / 1955 MB (5% inode=95%): /dev 1897 MB (100% inode=99%): /dev/shm 1917 MB (99% inode=99%): /run 383 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /tmp 512 MB (100% inode=99%): /boot 327 MB (77% inode=99%): /run/credentials 383 MB (99% inode=99%): /var/tmp 1955 MB (5% inode=95%):
anecdotal reports of "prometheus 2 is down" as well
action points:
- lint rules before merge (already filed as prometheus-alerts#1 (closed), will followup there)
- don't restart prometheus forever (filed https://bugs.debian.org/1022724, https://salsa.debian.org/go-team/packages/prometheus/-/merge_requests/5, deployed an override through puppet)
- prometheus shouldn't flood its logs with WAL notices if there's a syntax error (filed https://github.com/prometheus/prometheus/issues/11486)
- anarcat added Doing Prometheus labels
- anarcat mentioned in merge request prometheus-alerts!14 (merged)
it suddenly ate up all disk space a few days ago:
logs have filled up the drive:
ncdu 1.15.1 ~ Use the arrow keys to navigate, press ? for help
--- /var/log ----------------------------------------------------------------
                            /..
   10.1 GiB [##########]  daemon.log.1
    5.4 GiB [#####     ]  daemon.log
    3.8 GiB [###       ] /journal
    3.2 GiB [###       ]  syslog
    2.2 GiB [##        ]  syslog.1
  164.2 MiB [          ]  syslog.2.gz
root@hetzner-nbg1-02:/var/log# wc -l daemon.log.1
55311822 daemon.log.1
it's mostly prometheus:
root@hetzner-nbg1-02:/var/log# awk '{print $5}' daemon.log.1 | sed 's/\[.*//' | head -10000000 | sort | uniq -c | sort -n
      1 bacula-fd
      1 fstrim
      1 lldpcli
      1 sysctl
      1 systemd-fsck
      2 systemd-modules-load
      2 systemd-shutdown:
      3 systemd-cryptsetup
      4 prometheus-apache-exporter
      4 prometheus-blackbox-exporter
      4 syslog-ng
      4 systemd-udevd
      5 haveged
      8 lldpd
      8 prometheus-alertmanager
      9 tor
     13 ifdown
     14 sh
     40 ferm
     42 ulogd
     53 prometheus-node-exporter
     67 dhclient
     78 systemd:
     88 puppet-agent
    112 Tor
    121 ntpd
    329 unbound
    509 systemd-journald
   2249 grafana-server
  47294 systemd
  51637 nrpe
9897296 prometheus
3 days ago, at 3:21:05, puppet noticed prometheus was down, and tried to restart it:
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 puppet-agent[34600]: (/Stage[main]/Prometheus::Run_service/Service[prometheus]/ensure) ensure changed 'stopped' to 'running' (corrective)
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.441Z caller=main.go:355 msg="Starting Prometheus" version="(version=2.24.1+ds, branch=debian/sid, revision=2.24.1+ds-1+b7)"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.441Z caller=main.go:360 build_context="(go=go1.15.9, user=pkg-go-maintainers@lists.alioth.debian.org, date=20210804-20:00:23)"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.441Z caller=main.go:361 host_details="(Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-1 (2022-10-17) x86_64 hetzner-nbg1-02 (none))"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.442Z caller=main.go:362 fd_limits="(soft=8192, hard=8192)"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.442Z caller=main.go:363 vm_limits="(soft=unlimited, hard=unlimited)"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.446Z caller=web.go:468 component=web msg="Start listening for connections" address=0.0.0.0:9090
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.447Z caller=main.go:722 msg="Starting TSDB ..."
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.448Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1634126400000 maxt=1635876000000 ulid=01FKH7786PJNNMXGZYF7M0DPY8
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.448Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1635876000000 maxt=1637625600000 ulid=01FN5BRTG8Z76G9FBG0W2BSV6Q
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.449Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1637625600000 maxt=1639375200000 ulid=01FPSGACWEDEQ8RDD0BKN41A28
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.449Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1639375200000 maxt=1641124800000 ulid=01FRDMVXTK0H1NBDVT9TDM2EPS
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.449Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1641124800000 maxt=1642874400000 ulid=01FT1SDFXK9QG2VWR696X4MXYP
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.449Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1642874400000 maxt=1644624000000 ulid=01FVNXZ2ESZCWMJ33HDM5N3QJ3
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.450Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1644624000000 maxt=1646373600000 ulid=01FXA2GTJD4RA5HMFF980ZBZSB
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.450Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1646373600000 maxt=1648123200000 ulid=01FYY72A5BH9TNFBEVWV45YAZ1
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.450Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1648123200000 maxt=1649872800000 ulid=01G0JBKSZ26PCB9YF0GFAD0K9W
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.451Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1649872801112 maxt=1651622400000 ulid=01G26Q15M6W7NZ8EQR0GY2VA8Y
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.451Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1651622400267 maxt=1653372000000 ulid=01G3TVJHYPBVP078DH933Y69HF
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.452Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1653372000206 maxt=1655121600000 ulid=01G5F04B2R8J9Y0CBEYMJEX9B5
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.452Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1655121600747 maxt=1656871200000 ulid=01G734NWGMZWKC6YJ13A0K2FK8
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.452Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1656871200747 maxt=1658620800000 ulid=01G8Q97GJNQ0Q0Y0R8CV5CC3NN
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.452Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1658620800747 maxt=1660370400000 ulid=01GABDS1FEM0K5M458GPH5X805
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.453Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1660370400087 maxt=1662120000000 ulid=01GBZJAJFTFBYHHTEXPG4NRX7H
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.453Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1662120003174 maxt=1663869600000 ulid=01GDKPW3E73AHR3FAYPP9CPSKY
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.453Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1663869601859 maxt=1665619200000 ulid=01GF7VDR42FYHGQAQTR5D1J9TJ
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.454Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1665619200747 maxt=1666202400000 ulid=01GFS7KFRPN8696D3H2AA14K50
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.455Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1666267200747 maxt=1666274400000 ulid=01GFTYH4K5NJT43BNQXRDEM7TY
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.456Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1666274400747 maxt=1666281600000 ulid=01GFV5CVVDA0MTB3Y0FDP0B6G1
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.456Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1666202400747 maxt=1666267200000 ulid=01GFV5CXBR7ZH5WQQJ10PS5KTN
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.456Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.456Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1666281600747 maxt=1666288800000 ulid=01GFVC8K35EWJM24WVPEHDR8B6
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.580Z caller=head.go:645 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.593Z caller=head.go:659 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=12.072812ms
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.593Z caller=head.go:665 component=tsdb msg="Replaying WAL, this may take a while"
Oct 21 03:21:05 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:05.857Z caller=head.go:691 component=tsdb msg="WAL checkpoint loaded"
Oct 21 03:21:06 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:06.449Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4702 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.457Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4703 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.647Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4704 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.647Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4705 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.649Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4706 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.650Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4707 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.650Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4708 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.650Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4709 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.651Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4710 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.651Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4711 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.651Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4712 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.651Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4713 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.653Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4714 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.654Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4715 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.654Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4716 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.655Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4717 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.655Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4718 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.655Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4719 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.655Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4720 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.662Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4721 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.662Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4722 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.663Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4723 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.663Z caller=head.go:717 component=tsdb msg="WAL segment loaded" segment=4724 maxSegment=4724
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.663Z caller=head.go:722 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=264.4415ms wal_replay_duration=1.805117901s total_replay_duration=2.082034151s
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.748Z caller=main.go:742 fs_type=EXT4_SUPER_MAGIC
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.749Z caller=main.go:745 msg="TSDB started"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.749Z caller=main.go:871 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=error ts=2022-10-21T03:21:07.757Z caller=manager.go:946 component="rule manager" msg="loading groups failed" err="/etc/prometheus-alerts/rules.d/rdsys.rules: 0:0: group \"rdsys\", rule 1, \"\": one of 'record' or 'alert' must be set"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=error ts=2022-10-21T03:21:07.757Z caller=main.go:891 msg="Failed to apply configuration" err="error loading rules, previous rule set restored"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.757Z caller=main.go:595 msg="Stopping scrape discovery manager..."
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.757Z caller=main.go:609 msg="Stopping notify discovery manager..."
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.757Z caller=main.go:631 msg="Stopping scrape manager..."
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.757Z caller=main.go:605 msg="Notify discovery manager stopped"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.760Z caller=main.go:591 msg="Scrape discovery manager stopped"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.761Z caller=manager.go:924 component="rule manager" msg="Stopping rule manager..."
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.761Z caller=manager.go:934 component="rule manager" msg="Rule manager stopped"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.761Z caller=main.go:625 msg="Scrape manager stopped"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.764Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=info ts=2022-10-21T03:21:07.764Z caller=main.go:799 msg="Notifier manager stopped"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=error ts=2022-10-21T03:21:07.765Z caller=main.go:808 err="error loading config from \"/etc/prometheus/prometheus.yml\": one or more errors occurred while applying the new configuration (--config.file=\"/etc/prometheus/prometheus.yml\")"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 systemd[1]: prometheus.service: Main process exited, code=exited, status=1/FAILURE
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 systemd[1]: prometheus.service: Failed with result 'exit-code'.
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 systemd[1]: prometheus.service: Consumed 2.534s CPU time.
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 1.
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 systemd[1]: prometheus.service: Consumed 2.534s CPU time.
Oct 21 03:21:08 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[35222]: level=info ts=2022-10-21T03:21:08.082Z caller=main.go:355 msg="Starting Prometheus" version="(version=2.24.1+ds, branch=debian/sid, revision=2.24.1+ds-1+b7)"
this would go on for a while: every time puppet ran, it would start prometheus, which would then fail with the same error:
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=error ts=2022-10-21T03:21:07.757Z caller=manager.go:946 component="rule manager" msg="loading groups failed" err="/etc/prometheus-alerts/rules.d/rdsys.rules: 0:0: group \"rdsys\", rule 1, \"\": one of 'record' or 'alert' must be set"
Oct 21 03:21:07 hetzner-nbg1-02/hetzner-nbg1-02 prometheus[34774]: level=error ts=2022-10-21T03:21:07.765Z caller=main.go:808 err="error loading config from \"/etc/prometheus/prometheus.yml\": one or more errors occurred while applying the new configuration (--config.file=\"/etc/prometheus/prometheus.yml\")"
... or, more clearly:
err="/etc/prometheus-alerts/rules.d/rdsys.rules: 0:0: group \"rdsys\", rule 1, \"\": one of 'record' or 'alert' must be set"
so this is probably due to a syntax error introduced in prometheus-alerts!14 (merged), but that's beside the point... what's interesting is that this went chugging along for a while... still not affecting disk space usage significantly.
systemd actually tried to restart the service repeatedly like this for a long time. it's been going on for 3 days, actually, and would have kept on going if i didn't do
systemctl stop prometheus
a few minutes ago:

root@hetzner-nbg1-02:/var/log# tac /var/log/daemon.log.1 | grep 'Scheduled restart job, restart counter' | head -1
Oct 23 00:00:00 hetzner-nbg1-02/hetzner-nbg1-02 systemd[1]: prometheus.service: Scheduled restart job, restart counter is at 19746.
that ridiculous process somehow peaked yesterday:
root@hetzner-nbg1-02:/var/log# cat /var/log/daemon.log | grep 'Scheduled restart job, restart counter' | awk 'BEGIN { m=0} { if ($NF > m) { d = $1 " " $2 " " $3; m = $NF } } END { print d " " m }'
Oct 23 08:08:12 22215.
what's strange is that none of this significantly impacted disk space until around 11:30, according to grafana, when disk usage exploded first exponentially, and then linearly:
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&var-instance=hetzner-nbg1-02.torproject.org&from=1666347239427&to=1666452688246
one thing that is peculiar though is that even though prometheus would fail to start, it would seem to move its WAL pointer every time, which meant it had a longer and longer log to replay at every startup. take for example:
root@hetzner-nbg1-02:/var/log# zgrep -o 'maxSegment=[0-9]*' syslog.3.gz | head -1
maxSegment=4724
root@hetzner-nbg1-02:/var/log# zgrep -o 'maxSegment=[0-9]*' syslog.3.gz | tail -1
maxSegment=14937
root@hetzner-nbg1-02:/var/log# grep -o 'maxSegment=[0-9]*' syslog | head -1
maxSegment=26972
root@hetzner-nbg1-02:/var/log# tac syslog | grep -o 'maxSegment=[0-9]*' | head -1
maxSegment=30073
so while this can't quite explain why disk space usage didn't start to explode at 3:21:05, it certainly gives us a good idea of why it did eventually explode. maybe there's something else going on here, but it's pretty slow grepping tens of gigabytes of logs. so i'm not sure what the root cause is, because of the discrepancy between the log dates and the actual disk space rise in grafana. but it sure as hell seems like systemd's restart procedures (or the unit configuration, to be more accurate) are to blame.
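for what it's worth, the WAL growth can also be confirmed directly on disk; a quick sketch (it assumes the debian packaging default of /var/lib/prometheus/metrics2 for --storage.tsdb.path, adjust if that differs):

du -sh /var/lib/prometheus/metrics2/wal
# the highest-numbered segment file should roughly match the maxSegment seen in the logs
ls /var/lib/prometheus/metrics2/wal | tail -3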
it seems we need to limit the number of retries systemd does to restart this service. 2-3 should be enough, it certainly shouldn't retry forever. i'll file a ticket for this in the debian package.
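a systemd drop-in along these lines would do it (just a sketch of the idea with illustrative values; the override actually deployed through puppet and the eventual debian fix may differ):

mkdir -p /etc/systemd/system/prometheus.service.d
cat > /etc/systemd/system/prometheus.service.d/override.conf <<'EOF'
[Unit]
# give up after 5 failed starts within 10 minutes instead of retrying forever
StartLimitIntervalSec=600
StartLimitBurst=5
EOF
systemctl daemon-reload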
we should also have linting in the rules repo so we catch errors like this before deployment. this is another ticket to make.
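a pre-merge check with promtool would catch this class of error; a sketch (the rules.d/*.rules glob is an assumption about the repo layout):

# promtool ships with the prometheus package
promtool check rules rules.d/*.rules

# for illustration, a minimal made-up rule with an expr but neither 'record:'
# nor 'alert:' fails the same validation seen in the logs above:
cat > /tmp/bad.rules <<'EOF'
groups:
  - name: rdsys
    rules:
      - expr: up == 0
EOF
promtool check rules /tmp/bad.rules   # fails: one of 'record' or 'alert' must be set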
to restore the service, i think i'll have no choice but to trash at least some part of the logs and move on, otherwise we're never going to get out of this hell hole.
fixed the syntax error on the server; restarting the server didn't work. prometheus still stops with error 1 after trying to replay its log.
maybe 2.3G is not enough room, i'll clear out more logs:
logrotate -f /etc/logrotate.d/syslog-ng
logrotate -f /etc/logrotate.d/syslog-ng
root@hetzner-nbg1-02:~# df -h /
Filesystem          Size  Used Avail Use% Mounted on
/dev/mapper/croot    37G   30G  5.3G  85% /
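the journal was also sitting at ~3.2 GiB in the ncdu output above; vacuuming it would have been another way to claw back space, though that's not what was done here:

# shrink the systemd journal down to a target size
journalctl --vacuum-size=500M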
- anarcat mentioned in commit prometheus-alerts@430fd7bd
syntax was fixed in prometheus-alerts@430fd7bd
next up is to file followup tickets.
- anarcat mentioned in issue prometheus-alerts#1 (closed)
- anarcat marked this issue as related to prometheus-alerts#1 (closed)
- anarcat changed the description
- anarcat changed the description
- anarcat marked the checklist item don't restart prometheus forever (filed https://bugs.debian.org/1022724, https://salsa.debian.org/go-team/packages/prometheus/-/merge_requests/5) as completed
- anarcat changed the description
- anarcat closed
- anarcat changed the incident status to Resolved by closing the incident
disk space is almost back to normal:
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&var-instance=hetzner-nbg1-02.torproject.org&from=now-4d&to=now
we have a disk usage warning on loghost that might be related to this, but i'll let the new star of the week (@kez) handle it normally.
- anarcat mentioned in issue #40942 (closed)
- anarcat marked this issue as related to #40942 (closed)
- anarcat mentioned in issue #40935 (closed)
- anarcat mentioned in commit wiki-replica@d14fc32c
- anarcat mentioned in issue #40417 (closed)