prometheus2 ran out of disk space
Date: Mon, 24 Oct 2022 11:30:23 +0000
From: nagios@hetzner-hel1-01.torproject.org
To: anarcat+rapports@orangeseeds.org
Subject: ** PROBLEM Service Alert: hetzner-nbg1-02/disk usage - all is CRITICAL **
***** Icinga *****
Notification Type: PROBLEM
Service: disk usage - all
Host: hetzner-nbg1-02
Address: 116.203.55.206
State: CRITICAL
Date/Time: Mon Oct 24 11:30:23 UTC 2022
Additional Info:
DISK CRITICAL - free space: / 1955 MB (5% inode=95%): /dev 1897 MB (100% inode=99%): /dev/shm 1917 MB (99% inode=99%): /run 383 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /tmp 512 MB (100% inode=99%): /boot 327 MB (77% inode=99%): /run/credentials 383 MB (99% inode=99%): /var/tmp 1955 MB (5% inode=95%):
anecdotal reports of "prometheus 2 is down" as well
action points:
- lint rules before merge (already filed as prometheus-alerts#1 (closed), will follow up there)
- don't restart prometheus forever (filed https://bugs.debian.org/1022724, https://salsa.debian.org/go-team/packages/prometheus/-/merge_requests/5, deployed an override through puppet)
- prometheus shouldn't flood its logs with WAL notices if there's a syntax error (filed https://github.com/prometheus/prometheus/issues/11486)
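For the first action point, a pre-merge lint could look roughly like the sketch below. It is not what prometheus-alerts#1 actually implemented: the `*.rules` glob, the directory argument, and the function names are assumptions made for illustration. The only real tool it relies on is `promtool check rules`, which ships with Prometheus.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_lint_command(rule_files):
    """Return the promtool invocation that validates the given rule files."""
    return ["promtool", "check", "rules", *[str(p) for p in rule_files]]


def lint_rules(rules_dir):
    """Run promtool over every *.rules file in rules_dir.

    Returns promtool's exit code (0 means all files parsed cleanly),
    0 if there is nothing to check, or 127 if promtool is not installed.
    The *.rules naming is an assumption, adjust to the repository layout.
    """
    files = sorted(Path(rules_dir).glob("*.rules"))
    if not files:
        return 0
    if shutil.which("promtool") is None:
        print("promtool not found in PATH", file=sys.stderr)
        return 127
    return subprocess.run(build_lint_command(files)).returncode
```

A CI job would call `lint_rules(".")` and fail the merge request on any non-zero return value, so a rules file with a syntax error never reaches the server.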
Edited by anarcat