Priority B metrics and alerts deployment
Deploy the following metrics/exporters with alerting rules:
- node exporter: load, uptime, swap, NTP, systemd
- blackbox: ping
- textfile: LDAP freshness
- ganeti exporter: running instances, cluster verification?
- unbound resolvers: ?
- puppet exporter: last run time, catalog failures
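
As a rough illustration of the textfile item above, here is a minimal Python sketch of a script that publishes an "LDAP freshness" gauge through the node exporter's textfile collector. The metric name `ldap_last_sync_timestamp_seconds`, the file name `ldap_freshness.prom`, and the way the last sync time is obtained are all assumptions made for this example, not something specified in this ticket or in TPA-RFC-33. The output directory is expected to be whatever `--collector.textfile.directory` points at on the host.

```python
import os
import tempfile


def render_gauge(name: str, help_text: str, value: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


def write_prom_file(directory: str, filename: str, content: str) -> str:
    """Write content atomically so the collector never reads a partial file."""
    target = os.path.join(directory, filename)
    # The temporary file does not end in .prom, so the collector ignores it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return target


def export_ldap_freshness(directory: str, last_sync_epoch: float) -> str:
    """Publish the time of the last LDAP sync (hypothetical metric name)."""
    content = render_gauge(
        "ldap_last_sync_timestamp_seconds",  # assumed name, not from the ticket
        "Unix time of the last successful LDAP sync (hypothetical metric).",
        last_sync_epoch,
    )
    return write_prom_file(directory, "ldap_freshness.prom", content)
```

An alerting rule could then compare `time()` against this gauge and fire when the difference exceeds some threshold. The right threshold depends on how often the LDAP sync is expected to run, which this ticket does not state.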
Quote from TPA-RFC-33:
We assign each Icinga check an exporter and a priority:
- A: must have, should be completed before Icinga is shutdown, as soon as possible
- B: should have, would ideally be done before Icinga is shutdown, but we can live without it for a while
- C: nice to have, we can live without it
- D: drop, we wouldn't even keep checking this in Icinga if we kept it
- E: what on earth is this thing and how do we deal with it, to review
In the appendix, the Icinga checks inventory lists every Icinga check and what should happen with it.
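Summary:

| Kind | Checks | Priority counts (A to E, non-empty cells in column order) | Exporters |
|------|--------|-----------------------------------------------------------|-----------|
| existing | 8 | 4, 4 | 1 |
| missing, existing exporter | 8 | 5, 3 | 3 |
| missing, new exporters | 8 | 4, 4 | 8 |
| DNS | 7 | 1, 6 | 3? |
| To investigate | 4 | 2, 1, 1 | 1 existing, 2 new? |
| dropped | 8 | 8 | 0 |
| delegated to service admins | 4 | 4 | 4? |
| new exporters | 0 | | 14 (priority C) |

Checks by alerting levels: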
- warning: 31
- critical: 3
- dropped: 12
See also #41633