Verified Commit 7af6da4b authored by anarcat

review all nagios metrics (#40755)

parent 77c9f74a
Pipeline #167408 passed with warnings
@@ -394,18 +394,6 @@ Basically, Prometheus is similar to Munin in many ways:
without sending duplicate alerts - `munin-limits` can only run on a
single server
## Migrating from Icinga / Nagios
Key metric equivalence:
* uptime: `time() - node_boot_time_seconds` ([source](https://github.com/m-lab/prometheus-support/issues/91#issuecomment-687785774)). To count
restarts per day: `changes(process_start_time_seconds[1d])`; see
also [alerting on crash loops](https://www.robustperception.io/alerting-on-crash-loops-with-prometheus/)
* availability: `avg_over_time(up{job="node"}[7d])` ([source](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864#note_2540787))
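
As a rough illustration of what the expressions above compute, here is a minimal Python sketch that mirrors their arithmetic on hand-written samples. It does not query Prometheus: the function names and sample values are hypothetical, chosen only for this example.

```python
# Sketch only: mimics the arithmetic of the PromQL expressions above
# on plain Python lists. All names and sample values are hypothetical.


def uptime_seconds(now, boot_time):
    """Mirror of `time() - node_boot_time_seconds`."""
    return now - boot_time


def changes(samples):
    """Mirror of `changes(...)`: count how often a sample differs
    from the one immediately before it within the window."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def availability(up_samples):
    """Mirror of `avg_over_time(up[...])`: the mean of 0/1 scrape
    results, i.e. the fraction of successful scrapes."""
    return sum(up_samples) / len(up_samples)


if __name__ == "__main__":
    # Host booted 600 seconds before "now".
    print(uptime_seconds(1_700_000_600, 1_700_000_000))
    # Start time changed twice in the window: two restarts.
    print(changes([100, 100, 250, 250, 400]))
    # Three of four scrapes succeeded.
    print(availability([1, 1, 0, 1]))
```

Because `up` is 1 for a successful scrape and 0 for a failed one, its average over a window reads directly as an availability ratio.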
More ideas in [this issue](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864); follow-up on the migration in [this
issue](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40755). See also [TPA-RFC-33: Monitoring](policy/tpa-rfc-33-monitoring).
## Push metrics to the Pushgateway
The [Pushgateway][] is set up on the secondary Prometheus server
......