... | ... | @@ -387,6 +387,18 @@ Basically, Prometheus is similar to Munin in many ways: |
without sending duplicate alerts - `munin-limits` can only run on a
single server

## Migrating from Icinga / Nagios

Key metric equivalence:

* uptime: `time() - node_boot_time_seconds`
  ([source](https://github.com/m-lab/prometheus-support/issues/91#issuecomment-687785774)).
  To count reboots per day, use
  `changes(process_start_time_seconds[1d])` (this metric is the start
  time of the scraped process, so exporter restarts count too); see
  also [alerting on crash loops](https://www.robustperception.io/alerting-on-crash-loops-with-prometheus/)
* availability: `avg_over_time(up{job="node"}[7d])` ([source](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864#note_2540787))

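To make the semantics of those expressions concrete, here is a minimal
Python sketch of what each one computes over a window of samples. It is
an illustration only, not how Prometheus implements these functions: the
function names and the sample data are hypothetical.

```python
def uptime_seconds(now: float, boot_time: float) -> float:
    """Mirrors `time() - node_boot_time_seconds`."""
    return now - boot_time


def changes(samples: list[float]) -> int:
    """Mirrors `changes(v[range])`: number of times the value differs
    from the previous sample inside the window."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def availability(up_samples: list[int]) -> float:
    """Mirrors `avg_over_time(up[range])`: fraction of successful scrapes."""
    return sum(up_samples) / len(up_samples)


# Made-up window of scraped values (assumption, for illustration only).
start_times = [1000.0, 1000.0, 5000.0, 5000.0, 9000.0]  # two restarts
up = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]                     # two failed scrapes

print(uptime_seconds(now=10000.0, boot_time=start_times[-1]))  # 1000.0
print(changes(start_times))                                    # 2
print(availability(up))                                        # 0.8
```

Since `up` is 1 for a successful scrape and 0 otherwise, its average over
the window reads directly as an availability ratio.
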
More ideas are in [this issue](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864); the migration is followed up in [this
issue](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40755). See also [TPA-RFC-33: Monitoring](policy/tpa-rfc-33-monitoring).

## Push metrics to the Pushgateway

The [Pushgateway][] is set up on the secondary Prometheus server
... | ... | @@ -903,6 +915,7 @@ anarcat, but work still remains, see [upstream issue 32](https://github.com/voxp |
details.

[puppet-prometheus]: https://github.com/voxpupuli/puppet-prometheus/

## Monitoring and testing

Prometheus doesn't have specific tests, but there *is* a test suite in
... | ... | |