remove references to nagios in our docs (#41816)
authored by anarcat
We stop short of rewriting all playbooks for Prometheus. Instead,
wherever we found references to Nagios, we add a reference to the task
of adding playbooks for everything (prometheus-alerts#16).
@@ -1230,10 +1230,6 @@ made public.
 This section details how the alerting setup mentioned above works.
-Note that the [Icinga][] service is still in service, but it
-is planned to eventually be shut down and replaced by the Prometheus +
-Alertmanager setup ([issue 29864][]).
 In general, the upstream documentation for alerting starts from [the
 Alerting Overview][] but it can be lacking at times. [This tutorial][]
 can be quite helpful in better understanding how things are working.
@@ -2201,10 +2197,8 @@ changed.
 ### Alertmanager
-The [Alertmanager][] is configured on the external Prometheus server
-for the metrics and anti-censorship teams to monitor the health of the
-network. It may eventually also be used to replace or enhance
-[Nagios][] ([issue 29864][]).
+The [Alertmanager][] is configured on the Prometheus servers and is
+used to send alerts over IRC and email.
 It is installed through Puppet, in
 `profile::prometheus::server::external`, but could be moved to its own
@@ -2306,9 +2300,7 @@ As you can see, Prometheus is somewhat tailored towards
 [Kubernetes][] but it can be used without it. We're deploying it with
 the `file_sd` discovery mechanism, where Puppet collects all exporters
 into the central server, which then scrapes those exporters every
-`scrape_interval` (by default 15 seconds). The architecture graph also
-shows the Alertmanager which could be used to (eventually) replace our
-Nagios deployment.
+`scrape_interval` (by default 15 seconds).
 [Kubernetes]: https://kubernetes.io/
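The `file_sd` discovery mechanism mentioned in the hunk above works by pointing Prometheus at files that list target groups. In the setup described here Puppet generates those files; the Python sketch below only illustrates the file format. The exporter inventory (`EXPORTERS`), the host names, and the `exporter` label are hypothetical and made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical exporter inventory as (host, port, exporter) tuples.
# In the setup described above, Puppet collects this information instead.
EXPORTERS = [
    ("web-01.example.org", 9100, "node"),
    ("web-02.example.org", 9100, "node"),
    ("db-01.example.org", 9187, "postgres"),
]


def build_file_sd(exporters):
    """Group exporters into file_sd target groups, one group per exporter type."""
    groups = defaultdict(list)
    for host, port, exporter in exporters:
        groups[exporter].append(f"{host}:{port}")
    # A file_sd file holds a list of {"targets": [...], "labels": {...}} objects.
    # The "exporter" label name is an assumption for this sketch.
    return [
        {"targets": sorted(targets), "labels": {"exporter": exporter}}
        for exporter, targets in sorted(groups.items())
    ]


if __name__ == "__main__":
    print(json.dumps(build_file_sd(EXPORTERS), indent=2))
```

Pointing a scrape job's `file_sd_configs` at a JSON file like this one would let Prometheus discover the targets; Prometheus watches such files and reloads them when they change, then scrapes the targets every `scrape_interval`.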
@@ -2990,14 +2982,15 @@ publicly.
 It was originally thought Prometheus could completely replace
 [Nagios][] as well [issue 29864][], but this turned out to be more
-difficult than planned. The main difficulty is that Nagios checks come
-with builtin threshold of acceptable performance. But Prometheus
-metrics are just that: metrics, without thresholds... This makes it
-more difficult to replace Nagios because a ton of alerts need to be
-rewritten to replace the existing ones. A lot of reports and
-functionality built-in to Nagios, like availability reports,
-acknowledgments and other reports, would need to be re-implemented as
-well.
+difficult than planned.
+The main difficulty is that Nagios checks come with builtin threshold
+of acceptable performance. But Prometheus metrics are just that:
+metrics, without thresholds... This made it more difficult to replace
+Nagios because a ton of alerts had to be rewritten to replace the
+existing ones.
+This was performed in [TPA-RFC-33][], over the course of 2024 and 2025.
 ## Security and risk assessment
...
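The last hunk explains why replacing Nagios was hard: a Nagios check reports its own state against built-in thresholds, while a Prometheus metric is just a number and needs a separately written alerting rule (a condition plus a `for` duration) before it becomes an alert. The Python sketch below contrasts the two in a deliberately simplified way. `nagios_status`, `AlertRule`, and the sample values are hypothetical, and the rule model ignores PromQL and Alertmanager entirely.

```python
from dataclasses import dataclass


def nagios_status(value, warning, critical):
    """Simplified Nagios-style check: the check itself maps a value to a state.

    Real Nagios plugins accept threshold ranges; plain upper bounds are an
    assumption made to keep this sketch short.
    """
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


@dataclass
class AlertRule:
    """Hypothetical, simplified stand-in for a Prometheus alerting rule.

    Prometheus only stores the metric samples; the threshold and the
    "for" duration live in a separately written rule like this one.
    """

    name: str
    threshold: float
    for_seconds: int

    def evaluate(self, samples):
        """Walk (timestamp, value) samples in order and return the rule state.

        The states mirror Prometheus's "inactive", "pending" and "firing".
        """
        pending_since = None
        for ts, value in samples:
            if value > self.threshold:
                if pending_since is None:
                    pending_since = ts  # condition just became true
            else:
                pending_since = None  # condition cleared, reset the timer
        if pending_since is None:
            return "inactive"
        if samples[-1][0] - pending_since >= self.for_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    # Hypothetical disk usage (percent), sampled every 15 seconds.
    samples = [(0, 70.0), (15, 92.0), (30, 95.0), (45, 96.0)]
    print(nagios_status(samples[-1][1], warning=80, critical=90))  # prints CRITICAL
    rule = AlertRule("DiskAlmostFull", threshold=90, for_seconds=30)
    print(rule.evaluate(samples))  # prints firing
```

The point of the contrast is where the threshold lives: with Nagios it ships with the check, while with Prometheus each threshold has to be written down again as a rule, which is the "ton of alerts" the hunk refers to.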