Alert timing is one of the harder parts of Prometheus alerting to
understand, because many components are involved, and Prometheus
...
...
@@ -2429,6 +2365,106 @@ notification in a particularly flappy alert][].
[in `dispatch.go`, line 460, function `aggrGroup.run()`]:https://github.com/prometheus/alertmanager/blob/e9904f93a7efa063bac628ed0b74184acf1c7401/dispatch/dispatch.go#L460
[mysterious failure to send notification in a particularly flappy alert]:https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/issues/18
## Services
<!-- TODO: open ports, daemons, cron jobs -->
### Monitored services
These are the actual services monitored by Prometheus.
#### Internal server (`prometheus1`)
The "internal" server scrapes all hosts managed by Puppet for
TPA. Puppet installs a [`node_exporter`][] on *all* servers, which
takes care of metrics like CPU, memory, disk usage, time accuracy, and
so on. Other exporters may be enabled for specific services, such as
email or web servers.
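
To make the exporter model more concrete, here is a minimal sketch of
what a scrape of a `node_exporter` returns and how it could be read. It
assumes the plain-text exposition format that exporters serve (by
default on port 9100, under `/metrics`). The sample values and the
`parse_metrics` helper are hypothetical, made up for illustration, and
the parser ignores optional timestamps.

```python
# Hypothetical sample of node_exporter output; the values are invented.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 5e+10
"""


def parse_metrics(text):
    """Map each series (metric name plus labels) to its float value.

    Simplified on purpose: assumes no trailing timestamp on sample lines.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
print(metrics["node_load1"])
```

In practice Prometheus does this parsing itself on every scrape; the
sketch is only meant to show what kind of data each exporter exposes.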
Access to the internal server is fairly public: the metrics there are
not considered to be security sensitive and protected by