Then TPA needs to hook those into a new node `job` in the
`scrape_configs` section of `prometheus.yml`, which is managed by Puppet
in `profile::prometheus::server`.

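Since `prometheus.yml` is managed by Puppet, the code below is not how the file is actually produced. It is a minimal Python sketch of the data structure a `scrape_configs` job ends up as. The job name `node`, the hostnames and the helper functions are hypothetical. 9100 is the default `node_exporter` port. The sketch prints JSON, which YAML parsers generally accept, so the output shows the shape of the generated file.

```python
import json


def make_scrape_job(job_name, targets, metrics_path="/metrics", scheme="http"):
    """Return one entry for the `scrape_configs` list of prometheus.yml.

    Hypothetical helper: in TPA this structure is generated by Puppet."""
    return {
        "job_name": job_name,
        "scheme": scheme,
        "metrics_path": metrics_path,
        # static_configs is the simplest service discovery: a fixed target list
        "static_configs": [{"targets": list(targets)}],
    }


def add_job(config, job):
    """Append `job` to config["scrape_configs"], rejecting duplicate job names."""
    jobs = config.setdefault("scrape_configs", [])
    if any(existing["job_name"] == job["job_name"] for existing in jobs):
        raise ValueError(f"duplicate job_name: {job['job_name']!r}")
    jobs.append(job)
    return config


if __name__ == "__main__":
    # Hypothetical hosts; 9100 is node_exporter's default listening port.
    config = {"global": {"scrape_interval": "15s"}}
    add_job(config, make_scrape_job("node", ["host1.example.org:9100",
                                             "host2.example.org:9100"]))
    print(json.dumps(config, indent=2))
```

Prometheus refuses to load a configuration containing two jobs with the same `job_name`, which is why `add_job()` checks for duplicates before appending.
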
## Monitored services

This section lists the services monitored by each Prometheus server.

### Internal server (prometheus1)

The "internal" server scrapes all hosts managed by Puppet for
TPA. Puppet installs a [`node_exporter`](https://github.com/prometheus/node_exporter) on *all* servers, which
takes care of metrics like CPU, memory, disk usage, time accuracy, and
so on. Other exporters may then be enabled for specific services,
like email or web servers.

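Each exporter, including `node_exporter`, serves its metrics over HTTP in the Prometheus text exposition format, and Prometheus scrapes that page periodically. To show what those metrics look like, here is a deliberately simplified Python parser for that format. It skips `# HELP` and `# TYPE` comments and does not handle escaped quotes or `}` inside label values, so it is an illustration rather than a replacement for a real client library. The metric and label names in the sample are real `node_exporter` ones, but the values are made up.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp (ms).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'
)
# Simplified label matcher: assumes no escaped quotes inside label values.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition-format text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts the format's special values NaN, +Inf and -Inf
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.2e+10
"""

if __name__ == "__main__":
    for name, labels, value in parse_samples(EXAMPLE):
        print(name, labels, value)
```
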
Access to the internal server is fairly public: the metrics there are
not considered security-sensitive, and authentication is only used to
keep bots away.

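A script can query the internal server through the standard Prometheus HTTP API (`/api/v1/query`), sending credentials with each request. The minimal Python sketch below assumes HTTP basic authentication, since the text above only says "authentication". The base URL, username and password are placeholders, not real TPA values.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_query_request(base_url, promql, user, password):
    """Build (without sending) an authenticated instant query against the
    Prometheus HTTP API. Assumes HTTP basic authentication."""
    url = (base_url.rstrip("/") + "/api/v1/query?"
           + urllib.parse.urlencode({"query": promql}))
    # Basic auth: base64("user:password") in the Authorization header
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_query(request):
    """Send the request and return the `data.result` list of the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return body["data"]["result"]


if __name__ == "__main__":
    # Placeholder URL and credentials (hypothetical): replace with real ones.
    req = build_query_request("https://prometheus.example.org", "up == 0",
                              "alice", "secret")
    print(req.full_url)
```

`build_query_request()` does not touch the network, so it can be checked offline. `run_query(req)` then sends it and returns the list of series found in the `data.result` field of the response.
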
### External server (prometheus2)

The "external" server, on the other hand, is more restrictive and does
not allow public access. This is out of concern that specific metrics
might lead to timing attacks against the network and/or leak sensitive
information. The external server also explicitly does *not* scrape TPA
servers automatically: it only scrapes certain services that are
manually configured by TPA.

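Because the external server only scrapes what TPA lists explicitly, its configuration amounts to a hand-maintained allowlist of endpoints. Here is a minimal Python sketch that turns such a list of metrics URLs into `scrape_configs` entries. The job names are hypothetical, and the URLs are the bridgestrap and rdsys endpoints from the service list in this section. Whether the real configuration is organized this way, and how it authenticates to those endpoints, is not covered here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def job_from_url(job_name, url):
    """Split a full metrics URL into the scheme, target and metrics_path
    fields of a Prometheus scrape job.

    Simplifications: query strings are ignored, IPv6 literals are not handled."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme in {url!r}")
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return {
        "job_name": job_name,
        "scheme": parts.scheme,
        # Prometheus defaults metrics_path to /metrics when unset
        "metrics_path": parts.path or "/metrics",
        "static_configs": [{"targets": [f"{parts.hostname}:{port}"]}],
    }


# Hand-maintained allowlist: nothing outside it gets scraped.
# Job names are hypothetical; the URLs come from the service list.
ALLOWLIST = {
    "bridgestrap": "https://bridges.torproject.org/bridgestrap-metrics",
    "rdsys": "https://bridges.torproject.org/rdsys-backend-metrics",
}

scrape_configs = [job_from_url(name, url) for name, url in ALLOWLIST.items()]
```
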
The external server currently monitors the following services:

* [bridgestrap](https://bridges.torproject.org/bridgestrap-metrics)
* [rdsys](https://bridges.torproject.org/rdsys-backend-metrics)
* OnionPerf external nodes' `node_exporter`s
* connectivity test on (some?) bridges (using the
  [`blackbox_exporter`](https://github.com/prometheus/blackbox_exporter/))

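The `blackbox_exporter` works differently from the other targets: Prometheus does not scrape the bridge itself but the exporter's `/probe` endpoint, passing the bridge address as the `target` parameter. The usual way to express this is with relabelling, as shown in the `blackbox_exporter` README. The minimal Python sketch below builds such a job. The exporter address, the `tcp_connect` module (which must exist in the exporter's own configuration) and the bridge address (taken from a documentation IP range) are assumptions, not TPA's actual setup.

```python
from urllib.parse import urlencode


def blackbox_job(job_name, targets, exporter="localhost:9115", module="tcp_connect"):
    """Scrape job probing `targets` through a blackbox_exporter.

    Prometheus scrapes the exporter's /probe endpoint. The relabelling copies
    each original target into the `target` URL parameter, keeps it as the
    `instance` label, and points the actual scrape at the exporter.
    Exporter address and module name are hypothetical defaults."""
    return {
        "job_name": job_name,
        "metrics_path": "/probe",
        "params": {"module": [module]},
        "static_configs": [{"targets": list(targets)}],
        "relabel_configs": [
            {"source_labels": ["__address__"], "target_label": "__param_target"},
            {"source_labels": ["__param_target"], "target_label": "instance"},
            {"target_label": "__address__", "replacement": exporter},
        ],
    }


def probe_url(exporter, target, module="tcp_connect"):
    """URL equivalent to what Prometheus requests for one target."""
    return f"http://{exporter}/probe?" + urlencode({"module": module, "target": target})
```

`probe_url()` gives a URL equivalent to the one Prometheus requests for a single target (parameter order may differ), which can help when testing a probe by hand with `curl`.
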
## SLA

Prometheus is currently not doing alerting, so it doesn't have any sort