howto/prometheus.md +33 −0

@@ -249,6 +249,39 @@

Then TPA needs to hook those as part of a new node `job` in the
`scrape_configs`, in `prometheus.yml`, from Puppet, in
`profile::prometheus::server`.

## Monitored services

These are the services actually monitored by Prometheus.

### Internal server (prometheus1)

The "internal" server scrapes all hosts managed by Puppet for
TPA. Puppet installs a [`node_exporter`](https://github.com/prometheus/node_exporter)
on *all* servers, which takes care of metrics like CPU, memory, disk
usage, time accuracy, and so on. Other exporters might then be enabled
on specific services, like email or web servers.

Access to the internal server is fairly public: the metrics there are
not considered security sensitive and are protected by authentication
only to keep bots away.

### External server (prometheus2)

The "external" server, on the other hand, is more restrictive and does
not allow public access. This is out of concern that specific metrics
might lead to timing attacks against the network and/or leak sensitive
information. The external server also explicitly does *not* scrape TPA
servers automatically: it only scrapes certain services that are
manually configured by TPA.

These are the services currently monitored by the external server:

 * [bridgestrap](https://bridges.torproject.org/bridgestrap-metrics)
 * [rdsys](https://bridges.torproject.org/rdsys-backend-metrics)
 * OnionPerf external nodes' `node_exporter`s
 * connectivity test on (some?) bridges (using the
   [`blackbox_exporter`](https://github.com/prometheus/blackbox_exporter/))

## SLA

Prometheus is currently not doing alerting so it doesn't have any sort