diff --git a/tsa/howto/prometheus.mdwn b/tsa/howto/prometheus.mdwn
index b12021250962b90b8c8d7a66331b4748196c1950..587f7ff2e47445db7031ba86654c2b9c97a3159d 100644
--- a/tsa/howto/prometheus.mdwn
+++ b/tsa/howto/prometheus.mdwn
@@ -12,6 +12,8 @@ layer on top (see [[Grafana]]).
 
 # Tutorial
 
+## Looking at pretty graphs
+
 The Prometheus web interface is available at:
 
 <https://prometheus.torproject.org>
@@ -22,6 +24,9 @@ over the last two weeks for the known servers.
 
  [this link]: https://prometheus1.torproject.org/graph?g0.range_input=2w&g0.expr=node_load5&g0.tab=0
 
+The Prometheus web interface is crude: it's better to use [[Grafana]]
+dashboards for most purposes other than debugging.
+
 # How-to
 
 ## Pager playbook
@@ -138,6 +143,56 @@ policies.
  [allow scrape job collection]: https://github.com/voxpupuli/puppet-prometheus/pull/304
  [Prometheus Puppet module]: https://github.com/voxpupuli/puppet-prometheus/
 
+### Manual node configuration
+
+External services can be monitored by Prometheus, as long as they
+expose their metrics over HTTP in the [OpenMetrics][] text format,
+like this:
+
+    metric{label="label_val"} value
+
+A real-life (simplified) example:
+
+    node_filesystem_avail_bytes{alias="alberti.torproject.org",device="/dev/sda1",fstype="ext4",mountpoint="/"} 16160059392
+
+The above says that the node alberti has the device `/dev/sda1` mounted
+on `/`, formatted as an `ext4` filesystem which has 16160059392 bytes
+(~16GB) free.
+
+ [OpenMetrics]: https://openmetrics.io/
+
+System-level metrics can easily be monitored by the secondary
+Prometheus server. This is usually done by installing the "node
+exporter", with the following steps:
+
+ * On Debian buster and later:
+
+        apt install prometheus-node-exporter
+
+ * On Debian stretch:
+
+        apt install -t stretch-backports prometheus-node-exporter
+
+   ... assuming that backports is already configured. If it isn't, a line like this in `/etc/apt/sources.list.d/backports.debian.org.list` should suffice:
+
+        deb https://deb.debian.org/debian/ stretch-backports main contrib non-free
+
+   ... followed by an `apt update`, naturally.
+
+The firewall on the machine needs to allow traffic on the exporter
+port from the server `prometheus2.torproject.org`. Then [open a
+ticket][new-ticket] for TPA to configure the target. Make sure to
+mention:
+
+ * the hostname for the exporter
+ * the port of the exporter (varies according to the exporter, 9100
+   for the node exporter)
+ * how often to scrape the target, if non-default (default: 15s)
+
+TPA then adds the target to a new node `job` under `scrape_configs`
+in `prometheus.yml`. This is done from Puppet, in
+`profile::prometheus::server`.
+
 ## SLA
 
 Prometheus is currently not doing alerting so it doesn't have any sort
@@ -172,10 +227,10 @@ and the Alertmanager can be configured with High availability.
 
 ## Issues
 
-There is no issue tracker specifically for this project, [File][] or
+There is no issue tracker specifically for this project: [File][new-ticket] or
 [search][] for issues in the [generic internal services][search]
 component.
- [File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
+ [new-ticket]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
  [search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team
 
 ## Monitoring and testing
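
The metric line format described under "Manual node configuration" can be checked by hand before filing the ticket. The following is only a minimal sketch, not part of the TPA tooling: the `parse_sample` helper and its regular expressions are hypothetical, and they assume the simple `name{label="value",...} value` form shown in this patch rather than the full OpenMetrics grammar.

```python
import re

# Hypothetical helper, for illustration only. It handles the simple
# `name{label="value",...} value` form and ignores escapes, timestamps,
# comments and the other parts of the full format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_sample(line):
    """Split one metric sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    # A sample without braces has no labels at all.
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


line = ('node_filesystem_avail_bytes{alias="alberti.torproject.org",'
        'device="/dev/sda1",fstype="ext4",mountpoint="/"} 16160059392')
name, labels, value = parse_sample(line)
print(name, labels["mountpoint"], f"{value / 1e9:.1f} GB free")
```

For anything beyond a quick sanity check, a maintained parser is a better choice than a hand-written regular expression, since the real format allows escaped quotes inside label values.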