prometheus: add more details about how blackbox jobs+targets are set
authored by lelutin
We have some of those details in the `blackbox_exporter` section and
really it's related to how scrape targets are defined.

Also, to make it more digestible, the section about configuring blackbox
targets is now split into scrape jobs and targets, and a bit more context
is given about the two.
......@@ -361,24 +361,49 @@ Those rules are declared on the server, in `prometheus::prometheus::server::inte
### Adding a `blackbox` target
Adding targets to the `blackbox` exporter is rather more involved and
complicated than a normal target.
Most exporters are pretty straightforward: a service binds to a port and exposes
metrics through HTTP requests on that port, generally on the `/metrics` URL.
For example, this is how the ssh scrape jobs (in
`modules/profile/manifests/ssh.pp`) are created:
The blackbox exporter is a special case for exporters: it is scraped by
Prometheus via multiple scrape jobs and each scrape job has targets defined.
    @@prometheus::scrape_job { "blackbox_ssh_banner_${facts['networking']['fqdn']}":
      job_name => 'blackbox_ssh_banner',
      targets  => [ "${facts['networking']['fqdn']}:22" ],
      labels   => {
        'alias' => $facts['networking']['fqdn'],
        'team'  => 'TPA',
      },
    }
Each scrape job represents one type of check (e.g. TCP connections, HTTP
requests, ICMP ping, etc.) that the blackbox exporter runs, and each target is
a host, URL, or other "address" that the exporter will try to reach. The check
is initiated from the host running the blackbox exporter towards the target at
the moment the Prometheus server scrapes the exporter.
The `blackbox_exporter` is rather peculiar and counter-intuitive, see
the [how to debug the `blackbox_exporter`](#debugging-blackbox_exporter) for
more information.
#### Scrape jobs
But because this is a `blackbox` configuration, the `scrape_configs`
configuration is more involved, as it needs to define the
`relabel_configs` element that make the `blackbox_exporter` work:
From Prometheus's point of view, two pieces of information are needed:
* the address and port of the host where Prometheus can reach the blackbox exporter
* the target (and possibly the port tested) that the exporter will try to reach
Prometheus transfers the information above to the exporter via two labels:
* `__address__` is used to determine how Prometheus can reach the exporter. This
is standard, but because of how we create the blackbox targets, it will
initially contain the address of the blackbox target instead of the
exporter's. So we need to shuffle label values around in order for the
`__address__` label to contain the correct value.
* `__param_target` is used by the blackbox exporter to determine
what it should contact when running its test, i.e. what is the target of the
check. So that's the address (and port) of the blackbox target.
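To make the role of the two labels concrete, here is a minimal Python sketch of how a scrape URL could be derived from a target's final labels. It is an illustration rather than Prometheus's actual code: the `probe_url` helper and the `host.example.org` target are hypothetical, and the sketch only assumes the documented behaviour that each `__param_<name>` label is sent as a `<name>=` URL parameter.

```python
from urllib.parse import urlencode


def probe_url(labels: dict[str, str]) -> str:
    """Build the URL a scrape would request from a target's final labels."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    # Every `__param_<name>` label becomes a `<name>=` query parameter.
    params = {
        name[len("__param_"):]: value
        for name, value in sorted(labels.items())
        if name.startswith("__param_")
    }
    query = f"?{urlencode(params)}" if params else ""
    return f"{scheme}://{labels['__address__']}{path}{query}"


# Hypothetical labels once `__address__` points at the exporter.
labels = {
    "__address__": "localhost:9115",
    "__metrics_path__": "/probe",
    "__param_target": "host.example.org:22",
}
print(probe_url(labels))
```

With these labels, Prometheus contacts the exporter on `localhost:9115`, and the exporter learns which host to check from the `target=` parameter.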
The reshuffling of labels mentioned above is achieved with the `relabel_configs`
option for the scrape job.
For TPA-managed services, we define these scrape jobs in Hiera in
`common/prometheus.yml` under keys named `collect_scrape_jobs`. Jobs in those
keys expect targets to be exported by other parts of the Puppet code.
For example, here's how the ssh scrape job is configured:
- job_name: 'blackbox_ssh_banner'
metrics_path: '/probe'
......@@ -396,8 +421,10 @@ configuration is more involved, as it needs to define the
replacement: 'localhost:9115'
Scrape jobs for non-TPA services are defined in Hiera under keys named
`scrape_configs` in `hiera/common/prometheus.yaml`. Here's one example of such a
scrape job definition:
`scrape_configs` in `hiera/common/prometheus.yaml`. Jobs in those keys expect to
find their targets in files on the Prometheus server, through the
`prometheus-alerts` repository. Here's one example of such a scrape job
definition:
profile::prometheus::server::external::scrape_configs:
# generic blackbox exporters from any team
......@@ -417,9 +444,41 @@ scrape job definition:
- target_label: __address__
replacement: localhost:9115
The `blackbox_exporter` is rather peculiar and counter-intuitive, see
the [`blackbox_exporter` reference section](#blackbox-exporter) for
more information.
In both of the examples, the `relabel_configs` starts by copying the target's
address into the `__param_target` label. It also populates the `instance` label
with the same value since that label is used in alerts and graphs to display
information. Finally, the `__address__` label is overridden with the address
where Prometheus can reach the exporter.
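The three relabeling steps described above can be traced with a small Python simulation. This is a simplified sketch that only covers the default `replace` action, with the default separator (`;`), regex (`(.*)`) and replacement (`$1`). The `relabel` function and the `host.example.org` target are hypothetical and are not part of Prometheus or of our Puppet code.

```python
import re


def relabel(labels, rules):
    """Apply simplified `replace` relabel rules, in order, to a copy of labels."""
    labels = dict(labels)
    for rule in rules:
        # Join the source label values with ';', the default separator.
        source = ";".join(
            labels.get(name, "") for name in rule.get("source_labels", [])
        )
        # The regex is anchored and defaults to '(.*)'.
        match = re.fullmatch(rule.get("regex", "(.*)"), source)
        if match is None:
            continue  # a rule that does not match leaves the labels untouched
        replacement = rule.get("replacement", "$1")
        # Expand $1, $2, ... with the regex capture groups.
        value = re.sub(
            r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement
        )
        labels[rule["target_label"]] = value
    return labels


rules = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "localhost:9115"},
]

before = {"__address__": "host.example.org:22"}
after = relabel(before, rules)
for name in sorted(after):
    print(f"{name} = {after[name]}")
```

The order of the rules matters: the target's address has to be copied into `__param_target` before `__address__` is overwritten with the exporter's address.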
#### Targets
TPA-managed services use Puppet exported resources in the appropriate profiles.
The `targets` parameter is used to convey information about the blackbox
exporter target (the host being tested by the exporter).
For example, this is how the ssh scrape jobs (in
`modules/profile/manifests/ssh.pp`) are created:
    @@prometheus::scrape_job { "blackbox_ssh_banner_${facts['networking']['fqdn']}":
      job_name => 'blackbox_ssh_banner',
      targets  => [ "${facts['networking']['fqdn']}:22" ],
      labels   => {
        'alias' => $facts['networking']['fqdn'],
        'team'  => 'TPA',
      },
    }
For non-TPA services, the targets need to be defined in the `prometheus-alerts`
repository.
The targets defined this way for the blackbox exporter look exactly like normal
Prometheus targets, except that they define what the blackbox exporter will try
to reach. The targets can be `hostname:port` pairs or URLs, depending on the
type of check being defined.
See [documentation for targets in the
repository](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/main/targets.d/README.md)
for more details.
## Writing an alert
......@@ -1966,43 +2025,7 @@ would otherwise be around long enough for Prometheus to scrape their
metrics. We use it as a workaround to bridge Metrics data with
Prometheus/Grafana.
## `blackbox_exporter`
Most exporters are pretty straightforward: a service binds to a port and exposes
metrics through HTTP requests on that port, generally on the `/metrics` URL.
The `blackbox_exporter`, however, is a little bit more contrived. The exporter can
be configured to run a bunch of different tests (e.g. TCP connections, HTTP
requests, ICMP ping, etc) for a list of targets of its own. So the Prometheus
server has one target, the host with the port for the `blackbox_exporter`, but
that exporter in turn is set to check other hosts.
This is done with the [`relabel_config`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) setting in the job
definition, like this:
    relabel_configs:
      - source_labels:
          - '__address__'
        target_label: '__param_target'
      - source_labels:
          - '__param_target'
        target_label: 'instance'
      - target_label: '__address__'
        replacement: 'localhost:9115'
What this does is replace the `__address__` magic label with
`localhost:9115` so that Prometheus, instead of trying to scrape the
`instance` directly, will scrape the `blackbox_exporter` (here running
on `localhost:9115`), and *pass* it the `instance` through the
`__param_target` parameter. (Actually, Prometheus converts
`__param_target` to `target=` on the scrape job.)
TODO: mention blackbox alias:
```
+ - source_labels: [__param_target]
+ target_label: alias
```
## Debugging `blackbox_exporter`
The [upstream documentation][] has some details that can help. We also
have examples [above][] for how to configure it in our setup.