```sh
curl http://example.com:9090/metrics
```

This should return a number of metrics, which may or may not change at each call.

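Before handing the endpoint over, it can help to check that the output actually parses as Prometheus metrics. The following is a minimal Python sketch, not an official tool. It only handles the common `name{labels} value [timestamp]` lines of the text exposition format: label values are not unescaped, and a `}` inside a label value will confuse it. The default URL is the example endpoint from above.

```python
import re
import urllib.request

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<timestamp>-?\d+))?$'
)
# One label pair inside the braces; escaped quotes are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples for every sample line in text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable metrics line: {line!r}")
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(url="http://example.com:9090/metrics", timeout=10):
    """Fetch the raw exporter output, like the curl command above."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics())` should return a non-empty list. An exception or an empty list suggests the endpoint is not serving Prometheus metrics.
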
From there on, provide that endpoint to the sysadmins (or someone with access to the external monitoring server), who will follow the procedure below to add it as a scrape target in Prometheus.

Once the exporter is hooked into Prometheus, you can browse the metrics directly at <https://prometheus.torproject.org>. Graphs […] those need to be created and committed into git by sysadmins to persist; see the [anarcat dashboard directory](https://gitlab.com/anarcat/grafana-dashboards) for more information.

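Once the target is scraped, one way to confirm it is healthy is the standard `up` metric. Prometheus sets it to 1 when a scrape succeeds and to 0 when it fails. Below is a minimal Python sketch against the Prometheus HTTP API (`/api/v1/query`). It assumes the server is reachable without authentication, which may well not be the case for <https://prometheus.torproject.org>, and the host name in the usage example is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_instant_vector(payload):
    """Map each series' `instance` label to its value from a query response."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise RuntimeError(f"query failed: {data.get('error')}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in data["data"]["result"]
    }


def query(base_url, promql, timeout=10):
    """Run an instant query; assumes no authentication is required."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(response.read().decode("utf-8"))
```

For example, `query("https://prometheus.torproject.org", 'up{instance=~"targetone.example.com.*"}')` should map the matching instance to `1.0` if scrapes succeed. An empty result means Prometheus does not know about the target yet.
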
## Adding targets on the external server

Alerts and scrape targets on the external server are managed through a Git repository called [prometheus-alerts](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts). To add a scrape target:

1. clone the repository:

   ```sh
   git clone https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/
   cd prometheus-alerts
   ```

2. assuming you're adding a node exporter, create a targets file listing the hosts to scrape:

   ```sh
   cat > targets.d/node_myproject.yaml <<EOF
   # scrape the external node exporters for project Foo
   ---
   - targets:
       - targetone.example.com
       - targettwo.example.com
   EOF
   ```

3. add, commit, and push:

   ```sh
   git checkout -b myproject
   git add targets.d
   git commit -m "add node exporter targets for my project"
   git push -u origin myproject
   ```

The last push command should show you the URL where you can submit your merge request.

Once the merge request is merged, the changes should propagate within [4 to 6 hours](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/puppet/#cron-and-scheduling).

See also the [targets.d documentation in the git repository](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/tree/main/targets.d).

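If you have more than a couple of hosts, you can also generate the targets file from step 2. The following is a hypothetical helper sketched in Python; `render_targets` is not part of the repository. It assumes that `targets.d` files follow the Prometheus file-based service discovery layout shown above: a list of groups, each with a `targets` list and optional `labels`.

```python
import json
import re

# Rough "hostname" or "hostname:port" check; an assumption, not the
# repository's own validation rules.
TARGET_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?(:[0-9]{1,5})?$")


def render_targets(comment, hosts, labels=None):
    """Return the YAML text of a single-group targets file."""
    for host in hosts:
        if not TARGET_RE.match(host):
            raise ValueError(f"invalid target: {host!r}")
    lines = [f"# {comment}", "---", "- targets:"]
    lines += [f"    - {host}" for host in hosts]
    if labels:
        lines.append("  labels:")
        # json.dumps() quoting doubles as a valid YAML double-quoted string.
        lines += [f"    {key}: {json.dumps(value)}" for key, value in sorted(labels.items())]
    return "\n".join(lines) + "\n"
```

The output can be written to `targets.d/node_myproject.yaml` and committed as in step 3.
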
## Adding targets on the internal server

TODO: talk about `scrape_jobs` for in-puppet configurations.

[…] [Alerting Overview](https://prometheus.io/docs/alerting/latest/overview/) but I […] have instead been following [this tutorial](https://ashish.one/blogs/setup-alertmanager/), which was quite helpful.

### Adding alerts in Puppet

The Alertmanager can be managed through Puppet, in `profile::prometheus::server::external`, although it currently isn't on the external server.

In Puppet, an alerting rule is defined like this:

```puppet
{
  'name' => 'bridgestrap',
  # […]
  ],
},
```

The key part of the alert is the `expr` setting, a PromQL expression that, when it evaluates to "true" (returns results) for `5m` (the `for` setting), fires an alert at the Alertmanager. Also note the `team` label, which routes the message to the right team. Those routes are defined later, in the `routes` and `receivers` settings. Note that we might want to move these rules to Hiera so that we could use YAML directly, which would better match the syntax of the actual alerting rules.

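The following toy Python model makes the `for` and `team` semantics concrete. It is not Prometheus or Alertmanager code. An alert is *pending* while the expression has been true for less than the `for` duration and *firing* once it has been true for at least that long. It resets as soon as the expression stops returning results. The routing half is heavily simplified: the first route whose `team` label matches wins, and there is no `continue`, grouping, or inhibition. The team and receiver names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRuleState:
    """Tracks one alert across evaluations (toy model of the `for` setting)."""

    for_seconds: float
    active_since: Optional[float] = None  # when expr first returned results

    def evaluate(self, now: float, expr_is_true: bool) -> str:
        """Return "inactive", "pending" or "firing" for one evaluation."""
        if not expr_is_true:
            # The condition cleared: the pending/firing alert is reset.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


def route(labels: dict, routes: list, default_receiver: str) -> str:
    """Return the receiver of the first (team, receiver) route matching labels."""
    for team, receiver in routes:
        if labels.get("team") == team:
            return receiver
    return default_receiver
```

With `for: 5m`, a spike shorter than five minutes stays pending and never reaches the Alertmanager. Only pending alerts that survive the whole duration are sent on and routed by their `team` label.
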
### Adding alerts through Git, on the external server

The external server regularly pulls a [git repository](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/) for alerting rules and targets. Alerts can be added through that repository by adding a file in the `rules.d` directory; see the [rules.d](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/tree/main/rules.d) directory for more documentation.

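For reference, Prometheus rule files nest alerting rules under `groups`, then `rules`, with `alert`, `expr`, `for`, `labels` and `annotations` keys. The Python sketch below renders such a file. The alert name, expression, team and summary are hypothetical, and the `rules.d` documentation remains the authority on file naming and conventions in that repository. If you have `promtool` installed, `promtool check rules FILE` is a cheap way to catch syntax errors before pushing.

```python
import json


def alerting_rule(alert, expr, duration, team, summary):
    """Build one alerting rule as a plain dict (keys follow Prometheus rule files)."""
    return {
        "alert": alert,
        "expr": expr,
        "for": duration,
        "labels": {"team": team},
        "annotations": {"summary": summary},
    }


def render_rules(group_name, rules):
    """Render a rule group as YAML text.

    json.dumps() is used to quote scalars: a JSON string is also a valid
    YAML double-quoted string, so PromQL quotes and braces stay intact.
    """
    q = json.dumps
    out = ["groups:", f"  - name: {q(group_name)}", "    rules:"]
    for rule in rules:
        out.append(f"      - alert: {q(rule['alert'])}")
        out.append(f"        expr: {q(rule['expr'])}")
        out.append(f"        for: {q(rule['for'])}")
        for section in ("labels", "annotations"):
            out.append(f"        {section}:")
            for key, value in sorted(rule[section].items()):
                out.append(f"          {key}: {q(value)}")
    return "\n".join(out) + "\n"
```
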
Note that new alerts probably do not take effect until a sysadmin reloads Prometheus.

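Since it is not certain when new rules take effect, one way to check is to ask Prometheus which alerting rules it has actually loaded, through the `/api/v1/rules` endpoint of its HTTP API. The following is a minimal Python sketch. It assumes that you pass in the base URL of the server that loads the rules (not given on this page) and that its API needs no authentication.

```python
import json
import urllib.request


def loaded_alert_names(payload):
    """Return the names of all alerting rules in a /api/v1/rules response."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise RuntimeError(f"rules API error: {data.get('error')}")
    return {
        rule["name"]
        for group in data["data"]["groups"]
        for rule in group["rules"]
        if rule.get("type") == "alerting"
    }


def fetch_rules(base_url, timeout=10):
    """Fetch alerting rules only (type=alert); assumes no authentication."""
    url = base_url.rstrip("/") + "/api/v1/rules?type=alert"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If your new alert's name is missing from `loaded_alert_names(fetch_rules(base_url))`, the rule has not been loaded yet, for example because Prometheus has not been reloaded.
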
TODO: confirm how rules are deployed.

### Adding alert recipients