prom: fix link references, again
authored by anarcat
@@ -15,20 +15,20 @@ layer on top (see [Grafana][]).
 ## Training course plan
-- Where can I find documentation? In the wiki, in [Prometheus](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus) and
-  [Grafana](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/grafana)
+- Where can I find documentation? In the wiki, in [Prometheus service
+  page][] (this page) but also the [Grafana service page][]
 - Where do I reach the different web sites for the monitoring service?
-  See the [web dashboards section](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#web-dashboards)
+  See the [web dashboards section][]
 - Where do i watch for alerts? Join the `#tor-alerts` IRC channel! See
-  also [how to access alerting history](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#checking-alert-history)
+  also [how to access alerting history][]
 - How can we use silences to prevent some alerts from firing? See
-  [Silencing an alert in advance](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#silencing-an-alert-in-advance) and following
+  [Silencing an alert in advance][] and following
-- [Architecture overview](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#design)
-- [Alerting philosophy](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alerting-philosophy)
-- [Adding metrics](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#adding-metrics-to-applications)
-- [How to add alerts](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#writing-an-alert)
-- [Queries cheat sheet](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#queries-cheat-sheet)
-- [Alert debugging](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alert-debugging):
+- [Architecture overview][]
+- [Alerting philosophy][]
+- [Adding metrics][]
+- [How to add alerts][]
+- [Queries cheat sheet][]
+- [Alert debugging][]:
   - Alert unit tests
   - Alert routing tests
   - Ensuring the tags required for routing are there
@@ -38,6 +38,18 @@ layer on top (see [Grafana][]).
 - %"TPA-RFC-33-B: Prometheus server merge, more exporters"
 - %"TPA-RFC-33-C: Prometheus high availability, long term metrics, other exporters"
+[Alert debugging]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alert-debugging
+[Queries cheat sheet]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#queries-cheat-sheet
+[How to add alerts]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#writing-an-alert
+[Adding metrics]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#adding-metrics-to-applications
+[Alerting philosophy]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alerting-philosophy
+[Architecture overview]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#design
+[Silencing an alert in advance]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#silencing-an-alert-in-advance
+[how to access alerting history]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#checking-alert-history
+[web dashboards section]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#web-dashboards
+[Grafana service page]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/grafana
+[Prometheus service page]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus
 ## Web dashboards
 The main Prometheus web interface is available at:
@@ -400,9 +412,11 @@ blackbox exporter to the target at the moment the Prometheus server is scraping
 the exporter.
 The blackbox exporter is rather peculiar and counter-intuitive, see
-the [how to debug the blackbox exporter](#debugging-blackbox-exporter) for
+the [how to debug the blackbox exporter][] for
 more information.
+[how to debug the blackbox exporter]: #debugging-blackbox-exporter
 #### Scrape jobs
 In Prometheus's point of view, two information are needed:
@@ -501,9 +515,9 @@ Prometheus targets, except that they define what the blackbox exporter will try
 to reach. The targets can be `hostname:port` pairs or URLs, depending on the
 nature of the type of check being defined.
-See [documentation for targets in the
-repository](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/main/targets.d/README.md)
-for more details
+See [documentation for targets in the repository][] for more details
+
+[documentation for targets in the repository]: https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/main/targets.d/README.md
 ## Writing an alert
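
For illustration, the two target forms named in the hunk above (`hostname:port` pairs and URLs) might look like this in a targets file. This is only a sketch: it assumes the files follow Prometheus's file-based service discovery layout (a list of target groups), and the host names and the `team` label are hypothetical placeholders, not values from the repository.

```yaml
# Hypothetical targets file, assuming Prometheus's file-based service
# discovery format. Host names and labels are placeholders.
- targets:
    # `hostname:port` pair, e.g. for a TCP-style check
    - "host.example.org:22"
  labels:
    team: "example"
- targets:
    # full URL, e.g. for an HTTP-style check
    - "https://www.example.org/"
  labels:
    team: "example"
```

The targets documentation in the repository remains the authoritative description of the expected layout.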
@@ -527,7 +541,9 @@ Prometheus query that should evaluate to "true" (non-zero) for the
 alert to fire.
 Here is, for example, the first alert in the [`rules.d/tpa_node.rules`
-file](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/21d67a21ce9926b2eeef0e14b04bb317fb5c94c0/rules.d/tpa_node.rules):
+file][]:
+[`rules.d/tpa_node.rules` file]: https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/21d67a21ce9926b2eeef0e14b04bb317fb5c94c0/rules.d/tpa_node.rules
 ```
 - alert: JobDown
@@ -672,7 +688,7 @@ built-in functions][].
 [Prometheus template reference]: https://prometheus.io/docs/prometheus/latest/configuration/template_reference/
 [Alertmanager template reference]: https://prometheus.io/docs/alerting/latest/notifications/
-[limited set of built-in functions]: https://pkg.go.dev/text/template#hdr-Functions
+[Limited set of built-in functions]: https://pkg.go.dev/text/template#hdr-Functions
 [Golang templates]: https://pkg.go.dev/text/template
 ### Writing a playbook
@@ -840,7 +856,6 @@ space left, to avoid warning about normal write spikes.
 [metrics in your application]: #adding-metrics-to-applications
 [scraped by Prometheus]: #adding-scrape-targets
-[Alerting philosophy]: #alerting-philosophy
 [alerting rule]: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
 [recording rules documentation]: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#recording-rules
 [aggregation operators]: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
@@ -1024,9 +1039,11 @@ below.
 If you can't access the dashboard at all or if the above seems too
 complicated, [Grafana][] can be used as a debugging tool for metrics
-as well. In the [Explore](https://grafana.torproject.org/explore) section, you can input Prometheus
+as well. In the [Explore][] section, you can input Prometheus
 metrics, with auto-completion, and inspect the output directly.
+[Explore]: https://grafana.torproject.org/explore
 There's also the [Grafana availability dashboard][], see the [Alerting
 dashboards][] section for details.