prom: fix link references, again
authored Oct 05, 2024 by anarcat
service/prometheus.md
View page @
173589f0
...
...
@@ -15,20 +15,20 @@ layer on top (see [Grafana][]).
 ## Training course plan

-- Where can I find documentation? In the wiki, in [Prometheus](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus) and [Grafana](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/grafana)
+- Where can I find documentation? In the wiki, in [Prometheus service page][] (this page) but also the [Grafana service page][]
 - Where do I reach the different web sites for the monitoring service?
-  See the [web dashboards section](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#web-dashboards)
+  See the [web dashboards section][]
 - Where do I watch for alerts? Join the `#tor-alerts` IRC channel! See
-  also [how to access alerting history](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#checking-alert-history)
+  also [how to access alerting history][]
 - How can we use silences to prevent some alerts from firing? See
-  [Silencing an alert in advance](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#silencing-an-alert-in-advance) and following
-- [Architecture overview](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#design)
-- [Alerting philosophy](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alerting-philosophy)
-- [Adding metrics](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#adding-metrics-to-applications)
-- [How to add alerts](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#writing-an-alert)
-- [Queries cheat sheet](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#queries-cheat-sheet)
-- [Alert debugging](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alert-debugging):
+  [Silencing an alert in advance][] and following
+- [Architecture overview][]
+- [Alerting philosophy][]
+- [Adding metrics][]
+- [How to add alerts][]
+- [Queries cheat sheet][]
+- [Alert debugging][]:
   - Alert unit tests
   - Alert routing tests
   - Ensuring the tags required for routing are there
...
...
@@ -38,6 +38,18 @@ layer on top (see [Grafana][]).
 - %"TPA-RFC-33-B: Prometheus server merge, more exporters"
 - %"TPA-RFC-33-C: Prometheus high availability, long term metrics, other exporters"

+[Alert debugging]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alert-debugging
+[Queries cheat sheet]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#queries-cheat-sheet
+[How to add alerts]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#writing-an-alert
+[Adding metrics]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#adding-metrics-to-applications
+[Alerting philosophy]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#alerting-philosophy
+[Architecture overview]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#design
+[Silencing an alert in advance]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#silencing-an-alert-in-advance
+[how to access alerting history]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#checking-alert-history
+[web dashboards section]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus#web-dashboards
+[Grafana service page]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/grafana
+[Prometheus service page]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/prometheus
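
The pattern this revision applies throughout the page, turning inline links into collapsed reference links (`[text][]`) with their definitions gathered separately, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the tool used for the commit: the function name `to_reference_links` and the regular expression are hypothetical, and the sketch ignores code spans, link titles, and URLs that contain parentheses.

```python
import re

# Hypothetical pattern: "[label](url)" with no whitespace or ")" inside the URL.
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def to_reference_links(markdown):
    """Rewrite inline links as collapsed reference links.

    Returns the rewritten text followed by one "[label]: url" definition
    per distinct label, in order of first appearance.
    """
    definitions = {}

    def replace(match):
        label, url = match.group(1), match.group(2)
        # Normalize whitespace so a label wrapped across lines gets one key.
        key = " ".join(label.split())
        # Assumption: the first URL seen for a label wins; a real tool
        # would need to detect labels that point at different URLs.
        definitions.setdefault(key, url)
        return f"[{label}][]"

    body = INLINE_LINK.sub(replace, markdown)
    if not definitions:
        return body
    refs = "\n".join(f"[{label}]: {url}" for label, url in definitions.items())
    return f"{body.rstrip()}\n\n{refs}\n"
```

Calling `to_reference_links` on a paragraph returns the paragraph with `[text][]` links, a blank line, and then the definitions, which is the layout the added lines in this hunk follow.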
## Web dashboards
The main Prometheus web interface is available at:
...
...
@@ -400,9 +412,11 @@ blackbox exporter to the target at the moment the Prometheus server is scraping
 the exporter.

 The blackbox exporter is rather peculiar and counter-intuitive, see
-the [how to debug the blackbox exporter](#debugging-blackbox-exporter) for
+the [how to debug the blackbox exporter][] for
 more information.

+[how to debug the blackbox exporter]: #debugging-blackbox-exporter
+
 #### Scrape jobs

 From Prometheus's point of view, two pieces of information are needed:
...
...
@@ -501,9 +515,9 @@ Prometheus targets, except that they define what the blackbox exporter will try
 to reach. The targets can be `hostname:port` pairs or URLs, depending on the
 nature of the type of check being defined.

-See [documentation for targets in the repository](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/main/targets.d/README.md) for more details
+See [documentation for targets in the repository][] for more details
+
+[documentation for targets in the repository]: https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/main/targets.d/README.md
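
As a rough illustration of the two target forms mentioned above, the following Python sketch tells a URL apart from a `hostname:port` pair. The helper name `classify_target` and the accepted URL schemes are assumptions made for this example; the actual rules live in the targets documentation referenced above and may differ.

```python
def classify_target(target):
    """Classify a target string as "url" or "hostport" (hypothetical helper)."""
    # Assumption: URL targets start with an explicit http(s) scheme.
    if target.startswith(("http://", "https://")):
        return "url"
    # Otherwise expect "hostname:port" with a numeric port after the last colon.
    host, sep, port = target.rpartition(":")
    if sep and host and port.isdigit():
        return "hostport"
    raise ValueError(f"unrecognized target: {target!r}")
```

With this sketch, `classify_target("example.org:22")` yields `"hostport"`, `classify_target("https://example.org/")` yields `"url"`, and a bare hostname without a port is rejected.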
## Writing an alert
...
...
@@ -527,7 +541,9 @@ Prometheus query that should evaluate to "true" (non-zero) for the
 alert to fire.

 Here is, for example, the first alert in the [`rules.d/tpa_node.rules`
-file](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/21d67a21ce9926b2eeef0e14b04bb317fb5c94c0/rules.d/tpa_node.rules):
+file][]:
+
+[`rules.d/tpa_node.rules` file]: https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/blob/21d67a21ce9926b2eeef0e14b04bb317fb5c94c0/rules.d/tpa_node.rules

 ```
 - alert: JobDown
...
...
@@ -672,7 +688,7 @@ built-in functions][].
 [Prometheus template reference]: https://prometheus.io/docs/prometheus/latest/configuration/template_reference/
 [Alertmanager template reference]: https://prometheus.io/docs/alerting/latest/notifications/
-[limited set of built-in functions]: https://pkg.go.dev/text/template#hdr-Functions
+[Limited set of built-in functions]: https://pkg.go.dev/text/template#hdr-Functions
 [Golang templates]: https://pkg.go.dev/text/template

 ### Writing a playbook
...
...
@@ -840,7 +856,6 @@ space left, to avoid warning about normal write spikes.
 [metrics in your application]: #adding-metrics-to-applications
 [scraped by Prometheus]: #adding-scrape-targets
-[Alerting philosophy]: #alerting-philosophy
 [alerting rule]: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
 [recording rules documentation]: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/#recording-rules
 [aggregation operators]: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators
...
...
@@ -1024,9 +1039,11 @@ below.
 If you can't access the dashboard at all or if the above seems too
 complicated, [Grafana][] can be used as a debugging tool for metrics
-as well. In the [Explore](https://grafana.torproject.org/explore) section, you can input Prometheus
+as well. In the [Explore][] section, you can input Prometheus
 metrics, with auto-completion, and inspect the output directly.

+[Explore]: https://grafana.torproject.org/explore
+
 There's also the [Grafana availability dashboard][], see the [Alerting
 dashboards][] section for details.
...
...
...
...