    # HELP apache_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which apache_exporter was built.
    # HELP apache_exporter_scrape_failures_total Number of errors while scraping apache.
    # TYPE apache_exporter_scrape_failures_total counter
    apache_exporter_scrape_failures_total 18371
    # HELP apache_up Could the apache server be reached
    # TYPE apache_up gauge
    apache_up 0

Notice, however, that `apache_exporter_scrape_failures_total` was
incrementing. From there, we manually reproduced the work the
exporter was doing and fixed the issue, which involved passing the
correct argument to the exporter.
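As a reference, the `apache_exporter` just scrapes Apache's
`mod_status` page, so its work can be reproduced by hand with
something like the sketch below; the port (9117) and the
`server-status` URL are the exporter's defaults and may differ on a
given host:

    # Hypothetical commands; adjust host names, ports and the status URL.
    # What the exporter itself returns:
    curl -s http://localhost:9117/metrics | grep '^apache_'
    # What the exporter scrapes from Apache (mod_status machine-readable output):
    curl -s 'http://localhost/server-status?auto'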
### Pushgateway errors
The Pushgateway web interface provides some basic information about
the metrics it collects, and allows you to view the pending metrics
before they get scraped by Prometheus, which may be useful for
troubleshooting issues with the gateway.
To check the metrics by hand, you can pull them directly from the
Pushgateway:

    curl localhost:9091/metrics
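Conversely, you can push a throwaway metric into the gateway by hand
(the metric and job names below are just illustrative), which helps
confirm that pushes are accepted at all:

    # Push a test metric under a made-up job name, using the standard
    # Pushgateway push API (POST/PUT to /metrics/job/<job>).
    echo "some_test_metric 3.14" | \
      curl --data-binary @- http://localhost:9091/metrics/job/some_test_job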
If you get this error while pulling metrics from the Pushgateway:

    An error has occurred while serving metrics:
    collected metric "some_metric" { label:<name:"instance" value:"" > label:<name:"job" value:"some_job" > label:<name:"tag" value:"val1" > counter:<value:1 > } was collected before with the same name and label values

It's because the same metric was sent twice into the gateway, which
corrupts the state of the Pushgateway, a [known problem](https://github.com/prometheus/pushgateway/issues/232) in
earlier versions that was [fixed in 0.10](https://github.com/prometheus/pushgateway/pull/290) (Debian bullseye and later). A
workaround is simply to restart the Pushgateway (and to clear its
storage, if persistence is enabled; see the `--persistence.file`
flag).
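A hedged sketch of that workaround, assuming the Debian
`prometheus-pushgateway` package; the persistence file path, if any,
is whatever `--persistence.file` points at:

    # Stop the gateway, drop its persisted state (only if persistence is
    # enabled), then start it again with a clean slate.
    systemctl stop prometheus-pushgateway
    rm -f /path/to/persistence.file   # hypothetical path, see --persistence.file
    systemctl start prometheus-pushgateway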
## Disaster recovery
If a Prometheus/Grafana server is destroyed, it should be completely
rebuildable from Puppet. Non-configuration data should be restored
from backups, with `/var/lib/prometheus/` being sufficient to
reconstruct history. If even the backups are destroyed, history will
be lost, but the server should still recover and start tracking new
metrics.
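As a rough sketch, assuming the standard Debian layout and a backup
of the data directory available under a hypothetical
`/srv/backups/prometheus/` path, the restore boils down to:

    # Stop Prometheus so the TSDB isn't being written to during the restore.
    systemctl stop prometheus
    # Put the backed-up data directory back in place (source path is illustrative).
    rsync -a /srv/backups/prometheus/var-lib-prometheus/ /var/lib/prometheus/
    chown -R prometheus:prometheus /var/lib/prometheus
    systemctl start prometheus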
## Migrating from Munin
Here's a quick cheat sheet for people used to Munin and switching to
...
...
configured, inside Puppet, in `profile::prometheus::server::external`.
Note that it's [not possible to push timestamps](https://github.com/prometheus/pushgateway#about-timestamps) into the
Pushgateway, so it's not useful for ingesting historical data.
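As an illustration (with made-up metric and job names), a push that
carries a timestamp as the third field is simply refused by the
gateway, so there is no way to backfill history through it:

    # The trailing field is a timestamp; the Pushgateway rejects such samples.
    echo "some_metric 3.14 1700000000000" | \
      curl --data-binary @- http://localhost:9091/metrics/job/some_job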
## Pager playbook
TBD.
### Troubleshooting missing metrics
If metrics do not show up correctly in Grafana, it might be worth
checking the [Prometheus dashboard](https://prometheus.torproject.org/) itself for the same
metrics. Typically, if they do not show up in Grafana, they won't show
up in Prometheus either, but it's worth a try, even if only to see the
raw data.
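One way to look at the raw data is to query the Prometheus HTTP API
directly, for example with the built-in `up` metric (any other metric
name works the same way); the same query can also be run from the
dashboard's expression browser:

    # Instant query against the Prometheus API; returns JSON with the raw samples.
    curl -s 'https://prometheus.torproject.org/api/v1/query?query=up'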
Then, if the data truly isn't present in Prometheus, you can track
down the "target" (the exporter) responsible for it in the [`/targets`][]
listing. If the target is "unhealthy", it will be marked in red and an
error message will be shown.
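The same information is available from the targets API, which can be
filtered for unhealthy targets, for example as follows (assuming `jq`
is installed and the API is reachable from where you run this):

    # List targets that are not "up", with their last scrape error.
    curl -s 'https://prometheus.torproject.org/api/v1/targets' \
      | jq '.data.activeTargets[] | select(.health != "up")
            | {job: .labels.job, instance: .labels.instance, error: .lastError}'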