    # HELP apache_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which apache_exporter was built.
    # HELP apache_exporter_scrape_failures_total Number of errors while scraping apache.
    # TYPE apache_exporter_scrape_failures_total counter
    apache_exporter_scrape_failures_total 18371
    # HELP apache_up Could the apache server be reached
    # TYPE apache_up gauge
    apache_up 0

Notice, however, that `apache_exporter_scrape_failures_total` was
incrementing. From there, we manually reproduced the work the
exporter was doing and fixed the issue, which involved passing the
correct argument to the exporter.
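As a reference, the `apache_exporter` just scrapes Apache's
`mod_status` page, so its work can be reproduced by hand with
something like the sketch below; the port (9117) and the
`server-status` URL are the exporter's defaults and may differ on a
given host:

    # Hypothetical commands; adjust host names, ports and the status URL.
    # What the exporter itself returns:
    curl -s http://localhost:9117/metrics | grep '^apache_'
    # What the exporter scrapes from Apache (mod_status machine-readable output):
    curl -s 'http://localhost/server-status?auto'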
### Pushgateway errors
The Pushgateway web interface provides some basic information about
the metrics it collects, and allows you to view the pending metrics
before they get scraped by Prometheus, which may be useful for
troubleshooting issues with the gateway.
To check the metrics by hand, you can pull them directly from the
Pushgateway:

    curl localhost:9091/metrics
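Conversely, you can push a throwaway metric into the gateway by hand
(the metric and job names below are just illustrative), which helps
confirm that pushes are accepted at all:

    # Push a test metric under a made-up job name, using the standard
    # Pushgateway push API (POST/PUT to /metrics/job/<job>).
    echo "some_test_metric 3.14" | \
      curl --data-binary @- http://localhost:9091/metrics/job/some_test_job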
If you get this error while pulling metrics from the Pushgateway:

    An error has occurred while serving metrics:
    collected metric "some_metric" { label:<name:"instance" value:"" > label:<name:"job" value:"some_job" > label:<name:"tag" value:"val1" > counter:<value:1 > } was collected before with the same name and label values

It's because the same metric was sent twice into the gateway, which
corrupts the state of the Pushgateway, a [known problem](https://github.com/prometheus/pushgateway/issues/232) in
earlier versions that was [fixed in 0.10](https://github.com/prometheus/pushgateway/pull/290) (Debian bullseye and later). A
workaround is simply to restart the Pushgateway (and to clear its
storage, if persistence is enabled; see the `--persistence.file`
flag).
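A hedged sketch of that workaround, assuming the Debian
`prometheus-pushgateway` package; the persistence file path, if any,
is whatever `--persistence.file` points at:

    # Stop the gateway, drop its persisted state (only if persistence is
    # enabled), then start it again with a clean slate.
    systemctl stop prometheus-pushgateway
    rm -f /path/to/persistence.file   # hypothetical path, see --persistence.file
    systemctl start prometheus-pushgateway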
## Disaster recovery
If a Prometheus/Grafana server is destroyed, it should be completely
rebuildable from Puppet. Non-configuration data should be restored
from backups, with `/var/lib/prometheus/` being sufficient to
reconstruct history. If even the backups are destroyed, history will
be lost, but the server should still recover and start tracking new
metrics.
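As a rough sketch, assuming the standard Debian layout and a backup
of the data directory available under a hypothetical
`/srv/backups/prometheus/` path, the restore boils down to:

    # Stop Prometheus so the TSDB isn't being written to during the restore.
    systemctl stop prometheus
    # Put the backed-up data directory back in place (source path is illustrative).
    rsync -a /srv/backups/prometheus/var-lib-prometheus/ /var/lib/prometheus/
    chown -R prometheus:prometheus /var/lib/prometheus
    systemctl start prometheus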
## Migrating from Munin
Here's a quick cheat sheet for people used to Munin and switching to
...
...
configured, inside Puppet, in `profile::prometheus::server::external`.
Note that it's [not possible to push timestamps](https://github.com/prometheus/pushgateway#about-timestamps) into the
Pushgateway, so it's not useful for ingesting historical data.
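As an illustration (with made-up metric and job names), a push that
carries a timestamp as the third field is simply refused by the
gateway, so there is no way to backfill history through it:

    # The trailing field is a timestamp; the Pushgateway rejects such samples.
    echo "some_metric 3.14 1700000000000" | \
      curl --data-binary @- http://localhost:9091/metrics/job/some_job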
## Pager playbook
TBD.
### Troubleshooting missing metrics
If metrics do not show up correctly in Grafana, it might be worth
checking the [Prometheus dashboard](https://prometheus.torproject.org/) itself for the same
metrics. Typically, if they do not show up in Grafana, they won't show
up in Prometheus either, but it's worth a try, even if only to see the
raw data.
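One way to look at the raw data is to query the Prometheus HTTP API
directly, for example with the built-in `up` metric (any other metric
name works the same way); the same query can also be run from the
dashboard's expression browser:

    # Instant query against the Prometheus API; returns JSON with the raw samples.
    curl -s 'https://prometheus.torproject.org/api/v1/query?query=up'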
Then, if the data truly isn't present in Prometheus, you can track
down the "target" (the exporter) responsible for it in the [`/targets`][]
listing. If the target is "unhealthy", it will be marked in red and an
error message will be shown.
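The same information is available from the targets API, which can be
filtered for unhealthy targets, for example as follows (assuming `jq`
is installed and the API is reachable from where you run this):

    # List targets that are not "up", with their last scrape error.
    curl -s 'https://prometheus.torproject.org/api/v1/targets' \
      | jq '.data.activeTargets[] | select(.health != "up")
            | {job: .labels.job, instance: .labels.instance, error: .lastError}'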