Some ideas from the `#postgresql` channel on Libera:

* look at `age(query_start)` and `state` in
[`pg_stat_activity`](https://www.postgresql.org/docs/14/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW);
if a backend is waiting, also check `wait_event_type` and `wait_event`,
which can point to lock contention
* enable [`pg_stat_statements`](https://www.postgresql.org/docs/14/pgstatstatements.html) to see where the time is going,
and then dig into the queries/functions found there, possibly with
[`auto_explain`](https://www.postgresql.org/docs/current/auto-explain.html) and `auto_explain.log_nested_statements=on`
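
The ideas above could be sketched as the following `psql` session. This
is only a starting point, not a tested procedure: it assumes PostgreSQL
13 or later (for the `total_exec_time` column), superuser access, and
that `pg_stat_statements` is already listed in
`shared_preload_libraries`. The 250ms threshold and the 80-character
query truncation are arbitrary choices for illustration. It uses
`now() - query_start` rather than `age(query_start)` because the
single-argument form of `age()` counts from midnight of the current
date.

```sql
-- 1. Oldest non-idle statements first, with what each backend waits on.
--    pg_blocking_pids() lists the backends holding locks it needs.
SELECT pid,
       usename,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS query_age,
       pg_blocking_pids(pid) AS blocked_by,
       left(query, 80) AS query
  FROM pg_stat_activity
 WHERE state <> 'idle'
   AND pid <> pg_backend_pid()
 ORDER BY query_start;

-- 2. Statements that account for the most cumulative execution time.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1) AS mean_ms,
       left(query, 80) AS query
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 10;

-- 3. Log plans for slow statements in this session only, including
--    statements run inside functions. Plans go to the server log.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- assumed threshold
SET auto_explain.log_nested_statements = on;
```

A non-empty `blocked_by` array in the first query suggests the backend
is stuck behind a lock held by the listed PIDs.
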

In general, we have a few Grafana dashboards specific to PostgreSQL
(see [logs and metrics](#logs-and-metrics), below) that might help
trace performance issues. System-level statistics (disk, CPU, memory
usage) can also help pinpoint the bottleneck, so the basic node-level
Grafana dashboards are useful there too.

## Running a full backup

Backups are normally run automatically on the backup server (currently
...
...
@@ -1267,9 +1284,9 @@ the GitLab omnibus package, but metrics are not collected on other
Prometheus servers. The [Grafana](howto/grafana) server has a handful of
dashboards in various working states:

* [Postgres Overview](https://grafana.torproject.org/d/wGgaPlciz/postgres-overview) - basic dashboard with minimal metrics
* [PostgreSQL Overview (Percona)](https://grafana.torproject.org/d/IvhES05ik/postgresql-overview-percona) - mostly working
* [GitLab Omnibus - PostgreSQL](https://grafana.torproject.org/d/c_LJgXfmk/gitlab-omnibus-postgresql) - broken

We do have a Puppet class (`profile::prometheus::postgres_exporter`)
which can monitor PostgreSQL servers, but it is not deployed on all