Changes

This really helps figuring out #40525.
anarcat · b11fc4fa
--- a/howto/postgresql.md
+++ b/howto/postgresql.md
@@ -170,6 +170,77 @@ In our case, there were tens of millions of rows to clean:

 That is 200 *million* tuples to cleanup!

+We can see details of the vacuum operation with this funky query,
+taken from [this amazing blog post](https://dataegret.com/2017/10/deep-dive-into-postgres-stats-pg_stat_progress_vacuum/):
+
+```
+SELECT
+p.pid,
+now() - a.xact_start AS duration,
+coalesce(wait_event_type ||'.'|| wait_event, 'f') AS waiting,
+CASE
+WHEN a.query ~*'^autovacuum.*to prevent wraparound' THEN 'wraparound'
+WHEN a.query ~*'^vacuum' THEN 'user'
+ELSE 'regular'
+END AS mode,
+p.datname AS database,
+p.relid::regclass AS table,
+p.phase,
+pg_size_pretty(p.heap_blks_total * current_setting('block_size')::int) AS table_size,
+pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
+pg_size_pretty(p.heap_blks_scanned * current_setting('block_size')::int) AS scanned,
+pg_size_pretty(p.heap_blks_vacuumed * current_setting('block_size')::int) AS vacuumed,
+round(100.0 * p.heap_blks_scanned / p.heap_blks_total, 1) AS scanned_pct,
+round(100.0 * p.heap_blks_vacuumed / p.heap_blks_total, 1) AS vacuumed_pct,
+p.index_vacuum_count,
+round(100.0 * p.num_dead_tuples / p.max_dead_tuples,1) AS dead_pct
+FROM pg_stat_progress_vacuum p
+JOIN pg_stat_activity a using (pid)
+ORDER BY now() - a.xact_start DESC
+```
+
+For example, the above vacuum on the Bacula director is in this state
+at the time of writing:
+
+```
+bacula=# \x
+Expanded display is on.
+bacula=# SELECT [...]
+-[ RECORD 1 ]------+----------------
+pid                | 534
+duration           | 10:55:24.413986
+waiting            | f
+mode               | regular
+database           | bacula
+table              | file
+phase              | scanning heap
+table_size         | 55 GB
+total_size         | 103 GB
+scanned            | 29 GB
+vacuumed           | 16 GB
+scanned_pct        | 52.2
+vacuumed_pct       | 29.3
+index_vacuum_count | 1
+dead_pct           | 93.8
+```
+
+This is a lot of information, but basically the worker with PID 513
+has been running for 10h55m on the `bacula` database. It is in the
+`scanning heap` phase, second out of 8 phases of the [vacuuming
+process](https://www.postgresql.org/docs/11/progress-reporting.html#VACUUM-PROGRESS-REPORTING). It's working on the `file` table which has has 55GB of
+data on the "heap" and a total size of 103 GB (including indexes). It
+scanned 29 GB of data (52%), vacuumed 16GB out of that (29%). The
+`dead_pct` indicates that the [`maintenance_work_mem` buffer](https://www.postgresql.org/docs/11/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM) is
+94% full, which could indicate raising that buffer could improve
+performance. I am not sure what the `waiting` and `index_vacuum_count`
+fields are for.
+
+Naturally, this will return information for very large VACUUM
+operations, which typically do not take this long. This one VACUUM
+operation was especially slow because we suddenly removed almost half
+of the old clients in the Bacula database, see [ticket
+40525](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40525) for more information.
+
 One more trick: this will show last VACUUM dates on tables:

    SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables WHERE last_vacuum IS NOT NULL or last_autovacuum IS NOT NULL ORDER BY relname;