... | ... | @@ -170,6 +170,77 @@ In our case, there were tens of millions of rows to clean: |
|
|
|
|
|
That is 200 *million* tuples to cleanup!
|
|
|
|
|
|
We can see details of the vacuum operation with this funky query,
|
|
|
taken from [this amazing blog post](https://dataegret.com/2017/10/deep-dive-into-postgres-stats-pg_stat_progress_vacuum/):
|
|
|
|
|
|
```
|
|
|
SELECT
|
|
|
p.pid,
|
|
|
now() - a.xact_start AS duration,
|
|
|
coalesce(wait_event_type ||'.'|| wait_event, 'f') AS waiting,
|
|
|
CASE
|
|
|
WHEN a.query ~*'^autovacuum.*to prevent wraparound' THEN 'wraparound'
|
|
|
WHEN a.query ~*'^vacuum' THEN 'user'
|
|
|
ELSE 'regular'
|
|
|
END AS mode,
|
|
|
p.datname AS database,
|
|
|
p.relid::regclass AS table,
|
|
|
p.phase,
|
|
|
pg_size_pretty(p.heap_blks_total * current_setting('block_size')::int) AS table_size,
|
|
|
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
|
|
|
pg_size_pretty(p.heap_blks_scanned * current_setting('block_size')::int) AS scanned,
|
|
|
pg_size_pretty(p.heap_blks_vacuumed * current_setting('block_size')::int) AS vacuumed,
|
|
|
round(100.0 * p.heap_blks_scanned / p.heap_blks_total, 1) AS scanned_pct,
|
|
|
round(100.0 * p.heap_blks_vacuumed / p.heap_blks_total, 1) AS vacuumed_pct,
|
|
|
p.index_vacuum_count,
|
|
|
round(100.0 * p.num_dead_tuples / p.max_dead_tuples,1) AS dead_pct
|
|
|
FROM pg_stat_progress_vacuum p
|
|
|
JOIN pg_stat_activity a using (pid)
|
|
|
ORDER BY now() - a.xact_start DESC
|
|
|
```
|
|
|
|
|
|
For example, the above vacuum on the Bacula director is in this state
|
|
|
at the time of writing:
|
|
|
|
|
|
```
|
|
|
bacula=# \x
|
|
|
Expanded display is on.
|
|
|
bacula=# SELECT [...]
|
|
|
-[ RECORD 1 ]------+----------------
|
|
|
pid | 534
|
|
|
duration | 10:55:24.413986
|
|
|
waiting | f
|
|
|
mode | regular
|
|
|
database | bacula
|
|
|
table | file
|
|
|
phase | scanning heap
|
|
|
table_size | 55 GB
|
|
|
total_size | 103 GB
|
|
|
scanned | 29 GB
|
|
|
vacuumed | 16 GB
|
|
|
scanned_pct | 52.2
|
|
|
vacuumed_pct | 29.3
|
|
|
index_vacuum_count | 1
|
|
|
dead_pct | 93.8
|
|
|
```
|
|
|
|
|
|
This is a lot of information, but basically the worker with PID 513
|
|
|
has been running for 10h55m on the `bacula` database. It is in the
|
|
|
`scanning heap` phase, second out of 8 phases of the [vacuuming
|
|
|
process](https://www.postgresql.org/docs/11/progress-reporting.html#VACUUM-PROGRESS-REPORTING). It's working on the `file` table which has has 55GB of
|
|
|
data on the "heap" and a total size of 103 GB (including indexes). It
|
|
|
scanned 29 GB of data (52%), vacuumed 16GB out of that (29%). The
|
|
|
`dead_pct` indicates that the [`maintenance_work_mem` buffer](https://www.postgresql.org/docs/11/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM) is
|
|
|
94% full, which could indicate raising that buffer could improve
|
|
|
performance. I am not sure what the `waiting` and `index_vacuum_count`
|
|
|
fields are for.
|
|
|
|
|
|
Naturally, this will return information for very large VACUUM
|
|
|
operations, which typically do not take this long. This one VACUUM
|
|
|
operation was especially slow because we suddenly removed almost half
|
|
|
of the old clients in the Bacula database, see [ticket
|
|
|
40525](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40525) for more information.
|
|
|
|
|
|
One more trick: this will show last VACUUM dates on tables:
|
|
|
|
|
|
SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables WHERE last_vacuum IS NOT NULL or last_autovacuum IS NOT NULL ORDER BY relname;
|
... | ... | |