... | @@ -450,6 +450,63 @@ If you get this kind of errors, it's because you forgot to restore the |
... | @@ -450,6 +450,63 @@ If you get this kind of errors, it's because you forgot to restore the |
|
See also the "Direct restore procedure" troubleshooting section, which
|
|
See also the "Direct restore procedure" troubleshooting section, which
|
|
also applies here.
|
|
also applies here.
|
|
|
|
|
|
|
|
## Monitoring the VACUUM processes
|
|
|
|
|
|
|
|
In PostgreSQL, the [VACUUM command](https://www.postgresql.org/docs/current/sql-vacuum.html) "reclaims storage occupied by
|
|
|
|
dead tuples". To quote the excellent PostgreSQL documentation:
|
|
|
|
|
|
|
|
> In normal PostgreSQL operation, tuples that are deleted or obsoleted
|
|
|
|
> by an update are not physically removed from their table; they
|
|
|
|
> remain present until a VACUUM is done. Therefore it's necessary to
|
|
|
|
> do VACUUM periodically, especially on frequently-updated tables.
|
|
|
|
|
|
|
|
By default, the [autovacuum launcher](https://www.postgresql.org/docs/9.1/routine-vacuuming.html#AUTOVACUUM) is enabled in PostgreSQL (and
|
|
|
|
in our deployments), which should automatically take care of this
|
|
|
|
problem.
|
|
|
|
|
|
|
|
This will show that the autovacuum daemon is running:
|
|
|
|
|
|
|
|
# ps aux | grep [v]acuum
|
|
|
|
postgres 534 0.5 4.7 454920 388012 ? Ds 05:31 3:08 postgres: 11/main: autovacuum worker bacula
|
|
|
|
postgres 17259 0.0 0.1 331376 10984 ? Ss Nov12 0:10 postgres: 11/main: autovacuum launcher
|
|
|
|
|
|
|
|
In the above, the `launcher` is running, and we can see a worker has
|
|
|
|
been started to vacuum the `bacula` table.
|
|
|
|
|
|
|
|
If you don't see the launcher, check that it's enabled:
|
|
|
|
|
|
|
|
bacula=# SELECT name, setting FROM pg_settings WHERE name='autovacuum' or name='track_counts';
|
|
|
|
autovacuum | on
|
|
|
|
track_counts | on
|
|
|
|
|
|
|
|
Both need to be `on` for the autovacuum workers to operate. It's
|
|
|
|
possible that some tables might have autovacuum disabled, however,
|
|
|
|
see:
|
|
|
|
|
|
|
|
SELECT reloptions FROM pg_class WHERE relname='my_table';
|
|
|
|
|
|
|
|
In the above scenario, the `autovacuum worker bacula` process had been
|
|
|
|
running for *hours*, which was concerning. One way to diagnose is to
|
|
|
|
figure out how much data there is to vacuum.
|
|
|
|
|
|
|
|
This query will show the tables with dead tuples that need to be
|
|
|
|
cleaned up by the VACUUM process:
|
|
|
|
|
|
|
|
SELECT relname, n_dead_tup FROM pg_stat_user_tables where n_dead_tup > 0 order by n_dead_tup DESC LIMIT 1;
|
|
|
|
|
|
|
|
In our case, there were tens of millions of rows to clean:
|
|
|
|
|
|
|
|
bacula=# SELECT relname, n_dead_tup FROM pg_stat_user_tables where n_dead_tup > 0 order by n_dead_tup DESC LIMIT 1;
|
|
|
|
file | 183278595
|
|
|
|
|
|
|
|
That is 200 *million* tuples to cleanup!
|
|
|
|
|
|
|
|
One more trick: this will show last VACUUM dates on tables:
|
|
|
|
|
|
|
|
SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables WHERE last_vacuum IS NOT NULL or last_autovacuum IS NOT NULL ORDER BY relname;
|
|
|
|
|
|
|
|
Some of the ideas above were found on [this datadog post](https://www.datadoghq.com/blog/postgresql-vacuum-monitoring/).
|
|
|
|
|
|
## Monitoring warnings
|
|
## Monitoring warnings
|
|
|
|
|
|
TODO: there's some more information about backup handling in the
|
|
TODO: there's some more information about backup handling in the
|
... | | ... | |