... | ... | @@ -624,8 +624,73 @@ it isn't available in Debian. |
|
|
|
|
|
## Pager playbook
|
|
|
|
|
|
TODO: there's some more information about backup handling in the
|
|
|
[Debian DSA documentation](https://dsa.debian.org/howto/postgres-backup/).
|
|
|
### BASE-IS-OLD
|
|
|
|
|
|
WARNING: this procedure was taken from the [DSA wiki](https://dsa.debian.org/howto/postgres-backup/) and is
|
|
|
untested.
|
|
|
|
|
|
Our nagios check warns us when a backup server has not successfully
|
|
|
fetched a base backup recently. The causes often are that either the
|
|
|
postgres server or the backup host went down or was down during the
|
|
|
time of the weekly cronjob.
|
|
|
|
|
|
To re-run a base backup for a specific cluster, log into the backup
|
|
|
server(s) (currently `bungei`), then find the port for the affected
|
|
|
cluster:
|
|
|
|
|
|
cat /etc/dsa/postgresql-backup/base-backup-clusters
|
|
|
|
|
|
Then run this backup in a screen:
|
|
|
|
|
|
sudo -u torbackup /usr/local/bin/postgres-make-base-backups <host>:<port>
|
|
|
|
|
|
### MISSING-BASE
|
|
|
|
|
|
WARNING: this procedure was taken from the [DSA wiki](https://dsa.debian.org/howto/postgres-backup/) and is
|
|
|
untested.
|
|
|
|
|
|
If you get a Nagios warning like this:
|
|
|
|
|
|
[fasolo, dak] MISSING-BASE: dak.BASE.backuphost.debian.org-20180211-012002-fasolo.debian.org-dak-9.6-backup.tar.gz
|
|
|
|
|
|
This means that we started doing a base backup (as witnessed by a
|
|
|
.backup file next to a WAL), but for some reason we don't have the
|
|
|
corresponding base file.
|
|
|
|
|
|
root@backuphost:/srv/backups/pg/fasolo# ls -l *backup*
|
|
|
-rw------- 1 debbackup debbackup 9201093916 Jan 14 06:18 dak.BASE.backuphost.debian.org-20180114-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
|
|
|
-rw------- 1 debbackup debbackup 9227651542 Jan 21 06:25 dak.BASE.backuphost.debian.org-20180121-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
|
|
|
-rw------- 1 debbackup debbackup 9266306750 Jan 28 07:59 dak.BASE.backuphost.debian.org-20180128-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
|
|
|
-rw------- 1 debbackup debbackup 9312602089 Feb 5 11:00 dak.BASE.backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup.tar.gz
|
|
|
-rw------- 1 debbackup debbackup 9346830509 Feb 12 10:25 dak.BASE.backuphost.debian.org-20180212-094930-fasolo.debian.org-dak-9.6-backup.tar.gz
|
|
|
-rw------- 1 debbackup debbackup 353 Jan 14 06:18 dak.WAL.0000000100000033000000A6.00000028.backup
|
|
|
-rw------- 1 debbackup debbackup 350 Jan 20 11:20 dak.WAL.00000001000000350000008C.00000028.backup
|
|
|
-rw------- 1 debbackup debbackup 353 Jan 21 06:25 dak.WAL.000000010000003600000068.00000028.backup
|
|
|
-rw------- 1 debbackup debbackup 353 Jan 28 07:59 dak.WAL.0000000100000038000000E3.00000028.backup
|
|
|
-rw------- 1 debbackup debbackup 353 Feb 5 11:00 dak.WAL.000000010000003B00000090.00000028.backup
|
|
|
-rw------- 1 debbackup debbackup 350 Feb 5 15:49 dak.WAL.000000010000003B0000009B.00000108.backup
|
|
|
-rw------- 1 debbackup debbackup 353 Feb 11 10:09 dak.WAL.000000010000003D000000AC.00000028.backup
|
|
|
-rw------- 1 debbackup debbackup 353 Feb 12 10:25 dak.WAL.000000010000003E00000027.00000178.backup
|
|
|
|
|
|
`.backup` files are created on the postgres server and shipped to the
|
|
|
backup hosts whenever a base backup is initiated. We do some
|
|
|
labelling, so we know which backup host the corresponding tarball
|
|
|
should end up with. For example:
|
|
|
|
|
|
root@backuphost:/srv/backups/pg/fasolo# cat dak.WAL.000000010000003B00000090.00000028.backup
|
|
|
START WAL LOCATION: 3B/90000028 (file 000000010000003B00000090)
|
|
|
STOP WAL LOCATION: 3B/97CF2138 (file 000000010000003B00000097)
|
|
|
CHECKPOINT LOCATION: 3B/90000098
|
|
|
BACKUP METHOD: streamed
|
|
|
BACKUP FROM: master
|
|
|
START TIME: 2018-02-05 10:25:28 UTC
|
|
|
LABEL: backuphost.debian.org-20180204-012001-fasolo.debian.org-dak-9.6-backup
|
|
|
STOP TIME: 2018-02-05 10:59:50 UTC
|
|
|
|
|
|
To fix this, verify we have a later base tarball, or that we are fine
|
|
|
for some other reason, and remove the corresponding `.backup` file from
|
|
|
the backup host. In the case above, we would remove
|
|
|
`dak.WAL.000000010000003D000000AC.00000028.backup`.
|
|
|
|
|
|
### WAL-MISSING-AFTER
|
|
|
|
... | ... | @@ -1010,7 +1075,29 @@ packages. For example, in Debian bullseye: |
|
|
apt install postgresql-13
|
|
|
|
|
|
To ensure backups are properly performed, the server must also have
|
|
|
the `profile::postgres` class.
|
|
|
the `profile::postgres` class. The storage backup server also needs
|
|
|
`REPLICATION` access to make full backups. First, create a backup
|
|
|
user:
|
|
|
|
|
|
sudo -u postgres createuser -D -E -P -R -S tor-backup
|
|
|
|
|
|
Give the role replication access:
|
|
|
|
|
|
sudo -u postgres psql -c 'ALTER ROLE "tor-backup" REPLICATION;'
|
|
|
|
|
|
Add an entry to `pg_hba.conf` to allow access from the storage server
|
|
|
(currently `bungei`):
|
|
|
|
|
|
hostssl replication tor-backup 2a01:4f9:2b:1a05::2/128 md5
|
|
|
hostssl replication tor-backup 95.216.116.122/32 md5
|
|
|
|
|
|
Ensure pg is listening on all interfaces, that it is using TLS and a
|
|
|
proper auto-ca cert.
|
|
|
|
|
|
Reload db server.
|
|
|
|
|
|
Test running `postgres-make-base-backups host:port` on the storage
|
|
|
server.
|
|
|
|
|
|
### Prometheus PostgreSQL exporter deployment
|
|
|
|
... | ... | @@ -1208,11 +1295,16 @@ We use upstream's [Continuous Archiving and Point-in-Time Recovery |
|
|
(PITR)](https://www.postgresql.org/docs/11/continuous-archiving.html) which relies on postgres's "write-ahead log" (WAL) to write
|
|
|
regular "transaction logs" of the cluster to the backup host. (Think
|
|
|
of transaction logs as incremental backups.) This is configured in
|
|
|
`postgresql.conf`, using a line like this:
|
|
|
`postgresql.conf`, using a configuration like this:
|
|
|
|
|
|
track_counts = yes
|
|
|
archive_mode = on
|
|
|
wal_level = archive
|
|
|
max_wal_senders = 3
|
|
|
archive_timeout = 6h
|
|
|
archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
|
|
|
|
|
|
That is a site-specific script which reads a config file in
|
|
|
The latter is a site-specific script which reads a config file in
|
|
|
`/etc/dsa/pg-backup-file.conf` where the backup host is specified
|
|
|
(currently `torbackup@bungei.torproject.org`). That command passes the
|
|
|
WAL logs onto the backup server, over SSH. A WAL file is shipped
|
... | ... | |