Commit d166edce authored by anarcat

analyze the PostgreSQL backup system design

This is based on a chat with weasel and an audit of the Puppet code
and scripts.
parent eddcfd5d
@@ -25,6 +25,55 @@ In our configuration, the *Admin workstation*, *Database server*and
See the [introduction to Bacula](https://www.bacula.org/9.4.x-manuals/en/main/What_is_Bacula.html#SECTION00220000000000000000) for more information on those
distinctions.
## PostgreSQL backup system
Database backups are handled specially. We use PostgreSQL (postgres)
everywhere apart from a few rare exceptions (currently only CiviCRM)
and therefore use postgres-specific configurations to do backups of
all our servers.
One mechanism we use is upstream's [Continuous Archiving and
Point-in-Time Recovery (PITR)](https://www.postgresql.org/docs/9.3/continuous-archiving.html) which relies on postgres's
"write-ahead log" (WAL) to write regular "snapshots" of the database
to the backup host. This is configured in `postgresql.conf`, using a
line like this:

    archive_command = '/usr/local/bin/pg-backup-file main WAL %p'

That is a site-specific script which reads its configuration from
`/etc/dsa/pg-backup-file.conf`, where the backup host is specified
(currently `torbackup@bungei.torproject.org`). The script sends the
WAL files, which are rotated at least every 6 hours
(`archive_timeout`), to the backup server over SSH. On the backup
server, the `command` option in the `authorized_keys` file is set to
`debbackup-ssh-wrap`, which takes the `store-file pg` argument to
write the file to the right location.
WAL files are written to `/srv/backups/pg/$HOSTNAME`, where `$HOSTNAME`
is the short hostname (without `.torproject.org`). WAL files are
prefixed with `main.WAL.`, followed by a long unique string,
e.g. `main.WAL.00000001000000A40000007F`.
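
The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not part of the actual `pg-backup-file` or `debbackup-ssh-wrap` scripts: the function names `parse_archived_wal` and `backup_dir` are hypothetical. It relies on the standard PostgreSQL layout of a WAL segment name (24 hexadecimal digits: timeline, log and segment, 8 digits each) and does not handle other files PostgreSQL may archive, such as timeline history files.

```python
import re
from typing import NamedTuple

# A PostgreSQL WAL segment name is 24 hex digits:
# 8 for the timeline, 8 for the log number, 8 for the segment.
WAL_RE = re.compile(
    r"^(?P<cluster>[^.]+)\.WAL\."
    r"(?P<timeline>[0-9A-F]{8})(?P<log>[0-9A-F]{8})(?P<segment>[0-9A-F]{8})$"
)


class ArchivedWal(NamedTuple):
    cluster: str
    timeline: int
    log: int
    segment: int


def parse_archived_wal(name: str) -> ArchivedWal:
    """Split an archived WAL file name (hypothetical helper) into its parts."""
    m = WAL_RE.match(name)
    if m is None:
        raise ValueError(f"not an archived WAL segment name: {name!r}")
    return ArchivedWal(
        m["cluster"],
        int(m["timeline"], 16),
        int(m["log"], 16),
        int(m["segment"], 16),
    )


def backup_dir(fqdn: str, domain: str = ".torproject.org") -> str:
    """Return the per-host directory on the backup server for a given FQDN."""
    short = fqdn[: -len(domain)] if fqdn.endswith(domain) else fqdn
    return f"/srv/backups/pg/{short}"
```

For the example above, `parse_archived_wal("main.WAL.00000001000000A40000007F")` yields cluster `main`, timeline 1, log `0xA4` and segment `0x7F`.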
For that system to work, we also need *full* backups to happen on a
regular basis. This happens straight from the backup server (still
`bungei`), which connects to the various postgres servers and runs
[pg_basebackup](https://manpages.debian.org/pg_basebackup) to get a complete snapshot of the database. This
happens *weekly* through `postgres-make-base-backups`, a wrapper
(based on a Puppet `concat::fragment` template) that calls
`postgres-make-one-base-backup` for each postgres server.
The base files are written to the same directory as the WAL files and
are named using the template:

    $CLUSTER.BASE.$SERVER_FQDN-$DATE-$ID-$CLIENT_FQDN-$CLUSTER-$VERSION-backup.tar.gz

... for example:

    main.BASE.bungei.torproject.org-20190804-214510-troodi.torproject.org-main-9.6-backup.tar.gz

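
To make the template concrete, here is a minimal sketch that splits a base backup file name back into its fields. It is not taken from the backup scripts: the helper name `parse_base_backup` is hypothetical, and the pattern assumes that `$DATE` is eight digits and `$ID` is a run of digits, as in the example above.

```python
import re

# Assumptions: $DATE is eight digits and $ID is numeric, as in the
# example file name; the cluster name appears twice and must match.
BASE_RE = re.compile(
    r"^(?P<cluster>[^.]+)\.BASE\."
    r"(?P<server>.+?)-(?P<date>\d{8})-(?P<id>\d+)-"
    r"(?P<client>.+?)-(?P=cluster)-(?P<version>\d+(?:\.\d+)?)"
    r"-backup\.tar\.gz$"
)


def parse_base_backup(name: str) -> dict:
    """Return the template fields of a base backup file name as a dict."""
    m = BASE_RE.match(name)
    if m is None:
        raise ValueError(f"not a base backup file name: {name!r}")
    return m.groupdict()
```

Applied to the example file name, this returns `bungei.torproject.org` as the server, `troodi.torproject.org` as the client, `main` as the cluster and `9.6` as the version.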
Backups are checked for freshness in Nagios using the
`dsa-check-backuppg` plugin with its configuration stored in
`/etc/dsa/postgresql-backup/dsa-check-backuppg.conf.d/`, per cluster.
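
As a rough illustration of what a freshness check involves, the sketch below looks for the newest base backup in a directory and compares its age to a threshold. It is not the logic of `dsa-check-backuppg`, which was not reviewed here: the function names and the 8-day threshold (one week plus a day of slack) are assumptions made for the example.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Assumption: base backups run weekly, so allow a week plus one day of slack.
MAX_BASE_AGE = timedelta(days=8)


def newest_base_mtime(directory: Path) -> Optional[datetime]:
    """Return the modification time of the newest base backup, if any."""
    mtimes = [p.stat().st_mtime for p in directory.glob("*.BASE.*-backup.tar.gz")]
    return datetime.fromtimestamp(max(mtimes)) if mtimes else None


def base_backup_status(
    newest: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_BASE_AGE,
) -> str:
    """Return a Nagios-style status line for the newest base backup."""
    if newest is None:
        return "CRITICAL: no base backup found"
    age = now - newest
    if age > max_age:
        return f"CRITICAL: newest base backup is {age.days} days old"
    return "OK"
```

A caller would combine the two, for example `base_backup_status(newest_base_mtime(Path("/srv/backups/pg/troodi")), datetime.now())`.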
Basic commands
==============