PostgreSQL is an advanced database server that is robust and fast, although possibly less well-known and popular than its eternal rival in the free software world, MySQL.
Tutorial
These are quick reminders about easy things to do in a PostgreSQL cluster.
Connecting
Our PostgreSQL setup is fairly standard, so connecting to the database works like on any other Debian machine:

sudo -u postgres psql
This drops you into a psql shell where you can issue SQL queries and so on.
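For example, you can run a single query or list the available databases without entering the interactive shell:

sudo -u postgres psql -c 'SELECT version();'
sudo -u postgres psql -l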
Howto
Deployment
A PostgreSQL server should be deployed with Puppet, using the postgresql module. Make sure you also include the postgres::backup_source class so that the server is registered with the backup server. See the dip::database class for an example of how to deploy a server.
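Once Puppet has run on the backup server, you can verify the new server was picked up by checking that it appears in the generated wrapper script there (the same script used for manual backups below), using meronense as an example hostname:

grep ^meronense.torproject.org $(which postgres-make-base-backups)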
See the howto/puppet documentation for more information on how to deploy Puppet things.
TODO: turn this into a Tutorial by making it more straightforward to deploy a cluster, for example just by including a single class.
Find what is taking up space
This will report size information for all tables, in both raw bytes and "pretty" form:
SELECT *, pg_size_pretty(total_bytes) AS total
, pg_size_pretty(index_bytes) AS INDEX
, pg_size_pretty(toast_bytes) AS toast
, pg_size_pretty(table_bytes) AS TABLE
FROM (
SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
, c.reltuples AS row_estimate
, pg_total_relation_size(c.oid) AS total_bytes
, pg_indexes_size(c.oid) AS index_bytes
, pg_total_relation_size(reltoastrelid) AS toast_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE relkind = 'r'
) a
) a ORDER BY total_bytes;
Same with databases:
SELECT d.datname AS Name, pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS SIZE
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
LIMIT 20;
Source: PostgreSQL wiki. See also the upstream manual.
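To cross-check those numbers against actual disk usage, you can also measure the cluster data directories directly, assuming the standard Debian data directory layout used elsewhere on this page:

sudo du -sh /var/lib/postgresql/*/main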
Running a full backup
Backups are normally run automatically on the backup server (currently bungei), but you can also run a backup by hand. This will run a backup of meronense, for example:
sudo -u torbackup postgres-make-one-base-backup $(grep ^meronense.torproject.org $(which postgres-make-base-backups ))
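To confirm the new BASE file arrived, list the most recent backup files for that host on the backup server (path as described in the Backup design section below):

ls -ltr /srv/backups/pg/meronense/ | tail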
See the reference section for details.
Picking a restore procedure
We have two ways of restoring PostgreSQL backups, indirect and direct.
The indirect procedure first transfers the backup files to the server and then runs the restore, so it requires more space than the direct procedure. It might also be slower, because files are copied around twice: once from the backup server, and again when they are loaded into the database.
The direct procedure streams the backup files directly from the backup server. The BASE file is streamed into the tar command to restore the base snapshot, then the PostgreSQL restore command copies each log directly from the backup server as well. It requires a minimal amount of space, as files are not copied to a temporary location on the new server. The downside is that it might be more difficult to deploy and diagnose, as it has more moving parts.
Both procedures are adaptations of the official recovery procedure, which can be referred to for more information.
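Before picking one, it can help to compare the size of the old server's backups with the free space on the new server ($OLDSERVER being the short hostname of the server being restored, paths as described in the Backup design section below):

# on the backup server
du -sh /srv/backups/pg/$OLDSERVER
# on the new database server
df -h /var/lib/postgresql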
Indirect restore procedures
Reminder: this procedure copies the backup files from the backup server to the database server and restores from those.
- First, you will need to give the backup server access to the new PostgreSQL server, which we'll call the "client" for now. Log in to the client, allow the backup server to connect, and show the public host key:

      iptables -I INPUT -s $BACKUP_SERVER -j ACCEPT
      cat /etc/ssh/ssh_host_rsa_key.pub
- Then load the backup server's private host key into an agent, add the client to known_hosts, and show the backup server's public key so it can be allowed on the client in the next step. On the backup server, assuming $IP is the IP address of the client and $HOSTKEY is its host key (shown above):

      ssh-agent bash
      ssh-add /etc/ssh/ssh_host_rsa_key
      mkdir -p ~/.ssh
      echo "$IP $HOSTKEY" >> ~/.ssh/known_hosts
      cat /etc/ssh/ssh_host_rsa_key.pub
- And on the client, allow the backup server's $HOSTKEY (the output of the cat /etc/ssh/ssh_host_rsa_key.pub above, on the backup server):

      echo "$HOSTKEY" >> /etc/ssh/userkeys/root.more
- Then, we can transfer the files over from the backup server to the client:

      cd /srv/backups/pg
      rsync -aP $CLIENT $CLIENT:/var/lib/postgresql/restore
- Disable Puppet so you keep control over when PostgreSQL is running:

      puppet agent --disable 'keeping control of postgresql startup -- anarcat 2019-10-09'
- Then, on the client, install the software, stop the server and move the template cluster out of the way:

      apt install postgresql rsync
      service postgresql stop
      mv /var/lib/postgresql/*/main{,.orig}
      su -c 'mkdir -m 0700 /var/lib/postgresql/9.6/main' postgres

  We'll be restoring files into that main directory.

  Make sure you run the SAME MAJOR VERSION of PostgreSQL as the backup! You cannot restore across versions. This might mean installing from backports or an older version of Debian.
- Then you need to find the right BASE file to restore from. Each BASE file has a timestamp in its filename, so just sorting them by name should be enough to find the latest one. Uncompress the BASE file in place:

      cat ~postgres/restore/$CLIENT/main.BASE.bungei.torproject.org-20190805-145239-$CLIENT.torproject.org-main-9.6-backup.tar.gz | su postgres -c 'tar -C /var/lib/postgresql/9.6/main -x -z -f -'

  (Use pv instead of cat for a progress bar with large backups.)
Make sure the
pg_xlog
directory doesn't contain any files. -
Then you need to create a
recovery.conf
file in/var/lib/postgresql/9.6/main
that will tell postgres where to find the WAL files. At least therestore_command
need to be specified. Something like this should work:restore_command = 'cp /var/lib/postgresql/restore/subnotablie/main.WAL.%f %p'
You can specify a specific recovery point in the
recovery.conf
, see the upstream documentation for more information. -
Then start the server and look at the logs to follow the recovery process:
service postgresql start tail -f /var/log/postgresql/*
- Re-enable Puppet, which should clean up the extra SSH key and firewall rules:

      puppet agent -t

  Make sure everything is cleaned up in /etc/ssh/userkeys/root and in the iptables -L output.
When the restore succeeds, the recovery.conf file will be renamed to recovery.done and you will see something like:
2019-08-12 21:36:53.453 UTC [16901] LOG: selected new timeline ID: 2
2019-08-12 21:36:53.470 UTC [16901] LOG: archive recovery complete
cp: cannot stat '/var/lib/postgresql/restore/subnotablie/main.WAL.00000001.history': No such file or directory
2019-08-12 21:36:53.577 UTC [16901] LOG: MultiXact member wraparound protections are now enabled
2019-08-12 21:36:53.584 UTC [16900] LOG: database system is ready to accept connections
Ignore the error from cp complaining about the .history file; it's harmless.
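You can also confirm the rename on disk:

ls -l /var/lib/postgresql/9.6/main/recovery.*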
Troubleshooting
If you find the following error in the logs:
FATAL: could not locate required checkpoint record
It's because postgres cannot find the WAL logs to restore from. There could be many causes for this, but the ones I stumbled upon were:
- wrong location for recovery.conf (it's in /var, not /etc), probably the case if you don't see an error from cp
- wrong permissions on the archive (put the WAL files in ~postgres, not ~root)
- wrong path or pattern for restore_command (double-check the path and make sure to include the right prefix, e.g. main.WAL)
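You can also test the restore_command by hand, running the same cp as the postgres user against a WAL file that actually exists in the restore directory (the filename below is only an illustration, reusing the subnotablie example above):

sudo -u postgres cp /var/lib/postgresql/restore/subnotablie/main.WAL.00000001000000D9000000AD /tmp/test-wal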
Direct restore procedure
Reminder: this procedure streams the files from the backup server to the database server and restores them directly. See above for more information on how to pick a restore procedure.
- First, disable Puppet so you keep control over when PostgreSQL is running:

      puppet agent --disable 'keeping control of postgresql startup -- anarcat 2019-10-09'
- Then install the right PostgreSQL version and stop the server:

      apt install postgresql-9.6
      service postgresql stop

  Make sure you run the SAME MAJOR VERSION of PostgreSQL as the backup! You cannot restore across versions. This might mean installing from backports or an older version of Debian.
- On that new PostgreSQL server, show the postgres user's public SSH key, creating it if missing:

      [ -f ~postgres/.ssh/id_rsa.pub ] || sudo -u postgres ssh-keygen
      cat ~postgres/.ssh/*.pub
- Then, on the backup server, allow that key to access the backups of the old server:

      echo "command=\"/usr/local/bin/debbackup-ssh-wrap --read-allow=/srv/backups/pg/$OLDSERVER $CLIENT\",restrict $HOSTKEY" > /etc/ssh/userkeys/torbackup.more

  This assumes we connect to a previous server's backups, named $OLDSERVER (e.g. dictyotum). The $HOSTKEY is the public key found on the postgres server above.

  Warning: the above will fail if the key is already present in /etc/ssh/userkeys/torbackup; in that case, edit the key in there instead.
Then you need to find the right
BASE
file to restore from. EachBASE
file has a timestamp in its filename, so just sorting them by name should be enough to find the latest one. Uncompress theBASE
file in place, as thepostgres
user:sudo -u postgres -i mkdir -m 0700 /var/lib/postgresql/9.6/main ssh torbackup@$BACKUPSERVER $(hostname) retrieve-file pg $OLDSERVER $BASEFILE \ | sed '0,/^$/d' \ | tar -C /var/lib/postgresql/9.6/main -x -z -f -
  Add a pv before the tar call in the pipeline for a progress bar with large backups, and replace:

  - $BACKUPSERVER with the backup server name (currently bungei.torproject.org)
  - $OLDSERVER with the old server's (short) hostname (e.g. dictyotum)
  - $CLUSTERNAME with the name of the cluster to restore (usually main)
  - $BASEFILE with something like $CLUSTERNAME.BASE.$BACKUPSERVER-20191004-062226-$OLDSERVER.torproject.org-$CLUSTERNAME-9.6-backup.tar.gz or, fully expanded: bacula.BASE.bungei.torproject.org-20191010-184205-dictyotum.torproject.org-bacula-9.6-backup.tar.gz, where $CLUSTERNAME is the non-default bacula
  The above might hang for a while (yes, maybe even a minute) in the beginning, but it should eventually get started and complete. The "hang" is because retrieve-file sends a header which includes a sha512sum, which takes a while to compute. The sed command in that pipeline is there to skip that header. Example of a successful restore, performed in #31786:

      ssh torbackup@bungei.torproject.org bacula-director-01 retrieve-file pg dictyotum bacula.BASE.bungei.torproject.org-20191010-184205-dictyotum.torproject.org-bacula-9.6-backup.tar.gz \
          | pv -s 33G \
          | sed '0,/^$/d' \
          | tar -C /var/lib/postgresql/9.6/main -x -z -f -
  If the above doesn't work, use the indirect procedure to restore the BASE file.
- Make sure the pg_xlog directory doesn't contain any files:

      rm -rf -- /var/lib/postgresql/9.6/main/pg_xlog/*
- Then you need to create a recovery.conf file in /var/lib/postgresql/9.6/main that will tell postgres where to find the WAL files. At least the restore_command needs to be specified. Something like this should work:

      restore_command = '/usr/local/bin/pg-receive-file-from-backup $OLDSERVER $CLUSTERNAME.WAL.%f %p'
  ... where:

  - $OLDSERVER should be replaced by the previous PostgreSQL server name (e.g. dictyotum)
  - $CLUSTERNAME should be replaced by the previous cluster name (generally main)
  You can specify a specific recovery point in recovery.conf, see the upstream documentation for more information. Also make sure the file is owned by postgres:

      $EDITOR /var/lib/postgresql/9.6/main/recovery.conf
      chown postgres /var/lib/postgresql/9.6/main/recovery.conf
- Then start the server and look at the logs to follow the recovery process:

      service postgresql start
      tail -f /var/log/postgresql/*

  You should see something like this:

      2019-10-09 21:17:47.335 UTC [9632] LOG: database system was interrupted; last known up at 2019-10-04 08:12:28 UTC
      2019-10-09 21:17:47.517 UTC [9632] LOG: starting archive recovery
      2019-10-09 21:17:47.524 UTC [9633] [unknown]@[unknown] LOG: incomplete startup packet
      2019-10-09 21:17:48.032 UTC [9639] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:48.538 UTC [9642] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:49.046 UTC [9645] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:49.354 UTC [9632] LOG: restored log file "00000001000005B200000074" from archive
      2019-10-09 21:17:49.552 UTC [9648] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:50.058 UTC [9651] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:50.565 UTC [9654] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:50.836 UTC [9632] LOG: redo starts at 5B2/74000028
      2019-10-09 21:17:51.071 UTC [9659] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:17:51.577 UTC [9665] postgres@postgres FATAL: the database system is starting up
      2019-10-09 21:20:35.790 UTC [9632] LOG: restored log file "00000001000005B20000009F" from archive
      2019-10-09 21:20:37.745 UTC [9632] LOG: restored log file "00000001000005B2000000A0" from archive
      2019-10-09 21:20:39.648 UTC [9632] LOG: restored log file "00000001000005B2000000A1" from archive
      2019-10-09 21:20:41.738 UTC [9632] LOG: restored log file "00000001000005B2000000A2" from archive
      2019-10-09 21:20:43.773 UTC [9632] LOG: restored log file "00000001000005B2000000A3" from archive
  ... and so on.

  TODO: this warning comes up from time to time, is it a problem?

      2019-10-09 23:47:13.446 UTC [16973] LOG: could not link file "pg_xlog/00000001000005B3000000C3" to "pg_xlog/00000001000005B3000000F9": File exists
  Then the recovery will complete with something like this:

      2019-10-10 01:30:55.460 UTC [16953] LOG: redo done at 5B8/9C5BE738
      2019-10-10 01:30:55.460 UTC [16953] LOG: last completed transaction was at log time 2019-10-10 01:04:23.238233+00
      2019-10-10 01:31:03.536 UTC [16953] LOG: restored log file "00000001000005B80000009C" from archive
      2019-10-10 01:31:06.458 UTC [16953] LOG: selected new timeline ID: 2
      2019-10-10 01:31:17.485 UTC [16953] LOG: archive recovery complete
      2019-10-10 01:32:11.975 UTC [16953] LOG: MultiXact member wraparound protections are now enabled
      2019-10-10 01:32:12.438 UTC [16950] LOG: database system is ready to accept connections
      2019-10-10 01:32:12.439 UTC [26501] LOG: autovacuum launcher started
- Then remove the temporary SSH access on the backup server, either by removing the .more key file or restoring the previous key configuration:

      rm /etc/ssh/userkeys/torbackup.more
- Re-enable Puppet:

      puppet agent -t
Troubleshooting
If you get this error in the PostgreSQL restore logs:
No directories from which read is allowed given on cmdline.
Unkown format in line 1 () when getting dictyotum:main.WAL.00000005.history from backup host
It's because the SSH key deployed on the backup server does not have the --read-allowed argument.
If the restore doesn't work, try to connect by hand:
sudo -u postgres /usr/local/bin/pg-receive-file-from-backup dictyotum bacula.WAL.00000001000005AB0000003A /tmp/foo
If you get this kind of error, it's because you forgot to restore the BASE file first:
2019-10-08 21:25:43.711 UTC [14320] [unknown]@[unknown] LOG: incomplete startup packet
2019-10-08 21:25:44.215 UTC [14326] postgres@postgres FATAL: the database system is starting up
2019-10-08 21:25:44.451 UTC [14316] WARNING: WAL was generated with wal_level=minimal, data may be missing
2019-10-08 21:25:44.451 UTC [14316] HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.
2019-10-08 21:25:44.451 UTC [14316] LOG: consistent recovery state reached at 0/153A3F0
2019-10-08 21:25:44.451 UTC [14316] LOG: invalid record length at 0/153A3F0: wanted 24, got 0
2019-10-08 21:25:44.451 UTC [14316] LOG: redo is not required
2019-10-08 21:25:44.721 UTC [14334] postgres@postgres FATAL: the database system is starting up
2019-10-08 21:25:45.226 UTC [14340] postgres@postgres FATAL: the database system is starting up
2019-10-08 21:25:45.549 UTC [14316] LOG: selected new timeline ID: 6
2019-10-08 21:25:45.732 UTC [14343] postgres@postgres FATAL: the database system is starting up
2019-10-08 21:25:45.765 UTC [14316] LOG: archive recovery complete
2019-10-08 21:25:46.238 UTC [14349] postgres@postgres FATAL: the database system is starting up
2019-10-08 21:25:46.466 UTC [14316] LOG: MultiXact member wraparound protections are now enabled
2019-10-08 21:25:46.467 UTC [14315] LOG: database system is ready to accept connections
2019-10-08 21:25:46.467 UTC [14351] LOG: autovacuum launcher started
See also the "Direct restore procedure" troubleshooting section, which also applies here.
Monitoring warnings
TODO: there's some more information about backup handling in the Debian DSA documentation.
WAL-MISSING-AFTER
Example message:
[troodi, main] WAL-MISSING-AFTER: troodi/main.WAL.00000001000000D9000000AD
This means that a WAL file is missing after the specified file. Specifically, in the above scenario, the following files are present, in chronological order:
-rw------- 1 torbackup torbackup 16777216 May 10 05:08 main.WAL.00000001000000D9000000AA
-rw------- 1 torbackup torbackup 16777216 May 10 05:47 main.WAL.00000001000000D9000000AB
-rw------- 1 torbackup torbackup 16777216 May 10 06:20 main.WAL.00000001000000D9000000AC
-rw------- 1 torbackup torbackup 16777216 May 10 06:26 main.WAL.00000001000000D9000000AD
-rw------- 1 torbackup torbackup 16777216 May 10 13:57 main.WAL.00000001000000D9000000B5
Notice the jump from ...AD to ...B5: we're missing AE, AF, B0, B1, B2, B3, and B4, specifically. We can also tell that something happened between 6:26 and 13:57 on that day.
- List the files in chronological order:

      ls -ltr /srv/backups/pg/troodi/ | less

- Find the file warned about, using / (search, in less) then the filename (main.WAL.00000001000000D9000000AD above)
- Look for a .BASE. file following the missing file, using / again
- Either:
  - if a .BASE. backup is present after the missing files, the gap is harmless, to the extent that the missing timeframe is not needed; to remove the warnings, the previous WAL files do need to be removed by hand
  - if a .BASE. backup is not present after the missing files, the backup integrity is faulty, and a new base backup needs to be performed. See Running a full backup above.
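For example, a quick way to see whether a newer base backup exists after the gap is to compare the timestamp of the most recent BASE files with the missing WAL files (using troodi as in the example above):

ls -lt /srv/backups/pg/troodi/*.BASE.* | head -3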
Reference
Backup design
We use upstream's Continuous Archiving and Point-in-Time Recovery (PITR), which relies on postgres's "write-ahead log" (WAL) to write regular "transaction logs" of the cluster to the backup host. (Think of transaction logs as incremental backups.) This is configured in postgresql.conf, using a line like this:
archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
That is a site-specific script which reads a config file in /etc/dsa/pg-backup-file.conf where the backup host is specified (currently torbackup@bungei.torproject.org). That command passes the WAL logs on to the backup server, over SSH. A WAL file is shipped immediately when it is full (16MB of data by default), but no later than 6 hours (varies, see archive_timeout on each host) after it was first written to. On the backup server, the command is set to debbackup-ssh-wrap in the authorized_keys file and takes the store-file pg argument to write the file to the right location.
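To check the archiving settings active on a given database server, you can query them directly from psql:

sudo -u postgres psql -c 'SHOW archive_command;'
sudo -u postgres psql -c 'SHOW archive_timeout;'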
WAL files are written to /srv/backups/pg/$HOSTNAME, where $HOSTNAME is the short hostname (without .torproject.org). WAL files are prefixed with main.WAL. (where main is the cluster name) followed by a long unique string, e.g. main.WAL.00000001000000A40000007F.
For that system to work, we also need full backups to happen on a regular basis. This happens straight from the backup server (still bungei), which connects to the various postgres servers and runs pg_basebackup to get a complete snapshot of the cluster. This happens weekly (every 7 to 10 days) through postgres-make-base-backups, a wrapper (based on a Puppet concat::fragment template) that calls postgres-make-one-base-backup for each postgres server.
The base files are written to the same directory as the WAL files and are named using the template:
$CLUSTER.BASE.$SERVER_FQDN-$DATE-$ID-$CLIENT_FQDN-$CLUSTER-$VERSION-backup.tar.gz
... for example:
main.BASE.bungei.torproject.org-20190804-214510-troodi.torproject.org-main-9.6-backup.tar.gz
All of this works because SSH public keys and postgres credentials are passed around between servers. That is handled in the Puppet postgresql module for the most part, but some bits might still be configured manually on some servers.
Backups are checked for freshness in Nagios using the dsa-check-backuppg plugin, with its configuration stored in /etc/dsa/postgresql-backup/dsa-check-backuppg.conf.d/, per cluster. The Nagios plugin also takes care of expiring backups when they are healthy.
Other documentation
See also: