Commit a4747f47 (verified), authored by anarcat

review and itemize the direct restore procedure, which now seems to work

parent 58cadd0c
@@ -184,116 +184,119 @@ harmless.
Direct restore procedure
------------------------
TODO: this procedure does not work.
The above procedure assumes a bare-bones recovery, on a new server,
but it's also possible to sync an existing server from backups. The
following, therefore, assumes PostgreSQL is already configured, with
something like:
but it's also possible to sync an existing server from backups. This
is also an adaptation of the [official recovery
procedure](https://www.postgresql.org/docs/9.3/continuous-archiving.html#BACKUP-PITR-RECOVERY).
apt install postgresql-11
1. First install the right PostgreSQL version:
Make sure you run the SAME MAJOR VERSION of PostgreSQL as the
backup! You cannot restore across versions. This might mean installing
from backports or an older version of Debian.
apt install postgresql-9.6
Make sure you run the SAME MAJOR VERSION of PostgreSQL as the
backup! You cannot restore across versions. This might mean
installing from backports or an older version of Debian.
On the postgres server:
2. On that new PostgreSQL server, show the `postgres` server public
key, creating it if missing:
[ -f ~postgres/.ssh/id_rsa.pub ] || sudo -u postgres ssh-keygen
cat ~postgres/.ssh/*.pub
Then on the backup server:
3. Then on the backup server, allow the user to access backups of the
old server:
echo "command=\"/usr/local/bin/debbackup-ssh-wrap --read-allow=/srv/backups/pg/$OLDSERVER $CLIENT\",restrict $HOSTKEY" > /etc/ssh/userkeys/torbackup.more
This assumes we connect to a *previous* server's backups, named
`$OLDSERVER` (e.g. `dictyotum`). The `$HOSTKEY` is the public key
found on the postgres server above.
Warning: the above will fail if the key is already present in
`/etc/ssh/userkeys/torbackup`, edit the key in there instead in that
case.
Then you need to find the right `BASE` file to restore from. Each
`BASE` file has a timestamp in its filename, so just sorting them by
name should be enough to find the latest one. Uncompress the `BASE`
file in place, as the `postgres` user:
sudo -u postgres -i
sudo -u postgres ssh torbackup@$BACKUPSERVER $(hostname) retrieve-file pg $OLDSERVER bacula.BASE.$BACKUPSERVER-20191004-062226-$OLDSERVER.torproject.org-$CLUSTERNAME-9.6-backup.tar.gz | tar -C /var/lib/postgresql/9.6/main -x -z -f -
Add a `pv` before the `tar` call in the pipeline for a progress bar
with large backups, and replace:
1. `$BACKUPSERVER` with the backup server's hostname (currently
`bungei.torproject.org`)
2. `$OLDSERVER` with the old server's (short) hostname
(e.g. `dictyotum`)
3. `$CLUSTERNAME` with the name of the cluster to restore
(usually `main`)
TODO: The above might hang for a while, but it should complete.
`retrieve-file` sends a header which includes a `sha512sum`, which
takes a while to compute. If it doesn't work, use the indirect
procedure to restore the BASE, which there is hopefully space for
without the logs...
Make sure the `pg_xlog` directory doesn't contain any files.
Then you need to create a `recovery.conf` file in
`/var/lib/postgresql/9.6/main` that will tell postgres where to find
the WAL files. At least the `restore_command` needs to be
specified. Something like this should work:
restore_command = '/usr/local/bin/pg-receive-file-from-backup $OLDSERVER $CLUSTERNAME.WAL.%f %p'
... where:
* `$OLDSERVER` should be replaced by the previous postgresql server
name (e.g. `dictyotum`)
* `$CLUSTERNAME` should be replaced by the previous cluster name
(e.g. `main`, generally)
You can also set a specific recovery target in `recovery.conf`, see
the [upstream documentation](https://www.postgresql.org/docs/9.3/recovery-target-settings.html) for more information. Make sure the
file is owned by postgres:
$EDITOR /var/lib/postgresql/9.6/main/recovery.conf
chown postgres /var/lib/postgresql/9.6/main/recovery.conf
Then start the server and look at the logs to follow the recovery
process:
service postgresql start
tail -f /var/log/postgresql/*
You should see something like this:
2019-10-09 21:17:47.335 UTC [9632] LOG: database system was interrupted; last known up at 2019-10-04 08:12:28 UTC
2019-10-09 21:17:47.517 UTC [9632] LOG: starting archive recovery
2019-10-09 21:17:47.524 UTC [9633] [unknown]@[unknown] LOG: incomplete startup packet
2019-10-09 21:17:48.032 UTC [9639] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:48.538 UTC [9642] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:49.046 UTC [9645] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:49.354 UTC [9632] LOG: restored log file "00000001000005B200000074" from archive
2019-10-09 21:17:49.552 UTC [9648] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:50.058 UTC [9651] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:50.565 UTC [9654] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:50.836 UTC [9632] LOG: redo starts at 5B2/74000028
2019-10-09 21:17:51.071 UTC [9659] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:51.577 UTC [9665] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:20:35.790 UTC [9632] LOG: restored log file "00000001000005B20000009F" from archive
2019-10-09 21:20:37.745 UTC [9632] LOG: restored log file "00000001000005B2000000A0" from archive
2019-10-09 21:20:39.648 UTC [9632] LOG: restored log file "00000001000005B2000000A1" from archive
2019-10-09 21:20:41.738 UTC [9632] LOG: restored log file "00000001000005B2000000A2" from archive
2019-10-09 21:20:43.773 UTC [9632] LOG: restored log file "00000001000005B2000000A3" from archive
... and so on.
Then remove the temporary SSH access on the backup server, either by
removing the `.more` key file or restoring the previous key
configuration:
rm /etc/ssh/userkeys/torbackup.more
This assumes we connect to a *previous* server's backups, named
`$OLDSERVER` (e.g. `dictyotum`). The `$HOSTKEY` is the public key
found on the postgres server above.
Warning: the above will fail if the key is already present in
`/etc/ssh/userkeys/torbackup`, edit the key in there instead in
that case.
4. Then you need to find the right `BASE` file to restore from. Each
`BASE` file has a timestamp in its filename, so just sorting them
by name should be enough to find the latest one. Uncompress the
`BASE` file in place, as the `postgres` user:
sudo -u postgres ssh torbackup@$BACKUPSERVER $(hostname) retrieve-file pg $OLDSERVER bacula.BASE.$BACKUPSERVER-20191004-062226-$OLDSERVER.torproject.org-$CLUSTERNAME-9.6-backup.tar.gz | sudo -u postgres tar -C /var/lib/postgresql/9.6/main -x -z -f -
Add a `pv` before the `tar` call in the pipeline for a progress bar
with large backups, and replace:
* `$BACKUPSERVER` with the backup server's hostname
(currently `bungei.torproject.org`)
* `$OLDSERVER` with the old server's (short) hostname
(e.g. `dictyotum`)
* `$CLUSTERNAME` with the name of the cluster to restore
(usually `main`)
The above might hang for a while, but it should complete. The
"hang" is because `retrieve-file` sends a header which includes a
`sha512sum` and it takes a while to compute. If it doesn't work,
use the indirect procedure to restore the `BASE` file.
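As a rough illustration of the "sort by name" step, here is a minimal sketch that picks the newest `BASE` file out of a list of names. The `latest_base` helper is hypothetical and not part of the backup tooling. It assumes you already have a listing of the old server's backup files (for example from `/srv/backups/pg/$OLDSERVER` on the backup server) and that all names share the same prefix, so the timestamp alone decides the order. The file names in the example are made up to match the pattern shown above.

```shell
# Hypothetical helper (not part of the backup scripts): read file names on
# stdin, keep only BASE backups, and print the one that sorts last.
# Assumes the YYYYMMDD-HHMMSS timestamp is the only part that varies.
latest_base() {
    grep -F '.BASE.' | LC_ALL=C sort | tail -n 1
}

# Example with made-up names following the pattern used in this procedure.
printf '%s\n' \
    'bacula.BASE.bungei.torproject.org-20190927-062226-dictyotum.torproject.org-main-9.6-backup.tar.gz' \
    'bacula.BASE.bungei.torproject.org-20191004-062226-dictyotum.torproject.org-main-9.6-backup.tar.gz' \
    'main.WAL.00000001000005B200000074' \
    | latest_base
```

The name printed is the one to pass to `retrieve-file` in the command above.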
5. Make sure the `pg_xlog` directory doesn't contain any files:
rm -f -- /var/lib/postgresql/9.6/main/pg_xlog/*
6. Then you need to create a `recovery.conf` file in
`/var/lib/postgresql/9.6/main` that will tell postgres where to
find the WAL files. At least the `restore_command` needs to be
specified. Something like this should work:
restore_command = '/usr/local/bin/pg-receive-file-from-backup $OLDSERVER $CLUSTERNAME.WAL.%f %p'
... where:
* `$OLDSERVER` should be replaced by the previous postgresql
server name (e.g. `dictyotum`)
* `$CLUSTERNAME` should be replaced by the previous cluster name
(e.g. `main`, generally)
You can also set a specific recovery target in `recovery.conf`,
see the [upstream documentation](https://www.postgresql.org/docs/9.3/recovery-target-settings.html) for more information. Also
make sure the file is owned by postgres:
$EDITOR /var/lib/postgresql/9.6/main/recovery.conf
chown postgres /var/lib/postgresql/9.6/main/recovery.conf
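To make the variable substitution less error-prone, the file contents can be generated rather than typed by hand. The following is only a sketch: `make_recovery_conf` is a hypothetical helper, and the target timestamp is a made-up example. `recovery_target_time` is one of the standard recovery target settings described in the upstream documentation; leave it out to replay all available WAL files.

```shell
# Hypothetical helper: print recovery.conf contents for the given old server
# and cluster name. An optional third argument sets a point-in-time target.
make_recovery_conf() {
    oldserver=$1
    cluster=$2
    target_time=${3:-}
    # %%f and %%p become the literal %f and %p that PostgreSQL expands itself.
    printf "restore_command = '/usr/local/bin/pg-receive-file-from-backup %s %s.WAL.%%f %%p'\n" \
        "$oldserver" "$cluster"
    if [ -n "$target_time" ]; then
        printf "recovery_target_time = '%s'\n" "$target_time"
    fi
}

# Example values only; the timestamp is invented for illustration.
make_recovery_conf dictyotum main '2019-10-08 12:00:00 UTC'
```

Redirect the output to `/var/lib/postgresql/9.6/main/recovery.conf` and fix the ownership as shown above.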
7. Then start the server and look at the logs to follow the recovery
process:
service postgresql start
tail -f /var/log/postgresql/*
You should see something like this:
2019-10-09 21:17:47.335 UTC [9632] LOG: database system was interrupted; last known up at 2019-10-04 08:12:28 UTC
2019-10-09 21:17:47.517 UTC [9632] LOG: starting archive recovery
2019-10-09 21:17:47.524 UTC [9633] [unknown]@[unknown] LOG: incomplete startup packet
2019-10-09 21:17:48.032 UTC [9639] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:48.538 UTC [9642] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:49.046 UTC [9645] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:49.354 UTC [9632] LOG: restored log file "00000001000005B200000074" from archive
2019-10-09 21:17:49.552 UTC [9648] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:50.058 UTC [9651] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:50.565 UTC [9654] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:50.836 UTC [9632] LOG: redo starts at 5B2/74000028
2019-10-09 21:17:51.071 UTC [9659] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:17:51.577 UTC [9665] postgres@postgres FATAL: the database system is starting up
2019-10-09 21:20:35.790 UTC [9632] LOG: restored log file "00000001000005B20000009F" from archive
2019-10-09 21:20:37.745 UTC [9632] LOG: restored log file "00000001000005B2000000A0" from archive
2019-10-09 21:20:39.648 UTC [9632] LOG: restored log file "00000001000005B2000000A1" from archive
2019-10-09 21:20:41.738 UTC [9632] LOG: restored log file "00000001000005B2000000A2" from archive
2019-10-09 21:20:43.773 UTC [9632] LOG: restored log file "00000001000005B2000000A3" from archive
... and so on.
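On a large restore the `FATAL: the database system is starting up` lines drown out the useful ones. A small filter can summarize progress instead. This is a sketch only: `recovery_progress` is a hypothetical helper that relies on the `restored log file "..."` message format shown in the log excerpt above.

```shell
# Hypothetical helper: read PostgreSQL log lines on stdin and report how many
# WAL segments have been restored so far, and the name of the latest one.
recovery_progress() {
    awk -F'"' '/restored log file/ { n++; last = $2 }
        END { printf "%d segments restored, last: %s\n", n, (n ? last : "none") }'
}

# Example using two lines from the log excerpt above.
printf '%s\n' \
    '2019-10-09 21:20:35.790 UTC [9632] LOG: restored log file "00000001000005B20000009F" from archive' \
    '2019-10-09 21:20:37.745 UTC [9632] LOG: restored log file "00000001000005B2000000A0" from archive' \
    | recovery_progress
```

In practice you would feed it the real logs, for example `cat /var/log/postgresql/* | recovery_progress`.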
8. Then remove the temporary SSH access on the backup server, either
by removing the `.more` key file or restoring the previous key
configuration:
rm /etc/ssh/userkeys/torbackup.more
### Troubleshooting