From 46324531cacc15528dce8f7ccfaf5b1544d84d49 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Mon, 4 Dec 2023 14:55:45 -0500
Subject: [PATCH] clarify actual timeline split procedure

The actual error message is *NOT* UNEXPECTED-TIMELINE, that happens
only after the `.history` file is removed.

See tpo/tpa/team#41421
---
 howto/postgresql.md | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/howto/postgresql.md b/howto/postgresql.md
index c246cc04..7653ddc3 100644
--- a/howto/postgresql.md
+++ b/howto/postgresql.md
@@ -1266,15 +1266,16 @@ including the former were removed by hand. Then a full backup was
 performed. The reason why the BASE backup was missing is this was
 following a failed upgrade (see [tpo/tpa/team#40809](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40809)).
 
-### UNEXPECTED-TIMELINE
+### CANNOT-PARSE-WAL: example-01/main.WAL.00000002.history
 
 If the backup check script is complaining like this:
 
-    [rude, main] UNEXPECTED-TIMELINE: rude/main.WAL.000000020000010200000015
+    [survey-01, main] CANNOT-PARSE-WAL: survey-01/main.WAL.00000002.history
 
 It's likely because the [timeline](https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES) was bumped, which
 can happen on certain restore scenarios. The check script doesn't handle this very
-well. You need to inform said script of the timeline change, by adding
+well, as it complains about the `.history` file it doesn't
+recognize. You need to inform said script of the timeline change, by adding
 a `timeline` entry in the `/etc/nagios/dsa-check-backuppg.conf`
 script, for example, the entry for rude was changed from:
 
@@ -1291,6 +1292,27 @@ To:
 timeline: 2
 ```
 
+Once that is done, you'll see this warning:
+
+    [survey-01, main] UNEXPECTED-TIMELINE: survey-01/main.WAL.0000000100000008000000FE
+    [survey-01, main] NOT-EXPIRING-DUE-TO-WARNINGS: have seen warnings, will not expire anything
+
+The `UNEXPECTED-TIMELINE` will be repeated for *every* WAL file of the
+previous timeline. You should move those files out of the way and mark
+them for expiry. The simplest way to do this is to run a full backup,
+then move all files prefixed `main.WAL.00000001*` into another
+directory and schedule a purge of that with `at`. Example command set:
+
+    sudo -u torbackup postgres-make-one-base-backup $(grep ^survey-01.torproject.org $(which postgres-make-base-backups ))
+    mkdir ../survey-01-old
+    mv main.WAL.000000010000000* ../survey-01-old
+    mv main.WAL.00000002.history ../survey-01-old/
+    dsa-check-backuppg
+    mv main.BASE.bungei.torproject.org-202311* ../survey-01-old/
+
+Careful! The above `BASE` backup list was established from the output
+of `dsa-check-backuppg` and will vary according to the date, obviously.
+
 Alternatively, a dump/restore will reset the timeline to the normal
 "1", but then you'd need to move the directory out of the way and make
 a new full backup.
-- 
GitLab