From 46324531cacc15528dce8f7ccfaf5b1544d84d49 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Mon, 4 Dec 2023 14:55:45 -0500
Subject: [PATCH] clarify actual timeline split procedure

The actual error message is *NOT* UNEXPECTED-TIMELINE; that one
appears only after the `.history` file is removed.

See tpo/tpa/team#41421
---
 howto/postgresql.md | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/howto/postgresql.md b/howto/postgresql.md
index c246cc04..7653ddc3 100644
--- a/howto/postgresql.md
+++ b/howto/postgresql.md
@@ -1266,15 +1266,16 @@ including the former were removed by hand. Then a full backup was
 performed. The reason why the BASE backup was missing is this was
 following a failed upgrade (see [tpo/tpa/team#40809](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40809)).
 
-### UNEXPECTED-TIMELINE
+### CANNOT-PARSE-WAL: example-01/main.WAL.00000002.history
 
 If the backup check script is complaining like this:
 
-    [rude, main] UNEXPECTED-TIMELINE: rude/main.WAL.000000020000010200000015
+    [survey-01, main] CANNOT-PARSE-WAL: survey-01/main.WAL.00000002.history
 
 It's likely because the [timeline](https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-TIMELINES) was bumped, which can happen on
 certain restore scenarios. The check script doesn't handle this very
-well. You need to inform said script of the timeline change, by adding
+well, as it complains about the `.history` file it doesn't
+recognize. You need to inform said script of the timeline change, by adding
 a `timeline` entry in the `/etc/nagios/dsa-check-backuppg.conf`
 script, for example, the entry for rude was changed from:
 
@@ -1291,6 +1292,27 @@ To:
      timeline: 2
 ```
 
+Once that is done, you'll see these warnings:
+
+    [survey-01, main] UNEXPECTED-TIMELINE: survey-01/main.WAL.0000000100000008000000FE
+    [survey-01, main] NOT-EXPIRING-DUE-TO-WARNINGS: have seen warnings, will not expire anything
+
+The `UNEXPECTED-TIMELINE` warning is repeated for *every* WAL file of
+the previous timeline. You should move those files out of the way and
+mark them for expiry. The simplest way to do this is to run a full
+backup, then move all files matching `main.WAL.00000001*` into another
+directory and schedule a purge of that with `at`. Example command set:
+
+    sudo -u torbackup postgres-make-one-base-backup $(grep ^survey-01.torproject.org $(which postgres-make-base-backups ))
+    mkdir ../survey-01-old
+    mv main.WAL.000000010000000* ../survey-01-old
+    mv main.WAL.00000002.history ../survey-01-old/
+    dsa-check-backuppg
+    mv main.BASE.bungei.torproject.org-202311* ../survey-01-old/
+
+Careful! The `BASE` backup pattern above was derived from the output
+of `dsa-check-backuppg` and will differ depending on the backup dates.
+
 Alternatively, a dump/restore will reset the timeline to the normal
 "1", but then you'd need to move the directory out of the way and make
 a new full backup.
-- 
GitLab