Commit d8daf703 authored by anarcat

document how to deal with WAL-MISSING-AFTER warnings

to the best of my knowledge...
parent 644a93db
+46 −4
@@ -422,11 +422,53 @@ If you get this kind of errors, it's because you forgot to restore the
See also the "Direct restore procedure" troubleshooting section, which
also applies here.

Monitoring warnings
-------------------

TODO: there's some more information about backup handling in the
[Debian DSA documentation](https://dsa.debian.org/howto/postgres-backup/).

### WAL-MISSING-AFTER

Example message:

    [troodi, main] WAL-MISSING-AFTER: troodi/main.WAL.00000001000000D9000000AD

This means that one or more WAL files are missing after the specified
file. In the example above, the backup directory holds the following
files, in chronological order:

    -rw------- 1 torbackup torbackup   16777216 May 10 05:08 main.WAL.00000001000000D9000000AA
    -rw------- 1 torbackup torbackup   16777216 May 10 05:47 main.WAL.00000001000000D9000000AB
    -rw------- 1 torbackup torbackup   16777216 May 10 06:20 main.WAL.00000001000000D9000000AC
    -rw------- 1 torbackup torbackup   16777216 May 10 06:26 main.WAL.00000001000000D9000000AD
    -rw------- 1 torbackup torbackup   16777216 May 10 13:57 main.WAL.00000001000000D9000000B5

Notice the jump from `...AD` to `...B5`. Seven files are missing:
`AE`, `AF`, `B0`, `B1`, `B2`, `B3` and `B4`. We can also tell that
something happened between 06:26 and 13:57 on that day. It could be
that the backup server went down during that time.
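
The gap can also be computed instead of read by eye. This is only a
minimal sketch, not part of the backup tooling: `missing_wal_between`
is a hypothetical helper, and it assumes that both file names are
identical except for their last eight hexadecimal digits (same
timeline and log file), which holds for the example above.

```shell
# Print the WAL file names strictly between two archived files.
# Assumption: the two names differ only in their last 8 hex digits.
# Note: plain POSIX sh, so the variables below are global.
missing_wal_between() {
    wal_prefix=${1%????????}        # name without its last 8 characters
    if [ "$wal_prefix" != "${2%????????}" ]; then
        echo "prefixes differ; not handled by this sketch" >&2
        return 1
    fi
    wal_first=${1#"$wal_prefix"}    # last 8 hex digits of the first file
    wal_last=${2#"$wal_prefix"}     # last 8 hex digits of the second file
    wal_i=$((0x$wal_first + 1))
    wal_end=$((0x$wal_last))
    while [ "$wal_i" -lt "$wal_end" ]; do
        printf '%s%08X\n' "$wal_prefix" "$wal_i"
        wal_i=$((wal_i + 1))
    done
}

missing_wal_between main.WAL.00000001000000D9000000AD \
                    main.WAL.00000001000000D9000000B5
```

With the two file names from the listing, this prints the names
ending in `AE` through `B4`, including `B0`, which is easy to skip
when counting in hexadecimal by hand.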

 1. List the files in chronological order:

        ls -ltr /srv/backups/pg/troodi/ | less

 2. Find the file named in the warning by typing `/` followed by the
    filename (`main.WAL.00000001000000D9000000AD` in the example
    above)

 3. Look for a `.BASE.` file *following* the missing file, using `/`
    again

 4. Either:

    * if a `.BASE.` backup is present after the missing files, the
      gap is harmless insofar as restoring to a point in the missing
      timeframe is not necessary. TODO: how do we fix the warning
      anyway?

    * if a `.BASE.` backup is *not* present after the missing files,
      the backup integrity is faulty, and a new base backup needs to
      be performed. See [Running a full
      backup](#running-a-full-backup) above.
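
The check in the steps above could be scripted roughly as follows.
This is a hedged sketch and not an existing tool: `base_after_wal` is
a hypothetical helper, it assumes that base backups are recognizable
by `.BASE.` in their file name, as in the steps above, and that file
modification times reflect the order in which files were archived.

```shell
# Succeed, and print the file name, if a base backup was archived after
# the given WAL file.
# Usage: base_after_wal DIRECTORY WAL_FILENAME
base_after_wal() {
    # ls -tr lists oldest first, like step 1. awk ignores everything up
    # to and including the warned file, then stops at the first later
    # file with ".BASE." in its name.
    ls -tr "$1" | awk -v wal="$2" '
        seen && /\.BASE\./ { print; found = 1; exit }
        $0 == wal          { seen = 1 }
        END                { exit !found }
    '
}

# Example, with the directory and file from the warning above:
#
#   if base_after_wal /srv/backups/pg/troodi \
#           main.WAL.00000001000000D9000000AD; then
#       echo "a base backup follows the gap"
#   else
#       echo "no base backup after the gap, run a full backup"
#   fi
```

The function also fails when the WAL file is not found at all, so a
mistyped file name reads as "no base backup": confirm the file name
against the listing before acting on the result.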

Reference
=========