    [PostgreSQL](https://www.postgresql.org/) is an advanced database server that is robust and
    fast, although possibly less well-known and popular than its eternal
    rival in the free software world, MySQL.
    
    Tutorial
    ========
    
    These are quick reminders on easy things to do in a cluster.
    
    Connecting
    ----------
    
    Our PostgreSQL setup is fairly standard so connecting to the database
    is like any other Debian machine:
    
        sudo -u postgres psql
    
    This drops you in a [psql shell](https://www.postgresql.org/docs/9.6/app-psql.html) where you can issue SQL queries
    and so on.
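
    For example, to list the clusters on the machine (with a Debian
    helper from the `postgresql-common` package) and the databases in
    the default cluster:

        pg_lsclusters
        sudo -u postgres psql -c '\l'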
    
    
    Deployment
    ----------
    
    A PostgreSQL server should be deployed with Puppet, using the
    `postgresql` module. Make sure you also include the
    `postgres::backup_source` class so that the server is registered with
    the backup server; see the `dip::database` class for an example of
    how to deploy a server.
    
    See the [[puppet]] documentation for more information on how to
    deploy things with Puppet.
    
    TODO: turn this into a tutorial by making it more straightforward to
    deploy a cluster, for example just by including a single class.
    
    
    Find what is taking up space
    ----------------------------
    
    This will report size information for all tables, in both raw bytes and "pretty" form:
    
        SELECT *, pg_size_pretty(total_bytes) AS total
            , pg_size_pretty(index_bytes) AS INDEX
            , pg_size_pretty(toast_bytes) AS toast
            , pg_size_pretty(table_bytes) AS TABLE
          FROM (
          SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
              SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
                      , c.reltuples AS row_estimate
                      , pg_total_relation_size(c.oid) AS total_bytes
                      , pg_indexes_size(c.oid) AS index_bytes
                      , pg_total_relation_size(reltoastrelid) AS toast_bytes
                  FROM pg_class c
                  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
                  WHERE relkind = 'r'
          ) a
        ) a ORDER BY total_bytes;
    
    Same with databases:
    
        SELECT d.datname AS Name,  pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
            CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
                THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
                ELSE 'No Access'
            END AS SIZE
        FROM pg_catalog.pg_database d
            ORDER BY
            CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
                THEN pg_catalog.pg_database_size(d.datname)
                ELSE NULL
            END DESC -- nulls first
            LIMIT 20;
    
    Source: [PostgreSQL wiki](https://wiki.postgresql.org/wiki/Disk_Usage). See also [the upstream manual](https://www.postgresql.org/docs/9.1/disk-usage.html).
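
    A rough cross-check from the shell, assuming the standard Debian data
    directory layout:

        sudo du -sh /var/lib/postgresql/*/main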
    
    
    Running a full backup
    ---------------------
    
    Backups normally run automatically on the backup server (currently
    `bungei`), but you can also run a backup by hand. This will run a
    backup of `meronense`, for example:
    
        sudo -u torbackup postgres-make-one-base-backup $(grep ^meronense.torproject.org $(which postgres-make-base-backups ))
    
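    To confirm the base backup landed, list the target directory on the
    backup server (the path layout is described in the backup design
    section below; `meronense` is just the example host):

        ls -lt /srv/backups/pg/meronense/ | head
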
    See the [reference](#Reference) section for details.
    
    
    Picking a restore procedure
    ----------------------------
    
    We have two ways of restoring PostgreSQL backups, **indirect** and
    **direct**.
    
    The **indirect** procedure first transfers the backup files to the
    server and then runs the restore, so it requires more space than the
    direct procedure. It might also be slower than the direct restore
    procedure because files are copied around twice: once from the backup
    server, and again when loaded into the database.
    
    The **direct** procedure *streams* the backup files directly from the
    backup server. The `BASE` file is streamed into the `tar` command to
    restore the base snapshot, then the PostgreSQL restore command
    copies each log directly from the backup server as well. It requires
    a minimal amount of space, as files are not copied to a temporary
    location on the new server. The downside is that it might be more
    difficult to deploy and diagnose, as it has more moving parts.
    
    Both procedures are adaptations of the [official recovery
    procedure](https://www.postgresql.org/docs/9.3/continuous-archiving.html#BACKUP-PITR-RECOVERY), which can be referred to for more information.
    
    
    Indirect restore procedure
    ---------------------------
    
    Reminder: this procedure **copies** the backup files from the backup
    server to the database server and restores from those.
    
     1. First, you will need to give the backup server access to the new
        PostgreSQL server, which we'll call the "client" for now. Log in
        to the client, allow the backup server to connect, and show the
        public host key:
    
            iptables -I INPUT -s $BACKUP_SERVER -j ACCEPT
            cat /etc/ssh/ssh_host_rsa_key.pub
    
     2. Then, on the backup server, load the server's private host key
        into an agent and show the public key so it can be allowed on
        the client. Assuming `$IP` is the IP of the client and
        `$HOSTKEY` is its host key:
    
            ssh-agent bash
            ssh-add /etc/ssh/ssh_host_rsa_key
            mkdir -p ~/.ssh
            echo "$IP $HOSTKEY" >> ~/.ssh/known_hosts
            cat /etc/ssh/ssh_host_rsa_key.pub
    
     3. And on the client, allow the backup server's `$HOSTKEY` (the
        output of the above `cat /etc/ssh/ssh_host_rsa_key.pub` on the
        backup server):
    
            echo "$HOSTKEY" >> /etc/ssh/userkeys/root.more
    
     4. Then, we can transfer files over from the backup server to the
        client:
    
            cd /srv/backups/pg
            rsync -aP $CLIENT $CLIENT:/var/lib/postgresql/restore
    
     5. Disable Puppet so you have control over when PostgreSQL is running:
    
            puppet agent --disable 'keeping control of postgresql startup -- anarcat 2019-10-09'
    
     6. Then, on the client, install the software, stop the server and
        move the template cluster out of the way:

            apt install postgresql rsync
            service postgresql stop
            mv /var/lib/postgresql/*/main{,.orig}
            su -c 'mkdir -m 0700 /var/lib/postgresql/9.6/main' postgres
    
        We'll be restoring files in that `main` directory.
    
        Make sure you run the SAME MAJOR VERSION of PostgreSQL as the
        backup! You cannot restore across versions. This might mean
        installing from backports or an older version of Debian.
    
    
     7. Then you need to find the right `BASE` file to restore from. Each
        `BASE` file has a timestamp in its filename, so just sorting them
        by name should be enough to find the latest one. Uncompress the
        `BASE` file in place:

            cat ~postgres/restore/$CLIENT/main.BASE.bungei.torproject.org-20190805-145239-$CLIENT.torproject.org-main-9.6-backup.tar.gz | su postgres -c 'tar -C /var/lib/postgresql/9.6/main -x -z -f -'
    
        (Use `pv` instead of `cat` for a progress bar with large backups.)
    
    
     8. Make sure the `pg_xlog` directory doesn't contain any files.
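
        For example, using the same cluster path as above:

            rm -rf -- /var/lib/postgresql/9.6/main/pg_xlog/*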
    
     9. Then you need to create a `recovery.conf` file in
        `/var/lib/postgresql/9.6/main` that will tell postgres where to
        find the WAL files. At least the `restore_command` needs to be
        specified. Something like this should work:
    
            restore_command = 'cp /var/lib/postgresql/restore/subnotablie/main.WAL.%f %p'
    
        You can specify a specific recovery point in the `recovery.conf`;
        see the [upstream documentation](https://www.postgresql.org/docs/9.3/recovery-target-settings.html) for more information.
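
        For example, to stop replay at a given point in time instead of
        recovering all available WAL (a sketch; the timestamp is only
        illustrative):

            recovery_target_time = '2019-08-12 12:00:00+00'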
    
     10. Then start the server and look at the logs to follow the recovery
         process:
    
            service postgresql start
            tail -f /var/log/postgresql/*
    
     11. Re-enable Puppet, which should clean up the extra SSH key and
         firewall rules:
    
    
            puppet agent -t
    
         Make sure the cleanup actually happened by checking
         `/etc/ssh/userkeys/root` and the output of `iptables -L`.
    
    When the restore succeeds, the `recovery.conf` file will be renamed to
    `recovery.done` and you will see something like:
    
        2019-08-12 21:36:53.453 UTC [16901] LOG:  selected new timeline ID: 2
        2019-08-12 21:36:53.470 UTC [16901] LOG:  archive recovery complete
        cp: cannot stat '/var/lib/postgresql/restore/subnotablie/main.WAL.00000001.history': No such file or directory
        2019-08-12 21:36:53.577 UTC [16901] LOG:  MultiXact member wraparound protections are now enabled
        2019-08-12 21:36:53.584 UTC [16900] LOG:  database system is ready to accept connections
    
    Ignore the error from `cp` complaining about the `.history` file;
    it's harmless.
    
    ### Troubleshooting
    
    
    If you find the following error in the logs:
    
        FATAL:  could not locate required checkpoint record
    
    It's because postgres cannot find the WAL logs to restore from. There
    could be many causes for this, but the ones I stumbled upon were:
    
     * wrong location for `recovery.conf` (it's in `/var`, not `/etc`),
       probably the case if you don't see an error from `cp`
     * wrong permissions on the archive (put the WAL files in `~postgres`,
       not `~root`)
     * wrong path or pattern for `restore_command` (double-check the path
       and make sure to include the right prefix, e.g. `main.WAL`)
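
    A quick way to check both the archive location and its permissions is
    to list it as the `postgres` user, using the restore path from step 4
    (a sketch):

        sudo -u postgres ls -l /var/lib/postgresql/restore/$CLIENT/ | head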
    
    
    Direct restore procedure
    ------------------------
    
    
    Reminder: this procedure **streams** the files from the backup server
    to the database server and restores directly. See above for more
    information on how to pick a restore procedure.
    
     1. First, disable Puppet so you have control over when PostgreSQL is
        running:
     
            puppet agent --disable 'keeping control of postgresql startup -- anarcat 2019-10-09'
    
     2. Then install the right PostgreSQL version and stop the server:
    
            service postgresql stop
    
    
        Make sure you run the SAME MAJOR VERSION of PostgreSQL as the
        backup! You cannot restore across versions. This might mean
        installing from backports or an older version of Debian.
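
        For example, if the backup was made with 9.6 as in the examples
        below, something like this should work (a sketch; the exact
        package name and source, Debian release or backports, will vary):

            apt install postgresql-9.6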
    
     3. On that new PostgreSQL server, show the `postgres` user's public
        SSH key, generating one if it does not exist yet:

            [ -f ~postgres/.ssh/id_rsa.pub ] || sudo -u postgres ssh-keygen
            cat ~postgres/.ssh/*.pub
    
     4. Then, on the backup server, allow that key to access the backups
        of the old server:

            echo "command=\"/usr/local/bin/debbackup-ssh-wrap --read-allow=/srv/backups/pg/$OLDSERVER $CLIENT\",restrict $HOSTKEY" > /etc/ssh/userkeys/torbackup.more
    
        This assumes we connect to a *previous* server's backups, named
        `$OLDSERVER` (e.g. `dictyotum`). The `$HOSTKEY` is the public key
        found on the postgres server above.
    
        Warning: the above will fail if the key is already present in
        `/etc/ssh/userkeys/torbackup`; in that case, edit the existing
        entry there instead.
    
    
     5. Then you need to find the right `BASE` file to restore from. Each
        `BASE` file has a timestamp in its filename, so just sorting them
        by name should be enough to find the latest one. Uncompress the
        `BASE` file in place, as the `postgres` user:
    
    
            sudo -u postgres -i
            mkdir -m 0700 /var/lib/postgresql/9.6/main
            ssh torbackup@$BACKUPSERVER $(hostname) retrieve-file pg $OLDSERVER $BASEFILE \
              | sed '0,/^$/d' \
              | tar -C /var/lib/postgresql/9.6/main -x -z -f -
    
    
        Add a `pv` before the `tar` call in the pipeline for a progress bar
        with large backups, and replace:
    
         * `$BACKUPSERVER` with the backup server's hostname
           (currently `bungei.torproject.org`)
         * `$OLDSERVER` with the old server's (short) hostname
           (e.g. `dictyotum`)
         * `$CLUSTERNAME` with the name of the cluster to restore
           (e.g. usually `main`)
    
         * `$BASEFILE` with something like
           `$CLUSTERNAME.BASE.$BACKUPSERVER-20191004-062226-$OLDSERVER.torproject.org-$CLUSTERNAME-9.6-backup.tar.gz`
    
           or, fully expanded:
           `bacula.BASE.bungei.torproject.org-20191010-184205-dictyotum.torproject.org-bacula-9.6-backup.tar.gz`
           where the `$CLUSTERNAME` is the non-default `bacula`
    
        The above might hang for a while (yes, maybe even a minute) in
        the beginning, but it should eventually get started and
        complete. The "hang" is because `retrieve-file` sends a header
        which includes a `sha512sum` and it takes a while to compute.
        The `sed` command in that pipeline is there to skip that header.
        Example of a successful restore operated in [#31786][]:
       
            ssh torbackup@bungei.torproject.org bacula-director-01 retrieve-file pg dictyotum bacula.BASE.bungei.torproject.org-20191010-184205-dictyotum.torproject.org-bacula-9.6-backup.tar.gz | pv -s 33G | sed '0,/^$/d' | tar -C /var/lib/postgresql/9.6/main -x -z -f -
    
    
        If the above doesn't work, use the indirect procedure to restore
        the `BASE` file.
    
    [#31786]: https://bugs.torproject.org/31786
    
    
     6. Make sure the `pg_xlog` directory doesn't contain any files.
    
            rm -rf -- /var/lib/postgresql/9.6/main/pg_xlog/*
    
     7. Then you need to create a `recovery.conf` file in
        `/var/lib/postgresql/9.6/main` that will tell postgres where to
        find the WAL files. At least the `restore_command` needs to be
        specified. Something like this should work:
    
            restore_command = '/usr/local/bin/pg-receive-file-from-backup $OLDSERVER $CLUSTERNAME.WAL.%f %p'
    
        ... where:
    
         * `$OLDSERVER` should be replaced by the previous postgresql
           server name (e.g. `dictyotum`)
         * `$CLUSTERNAME` should be replaced by the previous cluster name
           (e.g. `main`, generally)
    
        You can specify a specific recovery point in the `recovery.conf`;
        see the [upstream documentation](https://www.postgresql.org/docs/9.3/recovery-target-settings.html) for more information. Also
        make sure the file is owned by postgres:
    
            $EDITOR /var/lib/postgresql/9.6/main/recovery.conf
            chown postgres /var/lib/postgresql/9.6/main/recovery.conf
    
    
     8. Then start the server and look at the logs to follow the recovery
        process:
    
            service postgresql start
            tail -f /var/log/postgresql/*
    
        You should see something like this:
    
            2019-10-09 21:17:47.335 UTC [9632] LOG:  database system was interrupted; last known up at 2019-10-04 08:12:28 UTC
            2019-10-09 21:17:47.517 UTC [9632] LOG:  starting archive recovery
            2019-10-09 21:17:47.524 UTC [9633] [unknown]@[unknown] LOG:  incomplete startup packet
            2019-10-09 21:17:48.032 UTC [9639] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:48.538 UTC [9642] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:49.046 UTC [9645] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:49.354 UTC [9632] LOG:  restored log file "00000001000005B200000074" from archive
            2019-10-09 21:17:49.552 UTC [9648] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:50.058 UTC [9651] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:50.565 UTC [9654] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:50.836 UTC [9632] LOG:  redo starts at 5B2/74000028
            2019-10-09 21:17:51.071 UTC [9659] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:17:51.577 UTC [9665] postgres@postgres FATAL:  the database system is starting up
            2019-10-09 21:20:35.790 UTC [9632] LOG:  restored log file "00000001000005B20000009F" from archive
            2019-10-09 21:20:37.745 UTC [9632] LOG:  restored log file "00000001000005B2000000A0" from archive
            2019-10-09 21:20:39.648 UTC [9632] LOG:  restored log file "00000001000005B2000000A1" from archive
            2019-10-09 21:20:41.738 UTC [9632] LOG:  restored log file "00000001000005B2000000A2" from archive
            2019-10-09 21:20:43.773 UTC [9632] LOG:  restored log file "00000001000005B2000000A3" from archive
    
        ... and so on.
    
    
        TODO: this warning comes up from time to time; is it a problem?
    
            2019-10-09 23:47:13.446 UTC [16973] LOG:  could not link file "pg_xlog/00000001000005B3000000C3" to "pg_xlog/00000001000005B3000000F9": File exists
    
    
        Then the recovery will complete with something like this:
    
            2019-10-10 01:30:55.460 UTC [16953] LOG:  redo done at 5B8/9C5BE738
            2019-10-10 01:30:55.460 UTC [16953] LOG:  last completed transaction was at log time 2019-10-10 01:04:23.238233+00
            2019-10-10 01:31:03.536 UTC [16953] LOG:  restored log file "00000001000005B80000009C" from archive
            2019-10-10 01:31:06.458 UTC [16953] LOG:  selected new timeline ID: 2
            2019-10-10 01:31:17.485 UTC [16953] LOG:  archive recovery complete
            2019-10-10 01:32:11.975 UTC [16953] LOG:  MultiXact member wraparound protections are now enabled
            2019-10-10 01:32:12.438 UTC [16950] LOG:  database system is ready to accept connections
            2019-10-10 01:32:12.439 UTC [26501] LOG:  autovacuum launcher started
    
    
     9. Then remove the temporary SSH access on the backup server, either
        by removing the `.more` key file or restoring the previous key
        configuration:
    
            rm /etc/ssh/userkeys/torbackup.more
    
     10. Re-enable Puppet:
    
            puppet agent -t
    
    
    ### Troubleshooting
    
    If you get this error in the PostgreSQL restore logs:
    
        No directories from which read is allowed given on cmdline.
        Unkown format in line 1 () when getting dictyotum:main.WAL.00000005.history from backup host
    
    It's because the SSH key deployed on the backup server does not have the
    `--read-allowed` argument.
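
    To verify what is actually deployed, inspect the key files on the
    backup server (a sketch):

        grep -e 'read-allow' /etc/ssh/userkeys/torbackup*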
    
    If the restore doesn't work, try to connect by hand:
    
        sudo -u postgres /usr/local/bin/pg-receive-file-from-backup dictyotum bacula.WAL.00000001000005AB0000003A /tmp/foo
    
    
    If you get this kind of error, it's because you forgot to restore the
    `BASE` file first:
    
        2019-10-08 21:25:43.711 UTC [14320] [unknown]@[unknown] LOG:  incomplete startup packet
        2019-10-08 21:25:44.215 UTC [14326] postgres@postgres FATAL:  the database system is starting up
        2019-10-08 21:25:44.451 UTC [14316] WARNING:  WAL was generated with wal_level=minimal, data may be missing
        2019-10-08 21:25:44.451 UTC [14316] HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
        2019-10-08 21:25:44.451 UTC [14316] LOG:  consistent recovery state reached at 0/153A3F0
        2019-10-08 21:25:44.451 UTC [14316] LOG:  invalid record length at 0/153A3F0: wanted 24, got 0
        2019-10-08 21:25:44.451 UTC [14316] LOG:  redo is not required
        2019-10-08 21:25:44.721 UTC [14334] postgres@postgres FATAL:  the database system is starting up
        2019-10-08 21:25:45.226 UTC [14340] postgres@postgres FATAL:  the database system is starting up
        2019-10-08 21:25:45.549 UTC [14316] LOG:  selected new timeline ID: 6
        2019-10-08 21:25:45.732 UTC [14343] postgres@postgres FATAL:  the database system is starting up
        2019-10-08 21:25:45.765 UTC [14316] LOG:  archive recovery complete
        2019-10-08 21:25:46.238 UTC [14349] postgres@postgres FATAL:  the database system is starting up
        2019-10-08 21:25:46.466 UTC [14316] LOG:  MultiXact member wraparound protections are now enabled
        2019-10-08 21:25:46.467 UTC [14315] LOG:  database system is ready to accept connections
        2019-10-08 21:25:46.467 UTC [14351] LOG:  autovacuum launcher started
    
    
    See also the "Indirect restore procedure" troubleshooting section,
    which also applies here.
    
    
    Dealing with Nagios warnings
    ----------------------------
    
    TODO: there's some information about backup handling in the [Debian
    DSA documentation](https://dsa.debian.org/howto/postgres-backup/).
    
    
    Reference
    =========
    
    Backup design
    -------------
    
    We use upstream's [Continuous Archiving and Point-in-Time Recovery
    (PITR)](https://www.postgresql.org/docs/9.3/continuous-archiving.html)
    which relies on postgres's "write-ahead log" (WAL) to write regular
    "transaction logs" of the cluster to the backup host. (Think of
    transaction logs as incremental backups.) This is configured in
    `postgresql.conf`, using a line like this:
    
        archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
    
    That is a site-specific script which reads a config file in
    `/etc/dsa/pg-backup-file.conf` where the backup host is specified
    (currently `torbackup@bungei.torproject.org`). That command passes the
    WAL logs onto the backup server, over SSH. A WAL file is shipped
    immediately when it is full (16MB of data by default) but no later
    than 6 hours (varies, see `archive_timeout` on each host) after it was
    first written to. On the backup server, the `command` is set to
    `debbackup-ssh-wrap` in the `authorized_keys` file and takes the
    `store-file pg` argument to write the file to the right location.
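
    To check that WAL archiving is keeping up on a given database server,
    the built-in statistics view can be queried (available since
    PostgreSQL 9.4):

        sudo -u postgres psql -c 'SELECT archived_count, last_archived_wal, last_archived_time, failed_count FROM pg_stat_archiver;'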
    
    WAL files are written to `/srv/backups/pg/$HOSTNAME`, where
    `$HOSTNAME` is the client's short hostname (without
    `.torproject.org`). WAL files are prefixed with `main.WAL.` (where
    `main` is the cluster name) followed by a long unique string,
    e.g. `main.WAL.00000001000000A40000007F`.
    
    For that system to work, we also need *full* backups to happen on a
    regular basis. This happens straight from the backup server (still
    `bungei`) which connects to the various postgres servers and runs a
    [pg_basebackup](https://manpages.debian.org/pg_basebackup) to get a complete snapshot of the cluster. This
    happens *weekly* (every 7 to 10 days) via
    `postgres-make-base-backups`, a wrapper (based on a Puppet
    `concat::fragment` template) that calls
    `postgres-make-one-base-backup` for each postgres server.
    
    The base files are written to the same directory as the WAL files
    and are named using the template:
    
        $CLUSTER.BASE.$SERVER_FQDN-$DATE-$ID-$CLIENT_FQDN-$CLUSTER-$VERSION-backup.tar.gz
    
    ... for example:
    
        main.BASE.bungei.torproject.org-20190804-214510-troodi.torproject.org-main-9.6-backup.tar.gz
    
    All of this works because SSH public keys and postgres credentials are
    passed around between servers. That is handled in the Puppet
    `postgresql` module for the most part, but some bits might still be
    configured manually on some servers.
    
    Backups are checked for freshness in Nagios using the
    `dsa-check-backuppg` plugin with its configuration stored in
    `/etc/dsa/postgresql-backup/dsa-check-backuppg.conf.d/`, per
    cluster. The Nagios plugin also takes care of expiring backups when
    they are healthy.
    
    
    Other documentation
    -------------------
    
    See also:
    
     * [PostgreSQL official documentation](https://www.postgresql.org/docs/manuals/)
     * [PostgreSQL wiki](https://wiki.postgresql.org/wiki/Main_Page)
     * [Debian DSA documentation](https://dsa.debian.org/howto/postgres-backup/)