... | @@ -558,19 +558,98 @@ also applies here. |
... | @@ -558,19 +558,98 @@ also applies here. |
|
|
|
|
|
## Installation
|
|
## Installation
|
|
|
|
|
|
<!-- how to setup the service from scratch -->
|
|
PostgreSQL servers are installed by hand on TPO servers, from Debian
|
|
|
|
packages. For example, in Debian bullseye:
|
|
|
|
|
|
|
|
apt install postgresql-13
|
|
|
|
|
|
|
|
To ensure backups are properly performed, the server must also have
|
|
|
|
the `postgres::backup_source` class.
|
|
|
|
|
|
## SLA
|
|
## SLA
|
|
|
|
|
|
<!-- this describes an acceptable level of service for this service -->
|
|
No service level is defined for this service.
|
|
|
|
|
|
## Design
|
|
## Design
|
|
|
|
|
|
|
|
We use PostgreSQL for a handful of services. Each service has its own
|
|
|
|
PostgreSQL server installed, with no high availability or replication,
|
|
|
|
currently, although we use the "write-ahead log" to keep a binary dump
|
|
|
|
of databases on the backup server.
|
|
|
|
|
|
|
|
It should be noted for people unfamiliar with PostgreSQL that it (or
|
|
|
|
at least the Debian package) can manage multiple "clusters" of
|
|
|
|
distinct databases with overlapping namespaces, running on different
|
|
|
|
ports. To quote the [upstream documentation](https://www.postgresql.org/docs/11/tutorial-concepts.html):
|
|
|
|
|
|
|
|
> PostgreSQL is a *relational database management system*
|
|
|
|
> (RDBMS). That means it is a system for managing data stored in
|
|
|
|
> *relations*. Relation is essentially a mathematical term for
|
|
|
|
> *table*. \[...\]
|
|
|
|
>
|
|
|
|
> Each table is a named collection of *rows*. Each row of a given table
|
|
|
|
> has the same set of named *columns*, and each column is of a specific
|
|
|
|
> data type. \[...\]
|
|
|
|
>
|
|
|
|
> Tables are grouped into databases, and a collection of databases
|
|
|
|
> managed by a single PostgreSQL server instance constitutes a
|
|
|
|
> database *cluster*.
|
|
|
|
|
|
|
|
See also the [PostgreSQL architecture fundamentals](https://www.postgresql.org/docs/11/tutorial-arch.html).
|
|
|
|
|
|
|
|
## Issues
|
|
|
|
|
|
|
|
There is no issue tracker specifically for this project, [File][] or
|
|
|
|
[search][] for issues in the [team issue tracker][search].
|
|
|
|
|
|
|
|
[File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
|
|
|
|
[search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
|
|
|
|
|
|
|
|
## Maintainer, users, and upstream
|
|
|
|
|
|
|
|
PostgreSQL services are part of the core services maintained by
|
|
|
|
TPA. The `postgres` Puppet module and associated backup
|
|
|
|
synchronisation code was written by Peter Palfrader.
|
|
|
|
|
|
|
|
The [PostgreSQL project](https://www.postgresql.org/) itself is a major database, free software
|
|
|
|
project, which calls itself "The World's Most Advanced Open Source
|
|
|
|
Relational Database, with regular releases and a healthy community.
|
|
|
|
|
|
|
|
## Monitoring and testing
|
|
|
|
|
|
|
|
The [Nagios](howto/nagios) monitoring server makes sure that a PostgreSQL process
|
|
|
|
is running on the host when the host is added to the relevant host
|
|
|
|
group (e.g. `postgres11-hosts`).
|
|
|
|
|
|
|
|
It also monitors the [bacula](howto/backups) storage server to make sure snapshots
|
|
|
|
are up to date. Backups are checked for freshness in Nagios using the
|
|
|
|
`dsa-check-backuppg` plugin with its configuration stored in
|
|
|
|
`/etc/dsa/postgresql-backup/dsa-check-backuppg.conf.d/`, per
|
|
|
|
cluster. The Nagios plugin also takes care of expiring backups when
|
|
|
|
they are healthy.
|
|
|
|
|
|
|
|
## Logs and metrics
|
|
|
|
|
|
|
|
PostgreSQL keeps log files in `/var/log/postgresql/`, one per
|
|
|
|
"cluster". Since it logs failed queries, logs may contain PII in the
|
|
|
|
form of SQL queries. The log rotation policy is the one set by the
|
|
|
|
Debian package and keeps logs for 10 weeks.
|
|
|
|
|
|
|
|
A [Prometheus](howto/prometheus) exporter is installed on the [GitLab](howto/gitlab) server by
|
|
|
|
the GitLab omnibus package, but metrics are not collected on other
|
|
|
|
Prometheus servers. The [Grafana](howto/grafana) server has a handful of
|
|
|
|
dashboards in various working states:
|
|
|
|
|
|
|
|
* [GitLab Omnibus - PostgreSQL](https://grafana.torproject.org/d/c_LJgXfmk/gitlab-omnibus-postgresql) - broken
|
|
|
|
* [PostgreSQL Overview (Percona)](https://grafana.torproject.org/d/IvhES05ik/postgresql-overview-percona) - mostly working
|
|
|
|
* [Postgres Overview](https://grafana.torproject.org/d/wGgaPlciz/postgres-overview) - basic dashboard with minimal metrics
|
|
|
|
|
|
|
|
## Backups
|
|
|
|
|
|
We use upstream's [Continuous Archiving and Point-in-Time Recovery
|
|
We use upstream's [Continuous Archiving and Point-in-Time Recovery
|
|
(PITR)](https://www.postgresql.org/docs/11/continuous-archiving.html)
|
|
(PITR)](https://www.postgresql.org/docs/11/continuous-archiving.html) which relies on postgres's "write-ahead log" (WAL) to write
|
|
which relies on postgres's "write-ahead log" (WAL) to write regular
|
|
regular "transaction logs" of the cluster to the backup host. (Think
|
|
"transaction logs" of the cluster to the backup host. (Think of
|
|
of transaction logs as incremental backups.) This is configured in
|
|
transaction logs as incremental backups.) This is configured in
|
|
|
|
`postgresql.conf`, using a line like this:
|
|
`postgresql.conf`, using a line like this:
|
|
|
|
|
|
archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
|
|
archive_command = '/usr/local/bin/pg-backup-file main WAL %p'
|
... | @@ -613,45 +692,7 @@ passed around between servers. That is handled in the Puppet |
... | @@ -613,45 +692,7 @@ passed around between servers. That is handled in the Puppet |
|
`postgresql` module for the most part, but some bits might still be
|
|
`postgresql` module for the most part, but some bits might still be
|
|
configured manually on some servers.
|
|
configured manually on some servers.
|
|
|
|
|
|
Backups are checked for freshness in Nagios using the
|
|
The Nagios plugin (see above) takes care of expiring snapshots.
|
|
`dsa-check-backuppg` plugin with its configuration stored in
|
|
|
|
`/etc/dsa/postgresql-backup/dsa-check-backuppg.conf.d/`, per
|
|
|
|
cluster. The Nagios plugin also takes care of expiring backups when
|
|
|
|
they are healthy.
|
|
|
|
|
|
|
|
## Issues
|
|
|
|
|
|
|
|
<!-- such projects are never over. add a pointer to well-known issues -->
|
|
|
|
<!-- and show how to report problems. usually a link to the bugtracker -->
|
|
|
|
|
|
|
|
There is no issue tracker specifically for this project, [File][] or
|
|
|
|
[search][] for issues in the [team issue tracker][search].
|
|
|
|
|
|
|
|
[File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
|
|
|
|
[search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
|
|
|
|
|
|
|
|
## Maintainer, users, and upstream
|
|
|
|
|
|
|
|
<!-- document who deployed and operates this service, who the users -->
|
|
|
|
<!-- are, who the upstreams are, if they are still active, -->
|
|
|
|
<!-- collaborative, how do we keep up to date, -->
|
|
|
|
|
|
|
|
## Monitoring and testing
|
|
|
|
|
|
|
|
<!-- describe how this service is monitored and how it can be tested -->
|
|
|
|
<!-- after major changes like IP address changes or upgrades. describe -->
|
|
|
|
<!-- CI, test suites, linting, how security issues and upgrades are -->
|
|
|
|
<!-- tracked -->
|
|
|
|
|
|
|
|
## Logs and metrics
|
|
|
|
|
|
|
|
<!-- where are the logs? how long are they kept? any PII? -->
|
|
|
|
<!-- what about performance metrics? same questions -->
|
|
|
|
|
|
|
|
## Backups
|
|
|
|
|
|
|
|
<!-- does this service need anything special in terms of backups? -->
|
|
|
|
<!-- e.g. locking a database? special recovery procedures? -->
|
|
|
|
|
|
|
|
## Other documentation
|
|
## Other documentation
|
|
|
|
|
... | | ... | |