    The "static component" or "static mirror" system is a set of servers,
    scripts and services designed to publish content over the world wide
    web (HTTP/HTTPS). It is designed to be highly available and
    distributed, a sort of content distribution network (CDN).
    
    [[_TOC_]]
    
    # Tutorial
    
    This documentation is about administering the static site
    components from a sysadmin perspective. User documentation lives
    in [doc/static-sites](doc/static-sites).
    
    # How-to
    
    ## Adding a new component
    
    
     1. add the component to Puppet, in `modules/roles/misc/static-components.yaml`:
        
            onionperf.torproject.org:
              master: staticiforme.torproject.org
              source: staticiforme.torproject.org:/srv/onionperf.torproject.org/htdocs/
    
     2. create the directory on `staticiforme`:
     
    
            ssh staticiforme "mkdir -p /srv/onionperf.torproject.org/htdocs/ \
                && chown torwww:torwww /srv/onionperf.torproject.org/{,htdocs} \
                && chmod 770 /srv/onionperf.torproject.org/{,htdocs}"
    
     3. add the host to DNS, if not already present (see
        [howto/dns](howto/dns)); for example, add this line in
        `dns/domains/torproject.org`:
    
            onionperf	IN	CNAME	static
    
     4. add an Apache virtual host, by adding a line like this to
        `modules/roles/templates/static-mirroring/vhost/static-vhosts.erb`
        in [howto/puppet](howto/puppet):
    
            vhost(lines, 'onionperf.torproject.org')
    
    
     5. add an SSL service, by adding a line like this to
        `modules/roles/manifests/static_mirror_web.pp` in
        [howto/puppet](howto/puppet):

            ssl::service { 'onionperf.torproject.org': ensure => 'ifstatic', notify => Exec['service apache2 reload'], key => true, }
    
    
        This also requires generating an X509 certificate, for which
        we use Let's Encrypt. See
        [howto/letsencrypt](howto/letsencrypt) for details.
    
     6. add an onion service, by adding another `onion::service` line
        to `modules/roles/manifests/static_mirror_onion.pp` in
        [howto/puppet](howto/puppet):
    
    
            onion::service {
                [...]
                'onionperf.torproject.org',
                [...]
            }
    
    
     7. run Puppet on the master and mirrors:
     
            ssh staticiforme puppet agent -t
            cumin 'C:roles::static_mirror_web' 'puppet agent -t'
    
    
        The latter is done with [howto/cumin](howto/cumin); see also
        [howto/puppet](howto/puppet) for a way to run jobs on all
        hosts.
    
     8. consider creating a new role and group for the component if
        none match its purpose, see
        [howto/create-a-new-user](howto/create-a-new-user) for details:
    
        
            ssh alberti.torproject.org ldapvi -ZZ --encoding=ASCII --ldap-conf -h db.torproject.org -D "uid=$USER,ou=users,dc=torproject,dc=org"
    
    
     9. if you created a new group, you will probably need to modify
        the `sudoers` file to grant a user access to the role/group,
        see `modules/sudo/files/sudoers` in the `tor-puppet` repository
        (and [howto/puppet](howto/puppet) to learn how to make changes
        to Puppet). `onionperf` is a good example of how to create
        such a `sudoers` entry. Edit the file with `visudo` so it
        checks the syntax:
        
            visudo -f modules/sudo/files/sudoers
    
        This, for example, is the line that was added for `onionperf`:
        
            %torwww,%metrics		STATICMASTER=(mirroradm)	NOPASSWD: /usr/local/bin/static-master-update-component onionperf.torproject.org, /usr/local/bin/static-update-component onionperf.torproject.org
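
        With that rule in place, members of the `torwww` or `metrics`
        groups can run the sync scripts as `mirroradm` on the static
        master, for example:

            sudo -u mirroradm static-update-component onionperf.torproject.org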
    
     10. add to Nagios monitoring, in `tor-nagios/config/nagios-master.cfg`:
    
    
             -
                 name: mirror static sync - atlas
                 check: "dsa_check_staticsync!atlas.torproject.org"
                 hosts: global
                 servicegroups: mirror
    
    
    ## Removing a component

     1. remove the component from Puppet, in `modules/roles/misc/static-components.yaml`
    
    
     2. remove the host from DNS, if it is no longer needed, see
        [howto/dns](howto/dns). This can be either in
        `dns/domains.git` or `dns/auto-dns.git`
    
    
     3. remove the Apache virtual host, by removing the corresponding
        line from
        `modules/roles/templates/static-mirroring/vhost/static-vhosts.erb`
        in [howto/puppet](howto/puppet):
    
            vhost(lines, 'onionperf.torproject.org')
    
    
     4. remove the SSL service, by removing the corresponding line
        from `modules/roles/manifests/static_mirror_web.pp` in
        [howto/puppet](howto/puppet):

            ssl::service { 'onionperf.torproject.org': ensure => 'ifstatic', notify => Exec['service apache2 reload'], key => true, }
    
    
     5. remove the Let's Encrypt certificate, see
        [howto/letsencrypt](howto/letsencrypt) for details
    
     6. remove the onion service, by removing the corresponding
        `onion::service` line from
        `modules/roles/manifests/static_mirror_onion.pp` in
        [howto/puppet](howto/puppet):

            onion::service {
                [...]
                'onionperf.torproject.org',
                [...]
            }
    
    
     7. remove the `sudo` rules for the role user
    
    
     8. remove the home directory specified on the server (often
        `staticiforme`, but can be elsewhere) and mirrors, for example:
    
            ssh staticiforme "mv /home/ooni /home/ooni-OLD ; echo rm -rf /home/ooni-OLD | at now + 7 days"
            cumin -o txt 'C:roles::static_mirror_web' 'mv /srv/static.torproject.org/mirrors/ooni.torproject.org /srv/static.torproject.org/mirrors/ooni.torproject.org-OLD'
            cumin -o txt 'C:roles::static_mirror_web' 'echo rm -rf /srv/static.torproject.org/mirrors/ooni.torproject.org-OLD | at now + 7 days'
    
    
     9. consider removing the role user and group in LDAP, if there are no
        files left owned by that user
    
     10. remove from Nagios, e.g.:
    
     
            -
             name: mirror static sync - atlas
             check: "dsa_check_staticsync!atlas.torproject.org"
             hosts: global
             servicegroups: mirror
    
    
    ## Pager playbook
    
    
    <!-- information about common errors from the monitoring system and -->
    <!-- how to deal with them. this should be easy to follow: think of -->
    <!-- your future self, in a stressful situation, tired and hungry. -->
    
    ## Disaster recovery
    
    
    TODO: add a disaster recovery procedure.
    
    
    <!-- what to do if all goes to hell. e.g. restore from backups? -->
    <!-- rebuild from scratch? not necessarily those procedures (e.g. see -->
    <!-- "Installation" below) but some pointers. -->
    
    # Reference
    
    ## Installation
    
    Servers are mostly configured in [Puppet](puppet), with some
    exceptions. See the [design section](#design) below for details
    on the Puppet classes in use. Typically, a web mirror will use
    `roles::static_mirror_web`, for example.
    
    
    
    ### Web mirror setup
    
    To setup a web mirror, create a new server with the following entries
    in LDAP:
    
        allowedGroups: mirroradm
        allowedGroups: weblogsync
    
    This will ensure the `mirroradm` user is created on the host.
    
    Then the host needs the following Puppet configuration in Hiera:
    
    ```
    classes:
      - roles::static_mirror_web
    staticsync::static_mirror::get_triggered: false
    ```
    
    The `get_triggered` parameter ensures the host will not block
    static site updates while it is doing its first sync.
    
    Then Puppet can be run on the host, after `apache2` is installed,
    to make sure the `apache2` Puppet module picks it up:
    
        apt install apache2
        puppet agent -t
    
    You might need to reboot to get some firewall rules to load correctly:
    
        reboot
    
    The server should start a sync after reboot. However, it's likely that
    the SSH keys it uses to sync have not been propagated to the master
    server. If the sync fails, you might receive an email with lots of
    lines like:
    
        [MSM] STAGE1-START (2021-03-11 19:38:59+00:00 on web-chi-03.torproject.org)
    
    It might be worth running the sync by hand, with:
    
        screen sudo -u mirroradm static-mirror-run-all
    
    The server may also need to be added to the static component
    configuration in `modules/roles/misc/static-components.yaml`, if it is
    to carry a full mirror, or exclude some components. For example,
    `web-fsn-01` and `web-chi-03` both carry all components, so they need
    to be added to all `limit-mirrors` statements, like this:
    
    ```
    components:
      # [...]
      dist.torproject.org:
        master: static-master-fsn.torproject.org
        source: staticiforme.torproject.org:/srv/dist-master.torproject.org/htdocs
        limit-mirrors:
          - archive-01.torproject.org
          - web-cymru-01.torproject.org
          - web-fsn-01.torproject.org
          - web-fsn-02.torproject.org
          - web-chi-03.torproject.org
    ```
    
    Once that is changed, the `static-mirror-run-all` command needs to be
    rerun (although it will also run on the next reboot).
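
    That is, run again, as above (`screen` is convenient because the
    first sync can take a while):

        screen sudo -u mirroradm static-mirror-run-all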
    
    When the sync is finished, you can remove this line:
    
        staticsync::static_mirror::get_triggered: false
    
    ... and the node can be added to the various files in
    `dns/auto-dns.git`.
    
    Then, to add the host as a Fastly backend, this also needs to be
    added to Hiera:
    
    
        roles::cdn_torproject_org::fastly_backend: true
    
    
    Once that change is propagated, you need to change the Fastly
    configuration using the tools in the [cdn-config-fastly
    repository](https://gitlab.torproject.org/tpo/tpa/cdn-config-fastly/). Note that only one of the nodes is a "backend" for
    Fastly, and typically not the nodes that are in the main rotation (so
    that the Fastly frontend survives if the main rotation dies). But the
    main rotation servers act as a backup for the main backend.
    
    ## SLA
    
    
    This service is designed to be highly available. All web sites should
    keep working (maybe with some performance degradation) even if one of
    the hosts goes down. It should also absorb and tolerate moderate
    denial of service attacks.
    
    
    ## Design
    
    
    The static mirror system is built of three kinds of hosts:
    
    
    
     * `source` - builds and hosts the original content
       (`roles::static_source` in Puppet)
     * `master` - receives the contents from the source, dispatches it
       (atomically) to the mirrors (`roles::static_master` in Puppet)
     * `mirror` - serves the contents to the user
       (`roles::static_mirror_web` in Puppet)
    
    Content is split into different "components", which are units of
    content that get synchronized atomically across the different
    hosts. Those components are defined in a YAML file in the
    `tor-puppet.git` repository
    (`modules/roles/misc/static-components.yaml` at the time of writing,
    but it might move to Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet)).
    
    
    The Jenkins server is also used to build and push websites to static
    source servers.
    
    This diagram summarizes graphically how those components talk to
    each other:
    
    
    ![Static mirrors architecture diagram](static-component/architecture.png)
    
    A narrative of how changes get propagated through the mirror network
    is detailed below.
    
    
    <!-- this is a rephrased copy of -->
    <!-- https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt -->
    
    A key advantage of that infrastructure is the higher availability it
    provides: whereas individual virtual machines are power-cycled for
    scheduled maintenance (e.g. kernel upgrades), static mirroring
    machines are removed from the DNS during their maintenance.
    
    
    ### Change process
    
    
    When data changes, the `source` is responsible for running
    `static-update-component`, which instructs the `master` via SSH to
    run `static-master-update-component`, transfers a new copy of the
    source data to the `master` using `rsync(1)` and, upon successful
    copy, swaps it with the current copy.
    
    The current copy on the `master` is then distributed to all actual
    `mirror`s, again placing a new copy alongside their current copy using
    `rsync(1)`.
    
    Once the data has successfully made it to all mirrors, the mirrors
    are instructed to swap the new copy with their current copy, at
    which point the updated data will be served to end users.
    
    <!-- end of the copy -->
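
    For illustration, the whole chain can be reduced to the following
    sketch (the component name is hypothetical; the real scripts also
    handle locking and error recovery, see the walk through below):

        # on the source host, as the role user:
        static-update-component onionperf.torproject.org
        # ... which SSHes to the master and runs:
        #     static-master-update-component onionperf.torproject.org
        # ... which rsyncs the source data into a temporary
        #     <component>-updating.incoming-XXXXXX tree, swaps it with
        #     the current copy on success, and triggers:
        #     static-master-run onionperf.torproject.org
        # ... which pushes the tree to all mirrors and, once they all
        #     have it, instructs them to swap it in atomically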
    
    
    ### Source code inventory
    
    The source code of the static mirror system is spread out in different
    files and directories in the `tor-puppet.git` repository:
    
     * `modules/roles/misc/static-components.yaml` lists the "components"
     * `modules/roles/manifests/` holds the different Puppet roles:
       * `roles::static_mirror` - a generic mirror, see
         `staticsync::static_mirror` below
       * `roles::static_mirror_web` - a web mirror, including most (but
         not necessarily all) components defined in the YAML
         configuration. It also configures Apache (which the above
         doesn't) and includes `roles::static_mirror` (and therefore
         `staticsync::static_mirror`)
       * `roles::static_mirror_onion` - configures the hidden services for
         the web mirrors defined above
       * `roles::static_source` - a generic static source, see
         `staticsync::static_source`, below
       * `roles::static_master` - a generic static master, see
         `staticsync::static_master` below
     * `modules/staticsync/` is the core Puppet module holding most of the
       source code:
       * `staticsync::static_source` - source, which:
         * exports the static user SSH key to the master, punching a hole
           in the firewall
         * collects the SSH keys from the master(s)
       * `staticsync::static_mirror` - a mirror which does the above and:
         * deploys the `static-mirror-run` and `static-mirror-run-all`
           scripts (see below)
         * configures a cron job for `static-mirror-run-all`
         * exports a configuration snippet of `/etc/static-clients.conf`
           for the **master**
       * `staticsync::static_master` - a master which:
         * deploys the `static-master-run` and
           `static-master-update-component` scripts (see below)
         * collects the `static-clients.conf` configuration file,
           which is built from the hostname (`$::fqdn`) exported by
           each `staticsync::static_mirror`
         * configures the `basedir` (currently
           `/srv/static.torproject.org`) and `user` home directory
           (currently `/home/mirroradm`)
         * collects the SSH keys from sources, mirrors and other masters
         * exports the SSH key to the mirrors and sources
       * `staticsync::base`, included by all of the above, deploys:
         * `/etc/static-components.conf`: a file derived from the
           `static-components.yaml` configuration file
         * `/etc/staticsync.conf`: a polyglot (bash and Python)
           configuration file propagating the `base` (currently
           `/srv/static.torproject.org`), `masterbase` (currently
           `$base/master`) and `staticuser` (currently `mirroradm`)
           settings; see the sketch after this list
         * `staticsync-ssh-wrap` and `static-update-component` (see below)
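
    Since `staticsync.conf` is made of plain `KEY=value` lines, it can
    be sourced from bash and trivially parsed from Python. A hedged
    sketch of what it might contain (the actual file may differ):

        # /etc/staticsync.conf
        base=/srv/static.torproject.org
        masterbase=/srv/static.torproject.org/master
        staticuser=mirroradm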
    
    TODO: try to figure out why we have `/etc/static-components.conf` and
    not directly the `YAML` file shipped to hosts, in
    `staticsync::base`. See the `static-components.conf.erb` Puppet
    template.
    
    
    ### Scripts walk through

    <!-- this is a reformatted copy of the `OVERVIEW` in the staticsync
    puppet module -->
    
    - `static-update-component` is run by the user on the **source** host.

      If not run under sudo as the `staticuser` already, it `sudo`'s to
      the `staticuser`, re-executing itself.  It then SSHes to the
      `static-master` for that component to run
      `static-master-update-component`.

      LOCKING: none, but see `static-master-update-component`
    
    - `static-master-update-component` is run on the **master** host.

      It `rsync`'s the contents from the **source** host to the static
      **master**, and then triggers `static-master-run` to push the
      content to the mirrors.

      The sync happens to a new `<component>-updating.incoming-XXXXXX`
      directory.  On sync success, `<component>` is replaced with that
      new tree, and the `static-master-run` trigger happens.
    
      LOCKING: exclusive locks are held on `<component>.lock`
    
    - `static-master-run` triggers all the mirrors for a component to
      initiate syncs. 
      
      When all mirrors have an up-to-date tree, they are
      instructed to update the `cur` symlink to the new tree.
    
      To begin with, `static-master-run` copies `<component>` to
      `<component>-current-push`.
      
      This is the tree all the mirrors then sync from.  If the push was
      successful, `<component>-current-push` is renamed to
      `<component>-current-live`.
    
      LOCKING: exclusive locks are held on `<component>.lock`
    
    - `static-mirror-run` runs on a mirror and syncs components.

      There is a symlink called `cur` that points to either `tree-a` or
      `tree-b` for each component.  The `cur` tree is the one that is
      live; the other one usually does not exist, except when a sync is
      ongoing (or a previous one failed and we keep a partial tree).

      During a sync, we sync to the `tree-<X>` that is not the live
      one.  When instructed by `static-master-run`, we update the
      symlink and remove the old tree (see the sketch after this walk
      through).
    
    
      `static-mirror-run` `rsync`'s either `-current-push` or
      `-current-live` for a component.

      LOCKING: during all of `static-mirror-run`, we keep an exclusive
      lock on the `<component>` directory, i.e., the directory that
      holds `tree-[ab]` and `cur`.
    
    - `static-mirror-run-all`
    
      Run `static-mirror-run` for all components on this mirror, fetching
      the `-live-` tree.
    
      LOCKING: none, but see `static-mirror-run`.
    
    - `staticsync-ssh-wrap`
    
      Wrapper for SSH job dispatching on source, master, and mirror.
    
      LOCKING: on **master**, when syncing `-live-` trees, a shared lock
      is held on `<component>.lock` during the rsync process.
    
    <!-- end of the copy -->

    The scripts are written in bash, except `static-master-run`, which
    is written in Python 2.
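
    As an illustration of the tree swap described above, here is a
    minimal sketch (the paths and the rsync source are assumptions;
    the real `static-mirror-run` also handles locking and partial
    trees):

        #!/bin/sh
        # flip a component between tree-a and tree-b on a mirror
        component=onionperf.torproject.org
        cd "/srv/static.torproject.org/mirrors/$component"
        live=$(readlink cur)                  # "tree-a" or "tree-b"
        if [ "$live" = tree-a ]; then new=tree-b; else new=tree-a; fi
        # sync the inactive tree from the master's -current-live copy
        rsync -a "master:$component-current-live/" "$new/"
        # atomically swap the symlink, then drop the old tree
        ln -sfn "$new" cur.new && mv -T cur.new cur
        rm -rf "$live"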
    
    ### Authentication
    
    
    The authentication between the static site hosts is entirely done
    through SSH. The source hosts are accessible by normal users, who
    can `sudo` to a "role" user which has privileges to run the static
    sync scripts as the sync user. That user then has privileges to
    contact the master server which, in turn, can log in to the
    mirrors over SSH as well.
    
    The user's `sudo` configuration is therefore critical and that
    `sudoers` configuration could also be considered part of the static
    mirror system.
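
    Schematically (a hedged sketch: the hostnames and exact commands
    are examples, not the literal invocations):

        # a user sudoes to the role user, which runs the update script;
        # the script re-executes itself as the staticuser via sudo:
        torwww$ sudo -u mirroradm static-update-component onionperf.torproject.org
        # the staticuser's SSH key is accepted (through
        # staticsync-ssh-wrap) on the master, which can in turn SSH to
        # every mirror:
        mirroradm$ ssh static-master-fsn.torproject.org ...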
    
    
    
    Jenkins has SSH access to the `torwww` user in the static
    infrastructure, so it can build and push websites, see below.
    
    ### Jenkins build jobs
    
    Jenkins is used to build some websites and push them to the static
    mirror infrastructure. The Jenkins jobs get triggered from `git-rw`
    git hooks, and are (partially) defined in [jenkins/tools.git](https://gitweb.torproject.org/project/jenkins/tools.git/) and
    [jenkins/jobs.git](https://gitweb.torproject.org/project/jenkins/jobs.git/). Those are fed into [jenkins-job-builder](https://docs.openstack.org/infra/jenkins-job-builder/) to
    build the actual job. Those jobs actually build the site with hugo or
    lektor and package an archive that is then fetched by the static
    source.
    
    The [build scripts](https://gitweb.torproject.org/admin/static-builds.git/) are deployed on `staticiforme`, in the
    `~torwww` home directory. Those get triggered through the
    `~torwww/bin/ssh-wrap` program, hardcoded in
    `/etc/ssh/userkeys/torwww`, which picks the right build job based on
    the argument provided by the Jenkins job, for example:
    
            - shell: "cat incoming/output.tar.gz | ssh torwww@staticiforme.torproject.org hugo-website-{site}"
    
    Then the wrapper eventually does something like this to update the
    static component on the static source:
    
        rsync --delete -v -r "${tmpdir}/incoming/output/." "${basedir}"
        static-update-component "$component"
    
    ## Issues

    There is no issue tracker specifically for this project. [File][]
    or [search][] for issues in the [team issue tracker][search].
    
     [File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
     [search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
    
    ## Monitoring and testing
    
    
    Static site synchronisation is monitored in Nagios, using a block in
    `nagios-master.cfg` which looks like:
    
        -
            name: mirror static sync - extra
            check: "dsa_check_staticsync!extra.torproject.org"
            hosts: global
            servicegroups: mirror
    
    That script (actually called `dsa-check-mirrorsync`) makes an HTTP
    request to every mirror and checks the timestamp inside a "trace"
    file (`.serial`) to make sure everyone has the same copy of the
    site.
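
    You can do a similar check by hand; for example (a hedged sketch:
    the mirror list is illustrative, and `--resolve` forces the
    request to a specific mirror behind the DNS rotation):

        for mirror in web-fsn-01.torproject.org web-chi-03.torproject.org; do
            addr=$(dig +short "$mirror" | head -1)
            printf '%s: ' "$mirror"
            curl -s --resolve "extra.torproject.org:443:$addr" \
                https://extra.torproject.org/.serial
            echo
        done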
    
    There's also a miniature reimplementation of [Nagios](howto/nagios) called
    [mininag](https://gitweb.torproject.org/admin/dns/mini-nag.git/) which runs on the DNS server. It performs health checks
    on the mirrors and takes them out of the DNS zonefiles if they become
    unavailable or have a scheduled reboot. This makes it possible to
    reboot a server and have the server taken out of rotation
    automatically.
    
    
    ## Logs and metrics

    All Tor webservers keep a minimal amount of logs. The IP address
    and the time (but not the date) are scrubbed: the time is logged
    as `00:00:00`. The referrer is disabled on the client side by
    sending the `Referrer-Policy "no-referrer"` header.
    
    The IP addresses are replaced with:
    
     * `0.0.0.0` - HTTP request
     * `0.0.0.1` - HTTPS request
     * `0.0.0.2` - hidden service request
    
    Logs are kept for two weeks.
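
    A minimal sketch of what such a privacy-preserving Apache
    configuration could look like (an illustration, not the actual TPA
    template):

        # hardcode a placeholder IP per vhost (here 0.0.0.1, for HTTPS)
        # and zero out the time of day while keeping the date
        LogFormat "0.0.0.1 - - [%{%d/%b/%Y:00:00:00 %z}t] \"%r\" %>s %O" privacy
        CustomLog ${APACHE_LOG_DIR}/access.log privacy
        # ask clients not to send a Referer header at all
        Header always set Referrer-Policy "no-referrer"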
    
    
    Errors may be sent by email.
    
    Metrics are scraped by [Prometheus](prometheus) using the "Apache"
    exporter.

    ## Backups
    
    
    The `source` hosts are backed up with [Bacula](backups) without
    any special provision.
    
    TODO: check if master / mirror nodes need to be backed up. Probably not?
    
    
    ## Other documentation
    
    
     * [DSA wiki](https://dsa.debian.org/howto/static-mirroring/)
     * [scripts overview](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/staticsync/files/OVERVIEW)
     * [README.static-mirroring](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt)
    
    
    # Discussion
    
    ## Overview
    
    
    The goal of this discussion section is to consider improvements to the
    static site mirror system at torproject.org. It might also apply to
    debian.org, but the focus is currently on TPO.
    
    
    
    The static site mirror system was designed for hosting Debian.org
    content. Interestingly, it is not used for the operating system
    mirrors themselves, which are synchronized using another, separate
    system ([archvsync](https://salsa.debian.org/mirror-team/archvsync/)).
    
    The static mirror system was written for Debian.org by Peter
    Palfrader. It has also been patched by other DSA members (Stephen
    Gran and Julien Cristau both have more than 100 commits on the old
    code base).
    
    This service is critical: it distributes the main torproject.org
    websites, but also software releases like the tor project source code
    and other websites.
    
    
    ## Limitations
    
    The maintenance status of the mirror code is unclear: while it is
    still in use at Debian.org, it is made of a few sets of components
    which are not bundled in a single package. This makes it hard to
    follow "upstream", although, in theory, it should be possible to
    follow the [`dsa-puppet`](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/) repository. In practice, that's pretty
    difficult because the `dsa-puppet` and `tor-puppet` repositories
    have disconnected histories. Even if they had a common ancestor,
    the code is spread across multiple directories, which makes it
    hard to track. There has been some refactoring to move most of the
    code into a `staticsync` module, but we still have files strewn
    over other modules.
    
    For certain sites, the static site system requires Jenkins to build
    websites, which further complicates deployments. A static site
    deployment requiring Jenkins needs updates on 5 different
    repositories, across 4 different services:
    
     * a new static component in the (private) `tor-puppet.git` repository
     * a [build script](https://gitweb.torproject.org/project/jenkins/tools.git/tree/slaves/linux/) in the [jenkins/tools.git](https://gitweb.torproject.org/project/jenkins/tools.git/) repository
     * a build job in the [jenkins/jobs.git](https://gitweb.torproject.org/project/jenkins/jobs.git/) repository
     * a [new entry](https://gitweb.torproject.org/admin/static-builds.git/commit/?id=b2344aa1d68f4f065764c6f23d14494020b81f86) in the [ssh wrapper](https://gitweb.torproject.org/admin/static-builds.git/tree/ssh-wrap?id=b2344aa1d68f4f065764c6f23d14494020b81f86) in the
       [admin/static-builds.git](https://gitweb.torproject.org/admin/static-builds.git/) repository
     * a new entry in the `gitolite-admin.git` repository
    
    
    The static site system has no unit tests, linting, release process, or
    CI. Code is deployed directly through Puppet, on the live servers.
    
    There hasn't been a security audit of the system, as far as we could
    tell.
    
    Python 2 porting is probably the most pressing issue in this project:
    the `static-master-run` program is written in old Python 2.4
    code. Thankfully it is fairly short and should be easy to port.
    
    The YAML configuration duplicates the YAML parsing and data
    structures present in Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet).
    
    
    ## Goals
    
    ### Must have
    
    
     * high availability: continue serving content even if one (or a few?)
       servers go down
     * atomicity: the deployed content must be coherent
     * high performance: should be able to saturate a gigabit link and
       withstand simple DDOS attacks
    
    
    ### Nice to have
    
    
     * cache-busting: changes to a CSS or JavaScript file must be
       propagated to the client reasonably quickly
     * possibly host Debian and RPM package repositories
    
    
    ### Non-Goals
    
    
     * implement our own global content distribution network
    
    
    ## Approvals required
    
    
    ## Proposed Solution
    
    
    The static mirror system certainly has its merits: it's flexible,
    powerful and provides a reasonably easy-to-deploy,
    high-availability service, at the cost of some level of obscurity,
    complexity, and high disk space requirements.
    
    ## Cost
    
    Staff, mostly. We expect a reduction in cost if we reduce the number
    of copies of the sites we have to keep around.
    
    ## Alternatives considered
    
    <!-- include benchmarks and procedure if relevant -->
    
     * [GitLab pages](https://docs.gitlab.com/ee/administration/pages/) could be used as a source?
     * the [cache system](cache) could be used as a replacement in the
       front-end
    
    TODO: benchmark gitlab pages vs (say) apache or nginx.
    
    ### GitLab pages replacement
    
    It should be possible to replace parts or the entirety of the system
    progressively, however. A few ideas:
    
     * the **mirror** hosts could be replaced by the [cache
       system](cache). This would possibly require shifting the web
       service from the **mirror** to the **master**, or at least some
       significant re-architecture
     * the **source** hosts could be replaced by some parts of the
       [GitLab Pages](https://docs.gitlab.com/ee/administration/pages/)
       system. Unfortunately, that system relies on a custom
       webserver, but it might be possible to bypass that and directly
       access the on-disk files provided by the CI.
    
    The architecture would look something like this:
    
    ![Static system redesign architecture diagram](static-component/architecture-gitlab-pages.png)
    
    
    Details of the GitLab pages design and installation is available [in
    our GitLab documentation](howto/gitlab#gitlab-pages).
    
    Concerns about this approach:
    
     * GitLab pages is a custom webserver which issues TLS certs for
       the custom domains and serves the content; it's unclear how
       reliable or performant that server is
    
     * The pages design assumes the existence of a shared filesystem to
       deploy content, currently NFS, but they are switching to S3 (as
       explained above), which introduces significant complexity and moves
       away from the classic "everything is a file" approach
     * The new design also introduces a dependency on the main GitLab
       rails API for availability, which could be a concern, especially
       since that is [usually a "non-free" feature](https://about.gitlab.com/pricing/self-managed/feature-comparison/) (e.g. [PostgreSQL
       replication and failover](https://docs.gitlab.com/ee/administration/postgresql/replication_and_failover.html), [Database load-balancing](https://docs.gitlab.com/ee/administration/database_load_balancing.html),
       [traffic load balancer](https://docs.gitlab.com/ee/administration/reference_architectures/#traffic-load-balancer), [Geo disaster recovery](https://docs.gitlab.com/ee/administration/geo/disaster_recovery/index.html) and,
       generally, [all of Geo](https://about.gitlab.com/solutions/geo/) and most [availability components](https://docs.gitlab.com/ee/administration/reference_architectures/#availability-components)
       are non-free).
     * In general, this increases dependency on GitLab for deployments
    
    
    Next steps:
    
    
     1. [ ] check if the GitLab Pages subsystem provides atomic updates
     2. [x] see how GitLab Pages can be distributed to multiple hosts and
            how scalable it actually is or if we'll need to run the cache
            frontend in front of it. **update**: it can, but with
            significant caveats in terms of complexity, see above
     3. [ ] setup GitLab pages to test with small, non-critical websites
            (e.g. API documentation, etc)
     4. [ ] test the [GitLab pages API-based configuration](https://docs.gitlab.com/ee/administration/pages/#gitlab-api-based-configuration) and see how
            it handles outages of the main rails API
     5. [ ] test the [object storage system](https://docs.gitlab.com/ee/administration/pages/#using-object-storage) and see if it is usable,
            debuggable, highly available and performant enough for our
            needs
     6. [ ] keep track of upstream development of the GitLab pages
            architecture, [see this comment from anarcat](https://gitlab.com/groups/gitlab-org/-/epics/1316#note_496404589) outlining
            some of those concerns
    
    ### Replacing Jenkins with GitLab CI as a builder
    
    
    See the [Jenkins documentation](service/jenkins#gitlab-ci-replacement)
    for more information on that front.
    
    <!--  LocalWords:  atomicity DDOS YAML Hiera webserver NFS CephFS TLS
     -->
    <!--  LocalWords:  filesystem GitLab scalable frontend CDN HTTPS DNS
     -->
    <!--  LocalWords:  howto Nagios SSL TOC dns letsencrypt sudo LDAP SLA
     -->
    <!--  LocalWords:  rsync cron hostname symlink webservers Bacula DSA
     -->
    <!--  LocalWords:  torproject debian TPO Palfrader Julien Cristau TPA
     -->
    <!--  LocalWords:  LocalWords
     -->