    The "static component" or "static mirror" system is a set of servers,
    scripts and services designed to publish content over the world wide
    web (HTTP/HTTPS). It is designed to be highly available and
    distributed, a sort of content distribution network (CDN).
    
    [[_TOC_]]
    
    # Tutorial
    
    This documentation is about administering the static site
    components from a sysadmin perspective. User documentation lives
    in [doc/static-sites](doc/static-sites).
    
    # How-to
    
    ## Adding a new component
    
    
     1. add the component to Puppet, in `modules/roles/misc/static-components.yaml`:
        
            onionperf.torproject.org:
              master: staticiforme.torproject.org
              source: staticiforme.torproject.org:/srv/onionperf.torproject.org/htdocs/
    
     2. create the directory on `staticiforme`:
     
    
            ssh staticiforme "mkdir -p /srv/onionperf.torproject.org/htdocs/ \
                && chown torwww:torwww /srv/onionperf.torproject.org/{,htdocs} \
                && chmod 770 /srv/onionperf.torproject.org/{,htdocs}"
    
     3. add the host to DNS, if not already present (see
        [howto/dns](howto/dns)); for example, add this line in
        `dns/domains/torproject.org`:
    
            onionperf	IN	CNAME	static
    
     4. add an Apache virtual host, by adding a line like this to
        `modules/roles/templates/static-mirroring/vhost/static-vhosts.erb`
        in [howto/puppet](howto/puppet):
    
            vhost(lines, 'onionperf.torproject.org')
    
    
     5. add an SSL service, by adding a line like this to
        `modules/roles/manifests/static_mirror_web.pp` in
        [howto/puppet](howto/puppet):

            ssl::service { 'onionperf.torproject.org': ensure => 'ifstatic', notify => Exec['service apache2 reload'], key => true, }
    
    
        This also requires generating an X509 certificate, for which
        we use Let's Encrypt. See
        [howto/letsencrypt](howto/letsencrypt) for details.
    
     6. add an onion service, by adding another `onion::service` line
        to `modules/roles/manifests/static_mirror_onion.pp` in
        [howto/puppet](howto/puppet):
    
    
            onion::service {
                [...]
                'onionperf.torproject.org',
                [...]
            }
    
    
     7. run Puppet on the master and mirrors:
     
            ssh staticiforme puppet agent -t
            cumin 'C:roles::static_mirror_web' 'puppet agent -t'
    
    
        The latter is done with [howto/cumin](howto/cumin); see also
        [howto/puppet](howto/puppet) for a way to run jobs on all
        hosts.
    
     8. consider creating a new role and group for the component if
        none match its purpose, see
        [howto/create-a-new-user](howto/create-a-new-user) for details:
    
        
            ssh alberti.torproject.org ldapvi -ZZ --encoding=ASCII --ldap-conf -h db.torproject.org -D "uid=$USER,ou=users,dc=torproject,dc=org"
    
    
     9. if you created a new group, you will probably need to modify
        the `sudoers` file to grant a user access to the role/group,
        see `modules/sudo/files/sudoers` in the `tor-puppet` repository
        (and [howto/puppet](howto/puppet) to learn how to make changes
        to Puppet). `onionperf` is a good example of how to create
        such a `sudoers` entry. Edit the file with `visudo` so it
        checks the syntax:
        
            visudo -f modules/sudo/files/sudoers
    
        This, for example, is the line that was added for `onionperf`:
        
            %torwww,%metrics		STATICMASTER=(mirroradm)	NOPASSWD: /usr/local/bin/static-master-update-component onionperf.torproject.org, /usr/local/bin/static-update-component onionperf.torproject.org
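
        With that rule in place, members of the `torwww` or `metrics`
        groups can run the sync scripts as `mirroradm` on the static
        master, for example:

            sudo -u mirroradm static-update-component onionperf.torproject.org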
    
     10. add to Nagios monitoring, in `tor-nagios/config/nagios-master.cfg`:
    
    
             -
                 name: mirror static sync - atlas
                 check: "dsa_check_staticsync!atlas.torproject.org"
                 hosts: global
                 servicegroups: mirror
    
    
    ## Removing a component

     1. remove the component from Puppet, in `modules/roles/misc/static-components.yaml`
    
    
     2. remove the host from DNS, if it is no longer needed, see
        [howto/dns](howto/dns). This can be either in
        `dns/domains.git` or `dns/auto-dns.git`
    
    
     3. remove the Apache virtual host, by removing the corresponding
        line from
        `modules/roles/templates/static-mirroring/vhost/static-vhosts.erb`
        in [howto/puppet](howto/puppet):
    
            vhost(lines, 'onionperf.torproject.org')
    
    
     4. remove the SSL service, by removing the corresponding line
        from `modules/roles/manifests/static_mirror_web.pp` in
        [howto/puppet](howto/puppet):

            ssl::service { 'onionperf.torproject.org': ensure => 'ifstatic', notify => Exec['service apache2 reload'], key => true, }
    
    
     5. remove the Let's Encrypt certificate, see
        [howto/letsencrypt](howto/letsencrypt) for details
    
     6. remove the onion service, by removing the corresponding
        `onion::service` line from
        `modules/roles/manifests/static_mirror_onion.pp` in
        [howto/puppet](howto/puppet):

            onion::service {
                [...]
                'onionperf.torproject.org',
                [...]
            }
    
    
     7. remove the `sudo` rules for the role user
    
    
     8. remove the home directory specified on the server (often
        `staticiforme`, but can be elsewhere) and mirrors, for example:
    
            ssh staticiforme "mv /home/ooni /home/ooni-OLD ; echo rm -rf /home/ooni-OLD | at now + 7 days"
            cumin -o txt 'C:roles::static_mirror_web' 'mv /srv/static.torproject.org/mirrors/ooni.torproject.org /srv/static.torproject.org/mirrors/ooni.torproject.org-OLD'
            cumin -o txt 'C:roles::static_mirror_web' 'echo rm -rf /srv/static.torproject.org/mirrors/ooni.torproject.org-OLD | at now + 7 days'
    
    
     9. consider removing the role user and group in LDAP, if there are no
        files left owned by that user
    
     10. remove from Nagios, e.g.:
    
     
            -
             name: mirror static sync - atlas
             check: "dsa_check_staticsync!atlas.torproject.org"
             hosts: global
             servicegroups: mirror
    
    
    ## Pager playbook
    
    
    <!-- information about common errors from the monitoring system and -->
    <!-- how to deal with them. this should be easy to follow: think of -->
    <!-- your future self, in a stressful situation, tired and hungry. -->
    
    ## Disaster recovery
    
    
    TODO: add a disaster recovery procedure.
    
    
    <!-- what to do if all goes to hell. e.g. restore from backups? -->
    <!-- rebuild from scratch? not necessarily those procedures (e.g. see -->
    <!-- "Installation" below) but some pointers. -->
    
    # Reference
    
    ## Installation
    
    Servers are mostly configured in [Puppet](puppet), with some
    exceptions. See the [design section](#design) below for details
    on the Puppet classes in use. Typically, a web mirror will use
    `roles::static_mirror_web`, for example.
    
    
    
    ### Web mirror setup
    
    To setup a web mirror, create a new server with the following entries
    in LDAP:
    
        allowedGroups: mirroradm
        allowedGroups: weblogsync
    
    This will ensure the `mirroradm` user is created on the host.
    
    Then the host needs the following Puppet configuration in Hiera:
    
    ```
    classes:
      - roles::static_mirror_web
    staticsync::static_mirror::get_triggered: false
    ```
    
    The `get_triggered` parameter ensures the host will not block
    static site updates while it is doing its first sync.
    
    Then Puppet can be run on the host, after `apache2` is installed,
    to make sure the `apache2` Puppet module picks it up:
    
        apt install apache2
        puppet agent -t
    
    You might need to reboot to get some firewall rules to load correctly:
    
        reboot
    
    The server should start a sync after reboot. However, it's likely that
    the SSH keys it uses to sync have not been propagated to the master
    server. If the sync fails, you might receive an email with lots of
    lines like:
    
        [MSM] STAGE1-START (2021-03-11 19:38:59+00:00 on web-chi-03.torproject.org)
    
    It might be worth running the sync by hand, with:
    
        screen sudo -u mirroradm static-mirror-run-all
    
    The server may also need to be added to the static component
    configuration in `modules/roles/misc/static-components.yaml`, if it is
    to carry a full mirror, or exclude some components. For example,
    `web-fsn-01` and `web-chi-03` both carry all components, so they need
    to be added to all `limit-mirrors` statements, like this:
    
    ```
    components:
      # [...]
      dist.torproject.org:
        master: static-master-fsn.torproject.org
        source: staticiforme.torproject.org:/srv/dist-master.torproject.org/htdocs
        limit-mirrors:
          - archive-01.torproject.org
          - web-cymru-01.torproject.org
          - web-fsn-01.torproject.org
          - web-fsn-02.torproject.org
          - web-chi-03.torproject.org
    ```
    
    Once that is changed, the `static-mirror-run-all` command needs to be
    rerun (although it will also run on the next reboot).
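
    That is, run again, as above (`screen` is convenient because the
    first sync can take a while):

        screen sudo -u mirroradm static-mirror-run-all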
    
    When the sync is finished, you can remove this line:
    
        staticsync::static_mirror::get_triggered: false
    
    ... and the node can be added to the various files in
    `dns/auto-dns.git`.
    
    Then, to add the host as a Fastly backend, this also needs to be
    added to Hiera:
    
    
        roles::cdn_torproject_org::fastly_backend: true
    
    
    Once that change is propagated, you need to change the Fastly
    configuration using the tools in the [cdn-config-fastly
    repository](https://gitlab.torproject.org/tpo/tpa/cdn-config-fastly/). Note that only one of the nodes is a "backend" for
    Fastly, and typically not the nodes that are in the main rotation (so
    that the Fastly frontend survives if the main rotation dies). But the
    main rotation servers act as a backup for the main backend.
    
    ## SLA
    
    
    This service is designed to be highly available. All web sites should
    keep working (maybe with some performance degradation) even if one of
    the hosts goes down. It should also absorb and tolerate moderate
    denial of service attacks.
    
    
    ## Design
    
    
    The static mirror system is built of three kinds of hosts:
    
    
    
     * `source` - builds and hosts the original content
       (`roles::static_source` in Puppet)
     * `master` - receives the contents from the source, dispatches it
       (atomically) to the mirrors (`roles::static_master` in Puppet)
     * `mirror` - serves the contents to the user
       (`roles::static_mirror_web` in Puppet)
    
    Content is split into different "components", which are units of
    content that get synchronized atomically across the different
    hosts. Those components are defined in a YAML file in the
    `tor-puppet.git` repository
    (`modules/roles/misc/static-components.yaml` at the time of writing,
    but it might move to Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet)).
    
    
    The Jenkins server is also used to build and push websites to static
    source servers.
    
    This diagram summarizes graphically how those components talk to
    each other:
    
    
    ![Static mirrors architecture diagram](static-component/architecture.png)
    
    A narrative of how changes get propagated through the mirror network
    is detailed below.
    
    
    <!-- this is a rephrased copy of -->
    <!-- https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt -->
    
    A key advantage of that infrastructure is the higher availability it
    provides: whereas individual virtual machines are power-cycled for
    scheduled maintenance (e.g. kernel upgrades), static mirroring
    machines are removed from the DNS during their maintenance.
    
    
    ### Change process
    
    
    When data changes, the `source` is responsible for running
    `static-update-component`, which instructs the `master` via SSH to
    run `static-master-update-component`, transfers a new copy of the
    source data to the `master` using `rsync(1)` and, upon successful
    copy, swaps it with the current copy.
    
    The current copy on the `master` is then distributed to all actual
    `mirror`s, again placing a new copy alongside their current copy using
    `rsync(1)`.
    
    Once the data has successfully made it to all mirrors, the mirrors
    are instructed to swap the new copy with their current copy, at
    which point the updated data will be served to end users.
    
    <!-- end of the copy -->
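
    For illustration, the whole chain can be reduced to the following
    sketch (the component name is hypothetical; the real scripts also
    handle locking and error recovery, see the walk through below):

        # on the source host, as the role user:
        static-update-component onionperf.torproject.org
        # ... which SSHes to the master and runs:
        #     static-master-update-component onionperf.torproject.org
        # ... which rsyncs the source data into a temporary
        #     <component>-updating.incoming-XXXXXX tree, swaps it with
        #     the current copy on success, and triggers:
        #     static-master-run onionperf.torproject.org
        # ... which pushes the tree to all mirrors and, once they all
        #     have it, instructs them to swap it in atomically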
    
    
    ### Source code inventory
    
    The source code of the static mirror system is spread out in different
    files and directories in the `tor-puppet.git` repository:
    
     * `modules/roles/misc/static-components.yaml` lists the "components"
     * `modules/roles/manifests/` holds the different Puppet roles:
       * `roles::static_mirror` - a generic mirror, see
         `staticsync::static_mirror` below
       * `roles::static_mirror_web` - a web mirror, including most (but
         not necessarily all) components defined in the YAML
         configuration. It also configures Apache (which the above
         doesn't) and includes `roles::static_mirror` (and therefore
         `staticsync::static_mirror`)
       * `roles::static_mirror_onion` - configures the hidden services for
         the web mirrors defined above
       * `roles::static_source` - a generic static source, see
         `staticsync::static_source`, below
       * `roles::static_master` - a generic static master, see
         `staticsync::static_master` below
     * `modules/staticsync/` is the core Puppet module holding most of the
       source code:
       * `staticsync::static_source` - source, which:
         * exports the static user SSH key to the master, punching a hole
           in the firewall
         * collects the SSH keys from the master(s)
       * `staticsync::static_mirror` - a mirror which does the above and:
         * deploys the `static-mirror-run` and `static-mirror-run-all`
           scripts (see below)
         * configures a cron job for `static-mirror-run-all`
         * exports a configuration snippet of `/etc/static-clients.conf`
           for the **master**
       * `staticsync::static_master` - a master which:
         * deploys the `static-master-run` and
           `static-master-update-component` scripts (see below)
         * collects the `static-clients.conf` configuration file,
           which is built from the hostname (`$::fqdn`) exported by
           each `staticsync::static_mirror`
         * configures the `basedir` (currently
           `/srv/static.torproject.org`) and `user` home directory
           (currently `/home/mirroradm`)
         * collects the SSH keys from sources, mirrors and other masters
         * exports the SSH key to the mirrors and sources
       * `staticsync::base`, included by all of the above, deploys:
         * `/etc/static-components.conf`: a file derived from the
           `static-components.yaml` configuration file
         * `/etc/staticsync.conf`: a polyglot (bash and Python)
           configuration file propagating the `base` (currently
           `/srv/static.torproject.org`), `masterbase` (currently
           `$base/master`) and `staticuser` (currently `mirroradm`)
           settings; see the sketch after this list
         * `staticsync-ssh-wrap` and `static-update-component` (see below)
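
    Since `staticsync.conf` is made of plain `KEY=value` lines, it can
    be sourced from bash and trivially parsed from Python. A hedged
    sketch of what it might contain (the actual file may differ):

        # /etc/staticsync.conf
        base=/srv/static.torproject.org
        masterbase=/srv/static.torproject.org/master
        staticuser=mirroradm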
    
    TODO: try to figure out why we have `/etc/static-components.conf` and
    not directly the `YAML` file shipped to hosts, in
    `staticsync::base`. See the `static-components.conf.erb` Puppet
    template.
    
    
    ### Scripts walk through

    <!-- this is a reformatted copy of the `OVERVIEW` in the staticsync
    puppet module -->
    
    - `static-update-component` is run by the user on the **source** host.

      If not run under sudo as the `staticuser` already, it `sudo`'s to
      the `staticuser`, re-executing itself.  It then SSHes to the
      `static-master` for that component to run
      `static-master-update-component`.

      LOCKING: none, but see `static-master-update-component`
    
    - `static-master-update-component` is run on the **master** host.

      It `rsync`'s the contents from the **source** host to the static
      **master**, and then triggers `static-master-run` to push the
      content to the mirrors.

      The sync happens to a new `<component>-updating.incoming-XXXXXX`
      directory.  On sync success, `<component>` is replaced with that
      new tree, and the `static-master-run` trigger happens.
    
      LOCKING: exclusive locks are held on `<component>.lock`
    
    - `static-master-run` triggers all the mirrors for a component to
      initiate syncs. 
      
      When all mirrors have an up-to-date tree, they are
      instructed to update the `cur` symlink to the new tree.
    
      To begin with, `static-master-run` copies `<component>` to
      `<component>-current-push`.
      
      This is the tree all the mirrors then sync from.  If the push was
      successful, `<component>-current-push` is renamed to
      `<component>-current-live`.
    
      LOCKING: exclusive locks are held on `<component>.lock`
    
    - `static-mirror-run` runs on a mirror and syncs components.

      There is a symlink called `cur` that points to either `tree-a` or
      `tree-b` for each component.  The `cur` tree is the one that is
      live; the other one usually does not exist, except when a sync is
      ongoing (or a previous one failed and we keep a partial tree).

      During a sync, we sync to the `tree-<X>` that is not the live
      one.  When instructed by `static-master-run`, we update the
      symlink and remove the old tree (see the sketch after this walk
      through).
    
    
      `static-mirror-run` `rsync`'s either `-current-push` or
      `-current-live` for a component.

      LOCKING: during all of `static-mirror-run`, we keep an exclusive
      lock on the `<component>` directory, i.e., the directory that
      holds `tree-[ab]` and `cur`.
    
    - `static-mirror-run-all`
    
      Run `static-mirror-run` for all components on this mirror, fetching
      the `-live-` tree.
    
      LOCKING: none, but see `static-mirror-run`.
    
    - `staticsync-ssh-wrap`
    
      Wrapper for SSH job dispatching on source, master, and mirror.
    
      LOCKING: on **master**, when syncing `-live-` trees, a shared lock
      is held on `<component>.lock` during the rsync process.
    
    <!-- end of the copy -->

    The scripts are written in bash, except `static-master-run`, which
    is written in Python 2.
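
    As an illustration of the tree swap described above, here is a
    minimal sketch (the paths and the rsync source are assumptions;
    the real `static-mirror-run` also handles locking and partial
    trees):

        #!/bin/sh
        # flip a component between tree-a and tree-b on a mirror
        component=onionperf.torproject.org
        cd "/srv/static.torproject.org/mirrors/$component"
        live=$(readlink cur)                  # "tree-a" or "tree-b"
        if [ "$live" = tree-a ]; then new=tree-b; else new=tree-a; fi
        # sync the inactive tree from the master's -current-live copy
        rsync -a "master:$component-current-live/" "$new/"
        # atomically swap the symlink, then drop the old tree
        ln -sfn "$new" cur.new && mv -T cur.new cur
        rm -rf "$live"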
    
    ### Authentication
    
    
    The authentication between the static site hosts is entirely done
    through SSH. The source hosts are accessible by normal users, who
    can `sudo` to a "role" user which has privileges to run the static
    sync scripts as the sync user. That user then has privileges to
    contact the master server which, in turn, can log in to the
    mirrors over SSH as well.
    
    The user's `sudo` configuration is therefore critical and that
    `sudoers` configuration could also be considered part of the static
    mirror system.
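
    Schematically (a hedged sketch: the hostnames and exact commands
    are examples, not the literal invocations):

        # a user sudoes to the role user, which runs the update script;
        # the script re-executes itself as the staticuser via sudo:
        torwww$ sudo -u mirroradm static-update-component onionperf.torproject.org
        # the staticuser's SSH key is accepted (through
        # staticsync-ssh-wrap) on the master, which can in turn SSH to
        # every mirror:
        mirroradm$ ssh static-master-fsn.torproject.org ...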
    
    
    
    Jenkins has SSH access to the `torwww` user in the static
    infrastructure, so it can build and push websites, see below.
    
    ### Jenkins build jobs
    
    Jenkins is used to build some websites and push them to the static
    mirror infrastructure. The Jenkins jobs get triggered from `git-rw`
    git hooks, and are (partially) defined in [jenkins/tools.git](https://gitweb.torproject.org/project/jenkins/tools.git/) and
    [jenkins/jobs.git](https://gitweb.torproject.org/project/jenkins/jobs.git/). Those are fed into [jenkins-job-builder](https://docs.openstack.org/infra/jenkins-job-builder/) to
    build the actual job. Those jobs actually build the site with hugo or
    lektor and package an archive that is then fetched by the static
    source.
    
    The [build scripts](https://gitweb.torproject.org/admin/static-builds.git/) are deployed on `staticiforme`, in the
    `~torwww` home directory. Those get triggered through the
    `~torwww/bin/ssh-wrap` program, hardcoded in
    `/etc/ssh/userkeys/torwww`, which picks the right build job based on
    the argument provided by the Jenkins job, for example:
    
            - shell: "cat incoming/output.tar.gz | ssh torwww@staticiforme.torproject.org hugo-website-{site}"
    
    Then the wrapper eventually does something like this to update the
    static component on the static source:
    
        rsync --delete -v -r "${tmpdir}/incoming/output/." "${basedir}"
        static-update-component "$component"
    
    ## Issues

    There is no issue tracker specifically for this project. [File][]
    or [search][] for issues in the [team issue tracker][search].
    
     [File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
     [search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues
    
    ## Monitoring and testing
    
    
    Static site synchronisation is monitored in Nagios, using a block in
    `nagios-master.cfg` which looks like:
    
        -
            name: mirror static sync - extra
            check: "dsa_check_staticsync!extra.torproject.org"
            hosts: global
            servicegroups: mirror
    
    That script (actually called `dsa-check-mirrorsync`) makes an HTTP
    request to every mirror and checks the timestamp inside a "trace"
    file (`.serial`) to make sure everyone has the same copy of the
    site.
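
    You can do a similar check by hand; for example (a hedged sketch:
    the mirror list is illustrative, and `--resolve` forces the
    request to a specific mirror behind the DNS rotation):

        for mirror in web-fsn-01.torproject.org web-chi-03.torproject.org; do
            addr=$(dig +short "$mirror" | head -1)
            printf '%s: ' "$mirror"
            curl -s --resolve "extra.torproject.org:443:$addr" \
                https://extra.torproject.org/.serial
            echo
        done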
    
    There's also a miniature reimplementation of [Nagios](howto/nagios) called
    [mininag](https://gitweb.torproject.org/admin/dns/mini-nag.git/) which runs on the DNS server. It performs health checks
    on the mirrors and takes them out of the DNS zonefiles if they become
    unavailable or have a scheduled reboot. This makes it possible to
    reboot a server and have the server taken out of rotation
    automatically.
    
    
    ## Logs and metrics

    All Tor webservers keep a minimal amount of logs. The IP address
    and the time (but not the date) are scrubbed: the time is logged
    as `00:00:00`. The referrer is disabled on the client side by
    sending the `Referrer-Policy "no-referrer"` header.
    
    The IP addresses are replaced with:
    
     * `0.0.0.0` - HTTP request
     * `0.0.0.1` - HTTPS request
     * `0.0.0.2` - hidden service request
    
    Logs are kept for two weeks.
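
    A minimal sketch of what such a privacy-preserving Apache
    configuration could look like (an illustration, not the actual TPA
    template):

        # hardcode a placeholder IP per vhost (here 0.0.0.1, for HTTPS)
        # and zero out the time of day while keeping the date
        LogFormat "0.0.0.1 - - [%{%d/%b/%Y:00:00:00 %z}t] \"%r\" %>s %O" privacy
        CustomLog ${APACHE_LOG_DIR}/access.log privacy
        # ask clients not to send a Referer header at all
        Header always set Referrer-Policy "no-referrer"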
    
    
    Errors may be sent by email.
    
    Metrics are scraped by [Prometheus](prometheus) using the "Apache"
    exporter.

    ## Backups
    
    
    The `source` hosts are backed up with [Bacula](backups) without
    any special provision.
    
    TODO: check if master / mirror nodes need to be backed up. Probably not?
    
    
    ## Other documentation
    
    
     * [DSA wiki](https://dsa.debian.org/howto/static-mirroring/)
     * [scripts overview](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/staticsync/files/OVERVIEW)
     * [README.static-mirroring](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/master/modules/roles/README.static-mirroring.txt)
    
    
    # Discussion
    
    ## Overview
    
    
    The goal of this discussion section is to consider improvements to the
    static site mirror system at torproject.org. It might also apply to
    debian.org, but the focus is currently on TPO.
    
    
    
    The static site mirror system was designed for hosting Debian.org
    content. Interestingly, it is not used for the operating system
    mirrors themselves, which are synchronized using another, separate
    system ([archvsync](https://salsa.debian.org/mirror-team/archvsync/)).
    
    The static mirror system was written for Debian.org by Peter
    Palfrader. It has also been patched by other DSA members (Stephen
    Gran and Julien Cristau both have more than 100 commits on the old
    code base).
    
    This service is critical: it distributes the main torproject.org
    websites, but also software releases like the tor project source code
    and other websites.
    
    
    ## Limitations
    
    The maintenance status of the mirror code is unclear: while it is
    still in use at Debian.org, it is made of a few sets of components
    which are not bundled in a single package. This makes it hard to
    follow "upstream", although, in theory, it should be possible to
    follow the [`dsa-puppet`](https://salsa.debian.org/dsa-team/mirror/dsa-puppet/) repository. In practice, that's pretty
    difficult because the `dsa-puppet` and `tor-puppet` repositories
    have disconnected histories. Even if they had a common ancestor,
    the code is spread across multiple directories, which makes it
    hard to track. There has been some refactoring to move most of the
    code into a `staticsync` module, but we still have files strewn
    over other modules.
    
    For certain sites, the static site system requires Jenkins to build
    websites, which further complicates deployments. A static site
    deployment requiring Jenkins needs updates on 5 different
    repositories, across 4 different services:
    
     * a new static component in the (private) `tor-puppet.git` repository
     * a [build script](https://gitweb.torproject.org/project/jenkins/tools.git/tree/slaves/linux/) in the [jenkins/tools.git](https://gitweb.torproject.org/project/jenkins/tools.git/) repository
     * a build job in the [jenkins/jobs.git](https://gitweb.torproject.org/project/jenkins/jobs.git/) repository
     * a [new entry](https://gitweb.torproject.org/admin/static-builds.git/commit/?id=b2344aa1d68f4f065764c6f23d14494020b81f86) in the [ssh wrapper](https://gitweb.torproject.org/admin/static-builds.git/tree/ssh-wrap?id=b2344aa1d68f4f065764c6f23d14494020b81f86) in the
       [admin/static-builds.git](https://gitweb.torproject.org/admin/static-builds.git/) repository
     * a new entry in the `gitolite-admin.git` repository
    
    
    The static site system has no unit tests, linting, release process, or
    CI. Code is deployed directly through Puppet, on the live servers.
    
    There hasn't been a security audit of the system, as far as we could
    tell.
    
    Python 2 porting is probably the most pressing issue in this project:
    the `static-master-run` program is written in old Python 2.4
    code. Thankfully it is fairly short and should be easy to port.
    
    The YAML configuration duplicates the YAML parsing and data
    structures present in Hiera, see [issue 30020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/30020) and [puppet](puppet).
    
    
    ## Goals
    
    ### Must have
    
    
     * high availability: continue serving content even if one (or a few?)
       servers go down
     * atomicity: the deployed content must be coherent
     * high performance: should be able to saturate a gigabit link and
       withstand simple DDOS attacks
    
    
    ### Nice to have
    
    
     * cache-busting: changes to a CSS or JavaScript file must be
       propagated to the client reasonably quickly
     * possibly host Debian and RPM package repositories
    
    
    ### Non-Goals
    
    
     * implement our own global content distribution network
    
    
    ## Approvals required
    
    
    ## Proposed Solution
    
    
    The static mirror system certainly has its merits: it's flexible,
    powerful and provides a reasonably easy-to-deploy,
    high-availability service, at the cost of some level of obscurity,
    complexity, and high disk space requirements.
    
    ## Cost
    
    Staff, mostly. We expect a reduction in cost if we reduce the number
    of copies of the sites we have to keep around.
    
    ## Alternatives considered
    
    <!-- include benchmarks and procedure if relevant -->
    
     * [GitLab pages](https://docs.gitlab.com/ee/administration/pages/) could be used as a source?
     * the [cache system](cache) could be used as a replacement in the
       front-end
    
    TODO: benchmark gitlab pages vs (say) apache or nginx.
    
    ### GitLab pages replacement
    
    It should be possible to replace parts or the entirety of the system
    progressively, however. A few ideas:
    
     * the **mirror** hosts could be replaced by the [cache
       system](cache). This would possibly require shifting the web
       service from the **mirror** to the **master**, or at least some
       significant re-architecture
     * the **source** hosts could be replaced by some parts of the
       [GitLab Pages](https://docs.gitlab.com/ee/administration/pages/)
       system. Unfortunately, that system relies on a custom
       webserver, but it might be possible to bypass that and directly
       access the on-disk files provided by the CI.
    
    The architecture would look something like this:
    
    ![Static system redesign architecture diagram](static-component/architecture-gitlab-pages.png)
    
    
    Details of the GitLab pages design and installation is available [in
    our GitLab documentation](howto/gitlab#gitlab-pages).
    
    Concerns about this approach:
    
     * GitLab pages is a custom webserver which issues TLS certs for
       the custom domains and serves the content; it's unclear how
       reliable or performant that server is
    
     * The pages design assumes the existence of a shared filesystem to
       deploy content, currently NFS, but they are switching to S3 (as
       explained above), which introduces significant complexity and moves
       away from the classic "everything is a file" approach
     * The new design also introduces a dependency on the main GitLab
       rails API for availability, which could be a concern, especially
       since that is [usually a "non-free" feature](https://about.gitlab.com/pricing/self-managed/feature-comparison/) (e.g. [PostgreSQL
       replication and failover](https://docs.gitlab.com/ee/administration/postgresql/replication_and_failover.html), [Database load-balancing](https://docs.gitlab.com/ee/administration/database_load_balancing.html),
       [traffic load balancer](https://docs.gitlab.com/ee/administration/reference_architectures/#traffic-load-balancer), [Geo disaster recovery](https://docs.gitlab.com/ee/administration/geo/disaster_recovery/index.html) and,
       generally, [all of Geo](https://about.gitlab.com/solutions/geo/) and most [availability components](https://docs.gitlab.com/ee/administration/reference_architectures/#availability-components)
       are non-free).
     * In general, this increases dependency on GitLab for deployments
    
    
    Next steps:
    
    
     1. [ ] check if the GitLab Pages subsystem provides atomic updates
     2. [x] see how GitLab Pages can be distributed to multiple hosts and
            how scalable it actually is or if we'll need to run the cache
            frontend in front of it. **update**: it can, but with
            significant caveats in terms of complexity, see above
     3. [ ] setup GitLab pages to test with small, non-critical websites
            (e.g. API documentation, etc)
     4. [ ] test the [GitLab pages API-based configuration](https://docs.gitlab.com/ee/administration/pages/#gitlab-api-based-configuration) and see how
            it handles outages of the main rails API
     5. [ ] test the [object storage system](https://docs.gitlab.com/ee/administration/pages/#using-object-storage) and see if it is usable,
            debuggable, highly available and performant enough for our
            needs
     6. [ ] keep track of upstream development of the GitLab pages
            architecture, [see this comment from anarcat](https://gitlab.com/groups/gitlab-org/-/epics/1316#note_496404589) outlining
            some of those concerns
    
    ### Replacing Jenkins with GitLab CI as a builder
    
    
    See the [Jenkins documentation](service/jenkins#gitlab-ci-replacement)
    for more information on that front.
    
    <!--  LocalWords:  atomicity DDOS YAML Hiera webserver NFS CephFS TLS
     -->
    <!--  LocalWords:  filesystem GitLab scalable frontend CDN HTTPS DNS
     -->
    <!--  LocalWords:  howto Nagios SSL TOC dns letsencrypt sudo LDAP SLA
     -->
    <!--  LocalWords:  rsync cron hostname symlink webservers Bacula DSA
     -->
    <!--  LocalWords:  torproject debian TPO Palfrader Julien Cristau TPA
     -->
    <!--  LocalWords:  LocalWords
     -->