
Build our production website in GitLab CI

Scope

In scope

  • build our live, production website via a GitLab CI job, and then deploy the output to webserver(s) upon success
  • by default, use caching + ikiwiki --refresh to avoid a huge time-to-publication increase
  • via GitLab CI, developers and tech writers can force a full rebuild of the website that bypasses/invalidates the cache (see the sketch after this list)
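
To make the above more concrete, here is a rough sketch of what such a build job could look like. Everything in it is an assumption for illustration: the job and variable names (build_website, FULL_REBUILD), the container image, and the idea that ikiwiki.setup points srcdir at the repository and destdir at public/. It is not the final configuration.

```yaml
# Sketch only: all names below (build_website, FULL_REBUILD, paths) are
# placeholders, not the actual configuration.
build_website:
  image: debian:bookworm          # assumption: build in a stock Debian container
  cache:
    key: website-build
    paths:
      - .ikiwiki/                 # ikiwiki's state (indexdb, etc.)
      - public/                   # previously rendered pages, reused by --refresh
  script:
    - apt-get update && apt-get install -y ikiwiki po4a
    # Triggering a pipeline with FULL_REBUILD=1 wipes the cached state,
    # which turns the --refresh below into a full rebuild.
    - if [ "$FULL_REBUILD" = "1" ]; then rm -rf .ikiwiki public; fi
    # Assumes ikiwiki.setup sets srcdir to the repository and destdir to public/
    - ikiwiki --setup ikiwiki.setup --refresh
  artifacts:
    paths:
      - public/
```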

Out of scope

This issue is not about serving our website via GitLab Pages.

Expected benefits

More robust

  • the build happens in a controlled, mostly reproducible environment, so problems caused by transitions between states are less likely
  • the output of the build is published only if it succeeded ⇒ no partly refreshed, half-broken website in production
  • this avoids problems caused by incorrect state transitions like tails/sysadmin#18065+

Non-sysadmins have more agency over their work

  • everyone can look at the build output: not only the person who pushed, but also the person who should investigate and debug what happened
  • developers can fix stuff themselves via the GitLab CI config file, if needed
  • developers and tech writers can maintain the configuration themselves (ikiwiki.setup, ikiwiki plugins, build dependencies such as po4a (tails/tails#18667+ and the upcoming tails/tails#20239))
    • no need to maintain changes in 2 different versions (tails.git, puppet-tails)
    • no need to coordinate merging branches with deploying updated configuration on the production infra

Recover from broken website refresh/build without sysadmin intervention

In a variety of situations, an ikiwiki refresh triggered by a Git push fails and leaves the website in an unclean state; the only way to recover is then to SSH into the machine and manually start a full rebuild. This is painful because:

  • When this happens during a release process, the release can be left half-published until someone fixes this. That’s not fun for the RM.
  • It puts timing/availability/expectations pressure on sysadmins.
  • I suspect our technical writers have grown wary of pushing some kinds of changes that typically trigger this sort of problem. Not being able to do one’s job with a reasonable amount of confidence in oneself and in our infra is surely not fun.

Paves the way towards web server redundancy

Context: tails/sysadmin#16956

For example, here's how Tor does it: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/static-shim#deploying-a-static-site-from-gitlab-ci

Examples: https://gitlab.torproject.org/tpo/web/tpo/-/blob/main/.gitlab-ci.yml?ref_type=heads and https://gitlab.torproject.org/tpo/web/blog/-/blob/main/.gitlab-ci.yml?ref_type=heads

And an example deployment: https://gitlab.torproject.org/tpo/web/tpo/-/jobs/496878
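
If we follow a similar approach, the deployment step could look roughly like the sketch below: the build artifacts are rsync'ed to the webserver over SSH, using a key stored in a protected CI/CD variable. The job, variable, and host names are placeholders, and this does not describe Tor's actual setup nor our final one.

```yaml
# Sketch only: variable and host names are placeholders.
deploy_website:
  stage: deploy
  needs: ["build_website"]
  rules:
    - if: '$CI_COMMIT_BRANCH == "master"'   # assumption: name of the production branch
  image: debian:bookworm
  script:
    - apt-get update && apt-get install -y openssh-client rsync
    # WWW_SSH_PRIVATE_KEY and WWW_SSH_HOST_KEY would be protected CI/CD variables.
    - eval "$(ssh-agent -s)"
    - echo "$WWW_SSH_PRIVATE_KEY" | ssh-add -
    - mkdir -p ~/.ssh
    - echo "$WWW_SSH_HOST_KEY" >> ~/.ssh/known_hosts
    - rsync -az --delete public/ "website@www2.example.org:/var/www/site/"
```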

Originally created by @intrigeri on #17364 (Redmine)

To-do

  • Build the website in GitLab CI and push it to www2 → tails!1519
  • Fix #18086+
  • Prevent jobs corresponding to older commits from overwriting newer versions of the website (see thread below and the sketch after this list)
  • Use our own container image to build the website
  • Pin our GitLab's container registry IP in /etc/hosts of gitlab-runner VMs
  • Figure out how to feed Ikiwiki's PO file updates back to tails.git
  • Push the website to www (somehow) and retire tails::website
  • Test changing a source string so that IkiWiki pushes updated .po files back to the repo
  • Check if there's a better access-token setup than the current one regarding needed permissions and expiration time
  • Document accordingly
  • Push to www2 via the private network and remove the public access to that VM's SSH service
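
Regarding the item about older commits overwriting newer versions of the website: GitLab's resource_group keyword makes deploy jobs run one at a time, but it does not by itself enforce ordering, so the deploy job probably also needs a guard. Below is a rough sketch of one possible guard, extending the hypothetical deploy job sketched above; the marker file, its path, and the host name are made up, and it assumes the runner has a full (non-shallow) clone so that git can compare the two commits.

```yaml
# Sketch only: extends the hypothetical deploy job; paths and host are placeholders.
deploy_website:
  resource_group: production-website    # at most one deploy job runs at a time
  script:
    # SSH setup omitted here; see the deploy sketch above.
    - DEPLOYED=$(ssh website@www2.example.org cat /var/www/.deployed-commit || true)
    - |
      # Skip the deployment if the commit already on the server is a
      # descendant of the commit this pipeline built (i.e. the server is newer).
      if [ -n "$DEPLOYED" ] && [ "$DEPLOYED" != "$CI_COMMIT_SHA" ] \
         && git merge-base --is-ancestor "$CI_COMMIT_SHA" "$DEPLOYED"; then
        echo "A newer commit ($DEPLOYED) is already deployed; skipping."
        exit 0
      fi
    - rsync -az --delete public/ "website@www2.example.org:/var/www/site/"
    - echo "$CI_COMMIT_SHA" | ssh website@www2.example.org "cat > /var/www/.deployed-commit"
```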