Continuous Integration is the system that allows tests to be run and packages to be built automatically when new code is pushed to the version control system (currently git).

Note that even though the current system is Jenkins, this page mostly documents GitLab CI, as it is the likely long-term replacement.

Tutorial

GitLab CI has good documentation upstream. This section documents frequent questions we might get about the work.

Getting started

The GitLab CI quickstart should get you started here. Note that there are some "shared runners" you can already use, which should be available to all projects.
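
As a minimal sketch (the image and commands below are only examples, and "make check" assumes the project has a Makefile with such a target), a .gitlab-ci.yml file at the root of the repository is enough to get a pipeline running on the shared runners:

    # minimal .gitlab-ci.yml sketch; adjust the image and commands to the project
    image: debian:stable

    test:
      stage: test
      script:
        - apt-get update && apt-get install -y build-essential
        - make check

Once the file is committed, pipelines run automatically on every push; per-project runner settings (including enabling shared runners) live under Settings > CI/CD > Runners.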

TODO: do runners have time limits? should we document how to enable the shared runners in a project?

How-to

Pager playbook

TODO: @ahf what happens if there's trouble with the f-droid runners? who to ping? anything we can do to diagnose the problem? what kind of information to send them?

Disaster recovery

Runners should be disposable: if a runner is destroyed, at most the jobs it is currently running will be lost. Artifacts should otherwise be present on the GitLab server, so recovering a runner is as "simple" as creating a new one.
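
As a sketch of the cleanup side (the runner ID and token below are placeholders): a destroyed runner leaves a stale registration behind in GitLab, which can be removed through the admin interface or the API before the replacement is registered.

    # remove the dead runner's stale registration (ID and token are placeholders)
    curl --request DELETE --header "PRIVATE-TOKEN: <admin-token>" \
      "https://gitlab.torproject.org/api/v4/runners/42"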

Reference

Installation

Since GitLab CI is basically GitLab with external runners hooked up to it, this section documents how to install and register runners into GitLab.

Linux

TODO: @ahf document how the F-Droid runners were hooked up to GitLab CI. Anything special on top of the official docs?
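
In the meantime, as a rough sketch of the generic procedure on a Debian host (the registration token, tags and image below are placeholders, not a description of the F-Droid setup):

    # install the runner package (from Debian or from GitLab's own apt repository)
    apt install gitlab-runner

    # register it against our GitLab instance
    gitlab-runner register \
      --non-interactive \
      --url https://gitlab.torproject.org/ \
      --registration-token <token> \
      --executor docker \
      --docker-image debian:stable \
      --tag-list "linux,docker"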

MacOS/Windows

TODO: @ahf document how MacOS/Windows images are created and runners are setup. don't hesitate to create separate headings for Windows vs MacOS and for image creation vs runner setup.

SLA

The GitLab CI service is offered on a "best effort" basis and might not be fully available.

Design

TODO: expand on GitLab CI's design and architecture, following this checklist. See also the Jenkins section below for the same thing about Jenkins.

Issues

File or search for issues in the GitLab issue tracker.

Monitoring and testing

TODO: @ahf how do we monitor the runners? maybe the prometheus exporter has something? should we hook it inside nagios to get alerts when runners get overwhelmed?
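
One possible answer, as a sketch: gitlab-runner ships a built-in Prometheus exporter, enabled by adding a global listen_address line to /etc/gitlab-runner/config.toml, which our existing Prometheus setup could scrape and alert on. The port and metric below are only examples:

    # assumes `listen_address = ":9252"` was added to the runner's config.toml
    curl -s http://localhost:9252/metrics | grep '^gitlab_runner_jobs'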

Logs and metrics

TODO: do runners keep logs? where? does it matter? any PII?

TODO: how about performance metrics? how do we know when we'll run out of capacity in the runner network since we don't host the f-droid stuff?

Backups

This service requires no backups: all configuration should be performed by Puppet and/or documented in this wiki page. A lost runner should be rebuilt from scratch, as per disaster recovery.

Other documentation

Discussion

Tor currently uses Jenkins to run tests, builds and various automated jobs. This discussion is about whether and how to replace it with GitLab CI.

Overview

Ever since the GitLab migration, we have discussed the possibility of replacing Jenkins with GitLab CI, or at least using GitLab CI in some way.

Tor currently utilizes a mixture of different CI systems to ensure some form of quality assurance as part of the software development process:

  • Jenkins (provided by TPA)
  • GitLab CI (currently Docker builders kindly provided by the F-Droid project via Hans from The Guardian Project)
  • Travis CI (used by some of our projects such as tpo/core/tor.git for Linux and MacOS builds)
  • Appveyor (used by tpo/core/tor.git for Windows builds)

By the end of 2020, however, pricing changes at Travis CI made it difficult for the network team to continue running the Mac OS builds there. Furthermore, it was felt that Appveyor was too slow to be useful for builds, so it was proposed (issue 40095) to create a pair of bare metal machines to run those builds, through a libvirt architecture. This is an exception to TPA-RFC-7: tools; the exception was formally proposed in TPA-RFC-8.

Goals

In general, the idea here is to evaluate GitLab CI as a unified platform to replace Travis and Appveyor in the short term, but also, in the longer term, Jenkins itself.

Must have

  • automated configuration: setting up new builders should be done through Puppet
  • the above requires excellent documentation of the setup procedure in the development stages, so that TPA can transform that into a working Puppet manifest
  • Linux, Windows, Mac OS support
  • x86-64 architecture ("64-bit version of the x86 instruction set", AKA x64, AMD64, Intel 64, what most people use on their computers)
  • Travis replacement
  • autonomy: users should be able to set up new builds without intervention from the service (or system!) administrators
  • clean environments: each build should run in a clean VM

Nice to have

  • fast: the runners should be fast (as in: powerful CPUs, good disks, lots of RAM to cache filesystems, CoW disks) and impose little overhead above running the code natively (as in: no emulation)
  • ARM64 architecture
  • Apple M1 support
  • Jenkins replacement
  • Appveyor replacement
  • BSD support (FreeBSD, OpenBSD, and NetBSD in that order)

Non-Goals

  • in the short term, we don't aim to do "Continuous Deployment". This is one of the possible goals of the GitLab CI deployment, but it is considered out of scope for now. See also the LDAP proposed solutions section

Approvals required

TPA's approval is required for the libvirt exception, see TPA-RFC-8.

Proposed Solution

The original proposal from @ahf went as follows:

[...] Reserve two (ideally) "fast" Debian-based machines on TPO infrastructure to build the following:

  • Run GitLab CI runners via KVM (initially with focus on Windows x86-64 and macOS x86-64). This will replace the need for Travis CI and Appveyor. This should allow the network team, application team, and anti-censorship team to test software on these platforms (either by building in the VMs or by fetching cross-compiled binaries on the hosts via the GitLab CI pipeline feature). Since none(?) of our engineering staff are working full-time on MacOS and Windows, we rely quite a bit on this for QA.
  • Run GitLab CI runners via KVM for the BSDs. Same argument as above, but it is much less urgent.
  • Spare capacity (once we have measured it) can be used as a generic GitLab CI Docker runner in addition to the F-Droid builders.
  • The faster the CPU the faster the builds.
  • Lots of RAM allows us to do things such as having CoW filesystems in memory for the ephemeral builders and should speed up builds due to faster I/O.

All this would be implemented through a GitLab custom executor using libvirt (see this example implementation).
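
For reference, and only as a sketch (the runner name and paths below are made up, not the actual deployment): a custom executor is wired into /etc/gitlab-runner/config.toml by pointing the four lifecycle stages (config, prepare, run, cleanup) at the driver's scripts, which in this case would call out to libvirt to boot and tear down the ephemeral VM:

    [[runners]]
      name = "libvirt-windows-amd64"      # hypothetical name
      url = "https://gitlab.torproject.org/"
      executor = "custom"
      builds_dir = "/builds"
      cache_dir = "/cache"
      [runners.custom]
        # scripts provided by the libvirt driver; paths are placeholders
        config_exec  = "/opt/libvirt-driver/config.sh"
        prepare_exec = "/opt/libvirt-driver/prepare.sh"   # clone base image, boot VM
        run_exec     = "/opt/libvirt-driver/run.sh"       # run the job script over SSH
        cleanup_exec = "/opt/libvirt-driver/cleanup.sh"   # destroy the ephemeral VM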

This is an excerpt from the proposal sent to TPA:

[TPA would] build two (bare metal) machines (in the Cymru cluster) to manage those runners. The machines would grant the GitLab runner (and also @ahf) access to the libvirt environment (through a role user).

ahf would be responsible for creating the base image and deploying the first machine, documenting every step of the way in the TPA wiki. The second machine would be built with Puppet, using those instructions, so that the first machine can be rebuilt or replaced. Once the second machine is built, the first machine should be destroyed and rebuilt, unless we are absolutely confident the machines are identical.

Cost

The machines used were donated, but there is still a "hardware opportunity cost" that is currently undefined.

Staff costs, naturally, should be counted. It is estimated that the initial runner setup should take less than two weeks.

Alternatives considered

Ganeti

Ganeti has been considered as an orchestration/deployment platform for the runners, but there is no known integration between GitLab CI runners and Ganeti.

If we find the time or an existing implementation, this would still be a nice improvement.

SSH/shell executors

This works by using an existing machine as a place to run the jobs. The problem is that jobs do not run in a clean environment, so it is not a good fit.

Parallels/VirtualBox

Note: we couldn't figure out what the difference is between Parallels and VirtualBox, or whether it matters.

Obviously, VirtualBox could be used to run Windows (and possibly MacOS?) images (and maybe BSDs?), but unfortunately Oracle has made a mess of VirtualBox, which keeps it out of Debian, so this could be a problematic deployment as well.

Docker

Support in Debian has improved, but is still hit-and-miss. There is no support for Windows or MacOS, as far as we know, so it is not a complete solution, but it could be used for Linux runners.

Docker machine

This was abandoned upstream and is considered irrelevant.

Kubernetes

@anarcat has been thinking about setting up a Kubernetes cluster for GitLab. There are high hopes that it will help us not only with GitLab CI, but also with the "CD" (Continuous Deployment) side of things. This approach was briefly discussed in the LDAP audit, but basically the idea would be to replace the "SSH + role user" approach we currently use for services with GitLab CI.

As explained in the goals section above, this is currently out of scope, but could be considered instead of Docker for runners.

Jenkins

Jenkins was a fine piece of software when it came out: builds! We can easily do builds! On multiple machines too! And a nice web interface with weird blue balls! It was great. But then Travis came along, and then GitLab CI, and then GitHub Actions, and it turns out it's much, much easier and more intuitive to delegate the build configuration to the project, as opposed to keeping it in the CI system.

The design of Jenkins, in other words, feels dated now. It imposes an unnecessary burden on the service admins, who are responsible for configuring and monitoring builds for their users.

It is also believed that installing GitLab runners will be easier on the sysadmins, although that remains to be verified.

In the short term, Jenkins can keep doing what it does, but in the long term, we would greatly benefit from retiring yet another service, since it basically duplicates what GitLab CI can do.

GitLab CI also has the advantage of integrating easily with GitLab Pages, making it easier for people to build static websites than the current combination of Jenkins and our static sites system. See the alternatives to the static site system for more information.
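
As a sketch of what that looks like (the build command is a placeholder for whatever static site generator a project uses), a job literally named "pages" that leaves the generated site in a public/ directory is all GitLab needs to publish it:

    pages:
      stage: deploy
      script:
        - ./build-site.sh public/   # placeholder: hugo, lektor, jekyll, etc.
      artifacts:
        paths:
          - public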

TODO: "audit" jenkins as per this document, possibly in its own "jenkins" service page.

In particular, we need to answer:

  • Why does this project exist?

  • What does this project do?

  • What is the context in which it exists?

  • Who has worked on this project in the past? (@weasel?)

  • Who is currently working on this project? (@weasel? @hiro?)

  • Who are the stakeholders? Who "owns" the service/application? (@weasel is the service admin, I believe)

  • Who will cry out in anguish when the service/application goes down? (lots? which teams still use it?)

  • Are there other projects that depend on it? What are they? (same as above?)

  • Are there possible future users that this project is working towards? (maybe more a question for GitLab CI)

  • Is there an active community around this project? (Jenkins is still healthy, actually, but not well supported in Debian)

  • design and architecture? do we really need to dive into this? would include stuff like:

    • What are the major components, services, storage systems, queues, etc for the project?
    • What data does the project use and how does it flow through the system?
    • What languages, versions, and runtimes are used?
    • interesting in the context of replacing with GitLab CI: What infrastructure is used? How is it defined? Who is responsible?
    • Is there a system for authentication/authorization? How does it work? Who is responsible for the systems involved?
    • where are the repos for control, which repos use it?
  • security review?

  • technical debt

  • urgent stuff