[Continuous Integration](https://en.wikipedia.org/wiki/Continuous_integration) is the system that allows tests to be run
and packages to be built, automatically, when new code is pushed to
the version control system (currently [git](howto/git)).

Note that even though the current system is [Jenkins][], this page
mostly documents GitLab CI, as that is its likely long-term
replacement.

[Jenkins]: https://jenkins.torproject.org

[[_TOC_]]

# Tutorial

[GitLab CI][GitLab CI splash] has [good documentation upstream][GitLab CI upstream]. This section
answers questions we frequently get about the service.

[GitLab CI upstream]: https://docs.gitlab.com/ee/ci/
[GitLab CI splash]: https://about.gitlab.com/stages-devops-lifecycle/continuous-integration/
[GitLab CI quickstart]: https://docs.gitlab.com/ee/ci/quick_start/README.html

<!-- simple, brainless step-by-step instructions requiring little or -->
<!-- no technical background -->

## Getting started

The [GitLab CI quickstart][] should get you started here. Note that
there are some "shared runners" you can already use, which should be
available to all projects.

TODO: time limits? should we say how to enable the shared runners?

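In practice, the quickstart boils down to committing a `.gitlab-ci.yml` file at the root of the repository. The sketch below builds a minimal two-stage pipeline as a Python dictionary and serialises it with the standard `json` module; since JSON is, for practical purposes, valid YAML, the output should be accepted as a `.gitlab-ci.yml`. The job names, the `python:3` image and the commands are hypothetical placeholders for a Python project, and the sketch assumes the runner picking up the jobs uses the Docker executor.

```python
"""Render a minimal .gitlab-ci.yml for a hypothetical Python project."""
import json

# "stages", "stage", "image" and "script" are standard .gitlab-ci.yml
# keywords; the job names, image and commands are placeholders.
PIPELINE = {
    "stages": ["build", "test"],
    "compile": {
        "stage": "build",
        "image": "python:3",
        # Byte-compile every module to catch syntax errors early.
        "script": ["python -m compileall -q ."],
    },
    "unit-tests": {
        "stage": "test",
        "image": "python:3",
        # Run the project's unittest-based test suite.
        "script": ["python -m unittest discover -v"],
    },
}


def render_pipeline(pipeline: dict = PIPELINE) -> str:
    """Serialise the pipeline; the JSON output doubles as YAML."""
    return json.dumps(pipeline, indent=2) + "\n"


if __name__ == "__main__":
    print(render_pipeline(), end="")
```

Save the output as `.gitlab-ci.yml` and push it. Most people write the same structure directly in YAML: each top-level key is either a keyword (`stages`) or a job.
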
# How-to

<!-- more in-depth procedure that may require interpretation -->

## Pager playbook

<!-- information about common errors from the monitoring system and -->
<!-- how to deal with them. this should be easy to follow: think of -->
<!-- your future self, in a stressful situation, tired and hungry. -->

TODO: what happens if there's trouble with the F-Droid runners? who to
ping? anything we can do to diagnose the problem? what kind of
information to send them?

## Disaster recovery

Runners should be disposable: if a runner is destroyed, at most the
jobs it is currently running will be lost. Otherwise, artifacts should
be present on the GitLab server, so recovering a runner is as "simple"
as creating a new one.

# Reference

## Installation

Since GitLab CI is basically GitLab with external runners hooked up to
it, this section documents how to install runners and register them
with GitLab.

### Linux

TODO: document how the F-Droid runners were hooked up to GitLab
CI. Anything special on top of [the official docs](https://docs.gitlab.com/runner/register/)?

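Pending that documentation, here is a hedged sketch of what a non-interactive registration against our GitLab instance could look like. It assumes a runner authentication token created in the GitLab web interface (older GitLab versions used `--registration-token` instead of `--token`) and only builds and prints the `gitlab-runner register` command line. The Docker executor and the `debian:stable` default image are hypothetical choices, not a description of how the F-Droid runners are configured.

```python
"""Build a `gitlab-runner register` command line (sketch, not deployed config)."""
import shlex

GITLAB_URL = "https://gitlab.torproject.org/"  # our GitLab instance


def register_argv(token: str, executor: str = "docker",
                  docker_image: str = "debian:stable") -> list[str]:
    """Return the argv for a non-interactive runner registration.

    The token is a placeholder: in practice it should come from a secret
    store rather than being hard-coded or left in shell history.
    """
    argv = [
        "gitlab-runner", "register",
        "--non-interactive",  # fail instead of prompting for answers
        "--url", GITLAB_URL,
        "--token", token,
        "--executor", executor,
    ]
    if executor == "docker":
        # The Docker executor needs a default image for jobs that set none.
        argv += ["--docker-image", docker_image]
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command; "glrt-EXAMPLE" is a fake token.
    print(shlex.join(register_argv("glrt-EXAMPLE")))
```

Keeping registration down to a single command like this is also what would make it easy to drive from Puppet later, as the goals below require.
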
### MacOS/Windows

TODO: @ahf document how MacOS/Windows images are created and runners
are set up. Don't hesitate to create separate headings for Windows vs
MacOS and for image creation vs runner setup.

## SLA

The GitLab CI service is offered on a "best effort" basis and might
not be fully available.

## Design

<!-- how this is built -->
<!-- should reuse and expand on the "proposed solution", it's a -->
<!-- "as-built" documented, whereas the "Proposed solution" is an -->
<!-- "architectural" document, which the final result might differ -->
<!-- from, sometimes significantly -->

<!-- a good guide to "audit" an existing project's design: -->
<!-- https://bluesock.org/~willkg/blog/dev/auditing_projects.html -->

## Issues

[File][] or [search][] for issues in the [GitLab issue tracker][search].

[File]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/new
[search]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues

## Monitoring and testing

TODO: @ahf how do we monitor the runners? maybe the Prometheus
exporter has something? should we hook it inside Nagios to get alerts
when runners get overwhelmed?

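One possible approach, sketched here under stated assumptions: GitLab Runner can serve Prometheus metrics over HTTP when `listen_address` is set in its `config.toml` (for example `listen_address = ":9252"`), and a small Nagios-style plugin could compare the number of running jobs against the configured concurrency. The metric names `gitlab_runner_jobs` and `gitlab_runner_concurrent`, the URL and the thresholds are assumptions to verify against a live runner's `/metrics` output, and this only works for runners whose metrics endpoint we can actually reach.

```python
"""Nagios-style runner saturation check (sketch; metric names are assumed)."""
import re
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# One sample line of the Prometheus text format: name, optional {labels}, value.
SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_totals(text: str) -> dict[str, float]:
    """Sum all samples of each metric, ignoring labels and comment lines."""
    totals: dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match:
            name, value = match.group(1), float(match.group(3))
            totals[name] = totals.get(name, 0.0) + value
    return totals


def check(text: str, warn: float = 0.8, crit: float = 0.95) -> tuple[int, str]:
    """Compare busy job slots against the runner's concurrency limit."""
    totals = parse_totals(text)
    limit = totals.get("gitlab_runner_concurrent", 0.0)
    if limit <= 0:
        return UNKNOWN, "UNKNOWN - gitlab_runner_concurrent not found"
    busy = totals.get("gitlab_runner_jobs", 0.0)
    ratio = busy / limit
    status = CRITICAL if ratio >= crit else WARNING if ratio >= warn else OK
    return status, f"{LABELS[status]} - {busy:g}/{limit:g} job slots busy ({ratio:.0%})"


def main(url: str = "http://localhost:9252/metrics") -> int:
    """Fetch metrics, print a one-line summary and return the exit code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            text = response.read().decode("utf-8")
    except OSError as error:  # URLError and timeouts are OSError subclasses
        print(f"UNKNOWN - cannot fetch {url}: {error}")
        return UNKNOWN
    status, message = check(text)
    print(message)
    return status
```

A wrapper script would end with `sys.exit(main())` so Nagios sees the right status. Graphed over time, the same busy/limit ratio would also hint at when we are about to run out of runner capacity.
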
## Logs and metrics

TODO: do runners keep logs? where? does it matter? any PII?

TODO: how about performance metrics? how do we know when we'll run out
of capacity in the runner network since we don't host the F-Droid
runners?

## Backups

This service requires no backups: all configuration should be
performed by Puppet and/or documented in this wiki page. A lost runner
should be rebuilt from scratch, as per [disaster recovery](#disaster-recovery).

## Other documentation

* [GitLab CI promotional page][GitLab CI splash]
* [GitLab CI upstream documentation portal][GitLab CI upstream]
* [GitLab CI quickstart][]

# Discussion

Tor currently uses [Jenkins][] to run tests, builds and various
automated jobs. This discussion is about whether and how to replace it
with GitLab CI.

## Overview

<!-- describe the overall project. should include a link to a ticket -->
<!-- that has a launch checklist -->

Ever since the [GitLab migration](howto/gitlab), we have discussed the
possibility of replacing Jenkins with GitLab CI, or at least using
GitLab CI in some way.

Tor currently uses a mixture of different CI systems to ensure some
form of quality assurance as part of the software development process:

- Jenkins (provided by TPA)
- GitLab CI (currently Docker builders kindly provided by the F-Droid
  project via Hans from The Guardian Project)
- Travis CI (used by some of our projects such as tpo/core/tor.git for
  Linux and macOS builds)
- Appveyor (used by tpo/core/tor.git for Windows builds)

By the end of 2020, however, [pricing changes at Travis CI](https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing)
made it difficult for the network team to continue running the macOS
builds there. Furthermore, Appveyor was felt to be too slow to be
useful for builds, so it was proposed ([issue 40095][]) to create a
pair of bare metal machines to run those builds, through a `libvirt`
architecture. Using libvirt this way is an exception to
[TPA-RFC-7: tools](policy/tpa-rfc-7-tools); that exception was formally
proposed in [TPA-RFC-8][].

[issue 40095]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40095
[TPA-RFC-8]: policy/tpa-rfc-8-gitlab-ci-libvirt

## Goals

In general, the idea here is to evaluate GitLab CI as a unified
platform to replace Travis and Appveyor in the short term, but also,
in the longer term, Jenkins itself.

### Must have

* automated configuration: setting up new builders should be done
  through Puppet
* the above requires excellent documentation of the setup procedure
  in the development stages, so that TPA can transform that into a
  working Puppet manifest
* Linux, Windows, macOS support
* x86-64 architecture ("64-bit version of the x86 instruction set",
  AKA x64, AMD64, Intel 64, what most people use on their computers)
* Travis replacement
* autonomy: users should be able to set up new builds without
  intervention from the service (or system!) administrators
* clean environments: each build should run in a clean VM

### Nice to have

* fast: the runners should be fast (as in: powerful CPUs, good disks,
  lots of RAM to cache filesystems, CoW disks) and impose little
  overhead over running the code natively (as in: no emulation)
* ARM64 architecture
* Apple M1 support
* Jenkins replacement
* Appveyor replacement
* BSD support (FreeBSD, OpenBSD, and NetBSD in that order)

### Non-Goals

* in the short term, we don't aim at doing "Continuous
  Deployment". This is one of the possible goals of the GitLab CI
  deployment, but it is considered out of scope for now. See also the
  [LDAP proposed solutions section][].

[LDAP proposed solutions section]: howto/ldap#Proposed-Solution

## Approvals required

TPA's approval is required for the libvirt exception; see
[TPA-RFC-8][].

## Proposed Solution

The [original proposal][issue 40095] from @ahf went as follows:

> [...] Reserve two (ideally) "fast" Debian-based machines on TPO infrastructure to build the following:
>
> * Run Gitlab CI runners via KVM (initially with focus on Windows
>   x86-64 and macOS x86-64). This will replace the need for Travis CI
>   and Appveyor. This should allow both the network team, application
>   team, and anti-censorship team to test software on these platforms
>   (either by building in the VMs or by fetching cross-compiled
>   binaries on the hosts via the Gitlab CI pipeline feature). Since
>   none(?) of our engineering staff are working full-time on MacOS
>   and Windows, we rely quite a bit on this for QA.
> * Run Gitlab CI runners via KVM for the BSD's. Same argument as
>   above, but is much less urgent.
> * Spare capacity (once we have measured it) can be used a generic
>   Gitlab CI Docker runner in addition to the FDroid builders.
> * The faster the CPU the faster the builds.
> * Lots of RAM allows us to do things such as having CoW filesystems
>   in memory for the ephemeral builders and should speed up builds
>   due to faster I/O.

All this would be implemented through a GitLab [custom executor][]
using [libvirt](https://libvirt.org/) (see [this example implementation](https://docs.gitlab.com/runner/executors/custom_examples/libvirt.html)).

This is an excerpt from the [proposal sent to TPA][TPA-RFC-8]:

> [TPA would] build two (bare metal) machines (in the Cymru cluster)
> to manage those runners. The machines would grant the GitLab runner
> (and also @ahf) access to the libvirt environment (through a role
> user).
>
> ahf would be responsible for creating the base image and deploying the
> first machine, documenting every step of the way in the TPA wiki. The
> second machine would be built with Puppet, using those instructions,
> so that the first machine can be rebuilt or replaced. Once the second
> machine is built, the first machine should be destroyed and rebuilt,
> unless we are absolutely confident the machines are identical.

[custom executor]: https://docs.gitlab.com/runner/executors/custom.html

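To make the custom executor approach more concrete, here is a hedged Python sketch of the driver such an executor could call at each stage, loosely modelled on the upstream libvirt example. It assumes a single script configured for the prepare, run and cleanup stages (through `prepare_args`, `run_args` and `cleanup_args` in the runner's `config.toml`), a golden qcow2 image per platform under a hypothetical `/srv/ci/images` directory, and guests that accept SSH as a hypothetical `ci` user. Each job boots from its own copy-on-write overlay, which is one cheap way to meet the "clean VM per build" requirement.

```python
"""Sketch of a libvirt-backed GitLab Runner custom executor driver.

Hypothetical layout: golden images in BASE_DIR, per-job overlays in
OVERLAY_DIR, and a "ci" user reachable over SSH inside each VM.
"""
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path

BASE_DIR = Path("/srv/ci/images")       # hypothetical golden images
OVERLAY_DIR = Path("/srv/ci/overlays")  # hypothetical per-job scratch disks


def vm_name(job_id: str) -> str:
    """One throwaway VM per job, named after the GitLab job ID."""
    return f"ci-job-{job_id}"


def overlay_path(job_id: str) -> Path:
    return OVERLAY_DIR / f"{vm_name(job_id)}.qcow2"


def prepare_commands(job_id: str, base_image: str, os_variant: str) -> list[list[str]]:
    """Create a copy-on-write disk over the golden image and boot a VM on it."""
    overlay = str(overlay_path(job_id))
    return [
        # Writes land in the overlay only, so the golden image stays clean.
        ["qemu-img", "create", "-f", "qcow2",
         "-b", str(BASE_DIR / base_image), "-F", "qcow2", overlay],
        ["virt-install", "--name", vm_name(job_id), "--import",
         "--memory", "4096", "--vcpus", "2",
         "--disk", f"path={overlay},format=qcow2",
         "--os-variant", os_variant, "--graphics", "none", "--noautoconsole"],
    ]


def cleanup_commands(job_id: str) -> list[list[str]]:
    """Destroy the VM and drop its overlay; the golden image is untouched."""
    return [
        ["virsh", "destroy", vm_name(job_id)],
        ["virsh", "undefine", vm_name(job_id)],
        ["rm", "-f", str(overlay_path(job_id))],
    ]


def parse_domifaddr(output: str) -> str | None:
    """Return the first IPv4 address in `virsh -q domifaddr` output."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ipv4":
            return fields[3].split("/")[0]
    return None


def run_stage(job_id: str, script_path: str) -> None:
    """Pipe the script generated by GitLab Runner into a shell in the VM."""
    probe = subprocess.run(["virsh", "-q", "domifaddr", vm_name(job_id)],
                           capture_output=True, text=True)
    address = parse_domifaddr(probe.stdout) if probe.returncode == 0 else None
    if address is None:
        raise RuntimeError(f"{vm_name(job_id)} has no IPv4 address yet")
    with open(script_path, "rb") as script:
        subprocess.run(["ssh", "-o", "BatchMode=yes", f"ci@{address}", "/bin/bash"],
                       stdin=script, check=True)


def main(argv: list[str]) -> int:
    stage, job_id = argv[1], os.environ["CUSTOM_ENV_CI_JOB_ID"]
    build_failure = int(os.environ.get("BUILD_FAILURE_EXIT_CODE", "1"))
    system_failure = int(os.environ.get("SYSTEM_FAILURE_EXIT_CODE", "2"))
    try:
        if stage == "prepare":
            for cmd in prepare_commands(job_id, "debian-11.qcow2", "debian11"):
                subprocess.run(cmd, check=True)
        elif stage == "run":
            # GitLab Runner appends the script path and the sub-stage name.
            run_stage(job_id, argv[2])
        elif stage == "cleanup":
            for cmd in cleanup_commands(job_id):
                subprocess.run(cmd, check=False)  # best effort
    except subprocess.CalledProcessError:
        # In the run stage only the SSH command raises this: a build failure.
        # In the prepare stage it means the infrastructure failed.
        return build_failure if stage == "run" else system_failure
    except RuntimeError:
        return system_failure
    return 0


# Only act when started by GitLab Runner, which sets the CUSTOM_ENV_* variables.
if __name__ == "__main__" and "CUSTOM_ENV_CI_JOB_ID" in os.environ:
    sys.exit(main(sys.argv))
```

A real driver would also need the prepare stage to wait until the guest answers on SSH before returning (the upstream libvirt example polls for this); the sketch leaves that out, so a slow boot simply surfaces as a system failure in the run stage.
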
## Cost

The machines used were donated, but that is still a "hardware
opportunity cost" that is currently undefined.

Staff costs, naturally, should be counted. The initial runner setup is
estimated to take less than two weeks.

## Alternatives considered

### Ganeti

Ganeti has been considered as an orchestration/deployment platform for
the runners, but there is no known integration between GitLab CI
runners and Ganeti.

If we find the time or an existing implementation, this would still be
a nice improvement.

### SSH/shell executors

This works by using an existing machine as a place to run the
jobs. The problem is that jobs do not run in a clean environment, so
it is not a good fit.

### Parallels/VirtualBox

Note: couldn't figure out what the difference is between Parallels and
VirtualBox, nor if it matters.

Obviously, VirtualBox could be used to run Windows (and possibly
macOS?) images (and maybe BSDs?) but unfortunately, Oracle has made a
mess of VirtualBox, which [keeps it out of Debian](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=794466), so this could be
a problematic deployment as well.

### Docker

[Support in Debian](https://tracker.debian.org/pkg/docker.io) has improved, but is still hit-and-miss. It
offers no support for Windows or macOS, as far as I know, so it is not
a complete solution, but it could be used for Linux runners.

### Docker machine

This was abandoned upstream and is considered irrelevant.

### Kubernetes

@anarcat has been thinking about setting up a Kubernetes cluster for
GitLab. There are high hopes that it will help us not only with GitLab
CI, but also with the "CD" (Continuous Deployment) side of things. This
approach was briefly [discussed in the LDAP audit][LDAP proposed solutions section], but basically the
idea would be to use GitLab CI to replace the "SSH + role user"
approach we currently use to deploy services.

As explained in the [Non-Goals](#non-goals) section above, this is currently out of
scope, but could be considered instead of Docker for runners.

### Jenkins

[Jenkins][Jenkins CI] was a fine piece of software when it came out: builds! We
can easily do builds! On multiple machines too! And a nice web
interface with [weird blue balls](https://www.jenkins.io/blog/2012/03/13/why-does-jenkins-have-blue-balls/)! It was great. But then Travis
came along, and then GitLab CI, and then GitHub Actions, and it turns
out it's much, much easier and more intuitive to delegate the build
configuration to the project as opposed to keeping it in the CI
system.

The design of Jenkins, in other words, feels dated now. It imposes an
unnecessary burden on the service admins, who are responsible for
configuring and monitoring builds for their users.

It is also believed that installing GitLab runners will be easier on
the sysadmins, although that remains to be verified.

In the short term, Jenkins can keep doing what it does, but in the
long term, we would greatly benefit from retiring yet another service,
since it basically duplicates what GitLab CI can do.

GitLab CI also has the advantage of being able to easily integrate
with GitLab Pages, making it easier for people to build static
websites than the current combination of Jenkins and our
[static sites system](howto/static-component). See the
[alternatives to the static site system](static-component#Alternatives-considered)
for more information.

[Jenkins CI]: https://en.wikipedia.org/wiki/Jenkins_(software)
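As an illustration of that integration: GitLab Pages publishes the `public` directory saved as an artifact by a job named `pages`. The sketch below renders such a job as JSON (valid YAML in practice) using only Python's standard library, restricted to the default branch with a `rules` entry. The `python:3` image and the MkDocs commands are hypothetical stand-ins for whatever static site generator a project actually uses.

```python
"""Render a GitLab Pages job for .gitlab-ci.yml (hypothetical project)."""
import json


def pages_job(build_commands: list[str]) -> dict:
    """GitLab Pages deploys the "public" artifact of the job named "pages"."""
    return {
        "pages": {
            "image": "python:3",  # hypothetical build image
            "script": build_commands,
            "artifacts": {"paths": ["public"]},
            # Only publish from the default branch, not from every push.
            "rules": [{"if": "$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH"}],
        }
    }


if __name__ == "__main__":
    # MkDocs is only an example; any generator writing into public/ works.
    job = pages_job(["pip install mkdocs", "mkdocs build --site-dir public"])
    print(json.dumps(job, indent=2))
```

The whole publishing pipeline lives in the project's own repository, which is exactly the "configuration delegated to the project" property described above.
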