Changes

anarcat · 060f997d
--- a/service/ci.md
+++ b/service/ci.md
@@ -25,10 +25,8 @@ documents frequent questions we might get about the work.
 The [GitLab CI quickstart][] should get you started here. Note that
 there are some "shared runners" you can already use, and which should
-be available to all projects.
+be available to all projects. So your main task here is basically to
+[write a `.gitlab-ci.yml` file](https://docs.gitlab.com/ee/ci/quick_start/README.html#create-a-gitlab-ciyml-file).
-TODO: do runners have time limits? should we document how to enable
-the shared runners in a project?
 # How-to
@@ -39,16 +37,27 @@ the shared runners in a project?
 There might be too many jobs in the queue. You can monitor the queue
 in our [Grafana dashboard](https://grafana.torproject.org/d/QrDJktiMz/gitlab-omnibus).
-## Building docker images
-TODO: document how to build docker images from GitLab CI. Maybe with
+## Enabling/disabling runners
-podman or buildah? see below.
+If a runner is misbehaving, it might be worth "pausing" it while we
+investigate, so that jobs don't all fail on that runner. For this,
+head for the [runner admin interface](https://gitlab.torproject.org/admin/runners) and hit the "pause" button on
+the runner.
+## Registering more runners
-## Image security
+Anyone can run their own personal runner in their own infrastructure
+and register it inside a project on our GitLab instance. For this
+you need to first [install a runner](https://docs.gitlab.com/runner/install/) and [register it in
+GitLab](https://docs.gitlab.com/runner/register/). We already have shared runners, however: if they are not
+sufficient, it might be best to request a new one from TPA.
-TODO: document how to create and use more secure Docker images. For
+## Converting a Jenkins job
-example, most images run as root: try to make images run as a regular
-user.
+Upstream has [generic documentation on how to migrate from Jenkins](https://docs.gitlab.com/ee/ci/migration/jenkins.html)
+which could be useful for us. We have yet to write a more complete
+guide on how to migrate jobs to GitLab CI.
 ## Pager playbook
@@ -93,8 +102,7 @@ cluster, using this command:
          ci-runner-01.torproject.org
 The `profile::gitlab_runner` Puppet class deploys the GitLab runner
-code and hooks it into GitLab. It uses the
+code and hooks it into GitLab. It uses the [gitlab_ci_runner](https://forge.puppet.com/modules/puppet/gitlab_ci_runner)
-[gitlab_ci_runner](https://forge.puppet.com/modules/puppet/gitlab_ci_runner)
 module from Voxpupuli to avoid reinventing the wheel. But before
 enabling it on the instance, the following operations need to be
 performed:
@@ -175,15 +183,121 @@ not be fully available.
 ## Design
-TODO: expand on GitLab CI's design and architecture, following [this
+The CI service is currently provided by [Jenkins][], but we are
-checklist](https://bluesock.org/~willkg/blog/dev/auditing_projects.html). See also the [Jenkins section](#jenkins) below for the same
+looking at replacing this with GitLab CI in the [2021
-thing about Jenkins.
+roadmap](roadmap/2021). This section therefore mostly documents how the new
+GitLab CI service is built. See the [Jenkins section](#jenkins) below for more
+information about the old Jenkins service.
+### GitLab CI architecture
+GitLab CI sits somewhat outside of the main GitLab architecture, in
+that it is not featured prominently in the [GitLab architecture
+documentation](https://docs.gitlab.com/ee/development/architecture.html). In practice, it is a core component of GitLab:
+the continuous integration and deployment features of GitLab have
+become a key feature and selling point for the project.
+GitLab CI works by scheduling "pipelines" which are made of one or
+many "jobs", defined in a project's git repository (the
+[`.gitlab-ci.yml`](https://docs.gitlab.com/ee/ci/yaml/) file). Those jobs then get picked up by one of
+many "runners". Those runners are separate processes, usually running
+on a different host than the main GitLab server.
+They regularly poll the central GitLab for jobs and execute those
+inside an "[executor](https://docs.gitlab.com/runner/executors/README.html)". We currently support only "Docker" as an
+executor but are working on different ones, like a custom "podman"
+(for more trusted runners, see below) or KVM executor (for foreign
+platforms like macOS or Windows).
+In practice, the runner does the following:
+ 1. it fetches the git repository of the project
+ 2. it runs a sequence of shell commands on the project inside the
+    executor (e.g. inside a Docker container) with [specific
+    environment variables](https://docs.gitlab.com/ee/ci/variables/README.html#gitlab-cicd-environment-variables) populated from the project's settings
+ 3. it collects artifacts and logs and uploads those back to the main
+    GitLab server
+The jobs are therefore affected by the `.gitlab-ci.yml` file but also
+the configuration of each project. It's a simple yet powerful design.
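+As a rough illustration of the three steps above, here is a minimal
+sketch of a `.gitlab-ci.yml` job. The job name, image, `EXAMPLE_FLAG`
+variable and `run-tests.sh` script are hypothetical placeholders and
+not part of our setup. `CI_PROJECT_NAME` and `CI_COMMIT_SHORT_SHA`
+are among the variables GitLab predefines for each job.
+```yaml
+# Hypothetical job: names, image and script are placeholders.
+test:
+  # image the Docker executor starts the job in
+  image: debian:stable
+  variables:
+    # job-level variable, made up for this example
+    EXAMPLE_FLAG: "1"
+  script:
+    # step 2: shell commands run inside the executor
+    - echo "testing $CI_PROJECT_NAME at $CI_COMMIT_SHORT_SHA"
+    - ./run-tests.sh > test.log 2>&1
+  artifacts:
+    # step 3: upload the log back to GitLab, even if the job failed
+    when: always
+    paths:
+      - test.log
+```
+Step 1, fetching the repository, needs no configuration: the runner
+does it before `script` starts.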
+### Types of runners
+There are three types of runners:
+ * **shared**: "shared" across all projects, they will pick up any
+   job from any project
+ * **group**: those are restricted to run jobs only within a
+   specific group
+ * **project**: those will only run jobs within a specific project
+In addition, jobs can be targeted at specific runners by assigning
+them a "tag".
+### Runner tags
+Whether a runner will pick a job depends on a few things:
+ * if it is a "shared", "group" or "project" runner (above)
+ * if it has a tag matching the [`tags` field in the configuration](https://docs.gitlab.com/ee/ci/yaml/#tags)
+We currently use the following tags:
-Some things to look into:
+ * **architecture**: `amd64`, for example, runs on the normal 64-bit
+   Intel/AMD architecture. New tags like this may be introduced when
+   other architectures are supported
+ * **OS**: `linux` is usually implicit but other tags might eventually
+   be added for other operating systems
+ * **executor** type: `docker`, `KVM`, etc. `docker` runners are the
+   typical ones, `KVM` runners are possibly more powerful and can, for
+   example, run Docker-inside-Docker (DinD)
+ * **memory** size: `64GB`, `32GB`, `4GB`, etc.
+ * `privileged`: those containers have actual root access and should
+   explicitly be able to run `DinD`
+ * `interactive web terminal`: supports [interactively debugging
+   jobs](https://docs.gitlab.com/ee/ci/interactive_web_terminal/)
+ * `fdroid`: provided as a courtesy by the [F-Droid project](https://f-droid.org/)
+Use tags in your configuration only if your job can be fulfilled by
+only some of those runners. For example, only specify a memory tag if
+your job requires a lot of memory.
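+For example, a job that needs a large amount of memory could be
+targeted like this. This is only a sketch: the job name and the
+`build.sh` script are hypothetical, while the `amd64` and `64GB` tags
+are taken from the list above.
+```yaml
+# Hypothetical memory-hungry job: name and script are placeholders.
+big-build:
+  tags:
+    # only runners carrying all of the listed tags pick up the job
+    - amd64
+    - 64GB
+  script:
+    - ./build.sh
+```
+A job without a `tags` field is not restricted this way, which is
+the right default for most projects.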
+### Upstream release schedules
+GitLab CI is an integral part of GitLab itself and gets released along
+with the core releases. GitLab runner is a [separate software
+project](https://gitlab.com/gitlab-org/gitlab-runner) but usually gets released alongside GitLab.
+### Security
+TODO: Some things to look into:
 * https://docs.gitlab.com/ee/user/project/new_ci_build_permissions_model.html
 * https://docs.gitlab.com/runner/security/
+We do not currently trust GitLab runners for security purposes: at
+most we trust them to correctly report errors in test suites, but we do
+not trust them with compiling and publishing artifacts, so they have a
+low value in our trust chain. This might eventually change.
+### Image, volume and container storage and caching
+GitLab runner creates quite a few containers, volumes and images in
+the course of its regular work. Those tend to pile up, unless they get
+cleaned. [Upstream suggests](https://docs.gitlab.com/runner/executors/docker.html#clearing-docker-cache) a [fairly naive shell script](https://gitlab.com/gitlab-org/gitlab-runner/blob/master/packaging/root/usr/share/gitlab-runner/clear-docker-cache) to do
+this cleanup, but it has a number of issues:
+ 1. it is noisy ([patched locally with this MR](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2711))
+ 2. it might be too aggressive
+So we only run it weekly, and instead run a more "gentle" `docker
+system prune` command to clean up orphaned resources after 3 days.
+Also note that documentation on this inside GitLab runner is
+inconsistent at best, see [this other MR](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2711) and [this issue](https://gitlab.com/gitlab-org/gitlab-runner-docker-cleanup/-/issues/21).
+### Rootless containers
 TODO: consider podman for running containers more securely, and
 possibly also to build container images inside GitLab CI, which would
 otherwise require docker-in-docker (DinD), unsupported by
@@ -193,9 +307,33 @@ upstream. some ideas here:
 * https://github.com/containers/podman/issues/7982
 * https://github.com/jonasbb/podman-gitlab-runner
+### Current services
+GitLab CI, at TPO, currently runs the following services:
+ * continuous integration: mostly testing after commit
+This is currently used by many teams and is quickly becoming a
+critical service.
+### Possible services
+It could eventually also run those services:
+ * web page hosting through GitLab pages or the existing static site
+   system. This is a requirement to replace Jenkins
+ * continuous deployment: applications and services could be deployed
+   directly from GitLab CI/CD, for example through a Kubernetes
+   cluster or just with plain Docker
+ * artifact publication: tarballs, binaries and Docker images could be
+   built by GitLab runners and published on the GitLab server (or
+   elsewhere). This is a requirement to replace Jenkins
 ## Issues
-[File][] or [search][] for issues in the [GitLab issue tracker][search].
+[File][] or [search][] for issues in our [GitLab issue
+tracker][search]. Upstream has of course an [issue tracker for GitLab
+runner](https://gitlab.com/gitlab-org/gitlab-runner/-/issues) and a [project page](https://gitlab.com/gitlab-org/gitlab-runner).
 [File]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/new
 [search]: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues