title: TPA-RFC-10: Jenkins retirement
approval: TPA, weasel, Jenkins users (web, network, metrics teams)
deadline:
TPA: 2021-03-09
tor-internal: 2021-04-15
server shutdown: 2021-12-01
status: obsolete
discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40167
Summary: Jenkins will be retired in 2021, replaced by GitLab CI, with special hooks to keep the static site mirror system and Debian package builds operational. Non-critical websites (e.g. documentation) will be built by GitLab CI and served by GitLab pages. Critical websites (e.g. main website) will be built by GitLab CI and served by the static mirror system. Teams are responsible for migrating their jobs, with assistance from TPA, by the end of the year (December 1st 2021).
Background
Jenkins was a fine piece of software when it came out: builds! We can easily do builds! On multiple machines too! And a nice web interface with weird blue balls! It was great. But then Travis CI came along, and then GitLab CI, and then GitHub Actions, and it turns out it's much, much easier and more intuitive to delegate the build configuration to the project as opposed to keeping it in the CI system.
The design of Jenkins, in other words, feels dated now. It imposes an unnecessary burden on the service admins, who are responsible for configuring and monitoring builds for their users. Introducing a job (particularly a static website job) involves committing to four different git repositories, an error-prone process that rarely works on the first try.
The scripts used to build Jenkins have some technical debt: there's at least one Python script that may or may not have been ported to Python 3. There are, as far as we know, no other emergencies in the maintenance of this system.
In the short term, Jenkins can keep doing what it does, but in the long term, we would greatly benefit from retiring yet another service, since it basically duplicates what GitLab CI already does.
Note that the 2020 user survey also had a few voices suggesting that Jenkins be retired in favor of GitLab CI. Some users also expressed "sadness" with the Jenkins service. Those results were the main driver behind this proposal.
Goals
The goal of this migration is to retire the Jenkins service and servers (henryi but also the multiple build-$ARCH-$NN servers) with minimal disruption to its users.
Must have
- continuous integration: run unit tests after a push to a git repository
- continuous deployment of static websites: build and upload static websites, to the existing static mirror system, or to GitLab pages for less critical sites
Nice to have
- retire all the existing build-$ARCH-$NN machines in favor of the GitLab CI runners architecture
Non-Goals
- retiring the gitolite / gitweb infrastructure is out of scope, even though it is planned as part of the 2021 roadmap. Therefore, solutions here should not rely too much on gitolite-specific features or hooks
- replacing the current static mirror system is out of scope, and is not planned in the 2021 roadmap at all, so the solution proposed must still be somewhat compatible with the static site mirror system
Proposal
Replacing Jenkins will be done progressively, over the course of 2021, by the different Jenkins users themselves. TPA will coordinate the effort and progressively remove jobs from the Jenkins configuration until none remain, at which point the server -- along with the build boxes -- will be retired.
No archive of the service will be kept.
GitLab CI as main option, and alternatives
GitLab CI will be suggested to Jenkins users as the alternative, but users will be free to implement their own build system in other ways if they do not feel GitLab CI is a good fit for their purpose.
In particular, GitLab has a powerful web hook system that can be used to trigger builds on other infrastructure. Alternatively, external build systems could periodically pull Git repositories for changes.
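The polling alternative mentioned above could look roughly like the following minimal sketch. It assumes the external build system has the `git` command available and remembers the last commit it built. The function names and the `trigger_build` callback are hypothetical and not part of any existing TPA tooling.

```python
import subprocess
from typing import Callable, Optional


def parse_ls_remote(output: str, ref: str) -> Optional[str]:
    """Return the commit hash for `ref` from `git ls-remote` output, if present."""
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    return None


def remote_head(repo_url: str, ref: str = "refs/heads/main") -> Optional[str]:
    """Ask the remote for the current commit of `ref` without cloning."""
    result = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout, ref)


def poll_once(
    repo_url: str,
    last_seen: Optional[str],
    trigger_build: Callable[[str], None],
    fetch: Callable[[str], Optional[str]] = remote_head,
) -> Optional[str]:
    """Trigger a build if the remote moved; return the commit to remember."""
    current = fetch(repo_url)
    if current is not None and current != last_seen:
        trigger_build(current)  # hypothetical hook into the external build system
        return current
    return last_seen
```

A scheduler (cron, a systemd timer, or a simple loop) would call `poll_once` periodically and persist its return value. A web hook avoids the polling delay but requires the build system to be reachable from GitLab.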
Stakeholders and responsibilities
We know of the following teams currently using Jenkins and affected by this:
- web team: virtually all websites are built in Jenkins, and heavily depend on the static site mirror for proper performance
- network team: the core tor project is also a heavy user of Jenkins, mostly to run tests and checks, but also producing some artefacts (Debian packages and documentation)
- TPA: uses Jenkins to build the status website
- metrics team: onionperf's documentation is built in Jenkins
When this proposal is adopted, a ticket will be created to track all the jobs configured in Jenkins and each team will be responsible for migrating their jobs before the deadline. It is not up to TPA to rebuild those pipelines, as this would be too time-consuming and would require too much domain-specific knowledge. Besides, it's important that teams become familiar with the GitLab CI system so this is a good opportunity to do so.
A more detailed analysis of the jobs currently configured in Jenkins is available in the Configured Jobs section of the Jenkins service documentation.
Specific job recommendations
With the above in mind, here are some recommendations on specific groups of jobs currently configured on the Jenkins server and how they could be migrated to the GitLab CI infrastructure.
Some jobs will be harder to migrate than others, so a piecemeal approach will be used.
Here's a breakdown by job type, from easiest to hardest:
Non-critical websites
Non-critical websites should be moved to GitLab Pages. A redirect in the static mirror system should ensure link continuity until GitLab pages is capable of hosting its own CNAMEs (or it could be fixed to do so, but that is optional).
Proof-of-concept jobs have already been done for this. The status.torproject.org site has a pipeline that publishes a GitLab Pages site, for example, under: https://tpo.pages.torproject.net/tpa/status-site/
The GitLab pages domain may still change in the future and should not be relied upon just yet.
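Because the Pages domain may change, redirects are probably best generated from the project path rather than hard-coded. The sketch below follows the usual GitLab Pages layout for group projects (top-level group as subdomain, remaining path as URL path). The `pages_url` helper is hypothetical, and the project path `tpo/tpa/status-site` is inferred from the URL above rather than stated in this proposal.

```python
def pages_url(project_path: str, pages_domain: str) -> str:
    """Build the default GitLab Pages URL for a project inside a group.

    Assumes the default layout: https://<top-level-group>.<domain>/<rest>/
    """
    namespace, _, rest = project_path.strip("/").partition("/")
    if not namespace or not rest:
        raise ValueError(f"expected 'group/project', got {project_path!r}")
    return f"https://{namespace}.{pages_domain}/{rest}/"


# The domain is a parameter so it can be swapped if it changes.
print(pages_url("tpo/tpa/status-site", "pages.torproject.net"))
```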
Linux CI tests
Test suites running on Linux machines should be progressively migrated to GitLab CI. This should be fairly low-hanging fruit: the effort has already started, with jobs running in GitLab CI on a Docker-based runner.
Windows CI tests
GitLab CI will eventually gain Windows (and Mac!) based runners (see issue 40095) which should be able to replace the Windows CI jobs from Jenkins.
Critical website builds
Critical websites should be built by GitLab CI just like non-critical sites, but must be pushed to the static mirror system somehow. The GitLab Pages data source (currently the main GitLab server) should be used as a "static source" which would get triggered by a GitLab web hook after a successful job.
The receiving end of that web hook would be a new service, also running on the GitLab Pages data source, which would receive hook notifications and trigger the relevant static component updates to rsync the files to the static mirror system.
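The decision logic of such a receiver could be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: it relies on GitLab's `X-Gitlab-Token` header and on the `object_kind`, `object_attributes.status` and `project.path_with_namespace` fields of the pipeline event payload, while the project-to-component mapping and the `trigger-static-update` command name are hypothetical placeholders.

```python
import hmac
from typing import Any, List, Mapping, Optional

# Hypothetical mapping from GitLab project path to static component name.
COMPONENTS = {"tpo/web/tpo": "www.torproject.org"}


def component_for_event(
    headers: Mapping[str, str],
    payload: Mapping[str, Any],
    secret: str,
    components: Mapping[str, str] = COMPONENTS,
) -> Optional[str]:
    """Return the static component to update, or None to ignore the event."""
    # GitLab sends the configured secret token in this header; a real HTTP
    # framework would normally handle header-name case-insensitivity.
    token = headers.get("X-Gitlab-Token", "")
    if not hmac.compare_digest(token.encode(), secret.encode()):
        return None
    # Only react to pipeline events that finished successfully.
    if payload.get("object_kind") != "pipeline":
        return None
    if payload.get("object_attributes", {}).get("status") != "success":
        return None
    project = payload.get("project", {}).get("path_with_namespace", "")
    return components.get(project)


def update_command(component: str) -> List[str]:
    """Build the command that would sync one component to the mirrors.

    The command name is a placeholder for whatever the static mirror
    system actually provides.
    """
    return ["trigger-static-update", component]
```

Keeping an explicit allow-list of projects means a hook from an unrelated repository cannot trigger an update, and passing the command as a list avoids shell interpolation of data coming from the payload.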
As an exception to the "users migrate their own jobs" rule, TPA and the web team will jointly oversee the implementation of the integration between GitLab CI and the static mirror system. Considering the complexity of both systems, it is unlikely the web team or TPA will be in a position to individually implement this solution.
Debian package builds
Debian packages pose a challenge similar to the critical website builds in that there is existing infrastructure, external to GitLab, which we need to talk with. In this case, it's the https://deb.torproject.org server (currently palmeri).
There are two possible solutions:
- build packages in GitLab CI and reuse the "critical website webhook" discussed above to trigger uploads of the artifact to the Debian archive from outside GitLab
- build packages on another system, triggered using a new web hook
Update: see ticket 40241 for followup.
Retirement checklist
Concretely, the following will be removed on retirement:
- Windows build boxes retirement (VMs starting with w*, weissi, woronowii, winklerianum, Windows buildbox purpose in LDAP)
- Linux build boxes retirement (build-$ARCH-$NN.torproject.org, build box purpose in LDAP)
- NAT box retirement (nat-fsn-01.torproject.org)
- Jenkins box retirement (rouyi.torproject.org)
- Puppet code cleanup (retire buildbox and Jenkins code)
- git code cleanup (archive Jenkins repositories)
Update: follow ticket 40218 for progress.
Examples
- the network team is migrating their CI jobs to GitLab CI
- the https://research.torproject.org/ site would end up as a GitLab pages site
- the https://www.torproject.org/ site -- and all current Lektor sites -- would stay in the static mirror system, but would be built in GitLab CI
- a new Lektor site would not necessarily be hosted in the static mirror system if it's non-critical; it just happens that the current Lektor sites are all considered critical
Deadline
This proposal will be adopted by TPA by March 9th unless there are any objections. It will be proposed to tor-internal after TPA's adoption, where it will be adopted (or rejected) on April 15th unless there are any objections.
All Jenkins jobs SHOULD be migrated to other services by the end of 2021. The Jenkins server itself will be shut down on December 1st, unless a major problem comes up, in which case teams could be granted an extension.
References
See the GitLab, GitLab CI, and Jenkins service documentation for more background on how Jenkins and GitLab CI work.
Discussions and feedback on this RFC can be sent in issue 40167.