title: Tails and TPA Puppet codebase merge
deadline: 2025-02-26
status: proposed

Background

TPA-RFC-73 identified Puppet as a bottleneck for the merge between TPA and Tails infrastructure, as it blocks keeping, migrating and merging several other services. Merging codebases and ditching one of the Puppet servers is a complex move, so in this document we detail how that will be done.

Proposal

Goals

Must have

  • One Puppet Server to rule them all
  • Adoption of TPA's solution for handling Puppet modules and ENC
  • Convergence of Puppet module versions
  • Commit signing (as it's fundamental for Tails' current backup solution)

Non-goals

This proposal is not about:

  • Completely refactoring and deduplicating code, as that will be done step-by-step while we handle each service individually after the Puppet Server merge
  • Ditching one way to store secrets in favor of another, as that will be done separately in the future, after both teams have had the chance to experience Trocla and hiera-eyaml
  • Tackling individual service merges, such as backups, dns, monitoring and firewall; these will be tackled individually once all infra is under one Puppet Server
  • Applying new code standards everywhere; at most, we'll come up with general guidelines that could (maybe should) be used for new code and, in the future, for refactoring

Phase 1: Codebase preparation

This phase ensures that, once Tails code is copied to Tor's Puppet Control repo:

  • Code structure will match and be coherent
  • Tails code will not affect Tor's infra and Tor's code will not affect Tails infra

Note: Make sure to freeze all Puppet code refactoring on both sides before starting.

Converge in structure

Tails:

  • (1.1) Switch from Git submodules to using g10k (#41974)
  • (1.2) Remove ENC configuration: Tails doesn't really use it, and the Puppet server switch will implement Tor's instead
  • (1.3) Move node definitions under manifests/nodes.pp to roles
  • (1.4) Switch to the directory structure used by Tor:
    • Move custom non-profile modules (bitcoind, borgbackup, etckeeper, gitolite, rbac, reprepro, rss2email, tails, tirewall and yapgp) to legacy/. Note: there are no naming conflicts in this case.
    • Make sure to leave only 3rd party modules under modules/. There are 2 naming conflicts here (unbound and network): Tails uses these from voxpupuli and Tor uses custom ones in legacy/, so in these cases we deprecate the Tor ones in favor of voxpupuli's.
    • Rename hieradata to data
    • Rename profiles to site
  • (1.5) Move default configuration to a new profile::tails class and include it in all nodes

Converge in substance

Tails:

  • (1.6) Rename all profiles from tails::profile to profile::tails
  • (1.7) Ensure all exported resources' tags are prefixed with tails_
  • (1.8) Upgrade 3rd-party modules to match TPA versions

Tor:

  • (1.9) Install all 3rd-party modules that are used by Tails but not by Tor
  • (1.10) Isolate all exported resources and collectors using tags
  • (1.11) Move default configuration to a new profile::tpa class and include it in all nodes
  • (1.12) Enforce signed commits
  • (1.13) Ensure all private data is moved to Trocla and publish the repo (tpo/tpa/team#29387)
  • (1.14) Import the tails::profile::puppet::eyaml profile into TPA's profile::puppet::server
  • (1.15) Copy the EYAML keys from the Tails to the Tor puppet server, and adapt hiera.yaml to use them
  • (1.16) Upgrade 3rd-party modules to match Tails versions
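
Tasks (1.7) and (1.10) rely on tags to keep exported resources from crossing between fleets once both share one Puppet server. The following is a toy model in Python, not Puppet code: it only illustrates why a collector that filters on a tag prefix never picks up the other fleet's resources. The resource titles, the tag suffixes and the `tpa_` prefix are assumptions made for illustration; the proposal only names the `tails_` prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportedResource:
    title: str
    tag: str


def collect(resources, prefix):
    """Return the titles of exported resources whose tag starts with prefix."""
    return [r.title for r in resources if r.tag.startswith(prefix)]


# Hypothetical resources exported by nodes of both fleets.
exported = [
    ExportedResource("backup-job-a", "tails_backup"),
    ExportedResource("monitor-host-b", "tpa_monitoring"),
    ExportedResource("monitor-host-c", "tails_monitoring"),
]

tails_collected = collect(exported, "tails_")
tpa_collected = collect(exported, "tpa_")
```

A collector without any tag filter would return all three resources, which is the cross-fleet effect this phase is meant to prevent.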

When we say "upgrade", we don't mean upgrading to the latest upstream version of a module, but to the higher of the versions currently used by the two codebases, provided it also satisfies dependency requirements.

In other words, we don't "upgrade everything to latest"; we "upgrade to Tails" or "upgrade to TPA", depending on the module. That said, it's likely going to be "upgrade to Tails versions" everywhere, considering the Tails codebase is generally tidier.
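
The version rule above can be sketched as follows. This is a minimal illustration, assuming plain dotted numeric versions; the module names and version numbers are made up, and the sketch deliberately ignores the dependency requirements that the real upgrade must also satisfy.

```python
def parse_version(text):
    """Turn a dotted version such as '8.5.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def converge(tails, tpa):
    """For modules pinned in both codebases, pick the higher version.

    Returns {module: (target_version, side_that_must_upgrade)}, where the
    side is "tails", "tpa", or "none" when both already agree.
    """
    plan = {}
    for module in sorted(tails.keys() & tpa.keys()):
        t, p = tails[module], tpa[module]
        if parse_version(t) == parse_version(p):
            plan[module] = (t, "none")
        elif parse_version(t) > parse_version(p):
            plan[module] = (t, "tpa")    # "upgrade to Tails"
        else:
            plan[module] = (p, "tails")  # "upgrade to TPA"
    return plan


# Hypothetical pins, for illustration only.
tails_pins = {"stdlib": "9.4.1", "apt": "9.1.0", "concat": "7.4.0"}
tpa_pins = {"stdlib": "8.5.0", "apt": "9.1.0", "concat": "9.0.2"}

plan = converge(tails_pins, tpa_pins)
```

Comparing parsed tuples rather than raw strings matters here: as strings, "10.0.0" would sort below "9.4.1".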

Phase 2: Puppet server switch

This phase moves all nodes from one Puppet server to the other:

  • (2.1) Copy code (legacy modules and profiles) from Tails to Tor
  • (2.2) Create a flag that determines whether a node is Tails or TPA and which base class it should include, and assign nodes to their corresponding base class using the flag above
  • (2.3) Point Tails nodes to the Tor Puppet server
  • (2.4) Retire the Tails Puppet server
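
The selection logic behind step (2.2) could look roughly like the sketch below. It is written in Python purely to show the decision, not as the actual Puppet or ENC implementation; the flag name `fleet` and its values are hypothetical, while `profile::tails` and `profile::tpa` are the base classes introduced in (1.5) and (1.11).

```python
# Hypothetical flag values mapped to the base classes from (1.5) and (1.11).
BASE_CLASSES = {"tails": "profile::tails", "tpa": "profile::tpa"}


def base_class(fleet):
    """Map a node's fleet flag to the base class it should include.

    Unknown values raise an error so that no node silently ends up
    without a base class.
    """
    try:
        return BASE_CLASSES[fleet]
    except KeyError:
        raise ValueError(f"unknown fleet flag: {fleet!r}") from None
```

Failing loudly on an unknown flag is a design choice of this sketch: a node that falls through to neither base class would otherwise lose its default configuration unnoticed.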

Phase 3: Codebase homogeneity

This phase paves the way towards a cleaner future:

  • (3.1) Remove all tails::profile::puppet profiles
  • (3.2) Merge the 8 conflicting Tails and TPA profiles:
    • grub
    • limesurvey
    • mta
    • nginx
    • podman
    • rspamd
    • sudo
    • sysctl
  • (3.3) Move the remaining 114 non-conflicting Tails profiles to profile (without ::tails)

At this point, we'll have 244 profiles.

Next steps

From here on, there's a single code base on a single Puppet server, and nodes from both fleets (Tails and TPA) use the same environment.

The code base is not fully merged yet, however. A possible way forward to merge services is:

  • To "merge" a service, a class existing in one profile (say profile::prometheus from profile::tpa) is progressively added to all nodes on the other side, and eventually to the other profile (say profile::tails)

So while we don't have a detailed step-by-step plan to merge all services, the above should give us general guidelines to merge services on an as-needed basis and to make progress on the merge roadmap.

Costs

To estimate the cost of tasks in days of work, we use the same parameters as proposed in Jacob Kaplan-Moss' estimation technique.

"Complexity" estimates the size of a task in days, accounting for all other things a worker has to deal with during a normal workday:

| Complexity  | Time              |
|-------------|-------------------|
| small       | 1 day             |
| medium      | 3 days            |
| large       | 1 week (5 days)   |
| extra-large | 2 weeks (10 days) |

"Uncertainty" is a scale factor applied to the length to get a pessimistic estimate if things go wrong:

| Uncertainty Level | Multiplier |
|-------------------|------------|
| low               | 1.1        |
| moderate          | 1.5        |
| high              | 2.0        |
| extreme           | 5.0        |
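
As a worked example of how the two scales combine, the sketch below multiplies the complexity in days by the uncertainty multiplier to get the worst case. The dictionaries mirror the two tables above; the function name `estimate` and the rounding to one decimal are choices made for this illustration.

```python
COMPLEXITY_DAYS = {"small": 1, "medium": 3, "large": 5, "extra-large": 10}
UNCERTAINTY_MULTIPLIER = {"low": 1.1, "moderate": 1.5, "high": 2.0, "extreme": 5.0}


def estimate(complexity, uncertainty):
    """Return (expected_days, worst_case_days) for one task."""
    expected = COMPLEXITY_DAYS[complexity]
    # Round to one decimal to avoid floating-point noise such as 3.3000000000000003.
    worst = round(expected * UNCERTAINTY_MULTIPLIER[uncertainty], 1)
    return expected, worst


medium_low = estimate("medium", "low")
large_extreme = estimate("large", "extreme")
```

For instance, a medium task with low uncertainty is expected to take 3 days and 3.3 days in the worst case, while a large task with extreme uncertainty goes from 5 to 25 days.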

Per-task worst-case duration estimate

| Task | Codebase | Complexity | Uncertainty | Expected (days) | Worst case (days) |
|------|----------|------------|-------------|-----------------|-------------------|
| (1.1) Switch to g10k | Tails | small | high | 2 | 4 |
| (1.2) Remove ENC | Tails | small | low | 1 | 1.1 |
| (1.3) Move nodes to roles | Tails | medium | low | 3 | 3.3 |
| (1.4) Switch directory structure | Tails | small | moderate | 1 | 1.5 |
| (1.5) Create default profile | Tails | small | moderate | 1 | 1.5 |
| (1.6) Rename Tails profiles | Tails | small | low | 1 | 1.1 |
| (1.7) Prefix exported resources | Tails | medium | low | 3 | 3.3 |
| (1.8) Upgrade 3rd party modules | Tails | large | moderate | 5 | 7.5 |
| (1.9) Install missing 3rd party modules | Tor | small | low | 1 | 1.1 |
| (1.10) Prefix exported resources | Tor | medium | low | 3 | 3.3 |
| (1.11) Create default profile | Tor | small | moderate | 1 | 1.5 |
| (1.12) Enforce signed commits | Tor | medium | moderate | 3 | 4.5 |
| (1.13) Move private data to Trocla | Tor | large | moderate | 5 | 7.5 |
| (1.14) Publish repository | Tor | large | moderate | 5 | 7.5 |
| (1.15) Enable EYAML | Tor | small | low | 1 | 1.1 |
| (1.16) Upgrade 3rd party modules | Tor | extra-large | high | 10 | 20 |
| (2.1) Copy code | Tor | small | low | 1 | 1.1 |
| (2.2) Differentiate Tails and Tor nodes | Tor | small | moderate | 1 | 1.5 |
| (2.3) Switch Tails' nodes to Tor's Puppet server | Tor | large | extreme | 5 | 25 |
| (2.4) Retire the Tails Puppet server | Tor | small | low | 1 | 1.1 |
| (3.1) Ditch the Tails' Puppet profile | Tor | small | low | 1 | 1.1 |
| (3.2) Merge conflicting profiles | Tor | large | extreme | 5 | 25 |
| (3.3) Ditch the profile::tails namespace | Tor | small | low | 1 | 1.1 |

Per-phase worst-case time estimate

| Task | Worst case (days) | Worst case (weeks) |
|------|-------------------|--------------------|
| Phase 1: Codebase preparation | 69.8 | 17.45 |
| Phase 2: Puppet server switch | 28.7 | 7.2 |
| Phase 3: Codebase homogeneity | 27.2 | 6.8 |

Worst case duration: 125.7 days =~ 31.5 weeks
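
The per-phase figures can be reproduced by summing the per-task worst cases, as in the sketch below. The day values are copied from the per-task table. The conversion factor of 4 working days per week is not stated anywhere in this proposal; it is an assumption inferred from the published figures (for example 69.8 / 4 = 17.45).

```python
# Worst-case days per task, copied from the per-task table.
WORST_CASE_DAYS = {
    "Phase 1": [4, 1.1, 3.3, 1.5, 1.5, 1.1, 3.3, 7.5,
                1.1, 3.3, 1.5, 4.5, 7.5, 7.5, 1.1, 20],
    "Phase 2": [1.1, 1.5, 25, 1.1],
    "Phase 3": [1.1, 25, 1.1],
}

# Assumption inferred from the figures, not stated in the proposal.
DAYS_PER_WEEK = 4

# Round to one decimal to hide floating-point noise in the sums.
totals = {phase: round(sum(days), 1) for phase, days in WORST_CASE_DAYS.items()}
weeks = {phase: total / DAYS_PER_WEEK for phase, total in totals.items()}
grand_total = round(sum(totals.values()), 1)
```

Under that assumption the phase totals come out at 69.8, 28.7 and 27.2 days, for a total of 125.7 days; dividing the total by 4 gives 31.425 weeks, close to the approximate figure quoted above.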

Timeline

The following parallel activities will probably influence (i.e. delay) this plan:

  • Upgrade to Debian Trixie: maybe start in March, ideally finish by the end of 2025
  • Northern-hemisphere summer vacations

Based on the above estimates, taking into account the potential delays, and stretching it a bit for a worst-case scenario, here is a rough per-month timeline:

  • March (all Tails):
    • (1.1) Switch to g10k
    • (1.2) Remove ENC
    • (1.3) Move nodes to roles
    • (1.4) Switch directory structure
  • April:
    • (1.5) Create default profile
    • (1.6) Rename Tails profiles
    • (1.7) Prefix exported resources
    • (1.8) Upgrade 3rd party modules (Tails)
  • May:
    • (1.8) Upgrade 3rd party modules (Tails) (continuation)
    • (1.9) Install missing 3rd party modules (Tor)
    • (1.10) Prefix exported resources (Tor)
    • (1.11) Create default profile (Tor)
  • June (all Tor from now on):
    • (1.12) Enforce signed commits
    • (1.13) Move private data to Trocla
  • July:
    • (1.14) Publish repository
    • (1.15) Enable EYAML
    • (1.16) Upgrade 3rd party modules
  • August:
    • (1.16) Upgrade 3rd party modules (continuation)
  • September:
    • (2.1) Copy code
    • (2.2) Differentiate Tails and Tor nodes
    • (2.3) Switch Tails' nodes to Tor's Puppet server
  • October:
    • (2.3) Switch Tails' nodes to Tor's Puppet server (continuation)
  • November:
    • (2.4) Retire the Tails Puppet server
    • (3.1) Ditch the Tails' Puppet profile
  • December:
    • (3.2) Merge conflicting profiles
  • January:
    • (3.2) Merge conflicting profiles (continuation)
    • (3.3) Ditch the profile::tails namespace

Alternatives considered

  • Migrate services to TPA before moving Puppet: some of the Tails services heavily depend on others and/or on the network setup. For example, Jenkins Agents on different machines talk to a Jenkins Orchestrator and a Gitolite server hosted on different VMs, then build nightly ISOs that are copied to the web VM and published over HTTP. Migrating all of these over to TPA's infra would be much more complex than just merging Puppet.

References