
title: Tails and TPA Puppet codebase merge
deadline: 2025-02-26
status: proposed

Background
Proposal
Costs
- Per-task worst-case duration estimate
- Per-phase worst-case time estimate
Timeline
Alternatives considered
References

Background

TPA-RFC-73 identified Puppet as a bottleneck for the merge between TPA and Tails infrastructure, as it blocks keeping, migrating and merging several other services. Merging codebases and ditching one of the Puppet servers is a complex move, so in this document we detail how that will be done.

Proposal

Goals

Must have

One Puppet Server to rule them all
Adoption of TPA's solution for handling Puppet modules and ENC
Convergence in Puppet modules versions
Commit signing (as it's fundamental for Tails' current backup solution)

Non-goals

This proposal is not about:

Completely refactoring and deduplicating code, as that will be done step-by-step while we handle each service individually after the Puppet Server merge
Ditching one way to store secrets in favor of another, as that will be done separately in the future, after both teams have had the chance to experience Trocla and hiera-eyaml
Tackling individual service merges, such as backups, dns, monitoring and firewall; these will be tackled individually once all infra is under one Puppet Server
Applying new code standards everywhere; at most, we'll come up with general guidelines that could (maybe should) be used for new code and, in the future, for refactoring

Phase 1: Codebase preparation

This phase ensures that, once Tails code is copied to Tor's Puppet Control repo:

Code structure will match and be coherent
Tails code will not affect Tor's infra and Tor's code will not affect Tails infra

Note: Make sure to freeze all Puppet code refactoring on both sides before starting.

Converge in structure

Tails:

(1.1) Switch from Git submodules to using g10k (#41974)
(1.2) Remove ENC configuration: Tails doesn't really use it, and the Puppet server switch will implement Tor's instead
(1.3) Move node definitions under manifests/nodes.pp to roles
(1.4) Switch to the directory structure used by Tor:
- Move custom non-profile modules (bitcoind, borgbackup, etckeeper, gitolite, rbac, reprepro, rss2email, tails, tirewall and yapgp) to legacy/. Note: there are no naming conflicts in this case.
- Make sure to leave only 3rd party modules under modules/. There are 2 naming conflicts here (unbound and network): Tails uses these from voxpupuli and Tor uses custom ones in legacy/, so in these cases we deprecate the Tor ones in favor of voxpupuli's.
- Rename hieradata to data
- Rename profiles to site
(1.5) Move default configuration to a new profile::tails class and include it in all nodes

Converge in substance

Tails:

(1.6) Rename all profiles from tails::profile to profile::tails
(1.7) Ensure all exported resources' tags are prefixed with tails_
(1.8) Upgrade 3rd-party modules to match TPA versions
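
Step (1.6) above is largely a mechanical rename of class references. A minimal sketch of the text substitution, assuming class names appear literally in manifests and Hiera data; the function name `rename_profiles` is hypothetical, and moving the files to their new directories is not covered here:

```python
import re

# Match the old namespace only where it starts a class name, so that
# "foo::tails::profile" or "tails::profiles" are left untouched.
OLD_NAMESPACE = re.compile(r"(?<![\w:])tails::profile\b")


def rename_profiles(text: str) -> str:
    """Rewrite tails::profile[::x] references to profile::tails[::x]."""
    return OLD_NAMESPACE.sub("profile::tails", text)


if __name__ == "__main__":
    print(rename_profiles("include tails::profile::mta"))
```

References written with a leading `::` (top-scope lookups) are deliberately not matched by this sketch and would need separate handling.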

Tor:

(1.9) Install all 3rd-party modules that are used by Tails but not by Tor
(1.10) Isolate all exported resources and collectors using tags
(1.11) Move default configuration to a new profile::tpa class and include it in all nodes
(1.12) Enforce signed commits
(1.13) Ensure all private data is moved to Trocla and publish the repo (tpo/tpa/team#29387)
(1.14) Import the tails::profile::puppet::eyaml profile into TPA's profile::puppet::server
(1.15) Copy the EYAML keys from the Tails to the Tor puppet server, and adapt hiera.yaml to use them
(1.16) Upgrade 3rd-party modules to match Tails versions

When we say "upgrade", we don't mean upgrading to the latest upstream version of a module, but to the higher of the two versions currently used across the two codebases, provided it also satisfies dependency requirements.

In other words, we don't "upgrade everything to latest"; we "upgrade to Tails" or "upgrade to TPA", depending on the module. That said, it's likely going to be "upgrade to Tails versions" everywhere, considering the Tails codebase is generally tidier.

Phase 2: Puppet server switch

This phase moves all nodes from one Puppet server to the other:

(2.1) Copy code (legacy modules and profiles) from Tails to Tor
(2.2) Create a flag that determines whether a node is Tails or TPA and which base class it should include, and assign nodes to their corresponding base class using the flag above
(2.3) Point Tails nodes to the Tor Puppet server
(2.4) Retire the Tails Puppet server
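
The flag in step (2.2) could take many forms; this proposal does not specify one. As one possible illustration, Puppet ENCs are external programs that receive a node name and print a YAML document with `classes` and `parameters` keys, so the selection logic can be sketched in that style. This is not TPA's actual ENC: the node names, the `is_tails` parameter and the `TAILS_NODES` inventory are all hypothetical.

```python
import json

# Hypothetical inventory. In practice the flag would come from wherever
# node metadata already lives (ENC data, Hiera, a fact, ...).
TAILS_NODES = {"builder1.tails.example", "www.tails.example"}


def classify(certname: str) -> dict:
    """Pick the base class for a node from a Tails/TPA flag."""
    is_tails = certname in TAILS_NODES
    base_class = "profile::tails" if is_tails else "profile::tpa"
    return {
        "classes": [base_class],
        "parameters": {"is_tails": is_tails},
    }


def render(certname: str) -> str:
    """Serialize the classification; JSON is used because it is also
    valid YAML and needs no third-party library."""
    return json.dumps(classify(certname), indent=2)
```

Calling `render("www.tails.example")` yields a document whose only class is `profile::tails`; any node not in the inventory falls back to `profile::tpa`.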

Phase 3: Codebase homogeneity

This phase paves the way towards a cleaner future:

(3.1) Remove all tails::profile::puppet profiles
(3.2) Merge the 8 conflicting Tails and TPA profiles:
- grub
- limesurvey
- mta
- nginx
- podman
- rspamd
- sudo
- sysctl
(3.3) Move the remaining 114 non-conflicting Tails profiles to profile (without ::tails)

At this point, we'll have 244 profiles.
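
The split between steps (3.2) and (3.3) is a set operation on profile names: profiles present on both sides conflict and must be merged, while the rest of the Tails profiles can simply be moved. A minimal sketch, where the sample names are illustrative only and the real inputs would come from listing the profile directories of each codebase:

```python
def split_profiles(tails: set[str], tpa: set[str]) -> tuple[list[str], list[str]]:
    """Return (conflicting, tails_only) profile names, both sorted."""
    conflicting = sorted(tails & tpa)   # present in both: merge by hand (3.2)
    tails_only = sorted(tails - tpa)    # Tails only: move to profile:: (3.3)
    return conflicting, tails_only


if __name__ == "__main__":
    # Illustrative subsets, not the real profile lists.
    tails_profiles = {"grub", "mta", "sudo", "jenkins"}
    tpa_profiles = {"grub", "mta", "sudo", "prometheus"}
    conflicting, tails_only = split_profiles(tails_profiles, tpa_profiles)
    print("merge:", conflicting)
    print("move:", tails_only)
```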

Next steps

From here on, there's a single code base on a single Puppet server, and nodes from both fleets (Tails and TPA) use the same environment.

The code base is not, however, fully merged just yet, of course. A possible way forward to merge services might be like this:

To "merge" a service, a class existing in one profile (say profile::prometheus from profile::tpa) is progressively added to all nodes on the other side, and eventually to the other profile (say profile::tails)

So while we don't have a detailed step-by-step plan to merge all services, the above should give us general guidelines to merge services on an as-needed basis and make progress on the merge roadmap.

Costs

To estimate the cost of tasks in days of work, we use the same parameters as proposed in Jacob Kaplan-Moss' estimation technique.

"Complexity" estimates the size of a task in days, accounting for all other things a worker has to deal with during a normal workday:

Complexity	Time
small	1 day
medium	3 days
large	1 week (5 days)
extra-large	2 weeks (10 days)

"Uncertainty" is a scale factor applied to the length to get a pessimistic estimate if things go wrong:

Uncertainty Level	Multiplier
low	1.1
moderate	1.5
high	2.0
extreme	5.0
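
The two tables above combine by simple multiplication: the worst case for a task is its complexity in days times its uncertainty multiplier. A small sketch of that arithmetic, using the values from the tables; the function name `estimate` is only illustrative:

```python
# Days of work per complexity level, from the "Complexity" table.
COMPLEXITY_DAYS = {"small": 1, "medium": 3, "large": 5, "extra-large": 10}

# Scale factor per uncertainty level, from the "Uncertainty" table.
UNCERTAINTY_MULTIPLIER = {"low": 1.1, "moderate": 1.5, "high": 2.0, "extreme": 5.0}


def estimate(complexity: str, uncertainty: str) -> tuple[int, float]:
    """Return (expected_days, worst_case_days) for one task."""
    expected = COMPLEXITY_DAYS[complexity]
    worst = round(expected * UNCERTAINTY_MULTIPLIER[uncertainty], 2)
    return expected, worst


if __name__ == "__main__":
    print(estimate("medium", "low"))     # (3, 3.3)
    print(estimate("large", "extreme"))  # (5, 25.0)
```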

Per-task worst-case duration estimate

Task	Codebase	Complexity	Uncertainty	Expected (days)	Worst case (days)
(1.1) Switch to g10k	Tails	small	high	2	4
(1.2) Remove ENC	Tails	small	low	1	1.1
(1.3) Move nodes to roles	Tails	medium	low	3	3.3
(1.4) Switch directory structure	Tails	small	moderate	1	1.5
(1.5) Create default profile	Tails	small	moderate	1	1.5
(1.6) Rename Tails profiles	Tails	small	low	1	1.1
(1.7) Prefix exported resources	Tails	medium	low	3	3.3
(1.8) Upgrade 3rd party modules	Tails	large	moderate	5	7.5
(1.9) Install missing 3rd party modules	Tor	small	low	1	1.1
(1.10) Prefix exported resources	Tor	medium	low	3	3.3
(1.11) Create default profile	Tor	small	moderate	1	1.5
(1.12) Enforce signed commits	Tor	medium	moderate	3	4.5
(1.13) Move private data to Trocla	Tor	large	moderate	5	7.5
(1.14) Publish repository	Tor	large	moderate	5	7.5
(1.15) Enable EYAML	Tor	small	low	1	1.1
(1.16) Upgrade 3rd party modules	Tor	extra-large	high	10	20
(2.1) Copy code	Tor	small	low	1	1.1
(2.2) Differentiate Tails and Tor nodes	Tor	small	moderate	1	1.5
(2.3) Switch Tails' nodes to Tor's Puppet server	Tor	large	extreme	5	25
(2.4) Retire the Tails Puppet server	Tor	small	low	1	1.1
(3.1) Ditch the Tails' Puppet profile	Tor	small	low	1	1.1
(3.2) Merge conflicting profiles	Tor	large	extreme	5	25
(3.3) Ditch the `profile::tails` namespace	Tor	small	low	1	1.1

Per-phase worst-case time estimate

Task	Worst case (days)	Worst case (weeks)
Phase 1: Codebase preparation	69.8	17.45
Phase 2: Puppet server switch	28.7	7.2
Phase 3: Codebase homogeneity	27.2	6.8

Worst case duration: 125.7 days =~ 31.5 weeks
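
The per-phase figures can be reproduced by summing the worst-case column of the per-task table by phase. The sketch below does that. Note that the weeks column matches a division by 4 working days per week (69.8 / 4 = 17.45); that divisor is inferred from the numbers, is not stated in the text, and differs from the 5-day week used in the complexity table.

```python
from collections import defaultdict

# (task id, worst-case days), copied from the per-task table.
WORST_CASE_DAYS = [
    ("1.1", 4), ("1.2", 1.1), ("1.3", 3.3), ("1.4", 1.5),
    ("1.5", 1.5), ("1.6", 1.1), ("1.7", 3.3), ("1.8", 7.5),
    ("1.9", 1.1), ("1.10", 3.3), ("1.11", 1.5), ("1.12", 4.5),
    ("1.13", 7.5), ("1.14", 7.5), ("1.15", 1.1), ("1.16", 20),
    ("2.1", 1.1), ("2.2", 1.5), ("2.3", 25), ("2.4", 1.1),
    ("3.1", 1.1), ("3.2", 25), ("3.3", 1.1),
]

# Assumption: inferred from the table (69.8 days -> 17.45 weeks).
DAYS_PER_WEEK = 4


def phase_totals(tasks):
    """Sum worst-case days per phase (the part of the id before the dot)."""
    totals = defaultdict(float)
    for task_id, days in tasks:
        totals[task_id.split(".")[0]] += days
    return {phase: round(days, 1) for phase, days in totals.items()}


def to_weeks(days: float) -> float:
    return round(days / DAYS_PER_WEEK, 2)


if __name__ == "__main__":
    totals = phase_totals(WORST_CASE_DAYS)
    for phase, days in totals.items():
        print(f"Phase {phase}: {days} days, {to_weeks(days)} weeks")
    print("Total days:", round(sum(totals.values()), 1))
```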

Timeline

The following parallel activities will probably influence (i.e. delay) this plan:

Upgrade to Debian Trixie: may start in March, ideally finishing by the end of 2025
North-hemisphere summer vacations

Based on the above estimates, taking into account the potential delays, and stretching it a bit for a worst-case scenario, here is a rough per-month timeline:

March (all Tails):
- (1.1) Switch to g10k
- (1.2) Remove ENC
- (1.3) Move nodes to roles
- (1.4) Switch directory structure
April:
- (1.5) Create default profile
- (1.6) Rename Tails profiles
- (1.7) Prefix exported resources
- (1.8) Upgrade 3rd party modules (Tails)
May:
- (1.8) Upgrade 3rd party modules (Tails) (continuation)
- (1.9) Install missing 3rd party modules (Tor)
- (1.10) Prefix exported resources (Tor)
- (1.11) Create default profile (Tor)
June (all Tor from now on):
- (1.12) Enforce signed commits
- (1.13) Move private data to Trocla
July:
- (1.14) Publish repository
- (1.15) Enable EYAML
- (1.16) Upgrade 3rd party modules
August:
- (1.16) Upgrade 3rd party modules (continuation)
September:
- (2.1) Copy code
- (2.2) Differentiate Tails and Tor nodes
- (2.3) Switch Tails' nodes to Tor's Puppet server
October:
- (2.3) Switch Tails' nodes to Tor's Puppet server (continuation)
November:
- (2.4) Retire the Tails Puppet server
- (3.1) Ditch the Tails' Puppet profile
December:
- (3.2) Merge conflicting profiles
January:
- (3.2) Merge conflicting profiles (continuation)
- (3.3) Ditch the profile::tails namespace

Alternatives considered

Migrate services to TPA before moving Puppet: some of the Tails services heavily depend on others and/or on the network setup. For example, Jenkins Agents on different machines talk to a Jenkins Orchestrator and a Gitolite server hosted on different VMs, then build nightly ISOs that are copied to the web VM and published over HTTP. Migrating all of these over to TPA's infra would be much more complex than just merging Puppet.

References