| ... | ... | @@ -3,11 +3,16 @@ title: TPA-RFC-80: Debian trixie upgrade schedule |
|
|
|
costs: staff, 4+ weeks
|
|
|
|
approval: TPA, service admins
|
|
|
|
affected users: TPA, service admins
|
|
|
|
deadline: TODO
|
|
|
|
status: draft
|
|
|
|
deadline: 2 weeks, 2025-03-18
|
|
|
|
status: proposed
|
|
|
|
discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41990
|
|
|
|
---
|
|
|
|
|
|
|
|
Summary: start upgrading servers during the Debian "trixie" freeze; if
|
|
|
|
it goes well, complete most of the fleet upgrade around June 2025,
|
|
|
|
with full completion by the end of 2025, leaving 2026 free of
|
|
|
|
major upgrades entirely. Improve automation.
|
|
|
|
|
|
|
|
# Background
|
|
|
|
|
|
|
|
Debian 13 "trixie", currently "testing", is going into freeze soon, which
|
| ... | ... | @@ -58,14 +63,15 @@ and proposal like this one would link against the upstream release |
|
|
|
notes. Unfortunately, at the time of writing, upstream hasn't yet
|
|
|
|
produced release notes (as trixie is still in testing).
|
|
|
|
|
|
|
|
TODO: well the above sounds bad. maybe we shouldn't upgrade during
|
|
|
|
freeze after all?
|
|
|
|
We're hoping the procedure will be fine-tuned by the time we're ready
|
|
|
|
to coordinate the second batch of updates, around May 2025, when we
|
|
|
|
will send reminders to affected teams.
|
|
|
|
|
|
|
|
## Upgrade schedule
|
|
|
|
|
|
|
|
The upgrade is split into multiple batches:
|
|
|
|
|
|
|
|
- installer changes: TODO
|
|
|
|
- automation and installer changes
|
|
|
|
|
|
|
|
- low complexity: mostly TPA services and less critical Tails servers
|
|
|
|
|
| ... | ... | @@ -76,7 +82,7 @@ The upgrade is split in multiple batches: |
|
|
|
- high complexity: Tails VMs running services not from the official
|
|
|
|
Debian repositories
|
|
|
|
|
|
|
|
- cleanup: TODO
|
|
|
|
- cleanup
|
|
|
|
|
|
|
|
The free time between the first two batches will also allow us to
|
|
|
|
cover for unplanned contingencies: upgrades that could drag on and
|
| ... | ... | @@ -87,6 +93,21 @@ that should be "fun" for the team. This policy has proven to be |
|
|
|
effective in the previous upgrades and we are eager to repeat it.
|
|
|
|
|
|
|
|
### Upgrade automation and installer changes
|
|
|
|
|
|
|
|
First, we tweak the installers to deploy trixie by default to avoid
|
|
|
|
installing further "old" systems. This includes the bare-metal
|
|
|
|
installers but also, and especially, the virtual machine installers and
|
|
|
|
container images.
|
|
|
|
|
|
|
|
We also want to work on automating the upgrade procedure
|
|
|
|
further. We've had catastrophic errors in the PostgreSQL upgrade
|
|
|
|
procedure in the past, in particular, but the whole procedure is now
|
|
|
|
considered ripe for automation; see [tpo/tpa/team#41485][] for
|
|
|
|
details.
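
As an illustration of the kind of step that lends itself to automation,
here is a minimal sketch of retargeting APT sources from one suite to
the next. It is not the TPA procedure: the function name
`retarget_suite` and the sample sources are hypothetical, and the
sketch only assumes that the upgrade involves switching suite names
from "bookworm" to "trixie" in APT configuration.

```python
import re


def retarget_suite(sources_text: str, old: str = "bookworm", new: str = "trixie") -> str:
    """Return sources_text with the old suite name replaced by the new one.

    Hypothetical helper for illustration only. Comment lines are left
    untouched; suffixed suites such as "bookworm-security" are rewritten too.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    out = []
    for line in sources_text.splitlines():
        if line.lstrip().startswith("#"):
            # keep comments as they are
            out.append(line)
        else:
            out.append(pattern.sub(new, line))
    return "\n".join(out)


# Sample input, made up for this example.
EXAMPLE = """\
# main archive
deb http://deb.debian.org/debian bookworm main
deb http://deb.debian.org/debian bookworm-updates main
deb http://security.debian.org/debian-security bookworm-security main
"""

print(retarget_suite(EXAMPLE))
```

Keeping the rewrite as a pure function on text makes it easy to test
before it is allowed to touch files on a real host.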
|
|
|
|
|
|
|
|
[tpo/tpa/team#41485]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41485
|
|
|
|
|
|
|
|
### Batch 1: low complexity, April-May 2025
|
|
|
|
|
|
|
|
This is actually scheduled in two weeks: TPA boxes will be upgraded in
|
| ... | ... | @@ -158,7 +179,9 @@ this work, in a single week. |
|
|
|
|
|
|
|
[first batch of bookworm machines]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41251
|
|
|
|
|
|
|
|
Feedback and coordination of this batch happens in [issue batch 1 TODO]().
|
|
|
|
Feedback and coordination of this batch happens in [issue batch 1][].
|
|
|
|
|
|
|
|
[issue batch 1]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42071
|
|
|
|
|
|
|
|
### Batch 2: moderate complexity, May-June 2025
|
|
|
|
|
| ... | ... | @@ -241,7 +264,9 @@ will likely take us 60 hours (or two weeks) to complete the upgrade. |
|
|
|
|
|
|
|
[second batch of bookworm upgrades]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41252
|
|
|
|
|
|
|
|
Feedback and coordination of this batch happens in [issue batch 2 TODO]().
|
|
|
|
Feedback and coordination of this batch happens in [issue batch 2][].
|
|
|
|
|
|
|
|
[issue batch 2]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42070
|
|
|
|
|
|
|
|
### Batch 3: high complexity, 2025 Q3-Q4
|
|
|
|
|
| ... | ... | @@ -257,21 +282,21 @@ eventually be made part of the second batch. |
|
|
|
15 TPA machines:
|
|
|
|
|
|
|
|
```
|
|
|
|
alberti.torproject.org
|
|
|
|
dal-node-01.torproject.org
|
|
|
|
dal-node-02.torproject.org
|
|
|
|
dal-node-03.torproject.org
|
|
|
|
fsn-node-01.torproject.org
|
|
|
|
fsn-node-02.torproject.org
|
|
|
|
fsn-node-03.torproject.org
|
|
|
|
fsn-node-04.torproject.org
|
|
|
|
fsn-node-05.torproject.org
|
|
|
|
fsn-node-06.torproject.org
|
|
|
|
fsn-node-07.torproject.org
|
|
|
|
fsn-node-08.torproject.org
|
|
|
|
nevii.torproject.org
|
|
|
|
pauli.torproject.org
|
|
|
|
puppetdb-01.torproject.org
|
|
|
|
- [ ] alberti.torproject.org
|
|
|
|
- [ ] dal-node-01.torproject.org
|
|
|
|
- [ ] dal-node-02.torproject.org
|
|
|
|
- [ ] dal-node-03.torproject.org
|
|
|
|
- [ ] fsn-node-01.torproject.org
|
|
|
|
- [ ] fsn-node-02.torproject.org
|
|
|
|
- [ ] fsn-node-03.torproject.org
|
|
|
|
- [ ] fsn-node-04.torproject.org
|
|
|
|
- [ ] fsn-node-05.torproject.org
|
|
|
|
- [ ] fsn-node-06.torproject.org
|
|
|
|
- [ ] fsn-node-07.torproject.org
|
|
|
|
- [ ] fsn-node-08.torproject.org
|
|
|
|
- [ ] nevii.torproject.org
|
|
|
|
- [ ] pauli.torproject.org
|
|
|
|
- [ ] puppetdb-01.torproject.org
|
|
|
|
```
|
|
|
|
|
|
|
|
It seems like the [bookworm Ganeti upgrade][] took roughly 10h of
|
| ... | ... | @@ -281,17 +306,17 @@ possibly 20h. |
|
|
|
11 Tails machines:
|
|
|
|
|
|
|
|
```
|
|
|
|
isoworker1.dragon
|
|
|
|
isoworker2.dragon
|
|
|
|
isoworker3.dragon
|
|
|
|
isoworker4.dragon
|
|
|
|
isoworker5.dragon
|
|
|
|
isoworker6.iguana
|
|
|
|
isoworker7.iguana
|
|
|
|
isoworker8.iguana
|
|
|
|
jenkins.dragon
|
|
|
|
survey.lizard
|
|
|
|
translate.lizard
|
|
|
|
- [ ] isoworker1.dragon
|
|
|
|
- [ ] isoworker2.dragon
|
|
|
|
- [ ] isoworker3.dragon
|
|
|
|
- [ ] isoworker4.dragon
|
|
|
|
- [ ] isoworker5.dragon
|
|
|
|
- [ ] isoworker6.iguana
|
|
|
|
- [ ] isoworker7.iguana
|
|
|
|
- [ ] isoworker8.iguana
|
|
|
|
- [ ] jenkins.dragon
|
|
|
|
- [ ] survey.lizard
|
|
|
|
- [ ] translate.lizard
|
|
|
|
```
|
|
|
|
|
|
|
|
[bookworm Ganeti upgrade]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41254
|
| ... | ... | @@ -299,11 +324,20 @@ translate.lizard |
|
|
|
The challenge with Tails upgrades is the coordination with the Tails
|
|
|
|
team, in particular for the Jenkins upgrades.
|
|
|
|
|
|
|
|
Feedback and coordination of this batch happens in [issue batch 3 TODO]().
|
|
|
|
Feedback and coordination of this batch happens in [issue batch 3][].
|
|
|
|
|
|
|
|
[issue batch 3]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42069
|
|
|
|
|
|
|
|
### Cleanup work
|
|
|
|
|
|
|
|
## Upgrade automation
|
|
|
|
Once the upgrade is completed and the entire fleet is again running a
|
|
|
|
single OS, it's time for cleanup. This involves updating configuration
|
|
|
|
files to the new versions and removing old compatibility code in
|
|
|
|
Puppet, removing old container images, and generally wrapping things
|
|
|
|
up.
|
|
|
|
|
|
|
|
TODO: document we want to start automating upgrades more
|
|
|
|
This process has been historically neglected, but we're hoping to wrap
|
|
|
|
this up, worst case in 2026.
|
|
|
|
|
|
|
|
# Alternatives considered
|
|
|
|
|