propose trixie upgrade schedule (#41990) authored Mar 04, 2025 by anarcat
policy/tpa-rfc-80-debian-trixie-upgrade-schedule.md
View page @
b703d63d
...
@@ -3,11 +3,16 @@ title: TPA-RFC-80: Debian trixie upgrade schedule
...
@@ -3,11 +3,16 @@ title: TPA-RFC-80: Debian trixie upgrade schedule
costs: staff, 4+ weeks
costs: staff, 4+ weeks
approval: TPA, service admins
approval: TPA, service admins
affected users: TPA, service admins
affected users: TPA, service admins
deadline: TODO
deadline: 2 weeks, 2025-03-18
status: draft
status: proposed
discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41990
discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41990
---
---
Summary: start upgrading servers during the Debian "trixie" freeze. If
it goes well, complete most of the fleet upgrade around June 2025,
with full completion by the end of 2025, leaving 2026 entirely free of
major upgrades. Improve automation.
# Background
# Background
Debian 13 "trixie", currently "testing", is going into freeze soon, which
Debian 13 "trixie", currently "testing", is going into freeze soon, which
...
@@ -58,14 +63,15 @@ and proposal like this one would link against the upstream release
...
@@ -58,14 +63,15 @@ and proposal like this one would link against the upstream release
notes. Unfortunately, at the time of writing, upstream hasn't yet
notes. Unfortunately, at the time of writing, upstream hasn't yet
produced release notes (as we're still in testing).
produced release notes (as we're still in testing).
TODO: well the above sounds bad. maybe we shouldn't upgrade during
freeze after all?
We're hoping the procedure will be fine-tuned by the time we're ready
to coordinate the second batch of updates, around May 2025, when we
will send reminders to affected teams.
## Upgrade schedule
## Upgrade schedule
The upgrade is split into multiple batches:
The upgrade is split into multiple batches:
- installer changes: TODO
- automation and installer changes
- low complexity: mostly TPA services and less critical Tails servers
- low complexity: mostly TPA services and less critical Tails servers
...
@@ -76,7 +82,7 @@ The upgrade is split in multiple batches:
...
@@ -76,7 +82,7 @@ The upgrade is split in multiple batches:
- high complexity: Tails VMs running services not from the official
  Debian repositories
- high complexity: Tails VMs running services not from the official
  Debian repositories
- cleanup: TODO
- cleanup
The free time between the first two batches will also allow us to
The free time between the first two batches will also allow us to
cover for unplanned contingencies: upgrades that could drag on and
cover for unplanned contingencies: upgrades that could drag on and
...
@@ -87,6 +93,21 @@ that should be "fun" for the team. This policy has proven to be
...
@@ -87,6 +93,21 @@ that should be "fun" for the team. This policy has proven to be
effective in the previous upgrades and we are eager to repeat it
effective in the previous upgrades and we are eager to repeat it
again.
again.
### Upgrade automation and installer changes
First, we tweak the installers to deploy trixie by default, to avoid
installing further "old" systems. This includes the bare-metal
installers but also, and especially, the virtual machine installers and
container images.
We also want to work on automating the upgrade procedure
further. In particular, we've had catastrophic errors in the PostgreSQL
upgrade procedure in the past, but the whole procedure is now
considered ripe for automation; see [tpo/tpa/team#41485][] for
details.
[tpo/tpa/team#41485]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41485
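As one illustration of what such automation might check, the sketch below tests whether a host already reports the target release. It assumes the standard `/etc/os-release` file and its `VERSION_CODENAME` field; the function names `parse_os_release` and `runs_release` are hypothetical and are not part of the existing TPA tooling or of the procedure tracked in the issue above.

```python
def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release style KEY=value lines into a dictionary."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def runs_release(os_release_text: str, codename: str = "trixie") -> bool:
    """Return True if the os-release text reports the given codename."""
    fields = parse_os_release(os_release_text)
    return fields.get("VERSION_CODENAME") == codename
```

On a host, this could be called as `runs_release(open("/etc/os-release").read())`. A freshly installed machine that returns `False` would suggest an installer still deploying an "old" system.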
### Batch 1: low complexity, April-May 2025
### Batch 1: low complexity, April-May 2025
This is actually scheduled in two weeks: TPA boxes will be upgraded in
This is actually scheduled in two weeks: TPA boxes will be upgraded in
...
@@ -158,7 +179,9 @@ this work, in a single week.
...
@@ -158,7 +179,9 @@ this work, in a single week.
[first batch of bookworm machines]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41251
[first batch of bookworm machines]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41251
Feedback and coordination of this batch happen in [issue batch 1 TODO]().
Feedback and coordination of this batch happen in [issue batch 1][].
[issue batch 1]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42071
### Batch 2: moderate complexity, May-June 2025
### Batch 2: moderate complexity, May-June 2025
...
@@ -241,7 +264,9 @@ will likely take us 60 hours (or two weeks) to complete the upgrade.
...
@@ -241,7 +264,9 @@ will likely take us 60 hours (or two weeks) to complete the upgrade.
[second batch of bookworm upgrades]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41252
[second batch of bookworm upgrades]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41252
Feedback and coordination of this batch happen in [issue batch 2 TODO]().
Feedback and coordination of this batch happen in [issue batch 2][].
[issue batch 2]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42070
### Batch 3: high complexity, 2025 Q3-Q4
### Batch 3: high complexity, 2025 Q3-Q4
...
@@ -257,21 +282,21 @@ eventually be made part of the second batch.
...
@@ -257,21 +282,21 @@ eventually be made part of the second batch.
15 TPA machines:
15 TPA machines:
```
```
alberti.torproject.org
- [ ] alberti.torproject.org
dal-node-01.torproject.org
- [ ] dal-node-01.torproject.org
dal-node-02.torproject.org
- [ ] dal-node-02.torproject.org
dal-node-03.torproject.org
- [ ] dal-node-03.torproject.org
fsn-node-01.torproject.org
- [ ] fsn-node-01.torproject.org
fsn-node-02.torproject.org
- [ ] fsn-node-02.torproject.org
fsn-node-03.torproject.org
- [ ] fsn-node-03.torproject.org
fsn-node-04.torproject.org
- [ ] fsn-node-04.torproject.org
fsn-node-05.torproject.org
- [ ] fsn-node-05.torproject.org
fsn-node-06.torproject.org
- [ ] fsn-node-06.torproject.org
fsn-node-07.torproject.org
- [ ] fsn-node-07.torproject.org
fsn-node-08.torproject.org
- [ ] fsn-node-08.torproject.org
nevii.torproject.org
- [ ] nevii.torproject.org
pauli.torproject.org
- [ ] pauli.torproject.org
puppetdb-01.torproject.org
- [ ] puppetdb-01.torproject.org
```
```
It seems like the [bookworm Ganeti upgrade][] took roughly 10h of
It seems like the [bookworm Ganeti upgrade][] took roughly 10h of
...
@@ -281,17 +306,17 @@ possibly 20h.
...
@@ -281,17 +306,17 @@ possibly 20h.
11 Tails machines:
11 Tails machines:
```
```
isoworker1.dragon
- [ ] isoworker1.dragon
isoworker2.dragon
- [ ] isoworker2.dragon
isoworker3.dragon
- [ ] isoworker3.dragon
isoworker4.dragon
- [ ] isoworker4.dragon
isoworker5.dragon
- [ ] isoworker5.dragon
isoworker6.iguana
- [ ] isoworker6.iguana
isoworker7.iguana
- [ ] isoworker7.iguana
isoworker8.iguana
- [ ] isoworker8.iguana
jenkins.dragon
- [ ] jenkins.dragon
survey.lizard
- [ ] survey.lizard
translate.lizard
- [ ] translate.lizard
```
```
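The change above turns plain host lists into task-list items (`- [ ] host`). A minimal sketch of that conversion, and of reading back which hosts are still pending, could look like the following. The helper names `to_checklist` and `remaining` are hypothetical and only illustrate the format; they are not part of the RFC or of any TPA tooling.

```python
def to_checklist(hosts: list[str]) -> str:
    """Render host names as unchecked Markdown task-list items."""
    lines = [f"- [ ] {host.strip()}" for host in hosts if host.strip()]
    return "\n".join(lines)


def remaining(checklist: str) -> list[str]:
    """Return the hosts whose task-list box is still unchecked."""
    prefix = "- [ ] "
    pending = []
    for line in checklist.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            pending.append(line[len(prefix):])
    return pending
```

For example, `to_checklist(["jenkins.dragon", "survey.lizard"])` yields two unchecked items. Once an item is ticked as `- [x] jenkins.dragon`, `remaining` no longer reports it.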
[bookworm Ganeti upgrade]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41254
[bookworm Ganeti upgrade]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41254
...
@@ -299,11 +324,20 @@ translate.lizard
...
@@ -299,11 +324,20 @@ translate.lizard
The challenge with Tails upgrades is the coordination with the Tails
The challenge with Tails upgrades is the coordination with the Tails
team, in particular for the Jenkins upgrades.
team, in particular for the Jenkins upgrades.
Feedback and coordination of this batch happen in [issue batch 3 TODO]().
Feedback and coordination of this batch happen in [issue batch 3][].
[issue batch 3]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42069
## Upgrade automation
TODO: document we want to start automating upgrades more
### Cleanup work
Once the upgrade is completed and the entire fleet is again running a
single OS, it's time for cleanup. This involves updating configuration
files to the new versions and removing old compatibility code in
Puppet, removing old container images, and generally wrapping things
up.
This process has been historically neglected, but we're hoping to wrap
this up, worst case in 2026.
# Alternatives considered
# Alternatives considered
...
...
...
...