---
title: TPA-RFC-80: Debian trixie upgrade schedule
costs: staff, 4+ weeks
approval: TPA, service admins
affected users: TPA, service admins
deadline: TODO
status: draft
discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41990
---

# Background

Debian 13 "trixie", currently "testing", is going into freeze soon, which
means we should have a new Debian stable release in 2025. It has been
a long-standing tradition at TPA to collaborate in the Debian
development process, and part of that process is to upgrade our servers
during the freeze. Upgrading during the freeze makes it easier for us
to fix bugs as we find them and contribute the fixes back to the community.

The [freeze dates announced by the debian.org release team][] are:

- 2025-03-15 - Milestone 1 - Transition and toolchain freeze
- 2025-04-15 - Milestone 2 - Soft Freeze
- 2025-05-15 - Milestone 3 - Hard Freeze - for key packages and
  packages without autopkgtests
- To be announced - Milestone 4 - Full Freeze

Even though we've just completed the Debian 11 ("bullseye") and 12
("bookworm") upgrades in late 2024, we feel it's a good idea to start
*and* complete the trixie upgrades in 2025. That way, we can hope to
have a year or two (2026-2027?) *without* any major upgrades.

This proposal is part of the [Debian 13 trixie upgrade milestone][],
itself part of the [2025 TPA roadmap][].

[Debian 13 trixie upgrade milestone]: https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/12
[2025 TPA roadmap]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/roadmap/2025
[freeze dates announced by the debian.org release team]: https://lists.debian.org/debian-devel-announce/2025/01/msg00004.html

# Proposal

As usual, we will perform the upgrades in three batches, in increasing
order of complexity, starting in 2025Q2 and hoping to finish by the end
of 2025.

Note that this year, the proposal also covers upgrading the Tails
infrastructure. To help merge rotations between the two teams, TPA
staff will upgrade Tails machines with the assistance of Tails folks,
and vice versa.

## Affected users

All service admins are affected by this change. If you have shell
access on any TPA server, you want to read this announcement.

In the past, TPA has typically kept a page detailing notable changes,
and proposals like this one would link to the upstream release
notes. Unfortunately, at the time of writing, upstream hasn't yet
produced release notes (as trixie is still in testing).

TODO: well the above sounds bad. maybe we shouldn't upgrade during
freeze after all?

## Upgrade schedule

The upgrade is split into multiple batches:

- installer changes: TODO
- low complexity: mostly TPA services and less critical Tails servers
- moderate complexity: TPA "service admins" machines and remaining
  Tails physical servers and VMs running services from the official
  Debian repositories only
- high complexity: Tails VMs running services not from the official
  Debian repositories
- cleanup: TODO

The free time between the first two batches will also allow us to
cover for unplanned contingencies: upgrades that could drag on and
other work that will inevitably need to be performed.

The objective is to do the batches in collective "upgrade parties"
that should be "fun" for the team. This approach has proven
effective in previous upgrades and we are eager to repeat it.

### Batch 1: low complexity, April-May 2025

This batch is actually scheduled over two weeks: TPA boxes will be
upgraded in the last week of April, and Tails machines in the first
week of May.

The idea is to start the upgrade long enough before the vacations to
give us plenty of time to recover, and some room to start the second
batch.

In April, Debian should also be in "soft freeze": not quite a fully
"stable" environment, but good enough for simple setups.

35 TPA machines:

```
archive-01.torproject.org
cdn-backend-sunet-02.torproject.org
chives.torproject.org
dal-rescue-01.torproject.org
dal-rescue-02.torproject.org
gayi.torproject.org
hetzner-hel1-02.torproject.org
hetzner-hel1-03.torproject.org
hetzner-nbg1-01.torproject.org
hetzner-nbg1-02.torproject.org
idle-dal-02.torproject.org
idle-fsn-01.torproject.org
lists-01.torproject.org
loghost01.torproject.org
mandos-01.torproject.org
media-01.torproject.org
minio-01.torproject.org
mta-dal-01.torproject.org
mx-dal-01.torproject.org
neriniflorum.torproject.org
ns3.torproject.org
ns5.torproject.org
palmeri.torproject.org
perdulce.torproject.org
srs-dal-01.torproject.org
ssh-dal-01.torproject.org
static-gitlab-shim.torproject.org
staticiforme.torproject.org
static-master-fsn.torproject.org
submit-01.torproject.org
vault-01.torproject.org
web-dal-07.torproject.org
web-dal-08.torproject.org
web-fsn-01.torproject.org
web-fsn-02.torproject.org
```

4 Tails machines:

```
ecours.tails.net
puppet.lizard
skink.tails.net
stone.tails.net
```

In the [first batch of bookworm machines][], upgrades took about 20
minutes per machine and were done in a single day, but the second
batch took longer.

It's probably safe to estimate 20 hours (30 minutes per machine) for
this work, in a single week.

[first batch of bookworm machines]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41251
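
As a rough sanity check on this figure, the two lists above add up to 39
machines. The 30-minute budget gives the 20-hour estimate, while the
pace of the first bookworm batch (about 20 minutes per machine) would
give about 13 hours, so the estimate keeps some margin. This is
back-of-the-envelope arithmetic, not a measurement:

```math
\begin{aligned}
(35 + 4) \times 30\ \text{min} &= 1170\ \text{min} = 19.5\ \text{h} \approx 20\ \text{h} \\
(35 + 4) \times 20\ \text{min} &= 780\ \text{min} = 13\ \text{h}
\end{aligned}
```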

Feedback and coordination of this batch happens in [issue batch 1 TODO]().

### Batch 2: moderate complexity, May-June 2025

This is scheduled for the last week of May for TPA machines, and the
first week of June for Tails.

At this point, Debian testing should be in "hard freeze", which should
be more stable.

40 TPA machines:

```
anonticket-01.torproject.org
backup-storage-01.torproject.org
bacula-director-01.torproject.org
btcpayserver-02.torproject.org
bungei.torproject.org
carinatum.torproject.org
check-01.torproject.org
ci-runner-x86-02.torproject.org
ci-runner-x86-03.torproject.org
colchicifolium.torproject.org
collector-02.torproject.org
crm-int-01.torproject.org
dangerzone-01.torproject.org
donate-01.torproject.org
donate-review-01.torproject.org
forum-01.torproject.org
gitlab-02.torproject.org
henryi.torproject.org
materculae.torproject.org
meronense.torproject.org
metricsdb-01.torproject.org
metricsdb-02.torproject.org
metrics-store-01.torproject.org
onionbalance-02.torproject.org
onionoo-backend-03.torproject.org
polyanthum.torproject.org
probetelemetry-01.torproject.org
rdsys-frontend-01.torproject.org
rdsys-test-01.torproject.org
relay-01.torproject.org
rude.torproject.org
survey-01.torproject.org
tbb-nightlies-master.torproject.org
tb-build-02.torproject.org
tb-build-03.torproject.org
tb-build-06.torproject.org
tb-pkgstage-01.torproject.org
tb-tester-01.torproject.org
telegram-bot-01.torproject.org
weather-01.torproject.org
```

17 Tails machines:

```
apt-proxy.lizard
apt.lizard
bitcoin.lizard
bittorrent.lizard
bridge.lizard
dns.lizard
dragon.tails.net
gitlab-runner.iguana
iguana.tails.net
lizard.tails.net
mail.lizard
misc.lizard
puppet-git.lizard
rsync.lizard
teels.tails.net
whisperback.lizard
www.lizard
```

The [second batch of bookworm upgrades][] took 33 hours for 31
machines, so about one hour per box. Here we have 57 machines, so it
will likely take us 60 hours (or two weeks) to complete the upgrade.

[second batch of bookworm upgrades]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41252
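
Spelled out, this estimate applies the per-machine rate observed in the
second bookworm batch to the 40 TPA and 17 Tails machines listed above.
It is a rough projection, not a measurement:

```math
\frac{33\ \text{h}}{31} \approx 1.06\ \text{h per machine},
\qquad
(40 + 17) \times \frac{33\ \text{h}}{31} \approx 60.7\ \text{h} \approx 60\ \text{h}
```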

Feedback and coordination of this batch happens in [issue batch 2 TODO]().

### Batch 3: high complexity, 2025 Q3-Q4

These machines are harder to upgrade, or more critical. In the case of
TPA machines, this batch typically groups the Ganeti servers and all the
"snowflake" servers that are not properly Puppetized and are full of
legacy, namely the LDAP, DNS, and Puppet servers.

That said, we waited a long time to upgrade the Ganeti cluster for
bookworm, and it turned out to be trivial, so perhaps those could
eventually be made part of the second batch.

15 TPA machines:

```
alberti.torproject.org
dal-node-01.torproject.org
dal-node-02.torproject.org
dal-node-03.torproject.org
fsn-node-01.torproject.org
fsn-node-02.torproject.org
fsn-node-03.torproject.org
fsn-node-04.torproject.org
fsn-node-05.torproject.org
fsn-node-06.torproject.org
fsn-node-07.torproject.org
fsn-node-08.torproject.org
nevii.torproject.org
pauli.torproject.org
puppetdb-01.torproject.org
```

It seems like the [bookworm Ganeti upgrade][] took roughly 10h of
work. We ballpark the rest of the upgrade at another 10h of work, so
possibly 20h in total.

11 Tails machines:

```
isoworker1.dragon
isoworker2.dragon
isoworker3.dragon
isoworker4.dragon
isoworker5.dragon
isoworker6.iguana
isoworker7.iguana
isoworker8.iguana
jenkins.dragon
survey.lizard
translate.lizard
```

[bookworm Ganeti upgrade]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41254

The challenge with Tails upgrades is the coordination with the Tails
team, in particular for the Jenkins upgrades.

Feedback and coordination of this batch happens in [issue batch 3 TODO]().

## Upgrade automation

TODO: document we want to start automating upgrades more

# Alternatives considered

## Retirements or rebuilds

We do not plan any major upgrades or retirements in the third batch
this time.

In the future, we hope to decouple those as much as possible, as the
Icinga retirement and Mailman 3 became blockers that significantly
slowed down the bookworm upgrade. In both cases, however, the
upgrades *were* challenging and had to be performed one way or
another, so it's unclear if we can optimize this any further.

We are clear, however, that we will not postpone an upgrade for a
server retirement. Dangerzone, for example, is scheduled for
retirement ([TPA-RFC-78][]) but is still planned as normal above.

[TPA-RFC-78]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-78-dangerzone-retirement

# Costs
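
The per-batch estimates from the schedule above can be tallied with a
quick script. This is a minimal sketch under stated assumptions: the hour
figures are the ones quoted in each batch section, the Tails part of
batch 3 is not estimated in this proposal and is left out, and the
30 hours per week conversion is inferred from the "60 hours (or two
weeks)" figure in batch 2 rather than being an official TPA rate. The
`tally` helper is purely illustrative. Under those assumptions it reports
122 machines and about 100 hours, roughly 3.3 weeks, which leaves some
room for contingencies within the four weeks budgeted below.

```python
"""Rough tally of the trixie upgrade estimates quoted in TPA-RFC-80.

Assumptions (not official TPA figures):
- hour estimates are copied from each batch section;
- batch 3's 20h only covers the TPA servers, Tails batch 3 is unestimated;
- 30 hours of upgrade work per week, inferred from "60 hours (or two weeks)".
"""

# batch name -> (TPA machines, Tails machines, estimated hours)
BATCHES = {
    "batch 1": (35, 4, 20),
    "batch 2": (40, 17, 60),
    "batch 3": (15, 11, 20),  # hours cover the TPA machines only
}

HOURS_PER_WEEK = 30  # assumption, see module docstring


def tally(batches, hours_per_week):
    """Return (machine count, estimated hours, equivalent weeks)."""
    machines = sum(tpa + tails for tpa, tails, _ in batches.values())
    hours = sum(estimate for _, _, estimate in batches.values())
    return machines, hours, hours / hours_per_week


if __name__ == "__main__":
    machines, hours, weeks = tally(BATCHES, HOURS_PER_WEEK)
    print(f"{machines} machines, ~{hours}h, ~{weeks:.1f} weeks")
```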

The entire work here should take about four weeks, with medium
uncertainty.

# Approvals required

This proposal needs approval from TPA team members, but service admins
can request additional delay if they are worried about their service
being affected by the upgrade.

Comments or feedback can be provided in the issues linked above, or the
general process can be commented on in issue [tpo/tpa/team#41990][].

[tpo/tpa/team#41990]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41990

# References

* [Debian 13 trixie upgrade milestone][]
* [discussion ticket][tpo/tpa/team#41990]
* [TPA bookworm upgrade procedure][]

[TPA bookworm upgrade procedure]: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/upgrades/bookworm