Debian 12 bookworm entered freeze on January 19th, 2023. TPA is in the process of studying the procedure and hopes to start immediately after the bullseye upgrade is completed. We have a hard deadline of one year after the stable release, which gives us more than a year to complete this process. Typically, however, we try to upgrade during the freeze to report (and contribute to) issues we find during the upgrade, as those are easier to fix during the freeze than after. In that sense, the deadline is more like the third quarter of 2023.
It is an aggressive timeline, which will likely be missed. It is tracked in the GitLab issue tracker under the "Debian 12 bookworm upgrade" milestone. Upgrades will be staged in batches; see TPA-RFC-20 for details on how that was performed in bullseye.
As soon as the bullseye upgrade is completed, we hope to phase out the bullseye installers so that new machines are set up with bookworm.
This page aims at documenting the upgrade procedure, known problems and upgrade progress of the fleet.
- Procedure
- Service-specific upgrade procedures
- Notable changes
- Issues
- Troubleshooting
- References
- Fleet-wide changes
- Per host progress
Procedure
This procedure is designed to be applied, in batch, on multiple servers. Do NOT follow this procedure unless you are familiar with the command line and the Debian upgrade process. It has been crafted by and for experienced system administrators who have dozens if not hundreds of servers to upgrade.
In particular, it runs almost completely unattended: configuration file changes are not prompted for during the upgrade, and are simply not applied, which will break services in many cases. We use a clean_conflicts script to handle them all in one shot, which shortens the upgrade process (without it, configuration file changes stop the upgrade at more or less random times). Those changes are then applied after a reboot. And yes, that's even more dangerous.
IMPORTANT: if you are doing this procedure over SSH (I had the privilege of having a console), you may want to upgrade SSH first as it has a longer downtime period, especially if you are on a flaky connection.
See the "Conflicts resolution" section below for how to handle clean_conflicts output.
1. Preparation:
echo reset to the default locale && export LC_ALL=C.UTF-8 && echo install some dependencies && sudo apt install ttyrec screen debconf-utils deborphan apt-forktracer && echo create ttyrec file with adequate permissions && sudo touch /var/log/upgrade-bookworm.ttyrec && sudo chmod 600 /var/log/upgrade-bookworm.ttyrec && sudo ttyrec -a -e screen /var/log/upgrade-bookworm.ttyrec
2. Backups and checks:
( umask 0077 && tar cfz /var/backups/pre-bookworm-backup.tgz /etc /var/lib/dpkg /var/lib/apt/extended_states /var/cache/debconf $( [ -e /var/lib/aptitude/pkgstates ] && echo /var/lib/aptitude/pkgstates ) && dpkg --get-selections "*" > /var/backups/dpkg-selections-pre-bookworm.txt && debconf-get-selections > /var/backups/debconf-selections-pre-bookworm.txt ) && : lock down puppet-managed postgresql version && ( if jq -re '.resources[] | select(.type=="Class" and .title=="Profile::Postgresql") | .title' < /var/lib/puppet/client_data/catalog/$(hostname -f).json; then echo "tpa_preupgrade_pg_version_lock: '$(/usr/share/postgresql-common/supported-versions)'" > /etc/facter/facts.d/tpa_preupgrade_pg_version_lock.yaml; fi ) && : pre-upgrade puppet run && ( puppet agent --test || true ) && apt-mark showhold && dpkg --audit && echo look for dkms packages and make sure they are relevant, if not, purge. && ( dpkg -l '*dkms' || true ) && echo look for leftover config files && /usr/local/sbin/clean_conflicts && echo make sure backups are up to date in Bacula && printf "End of Step 2\a\n"
3. Enable module loading (for ferm) and test reboots:
systemctl disable modules_disabled.timer && puppet agent --disable "running major upgrade" && shutdown -r +1 "bookworm upgrade step 3: rebooting with module loading enabled"
4. Perform any pending upgrade and clear out old pins:
export LC_ALL=C.UTF-8 && sudo ttyrec -a -e screen /var/log/upgrade-bookworm.ttyrec
apt update && apt -y upgrade && echo Check for pinned, on hold, packages, and possibly disable && rm -f /etc/apt/preferences /etc/apt/preferences.d/* && rm -f /etc/apt/sources.list.d/backports.debian.org.list && rm -f /etc/apt/sources.list.d/backports.list && rm -f /etc/apt/sources.list.d/bookworm.list && rm -f /etc/apt/sources.list.d/bullseye.list && rm -f /etc/apt/sources.list.d/*-backports.list && rm -f /etc/apt/sources.list.d/experimental.list && rm -f /etc/apt/sources.list.d/incoming.list && rm -f /etc/apt/sources.list.d/proposed-updates.list && rm -f /etc/apt/sources.list.d/sid.list && rm -f /etc/apt/sources.list.d/testing.list && echo purge removed packages && apt purge $(dpkg -l | awk '/^rc/ { print $2 }') && apt purge '?obsolete' && apt autoremove -y --purge && echo possibly clean up old kernels && dpkg -l 'linux-image-*' && echo look for packages from backports, other suites or archives && echo if possible, switch to official packages by disabling third-party repositories && apt-forktracer && printf "End of Step 4\a\n"
5. Check free space (see this guide to free up space), disable auto-upgrades, and download packages:
systemctl stop apt-daily.timer && sed -i 's#bullseye-security#bookworm-security#' $(ls /etc/apt/sources.list /etc/apt/sources.list.d/*) && sed -i 's/bullseye/bookworm/g' $(ls /etc/apt/sources.list /etc/apt/sources.list.d/*) && apt update && apt -y -d full-upgrade && apt -y -d upgrade && apt -y -d dist-upgrade && df -h && printf "End of Step 5\a\n"
6. Actual upgrade run:
echo put server in maintenance && sudo touch /etc/nologin && env DEBIAN_FRONTEND=noninteractive APT_LISTCHANGES_FRONTEND=none APT_LISTBUGS_FRONTEND=none UCF_FORCE_CONFFOLD=y apt full-upgrade -y -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' && printf "End of Step 6\a\n"
7. Post-upgrade procedures:
apt-get update --allow-releaseinfo-change && puppet agent --enable && puppet agent -t --noop && printf "Press enter to continue, Ctrl-C to abort." && read -r _ && (puppet agent -t || true) && echo deploy upgrades after possible Puppet sources.list changes && apt update && apt upgrade -y && rm -f /etc/default/bacula-fd.ucf-dist /etc/apache2/conf-available/security.conf.dpkg-dist /etc/apache2/mods-available/mpm_worker.conf.dpkg-dist /etc/default/puppet.dpkg-dist /etc/ntpsec/ntp.conf.dpkg-dist /etc/puppet/puppet.conf.dpkg-dist /etc/apt/apt.conf.d/50unattended-upgrades.dpkg-dist /etc/bacula/bacula-fd.conf.ucf-dist /etc/ca-certificates.conf.dpkg-old /etc/cron.daily/bsdmainutils.dpkg-remove /etc/default/prometheus-apache-exporter.dpkg-dist /etc/default/prometheus-node-exporter.dpkg-dist /etc/ldap/ldap.conf.dpkg-dist /etc/logrotate.d/apache2.dpkg-dist /etc/nagios/nrpe.cfg.dpkg-dist /etc/ssh/ssh_config.dpkg-dist /etc/ssh/sshd_config.ucf-dist /etc/sudoers.dpkg-dist /etc/syslog-ng/syslog-ng.conf.dpkg-dist /etc/unbound/unbound.conf.dpkg-dist /etc/systemd/system/fstrim.timer && printf "\a" && /usr/local/sbin/clean_conflicts && systemctl start apt-daily.timer && echo 'workaround for Debian bug #989720' && sed -i 's/^allow-ovs/auto/' /etc/network/interfaces && rm /etc/nologin && printf "End of Step 7\a\n" && shutdown -r +1 "bookworm upgrade step 7: removing old kernel image"
8. Service-specific upgrade procedures: if the server is hosting a more complex service, follow the relevant service-specific upgrade procedures below.
9. Post-upgrade cleanup:
export LC_ALL=C.UTF-8 && sudo ttyrec -a -e screen /var/log/upgrade-bookworm.ttyrec
apt-mark manual bind9-dnsutils puppet-agent && apt purge apt-forktracer && echo purging removed packages && apt purge $(dpkg -l | awk '/^rc/ { print $2 }') && apt autopurge && apt purge $(deborphan --guess-dummy) && while deborphan -n | grep -q . ; do apt purge $(deborphan -n); done && apt autopurge && echo review obsolete and odd packages && apt purge '?obsolete' && apt autopurge && apt list "?narrow(?installed, ?not(?codename($(lsb_release -c -s | tail -1))))" && apt clean && echo review installed kernels: && dpkg -l 'linux-image*' | less && printf "End of Step 9\a\n" && shutdown -r +1 "bookworm upgrade step 9: testing reboots one final time"
IMPORTANT: make sure you test the services at this point, or at least notify the admins responsible for the service so they do so. This allows problems introduced by the upgrade to be found earlier.
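Since step 5 rewrites the APT sources in place with sed, it can be worth previewing what it will touch first, in particular third-party entries whose URL happens to contain "bullseye". The following is a minimal sketch, not part of the TPA tooling: preview_suite_rewrite is a hypothetical helper that applies the same bullseye to bookworm substitution, but only to standard output, printing each affected line before and after.

```shell
#!/bin/sh
# preview_suite_rewrite FILE...: show, without modifying anything, what the
# step 5 substitution (bullseye -> bookworm) would change in each APT
# sources file. Hypothetical helper: it only reads the files it is given.
preview_suite_rewrite() {
    for f in "$@"; do
        # skip unmatched globs and anything that is not a regular file
        [ -f "$f" ] || continue
        # print every line mentioning bullseye, as "-" (old) and "+" (new)
        awk -v file="$f" '/bullseye/ {
            new = $0
            gsub(/bullseye/, "bookworm", new)
            printf "%s: - %s\n%s: + %s\n", file, $0, file, new
        }' "$f"
    done
}
```

For example, run preview_suite_rewrite /etc/apt/sources.list /etc/apt/sources.list.d/*.list before step 5. Arguments that are not regular files, such as an unmatched glob, are silently skipped.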
Conflicts resolution
When the clean_conflicts script gets run, it asks you to check each configuration file that was modified locally but that the Debian package upgrade wants to overwrite. You need to make a decision on each file. This section aims to provide guidance on how to handle those prompts.
Those config files should be manually checked on each host:
- /etc/default/grub.dpkg-dist
- /etc/initramfs-tools/initramfs.conf.dpkg-dist
The grub config file, in particular, should be restored to the upstream default and host-specific configuration moved to the grub.d directory.
If other files come up, they should be added to the above decision list, or to an operation in step 2 or 7 of the above procedure, before the clean_conflicts call.
Files that should be updated in Puppet are mentioned in the Issues section below as well.
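To spot leftover files that clean_conflicts will ask about, or new ones that should be added to the removal list in step 7, a sketch like the following lists dpkg and ucf copies under a directory. The helper name and the exact suffix list are assumptions, based on the suffixes seen in step 7 (.dpkg-dist, .dpkg-old, .dpkg-remove, .ucf-dist) plus a few related ones.

```shell
#!/bin/sh
# find_conffile_leftovers [DIR]: list configuration files left behind by
# dpkg or ucf during an upgrade (defaults to /etc), sorted, one per line.
# Hypothetical helper; the suffix list below is an assumption, extend as needed.
find_conffile_leftovers() {
    dir="${1:-/etc}"
    find "$dir" \( -name '*.dpkg-dist' -o -name '*.dpkg-old' \
        -o -name '*.dpkg-new' -o -name '*.dpkg-remove' \
        -o -name '*.ucf-dist' -o -name '*.ucf-old' -o -name '*.ucf-new' \) \
        -print 2>/dev/null | sort
}
```

For example, find_conffile_leftovers /etc right before the clean_conflicts call in step 7.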
Service-specific upgrade procedures
PostgreSQL upgrades
Note: before doing the entire major upgrade procedure, it is worth considering upgrading PostgreSQL to "backports". There are no official "Debian backports" of PostgreSQL, but there is an https://apt.postgresql.org/ repo which is supposedly compatible with the official Debian packages. The only (currently known) problem with that repo is that it doesn't use the tilde (~) version number, so when you eventually do the major upgrade, you need to manually upgrade those packages as well.
PostgreSQL is special and needs to be upgraded manually.
1. Make a full backup of the old cluster:
ssh -tt bungei.torproject.org 'sudo -u torbackup postgres-make-one-base-backup $(grep ^meronense.torproject.org $(which postgres-make-base-backups ))'
The above assumes the host to back up is meronense and the backup server is bungei. See howto/postgresql for details of that procedure.
2. Once the backup completes, on the database server, possibly stop users of the database, because it will have to be stopped for the major upgrade.
On the Bacula director, in particular, this probably means waiting for all backups to complete and stopping the director:
service bacula-director stop
This will mean other things on other servers! Failing to stop writes to the database will lead to problems with the backup monitoring system. An alternative is to just stop PostgreSQL altogether:
service postgresql@13-main stop
This also involves stopping Puppet so that it doesn't restart services:
puppet agent --disable "PostgreSQL upgrade"
3. On the storage server, move the directory out of the way and recreate it:
ssh bungei.torproject.org "mv /srv/backups/pg/meronense /srv/backups/pg/meronense-13 && sudo -u torbackup mkdir /srv/backups/pg/meronense"
4. On the database server, do the actual cluster upgrade:
# the psql check below refuses to drop the new 15/main cluster if it already holds user databases
export LC_ALL=C.UTF-8 &&
printf "about to stop and destroy cluster main on postgresql-15, press enter to continue" &&
read _ &&
port15=$(grep ^port /etc/postgresql/15/main/postgresql.conf | sed 's/port.*= //;s/[[:space:]].*$//') &&
if sudo -u postgres psql -p "$port15" --no-align --tuples-only \
       -c "SELECT datname FROM pg_database WHERE datistemplate = false and datname != 'postgres';" \
       | grep .; then
    echo "ERROR: database cluster 15 not empty"
else
    pg_dropcluster --stop 15 main &&
    pg_upgradecluster -m upgrade -k 13 main &&
    rm -f /etc/facter/facts.d/tpa_preupgrade_pg_version_lock.yaml
fi
Yes, that implies DESTROYING the NEW version, but the point is we then recreate it from the old one.
TODO: this whole procedure needs to be moved into fabric, for sanity.
5. Run Puppet on the server and on the storage server to update backup configuration files; this should also restart any services stopped at step 2:
puppet agent --enable && pat ssh bungei.torproject.org pat
6. Make a new full backup of the new cluster:
ssh -tt bungei.torproject.org 'sudo -u torbackup postgres-make-one-base-backup $(grep ^meronense.torproject.org $(which postgres-make-base-backups ))'
7. Make sure you check for gaps in the write-ahead log; see tpo/tpa/team#40776 for an example of that problem and the WAL-MISSING-AFTER PostgreSQL playbook for recovery.
8. Purge the old backups directory after 3 weeks:
ssh bungei.torproject.org "echo 'rm -r /srv/backups/pg/meronense-13/' | at now + 21day"
The old PostgreSQL packages will be automatically cleaned up and purged at step 9 of the general upgrade procedure.
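Before re-enabling Puppet on the database server, it may help to confirm that the new cluster actually came up. This is a minimal sketch that parses pg_lsclusters output, assuming its default column layout (Ver, Cluster, Port, Status, ...); cluster_online is a hypothetical helper name.

```shell
#!/bin/sh
# cluster_online VERSION NAME: read `pg_lsclusters` output on stdin and
# succeed only if the given cluster is listed with an "online" status.
# Assumes the default column order: Ver Cluster Port Status Owner ...
cluster_online() {
    awk -v ver="$1" -v name="$2" '
        $1 == ver && $2 == name && $4 ~ /^online/ { found = 1 }
        END { exit !found }'
}
```

For example: pg_lsclusters | cluster_online 15 main && echo "cluster 15/main is online".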
It is also wise to read the release notes for the relevant release to see if there are any specific changes that are needed at the application level, for service owners. In general, the above procedure uses pg_upgrade, so that's already covered.
RT upgrades
Request Tracker was upgraded from version 4.4.6 (bullseye) to 5.0.3. The Debian package is now request-tracker5. To implement this transition, a manual database upgrade was executed, and the Puppet profile was updated to reflect the new package and executable names, and configuration options.
https://docs.bestpractical.com/rt/5.0.3/UPGRADING-5.0.html
Ganeti upgrades
So far it seems there is no significant upgrade on the Ganeti clusters, at least as far as Ganeti itself is concerned. In fact, there hasn't been a release upstream since 2022, which is a bit concerning.
There was a bug with the newer Haskell code in bookworm, but the 3.0.2-2 package already has a patch (really a workaround) to fix that. Also, there was a serious regression in the Linux kernel which affected Haskell programs (1036755). The fix for this issue was released to bookworm in July 2023, in kernel 6.1.38.
No special procedure seems to be required for the Ganeti upgrade this time around; follow the normal upgrade procedures.
Puppet server upgrade
In my (anarcat) home lab, I had to apt install postgresql puppetdb puppet-terminus-puppetdb and follow the connect instructions, as I was using the redis terminus before (probably not relevant for TPA).
I also had to adduser puppetdb puppet for it to be able to access the certs, and add the certs to the jetty config. Basically:
certname="$(puppet config print certname)"
hostcert="$(puppet config print hostcert)"
hostkey="$(puppet config print hostprivkey)"
cacert="$(puppet config print cacert)"
adduser puppetdb puppet
cat >>/etc/puppetdb/conf.d/jetty.ini <<-EOF
ssl-host = 0.0.0.0
ssl-port = 8081
ssl-key = ${hostkey}
ssl-cert = ${hostcert}
ssl-ca-cert = ${cacert}
EOF
echo "Starting PuppetDB ..."
systemctl start puppetdb
cp /usr/share/doc/puppet-terminus-puppetdb/routes.yaml.example /etc/puppet/routes.yaml
cat >/etc/puppet/puppetdb.conf <<-EOF
[main]
server_urls = https://${certname}:8081
EOF
Also:
apt install puppet-module-puppetlabs-cron-core puppet-module-puppetlabs-augeas-core puppet-module-puppetlabs-sshkeys-core
puppetserver gem install trocla:0.4.0 --no-document
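The jetty.ini snippet above appends with cat >>, so running it a second time would duplicate the SSL settings. A minimal guard, assuming the settings are plain "key = value" lines as in that snippet; jetty_has_ssl is a hypothetical helper.

```shell
#!/bin/sh
# jetty_has_ssl FILE: succeed if FILE already sets every SSL key that the
# jetty.ini snippet above appends, so the append can be skipped on a rerun.
# Fails if any key is missing or if FILE cannot be read.
jetty_has_ssl() {
    for key in ssl-host ssl-port ssl-key ssl-cert ssl-ca-cert; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$1" || return 1
    done
}
```

For example, wrap the append as jetty_has_ssl /etc/puppetdb/conf.d/jetty.ini || cat >>/etc/puppetdb/conf.d/jetty.ini ...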
Notable changes
Here is a list of notable changes from a system administration perspective:
- Podman was upgraded to 4.3, which means we can use it for GitLab CI runners, see TPA-RFC-58 and issue tpo/tpa/team#41296
See also the wiki page about bookworm for another list.
New packages
This is a curated list of packages that were introduced in bookworm. There are actually thousands of new packages in the new Debian release, but this is a small selection of projects I found particularly interesting:
- OpenSnitch - interactive firewall inspired by Little Snitch (on Mac)
Updated packages
This table summarizes package changes that could be interesting for our project.
Package | Bullseye | Bookworm | Notes |
---|---|---|---|
Ansible | 2.10 | 2.14 | |
Bind | 9.16 | 9.18 | DoT, DoH, XFR-over-TLS |
GCC | 10 | 12 | see GCC 11 and GCC 12 release notes |
Emacs | 27.1 | 28.1 | native compilation, seccomp, better emoji support, 24-bit true color support in terminals, C-x 4 4 to display next command in a new window, xterm-mouse-mode, context-menu-mode, repeat-mode |
Firefox | 91.13 | 102.11 | 91.13 already in buster-security |
Git | 2.30 | 2.39 | rebase --update-refs, merge ort strategy, stash --staged, sparse index support, SSH signatures, help.autoCorrect=prompt, maintenance start, clone.defaultRemoteName, git rev-list --disk-usage |
Golang | 1.15 | 1.19 | generics, fuzzing, SHA-1, TLS 1.0, and 1.1 disabled by default, performance improvements, embed package, Apple ARM support |
Linux | 5.10 | 6.1 | mainline Rust, multi-generational LRU, KMSAN, KFENCE, maple trees, guest memory encryption, AMD Zen performance improvements, C11, Blake-2 RNG, NTFS write support, Samba 3, Landlock, Apple M1, and much more |
LLVM | 13 | 15 | see LLVM 14 and LLVM 15 release notes |
OpenJDK | 11 | 17 | see this list for release notes |
OpenLDAP | 2.4 | 2.5 | 2FA, load balancer support |
OpenSSL | 1.1.1 | 3.0 | FIPS 140-3 compliance, MD2, DES disabled by default, AES-SIV, KDF-SSH, KEM-RSAVE, HTTPS client, Linux KTLS support |
OpenSSH | 8.4 | 9.2 | scp now uses SFTP, NTRU quantum-resistant key exchange, SHA-1 disabled, EnableEscapeCommandline |
Podman | 3.0 | 4.3 | GitLab runner, sigstore support, Podman Desktop, volume mount, container clone, pod clone, Netavark network stack rewrite, podman-restart.service to restart all containers, digest support for pull, and lots more |
PostgreSQL | 13 | 15 | stats collector optimized out, UNIQUE NULLS NOT DISTINCT, MERGE, zstd/lz4 compression for WAL files, also in pg_basebackup, see also feature matrix |
Prometheus | 2.24 | 2.42 | keep_firing_for alerts, @ modifier, classic UI removed, promtool check service-discovery command, feature flags which include native histograms, agent mode, snapshot-on-shutdown for faster restarts, generic HTTP service discovery, dark theme, Alertmanager v2 API default |
Python | 3.9.2 | 3.11 | exception groups, TOML in stdlib, "pipe" for Union types, structural pattern matching, Self type, variadic generics, major performance improvements, Python 2 removed completely |
Puppet | 5.5.22 | 7.23 | major work from colleagues and myself |
Rustc | 1.48 | 1.63 | Rust 2021, I/O safety, scoped threads, cargo add, --timings, inline assembly, bare-metal x86, captured identifiers in format strings, binding @ pattern, open range patterns, IntoIterator for arrays, Or patterns, Unicode identifiers, const generics, arm64 tier-1, incremental compilation turned off and on a few times |
Vim | 8.2 | 9.0 | Vim9 script |
See the official release notes for the full list from Debian.
Removed packages
TODO
Python 2 was completely removed from Debian, a long-term task that had already started with bullseye but was not completed then.
See also the noteworthy obsolete packages list.
Deprecation notices
TODO
Issues
See also the official list of known issues.
sudo -i stops working
Note: this issue has been resolved.
After upgrading to bookworm, sudo -i started rejecting valid passwords on many machines. This is because bookworm introduced a new /etc/pam.d/sudo-i file. Anarcat fixed this in Puppet with a new sudo-i file that TPA vendors.
If you're running into this issue, check that Puppet has deployed the correct file in /etc/pam.d/sudo-i.
Pending
- there's a regression in the bookworm Linux kernel (1036755) which causes crashes in (some?) Haskell programs, which should be fixed before we start deploying Ganeti upgrades in particular
- Schleuder (and Rails, in general) have issues upgrading between bullseye and bookworm (1038935)
grub-pc failures
On some hosts, grub-pc failed to configure correctly:
Setting up grub-pc (2.06-13) ...
grub-pc: Running grub-install ...
/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk-7f3a5ef1-b522-4726 does not exist, so cannot grub-install to it!
You must correct your GRUB install devices before proceeding:
DEBIAN_FRONTEND=dialog dpkg --configure grub-pc
dpkg --configure -a
dpkg: error processing package grub-pc (--configure):
installed grub-pc package post-installation script subprocess returned error exit status 1
The fix is, as described, to run DEBIAN_FRONTEND=dialog dpkg --configure grub-pc, pick the disk with a partition to install grub on, then finish with dpkg --configure -a. It's unclear what a preemptive fix for that is.
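One possible pre-upgrade check, offered as an untested sketch rather than a known fix, is to compare the devices recorded in debconf against what exists on the host, since the error above is about a stale /dev/disk/by-id path. It assumes debconf-get-selections (from debconf-utils, installed in step 1) prints the grub-pc/install_devices answer as a multiselect whose values are separated by commas; missing_grub_devices is a hypothetical helper.

```shell
#!/bin/sh
# missing_grub_devices: read `debconf-get-selections` output on stdin and
# print each grub-pc/install_devices entry that does not exist on this host.
# Assumes multiselect values are comma-separated (e.g. "/dev/a, /dev/b").
missing_grub_devices() {
    # keep only the value part of the install_devices line
    sed -n 's|^grub-pc[[:space:]]\{1,\}grub-pc/install_devices[[:space:]]\{1,\}multiselect[[:space:]]*||p' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        while IFS= read -r dev; do
            [ -n "$dev" ] || continue
            # report devices that are configured but absent
            [ -e "$dev" ] || printf '%s\n' "$dev"
        done
}
```

For example: debconf-get-selections | missing_grub_devices. If it prints anything, reconfiguring grub-pc before the upgrade (for example with dpkg-reconfigure grub-pc) might avoid the failure, although that has not been verified here.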
NTP configuration to be ported
We have some slight diffs in our Puppet-managed NTP configuration:
Notice: /Stage[main]/Ntp/File[/etc/ntpsec/ntp.conf]/content:
--- /etc/ntpsec/ntp.conf 2023-09-26 14:41:08.648258079 +0000
+++ /tmp/puppet-file20230926-35001-x7hntz 2023-09-26 14:47:56.547991158 +0000
@@ -4,13 +4,13 @@
# /etc/ntp.conf, configuration for ntpd; see ntp.conf(5) for help
-driftfile /var/lib/ntpsec/ntp.drift
+driftfile /var/lib/ntp/ntp.drift
# Leap seconds definition provided by tzdata
leapfile /usr/share/zoneinfo/leap-seconds.list
# Enable this if you want statistics to be logged.
-#statsdir /var/log/ntpsec/
+#statsdir /var/log/ntpstats/
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
Notice: /Stage[main]/Ntp/File[/etc/ntpsec/ntp.conf]/content: content changed '{sha256}c5d627a596de1c67aa26dfbd472a4f07039f4664b1284cf799d4e1eb43c92c80' to '{sha256}18de87983c2f8491852390acc21c466611d6660083b0d0810bb6509470949be3'
Notice: /Stage[main]/Ntp/File[/etc/ntpsec/ntp.conf]/mode: mode changed '0644' to '0444'
Info: /Stage[main]/Ntp/File[/etc/ntpsec/ntp.conf]: Scheduling refresh of Exec[service ntpsec restart]
Info: /Stage[main]/Ntp/File[/etc/ntpsec/ntp.conf]: Scheduling refresh of Exec[service ntpsec restart]
Notice: /Stage[main]/Ntp/File[/etc/default/ntpsec]/content:
--- /etc/default/ntpsec 2023-07-29 20:51:53.000000000 +0000
+++ /tmp/puppet-file20230926-35001-d4tltp 2023-09-26 14:47:56.579990910 +0000
@@ -1,9 +1 @@
-NTPD_OPTS="-g -N"
-
-# Set to "yes" to ignore DHCP servers returned by DHCP.
-IGNORE_DHCP=""
-
-# If you use certbot to obtain a certificate for ntpd, provide its name here.
-# The ntpsec deploy hook for certbot will handle copying and permissioning the
-# certificate and key files.
-NTPSEC_CERTBOT_CERT_NAME=""
+NTPD_OPTS='-g'
Notice: /Stage[main]/Ntp/File[/etc/default/ntpsec]/content: content changed '{sha256}26bcfca8526178fc5e0df1412fbdff120a0d744cfbd023fef7b9369e0885f84b' to '{sha256}1bb4799991836109d4733e4aaa0e1754a1c0fee89df225598319efb83aa4f3b1'
Notice: /Stage[main]/Ntp/File[/etc/default/ntpsec]/mode: mode changed '0644' to '0444'
Info: /Stage[main]/Ntp/File[/etc/default/ntpsec]: Scheduling refresh of Exec[service ntpsec restart]
Info: /Stage[main]/Ntp/File[/etc/default/ntpsec]: Scheduling refresh of Exec[service ntpsec restart]
Notice: /Stage[main]/Ntp/Exec[service ntpsec restart]: Triggered 'refresh' from 4 events
Note that this is a "reverse diff", that is, Puppet restoring the old bullseye config, so we should apply the reverse of this in Puppet.
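Until the Puppet template is updated, a quick way to tell whether a host has the ntpsec-style paths or the restored bullseye ones is to look at the driftfile line. This sketch assumes the ntpsec default is the driftfile path on the "-" side of the diff above; ntpsec_driftfile_ok is a hypothetical helper.

```shell
#!/bin/sh
# ntpsec_driftfile_ok FILE: succeed if FILE points driftfile at the ntpsec
# location shown in the diff above; fail if it uses any other path (such as
# the bullseye-era /var/lib/ntp/ntp.drift) or has no driftfile line at all.
ntpsec_driftfile_ok() {
    awk '$1 == "driftfile" { path = $2 }
         END { exit !(path == "/var/lib/ntpsec/ntp.drift") }' "$1"
}
```

For example: ntpsec_driftfile_ok /etc/ntpsec/ntp.conf || echo "still using the bullseye driftfile path".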
sudo configuration lacks limits.conf?
Just noticed this diff on all hosts:
--- /etc/pam.d/sudo 2021-12-14 19:59:20.613496091 +0000
+++ /etc/pam.d/sudo.dpkg-dist 2023-06-27 11:45:00.000000000 +0000
@@ -1,12 +1,8 @@
-##
-## THIS FILE IS UNDER PUPPET CONTROL. DON'T EDIT IT HERE.
-##
#%PAM-1.0
-# use the LDAP-derived password file for sudo access
-auth requisite pam_pwdfile.so pwdfile=/var/lib/misc/thishost/sudo-passwd
+# Set up user limits from /etc/security/limits.conf.
+session required pam_limits.so
-# disable /etc/password for sudo authentication, see #6367
-#@include common-auth
+@include common-auth
@include common-account
@include common-session-noninteractive
Why don't we have pam_limits set up? Historical oddity? To investigate.
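As a starting point for that investigation, this sketch checks whether a PAM file loads pam_limits.so as a session module, as the bookworm sudo.dpkg-dist shown above does. pam_has_limits is a hypothetical helper, and it only recognizes simple control keywords such as required, not the bracketed [value=action] syntax.

```shell
#!/bin/sh
# pam_has_limits FILE: succeed if FILE has a "session <control> pam_limits.so"
# line, where <control> is a single keyword such as required or optional.
pam_has_limits() {
    grep -Eq '^[[:space:]]*session[[:space:]]+[^[:space:]#]+[[:space:]]+pam_limits\.so' "$1"
}
```

For example: for f in /etc/pam.d/sudo /etc/pam.d/sudo-i; do pam_has_limits "$f" || echo "$f: no pam_limits"; done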
Resolved
libc configuration failure on skip-upgrade
The alberti upgrade failed with:
/usr/bin/perl: error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory
dpkg: error processing package libc6:amd64 (--configure):
installed libc6:amd64 package post-installation script subprocess returned error exit status 127
Errors were encountered while processing:
libc6:amd64
perl: error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)
The solution is:
dpkg -i libc6_2.36-9+deb12u1_amd64.deb libpam0g_1.5.2-6_amd64.deb libcrypt1_1%3a4.4.33-2_amd64.deb
apt install -f
This happened because I mistakenly followed this procedure instead of the bullseye procedure when upgrading it to bullseye; in other words, I did a "skip upgrade", directly upgrading from buster to bookworm. See this ticket for more context.
Could not enable fstrim.timer
During and after the upgrade to bookworm, this error may be shown during Puppet runs:
Error: Could not enable fstrim.timer
Error: /Stage[main]/Torproject_org/Service[fstrim.timer]/enable: change from 'false' to 'true' failed: Could not enable fstrim.timer: (corrective)
The solution is to run:
rm /etc/systemd/system/fstrim.timer
systemctl daemon-reload
This removes an obsolete symlink which systemd gets annoyed about.
unable to connect via ssh with nitrokey start token
Connecting to, or via, a bookworm server fails when using a Nitrokey Start token:
sign_and_send_pubkey: signing failed for ED25519 "(none)" from agent: agent refused operation
This is caused by an incompatibility introduced in recent versions of OpenSSH.
The fix is to upgrade the token's firmware. Several workarounds are documented in this ticket: https://dev.gnupg.org/T5931
Troubleshooting
Upgrade failures
Instructions on errors during upgrades can be found in the release notes troubleshooting section.
Reboot failures
If there's any trouble during reboots, you should use some recovery system. The release notes actually have good documentation on that, on top of "use a live filesystem".
References
- Official guide (TODO: review)
- Release notes (TODO: review)
- DSA guide (TODO: review)
- anarcat guide (WIP, last sync 2023-04-06)
- Solution proposal to automate this
Fleet-wide changes
The following changes need to be performed once for the entire fleet, generally at the beginning of the upgrade process.
installer changes
The installers need to be changed to support the new release. This includes:
- the Ganeti installers (add a gnt-instance-debootstrap variant, modules/profile/manifests/ganeti.pp in tor-puppet.git, see commit 4d38be42 for an example)
- the (deprecated) libvirt installer (modules/roles/files/virt/tor-install-VM, in tor-puppet.git)
- the wiki documentation:
  - create a new page like this one documenting the process, linked from howto/upgrades
  - make an entry in the data.csv to start tracking progress (see below), copy the Makefile as well, changing the suite name
  - change the Ganeti procedure so that the new suite is used by default
  - change the Hetzner robot install procedure
- fabric-tasks and the fabric installer (TODO)
Debian archive changes
The Debian archive on db.torproject.org (currently alberti) needs to have a new suite added. This can be (partly) done by editing files in /srv/db.torproject.org/ftp-archive/. Specifically, the two following files need to be changed:
- apt-ftparchive.config: a new stanza for the suite, basically copy-pasting from a previous entry and changing the suite
- Makefile: add the new suite to the for loop
But that is not enough: the directory structure needs to be created by hand as well. A simple way to do so is to replicate a previous release structure:
cd /srv/db.torproject.org/ftp-archive
rsync -a --include='*/' --exclude='*' archive/dists/bullseye/ archive/dists/bookworm/
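The rsync above only copies directories, so a quick way to confirm that the new suite skeleton matches the old one is to compare the two directory trees. A minimal sketch; same_dir_skeleton is a hypothetical helper, and it ignores files entirely.

```shell
#!/bin/sh
# same_dir_skeleton OLD NEW: succeed if NEW has exactly the same directory
# tree as OLD (files are ignored); otherwise print the differences in diff
# format and fail.
same_dir_skeleton() {
    a=$(mktemp) b=$(mktemp)
    (cd "$1" && find . -type d | sort) > "$a"
    (cd "$2" && find . -type d | sort) > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

For example: cd /srv/db.torproject.org/ftp-archive && same_dir_skeleton archive/dists/bullseye archive/dists/bookworm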
Per host progress
Note that per-host upgrade policy is in howto/upgrades.
When a critical mass of servers have been upgraded and only "hard" ones remain, they can be turned into tickets and tracked in GitLab. In the meantime...
A list of servers to upgrade can be obtained with:
curl -s -G http://localhost:8080/pdb/query/v4 --data-urlencode 'query=nodes { facts { name = "lsbdistcodename" and value != "bookworm" }}' | jq -r '.[].certname' | sort
Or in Prometheus:
count(node_os_info{version_id!="12"}) by (alias)
Or, by codename, including the codename in the output:
count(node_os_info{version_codename!="bookworm"}) by (alias,version_codename)

The above graphic shows the progress of the migration between major releases. It can be regenerated with the predict-os script. It pulls information from puppet to update a CSV file to keep track of progress over time.
WARNING: the graph may be incorrect or missing as the upgrade procedure ramps up. This graph will be converted into a Grafana dashboard to fix that, see issue 40512.