Verified commit c8b162d1 authored by anarcat

Revert "try to unbreak display from the web interface"

This reverts commit fdb4264d and
commit b8483506.

This is so obviously wrong, I don't even know where to begin. Plus it
still doesn't unbreak GitLab. Ugh.
parent b8483506
+277 −225
Debian 11 [bullseye](https://wiki.debian.org/DebianBullseye) was [released on August 14 2021](https://www.debian.org/News/2021/20210814). Tor started the upgrade to bullseye shortly after and hopes to complete the process before the [buster](howto/upgrades/buster) EOL, [one year after the stable release](https://www.debian.org/security/faq#lifespan), so normally around August 2022.
Debian 11 [bullseye](https://wiki.debian.org/DebianBullseye) was [released on August 14 2021](https://www.debian.org/News/2021/20210814). Tor
started the upgrade to bullseye shortly after and hopes to complete
the process before the [buster](howto/upgrades/buster) EOL, [one year after the stable
release](https://www.debian.org/security/faq#lifespan), so normally around August 2022.

It is an aggressive timeline, which might be missed.

Starting from now on however, no new Debian 10 buster machine will be created: all new machines will run Debian 11 bullseye.

This page aims at documenting the upgrade procedure, known problems and upgrade progress of the fleet.

* [Procedure](#procedure)
* [Service-specific upgrade procedures](#service-specific-upgrade-procedures)
  * [RT and PostgreSQL upgrades](#rt-and-postgresql-upgrades)
* [Notable changes](#notable-changes)
  * [New packages](#new-packages)
  * [Updated packages](#updated-packages)
  * [Removed packages](#removed-packages)
  * [Deprecation notices](#deprecation-notices)
    * [usrmerge](#usrmerge)
    * [slapd](#slapd)
    * [apt-key](#apt-key)
* [Issues](#issues)
  * [Pending](#pending)
  * [Resolved](#resolved)
    * [tor-nagios-checks tempfile](#tor-nagios-checks-tempfile)
* [Troubleshooting](#troubleshooting)
  * [Upgrade failures](#upgrade-failures)
  * [Reboot failures](#reboot-failures)
* [References](#references)
* [Fleet-wide changes](#fleet-wide-changes)
  * [installer changes](#installer-changes)
  * [Debian archive changes](#debian-archive-changes)
* [Per host progress](#per-host-progress)
Starting from now on however, no new Debian 10 buster machine will be
created: all new machines will run Debian 11 bullseye.

# Procedure

This procedure is designed to be applied, in batch, on multiple servers. Do NOT follow this procedure unless you are familiar with the command line and the Debian upgrade process. It has been crafted by and for experienced system administrators that have dozens if not hundreds of servers to upgrade.
This page aims at documenting the upgrade procedure, known problems
and upgrade progress of the fleet.

In particular, it runs almost completely unattended: configuration changes are not prompted during the upgrade, and just not applied at all, which _will_ break services in many cases. We use a [clean-conflicts](https://gitlab.com/anarcat/koumbit-scripts/-/blob/master/vps/clean_conflicts) script to do this all in one shot to shorten the upgrade process (without it, configuration file changes stop the upgrade at more or less random times). Then those changes get applied after a reboot. And yes, that's even more dangerous.
[[_TOC_]]

IMPORTANT: if you are doing this procedure over SSH (I had the privilege of having a console), you may want to [upgrade SSH first](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#ssh-not-available) as it has a longer downtime period, especially if you are on a flaky connection.
# Procedure

WARNING: this procedure has not been tested yet on TPA infrastructure. It's a merge of the TPA buster procedure and anarcat's bullseye procedure.
This procedure is designed to be applied, in batch, on multiple
servers. Do NOT follow this procedure unless you are familiar with the
command line and the Debian upgrade process. It has been crafted by
and for experienced system administrators that have dozens if not
hundreds of servers to upgrade.

In particular, it runs almost completely unattended: configuration
changes are not prompted during the upgrade, and just not applied at
all, which *will* break services in many cases. We use a
[clean-conflicts](https://gitlab.com/anarcat/koumbit-scripts/-/blob/master/vps/clean_conflicts) script to do this all in one shot to shorten the
upgrade process (without it, configuration file changes stop the
upgrade at more or less random times). Then those changes get applied
after a reboot. And yes, that's even more dangerous.

IMPORTANT: if you are doing this procedure over SSH (I had the
privilege of having a console), you may want to [upgrade SSH first](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#ssh-not-available)
as it has a longer downtime period, especially if you are on a flaky
connection.

WARNING: this procedure has not been tested yet on TPA
infrastructure. It's a merge of the TPA buster procedure and anarcat's
bullseye procedure.

 1. Preparation:

   ```plaintext
        : reset to the default locale
        export LC_ALL=C.UTF-8 &&
        sudo apt install ttyrec screen debconf-utils apt-show-versions deborphan apt-forktracer &&
        sudo ttyrec -e screen /var/log/upgrade-bullseye.ttyrec
   ```

 2. Backups and checks:

   ```plaintext
        ( 
          umask 0077 &&
          tar cfz /var/backups/pre-bullseye-backup.tgz /etc /var/lib/dpkg /var/lib/apt/extended_states /var/cache/debconf $( [ -e /var/lib/aptitude/pkgstates ] && echo /var/lib/aptitude/pkgstates ) &&
@@ -66,20 +61,18 @@ WARNING: this procedure has not been tested yet on TPA infrastructure. It's a me
        find /etc -name '*.dpkg-*' -o -name '*.ucf-*' -o -name '*.merge-error' &&
        : make sure backups are up to date in Nagios &&
        printf "End of Step 2\a\n"
   ```

 3. Enable module loading (for ferm) and test reboots:

   ```plaintext
        systemctl disable modules_disabled.timer &&
        puppet agent --disable "running major upgrade" &&
        shutdown -r +1 "rebooting with module loading enabled"

        export LC_ALL=C.UTF-8 &&
        sudo ttyrec -a -e screen /var/log/upgrade-bullseye.ttyrec
   ```

 4. Perform any pending upgrade and clear out old pins:

   ```plaintext
        : Check for pinned, on hold, packages, and possibly disable &&
        rm -f /etc/apt/preferences /etc/apt/preferences.d/* &&
        rm -f /etc/apt/sources.list.d/backports.debian.org.list &&
@@ -98,10 +91,10 @@ WARNING: this procedure has not been tested yet on TPA infrastructure. It's a me
        : if possible, switch to official packages by disabling third-party repositories &&
        apt-forktracer | sort &&
        printf "End of Step 4\a\n"
   ```
5. Check free space (see [this guide to free up space](http://www.debian.org/releases/buster/amd64/release-notes/ch-upgrading.en.html#sufficient-space)), disable auto-upgrades, and download packages:

   ```plaintext
 5. Check free space (see [this guide to free up space][]), disable
    auto-upgrades, and download packages:

        systemctl stop apt-daily.timer &&
        sed -i 's#buster/updates#bullseye-security#' /etc/apt/sources.list $(ls /etc/apt/sources.list.d/*) &&
        sed -i 's/buster/bullseye/g' /etc/apt/sources.list $(ls /etc/apt/sources.list.d/*) &&
@@ -112,18 +105,16 @@ WARNING: this procedure has not been tested yet on TPA infrastructure. It's a me
        apt -y -d upgrade &&
        apt -y -d dist-upgrade &&
        printf "End of Step 5\a\n"
   ```

 6. Actual upgrade run:

   ```plaintext
        env DEBIAN_FRONTEND=noninteractive APT_LISTCHANGES_FRONTEND=none APT_LISTBUGS_FRONTEND=none apt full-upgrade -y -o Dpkg::Options::='--force-confdef' -o Dpkg::Options::='--force-confold' &&
        printf "\a" &&
        /home/anarcat/src/koumbit-scripts/vps/clean_conflicts &&
        printf "End of Step 6\a\n"
   ```

 7. Post-upgrade procedures:

   ```plaintext
        apt-get update --allow-releaseinfo-change &&
        puppet agent --enable &&
        puppet agent -t --noop &&
@@ -132,10 +123,9 @@ WARNING: this procedure has not been tested yet on TPA infrastructure. It's a me
        systemctl start apt-daily.timer &&
        printf "End of Step 7\a\n" &&
        shutdown -r +1 "rebooting to get rid of old kernel image..."
   ```

 8. Post-upgrade checks:

   ```plaintext
        export LC_ALL=C.UTF-8 &&
        sudo ttyrec -a -e screen /var/log/upgrade-bullseye.ttyrec

@@ -152,30 +142,43 @@ WARNING: this procedure has not been tested yet on TPA infrastructure. It's a me
        apt-forktracer | sort
        printf "End of Step 8\a\n"
        shutdown -r +1 "testing reboots one final time"
   ```
9. Change the hostgroup of the host to bullseye in Nagios (in `tor-nagios/config/nagios-master.cfg` on `git@git-rw.tpo`)

 9. Change the hostgroup of the host to bullseye in Nagios (in
    `tor-nagios/config/nagios-master.cfg` on `git@git-rw.tpo`)

[this guide to free up space]: http://www.debian.org/releases/buster/amd64/release-notes/ch-upgrading.en.html#sufficient-space
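
The two `sed` substitutions in step 5 are order-sensitive: the security suite has to be renamed before the generic `buster` to `bullseye` replacement, otherwise `buster/updates` would become `bullseye/updates`, which is not the name of the bullseye security suite. Below is a minimal sketch to try the rewrite on sample input rather than on the real `sources.list`. The `rewrite_sources` function name and the sample lines are made up for illustration and are not part of the procedure.

```shell
# Hypothetical helper: apply the same two substitutions as step 5,
# reading from stdin and writing to stdout instead of editing in place.
rewrite_sources() {
    sed -e 's#buster/updates#bullseye-security#' \
        -e 's/buster/bullseye/g'
}

# Sample input, not taken from any real host.
printf '%s\n' \
    'deb https://deb.debian.org/debian buster main' \
    'deb https://deb.debian.org/debian-security buster/updates main' \
    | rewrite_sources
```

Piping a copy of the real file through the function and comparing it with `diff` is one way to preview the change before running the in-place `sed -i` commands.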

# Service-specific upgrade procedures

## RT and PostgreSQL upgrades

Both of those required special handling in [buster](howto/buster), probably going to be similar here.
Both of those required special handling in [buster](howto/buster), probably going
to be similar here.

# Notable changes

Here is a list of notable changes from a system administration perspective:
Here is a list of notable changes from a system administration
perspective:

 * new: [driverless scanning and printing](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-whats-new.en.html#driverless-operation)
* persistent systemd journal, which might have some privacy issues (`rm -rf /var/log/journal` to disable, see [journald.conf(5)](https://manpages.debian.org/bullseye/systemd/journald.conf.5.en.html))
 * persistent systemd journal, which might have some privacy issues
   (`rm -rf /var/log/journal` to disable, see [journald.conf(5)](https://manpages.debian.org/bullseye/systemd/journald.conf.5.en.html))
 * last release to support non-merged /usr
 * security archive changed to `deb https://deb.debian.org/debian-security bullseye-security main contrib` (covered by script above)
* [password hashes have changed](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#pam-default-password) to [yescrypt](https://www.openwall.com/yescrypt/) (recognizable from its `$y$` prefix), a major change from the previous default, SHA-512 (recognizable from its `$6$` prefix), see also [crypt(5)](https://manpages.debian.org/crypt.5) (in bullseye), [crypt(3)](https://manpages.debian.org/crypt.3) (in buster), and `mkpasswd -m help` for a list of supported hashes on whatever
 * [password hashes have changed](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#pam-default-password) to [yescrypt](https://www.openwall.com/yescrypt/) (recognizable
   from its `$y$` prefix), a major change from the previous default,
   SHA-512 (recognizable from its `$6$` prefix), see also
   [crypt(5)](https://manpages.debian.org/crypt.5) (in bullseye), [crypt(3)](https://manpages.debian.org/crypt.3) (in buster), and
   `mkpasswd -m help` for a list of supported hashes on whatever
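
The hash prefixes mentioned above make it possible to tell which scheme a given password hash uses. A minimal sketch, assuming only the two prefixes named in this page; the `hash_scheme` function is hypothetical and the sample hashes are placeholders, not real password hashes:

```shell
# Hypothetical helper: name the scheme of a crypt-style hash from its prefix.
# Only the two prefixes discussed above are recognized.
hash_scheme() {
    case $1 in
        '$y$'*) echo yescrypt ;;
        '$6$'*) echo sha512crypt ;;
        *)      echo unknown ;;
    esac
}

# Placeholder strings for illustration only.
hash_scheme '$y$j9T$examplesalt$examplehash'
hash_scheme '$6$examplesalt$examplehash'
```

Accounts whose password was set before the upgrade keep their old `$6$` hash until the password is changed, so a mix of both prefixes is to be expected for a while.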

There is a more [exhaustive review of server-level changes from mikas](https://michael-prokop.at/blog/2021/05/27/what-to-expect-from-debian-bullseye-newinbullseye/) as well. Notable:
There is a more [exhaustive review of server-level changes from
mikas](https://michael-prokop.at/blog/2021/05/27/what-to-expect-from-debian-bullseye-newinbullseye/) as well. Notable:

* `kernel.unprivileged_userns_clone` enabled by default ([bug 898446](https://bugs.debian.org/898446))
 * `kernel.unprivileged_userns_clone` enabled by default ([bug
   898446](https://bugs.debian.org/898446))
 * Prometheus [hardening](https://salsa.debian.org/go-team/packages/prometheus/-/commit/62017e7de3f9e5ae02bc842cabd3b2da69fb354f), initiated by anarcat
* Ganeti has a major upgrade! there were concerns about the upgrade path, not sure how that turned out
 * Ganeti has a major upgrade! there were concerns about the upgrade
   path, not sure how that turned out

## New packages

@@ -184,21 +187,29 @@ There is a more [exhaustive review of server-level changes from mikas](https://m
## Updated packages

This table summarizes package version changes I find interesting.

| Package     | Buster | Bullseye | Notes                                                                                                                   |
|---------|--------|----------|-------|
|-------------|--------|----------|-------------------------------------------------------------------------------------------------------------------------|
| Docker      | 18     | 20       | Docker made it for a second release                                                                                     |
| Emacs | 26 | 27 | JSON parsing for LSP? \~/.config/emacs/? harfbuzz?? oh my! [details](https://emacsredux.com/blog/2020/08/13/emacs-27-1/) |
| Emacs       | 26     | 27       | JSON parsing for LSP? ~/.config/emacs/? harfbuzz?? oh my! [details](https://emacsredux.com/blog/2020/08/13/emacs-27-1/) |
| Ganeti      | 2.16.0 | 3.0.1    | breaking upgrade?                                                                                                       |
| OpenSSH | 7.9 | 8.4 | [FIDO/U2F, Include](http://www.openssh.com/txt/release-8.2), [signatures](http://www.openssh.com/txt/release-8.1), [quantum-resistant key exchange, key fingerprint as confirmation](http://www.openssh.com/txt/release-8.0) |
| OpenSSH     | 7.9    | 8.4      | [FIDO/U2F, Include][8.2], [signatures][8.1], [quantum-resistant key exchange, key fingerprint as confirmation][8.0]     |
| Postgresql  | 11     | 13       |                                                                                                                         |
| Python      | 3.7    | 3.9      | walrus operator, importlib.metadata, dict unions, zoneinfo                                                              |
| Puppet      | 5.5    | 5.5      | Missed the Puppet 6 (and 7!) releases                                                                                   |

Note that this table may not be up to date with the current bullseye release. See the [official release notes](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-whats-new.en.html#newdistro) for a more up to date list.
[8.0]: http://www.openssh.com/txt/release-8.0
[8.1]: http://www.openssh.com/txt/release-8.1
[8.2]: http://www.openssh.com/txt/release-8.2

Note that this table may not be up to date with the current bullseye
release. See the [official release notes](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-whats-new.en.html#newdistro) for a more up to date
list.

## Removed packages

* Python 2 support is removed! hopefully most of my stuff is already Python 3, but I did lose monkeysign and gameclock, as mentioned above
 * Python 2 support is removed! hopefully most of my stuff is already
   Python 3, but I did lose monkeysign and gameclock, as mentioned above
 * Mailman 2 is consequently removed

See also the [noteworthy obsolete packages](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-information.en.html#noteworthy-obsolete-packages) list.
@@ -207,17 +218,25 @@ See also the [noteworthy obsolete packages](https://www.debian.org/releases/bull

### usrmerge

It might be important to install `usrmerge` package as well, considering that [merged /usr will be the default in bullseye + 1](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978636#178). This, however, can be done _after_ the upgrade but needs to be done _before_ the next major upgrade (Debian 12, bookworm).
It might be important to install `usrmerge` package as well,
considering that [merged /usr will be the default in bullseye +
1](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978636#178). This, however, can be done *after* the upgrade but needs to be
done *before* the next major upgrade (Debian 12, bookworm).

The (bullseye) installers should be tweaked to remove the `--no-merged-usr` everywhere, in any case. See [ticket 40367](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40367).
The (bullseye) installers should be tweaked to remove the
`--no-merged-usr` everywhere, in any case. See [ticket 40367](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40367).

### slapd

OpenLDAP dropped support for all backends but [slapd-mdb](https://manpages.debian.org//bullseye/slapd/slapd-mdb.5.html). This will require a migration on the LDAP server.
OpenLDAP dropped support for all backends but [slapd-mdb](https://manpages.debian.org//bullseye/slapd/slapd-mdb.5.html). This
will require a migration on the LDAP server.

### apt-key

The `apt-key` command is deprecated and should not be used. Files should be dropped in `/etc/apt/trusted.gpg.d` or (preferably) into an outside directory (we typically use `/usr/share/keyrings`). It is believed that we already do the correct thing here.
The `apt-key` command is deprecated and should not be used. Files
should be dropped in `/etc/apt/trusted.gpg.d` or (preferably) into an
outside directory (we typically use `/usr/share/keyrings`). It is
believed that we already do the correct thing here.

# Issues

@@ -226,30 +245,43 @@ See also the official list of [known issues](https://www.debian.org/releases/bul
## Pending

 * (from buster:) upgrading restarts openvswitch, which means all guests lose network
* (from buster:) Puppet might try to downgrade the `sources.list` files to `stretch` or `n/a` for some reason, just re-run Puppet after fixing the `sources.list` files, it will eventually figure it out.
* (from buster:) a bunch of config files from Puppet had conflicts needing to be resolved and that we should really, really finish upgrading in Puppet now

 * (from buster:) Puppet might try to downgrade the `sources.list`
   files to `stretch` or `n/a` for some reason, just re-run Puppet
   after fixing the `sources.list` files, it will eventually figure it
   out.

 * (from buster:) a bunch of config files from Puppet had conflicts
   needing to be resolved and that we should really, really finish
   upgrading in Puppet now

 * (from buster:) ferm fails to reload during upgrade, with the following error:
 
  ```plaintext
        Couldn't load match `state':No such file or directory
  ```
* The official list of [known issues](https://www.debian.org/releases/buster/amd64/release-notes/ch-information.en.html)

 * The official list of [known issues][]

[known issues]: https://www.debian.org/releases/buster/amd64/release-notes/ch-information.en.html

## Resolved

### tor-nagios-checks tempfile

[This patch](https://gitweb.torproject.org/admin/tor-nagios.git/commit/?id=661d0dbb00a2876da21f3c66c18d8eb8f8cae790) was necessary to port from `tempfile` to `mktemp` in that TPA-specific Debian package.
[This patch](https://gitweb.torproject.org/admin/tor-nagios.git/commit/?id=661d0dbb00a2876da21f3c66c18d8eb8f8cae790) was necessary to port from `tempfile` to `mktemp` in
that TPA-specific Debian package.

# Troubleshooting

## Upgrade failures

Instructions on errors during upgrades can be found in [the release notes troubleshooting section](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-upgrading.en.html#trouble).
Instructions on errors during upgrades can be found in [the release
notes troubleshooting section](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-upgrading.en.html#trouble).

## Reboot failures

If there's any trouble during reboots, you should use some recovery system. The [release notes actually have good documentation on that](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-upgrading.en.html#recovery), on top of "use a live filesystem".
If there's any trouble during reboots, you should use some recovery
system. The [release notes actually have good documentation on
that](https://www.debian.org/releases/bullseye/amd64/release-notes/ch-upgrading.en.html#recovery), on top of "use a live filesystem".

# References

@@ -262,49 +294,69 @@ If there's any trouble during reboots, you should use some recovery system. The

# Fleet-wide changes

The following changes need to be performed _once_ for the entire fleet, generally at the beginning of the upgrade process.
The following changes need to be performed *once* for the entire
fleet, generally at the beginning of the upgrade process.

## installer changes

The installers need to be changed to support the new release. This includes:
The installers need to be changed to support the new release. This
includes:

* the Ganeti installers (add a `gnt-instance-debootstrap` variant, `modules/profile/manifests/ganeti.pp` in `tor-puppet.git`, see commit 4d38be42 for an example)
* the (deprecated) libvirt installer (`modules/roles/files/virt/tor-install-VM`, in `tor-puppet.git`)
 * the Ganeti installers (add a `gnt-instance-debootstrap` variant,
   `modules/profile/manifests/ganeti.pp` in `tor-puppet.git`, see
   commit 4d38be42 for an example)
 * the (deprecated) libvirt installer
   (`modules/roles/files/virt/tor-install-VM`, in `tor-puppet.git`)
 * the wiki documentation:
  * create a new page like this one documenting the process, linked from [howto/upgrades](howto/upgrades)
  * make an entry in the `data.csv` to start tracking progress (see below), copy the `Makefile` as well, changing the suite name
  * change the [Ganeti procedure](howto/ganeti#adding-a-new-instance) so that the new suite is used by default
   * create a new page like this one documenting the process, linked
     from [howto/upgrades](howto/upgrades)
   * make an entry in the `data.csv` to start tracking progress (see
     below), copy the `Makefile` as well, changing the suite name
   * change the [Ganeti procedure](howto/ganeti#adding-a-new-instance) so that the new suite is used by
     default
   * change the [Hetzner robot](howto/new-machine-hetzner-robot) install procedure
 * `tsa-misc` and the fabric installer (TODO)

## Debian archive changes

The Debian archive on `db.torproject.org` (currently alberti) needs to have a new suite added. This can be (partly) done by editing files in `/srv/db.torproject.org/ftp-archive/`. Specifically, the two following files need to be changed:
The Debian archive on `db.torproject.org` (currently alberti) needs to
have a new suite added. This can be (partly) done by editing files in
`/srv/db.torproject.org/ftp-archive/`. Specifically, the two following
files need to be changed:

* `apt-ftparchive.config`: a new stanza for the suite, basically copy-pasting from a previous entry and changing the suite
 * `apt-ftparchive.config`: a new stanza for the suite, basically
   copy-pasting from a previous entry and changing the suite
 * `Makefile`: add the new suite to the for loop

But it is not enough: the directory structure needs to be crafted by hand as well. A simple way to do so is to replicate a previous release structure:
But it is not enough: the directory structure needs to be crafted by
hand as well. A simple way to do so is to replicate a previous release
structure:

```plaintext
    cd /srv/db.torproject.org/ftp-archive
    rsync -a --include='*/' --exclude='*' archive/dists/buster/  archive/dists/bullseye/
```
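
The `rsync` call above copies directories only: `--include='*/'` keeps every directory and `--exclude='*'` drops everything else. As a rough sketch of the same idea without `rsync`, the tree can be recreated with `find` and `mkdir`. This swaps in a different technique, and the `replicate_dirs` name is hypothetical; it can be tried on a scratch directory before touching the archive.

```shell
# Hypothetical helper: recreate the directory tree of $1 under $2,
# without copying any files (same intent as the rsync filter rules).
replicate_dirs() {
    src=$1
    dst=$2
    # List directories relative to the source, then create each one
    # under the destination.
    (cd "$src" && find . -type d) | while IFS= read -r dir; do
        mkdir -p "$dst/$dir"
    done
}
```

For example, `replicate_dirs archive/dists/buster archive/dists/bullseye`, run from the `ftp-archive` directory, would be the counterpart of the `rsync` command, assuming the same layout.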

# Per host progress

When a critical mass of servers have been upgraded and only "hard" ones remain, they can be turned into tickets and tracked in GitLab. In the meantime...
When a critical mass of servers have been upgraded and only "hard"
ones remain, they can be turned into tickets and tracked in GitLab. In
the meantime...

A list of servers to upgrade can be obtained with:

```plaintext
    curl -s -G http://localhost:8080/pdb/query/v4 --data-urlencode 'query=nodes { facts { name = "lsbdistcodename" and value != "bullseye" }}' | jq .[].certname | sort
```

Policy established in [howto/upgrades](howto/upgrades).

![graph showing planned completion date, currently around September 2020](/howto/upgrades/bullseye/predict.png)
<figure>
<img alt="graph showing planned completion date, currently around September 2020" src="/howto/upgrades/bullseye/predict.png" />
<figcaption>

The above graphic shows the progress of the migration between major releases. It can be regenerated with the [predict-os](https://gitlab.com/anarcat/predict-os) script. It pulls information from [howto/puppet](howto/puppet) to update a [CSV file](data.csv) to keep track of progress over time.
The above graphic shows the progress of the migration between major
releases. It can be regenerated with the [predict-os](https://gitlab.com/anarcat/predict-os) script. It
pulls information from [howto/puppet](howto/puppet) to update a [CSV file](data.csv) to
keep track of progress over time.

WARNING: the graph may be incorrect or missing as the upgrade procedure ramps up.
 No newline at end of file
WARNING: the graph may be incorrect or missing as the upgrade
procedure ramps up.
</figcaption>
</figure>