document blocked upgrades more directly (#41671), authored by anarcat
The previous runbook didn't mention the alert directly and might
have been a little jarring.

Now that we have a magic command to dump the packages pending upgrade,
use it!
@@ -72,6 +72,37 @@ that new `sources.list` entries be paired with a "pin" (see
[apt_preferences(5)](https://manpages.debian.org/apt_preferences.5)). See also [tpo/tpa/team#40771](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40771) for a
discussion and rationale of that change.
## Blocked upgrades
<!-- note that this section is cross-referenced from the -->
<!-- PackagesPendingTooLong alert in prometheus-alerts.git; change the -->
<!-- link target there if you change the heading here. -->
If you receive an alert like:

    Packages pending on test.example.com for a week

It's because unattended-upgrades has failed to upgrade packages on
the given host for over a week. Either the upgrade itself failed or,
more likely, the package is not allowed to upgrade automatically.
The list of affected hosts and packages can be inspected with the
following [fabric](howto/fabric) command:

    fab -H pauli.torproject.org host.all-pending-upgrades

Note that this will *also* catch hosts with pending upgrades that
*may* still be upgraded automatically by unattended-upgrades, as it
checks the metric directly rather than the alerts. You can use the
`--query` parameter to restrict the list to the alerting hosts instead:

    fab -H pauli.torproject.org host.all-pending-upgrades --query='ALERTS{alertname="PackagesPendingTooLong",alertstate="firing"}'

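The nested quotes in that query are easy to get wrong when typing it by hand. As a minimal sketch, the selector can be built by a small shell helper. The function names `firing_alert_query` and `pending_upgrades_for_alert` are hypothetical and not part of the fabric tooling. Only the `fab` invocation itself comes from this runbook.

```shell
#!/bin/sh
# Hypothetical helper: print the PromQL selector matching a firing alert.
# The alert name is the only argument.
firing_alert_query() {
    printf 'ALERTS{alertname="%s",alertstate="firing"}' "$1"
}

# Hypothetical wrapper around the fabric task documented above.
# Defaults to the PackagesPendingTooLong alert when no name is given.
pending_upgrades_for_alert() {
    fab -H pauli.torproject.org host.all-pending-upgrades \
        --query="$(firing_alert_query "${1:-PackagesPendingTooLong}")"
}
```

Calling `pending_upgrades_for_alert` with no argument should then be equivalent to the command shown above.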
Look at the list of packages to be upgraded, and consider upgrading
them manually, with Cumin (see below), or individually, by logging
into the host over SSH directly.
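When working on a single host, it can help to look at what is pending before upgrading anything. The following is a rough sketch, assuming a Debian host with `unattended-upgrades` installed and a root shell. The function name `inspect_pending_upgrades` is made up for illustration, and the exact steps are an assumption rather than an established TPA procedure.

```shell
#!/bin/sh
# Hypothetical helper, to be run as root on the affected host.
inspect_pending_upgrades() {
    # Refresh the package lists so the view of pending upgrades is current.
    apt-get update
    # List packages that have a newer version available.
    apt list --upgradable
    # Show packages that were explicitly put on hold, if any.
    apt-mark showhold
    # Simulate an unattended-upgrades run with debug output, which reports
    # which packages it would upgrade and which ones it skips.
    unattended-upgrade --dry-run --debug
}
```

If the dry run shows the package is simply not eligible for automatic upgrades, it can then be upgraded by hand, for example with `apt-get install --only-upgrade PACKAGE`, where `PACKAGE` is a placeholder for the package name from the list.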
## Manual upgrades with Cumin
It's also possible to do a manual mass-upgrade run with
......