add another pager playbook (for #41770), authored by anarcat
This page really is a mess now, ugh.
@@ -103,6 +103,68 @@
Look at the list of packages to be upgraded, and consider upgrading
them manually, with Cumin (see below), or individually, by logging
into the host over SSH directly.
## Out of date package lists
The `AptUpdateLagging` alert looks like this:

    Package lists on test.torproject.org are out of date
It means that `apt-get update` has not run recently enough. This could
be caused by a problem with the mirrors, an attacker blocking updates,
or, more likely, a misconfiguration of some sort.
You can reproduce the issue by running, by hand, the textfile
collector responsible for this metric:

    /usr/share/prometheus-node-exporter-collectors/apt_info.py
Example:

    root@perdulce:~# /usr/share/prometheus-node-exporter-collectors/apt_info.py
    # HELP apt_upgrades_pending Apt packages pending updates by origin.
    # TYPE apt_upgrades_pending gauge
    apt_upgrades_pending{origin="",arch=""} 0
    # HELP apt_upgrades_held Apt packages pending updates but held back.
    # TYPE apt_upgrades_held gauge
    apt_upgrades_held{origin="",arch=""} 0
    # HELP apt_autoremove_pending Apt packages pending autoremoval.
    # TYPE apt_autoremove_pending gauge
    apt_autoremove_pending 21
    # HELP apt_package_cache_timestamp_seconds Apt update last run time.
    # TYPE apt_package_cache_timestamp_seconds gauge
    apt_package_cache_timestamp_seconds 1727313209.2261558
    # HELP node_reboot_required Node reboot is required for software updates.
    # TYPE node_reboot_required gauge
    node_reboot_required 0
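The `apt_package_cache_timestamp_seconds` metric is the one triggering
the alert. It is the time of the last package list update, expressed as
the number of seconds since the Unix epoch. Compare it to the output of
`date +%s` to see how stale the package lists are.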
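
The comparison can also be scripted. This is only a minimal sketch: the
`cache_age` helper is a hypothetical name, not part of the collector, and
the values passed to it are the sample timestamp from the output above
plus an arbitrary "now" exactly one day later.

```shell
#!/bin/sh
# cache_age TIMESTAMP [NOW]
# Print the age of the package cache in whole seconds.
# TIMESTAMP is the apt_package_cache_timestamp_seconds value (may be fractional).
# NOW is optional and defaults to the current time from `date +%s`.
# (Hypothetical helper, for illustration only.)
cache_age() {
    ts=${1%.*}                  # strip the fractional part, if any
    now=${2:-$(date +%s)}       # fall back to the current time
    echo $((now - ts))
}

# Sample timestamp from the example output, compared against a fixed "now"
# one day later, so the result is reproducible.
cache_age 1727313209.2261558 1727399609   # prints 86400
```

On a real host, the timestamp could be fed in from the collector itself,
for example with
`cache_age "$(/usr/share/prometheus-node-exporter-collectors/apt_info.py | awk '/^apt_package_cache_timestamp_seconds /{print $2}')"`.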
Try to run `apt update` by hand to see if it fixes the issue:

    apt update
    /usr/share/prometheus-node-exporter-collectors/apt_info.py | grep timestamp
If it does, it means the periodic job that refreshes the package lists
is missing. Normally, unattended upgrades should update the package
lists regularly. Check that the service timer is properly configured:

    systemctl status apt-daily.timer
You can see the latest output of that job with:

    journalctl -e -u apt-daily.service
Normally, the package lists are updated automatically by that job, if
the `APT::Periodic::Update-Package-Lists` setting (typically in
`/etc/apt/apt.conf.d/10periodic`, but it could be elsewhere in
`/etc/apt/apt.conf.d`) is set to 1.
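
Since the setting can live in any file under `/etc/apt/apt.conf.d`, it
can be easier to ask `apt-config` for the effective value than to hunt
for the file. The following is a sketch under assumptions: the
`update_lists_interval` helper is a hypothetical name, and it expects
input in the `Name "value";` form that `apt-config dump` prints.

```shell
#!/bin/sh
# update_lists_interval
# Read `apt-config dump` output on stdin and print the value of
# APT::Periodic::Update-Package-Lists, or "unset" if it is absent.
# (Hypothetical helper, for illustration only.)
update_lists_interval() {
    value=$(sed -n 's/^APT::Periodic::Update-Package-Lists "\(.*\)";$/\1/p' | tail -n 1)
    echo "${value:-unset}"
}

# On a real host: apt-config dump | update_lists_interval
# Here a sample line is fed in so the sketch runs anywhere.
printf 'APT::Periodic::Update-Package-Lists "1";\n' | update_lists_interval
```

A result of `0` or `unset` would suggest that the periodic update of the
package lists is disabled on that host.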
Before the transition to Prometheus, NRPE checks were also updating
the package lists. It's possible that retiring those checks broke
this; see also [#41770](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41770).
## Manual upgrades with Cumin

It's also possible to do a manual mass-upgrade run with
...