add another pager playbook (for #41770)
authored by anarcat
This page really is a mess now, ugh.
@@ -103,6 +103,68 @@ Look at the list of packages to be upgraded, and consider upgrading
them manually, with Cumin (see below), or individually, by logging
into the host over SSH directly.
## Out of date package lists
The `AptUpdateLagging` alert looks like this:

    Package lists on test.torproject.org are out of date

It means that `apt-get update` has not run recently enough. This could
be caused by a problem with the mirrors or by an attacker blocking
updates, but it is more likely a misconfiguration of some sort.

You can reproduce the issue by manually running the textfile collector
responsible for this metric:

    /usr/share/prometheus-node-exporter-collectors/apt_info.py

Example:

    root@perdulce:~# /usr/share/prometheus-node-exporter-collectors/apt_info.py
    # HELP apt_upgrades_pending Apt packages pending updates by origin.
    # TYPE apt_upgrades_pending gauge
    apt_upgrades_pending{origin="",arch=""} 0
    # HELP apt_upgrades_held Apt packages pending updates but held back.
    # TYPE apt_upgrades_held gauge
    apt_upgrades_held{origin="",arch=""} 0
    # HELP apt_autoremove_pending Apt packages pending autoremoval.
    # TYPE apt_autoremove_pending gauge
    apt_autoremove_pending 21
    # HELP apt_package_cache_timestamp_seconds Apt update last run time.
    # TYPE apt_package_cache_timestamp_seconds gauge
    apt_package_cache_timestamp_seconds 1727313209.2261558
    # HELP node_reboot_required Node reboot is required for software updates.
    # TYPE node_reboot_required gauge
    node_reboot_required 0

The `apt_package_cache_timestamp_seconds` metric is the one triggering
the alert. It is the time `apt update` last ran, in seconds since the
Unix epoch; compare it with the output of `date +%s` to see how far
behind it is.
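
The comparison with `date +%s` can also be scripted, for example when
checking several hosts. This is a minimal Python sketch, assuming you
already have the collector's text output as a string (for example
captured from `apt_info.py`); `cache_timestamp` and `cache_age_seconds`
are hypothetical helper names, not part of the collector.

```python
import time
from typing import Optional

# Metric name taken from the apt_info.py output shown above.
METRIC = "apt_package_cache_timestamp_seconds"


def cache_timestamp(collector_output: str) -> float:
    """Extract the apt package cache timestamp from the collector's text output."""
    for line in collector_output.splitlines():
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Unlabelled samples look like "<name> <value>".
        name, _, value = line.partition(" ")
        if name == METRIC:
            return float(value)
    raise ValueError(f"{METRIC} not found in collector output")


def cache_age_seconds(collector_output: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the package lists were last updated."""
    if now is None:
        # time.time() counts from the same epoch as `date +%s`.
        now = time.time()
    return now - cache_timestamp(collector_output)
```

Dividing the result by 3600 gives the lag in hours, which is usually
easier to reason about than a raw timestamp.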

Try running `apt update` by hand to see if it fixes the issue:

    apt update
    /usr/share/prometheus-node-exporter-collectors/apt_info.py | grep timestamp

If it does, the periodic job that should refresh the package lists is
not running. Normally, unattended upgrades should update the package
lists regularly, so check whether the systemd timer is properly
configured:

    systemctl status apt-daily.timer

You can see the latest output of that job with:

    journalctl -e -u apt-daily.service
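
The same check can be run non-interactively with `systemctl show`,
which prints unit properties as `KEY=VALUE` lines. Below is a minimal
Python sketch, assuming that a healthy timer is both `enabled` and
`active`; the helper names are hypothetical, and parsing is kept
separate from the `systemctl` call so it can be tried on captured
output.

```python
import subprocess
from typing import Dict

# Timer unit named in the commands above.
UNIT = "apt-daily.timer"


def parse_systemctl_show(output: str) -> Dict[str, str]:
    """Turn `systemctl show` KEY=VALUE lines into a dictionary."""
    props: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore anything that is not KEY=VALUE
            props[key] = value
    return props


def timer_looks_healthy(props: Dict[str, str]) -> bool:
    """Assumption: a working timer is enabled and currently active (waiting)."""
    return (
        props.get("UnitFileState") == "enabled"
        and props.get("ActiveState") == "active"
    )


def fetch_timer_props(unit: str = UNIT) -> Dict[str, str]:
    """Query systemd for a few properties of the timer unit."""
    result = subprocess.run(
        ["systemctl", "show", unit,
         "--property=ActiveState,UnitFileState,LastTriggerUSec"],
        capture_output=True, text=True, check=True,
    )
    return parse_systemctl_show(result.stdout)
```

If `timer_looks_healthy(fetch_timer_props())` is `False`, or
`LastTriggerUSec` is much older than a day, the timer is a more likely
culprit than the mirrors.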

Normally, that job updates the package lists automatically if the
`APT::Periodic::Update-Package-Lists` setting is set to `1`. The setting
is typically in `/etc/apt/apt.conf.d/10periodic`, but it could be in
any other file under `/etc/apt/apt.conf.d`.
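
Since the setting can live in any fragment, it helps to find which
file actually sets it. This is a minimal Python sketch, assuming the
simple one-line `KEY "value";` form; it only approximates the
`apt.conf(5)` file naming and ordering rules, and
`find_update_package_lists` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

KEY = "APT::Periodic::Update-Package-Lists"
# Only the flat form is matched, e.g.: APT::Periodic::Update-Package-Lists "1";
# Nested "APT { Periodic { ... } }" blocks and /* */ comments are not handled.
SETTING_RE = re.compile(r'^\s*' + re.escape(KEY) + r'\s+"([^"]*)"\s*;', re.MULTILINE)
# Approximation of the characters apt accepts in apt.conf.d file names.
ALLOWED_CHARS = re.compile(r"^[A-Za-z0-9_.-]+$")


def is_read_by_apt(path: Path) -> bool:
    """Roughly: allowed characters and either no extension or ".conf"."""
    name = path.name
    return bool(ALLOWED_CHARS.match(name)) and ("." not in name or name.endswith(".conf"))


def find_update_package_lists(
    confdir: str = "/etc/apt/apt.conf.d",
) -> Optional[Tuple[str, str]]:
    """Return (value, file) of the last matching setting, or None if unset."""
    found = None
    # Fragments are read in ascending order; later settings override earlier ones.
    for path in sorted(Path(confdir).iterdir()):
        if not path.is_file() or not is_read_by_apt(path):
            continue
        for match in SETTING_RE.finditer(path.read_text(errors="replace")):
            found = (match.group(1), str(path))
    return found
```

As a cross-check, `apt-config dump | grep Update-Package-Lists` shows
the value apt actually ends up with after merging all fragments,
including syntax this sketch ignores.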

Before the transition to Prometheus, NRPE checks also refreshed the
package lists, so it's possible their retirement broke this; see also
[#41770](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41770).

## Manual upgrades with Cumin
It's also possible to do a manual mass-upgrade run with
......
......