Move up example for ganeti- + non-ganeti- reboots for full fleet authored by lelutin
This is the most likely scenario so it should appear first in the
documentation.

I've also modified the example command for running non-ganeti reboots so
that it directly launches the reboots and it automatically excludes
ganeti nodes.
......@@ -378,6 +378,32 @@ You can see the list of pending reboots with this Fabric task:
See below for how to handle specific situations.
## Full fleet reboot
This is the most likely scenario, especially when we have been able to upgrade all of
the servers to the same stable release of Debian.
In this case, the fastest way to run reboots is to reboot Ganeti nodes with all
of their contained instances in order to clear out reboots for many servers at
once, then reboot the hosts that are not in Ganeti.
### Rebooting Ganeti nodes
See the [Ganeti reboot procedures](howto/ganeti#rebooting) for this procedure.
### Remaining nodes
The [Karma alert
dashboard](https://karma.torproject.org/?q=%40state%3Dactive&q=alertname%3DNeedsReboot)
will show remaining hosts that might have been missed by the above procedure.
But if you want to run more upgrades in parallel and are doing a
fleet-wide reboot, while running the Ganeti reboots (above), you can
perform reboots on the hosts _not_ on a Ganeti cluster by pulling the
list of hosts from LDAP:
fab -H $(ssh db.torproject.org 'ldapsearch -H ldap://db.torproject.org -x -ZZ -b "ou=hosts,dc=torproject,dc=org" "(!(physicalHost=gnt-*))" hostname' | sed -n '/hostname/{s/hostname: //;p}' | grep -v ".*-node-[0-9]\+\|^#" | paste -sd ',') fleet.reboot-host
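To check which hosts the one-liner above would select before launching any reboot, the filtering part of the pipeline can be run on its own. This is a minimal sketch, not part of the documented procedure: the function name `filter_hosts` and the sample LDIF lines are hypothetical, and it assumes GNU `sed` and `grep` as shipped on Debian.

```shell
# Hypothetical helper: read ldapsearch LDIF output on stdin and print a
# comma-separated host list, using the same filters as the command above.
filter_hosts() {
    # keep only lines mentioning "hostname" and strip the attribute name
    sed -n '/hostname/{s/hostname: //;p}' \
        | grep -v ".*-node-[0-9]\+\|^#" \
        | paste -sd ',' -
    # grep drops Ganeti nodes (names containing -node-NN) and LDIF comments
}

# Made-up LDIF input, so no LDAP access is needed to try the filter:
printf '%s\n' \
    '# requesting: hostname' \
    'dn: host=example-01,ou=hosts,dc=torproject,dc=org' \
    'hostname: example-01.torproject.org' \
    'hostname: fsn-node-01.torproject.org' \
    'hostname: example-02.torproject.org' \
    | filter_hosts
```

With real data, the `ssh db.torproject.org 'ldapsearch ...'` part of the command above would be piped into `filter_hosts`, and the resulting list reviewed before it is passed to `fab -H`.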
## Rebooting a single host
If this is only a virtual machine, and the only one affected, it can
......@@ -454,25 +480,6 @@ And this is the list of all *physical* hosts with a pending upgrade, alphabetica
fab -H $(ssh puppetdb-01.torproject.org "curl -s -G http://localhost:8080/pdb/query/v4 --data-urlencode 'query=inventory[certname] { facts.apt_reboot_required = true and facts.virtual = \"physical\" }'" | jq -r '.[].certname' | sort | paste -sd ',')
## Rebooting Ganeti nodes
See the [Ganeti reboot procedures](howto/ganeti#rebooting) for this procedure.
## Remaining nodes
The [Nagios unhandled problems](https://nagios.torproject.org/cgi-bin/icinga/status.cgi?allunhandledproblems) will show remaining hosts that
might have been missed by the above procedure.
But if you want to run more upgrades in parallel and are doing a
fleet-wide reboot, while running the Ganeti reboots (above), you can
perform reboots on the hosts *not* on a Ganeti cluster by pulling the
list of hosts from LDAP:
ldapsearch -H ldap://db.torproject.org -x -ZZ -b "ou=hosts,dc=torproject,dc=org" '(!(physicalHost=gnt-*))' hostname | sed -n '/hostname/{s/hostname: //;p}' | sort
... and then pick the hosts judiciously to avoid overlapping with
hosts in the same rotation currently rebooting in Ganeti.
## Userland reboots
systemd 254 (Debian 13 trixie and above) has a special command:
......
......