automate reboots

In legacy/trac#31957 we worked on automating upgrades, but that's only part of the problem: we also need to reboot in some situations.

we have various mechanisms to do so right now:

  • tsa-misc/reboot-host - reboot script for kvm boxes, kind of a mess, to be removed when we finish the kvm-ganeti migration
  • tsa-misc/reboot-guest - reboot a single host; kind of a hack, but useful for one-off reboots
  • misc/multi-tool/torproject-reboot-simple - iterate over all hosts with rebootPolicy=justdoit in LDAP and reboot them with torproject-reboot-many
  • misc/multi-tool/torproject-reboot-rotation - iterate over all hosts with rebootPolicy=rotation in LDAP and reboot them with torproject-reboot-many, with a 30 minute delay between each host
  • ganeti-reboot-cluster - a tool to reboot the ganeti cluster
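
The rotation behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual multi-tool code: the `Host` class, `select_hosts` and `reboot_in_rotation` are hypothetical names, the host list is assumed to have been fetched from LDAP already, and the `reboot` callable stands in for whatever torproject-reboot-many does.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Host:
    """Hypothetical stand-in for an LDAP host entry."""
    name: str
    reboot_policy: str  # e.g. "justdoit", "rotation" or "manual"


def select_hosts(hosts: Iterable[Host], policy: str) -> List[Host]:
    """Keep only the hosts whose rebootPolicy matches `policy`."""
    return [h for h in hosts if h.reboot_policy == policy]


def reboot_in_rotation(
    hosts: Iterable[Host],
    reboot: Callable[[Host], None],
    delay_seconds: int = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Reboot rebootPolicy=rotation hosts one at a time.

    Waits `delay_seconds` between two consecutive hosts (not before the
    first one). `reboot` and `sleep` are injected so the logic can be
    exercised without touching real machines.
    """
    rebooted = []
    for i, host in enumerate(select_hosts(hosts, "rotation")):
        if i > 0:
            sleep(delay_seconds)
        reboot(host)
        rebooted.append(host.name)
    return rebooted
```

Passing a different policy to `select_hosts` and a zero delay would cover the rebootPolicy=justdoit case with the same loop, which is one way a single script could replace both torproject-reboot-* variants.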

There are various problems with all this:

  • the torproject-reboot-* scripts do not take care of rebootPolicy=manual hosts (replaced with fabric)
  • the ganeti-reboot-cluster script has been known to fail if a cluster is unbalanced (the fabric script performs better)
  • the ganeti-reboot-cluster script currently fails when hosts talk to each other over IPv6 somehow, see legacy/trac#33412 (have not witnessed this in the fabric script)
  • we have 5 different ways of performing reboots, when we should have just one script that does it all (fixed in fabric)
  • reboot-{host,guest} do not check if hosts need a reboot before rebooting, but the multi-tool does (fixed in fabric)
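
The "check before rebooting" behaviour from the last item could look something like the sketch below. It is a guess at the shape of the check, not what the multi-tool or the fabric script actually does: it assumes the Debian convention of a `/run/reboot-required` flag file, and `needs_reboot` and `reboot_if_needed` are hypothetical helpers meant to run on the host itself.

```python
import os
from typing import Callable

# Assumption: Debian-style flag file left behind by package upgrades.
# The real tooling may rely on a different signal (e.g. the dsa-* checks).
REBOOT_FLAG = "/run/reboot-required"


def needs_reboot(flag_path: str = REBOOT_FLAG) -> bool:
    """Return True if the reboot-required flag file exists."""
    return os.path.exists(flag_path)


def reboot_if_needed(check: Callable[[], bool], reboot: Callable[[], None]) -> bool:
    """Call `reboot` only when `check` reports a pending reboot.

    Returns True if a reboot was triggered, False if it was skipped.
    """
    if not check():
        return False
    reboot()
    return True
```

Keeping the check separate from the reboot action means the same guard can wrap any of the reboot mechanisms listed earlier.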

In short, this is kind of a mess, and we should refactor this. We should consider using needrestart, which knows when individual hosts need a restart.

I also added a feature request to the needrestart puppet module to expose its knowledge as a puppet fact, so we can use that information from PuppetDB instead of SSH'ing into each host and calling the dsa-* tools.

Edited by anarcat