anarcat · 3b811dcf
--- a/howto/upgrades.md
+++ b/howto/upgrades.md
@@ -211,6 +211,52 @@ This can be done in parallel across clusters:
 This is also documented in the [howto/ganeti](howto/ganeti) section. Do not
 forget to rebalance the cluster after the reboot.

+### Rebooting Ganeti guests
+
+If you see this in Nagios:
+
+    The following processes have libs linked that were upgraded: ganeti14: qemu-system-x86 (41509): ganeti15: qemu-system-x86 (41081): ganeti8: qemu-system-x86 (22106)
+
+... and the Ganeti node itself doesn't need to be restarted, you can
+avoid a stressful reboot by just migrating the instances between the
+nodes. This will restart the `qemu` processes and complete the
+upgrade, while imposing minimal (if any) downtime.
+
+The process here is to do a `gnt-node migrate` on all nodes, which
+will empty one node at a time. When that is complete, the cluster
+needs to be rebalanced. This is not exactly an "idempotent" process:
+you might not end up with exactly the same state as you had in the
+beginning, even after rebalancing the cluster.
+
+Make sure you run this in a screen session, because the process takes
+time:
+
+    screen
+
+Then, look at the current state of the cluster:
+
+    hbal -L -C -v
+
+Take note of the score and the proposed solution, but do not execute
+it. This gives you a baseline to compare against after the
+migration.
+
+Then migrate all guests, for example:
+
+    for node in chi-node-0{1,2,3,4}; do gnt-node migrate -f "$node"; done
+
+Once that is done, all the warnings should be gone from Nagios.
+
+Then rebalance the cluster:
+
+    hbal -L -C -v --no-disk-moves
+
+Note that we use `--no-disk-moves` to try to keep the solver from
+moving actual disks. Since the `migrate` task above shouldn't have
+moved any disk, it should be able to find a solution with a score
+similar to the one we started with, without moving disks (which is
+an even slower operation).
+
 ### Remaining nodes

 The scaleway box needs special handholding, see [ticket 32920](https://bugs.torproject.org/32920). The