... | ... | @@ -211,6 +211,52 @@ This can be done in parallel across clusters: |
This is also documented in the [howto/ganeti](howto/ganeti) section. Do not
forget to rebalance the cluster after the reboot.

### Rebooting Ganeti guests

If you see this in Nagios:

    The following processes have libs linked that were upgraded: ganeti14: qemu-system-x86 (41509): ganeti15: qemu-system-x86 (41081): ganeti8: qemu-system-x86 (22106)

... and the Ganeti node itself doesn't need to be restarted, you can
avoid a stressful reboot by just migrating the instances between the
nodes. This will restart the `qemu` processes and complete the
upgrade, while imposing minimal (if any) downtime.

The process here is to do a `gnt-node migrate` on all nodes, which
will empty one node at a time. When that is complete, the cluster
needs to be rebalanced. This is not exactly an "idempotent" process:
you might not end up with exactly the same state as you had in the
beginning, even after rebalancing the cluster.

Make sure you run in a screen session, because this process takes
time:

    screen

Then, look at the current state of the cluster:

    hbal -L -C -v

Take note of the score and the proposed solution, but do not execute
it. This gives you a baseline to judge how good or bad things are
after the migration.

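To keep a record of that starting score, a small helper like the one below may be useful. This is only a sketch: `initial_score` is a hypothetical function name, and it assumes that `hbal` prints a line beginning with `Initial score:`, which should be checked against the output of your Ganeti version.

```shell
# Hypothetical helper: print the value of the first "Initial score:"
# line found on standard input (assumed hbal output format).
initial_score() {
    awk -F': *' '/^Initial score:/ { print $2; exit }'
}

# Possible usage on the Ganeti master (not executed here):
#   hbal -L -C -v | tee hbal-before.txt | initial_score
```

Saving the full output with `tee` also keeps the proposed solution around for later comparison, without executing it.
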
Then migrate all guests, for example:

    for node in chi-node-0{1,2,3,4}; do gnt-node migrate -f $node; done

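The one-liner above keeps going even if one migration fails. A slightly more defensive variant might look like the sketch below, which stops at the first failure and can print the commands instead of running them. The function name `migrate_nodes` and the `DRY_RUN` variable are assumptions introduced for illustration; only `gnt-node migrate -f` comes from the procedure itself.

```shell
# Sketch: migrate each node given as an argument, one at a time,
# stopping at the first failure.
# DRY_RUN is a hypothetical variable: when set to 1, the commands are
# printed instead of executed.
migrate_nodes() {
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "gnt-node migrate -f $node"
        else
            gnt-node migrate -f "$node" || {
                echo "migration of $node failed, stopping" >&2
                return 1
            }
        fi
    done
}
```

In bash, `DRY_RUN=1 migrate_nodes chi-node-0{1,2,3,4}` would print the four commands for review before running the real thing with `migrate_nodes chi-node-0{1,2,3,4}`.
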
Once that is done, all the warnings should be gone from Nagios.

Then rebalance the cluster:

    hbal -L -C -v --no-disk-moves

Note that we use `--no-disk-moves` to try to keep the solver from
moving actual disks. Since the `migrate` task above shouldn't have
moved any disk, it should be able to find a solution with a score
similar to the one we started with, without moving disks (which is
an even slower operation).

### Remaining nodes

The scaleway box needs special handholding, see [ticket 32920](https://bugs.torproject.org/32920). The
|
... | ... | |