Commit 5d54dfe1 authored by anarcat

document the new reboot procedures (#33406)

parent 0f143ad0
@@ -562,18 +562,18 @@ use case.

 ## Rebooting

 Those hosts need special care, as we can accomplish zero-downtime
-reboots on those machines. There's a script (`ganeti-reboot-cluster`)
-deployed in the ganeti cluster that can be ran on the master to
-migrate all instances around and perform a clean reboot.
+reboots on those machines. The `reboot` script in `tsa-misc` takes
+care of the special steps involved (which is basically to empty a
+node before rebooting it).

 Such a reboot should be ran interactively, inside a `tmux` or `screen`
 session, and takes over 15 minutes to complete right now, but depends
 on the size of the cluster (in terms of core memory usage).

 Once the reboot is completed, all instances might end up on a single
-machine, and the cluster might need to be rebalanced. This is
-automatically scheduled by the `ganeti-reboot-cluster` script and will
-be done within 30 minutes of the reboot.
+machine, and the cluster might need to be rebalanced, see
+below. (Note: the update script should eventually do that, see [ticket
+#33406](https://trac.torproject.org/projects/tor/ticket/33406)).

 ## Rebalancing a cluster
......
@@ -105,20 +105,17 @@ this:
 #### Rebooting guests

 If this is only a virtual machine, and the only one affected, it can
-be rebooted directly. This is a useful pipeline that will reboot the
-host and make sure it comes back within a certain delay:
+be rebooted directly. This can be done with the `tsa-misc` script
+called `reboot`:

-    HOST=foo.torproject.org &&
-    ssh root@$HOST /sbin/shutdown -r +5 new kernel &&
-    echo "waiting 5 minutes for reboot to happen..."
-    sleep 5m &&
-    echo "waiting for host to go down for 30 seconds..." &&
-    sleep 30 &&
-    echo "waiting up to 2 minutes for $HOST to come back..." &&
-    date &&
-    ping -c 10 -w 120 $HOST ; ssh $HOST uptime && echo "check uptime above"
+    ./reboot -H test-01.torproject.org,test-02.torproject.org

-(Update: the above script is now in `tsa-misc/reboot-guest`.)
+By default, the script will wait 2 minutes before hosts: that should
+be changed to *30 minutes* if the hosts are part of a mirror network
+to give the monitoring systems (`mini-nag`) time to rotate the hosts
+in and out of DNS:
+
+    ./reboot -H mirror-01.torproject.org,mirror-02.torproject.org --delay-nodes 1800

 If the host has an encrypted filesystem and is hooked up with Mandos, it
 will return automatically. Otherwise it might need a password to be
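
The new text in this hunk gives the delay in minutes (2 by default, 30 for mirrors) while the example passes `--delay-nodes 1800`, which suggests the flag takes seconds. The following is a minimal sketch of that conversion, not part of the commit. The helper names `minutes_to_seconds` and `reboot_command` are hypothetical, and it only prints the command line instead of running `reboot`; the only flags it relies on are `-H` and `--delay-nodes`, as they appear in the hunk.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only; not part of tsa-misc.

# Convert a delay in minutes to seconds, assuming --delay-nodes
# expects seconds (inferred from "30 minutes" vs. 1800 in the hunk).
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# Print (do not run) a reboot command line for a comma-separated
# host list and a per-host delay expressed in minutes.
reboot_command() {
    hosts=$1
    delay_minutes=$2
    printf './reboot -H %s --delay-nodes %s\n' \
        "$hosts" "$(minutes_to_seconds "$delay_minutes")"
}

# Mirror network case from the hunk: 30 minutes between hosts.
reboot_command mirror-01.torproject.org,mirror-02.torproject.org 30
```

Printing the command first makes it easy to review the host list and delay before pasting it into a `tmux` or `screen` session.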
@@ -155,7 +152,7 @@ complains about `libvirt-qemu` processes, use the

 #### Rebooting Ganeti clusters

 This is documented in the [[ganeti]] section, but it's basically
-running the `ganeti-reboot-cluster` and `hbal` commands.
+running the above `reboot` sript and `hbal` commands.

 #### Generic upgrade routines
@@ -167,9 +164,6 @@ LDAP hosts have information about how they can be rebooted, in the
 rebooted one at a time
 * `manual` - needs to be done by hand

-The scripts (in `tsa-misc`?) `torproject-reboot-rotation` and
-`torproject-reboot-simple` take care of the latter two.
-
 ### Example runs

 Here's an example run of the upgrade tool:
......