disks (so to speak) with, for example:
|
|
|
|
|
gnt-instance activate-disks onionbalance-02.torproject.org
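
If several instances might have inactive disks (after a node outage, for
example), a cluster-wide check can save time. This is only a sketch: it
assumes a Ganeti version where `gnt-cluster verify-disks` is available, which
reports instances with degraded or inactive disks and tries to re-activate
them:

    gnt-cluster verify-disks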
|
|
### Failed disk on node
|
|
|
|
|
|
|
|
If a disk fails on a node, we should get it replaced as soon as possible. Here
are the steps one can follow to achieve that:
|
1. Open an incident-type issue in GitLab in the TPA/Team project. Set its
   priority to High.
|
2. Empty the node of its instances. In the `fabric-tasks` repository:
   `./ganeti -H $cluster-node-$number.torproject.org empty-node`
   * Take note in the issue of which instances were migrated by this
     operation.
|
3. Open a support ticket with Hetzner and then, once the machine is back
   online with the new disk, replace it in the appropriate RAID arrays. See
   [the RAID documentation page](howto/raid#replacing-a-drive).
|
4. Finally, bring back the instances on the node, using the list of instances
   noted down at step 2. Still in `fabric-tasks`: `fab -H $cluster_master -i
   instance1 -i instance2` (see the consolidated sketch after this list).
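
As a rough consolidation of steps 2 and 4, the whole procedure might look like
the following sketch. The node placeholder is the same as above, the instance
names are hypothetical, and the exact `fab` invocation should be
double-checked against the `fabric-tasks` repository before running it:

    # inside a checkout of the fabric-tasks repository
    # step 2: drain the failing node, noting which instances get migrated
    ./ganeti -H $cluster-node-$number.torproject.org empty-node

    # step 3 happens out of band: Hetzner swaps the disk, then the new drive
    # is re-added to the RAID arrays (see howto/raid#replacing-a-drive)

    # step 4: once the node is healthy again, move the noted instances back,
    # running against the cluster master
    fab -H $cluster_master -i instance-01.torproject.org -i instance-02.torproject.org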
|
|
|
|
|
|
## Disaster recovery
|
|
If things get completely out of hand and the cluster becomes too
|
|