@@ -1956,6 +1956,58 @@ to remove the logical volumes on the target node:

    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/abf0eeac-55a0-4ccc-b8a0-adb0d8d67cf7.disk1_meta
    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/abf0eeac-55a0-4ccc-b8a0-adb0d8d67cf7.disk1_data

### Cleaning up ghost disks

Under certain circumstances, you might end up with "ghost" disks, for
example:

    Tue Oct 4 13:24:07 2022 - ERROR: cluster : ghost disk 'ed225e68-83af-40f7-8d8c-cf7e46adad54' in temporary DRBD map

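When several ghost disks are reported at once, it may help to collect
their UUIDs before hunting for them. The following is a minimal
sketch, assuming the message format matches the example above
exactly; `extract_ghost_uuids` is a hypothetical helper name, not
part of Ganeti:

```shell
# Print the unique UUIDs of ghost disks found in log lines read on stdin.
# Assumes lines of the form:
#   ... ghost disk '<uuid>' in temporary DRBD map
extract_ghost_uuids() {
    sed -n "s/.*ghost disk '\([0-9a-f-]*\)' in temporary DRBD map.*/\1/p" | sort -u
}

# Possible usage on the Ganeti master (stderr is merged in, in case the
# errors are not written to stdout):
#   gnt-cluster verify 2>&1 | extract_ghost_uuids
```
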
It's unclear how this happens, but in this specific case it is
believed the problem occurred because a disk could not be added to an
instance being resized.

It's *possible* this is a situation similar to the one above, in which
case you must first find *where* the ghost disk is, with something
like:

    gnt-cluster command 'lvs --noheadings' | grep 'ed225e68-83af-40f7-8d8c-cf7e46adad54'

If this finds a device, you can remove it as normal:

    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/ed225e68-83af-40f7-8d8c-cf7e46adad54.disk1_data

... but in this case, the DRBD map is *not* associated with a logical
volume. You can also check the `dmsetup` output for a match:

    gnt-cluster command 'dmsetup ls' | grep 'ed225e68-83af-40f7-8d8c-cf7e46adad54'

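The two lookups above can be combined into one helper. This is only a
sketch: `find_ghost_disk` and the `GNT_CLUSTER` override are
hypothetical names introduced here. It also searches for the UUID with
doubled hyphens, because LVM escapes hyphens that way in device-mapper
names, so a plain `grep` for the UUID may miss an LVM-backed entry in
the `dmsetup ls` output:

```shell
# find_ghost_disk UUID: search every node's logical volumes and
# device-mapper table for UUID. Prints matching lines; returns 0 if
# anything matched and 1 otherwise.
# GNT_CLUSTER is a hypothetical override for the gnt-cluster binary.
find_ghost_disk() {
    uuid=$1
    # LVM doubles hyphens in device-mapper names, so try that spelling too.
    dm_uuid=$(printf '%s\n' "$uuid" | sed 's/-/--/g')
    status=1
    for probe in 'lvs --noheadings' 'dmsetup ls'; do
        if "${GNT_CLUSTER:-gnt-cluster}" command "$probe" \
            | grep -F -e "$uuid" -e "$dm_uuid"; then
            status=0
        fi
    done
    return "$status"
}

# Possible usage on the Ganeti master:
#   find_ghost_disk ed225e68-83af-40f7-8d8c-cf7e46adad54 || echo "not found anywhere"
```
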
According to [this discussion](https://groups.google.com/g/ganeti/c/s5qoh26T1yA), it's possible that restarting
Ganeti on all nodes might clear out the issue:

    gnt-cluster command 'service ganeti restart'

If *none* of the "ghost" disks mentioned are actually found anywhere
in the cluster, either in the device mapper or logical volumes, it
might just be stray data left over in the data file.

So it *looks* like the proper way to do this is to *remove* the
temporary file where this data is stored:

    gnt-cluster command 'grep ed225e68-83af-40f7-8d8c-cf7e46adad54 /var/lib/ganeti/tempres.data'
    ssh ... service ganeti stop
    ssh ... rm /var/lib/ganeti/tempres.data
    ssh ... service ganeti start
    gnt-cluster verify

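Since this deletes state, it might be prudent to review the exact
commands first and to keep a copy of the file. The sketch below only
*prints* a plan for one node. It swaps the `rm` for a `mv` to a `.bak`
file, which is an assumption of this sketch and not what the
discussion above proposes; `tempres_cleanup_plan` is a hypothetical
helper, and the node name is whatever you pass in:

```shell
# tempres_cleanup_plan NODE: print (do not run) the cleanup commands for NODE.
# Unlike the procedure above, the file is moved aside instead of removed;
# the .bak name is an arbitrary choice made for this sketch.
tempres_cleanup_plan() {
    node=$1
    file=/var/lib/ganeti/tempres.data
    printf 'ssh %s service ganeti stop\n' "$node"
    printf 'ssh %s mv %s %s.bak\n' "$node" "$file" "$file"
    printf 'ssh %s service ganeti start\n' "$node"
    printf 'gnt-cluster verify\n'
}

# Possible usage: review the output, then run the commands by hand.
#   tempres_cleanup_plan fsn-node-06.torproject.org
```
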
That solution was proposed in [this discussion](https://groups.google.com/g/ganeti/c/SMR3yNek3Js). Anarcat toured the
Ganeti source code and found that the `ComputeDRBDMap` function, in
the Haskell codebase, basically just sucks the data out of that
`tempres.data` JSON file, and dumps it into the Python side of
things. Then the Python code looks for those disks in its internal
disk list and compares. It is therefore pretty unlikely that the
warning would appear while the disks are still around.

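Conceptually, the comparison described above amounts to a set
difference: anything in the temporary map that is not in the known
disk list is a "ghost". The following is a rough illustration of that
idea only, not Ganeti's actual code. It assumes two hypothetical
plain-text files with one UUID per line, which is *not* the real
format of `tempres.data`:

```shell
# ghost_uuids MAP_LIST KNOWN_LIST: print the UUIDs listed in MAP_LIST
# that do not appear in KNOWN_LIST (one UUID per line in both files).
# Both file formats are assumptions made for this illustration.
ghost_uuids() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -vxFf "$2" "$1"
}
```
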
### Fixing inconsistent disks

Sometimes `gnt-cluster verify` will give this error: