howto/ganeti.md +52 −0

@@ -1956,6 +1956,58 @@ to remove the logical volumes on the target node:

    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/abf0eeac-55a0-4ccc-b8a0-adb0d8d67cf7.disk1_meta
    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/abf0eeac-55a0-4ccc-b8a0-adb0d8d67cf7.disk1_data

### Cleaning up ghost disks

Under certain circumstances, you might end up with "ghost" disks, for
example:

    Tue Oct 4 13:24:07 2022   - ERROR: cluster : ghost disk 'ed225e68-83af-40f7-8d8c-cf7e46adad54' in temporary DRBD map

It's unclear how this happens, but in this specific case it is
believed the problem occurred because a disk failed to add to an
instance being resized.

It's *possible* this is a situation similar to the one above, in which
case you must first find *where* the ghost disk is, with something
like:

    gnt-cluster command 'lvs --noheadings' | grep 'ed225e68-83af-40f7-8d8c-cf7e46adad54'

If this finds a device, you can remove it as normal:

    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/ed225e68-83af-40f7-8d8c-cf7e46adad54.disk1_data

... but in this case, the DRBD map is *not* associated with a logical
volume. You can also check the `dmsetup` output for a match:

    gnt-cluster command 'dmsetup ls' | grep 'ed225e68-83af-40f7-8d8c-cf7e46adad54'

According to [this discussion](https://groups.google.com/g/ganeti/c/s5qoh26T1yA),
it's possible that restarting Ganeti on all nodes might clear out the
issue:

    gnt-cluster command 'service ganeti restart'

If *none* of the "ghost" disks mentioned are found anywhere in the
cluster, either in the device mapper or in the logical volumes, they
might just be stray data left over in the data file.

So it *looks* like the proper way to fix this is to *remove* the
temporary file where this data is stored:

    gnt-cluster command 'grep ed225e68-83af-40f7-8d8c-cf7e46adad54 /var/lib/ganeti/tempres.data'
    ssh ... service ganeti stop
    ssh ... rm /var/lib/ganeti/tempres.data
    ssh ... service ganeti start
    gnt-cluster verify

That solution was proposed in [this discussion](https://groups.google.com/g/ganeti/c/SMR3yNek3Js).
Anarcat toured the Ganeti source code and found that the
`ComputeDRBDMap` function, in the Haskell codebase, basically just
reads the data out of that `tempres.data` JSON file and passes it to
the Python side of things. The Python code then looks for those disks
in its internal disk list and compares. It's therefore pretty unlikely
that the warning would happen with the disks still being around.

### Fixing inconsistent disks

Sometimes `gnt-cluster verify` will give this error: