@@ -766,6 +766,44 @@ the cluster-level tag is a *prefix* that can be used to create
*multiple* such tags. This configuration also happens to be simpler
and easier to use...
## HDD migration restrictions
Cluster balancing works well until there are inconsistencies in
how nodes are configured. In our case, some nodes have HDDs (Hard Disk
Drives, AKA spinning rust) and others do not. Therefore, it's not
possible to move an instance from a node with a disk allocated on the
HDD to a node that does not have such a disk.
Yet somehow the allocator is not smart enough to tell, and you will
get the following error when doing an automatic rebalancing:

    one of the migrate failed and stopped the cluster balance: Can't create block device: Can't create block device <LogicalVolume(/dev/vg_ganeti_hdd/98d30e7d-0a47-4a7d-aeed-6301645d8469.disk3_data, visible as /dev/, size=102400m)> on node fsn-node-07.torproject.org for instance gitlab-02.torproject.org: Can't create block device: Can't compute PV info for vg vg_ganeti_hdd

In this case, it is trying to migrate the `gitlab-02` server from
`fsn-node-01` (which has an HDD) to `fsn-node-07` (which does not),
which naturally fails. This is a known limitation of the Ganeti
code. There has been a [draft design document for multiple storage
unit support](http://docs.ganeti.org/ganeti/master/html/design-multi-storage-htools.html) since 2015, but it has [never been
implemented](https://github.com/ganeti/ganeti/issues/865). There have been multiple issues reported upstream on
the subject:
 * [208: Bad behaviour when multiple volume groups exists on nodes](https://github.com/ganeti/ganeti/issues/208)
 * [1199: unable to mark storage as unavailable for allocation](https://github.com/ganeti/ganeti/issues/1199)
 * [1240: Disk space check with multiple VGs is broken](https://github.com/ganeti/ganeti/issues/1240)
 * [1379: Support for displaying/handling multiple volume groups](https://github.com/ganeti/ganeti/issues/1379)
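
The constraint the allocator misses boils down to a set comparison: a
node can only receive an instance if it has every volume group that
instance's disks live on. Here is a minimal sketch of that check in
Python. It is not Ganeti code and does not query the cluster: the two
inventory dictionaries are hand-written and hypothetical, and the
`vg_ganeti` volume group name is an assumption made for illustration
(only `vg_ganeti_hdd` appears in the error above).

```python
# Hypothetical, hand-written inventory; NOT generated from Ganeti output.
# "vg_ganeti" is an assumed name for the default (non-HDD) volume group.
NODE_VGS = {
    "fsn-node-01.torproject.org": {"vg_ganeti", "vg_ganeti_hdd"},
    "fsn-node-07.torproject.org": {"vg_ganeti"},
}

# Volume groups used by each instance's disks (also hypothetical).
INSTANCE_VGS = {
    "gitlab-02.torproject.org": {"vg_ganeti", "vg_ganeti_hdd"},
}


def missing_vgs(instance, node):
    """Return the sorted volume groups `instance` needs that `node` lacks."""
    return sorted(INSTANCE_VGS[instance] - NODE_VGS[node])


def can_host(instance, node):
    """Return True if `node` has every volume group `instance` uses."""
    return not missing_vgs(instance, node)


if __name__ == "__main__":
    instance = "gitlab-02.torproject.org"
    for node in sorted(NODE_VGS):
        if can_host(instance, node):
            print(f"{instance} -> {node}: possible")
        else:
            lacking = ", ".join(missing_vgs(instance, node))
            print(f"{instance} -> {node}: impossible, missing {lacking}")
```

With that inventory, the move to `fsn-node-07` is flagged as impossible
because `vg_ganeti_hdd` is missing there, which is the same condition
the error message reports after the fact.
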
Unfortunately, there are no known workarounds for this, at least not
that fix the `hbal` command. It *is* possible to exclude the faulty
migration from the pool of possible moves, however, for example in the