Verified Commit edcb4130 authored by anarcat's avatar anarcat

explain the problems with HDD migration and possible workarounds

parent 50ae7ec5
......@@ -766,6 +766,44 @@ the cluster-level tag is a *prefix* that can be used to create
*multiple* such tags. This configuration also happens to be simpler
and easier to use...
## HDD migration restrictions
Cluster balancing works well until there are inconsistencies between
how nodes are configured. In our case, some nodes have HDDs (Hard Disk
Drives, AKA spinning rust) and others do not. Therefore, it's not
possible to move an instance from a node with a disk allocated on the
HDD to a node that does not have such a disk.
Yet somehow the allocator is not smart enough to tell, and you will
get the following error when doing an automatic rebalancing:

    one of the migrate failed and stopped the cluster balance: Can't create block device: Can't create block device <LogicalVolume(/dev/vg_ganeti_hdd/98d30e7d-0a47-4a7d-aeed-6301645d8469.disk3_data, visible as /dev/, size=102400m)> on node fsn-node-07.torproject.org for instance gitlab-02.torproject.org: Can't create block device: Can't compute PV info for vg vg_ganeti_hdd

In this case, it is trying to migrate the `gitlab-02` server from
`fsn-node-01` (which has an HDD) to `fsn-node-07` (which does not),
which naturally fails. This is a known limitation of the Ganeti
code. There has been a [draft design document for multiple storage
unit support](http://docs.ganeti.org/ganeti/master/html/design-multi-storage-htools.html) since 2015, but it has [never been
implemented](https://github.com/ganeti/ganeti/issues/865). There have been multiple issues reported upstream on
the subject:
* [208: Bad behaviour when multiple volume groups exists on nodes](https://github.com/ganeti/ganeti/issues/208)
* [1199: unable to mark storage as unavailable for allocation](https://github.com/ganeti/ganeti/issues/1199)
* [1240: Disk space check with multiple VGs is broken](https://github.com/ganeti/ganeti/issues/1240)
* [1379: Support for displaying/handling multiple volume groups](https://github.com/ganeti/ganeti/issues/1379)
Unfortunately, there are no known workarounds for this, at least not
that fix the `hbal` command. It *is* possible to exclude the faulty
migration from the pool of possible moves, however, for example in the
above case:

    hbal -L -v --exclude-instances gitlab-02.torproject.org

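
If more than one instance has a disk on the HDD volume group, the exclusion list can grow. As far as we understand the `hbal` manual page, `--exclude-instances` takes a comma-separated list of instance names. The following is a minimal sketch, assuming that behaviour holds on your Ganeti version. The `hbal_exclude_cmd` helper is hypothetical and not part of Ganeti; it only prints the command so you can review it before running it yourself:

```shell
# Hypothetical helper (not part of Ganeti): print an hbal invocation that
# excludes every instance given as an argument. It does NOT run hbal.
hbal_exclude_cmd() {
    # Join all arguments with commas; assumes --exclude-instances accepts
    # a comma-separated list of instance names.
    list=$(IFS=,; printf '%s' "$*")
    printf 'hbal -L -v --exclude-instances %s\n' "$list"
}

# Reproduces the single-instance command from the example above.
hbal_exclude_cmd gitlab-02.torproject.org
```

Passing additional instance names as extra arguments appends them to the same list.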
It's also possible to use the `--no-disk-moves` option to avoid disk
move operations altogether.
Both workarounds obviously do not correctly balance the cluster...
## Adding and removing addresses on instances
Say you created an instance but forgot to assign an extra
......