@@ -595,7 +595,7 @@ yet).
 
 Ganeti is smart about assigning instances to nodes. There's also a
 command (`hbal`) to automatically rebalance the cluster (see
-below). If for some reason hbal doesn’t do what you want or you need
+below). If for some reason `hbal` doesn’t do what you want or you need
 to move things around for other reasons, here are a few commands that
 might be handy.
 
@@ -1056,7 +1056,15 @@ special condition.
 This can be easily corrected with this command, which will spread
 instances around the cluster to balance it:
 
-    hbal -L -C -v -X
+    hbal -L -C -v -P
 
+The above will show the proposed solution, with the state of the
+cluster before and after (`-P`), and the commands to get there
+(`-C`). To actually execute the commands, you can copy-paste
+them. An alternative is to pass the `-X` argument, to tell `hbal`
+to issue the commands itself:
+
+    hbal -L -C -v -P -X
+
 This will automatically move the instances around and rebalance the
 cluster. Here's an example run on a small cluster:
@@ -1169,7 +1177,7 @@ that fix the `hbal` command. It *is* possible to exclude the faulty
 migration from the pool of possible moves, however, for example in the
 above case:
 
-    hbal -L -v --exclude-instances gitlab-02.torproject.org
+    hbal -L -v -C -P --exclude-instances gitlab-02.torproject.org
 
 It's also possible to use the `--no-disk-moves` option to avoid disk
 move operations altogether.
@@ -1180,6 +1188,32 @@ to workaround that issue, but [those do not work for secondary
 instances](https://github.com/ganeti/ganeti/issues/1497). For this we would need to setup [node groups](http://docs.ganeti.org/ganeti/current/html/man-gnt-group.html)
 instead.
 
+A good trick is to look at the solution proposed by `hbal`:
+
+    Trying to minimize the CV...
+    1. tbb-nightlies-master fsn-node-01:fsn-node-02 => fsn-node-04:fsn-node-02 6.12095251 a=f r:fsn-node-04 f
+    2. bacula-director-01 fsn-node-01:fsn-node-03 => fsn-node-03:fsn-node-01 4.56735007 a=f
+    3. staticiforme fsn-node-02:fsn-node-04 => fsn-node-02:fsn-node-01 3.99398707 a=r:fsn-node-01
+    4. cache01 fsn-node-07:fsn-node-05 => fsn-node-07:fsn-node-01 3.55940346 a=r:fsn-node-01
+    5. vineale fsn-node-05:fsn-node-06 => fsn-node-05:fsn-node-01 3.18480313 a=r:fsn-node-01
+    6. pauli fsn-node-06:fsn-node-07 => fsn-node-06:fsn-node-01 2.84263128 a=r:fsn-node-01
+    7. neriniflorum fsn-node-05:fsn-node-02 => fsn-node-05:fsn-node-01 2.59000393 a=r:fsn-node-01
+    8. static-master-fsn fsn-node-01:fsn-node-02 => fsn-node-02:fsn-node-01 2.47345604 a=f
+    9. polyanthum fsn-node-02:fsn-node-07 => fsn-node-07:fsn-node-02 2.47257956 a=f
+    10. forrestii fsn-node-07:fsn-node-06 => fsn-node-06:fsn-node-07 2.45119245 a=f
+    Cluster score improved from 8.92360196 to 2.45119245
+
+Look at the last column. The `a=` field shows what "action" will be
+taken. An `f` is a failover (or "migrate"), and an `r:` is a
+`replace-disks`, with the new secondary after the colon (`:`). In
+the above case, the proposed solution is correct: no `replace-disks`
+target is in the range of nodes that lack HDDs (`fsn-node-0[5-7]`). If
+one of the disk replaces hits one of the nodes without HDD, that is when
+you use `--exclude-instances` to find a better solution. A typical
+exclude is:
+
+    hbal -L -v -C -P --exclude-instances=bacula-director-01,tbb-nightlies-master,eugeni,winklerianum,woronowii,rouyi,loghost01,materculae,gayi,weissii
+
 Another option is to specifically look for instances that do not have
 a HDD and migrate only those. In my situation, `gnt-cluster verify`
 was complaining that `fsn-node-02` was full, so I looked for all the
... | | ... | |
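
The manual check of the `a=` column described in the hunk above can be sketched in a few lines of Python. This is only an illustration, not part of the change itself: the function names (`replace_targets`, `instances_to_exclude`) are hypothetical, the line format is inferred from the sample output quoted above rather than from any `hbal` specification, and the set of nodes without HDDs is taken from the `fsn-node-0[5-7]` range mentioned in the text.

```python
import re

# Nodes assumed to lack HDDs, per the fsn-node-0[5-7] range in the text.
NODES_WITHOUT_HDD = {"fsn-node-05", "fsn-node-06", "fsn-node-07"}

# A few lines copied from the hbal solution shown above.
SAMPLE = """\
Trying to minimize the CV...
1. tbb-nightlies-master fsn-node-01:fsn-node-02 => fsn-node-04:fsn-node-02 6.12095251 a=f r:fsn-node-04 f
2. bacula-director-01 fsn-node-01:fsn-node-03 => fsn-node-03:fsn-node-01 4.56735007 a=f
3. staticiforme fsn-node-02:fsn-node-04 => fsn-node-02:fsn-node-01 3.99398707 a=r:fsn-node-01
Cluster score improved from 8.92360196 to 2.45119245
"""

# Assumed shape of one move line:
#   N. instance old_pri:old_sec => new_pri:new_sec score a=<actions>
MOVE_RE = re.compile(
    r"^\s*\d+\.\s+(?P<instance>\S+)\s+\S+:\S+\s+=>\s+\S+:\S+"
    r"\s+[\d.]+\s+a=(?P<actions>.*)$"
)


def replace_targets(solution_text):
    """Yield (instance, node) for every replace-disks step (`r:node`)."""
    for line in solution_text.splitlines():
        match = MOVE_RE.match(line)
        if not match:
            continue  # skip headers such as "Trying to minimize the CV..."
        for node in re.findall(r"(?:^|\s)r:(\S+)", match.group("actions")):
            yield match.group("instance"), node


def instances_to_exclude(solution_text, nodes_without_hdd):
    """List instances whose replace-disks step targets a node without HDD."""
    flagged = []
    for instance, node in replace_targets(solution_text):
        if node in nodes_without_hdd and instance not in flagged:
            flagged.append(instance)
    return flagged


if __name__ == "__main__":
    flagged = instances_to_exclude(SAMPLE, NODES_WITHOUT_HDD)
    if flagged:
        # Candidate argument for a new dry run of hbal.
        print("--exclude-instances=" + ",".join(flagged))
    else:
        print("no replace-disks step targets a node without HDD")
```

Any instance this flags is a candidate for the `--exclude-instances` list on the next dry run; the result should still be reviewed by hand before passing `-X`.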