@@ -39,7 +39,7 @@ communicate commands and synchronise disks (with
 a *primary* and a *secondary*: the *primary* is where the virtual
 machine actually runs and the *secondary* acts as a hot failover.

-See also the more extensive [glossary in the Ganeti documentation](http://docs.ganeti.org/ganeti/2.15/html/glossary.html).
+See also the more extensive [glossary in the Ganeti documentation](http://docs.ganeti.org/docs/ganeti/3.0/html/glossary.html).

 ## Adding a new instance

@@ -325,7 +325,7 @@ half-open filesystems and stray DRBD devices, do take a look at the
 ### CPU, memory changes

 It's possible to change the IP, CPU, or memory allocation of an instance
-using the [gnt-instance modify](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#modify) command:
+using the [gnt-instance modify](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-instance.html#modify) command:

 gnt-instance modify -B vcpus=4 test1.torproject.org
 gnt-instance modify -B memory=8g test1.torproject.org
@@ -343,7 +343,7 @@ to the `/etc/network/interfaces*` files:

 ### Resizing disks

-The [gnt-instance grow-disk](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#grow-disk) command can be used to change the size
+The [gnt-instance grow-disk](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-instance.html#grow-disk) command can be used to change the size
 of the underlying device:

 gnt-instance grow-disk --absolute test1.torproject.org 0 16g
@@ -781,7 +781,7 @@ The latter does not show disk usage for secondary volume groups (see

 gnt-node list-storage

-The [gnt-cluster verify](http://docs.ganeti.org/ganeti/2.15/man/gnt-cluster.html#verify) command will also check to see if there's
+The [gnt-cluster verify](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-cluster.html#verify) command will also check to see if there's
 enough space on secondaries to account for the failure of a
 node. Healthy output looks like this:

@@ -810,7 +810,7 @@ A sick node would have said something like this instead:
 Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
 Mon Oct 26 18:59:37 2009 - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail

-See the [ganeti manual](http://docs.ganeti.org/ganeti/2.15/html/walkthrough.html#n-1-errors) for a more extensive example
+See the [Ganeti manual](http://docs.ganeti.org/docs/ganeti/3.0/html/walkthrough.html#n-1-errors) for a more extensive example.

 Also note the `hspace -L` command, which can tell you how many
 instances can be created in a given cluster. It uses the "standard"
@@ -838,16 +838,16 @@ downtime during the migration. It might be preferable to actually
-shutdown the machine for some reason (for example if we actually want
+shut down the machine for some reason (for example if we actually want
 to reboot because of a security upgrade). Or we might not be able to
 live-migrate because the node is down. In this case, we do a
-[failover](http://docs.ganeti.org/ganeti/2.15/html/admin.html#failing-over-an-instance)
+[failover](http://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#failing-over-an-instance):

 gnt-instance failover test1.torproject.org

-The [gnt-node evacuate](http://docs.ganeti.org/ganeti/2.15/man/gnt-node.html#evacuate) command can also be used to "empty" a given
+The [gnt-node evacuate](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-node.html#evacuate) command can also be used to "empty" a given
 node altogether, in case of an emergency:

 gnt-node evacuate -I . fsn-node-02.torproject.org

-Similarly, the [gnt-node failover](http://docs.ganeti.org/ganeti/2.15/man/gnt-node.html#failover) command can be used to
+Similarly, the [gnt-node failover](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-node.html#failover) command can be used to
 hard-recover from a completely crashed node:

 gnt-node failover fsn-node-02.torproject.org
@@ -1232,7 +1232,7 @@ Import procedure:

 ### References

-* [Upstream docs](http://docs.ganeti.org/ganeti/2.15/html/admin.html#import-of-foreign-instances) have the canonical incantation:
+* [Upstream docs](http://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#import-of-foreign-instances) have the canonical incantation:

 gnt-instance add -t plain -n HOME_NODE ... --disk 0:adopt=lv_name[,vg=vg_name] INSTANCE_NAME

@@ -1242,7 +1242,7 @@ Import procedure:
 * [Riseup docs](https://we.riseup.net/riseup+tech/ganeti#move-an-instance-from-one-cluster-to-another-from-) suggest creating a VM without installing, shutting
 down and then syncing

-Ganeti [supports importing and exporting](http://docs.ganeti.org/ganeti/2.15/html/design-ovf-support.html?highlight=qcow) from the [Open
+Ganeti [supports importing and exporting](http://docs.ganeti.org/docs/ganeti/3.0/html/design-ovf-support.html?highlight=qcow) from the [Open
 Virtualization Format](https://en.wikipedia.org/wiki/Open_Virtualization_Format) (OVF), but unfortunately it [doesn't seem
 libvirt supports *exporting* to OVF](https://forums.centos.org/viewtopic.php?t=49231). There's a [virt-convert](http://manpages.debian.org/virt-convert)
 tool which can *import* OVF, but not the reverse. The [libguestfs](http://www.libguestfs.org/)
@@ -1252,7 +1252,7 @@ exporting to OVF or anything Ganeti can load directly.
 So people have written [their own conversion tools](https://virtuallyhyper.com/2013/06/migrate-from-libvirt-kvm-to-virtualbox/) or [their own
 conversion procedure](https://scienceofficersblog.blogspot.com/2014/04/using-cloud-images-with-ganeti.html).

-Ganeti also supports [file-backed instances](http://docs.ganeti.org/ganeti/2.15/html/design-file-based-storage.html) but "adoption" is
+Ganeti also supports [file-backed instances](http://docs.ganeti.org/docs/ganeti/3.0/html/design-file-based-storage.html) but "adoption" is
 specifically designed for logical volumes, so it doesn't work for our
 use case.

@@ -1590,7 +1590,7 @@ it directly:

 ### Node failure

-Ganeti clusters are designed to be [self-healing](http://docs.ganeti.org/ganeti/2.15/html/admin.html#autorepair). As long as only
+Ganeti clusters are designed to be [self-healing](http://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#autorepair). As long as only
 one machine disappears, the cluster should be able to recover by
 failing over other nodes. This is currently done manually, however.

@@ -1653,7 +1653,7 @@ exploring the root case of the failure, however, before readding the
 machine to the cluster.

 Recoveries could eventually be automated if such situations occur more
-often, by scheduling a [harep](http://docs.ganeti.org/ganeti/2.15/man/harep.html) cron job, which isn't enabled in
-Debian by default. See also the [autorepair](http://docs.ganeti.org/docs/ganeti/2.15/html/admin.html#autorepair) section of the admin
+often, by scheduling a [harep](http://docs.ganeti.org/docs/ganeti/3.0/html/man-harep.html) cron job, which isn't enabled in
+Debian by default. See also the [autorepair](http://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#autorepair) section of the admin
 manual.

@@ -2093,7 +2093,7 @@ Note that the above assumes only a < 10 nodes cluster.

 ### Other troubleshooting

-The [walkthrough](http://docs.ganeti.org/ganeti/2.15/html/walkthrough.html) also has a few recipes to resolve common
+The [walkthrough](http://docs.ganeti.org/docs/ganeti/3.0/html/walkthrough.html) also has a few recipes to resolve common
 problems.

 See also the [common issues page](https://github.com/ganeti/ganeti/wiki/Common-Issues) in the Ganeti wiki.
@@ -2697,7 +2697,7 @@ the two networks in the future, so it's good to have some difference.

 We considered experimenting with the new AX line ([AX51-NVMe](https://www.hetzner.com/dedicated-rootserver/ax51-nvme?country=OTHER)) but
 in the past DSA had problems live-migrating (it wouldn't immediately
-fail but there were "issues" after). So we might need to [failover](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#failover)
-instead of migrate between those parts of the cluster. There are also
+fail but there were "issues" after). So we might need to [failover](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-instance.html#failover)
+instead of migrating between those parts of the cluster. There are also
 doubts that the Linux kernel supports those shiny new processors at
 all: similar processors had [trouble booting before Linux 5.5](https://www.phoronix.com/scan.php?page=news_item&px=Threadripper-3000-MCE-5.5-Fix) for
@@ -2750,7 +2750,7 @@ our work. Of course, it's also possible that live migrates work fine
 if *no* `cpu_type` at all is specified in the cluster, but that needs
 to be verified.

-Nodes could also [grouped](http://docs.ganeti.org/ganeti/2.15/man/gnt-group.html) to limit (automated) live migration to a
+Nodes could also be [grouped](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-group.html) to limit (automated) live migration to a
 subset of nodes.

 References:
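Every link change in this patch follows the same mechanical mapping. Manual pages move from `ganeti/2.15/man/NAME.html` to `docs/ganeti/3.0/html/man-NAME.html`, and other pages move from `ganeti/2.15/html/` (or `docs/ganeti/2.15/html/`) to `docs/ganeti/3.0/html/`. The following is a minimal sketch of that mapping as `sed -E` substitutions, plus a `grep` check for leftovers. The function names are hypothetical. The sketch assumes these are the only URL shapes in the page, so review the resulting diff before committing.

```sh
# Hypothetical helper: rewrite Ganeti 2.15 documentation links to the
# 3.0 layout. Reads Markdown on stdin and writes the result to stdout.
# Assumption: only the URL shapes seen in this patch occur in the input.
rewrite_ganeti_links() {
  sed -E \
    -e 's#docs\.ganeti\.org/ganeti/2\.15/man/([A-Za-z0-9_-]+)\.html#docs.ganeti.org/docs/ganeti/3.0/html/man-\1.html#g' \
    -e 's#docs\.ganeti\.org/(docs/)?ganeti/2\.15/html/#docs.ganeti.org/docs/ganeti/3.0/html/#g'
}

# Hypothetical helper: print any remaining 2.15 links with line numbers,
# to catch URL shapes the substitutions above do not cover.
# Reads the named files, or stdin if none are given.
find_stale_ganeti_links() {
  grep -n 'docs\.ganeti\.org/.*2\.15' "$@"
}
```

For example, run `rewrite_ganeti_links < ganeti.md > ganeti.md.new` and then `find_stale_ganeti_links ganeti.md.new`, where `ganeti.md` stands in for whichever file holds this page. Anchors and query strings such as `#modify` or `?highlight=qcow` sit outside the matched part, so they pass through unchanged. Running the function twice is harmless because the patterns only match `2.15` URLs.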