... | @@ -1583,6 +1583,86 @@ There are many sockets in the `ctrl` directory, including: |
|
(the `-qmp` argument to `qemu`)
|
|
|
* `.kvmd`: same as the above?
|
|
|
|
|
|
|
|
|
## Instance backup and migration
|
|
|
|
|
|
|
|
The [export/import](https://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#export-import) mechanism can be used to export and import VMs
|
|
|
|
one at a time. This can be used, for example, to migrate a VM between
|
|
|
|
clusters or backup a VM before a critical change.
|
|
|
|
|
|
|
|
Note that this procedure is still a work in progress. A simulation was
|
|
|
|
performed in [tpo/tpa/team#40917](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40917); a proper procedure might vary
|
|
|
|
from this significantly. In particular, there are some optimizations
|
|
|
|
possible through things like [zerofree](https://tracker.debian.org/pkg/zerofree) and compression...
|
|
|
|
|
|
|
|
Also note that this migration has a lot of manual steps and is better
|
|
|
|
accomplished using the `move-instance` command, documented in the
|
|
|
|
[Cross-cluster migrations section](#cross-cluster-migrations).
|
|
|
|
|
|
|
|
Here is the procedure to export a single VM, copy it to another cluster,
|
|
|
|
and import it:
|
|
|
|
|
|
|
|
1. find nodes to host the exported VM on the source cluster and the
|
|
|
|
target cluster; it needs enough disk space in `/var/lib/ganeti/export` to
|
|
|
|
keep a copy of a snapshot of the VM:
|
|
|
|
|
|
|
|
df -h /var/lib/ganeti/export
|
|
|
|
|
|
|
|
Typically, you'd make a logical volume to fit more data in there:
|
|
|
|
|
|
|
|
lvcreate -n export vg_ganeti -L200g &&
|
|
|
|
mkfs -t ext4 /dev/vg_ganeti/export &&
|
|
|
|
mkdir -p /var/lib/ganeti/export &&
|
|
|
|
mount /dev/vg_ganeti/export /var/lib/ganeti/export
|
|
|
|
|
|
|
|
Make sure you do that on *both* ends of the migration.
|
|
|
|
|
|
|
|
2. have the right kernel modules loaded, which might require a
|
|
|
|
reboot of the source node:
|
|
|
|
|
|
|
|
modprobe dm_snapshot
|
|
|
|
|
|
|
|
3. on the master of the source Ganeti cluster, export the VM to the
|
|
|
|
source node; also use `--noshutdown` if you cannot afford to have
|
|
|
|
downtime on the VM *and* you are ready to lose data accumulated
|
|
|
|
after the snapshot:
|
|
|
|
|
|
|
|
gnt-backup export -n chi-node-01.torproject.org test-01.torproject.org
|
|
|
|
gnt-instance stop test-01.torproject.org
|
|
|
|
|
|
|
|
WARNING: this step is currently not working if there's a second
|
|
|
|
disk (or swap device? to be confirmed), see [this upstream issue
|
|
|
|
for details](https://github.com/ganeti/instance-debootstrap/issues/18). For now we're deploying the "nocloud"
|
|
|
|
export/import mechanisms through Puppet to work around that problem,
|
|
|
|
which means the whole disk is copied (as opposed to only the used
|
|
|
|
parts).
|
|
|
|
|
|
|
|
4. copy the VM snapshot from the source node to a node in the target
|
|
|
|
cluster:
|
|
|
|
|
|
|
|
mkdir -p /var/lib/ganeti/export
|
|
|
|
rsync -ASHaxX --info=progress2 root@chi-node-01.torproject.org:/var/lib/ganeti/export/test-01.torproject.org/ /var/lib/ganeti/export/test-01.torproject.org/
|
|
|
|
|
|
|
|
Note that this assumes the target cluster has root access on the
|
|
|
|
source cluster. One way to make that happen is by creating a new
|
|
|
|
SSH key:
|
|
|
|
|
|
|
|
ssh-keygen -P "" -C 'sync key from dal-node-01'
|
|
|
|
|
|
|
|
And dump that public key in `/etc/ssh/userkeys/root.more` on the
|
|
|
|
source cluster.
|
|
|
|
|
|
|
|
5. on the master of the target Ganeti cluster, import the VM:
|
|
|
|
|
|
|
|
gnt-backup import -n dal-node-01:dal-node-02 --src-node=dal-node-01 --src-dir=/var/lib/ganeti/export/test-01.torproject.org --no-ip-check --no-name-check --net 0:ip=pool,network=gnt-dal-01 -t drbd --no-wait-for-sync test-01.torproject.org
|
|
|
|
|
|
|
|
6. enter the restored server console to change the IP address:
|
|
|
|
|
|
|
|
gnt-instance console test-01.torproject.org
|
|
|
|
|
|
|
|
7. if everything looks good, change the IP in LDAP
|
|
|
|
|
|
|
|
8. destroy the old VM
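
The pre-flight checks in steps 1 and 2 can be sketched as small shell helpers. This is only a minimal sketch, not part of the tested procedure: the function names `export_dir_has_room` and `snapshot_module_loaded` are hypothetical, and the 200 GiB figure in the usage comment simply mirrors the `lvcreate` example above.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for the export/import procedure.
# Function names and thresholds are illustrative assumptions.

# Succeeds if the filesystem holding DIR has at least NEEDED_KIB KiB available.
export_dir_has_room() {
    dir=$1
    needed_kib=$2
    # df -P prints one POSIX-format line per filesystem; with -k, column 4
    # is the available space in KiB.
    avail_kib=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    # An empty value means df failed (for example, the directory is missing).
    [ -n "$avail_kib" ] && [ "$avail_kib" -ge "$needed_kib" ]
}

# Succeeds if dm_snapshot is loaded as a module. A kernel with dm_snapshot
# built in would not be listed in /proc/modules, so a failure here is only
# a hint to run `modprobe dm_snapshot`.
snapshot_module_loaded() {
    grep -q '^dm_snapshot ' /proc/modules
}

# Example usage on a node, requiring roughly 200 GiB free:
#   export_dir_has_room /var/lib/ganeti/export $((200 * 1024 * 1024)) \
#       || echo "not enough space in /var/lib/ganeti/export" >&2
#   snapshot_module_loaded || modprobe dm_snapshot
```

Running both checks on the source and target nodes before starting the export avoids discovering a full disk or a missing module halfway through the migration.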
|
|
|
|
|
|
## Cross-cluster migrations
|
|
|
|
|
|
|
If an entire cluster needs to be evacuated, the [move-instance](https://docs.ganeti.org/docs/ganeti/3.0/html/move-instance.html)
|
|
... | @@ -2285,79 +2365,6 @@ Look into logs on the relevant nodes (particularly |
|
`/var/log/ganeti/node-daemon.log`, which shows all commands run by
|
|
|
ganeti) when you have problems.
|
|
|
|
|
|
|
### Migrating a VM between clusters
|
|
|
|
|
|
|
|
The [export/import](https://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#export-import) mechanism can also be used to export and import
|
|
|
|
VMs one at a time, if only a subset of the cluster needs to be
|
|
|
|
evacuated.
|
|
|
|
|
|
|
|
Note that this procedure is still a work in progress. A simulation was
|
|
|
|
performed in [tpo/tpa/team#40917](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40917); a proper procedure might vary
|
|
|
|
from this significantly. In particular, there are some optimizations
|
|
|
|
possible through things like [zerofree](https://tracker.debian.org/pkg/zerofree) and compression...
|
|
|
|
|
|
|
|
1. find nodes to host the exported VM on the source cluster and the
|
|
|
|
target cluster; it needs enough disk space in `/var/lib/ganeti/export` to
|
|
|
|
keep a copy of a snapshot of the VM:
|
|
|
|
|
|
|
|
df -h /var/lib/ganeti/export
|
|
|
|
|
|
|
|
Typically, you'd make a logical volume to fit more data in there:
|
|
|
|
|
|
|
|
lvcreate -n export vg_ganeti -L200g &&
|
|
|
|
mkfs -t ext4 /dev/vg_ganeti/export &&
|
|
|
|
mkdir -p /var/lib/ganeti/export &&
|
|
|
|
mount /dev/vg_ganeti/export /var/lib/ganeti/export
|
|
|
|
|
|
|
|
Make sure you do that on *both* ends of the migration.
|
|
|
|
|
|
|
|
2. have the right kernel modules loaded, which might require a
|
|
|
|
reboot of the source node:
|
|
|
|
|
|
|
|
modprobe dm_snapshot
|
|
|
|
|
|
|
|
3. on the master of the source Ganeti cluster, export the VM to the
|
|
|
|
source node; also use `--noshutdown` if you cannot afford to have
|
|
|
|
downtime on the VM *and* you are ready to lose data accumulated
|
|
|
|
after the snapshot:
|
|
|
|
|
|
|
|
gnt-backup export -n chi-node-01.torproject.org test-01.torproject.org
|
|
|
|
gnt-instance stop test-01.torproject.org
|
|
|
|
|
|
|
|
WARNING: this step is currently not working if there's a second
|
|
|
|
disk (or swap device? to be confirmed), see [this upstream issue
|
|
|
|
for details](https://github.com/ganeti/instance-debootstrap/issues/18). For now we're deploying the "nocloud"
|
|
|
|
export/import mechanisms through Puppet to work around that problem,
|
|
|
|
which means the whole disk is copied (as opposed to only the used
|
|
|
|
parts).
|
|
|
|
|
|
|
|
4. copy the VM snapshot from the source node to a node in the target
|
|
|
|
cluster:
|
|
|
|
|
|
|
|
mkdir -p /var/lib/ganeti/export
|
|
|
|
rsync -ASHaxX --info=progress2 root@chi-node-01.torproject.org:/var/lib/ganeti/export/test-01.torproject.org/ /var/lib/ganeti/export/test-01.torproject.org/
|
|
|
|
|
|
|
|
Note that this assumes the target cluster has root access on the
|
|
|
|
source cluster. One way to make that happen is by creating a new
|
|
|
|
SSH key:
|
|
|
|
|
|
|
|
ssh-keygen -P "" -C 'sync key from dal-node-01'
|
|
|
|
|
|
|
|
And dump that public key in `/etc/ssh/userkeys/root.more` on the
|
|
|
|
source cluster.
|
|
|
|
|
|
|
|
5. on the master of the target Ganeti cluster, import the VM:
|
|
|
|
|
|
|
|
gnt-backup import -n dal-node-01:dal-node-02 --src-node=dal-node-01 --src-dir=/var/lib/ganeti/export/test-01.torproject.org --no-ip-check --no-name-check --net 0:ip=pool,network=gnt-dal-01 -t drbd --no-wait-for-sync test-01.torproject.org
|
|
|
|
|
|
|
|
6. enter the restored server console to change the IP address:
|
|
|
|
|
|
|
|
gnt-instance console test-01.torproject.org
|
|
|
|
|
|
|
|
7. if everything looks good, change the IP in LDAP
|
|
|
|
|
|
|
|
8. destroy the old VM
|
|
|
|
|
|
|
|
### Mass migrating instances to a new cluster
|
|
|
|
|
|
|
If an entire cluster needs to be evacuated, the [move-instance](https://docs.ganeti.org/docs/ganeti/3.0/html/move-instance.html)
|
|
... | | ... | |