[Ganeti](http://ganeti.org/) is software designed to facilitate the management of
virtual machines (KVM or Xen). It helps you move virtual machine
instances from one node to another, create an instance with DRBD
replication on another node and do the live migration from one to
another, etc.
This will show the running guests, known as "instances":
gnt-instance list
Our instances provide a serial console, starting in GRUB. To access it, run:
gnt-instance console test01.torproject.org
To exit, use `^]` -- that is, Control-<Closing Bracket>.
In Ganeti, we use the following terms:
* **node**: a physical machine
* **instance**: a virtual machine
* **master**: the *node* on which we issue Ganeti commands and
which supervises all the other nodes
Nodes are interconnected through a private network that is used to
communicate commands and synchronise disks (with
[howto/drbd](howto/drbd)). Instances are normally assigned two nodes:
a *primary* and a *secondary*: the *primary* is where the virtual
machine actually runs and the *secondary* acts as a hot failover.
See also the more extensive [glossary in the Ganeti documentation](http://docs.ganeti.org/ganeti/2.15/html/glossary.html).
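For example, this lists the primary and secondary node of every
instance (the same fields come back in the monitoring commands further
below):

```
gnt-instance list -o name,pnode,snodes
```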
## Adding a new instance
This command creates a new guest, or "instance" in Ganeti's
vocabulary, with a 10G root disk, 2G swap, 20G spare on SSD, 800G on
HDD, 8GB of RAM and 2 CPU cores:
gnt-instance add \
-o debootstrap+bullseye \
-t drbd --no-wait-for-sync \
--net 0:ip=pool,network=gnt-fsn13-02 \
--no-ip-check \
--no-name-check \
--disk 0:size=10G \
--disk 1:size=2G,name=swap \
--disk 2:size=20G \
--disk 3:size=800G,vg=vg_ganeti_hdd \
--backend-parameters memory=8g,vcpus=2 \
test-01.torproject.org
This configures the following:
* redundant disks in a DRBD mirror; use `-t plain` instead of `-t drbd` for
tests, as that avoids syncing of disks and will speed things up considerably
(even with `--no-wait-for-sync` there are some operations that block on
synced mirrors). Only one node should be provided as the argument for
`--node` then.
* three extra partitions besides the root disk: one on the default VG (SSD),
one on another VG (HDD), and a swap device on the default VG. If you don't
specify a swap device, a 512MB swapfile is created in `/swapfile`. TODO:
configure disks 2 and 3 automatically in the installer (`/var` and `/srv`?)
* an IP allocated from the public gnt-fsn pool:
`gnt-instance add` will print the IPv4 address it picked to stdout. The
IPv6 address can be found in `/var/log/ganeti/os/` on the primary node
of the instance, see below.
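To see which networks and address pools exist in a cluster (for
example the `gnt-fsn13-02` network used in the examples below), the
`gnt-network` commands can help; a quick sketch:

```
gnt-network list
gnt-network info gnt-fsn13-02
```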
To find the root password, ssh host key fingerprints, and the IPv6
address, run this **on the node where the instance was created**, for
example:
egrep 'root password|configured eth0 with|SHA256' $(ls -tr /var/log/ganeti/os/* | tail -1) | grep -v $(hostname)
We copy root's authorized keys into the new instance, so you should be able to
log in with your token. You will be required to change the root password immediately.
Pick something nice and document it in `tor-passwords`.
Also set reverse DNS for both IPv4 and IPv6 in [Hetzner's robot](https://robot.your-server.de/)
(check under Servers -> vSwitch -> IPs) or in our own reverse zone
files (if delegated).
Then follow [howto/new-machine](howto/new-machine).
### Known issues
* **allocator failures**: Note that you may need to use the `--node`
parameter to pick on which machines you want the machine to end up,
otherwise Ganeti will choose for you (and may fail). Use, for
example, `--node fsn-node-01:fsn-node-02` to use `node-01` as
primary and `node-02` as secondary. The allocator can sometimes
fail if it is upset about something in the cluster, for
example:
Can's find primary node using iallocator hail: Request failed: No valid allocation solutions, failure reasons: FailMem: 2, FailN1: 2
This situation is covered by [ticket 33785](https://bugs.torproject.org/33785). If this problem
occurs, it might be worth [rebalancing the cluster](#rebalancing-a-cluster).
* **ping failure**: there is a bug in `ganeti-instance-debootstrap`
which misconfigures `ping` (among other things), see [bug
31781](https://bugs.torproject.org/31781). It's currently patched in our version of the Debian
package, but that patch might disappear if Debian upgrades the
package without [shipping our patch](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944538). Note that this was fixed
in Debian bullseye and later.
### Other examples
This is the same without the HDD partition, in the `gnt-chi` cluster:
gnt-instance add \
-o debootstrap+bullseye \
-t drbd --no-wait-for-sync \
--net 0:ip=pool,network=gnt-chi-01 \
--no-ip-check \
--no-name-check \
--disk 0:size=10G \
--disk 1:size=2G,name=swap \
--disk 2:size=20G \
--backend-parameters memory=8g,vcpus=2 \
test-01.torproject.org
A simple test machine, with only 1G of RAM and 1 CPU, without
DRBD, in the FSN cluster:
gnt-instance add \
-o debootstrap+bullseye \
-t plain --no-wait-for-sync \
--net 0:ip=pool,network=gnt-fsn13-02 \
--no-ip-check \
--no-name-check \
--disk 0:size=10G \
--disk 1:size=2G,name=swap \
--backend-parameters memory=1g,vcpus=1 \
test-01.torproject.org
Do not forget to follow the [next steps](#next-steps), above.
### iSCSI integration
To create a VM with iSCSI backing, a disk must first be created on the
SAN, then adopted in a VM, which needs to be *reinstalled* on top of
that. This is typically how large disks are provisioned in the
`gnt-chi` cluster, in the [Cymru POP](howto/new-machine-cymru).
The following instructions assume you are on a node with an [iSCSI
initiator properly setup](howto/new-machine-cymru#iscsi-initiator-setup), and the [SAN cluster management tools
setup](howto/new-machine-cymru#san-management-tools-setup). It also assumes you are familiar with the `SMcli` tool, see
the [storage servers documentation](howto/new-machine-cymru#storage-servers) for an introduction on that.
This assumes you are creating a 500GB VM, partitioned on the Linux
host, *not* on the iSCSI volume. TODO: change those instructions to
create one volume per partition, so that those can be resized more
easily.
1. create the disk on the SAN and assign it to the host group:
puppet agent --disable "creating a SAN disk"
$EDITOR /usr/local/sbin/tpo-create-san-disks
/usr/local/sbin/tpo-create-san-disks
puppet agent --enable
WARNING: the above script needs to be edited before it does the
right thing. It will show the LUN numbers in use below. This,
obviously, is not ideal, and should be replaced by a Ganeti
external storage provider.
NOTE: the `logicalUnitNumber` here must be an increment from the
previous highest LUN. See also the [disk creation instructions](howto/new-machine-cymru#creating-a-disk)
for a discussion.
2. configure the disk on all Ganeti nodes, in Puppet's
`profile::ganeti::chi` class:
iscsi::multipath::alias { 'web-chi-03':
wwid => '36782bcb00063c6a500000d67603f7abf',
}
3. propagate the magic to all nodes in the cluster:
gnt-cluster command "puppet agent -t ; iscsiadm -m node --rescan ; multipath -r"
4. confirm that multipath works; it should look something like this:
root@chi-node-01:~# multipath -ll
web-chi-03-srv (36782bcb00063c6a500000d67603f7abf) dm-20 DELL,MD32xxi
size=500G features='5 queue_if_no_path pg_init_retries 50 queue_mode mq' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| |- 11:0:0:4 sdi 8:128 active ready running
| |- 12:0:0:4 sdj 8:144 active ready running
| `- 9:0:0:4 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
|- 10:0:0:4 sdk 8:160 active ghost running
|- 7:0:0:4 sdl 8:176 active ghost running
`- 8:0:0:4 sdm 8:192 active ghost running
root@chi-node-01:~#
and the device `/dev/mapper/web-chi-03` should exist.
6. adopt the disks in Ganeti:
gnt-instance add \
-n chi-node-04.torproject.org \
-o debootstrap+bullseye \
-t blockdev --no-wait-for-sync \
--net 0:ip=pool,network=gnt-chi-01 \
--no-ip-check \
--no-name-check \
--disk 0:adopt=/dev/disk/by-id/dm-name-tb-build-03-root \
--disk 1:adopt=/dev/disk/by-id/dm-name-tb-build-03-swap,name=swap \
--disk 2:adopt=/dev/disk/by-id/dm-name-tb-build-03-srv \
--backend-parameters memory=16g,vcpus=8 \
tb-build-03.torproject.org
NOTE: the actual node must be manually picked because the `hail`
allocator doesn't seem to know about block devices.
7. at this point, the VM probably doesn't boot, because for some
reason `gnt-instance-debootstrap` doesn't fire when disks are
adopted. So you need to reinstall the machine, which involves
stopping it first:
gnt-instance shutdown --timeout=0 tb-build-03
gnt-instance reinstall tb-build-03
HACK: the current installer fails on weird partitioning errors, see
[upstream bug 13](https://github.com/ganeti/instance-debootstrap/issues/13). We applied [patch 14](https://github.com/ganeti/instance-debootstrap/pull/14) on `chi-node-04`
and sent it upstream for review before committing to maintaining
this in Debian or elsewhere. It should be tested on other installs
beforehand as well.
From here on, follow the [next steps](#next-steps) above.
TODO: This would ideally be automated by an external storage provider,
see the [storage reference for more information](#storage).
### Troubleshooting
If a Ganeti instance install fails, it will show the end of the
install log, for example:
```
Thu Aug 26 14:11:09 2021 - INFO: Selected nodes for instance tb-pkgstage-01.torproject.org via iallocator hail: chi-node-02.torproject.org, chi-node-01.torproject.org
Thu Aug 26 14:11:09 2021 - INFO: NIC/0 inherits netparams ['br0', 'bridged', '']
Thu Aug 26 14:11:09 2021 - INFO: Chose IP 38.229.82.29 from network gnt-chi-01
Thu Aug 26 14:11:10 2021 * creating instance disks...
Thu Aug 26 14:12:58 2021 adding instance tb-pkgstage-01.torproject.org to cluster config
Thu Aug 26 14:12:58 2021 adding disks to cluster config
Thu Aug 26 14:13:00 2021 * checking mirrors status
Thu Aug 26 14:13:01 2021 - INFO: - device disk/0: 30.90% done, 3m 32s remaining (estimated)
Thu Aug 26 14:13:01 2021 - INFO: - device disk/2: 0.60% done, 55m 26s remaining (estimated)
Thu Aug 26 14:13:01 2021 * checking mirrors status
Thu Aug 26 14:13:02 2021 - INFO: - device disk/0: 31.20% done, 3m 40s remaining (estimated)
Thu Aug 26 14:13:02 2021 - INFO: - device disk/2: 0.60% done, 52m 13s remaining (estimated)
Thu Aug 26 14:13:02 2021 * pausing disk sync to install instance OS
Thu Aug 26 14:13:03 2021 * running the instance OS create scripts...
Thu Aug 26 14:16:31 2021 * resuming disk sync
Failure: command execution error:
Could not add os for instance tb-pkgstage-01.torproject.org on node chi-node-02.torproject.org: OS create script failed (exited with exit code 1), last lines in the log file:
Setting up openssh-sftp-server (1:7.9p1-10+deb10u2) ...
Setting up openssh-server (1:7.9p1-10+deb10u2) ...
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:ZTeMxYSUDTkhUUeOpDWpbuOzEAzOaehIHW/lJarOIQo root@chi-node-02 (RSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:MWKeA8vJKkEG4TW+FbG2AkupiuyFFyoVWNVwO2WG0wg root@chi-node-02 (ED25519)
Created symlink /etc/systemd/system/sshd.service \xe2\x86\x92 /lib/systemd/system/ssh.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ssh.service \xe2\x86\x92 /lib/systemd/system/ssh.service.
invoke-rc.d: could not determine current runlevel
Setting up ssh (1:7.9p1-10+deb10u2) ...
Processing triggers for systemd (241-7~deb10u8) ...
Processing triggers for libc-bin (2.28-10) ...
Errors were encountered while processing:
linux-image-4.19.0-17-amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
run-parts: /etc/ganeti/instance-debootstrap/hooks/ssh exited with return code 100
Using disk /dev/drbd4 as swap...
Setting up swapspace version 1, size = 2 GiB (2147479552 bytes)
no label, UUID=96111754-c57d-43f2-83d0-8e1c8b4688b4
Not using disk 2 (/dev/drbd5) because it is not named 'swap' (name: )
root@chi-node-01:~#
```
Here the failure which tripped the install is:
```
Errors were encountered while processing:
linux-image-4.19.0-17-amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
```
But the actual error is higher up, and we need to go look at the logs
on the server for this. In this case, in
`chi-node-02:/var/log/ganeti/os/add-debootstrap+buster-tb-pkgstage-01.torproject.org-2021-08-26_14_13_04.log`,
we can find the real problem:
```
Setting up linux-image-4.19.0-17-amd64 (4.19.194-3) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.19.0-17-amd64
W: Couldn't identify type of root file system for fsck hook
/etc/kernel/postinst.d/zz-update-grub:
/usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?).
run-parts: /etc/kernel/postinst.d/zz-update-grub exited with return code 1
dpkg: error processing package linux-image-4.19.0-17-amd64 (--configure):
installed linux-image-4.19.0-17-amd64 package post-installation script subprocess returned error exit status 1
```
In this case, oddly enough, even though Ganeti thought the install had
failed, the machine can actually start:
```
gnt-instance start tb-pkgstage-01.torproject.org
```
... and after a while, we can even get a console:
```
gnt-instance console tb-pkgstage-01.torproject.org
```
And in *that* case, the procedure can just continue from here on:
reset the root password, and just make sure you finish the install:
```
apt install linux-image-amd64
```
In the above case, the `sources-list` post-install hook was buggy: it
wasn't mounting `/dev` and friends before launching the upgrades,
which was causing issues when a kernel upgrade was queued.
And *if* you are debugging an installer and by mistake end up with
half-open filesystems and stray DRBD devices, do take a look at the
[LVM](howto/lvm) and [DRBD documentation](howto/drbd).
### CPU, memory changes
It's possible to change the IP, CPU, or memory allocation of an instance
using the [gnt-instance modify](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#modify) command:
gnt-instance modify -B vcpus=4 test1.torproject.org
gnt-instance modify -B memory=8g test1.torproject.org
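To confirm the new values, the same list fields used elsewhere on this
page can be queried, for example:

```
gnt-instance list -o name,be/vcpus,be/memory test1.torproject.org
```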
IP address changes require a full stop and will require manual changes
to the `/etc/network/interfaces*` files:
gnt-instance modify --net 0:modify,ip=116.202.120.175 test1.torproject.org
gnt-instance stop test1.torproject.org
gnt-instance start test1.torproject.org
gnt-instance console test1.torproject.org
The [gnt-instance grow-disk](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#grow-disk) command can be used to change the size
of the underlying device:
gnt-instance grow-disk --absolute test1.torproject.org 0 16g
gnt-instance reboot test1.torproject.org
The number `0` in this context indicates the first disk of the
instance. The amount specified is the final disk size (because of the
`--absolute` flag). In the above example, the final disk size will be
16GB. To *add* space to the existing disk, remove the `--absolute`
flag:
gnt-instance grow-disk test1.torproject.org 0 16g
gnt-instance reboot test1.torproject.org
In the above example, 16GB will be **ADDED** to the disk. Be careful
with resizes, because it's not possible to revert such a change:
`grow-disk` does *not* support shrinking disks. The only way to revert the
change is by exporting / importing the instance.
Note the reboot, above, will impose a downtime. See [upstream bug
28](https://github.com/ganeti/ganeti/issues/28) about improving that.
Then the filesystem needs to be resized inside the VM:
#### Resizing under LVM
Use `pvs` to display information about the physical volumes:
root@cupani:~# pvs
PV VG Fmt Attr PSize PFree
/dev/sdc vg_test lvm2 a-- <8.00g 1020.00m
Resize the physical volume to take up the new space:
pvresize /dev/sdc
Use `lvs` to display information about logical volumes:
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
var-opt vg_test-01 -wi-ao---- <10.00g
test-backup vg_test-01_hdd -wi-ao---- <20.00g
Use `lvextend` to add space to the volume:
lvextend -l '+100%FREE' vg_test-01/var-opt
Finally resize the filesystem:
resize2fs /dev/vg_test-01/var-opt
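As a shortcut, `lvextend` can also grow the filesystem in the same
step with the `--resizefs` (`-r`) flag, which replaces the last two
commands above:

```
lvextend -r -l '+100%FREE' vg_test-01/var-opt
```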
#### Resizing without LVM
If there's no LVM inside the VM (a more common configuration
nowadays), the above procedure will obviously not work.
You might need to resize the partition manually, which can be done
using fdisk. In the following example we have a `sda1` partition that
we want to extend from 10G to 20G to fill up the free space on
`/dev/sda`. Here is what the partition layout looks like before the resize:
```
# lsblk -a
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 10G 0 part /
sdb 8:16 0 2G 0 disk [SWAP]
sdc 8:32 0 40G 0 disk /srv
```
If `sdc` is the resized disk, the kernel might not have noticed the
size change, and you might need to kick it. There might be easier
ways, but a reboot would sure do it:
reboot
And in that case, the partition is *already* resized, so you do not
need to go through the `fdisk` process below and jump straight to the
last `resize2fs` step.
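One of those easier ways, assuming the disk shows up as a SCSI device
(as `sdc` does here; this does not apply to virtio disks), is to ask
the kernel to rescan just that disk:

```
echo 1 > /sys/block/sdc/device/rescan
```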
We use fdisk on the device:
```
# fdisk /dev/sda
Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): p # prints the partition table
Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x73ab5f76
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 20971519 20969472 10G 83 Linux # note the starting sector for later
```
Now we delete the partition. Note that the data will not be deleted, only the partition table will be altered:
```
Command (m for help): d
Selected partition 1
Partition 1 has been deleted.
Command (m for help): p
Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x73ab5f76
```
Now we create the new partition to take up the whole space:
```
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-41943039, default 2048): 2048 # this is the starting sector from above.
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-41943039, default 41943039): 41943039
Created a new partition 1 of type 'Linux' and of size 20 GiB.
Partition #1 contains a ext4 signature.
Do you want to remove the signature? [Y]es/[N]o: n # we want to keep the previous signature
Command (m for help): p
Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x73ab5f76
Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 41943039 41940992 20G 83 Linux
Command (m for help): w
The partition table has been altered.
Syncing disks.
```
Now we check the partitions:
```
# lsblk -a
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sdb 8:16 0 2G 0 disk [SWAP]
sdc 8:32 0 40G 0 disk /srv
```
If we check the free disk space on the device we will notice it has not changed yet:
```
# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 9.8G 8.5G 874M 91% /
```
We need to resize it:
```
# resize2fs /dev/sda1
resize2fs 1.44.5 (15-Dec-2018)
Filesystem at /dev/sda1 is mounted on /; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 3
The filesystem on /dev/sda1 is now 5242624 (4k) blocks long.
```
The resize is now complete.
### Adding disks
A disk can be added to an instance with the `modify` command as
well. This, for example, will add a 100GB disk to the `test1` instance
on the `vg_ganeti_hdd` volume group, which is "slow" rotating disks:
gnt-instance modify --disk add:size=100g,vg=vg_ganeti_hdd test1.torproject.org
gnt-instance reboot test1.torproject.org
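Inside the instance, the new disk then still needs a filesystem and a
mount point. A minimal sketch, assuming the new disk shows up as
`/dev/sdd` and is meant to be mounted on `/srv/backup` (both names are
examples, not taken from this procedure):

```
mkfs.ext4 /dev/sdd
mkdir -p /srv/backup
echo "/dev/sdd /srv/backup ext4 rw,noatime,errors=remount-ro 0 2" >> /etc/fstab
mount /srv/backup
```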
### Changing disk type
Say you have a test instance that was created with a `plain` disk
template, but you actually want it in production with a `drbd` disk
template. Switching to `drbd` is easy:
gnt-instance shutdown test-01
gnt-instance modify -t drbd test-01
gnt-instance start test-01
The second command will use the allocator to find a secondary node. If
that fails, you can assign a node manually with `-n`.
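For example, to force `fsn-node-02` as the secondary (the same
workaround used in the import procedure below):

```
gnt-instance modify -t drbd -n fsn-node-02.torproject.org test-01
```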
You can also switch back to `plain`, although you should generally
never do that.
See also the [upstream procedure](https://docs.ganeti.org/docs/ganeti/3.0/html/admin.html#conversion-of-an-instance-s-disk-type) and [design document](https://docs.ganeti.org/docs/ganeti/3.0/html/design-disk-conversion.html).
### Adding a network interface on the rfc1918 vlan
We have a vlan on which some VMs without public addresses sit.
Its vlan ID is 4002 and it's backed by Hetzner vSwitch #11973, "fsn-gnt-rfc1918-traffic".
Note that traffic on this vlan will travel in the clear between nodes.
To add an instance to this vlan, give it a second network interface using:
gnt-instance modify --net add:link=br0,vlan=4002,mode=openvswitch test1.torproject.org
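Inside the guest, the new interface then needs to be configured by
hand. A minimal sketch for `/etc/network/interfaces`, assuming the
interface shows up as `eth1` and that `172.30.130.5` is the
(hypothetical) address picked for it on the vlan:

```
auto eth1
iface eth1 inet static
    address 172.30.130.5
    netmask 255.255.255.0
```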
### Destroying an instance
This totally deletes the instance, including all mirrors and
everything; be very careful with it:
gnt-instance remove test01.torproject.org
## Getting information
Information about an instance can be found in the rather verbose
`gnt-instance info`:
root@fsn-node-01:~# gnt-instance info tb-build-02.torproject.org
- Instance name: tb-build-02.torproject.org
UUID: 8e9f3ca6-204f-4b6c-8e3e-6a8fda137c9b
Serial number: 5
Creation time: 2020-12-15 14:06:41
Modification time: 2020-12-15 14:07:31
State: configured to be up, actual state is up
Nodes:
- primary: fsn-node-03.torproject.org
group: default (UUID 8c32fd09-dc4c-4237-9dd2-3da3dfd3189e)
- secondaries: fsn-node-04.torproject.org (group default, group UUID 8c32fd09-dc4c-4237-9dd2-3da3dfd3189e)
Operating system: debootstrap+buster
A quicker command shows just the primary and secondary nodes for a
given instance:
gnt-instance info tb-build-02.torproject.org | grep -A 3 Nodes
An equivalent command will show the primary and secondary for *all*
instances, along with extra information (like the CPU count, memory and
disk usage):
gnt-instance list -o pnode,snodes,name,be/vcpus,be/memory,disk_usage,disk_template,status | sort
It can be useful to run this in a loop to see changes:
watch -n5 -d 'gnt-instance list -o pnode,snodes,name,be/vcpus,be/memory,disk_usage,disk_template,status | sort'
Instances should be set up using the DRBD backend, in which case you
should probably take a look at [howto/drbd](howto/drbd) if you have problems with
that. Ganeti handles most of the logic there, so that should generally
not be necessary.
## Evaluating cluster capacity
This will list instances repeatedly, but also show their assigned
memory, and compare it with the node's capacity:
gnt-instance list -o pnode,name,be/vcpus,be/memory,disk_usage,disk_template,status | sort &&
echo &&
gnt-node list
The latter does not show disk usage for secondary volume groups (see
[upstream issue 1379](https://github.com/ganeti/ganeti/issues/1379)). For a complete picture of disk usage, use:
gnt-node list-storage
The [gnt-cluster verify](http://docs.ganeti.org/ganeti/2.15/man/gnt-cluster.html#verify) command will also check to see if there's
enough space on secondaries to account for the failure of a
node. Healthy output looks like this:
root@fsn-node-01:~# gnt-cluster verify
Submitted jobs 48030, 48031
Waiting for job 48030 ...
Fri Jan 17 20:05:42 2020 * Verifying cluster config
Fri Jan 17 20:05:42 2020 * Verifying cluster certificate files
Fri Jan 17 20:05:42 2020 * Verifying hypervisor parameters
Fri Jan 17 20:05:42 2020 * Verifying all nodes belong to an existing group
Waiting for job 48031 ...
Fri Jan 17 20:05:42 2020 * Verifying group 'default'
Fri Jan 17 20:05:42 2020 * Gathering data (2 nodes)
Fri Jan 17 20:05:42 2020 * Gathering information about nodes (2 nodes)
Fri Jan 17 20:05:45 2020 * Gathering disk information (2 nodes)
Fri Jan 17 20:05:45 2020 * Verifying configuration file consistency
Fri Jan 17 20:05:45 2020 * Verifying node status
Fri Jan 17 20:05:45 2020 * Verifying instance status
Fri Jan 17 20:05:45 2020 * Verifying orphan volumes
Fri Jan 17 20:05:45 2020 * Verifying N+1 Memory redundancy
Fri Jan 17 20:05:45 2020 * Other Notes
Fri Jan 17 20:05:45 2020 * Hooks Results
A sick node would have said something like this instead:
Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
Mon Oct 26 18:59:37 2009 - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail
See the [Ganeti manual](http://docs.ganeti.org/ganeti/2.15/html/walkthrough.html#n-1-errors) for a more extensive example.
Also note the `hspace -L` command, which can tell you how many
instances can be created in a given cluster. It uses the "standard"
instance template defined in the cluster (which we haven't configured
yet).
Ganeti is smart about assigning instances to nodes. There's also a
command (`hbal`) to automatically rebalance the cluster (see
below). If for some reason `hbal` doesn’t do what you want or you need
to move things around for other reasons, here are a few commands that
might be handy.
Make an instance switch to using its secondary:
gnt-instance migrate test1.torproject.org
Make all instances on a node switch to their secondaries:
gnt-node migrate fsn-node-01.torproject.org
The `migrate` command does a "live" migration, which should avoid any
downtime. It might be preferable to actually
shut down the machine for some reason (for example if we actually want
to reboot because of a security upgrade). Or we might not be able to
live-migrate because the node is down. In that case, we do a
[failover](http://docs.ganeti.org/ganeti/2.15/html/admin.html#failing-over-an-instance):
gnt-instance failover test1.torproject.org
The [gnt-node evacuate](http://docs.ganeti.org/ganeti/2.15/man/gnt-node.html#evacuate) command can also be used to "empty" a given
node altogether, in case of an emergency:
gnt-node evacuate -I . fsn-node-02.torproject.org
Similarly, the [gnt-node failover](http://docs.ganeti.org/ganeti/2.15/man/gnt-node.html#failover) command can be used to
hard-recover from a completely crashed node:
gnt-node failover fsn-node-02.torproject.org
Note that you might need the `--ignore-consistency` flag if the
node is unresponsive.
## Importing external libvirt instances
Assumptions:
* `INSTANCE`: name of the instance being migrated, the "old" one
being outside the cluster and the "new" one being the one created
inside the cluster (e.g. `chiwui.torproject.org`)
* `SPARE_NODE`: a ganeti node with free space
(e.g. `fsn-node-03.torproject.org`) where the `INSTANCE` will be
migrated
* `MASTER_NODE`: the master ganeti node
(e.g. `fsn-node-01.torproject.org`)
* `KVM_HOST`: the machine which we migrate the `INSTANCE` from
* the `INSTANCE` has only `root` and `swap` partitions
* the `SPARE_NODE` has space in `/srv/` to host all the virtual
machines to import, to check, use:
fab -H crm-ext-01.torproject.org,crm-int-01.torproject.org,forrestii.torproject.org,nevii.torproject.org,rude.torproject.org,troodi.torproject.org,vineale.torproject.org libvirt.du -p kvm3.torproject.org | sed '/-swap$/d;s/ .*$//' <f | awk '{s+=$1} END {print s}'
You will very likely need to create a `/srv` big enough for this,
for example:
lvcreate -L 300G vg_ganeti -n srv-tmp &&
mkfs /dev/vg_ganeti/srv-tmp &&
mount /dev/vg_ganeti/srv-tmp /srv
Import procedure:
1. pick a viable SPARE NODE to import the INSTANCE (see "evaluating
cluster capacity" above, when in doubt) and find on which KVM HOST
the INSTANCE lives
2. copy the disks, without downtime:
./ganeti -v -H $INSTANCE libvirt-import --ganeti-node $SPARE_NODE --libvirt-host $KVM_HOST
3. copy the disks again, this time suspending the machine:
./ganeti -v -H $INSTANCE libvirt-import --ganeti-node $SPARE_NODE --libvirt-host $KVM_HOST --suspend --adopt
4. renumber the instance:
./ganeti -v -H $INSTANCE renumber-instance --ganeti-node $SPARE_NODE
5. test services by changing your `/etc/hosts`, possibly warning
service admins:
> Subject: $INSTANCE IP address change planned for Ganeti migration
>
> I will soon migrate this virtual machine to the new ganeti cluster. this
> will involve an IP address change which might affect the service.
>
> Please let me know if there are any problems you can think of. in
> particular, do let me know if any internal (inside the server) or external
> (outside the server) services hardcodes the IP address of the virtual
> machine.
>
> A test instance has been setup. You can test the service by
> adding the following to your /etc/hosts:
>
> 116.202.120.182 $INSTANCE
> 2a01:4f8:fff0:4f:266:37ff:fe32:cfb2 $INSTANCE
7. lower TTLs to 5 minutes. this procedure varies a lot according to
the service, but generally if all DNS entries are `CNAME`s
pointing to the main machine domain name, the TTL can be lowered
by adding a `dnsTTL` entry in the LDAP entry for this host. For
example, this sets the TTL to 5 minutes:
dnsTTL: 300
Then to make the changes immediate, you need the following
commands:
ssh root@alberti.torproject.org sudo -u sshdist ud-generate &&
ssh root@nevii.torproject.org ud-replicate
Warning: if you migrate one of the hosts ud-ldap depends on, this
can fail: not only will the TTL not update, it might also
fail to update the IP address in the below procedure. See [ticket
33766](https://bugs.torproject.org/33766) for details.
8. shutdown original instance and redo migration as in step 3 and 4:
fab -H $INSTANCE reboot.halt-and-wait --delay-shutdown 60 --reason='migrating to new server' &&
./ganeti -v -H $INSTANCE libvirt-import --ganeti-node $SPARE_NODE --libvirt-host $KVM_HOST --adopt &&
./ganeti -v -H $INSTANCE renumber-instance --ganeti-node $SPARE_NODE
TODO: establish host-level test procedure and run it here.
10. switch to DRBD, still on the Ganeti MASTER NODE:
gnt-instance stop $INSTANCE &&
gnt-instance modify -t drbd $INSTANCE &&
gnt-instance failover -f $INSTANCE &&
gnt-instance start $INSTANCE
The above can sometimes fail if the allocator is upset about
something in the cluster, for example:
Can's find secondary node using iallocator hail: Request failed: No valid allocation solutions, failure reasons: FailMem: 2, FailN1: 2
This situation is covered by [ticket 33785](https://bugs.torproject.org/33785). To work around the
allocator, you can specify a secondary node directly:
gnt-instance modify -t drbd -n fsn-node-04.torproject.org $INSTANCE &&
gnt-instance failover -f $INSTANCE &&
gnt-instance start $INSTANCE
TODO: move into fabric, maybe in a `libvirt-import-live` or
`post-libvirt-import` job that would also do the renumbering below
11. change IP address in the following locations:
* LDAP (`ipHostNumber` field, but also change the `physicalHost` and `l` fields!). Also drop the dnsTTL attribute while you're at it.
* Puppet (grep in tor-puppet source, run `puppet agent -t; ud-replicate` on pauli)
* DNS (grep in tor-dns source, `puppet agent -t; ud-replicate` on nevii)
* nagios (don't forget to change the parent)
* reverse DNS (upstream web UI, e.g. Hetzner Robot)
* grep for the host's IP address on itself:
grep -r -e 78.47.38.227 -e 2a01:4f8:fff0:4f:266:37ff:fe77:1ad8 /etc
grep -r -e 78.47.38.227 -e 2a01:4f8:fff0:4f:266:37ff:fe77:1ad8 /srv
* grep for the host's IP on *all* hosts:
cumin-all-puppet
cumin-all 'grep -r -e 78.47.38.227 -e 2a01:4f8:fff0:4f:266:37ff:fe77:1ad8 /etc'
12. retire old instance (only a tiny part of [howto/retire-a-host](howto/retire-a-host)):
./retire -H $INSTANCE retire-instance --parent-host $KVM_HOST
12. update the [Nextcloud spreadsheet](https://nc.torproject.net/apps/onlyoffice/5395) to remove the machine from
the KVM host
13. warn users about the migration, for example:
> To: tor-project@lists.torproject.org
> Subject: cupani AKA git-rw IP address changed
>
> The main git server, cupani, is the machine you connect to when you push
> or pull git repositories over ssh to git-rw.torproject.org. That
> machines has been migrated to the new Ganeti cluster.
>
> This required an IP address change from:
>
> 78.47.38.228 2a01:4f8:211:6e8:0:823:4:1
>
> to:
>
> 116.202.120.182 2a01:4f8:fff0:4f:266:37ff:fe32:cfb2
>
> DNS has been updated and preliminary tests show that everything is
> mostly working. You *will* get a warning about the IP address change
> when connecting over SSH, which will go away after the first
> connection.
>
> Warning: Permanently added the ED25519 host key for IP address '116.202.120.182' to the list of known hosts.
>
> That is normal. The SSH fingerprints of the host did *not* change.
>
> Please do report any other anomaly using the normal channels:
>
> https://gitlab.torproject.org/tpo/tpa/team/-/wikis/support
>
> The service was unavailable for about an hour during the migration.
## Importing external libvirt instances, manual
This procedure is now easier to accomplish with the Fabric tools
written especially for this purpose. Use the above procedure
instead. This is kept for historical reference.
Assumptions:
* `INSTANCE`: name of the instance being migrated, the "old" one
being outside the cluster and the "new" one being the one created
inside the cluster (e.g. `chiwui.torproject.org`)
* `SPARE_NODE`: a ganeti node with free space
(e.g. `fsn-node-03.torproject.org`) where the `INSTANCE` will be
migrated
* `MASTER_NODE`: the master ganeti node
(e.g. `fsn-node-01.torproject.org`)
* `KVM_HOST`: the machine which we migrate the `INSTANCE` from
* the `INSTANCE` has only `root` and `swap` partitions
1. pick a viable SPARE NODE to import the instance (see "evaluating
cluster capacity" above, when in doubt), log in to the three
servers, and set the proper environment everywhere, for example:
MASTER_NODE=fsn-node-01.torproject.org
SPARE_NODE=fsn-node-03.torproject.org
KVM_HOST=kvm1.torproject.org
INSTANCE=test.torproject.org
echo "$(qemu-img info --output=json $disk | jq '."virtual-size"') / 1024 / 1024 / 1024" | bc -l
sed -n '/<vcpu/{s/[^>]*>//;s/<.*//;p}' < /etc/libvirt/qemu/$INSTANCE.xml
* memory, assuming the value is in KiB, converted to GiB:
echo "$(sed -n '/<memory/{s/[^>]*>//;s/<.*//;p}' < /etc/libvirt/qemu/$INSTANCE.xml) /1024 /1024" | bc -l
TODO: make sure the memory line is in KiB and that the number
makes sense.
* on the INSTANCE, find the swap device UUID so we can recreate it later:
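For example, this prints the UUID of all swap devices (a suggested
command; any equivalent works):

```
blkid -t TYPE=swap
```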
3. setup a copy channel, on the SPARE NODE:
ssh-agent bash
ssh-add /etc/ssh/ssh_host_ed25519_key
cat /etc/ssh/ssh_host_ed25519_key.pub
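The public key printed above then needs to be authorized on the KVM
HOST so the copies below can run; this is an assumption about the
intent of this step, and the key should be removed again once the
migration is done:

```
# on the KVM HOST: paste the public key printed above, then hit control-d
cat >> /root/.ssh/authorized_keys
```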
4. copy the `.qcow` file(s) over, from the KVM HOST to the SPARE NODE:
rsync -P $KVM_HOST:/srv/vmstore/$INSTANCE/$INSTANCE-root /srv/
rsync -P $KVM_HOST:/srv/vmstore/$INSTANCE/$INSTANCE-lvm /srv/ || true
Note: it's possible there is not enough room in `/srv`: in the
base Ganeti installs, everything is in the same root partition
(`/`) which will fill up if the instance is (say) over ~30GiB. In
that case, create a filesystem in `/srv`:
(mkdir /root/srv && mv /srv/* /root/srv true) || true &&
lvcreate -L 200G vg_ganeti -n srv &&
mkfs /dev/vg_ganeti/srv &&
echo "/dev/vg_ganeti/srv /srv ext4 rw,noatime,errors=remount-ro 0 2" >> /etc/fstab &&
mount /srv &&
( mv /root/srv/* ; rmdir /root/srv )
This partition can be reclaimed once the VM migrations are
completed, as it needlessly takes up space on the node.