Ganeti is software designed to facilitate the management of virtual machines (KVM or Xen). It helps you move virtual machine instances from one node to another, create an instance with DRBD replication on another node and do the live migration from one to another, etc.

Tutorial

Listing virtual machines (instances)

This will show the running guests, known as "instances":

gnt-instance list
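For scripting, the list output can be narrowed to specific fields and filtered. The following is a minimal sketch, assuming the `name`, `pnode` and `status` output fields and the `--no-headers` and `--separator` options of `gnt-instance list`; the `not_running` helper is a hypothetical name and is not part of Ganeti.

```shell
# Hypothetical helper: read "name:pnode:status" lines on stdin and
# print every instance whose status is anything other than "running".
not_running() {
    awk -F: '$3 != "running" { printf "%s (%s) on %s\n", $1, $3, $2 }'
}

# Intended use on the master node (assumes these list options and fields):
#   gnt-instance list --no-headers --separator=: -o name,pnode,status | not_running
```

The helper only parses text, so it can be tried with hand-written input before pointing it at a real cluster.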

Accessing serial console

Our instances provide a serial console, starting in GRUB. To access it, run:

gnt-instance console test01.torproject.org

To exit, use ^] -- that is, Control-<Closing Bracket>.

How-to

Glossary

In Ganeti, a physical machine is called a node and a virtual machine is an instance. One node is elected to be the master, from which all commands should be run. Nodes are interconnected through a private network that is used to communicate commands and synchronise disks (with howto/drbd). Instances are normally assigned two nodes, a primary and a secondary: the primary is where the virtual machine actually runs and the secondary acts as a hot failover.

See also the more extensive glossary in the Ganeti documentation.

Adding a new instance

This command creates a new guest, or "instance" in Ganeti's vocabulary, with a 10G root disk, 2G of swap, 20G spare on SSD, 800G on HDD, 8GB of RAM and 2 CPU cores:

gnt-instance add \
  -o debootstrap+buster \
  -t drbd --no-wait-for-sync \
  --net 0:ip=pool,network=gnt-fsn13-02 \
  --no-ip-check \
  --no-name-check \
  --disk 0:size=10G \
  --disk 1:size=2G,name=swap \
  --disk 2:size=20G \
  --disk 3:size=800G,vg=vg_ganeti_hdd \
  --backend-parameters memory=8g,vcpus=2 \
  test-01.torproject.org

What that does

This configures the following:

  • redundant disks in a DRBD mirror. For tests, use -t plain instead of -t drbd: that avoids syncing disks and speeds things up considerably (even with --no-wait-for-sync, some operations block on synced mirrors). With -t plain, only one node should be provided as the argument for --node.
  • three partitions: one on the default VG (SSD), one on another VG (HDD), and a swap file on the default VG. If you don't specify a swap device, a 512MB swapfile is created in /swapfile. TODO: configure disk 2 and 3 automatically in installer. (/var and /srv?)
  • 8GB of RAM with 2 virtual CPUs
  • an IP allocated from the public gnt-fsn pool: gnt-instance add will print the IPv4 address it picked to stdout. The IPv6 address can be found in /var/log/ganeti/os/ on the primary node of the instance, see below.
  • with the test-01.torproject.org hostname
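When several similar instances are needed, the options can be assembled by a small wrapper. This is only a sketch: `print_add_cmd` is a hypothetical helper, it covers just the root and swap disks, and it prints the command for review instead of running it. The optional fifth argument maps to the `--node primary:secondary` parameter.

```shell
# Hypothetical helper: print (do not run) a gnt-instance add command.
# Arguments: hostname, network, memory, vcpus, optional "primary:secondary".
print_add_cmd() {
    name=$1
    network=$2
    memory=$3
    vcpus=$4
    nodes=${5:-}
    cmd="gnt-instance add -o debootstrap+buster -t drbd --no-wait-for-sync"
    cmd="$cmd --net 0:ip=pool,network=$network --no-ip-check --no-name-check"
    cmd="$cmd --disk 0:size=10G --disk 1:size=2G,name=swap"
    cmd="$cmd --backend-parameters memory=$memory,vcpus=$vcpus"
    # Pin the nodes only when requested; otherwise the allocator chooses.
    if [ -n "$nodes" ]; then
        cmd="$cmd --node $nodes"
    fi
    printf '%s %s\n' "$cmd" "$name"
}
```

For example, `print_add_cmd test-01.torproject.org gnt-fsn13-02 8g 2` prints a command line that can be inspected and then pasted on the master node.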

Next steps

To find the root password, ssh host key fingerprints, and the IPv6 address, run this on the node where the instance was created, for example:

egrep 'root password|configured eth0 with|SHA256' $(ls -tr /var/log/ganeti/os/* | tail -1) | grep -v $(hostname)
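The one-liner above does two things: it picks the most recently modified install log, then extracts the interesting lines from it. A minimal sketch that separates the two steps follows; `latest_log` and `install_summary` are hypothetical helper names, and `grep -E` is used as the equivalent of `egrep`, which newer GNU grep releases mark as obsolescent.

```shell
# Hypothetical helper: print the most recently modified file in a directory.
# Defaults to the Ganeti OS install log directory named in the text.
latest_log() {
    dir=${1:-/var/log/ganeti/os}
    # -t sorts by modification time, -r reverses so the newest comes last.
    ls -tr "$dir"/* | tail -n 1
}

# Hypothetical helper: extract the password, IPv6 and host key lines
# from an install log given as the first argument.
install_summary() {
    grep -E 'root password|configured eth0 with|SHA256' "$1"
}

# Intended use on the primary node of the new instance:
#   install_summary "$(latest_log)" | grep -v "$(hostname)"
```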

We copy root's authorized keys into the new instance, so you should be able to log in with your token. You will be required to change the root password immediately. Pick something nice and document it in tor-passwords.

Also set reverse DNS for both IPv4 and IPv6 in Hetzner's robot (check under servers -> vSwitch -> IPs) or in our own reverse zone files (if delegated).

Then follow howto/new-machine.

Known issues

  • usrmerge: that procedure creates a machine with usrmerge! See bug 34115 before proceeding.

  • allocator failures: you may need to use the --node parameter to choose which nodes the instance ends up on; otherwise Ganeti will choose for you (and may fail). Use, for example, --node fsn-node-01:fsn-node-02 to use node-01 as primary and node-02 as secondary. The allocator can fail if it is upset about something in the cluster, for example:

     Can't find primary node using iallocator hail: Request failed: No valid allocation solutions, failure reasons: FailMem: 2, FailN1: 2

    This situation is covered by ticket 33785. If this problem occurs, it might be worth rebalancing the cluster.

  • ping failure: there is a bug in ganeti-instance-debootstrap which misconfigures ping (among other things), see bug 31781. It's currently patched in our version of the Debian package, but that patch might disappear if Debian upgrades the package without shipping our patch.

Other examples

This is the same without the HDD partition, in the gnt-chi cluster:

gnt-instance add \
  -o debootstrap+buster \
  -t drbd --no-wait-for-sync \
  --net 0:ip=pool,network=gnt-chi-01 \
  --no-ip-check \
  --no-name-check \
  --disk 0:size=10G \
  --disk 1:size=2G,name=swap \
  --disk 2:size=20G \
  --backend-parameters memory=8g,vcpus=2 \
  test-01.torproject.org

A simple test machine with a 10G root disk, 2G of swap, 1G of RAM, and 1 CPU, without DRBD, in the FSN cluster:

gnt-instance add \
      -o debootstrap+buster \
      -t plain --no-wait-for-sync \
      --net 0:ip=pool,network=gnt-fsn13-02 \
      --no-ip-check \
      --no-name-check \
      --disk 0:size=10G \
      --disk 1:size=2G,name=swap \
      --backend-parameters memory=1g,vcpus=1 \
      test-01.torproject.org

Modifying an instance

CPU, memory changes

It's possible to change the IP, CPU, or memory allocation of an instance using the gnt-instance modify command:

gnt-instance modify -B vcpus=2 test1.torproject.org
gnt-instance modify -B memory=4g test1.torproject.org
gnt-instance reboot test1.torproject.org

IP address change

IP address changes require a full stop and will require manual changes to the /etc/network/interfaces* files:

gnt-instance modify --net 0:modify,ip=116.202.120.175 test1.torproject.org
gnt-instance stop test1.torproject.org
gnt-instance start test1.torproject.org
gnt-instance console test1.torproject.org
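The manual change inside the instance usually amounts to replacing the old address with the new one in the interfaces files. A minimal sketch follows, assuming GNU sed (for `-i` and `\b`); `replace_ip` is a hypothetical helper, and the old address in the usage line is a placeholder to be filled in.

```shell
# Hypothetical helper: replace one IPv4 address with another in a file.
# Arguments: old address, new address, file to edit in place.
replace_ip() {
    # Escape the dots so they match literally rather than any character.
    old=$(printf '%s' "$1" | sed 's/\./\\./g')
    new=$2
    file=$3
    # \b keeps a shorter address from matching inside a longer one.
    sed -i "s/\b$old\b/$new/g" "$file"
}

# Intended use from the console of the instance:
#   replace_ip <old address> 116.202.120.175 /etc/network/interfaces
```

Review the file afterwards, since netmask and gateway lines may also need to change when the instance moves to a different network.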

Resizing disks

The gnt-instance grow-disk command can be used to change the size of the underlying device:

gnt-instance grow-disk --absolute test1.torproject.org 0 16g
gnt-instance reboot test1.torproject.org

The number 0 in this context indicates the first disk of the instance. The amount specified is the final disk size (because of the --absolute flag). In the above example, the final disk size will be 16GB. To add space to the existing disk, remove the --absolute flag:

gnt-instance grow-disk test1.torproject.org 0 16g
gnt-instance reboot test1.torproject.org

In the above example, 16GB will be ADDED to the disk. Be careful with resizes, because it's not possible to revert such a change: grow-disk does not support shrinking disks. The only way to revert the change is by exporting / importing the instance.
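The difference between the two modes is simple arithmetic, sketched here with a hypothetical `final_size_gb` helper. The 8GB starting size in the usage note is an assumption for illustration; the helper also rejects an absolute size smaller than the current one, mirroring the fact that disks cannot be shrunk.

```shell
# Hypothetical helper: compute the final disk size in GB.
# Arguments: current size, amount given to grow-disk, mode (absolute|relative).
final_size_gb() {
    current=$1
    amount=$2
    mode=$3
    case $mode in
        absolute)
            # With --absolute, the amount is the final size; it cannot shrink.
            if [ "$amount" -lt "$current" ]; then
                echo "cannot shrink from ${current}G to ${amount}G" >&2
                return 1
            fi
            echo "$amount"
            ;;
        relative)
            # Without --absolute, the amount is added to the current size.
            echo $((current + amount))
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

Starting from an assumed 8GB disk, `final_size_gb 8 16 absolute` gives 16, while `final_size_gb 8 16 relative` gives 24.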

Then the filesystem needs to be resized inside the VM:

ssh root@test1.torproject.org 

Use pvs to display information about the physical volumes:

root@cupani:~# pvs
PV         VG        Fmt  Attr PSize   PFree   
/dev/sdc   vg_test   lvm2 a--  <8.00g  1020.00m

Resize the physical volume to take up the new space:

pvresize /dev/sdc

Use lvs to display information about logical volumes:

# lvs
LV          VG             Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
var-opt     vg_test-01     -wi-ao---- <10.00g
test-backup vg_test-01_hdd -wi-ao---- <20.00g

Use lvextend to add space to the volume:

lvextend -l '+100%FREE' vg_test-01/var-opt

Finally resize the filesystem:

resize2fs /dev/vg_test-01/var-opt
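The three steps above can be chained into one function to run inside the VM. This is a sketch under assumptions: `run` and `grow_lv_fs` are hypothetical helpers, the filesystem is assumed to be ext2/3/4 (as implied by resize2fs), and setting `DRY_RUN=1` only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: grow an LV and its filesystem after the underlying
# disk was enlarged. Arguments: physical volume, volume group, logical volume.
grow_lv_fs() {
    pv=$1
    vg=$2
    lv=$3
    # Each step runs only if the previous one succeeded.
    run pvresize "$pv" &&                      # make LVM see the new space
    run lvextend -l '+100%FREE' "$vg/$lv" &&   # give all free extents to the LV
    run resize2fs "/dev/$vg/$lv"               # grow the filesystem to fill the LV
}
```

With the names from the example output, a review pass would be `DRY_RUN=1 grow_lv_fs /dev/sdc vg_test-01 var-opt`.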

See also the LVM howto.

Adding disks

A disk can be added to an instance with the modify command as well. This, for example, will add a 100GB disk to the test1 instance on the vg_ganeti_hdd volume group, which is backed by "slow" rotating disks:

gnt-instance modify --disk add:size=100g,vg=vg_ganeti_hdd test1.torproject.org
gnt-instance reboot test1.torproject.org

Adding a network interface on the rfc1918 vlan

We have a vlan for VMs that do not have public addresses. Its vlanid is 4002 and it's backed by Hetzner vSwitch #11973 "fsn-gnt-rfc1918-traffic". Note that traffic on this vlan will travel in the clear between nodes.

To add an instance to this vlan, give it a second network interface using

gnt-instance modify --net add:link=br0,vlan=4002,mode=openvswitch test1.torproject.org

Destroying an instance

This completely deletes the instance, including all mirrors. Be very careful with it:

gnt-instance remove test01.torproject.org

Getting information