[Ganeti](http://ganeti.org/) is software designed to facilitate the management of
virtual machines (KVM or Xen). It helps you move virtual machine
instances from one node to another, create an instance with DRBD
replication on another node, do live migrations from one node to
another, etc.

[[_TOC_]]

# Tutorial

## Listing virtual machines (instances)

This will show the running guests, known as "instances":
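
    gnt-instance list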

## Accessing serial console
Our instances provide a serial console, starting in GRUB. To access it, run:

    gnt-instance console test01.torproject.org

To exit, use `^]` -- that is, Control-<Closing Bracket>.

# How-to

## Glossary

In Ganeti, a physical machine is called a *node* and a virtual machine
is an *instance*. A node is elected to be the *master* where all
commands should be run from. Nodes are interconnected through a
private network that is used to communicate commands and synchronise
disks (with [howto/drbd](howto/drbd)). Instances are normally assigned two nodes: a
*primary* and a *secondary*: the *primary* is where the virtual
machine actually runs and the *secondary* acts as a hot failover.

See also the more extensive [glossary in the Ganeti documentation](http://docs.ganeti.org/ganeti/2.15/html/glossary.html).
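
To find which node currently holds the master role, this can be run from
any node:

    gnt-cluster getmaster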

## Adding a new instance

This command creates a new guest, or "instance" in Ganeti's
vocabulary, with a 10G root, 2G swap, 20G spare on SSD, 800G on HDD,
8GB of RAM and 2 CPU cores:

    gnt-instance add \
      -o debootstrap+buster \
      -t drbd --no-wait-for-sync \
      --net 0:ip=pool,network=gnt-fsn \
      --no-ip-check \
      --no-name-check \
      --disk 0:size=10G \
      --disk 1:size=2G,name=swap \
      --disk 2:size=20G \
      --disk 3:size=800G,vg=vg_ganeti_hdd \
      --backend-parameters memory=8g,vcpus=2 \
      test-01.torproject.org

This is the same without the HDD partition, in the `gnt-chi` cluster:

    gnt-instance add \
      -o debootstrap+buster \
      -t drbd --no-wait-for-sync \
      --net 0:ip=pool,network=gnt-chi-01 \
      --no-ip-check \
      --no-name-check \
      --disk 0:size=10G \
      --disk 1:size=2G,name=swap \
      --disk 2:size=20G \
      --backend-parameters memory=8g,vcpus=2 \
      test-01.torproject.org

This configures the following:

 * redundant disks in a DRBD mirror; use `-t plain` instead of `-t drbd` for
   test instances, as that avoids syncing the disks and speeds things up
   considerably (even with `--no-wait-for-sync` some operations block on
   synced mirrors). In that case only one node should be provided as the
   argument to `--node`; see the example after this list.
 * three extra volumes besides the root disk: a 20G spare on the default
   VG (SSD), one on the HDD VG, and a swap volume on the default VG. If
   you don't specify a swap device, a 512MB swapfile is created in
   `/swapfile`. TODO: configure disks 2 and 3 automatically in the
   installer (`/var` and `/srv`?)
 * 8GB of RAM with 2 virtual CPUs
 * an IP allocated from the public gnt-fsn pool:
   `gnt-instance add` will print the IPv4 address it picked to stdout.  The
   IPv6 address can be found in `/var/log/ganeti/os/` on the primary node
   of the instance, see below.
 * with the `test-01.torproject.org` hostname
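
For a quick throwaway test instance, a `-t plain` variant of the above might
look like this (a sketch; the node name is an example, adjust sizes as
needed):

    gnt-instance add \
      -o debootstrap+buster \
      -t plain \
      --net 0:ip=pool,network=gnt-fsn \
      --no-ip-check \
      --no-name-check \
      --disk 0:size=10G \
      --disk 1:size=2G,name=swap \
      --backend-parameters memory=8g,vcpus=2 \
      -n fsn-node-01 \
      test-01.torproject.org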

To find the root password, SSH host key fingerprints, and the IPv6
address, run this **on the node where the instance was created**, for
example:

    egrep 'root password|configured eth0 with|SHA256' $(ls -tr /var/log/ganeti/os/* | tail -1) | grep -v $(hostname)

We copy root's authorized keys into the new instance, so you should be able to
log in with your token. You will be required to change the root password
immediately. Pick something nice and document it in `tor-passwords`.

Also set reverse DNS for both IPv4 and IPv6 in [Hetzner's Robot](https://robot.your-server.de/)
(check under Servers -> vSwitch -> IPs) or in our own reverse zone
files (if delegated).

Then follow [howto/new-machine](howto/new-machine).

### Known issues

 * **usrmerge**: that procedure creates a machine with usrmerge! See
   [bug 34115](https://bugs.torproject.org/34115) before proceeding.
 * **allocator failures**: Note that you may need to use the `--node`
   parameter to pick which nodes the instance should end up on,
   otherwise Ganeti will choose for you (and may fail). Use, for
   example, `--node fsn-node-01:fsn-node-02` to use `node-01` as
   primary and `node-02` as secondary. The allocator can sometimes
   fail if it is upset about something in the cluster, for
   example:

        Can't find primary node using iallocator hail: Request failed: No valid allocation solutions, failure reasons: FailMem: 2, FailN1: 2

   This situation is covered by [ticket 33785](https://bugs.torproject.org/33785). If this problem
   occurs, it might be worth [rebalancing the cluster](#Rebalancing-a-cluster).

 * **ping failure**: there is a bug in `ganeti-instance-debootstrap`
   which misconfigures `ping` (among other things), see [bug
   31781](https://bugs.torproject.org/31781). It's currently patched in our version of the Debian
   package, but that patch might disappear if Debian upgrades the
   package without [shipping our patch](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944538).

## Modifying an instance

### CPU, memory changes

It's possible to change the IP, CPU, or memory allocation of an instance
using the [gnt-instance modify](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#modify) command:

    gnt-instance modify -B vcpus=2 test1.torproject.org
    gnt-instance modify -B memory=4g test1.torproject.org
    gnt-instance reboot test1.torproject.org

IP address changes require a full stop and start of the instance, as well as
manual changes to the `/etc/network/interfaces*` files inside it:

    gnt-instance modify --net 0:modify,ip=116.202.120.175 test1.torproject.org
    gnt-instance stop test1.torproject.org
    gnt-instance start test1.torproject.org
    gnt-instance console test1.torproject.org
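
The netmask and gateway for the allocated address can be looked up in the
Ganeti network definition, for example (assuming the `gnt-fsn` network used
above):

    gnt-network info gnt-fsn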

The [gnt-instance grow-disk](http://docs.ganeti.org/ganeti/2.15/man/gnt-instance.html#grow-disk) command can be used to change the size
of the underlying device:

    gnt-instance grow-disk test1.torproject.org 0 16g
    gnt-instance reboot test1.torproject.org

The number `0` in this context indicates the first disk of the
instance. Then the filesystem needs to be resized inside the VM:

    ssh root@test1.torproject.org

Use lvs to display information about logical volumes:

    # lvs
    LV            VG               Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    var-opt       vg_test-01     -wi-ao---- <10.00g                                                    
    test-backup vg_test-01_hdd   -wi-ao---- <20.00g            

Use lvextend to add space to the volume:

    # lvextend -l '+100%FREE' vg_test-01/var-opt

Finally resize the filesystem with:

    # resize2fs /dev/vg_test-01/var-opt

### Adding disks

A disk can be added to an instance with the `modify` command as
well. This, for example, will add a 100GB disk to the `test1` instance
on the `vg_ganeti_hdd` volume group, which is "slow" rotating disks:

    gnt-instance modify --disk add:size=100g,vg=vg_ganeti_hdd test1.torproject.org
    gnt-instance reboot test1.torproject.org
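
Inside the instance, the new disk then needs a filesystem and a mount point
before it can be used. A minimal sketch, assuming the new disk shows up as
`/dev/sdc` (check with `lsblk`) and is to be mounted on `/srv/data` (both
are assumptions, adjust as needed):

    mkfs.ext4 /dev/sdc
    echo '/dev/sdc /srv/data ext4 rw,noatime,errors=remount-ro 0 2' >> /etc/fstab
    mkdir -p /srv/data && mount /srv/data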

### Adding a network interface on the rfc1918 vlan

We have a VLAN on which VMs without public addresses sit.
Its VLAN ID is 4002 and it's backed by the Hetzner vSwitch #11973 "fsn-gnt-rfc1918-traffic".
Note that traffic on this VLAN travels in the clear between nodes.

To add an instance to this vlan, give it a second network interface using

    gnt-instance modify --net add:link=br0,vlan=4002,mode=openvswitch test1.torproject.org

## Destroying an instance

This totally deletes the instance, including all mirrors and
everything; be very careful with it:

    gnt-instance remove test01.torproject.org

## Disk operations (DRBD)

Instances should be set up using the DRBD backend, in which case you
should probably take a look at [howto/drbd](howto/drbd) if you have problems with
that. Ganeti handles most of the logic there so that should generally
not be necessary.

## Evaluating cluster capacity

This will list instances repeatedly, but also show their assigned
memory, and compare it with the node's capacity:

    watch -n5 -d 'gnt-instance list -o pnode,name,be/vcpus,be/memory,disk_usage,disk_template,status  |  sort; echo; gnt-node list'

The latter does not show disk usage for secondary volume groups (see
[upstream issue 1379](https://github.com/ganeti/ganeti/issues/1379)); for a complete picture of disk usage, use:

    gnt-node list-storage

The [gnt-cluster verify](http://docs.ganeti.org/ganeti/2.15/man/gnt-cluster.html#verify) command will also check to see if there's
enough space on secondaries to account for the failure of a
node. Healthy output looks like this:

    root@fsn-node-01:~# gnt-cluster verify
    Submitted jobs 48030, 48031
    Waiting for job 48030 ...
    Fri Jan 17 20:05:42 2020 * Verifying cluster config
    Fri Jan 17 20:05:42 2020 * Verifying cluster certificate files
    Fri Jan 17 20:05:42 2020 * Verifying hypervisor parameters
    Fri Jan 17 20:05:42 2020 * Verifying all nodes belong to an existing group
    Waiting for job 48031 ...
    Fri Jan 17 20:05:42 2020 * Verifying group 'default'
    Fri Jan 17 20:05:42 2020 * Gathering data (2 nodes)
    Fri Jan 17 20:05:42 2020 * Gathering information about nodes (2 nodes)
    Fri Jan 17 20:05:45 2020 * Gathering disk information (2 nodes)
    Fri Jan 17 20:05:45 2020 * Verifying configuration file consistency
    Fri Jan 17 20:05:45 2020 * Verifying node status
    Fri Jan 17 20:05:45 2020 * Verifying instance status
    Fri Jan 17 20:05:45 2020 * Verifying orphan volumes
    Fri Jan 17 20:05:45 2020 * Verifying N+1 Memory redundancy
    Fri Jan 17 20:05:45 2020 * Other Notes
    Fri Jan 17 20:05:45 2020 * Hooks Results

A sick node would have said something like this instead:

    Mon Oct 26 18:59:37 2009 * Verifying N+1 Memory redundancy
    Mon Oct 26 18:59:37 2009   - ERROR: node node2: not enough memory to accommodate instance failovers should node node1 fail

See the [Ganeti manual](http://docs.ganeti.org/ganeti/2.15/html/walkthrough.html#n-1-errors) for a more extensive example.

Also note the `hspace -L` command, which can tell you how many
instances can be created in a given cluster. It uses the "standard"
instance template defined in the cluster (which we haven't configured
yet).

## Moving instances and failover

Ganeti is smart about assigning instances to nodes. There's also a
command (`hbal`) to automatically rebalance the cluster (see
below). If for some reason hbal doesn’t do what you want or you need
to move things around for other reasons, here are a few commands that
might be handy.

Make an instance switch to using its secondary:

    gnt-instance migrate test1.torproject.org

Make all instances on a node switch to their secondaries:

    gnt-node migrate fsn-node-01.torproject.org

The `migrate` command does a "live" migration which should avoid any
downtime during the migration. It might be preferable to actually
shut down the machine for some reason (for example if we actually want
to reboot because of a security upgrade). Or we might not be able to
live-migrate because the node is down. In this case, we do a
[failover](http://docs.ganeti.org/ganeti/2.15/html/admin.html#failing-over-an-instance):

    gnt-instance failover test1.torproject.org

The [gnt-node evacuate](http://docs.ganeti.org/ganeti/2.15/man/gnt-node.html#evacuate) command can also be used to "empty" a given
node altogether, in case of an emergency:

    gnt-node evacuate -I . fsn-node-02.torproject.org

Similarly, the [gnt-node failover](http://docs.ganeti.org/ganeti/2.15/man/gnt-node.html#failover) command can be used to
hard-recover from a completely crashed node:

    gnt-node failover fsn-node-02.torproject.org

Note that you might need the `--ignore-consistency` flag if the
node is unresponsive.
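
For example (reusing the node name from above):

    gnt-node failover --ignore-consistency fsn-node-02.torproject.org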

## Importing external instances

Assumptions:

 * `INSTANCE`: name of the instance being migrated, the "old" one
   being outside the cluster and the "new" one being the one created
   inside the cluster (e.g. `chiwui.torproject.org`)
 * `SPARE_NODE`: a ganeti node with free space
   (e.g. `fsn-node-03.torproject.org`) where the `INSTANCE` will be
   migrated
 * `MASTER_NODE`: the master ganeti node
   (e.g. `fsn-node-01.torproject.org`)
 * `KVM_HOST`: the machine which we migrate the `INSTANCE` from
 * the `INSTANCE` has only `root` and `swap` partitions
 * the `SPARE_NODE` has space in `/srv/` to host all the virtual
   machines to import; to check, use:

        fab -H crm-ext-01.torproject.org,crm-int-01.torproject.org,forrestii.torproject.org,nevii.torproject.org,rude.torproject.org,troodi.torproject.org,vineale.torproject.org libvirt.du -p kvm3.torproject.org | sed '/-swap$/d;s/ .*$//' <f | awk '{s+=$1} END {print s}'

   You will very likely need to create a `/srv` big enough for this,
   for example:

        lvcreate -L 300G vg_ganeti -n srv-tmp &&
        mkfs /dev/vg_ganeti/srv-tmp &&
        mount /dev/vg_ganeti/srv-tmp /srv

Import procedure:

 1. pick a viable SPARE NODE to import the INSTANCE (see "evaluating
    cluster capacity" above, when in doubt) and find on which KVM HOST
    the INSTANCE lives

 2. copy the disks, without downtime:
 
        ./ganeti -v -H $INSTANCE libvirt-import  --ganeti-node $SPARE_NODE --libvirt-host $KVM_HOST

 3. copy the disks again, this time suspending the machine:

        ./ganeti -v -H $INSTANCE libvirt-import  --ganeti-node $SPARE_NODE --libvirt-host $KVM_HOST --suspend --adopt

 4. renumber the host:

        ./ganeti -v -H $INSTANCE renumber-instance --ganeti-node $SPARE_NODE

 5. test services by changing your `/etc/hosts`, possibly warning
    service admins:

    > Subject: $INSTANCE IP address change planned for Ganeti migration
    >
    > I will soon migrate this virtual machine to the new Ganeti cluster. This
    > will involve an IP address change which might affect the service.
    >
    > Please let me know if there are any problems you can think of. In
    > particular, do let me know if any internal (inside the server) or external
    > (outside the server) services hardcodes the IP address of the virtual
    > machine.
    >
    > A test instance has been setup. You can test the service by
    > adding the following to your /etc/hosts:
    >
    >     116.202.120.182 $INSTANCE
    >     2a01:4f8:fff0:4f:266:37ff:fe32:cfb2 $INSTANCE
 6. destroy test instance:

        gnt-instance remove $INSTANCE

 7. lower TTLs to 5 minutes. this procedure varies a lot according to
    the service, but generally if all DNS entries are `CNAME`s
    pointing to the main machine domain name, the TTL can be lowered
    by adding a `dnsTTL` entry in the LDAP entry for this host. For
    example, this sets the TTL to 5 minutes:
    
        dnsTTL: 300

    Then to make the changes immediate, you need the following
    commands:
    
        ssh root@alberti.torproject.org sudo -u sshdist ud-generate &&
        ssh root@nevii.torproject.org ud-replicate

    Warning: if you migrate one of the hosts ud-ldap depends on, this
    can fail and not only the TTL will not update, but it might also
    fail to update the IP address in the below procedure. See [ticket
    33766](https://bugs.torproject.org/33766) for
    details.

 8. shut down the original instance and redo the migration as in steps 3 and 4:

        fab -H $INSTANCE reboot.halt-and-wait --delay-shutdown 60 --reason='migrating to new server' &&
        ./ganeti -v -H $INSTANCE libvirt-import  --ganeti-node $SPARE_NODE --libvirt-host $KVM_HOST --adopt &&
        ./ganeti -v -H $INSTANCE renumber-instance --ganeti-node $SPARE_NODE

 9. final test procedure
    TODO: establish host-level test procedure and run it here.

 10. switch to DRBD, still on the Ganeti MASTER NODE:

         gnt-instance stop $INSTANCE &&
         gnt-instance modify -t drbd $INSTANCE &&
         gnt-instance failover -f $INSTANCE &&
         gnt-instance start $INSTANCE

    The above can sometimes fail if the allocator is upset about
    something in the cluster, for example:
    
        Can't find secondary node using iallocator hail: Request failed: No valid allocation solutions, failure reasons: FailMem: 2, FailN1: 2

    This situation is covered by [ticket 33785](https://bugs.torproject.org/33785). To work around the
    allocator, you can specify a secondary node directly:
    
        gnt-instance modify -t drbd -n fsn-node-04.torproject.org $INSTANCE &&
        gnt-instance failover -f $INSTANCE &&
        gnt-instance start $INSTANCE

    TODO: move into fabric, maybe in a `libvirt-import-live` or
    `post-libvirt-import` job that would also do the renumbering below

 11. change IP address in the following locations:
     * LDAP (`ipHostNumber` field, but also change the `physicalHost` and `l` fields!).  Also drop the dnsTTL attribute while you're at it.
     * Puppet (grep in tor-puppet source, run `puppet agent -t; ud-replicate` on pauli)
     * DNS (grep in tor-dns source, `puppet agent -t; ud-replicate` on nevii)
     * nagios (don't forget to change the parent)
     * reverse DNS (upstream web UI, e.g. Hetzner Robot)
     * grep for the host's IP address on itself:

            grep -r -e 78.47.38.227  -e 2a01:4f8:fff0:4f:266:37ff:fe77:1ad8 /etc
            grep -r -e 78.47.38.227  -e 2a01:4f8:fff0:4f:266:37ff:fe77:1ad8 /srv

     * grep for the host's IP on *all* hosts:

            cumin-all-puppet
            cumin-all 'grep -r -e 78.47.38.227  -e 2a01:4f8:fff0:4f:266:37ff:fe77:1ad8 /etc'

    TODO: move those jobs into fabric

 12. retire the old instance (only a tiny part of [howto/retire-a-host](howto/retire-a-host)):

         ./retire -H $INSTANCE retire-instance --parent-host $KVM_HOST

 13. update the [Nextcloud spreadsheet](https://nc.torproject.net/apps/onlyoffice/5395) to remove the machine from
     the KVM host

 14. warn users about the migration, for example:
 
> To: tor-project@lists.torproject.org
> Subject: cupani AKA git-rw IP address changed
> 
> The main git server, cupani, is the machine you connect to when you push
> or pull git repositories over ssh to git-rw.torproject.org. That
> machine has been migrated to the new Ganeti cluster.
> 
> This required an IP address change from:
> 
>     78.47.38.228 2a01:4f8:211:6e8:0:823:4:1
> 
> to:
> 
>     116.202.120.182 2a01:4f8:fff0:4f:266:37ff:fe32:cfb2
> 
> DNS has been updated and preliminary tests show that everything is
> mostly working. You *will* get a warning about the IP address change
> when connecting over SSH, which will go away after the first
> connection. 
>
>     Warning: Permanently added the ED25519 host key for IP address '116.202.120.182' to the list of known hosts.
>
> That is normal. The SSH fingerprints of the host did *not* change.
> 
> Please do report any other anomaly using the normal channels:
> 
> https://gitlab.torproject.org/anarcat/wikitest/-/wikis/doc/how-to-get-help/
> 
> The service was unavailable for about an hour during the migration.

## Importing external instances, manual

This procedure is now easier to accomplish with the Fabric tools
written especially for this purpose. Use the above procedure
instead. This is kept for historical reference.

Assumptions:

 * `INSTANCE`: name of the instance being migrated, the "old" one
   being outside the cluster and the "new" one being the one created
   inside the cluster (e.g. `chiwui.torproject.org`)
 * `SPARE_NODE`: a ganeti node with free space
   (e.g. `fsn-node-03.torproject.org`) where the `INSTANCE` will be
   migrated
 * `MASTER_NODE`: the master ganeti node
   (e.g. `fsn-node-01.torproject.org`)
 * `KVM_HOST`: the machine which we migrate the `INSTANCE` from
 * the `INSTANCE` has only `root` and `swap` partitions

Import procedure:

 1. pick a viable SPARE NODE to import the instance (see "evaluating
    cluster capacity" above, when in doubt), log in to the three
    servers, setting the proper environment everywhere, for example:
    
        MASTER_NODE=fsn-node-01.torproject.org
        SPARE_NODE=fsn-node-03.torproject.org
        KVM_HOST=kvm1.torproject.org
        INSTANCE=test.torproject.org

 2. establish VM specs, on the KVM HOST:
 
    * disk space in GiB:
    
          for disk in /srv/vmstore/$INSTANCE/*; do
              printf "$disk: "
              echo "$(qemu-img info --output=json $disk | jq '."virtual-size"') / 1024 / 1024 / 1024" | bc -l
          done

    * number of CPU cores:

          sed -n '/<vcpu/{s/[^>]*>//;s/<.*//;p}' < /etc/libvirt/qemu/$INSTANCE.xml

    * memory, converting from KiB to GiB:

          echo "$(sed -n '/<memory/{s/[^>]*>//;s/<.*//;p}' < /etc/libvirt/qemu/$INSTANCE.xml) /1024 /1024" | bc -l

      TODO: make sure the memory line is in KiB and that the number
      makes sense.

    * on the INSTANCE, find the swap device UUID so we can recreate it later:

          blkid -t TYPE=swap -s UUID -o value

 3. set up a copy channel, on the SPARE NODE:
 
        ssh-agent bash
        ssh-add /etc/ssh/ssh_host_ed25519_key
        cat /etc/ssh/ssh_host_ed25519_key.pub

    on the KVM HOST:

        echo "$KEY_FROM_SPARE_NODE" >> /etc/ssh/userkeys/root

 4. copy the `.qcow` file(s) over, from the KVM HOST to the SPARE NODE:

        rsync -P $KVM_HOST:/srv/vmstore/$INSTANCE/$INSTANCE-root /srv/
        rsync -P $KVM_HOST:/srv/vmstore/$INSTANCE/$INSTANCE-lvm /srv/ || true

    Note: it's possible there is not enough room in `/srv`: in the
    base Ganeti installs, everything is in the same root partition
    (`/`) which will fill up if the instance is (say) over ~30GiB. In
    that case, create a filesystem in `/srv`:

        (mkdir /root/srv && mv /srv/* /root/srv true) || true &&
        lvcreate -L 200G vg_ganeti -n srv &&
        mkfs /dev/vg_ganeti/srv &&
        echo "/dev/vg_ganeti/srv /srv ext4 rw,noatime,errors=remount-ro 0 2" >> /etc/fstab &&
        mount /srv &&
        ( mv /root/srv/* ; rmdir /root/srv )

    This partition can be reclaimed once the VM migrations are
    completed, as it needlessly takes up space on the node.

 5. on the SPARE NODE, create and initialize logical volumes with the
    predetermined sizes:

        lvcreate -L 4GiB -n $INSTANCE-swap vg_ganeti
        mkswap --uuid $SWAP_UUID /dev/vg_ganeti/$INSTANCE-swap
        lvcreate -L 20GiB -n $INSTANCE-root vg_ganeti
        qemu-img convert /srv/$INSTANCE-root  -O raw /dev/vg_ganeti/$INSTANCE-root
        lvcreate -L 40GiB -n $INSTANCE-lvm vg_ganeti_hdd
        qemu-img convert /srv/$INSTANCE-lvm  -O raw /dev/vg_ganeti_hdd/$INSTANCE-lvm

    Note how we assume two disks above, but the instance might have a
    different configuration that would require changing the above. The
    common configuration above is to have an LVM disk separate from
    the "root" disk, the former being on a HDD, but the HDD is
    sometimes completely omitted and sizes can differ.
    
    Sometimes it might be worth using pv to get progress on long
    transfers:
    
        qemu-img convert /srv/$INSTANCE-lvm -O raw /srv/$INSTANCE-lvm.raw
        pv /srv/$INSTANCE-lvm.raw | dd of=/dev/vg_ganeti_hdd/$INSTANCE-lvm bs=4k

    TODO: ideally, the above procedure (and many steps below as well)
    would be automatically deduced from the disk listing established
    in the first step.

 6. on the MASTER NODE, create the instance, adopting the LV:
 
        gnt-instance add -t plain \
            -n fsn-node-03 \
            --disk 0:adopt=$INSTANCE-root \
            --disk 1:adopt=$INSTANCE-swap \
            --disk 2:adopt=$INSTANCE-lvm,vg=vg_ganeti_hdd \
            --backend-parameters memory=2g,vcpus=2 \
            --net 0:ip=pool,network=gnt-fsn \
            --no-name-check \
            --no-ip-check \
            -o debootstrap+default \
            $INSTANCE

 7. cross your fingers and watch the party:
 
        gnt-instance console $INSTANCE

 9. IP address change on new instance:
 
      edit `/etc/hosts` and `/etc/network/interfaces` by hand and add
      the IPv4 and IPv6 addresses. The IPv4 configuration can be found in:

          gnt-instance show $INSTANCE

      The IPv6 address can be guessed by concatenating `2a01:4f8:fff0:4f::`
      and the IPv6 link-local address without the `fe80::` prefix. For
      example: a link-local address of `fe80::266:37ff:fe65:870f/64` should
      yield the following configuration:
      
          iface eth0 inet6 static
              accept_ra 0
              address 2a01:4f8:fff0:4f:266:37ff:fe65:870f/64
              gateway 2a01:4f8:fff0:4f::1
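
      To see the instance's link-local address from inside the VM, something
      like this should work (assuming the interface is named `eth0`):

          ip -6 addr show dev eth0 scope link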

      TODO: reuse `gnt-debian-interfaces` from the ganeti puppet
      module script here?

 10. functional tests: change your `/etc/hosts` to point to the new
     server and see if everything still kind of works

 11. shutdown original instance


 12. resync and reconvert image, on the Ganeti MASTER NODE:

         gnt-instance stop $INSTANCE

     on the Ganeti node:

         rsync -P $KVM_HOST:/srv/vmstore/$INSTANCE/$INSTANCE-root /srv/ &&
         qemu-img convert /srv/$INSTANCE-root  -O raw /dev/vg_ganeti/$INSTANCE-root &&
         rsync -P $KVM_HOST:/srv/vmstore/$INSTANCE/$INSTANCE-lvm /srv/ &&
         qemu-img convert /srv/$INSTANCE-lvm  -O raw /dev/vg_ganeti_hdd/$INSTANCE-lvm

 13. switch to DRBD, still on the Ganeti MASTER NODE:

         gnt-instance modify -t drbd $INSTANCE
         gnt-instance failover $INSTANCE
         gnt-instance startup $INSTANCE

 14. redo the IP address change in `/etc/network/interfaces` and `/etc/hosts`

 15. final functional test

 16. change IP address in the following locations:
     * nagios (don't forget to change the parent)
     * LDAP (`ipHostNumber` field, but also change the `physicalHost` and `l` fields!)
     * Puppet (grep in tor-puppet source, run `puppet agent -t; ud-replicate` on pauli)
     * DNS (grep in tor-dns source, `puppet agent -t; ud-replicate` on nevii)
     * reverse DNS (upstream web UI, e.g. Hetzner Robot)


 17. decommission the old instance ([howto/retire-a-host](howto/retire-a-host))

### Troubleshooting

 * if boot takes a long time and you see a message like this on the console:
 
        [  *** ] A start job is running for dev-disk-by\x2duuid-484b5...26s / 1min 30s)

   ... which is generally followed by:
   
        [DEPEND] Dependency failed for /dev/disk/by-…6f4b5-f334-4173-8491-9353d4f94e04.
        [DEPEND] Dependency failed for Swap.

   it means the swap device UUID wasn't set up properly, and does not
   match the one provided in `/etc/fstab`. That is probably because
   you missed the `mkswap -U` step documented above.
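
   To fix it, recreate the swap signature with the UUID the instance
   expects, reusing the command from the import steps above:

        mkswap --uuid $SWAP_UUID /dev/vg_ganeti/$INSTANCE-swap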

### References

 * [Upstream docs](http://docs.ganeti.org/ganeti/2.15/html/admin.html#import-of-foreign-instances) have the canonical incantation:

        gnt-instance add -t plain -n HOME_NODE ... --disk 0:adopt=lv_name[,vg=vg_name] INSTANCE_NAME

 * [DSA docs](https://dsa.debian.org/howto/install-ganeti/) also use disk adoption and have a procedure to
   migrate to DRBD
 * [Riseup docs](https://we.riseup.net/riseup+tech/ganeti#move-an-instance-from-one-cluster-to-another-from-) suggest creating a VM without installing, shutting
   down and then syncing

Ganeti [supports importing and exporting](http://docs.ganeti.org/ganeti/2.15/html/design-ovf-support.html?highlight=qcow) from the [Open
Virtualization Format](https://en.wikipedia.org/wiki/Open_Virtualization_Format) (OVF), but unfortunately [libvirt doesn't seem to
support *exporting* to OVF](https://forums.centos.org/viewtopic.php?t=49231). There's a [virt-convert](http://manpages.debian.org/virt-convert)
tool which can *import* OVF, but not the reverse. The [libguestfs](http://www.libguestfs.org/)
library also has a [converter](http://www.libguestfs.org/virt-v2v.1.html) but it also doesn't support
exporting to OVF or anything Ganeti can load directly.

So people have written [their own conversion tools](https://virtuallyhyper.com/2013/06/migrate-from-libvirt-kvm-to-virtualbox/) or [their own
conversion procedure](https://scienceofficersblog.blogspot.com/2014/04/using-cloud-images-with-ganeti.html).

Ganeti also supports [file-backed instances](http://docs.ganeti.org/ganeti/2.15/html/design-file-based-storage.html) but "adoption" is
specifically designed for logical volumes, so it doesn't work for our
use case.

## Rebooting

Those hosts need special care, as we can accomplish zero-downtime
reboots on those machines. The `reboot` script in `tsa-misc` takes
care of the special steps involved (which is basically to empty a
node before rebooting it).

Such a reboot should be run interactively, inside a `tmux` or `screen`
session; it currently takes over 15 minutes to complete, depending
on the size of the cluster (in terms of core memory usage).

Once the reboot is completed, all instances might end up on a single
machine, and the cluster might need to be rebalanced; see
below. (Note: the update script should eventually do that, see [ticket
33406](https://bugs.torproject.org/33406)).

## Rebalancing a cluster

After a reboot or a downtime, all instances might end up on the same
node. This is normally handled by the reboot script, but it might
be desirable to do this by hand if there was a crash or another
special condition. This can be easily corrected with this command,
which will spread instances around the cluster to balance it:

    hbal -L -C -v -X

This will automatically move the instances around and rebalance the
cluster. Here's an example run on a small cluster:

    root@fsn-node-01:~# gnt-instance list
    Instance                          Hypervisor OS                 Primary_node               Status  Memory
    loghost01.torproject.org          kvm        debootstrap+buster fsn-node-02.torproject.org running   2.0G
    onionoo-backend-01.torproject.org kvm        debootstrap+buster fsn-node-02.torproject.org running  12.0G
    static-master-fsn.torproject.org  kvm        debootstrap+buster fsn-node-02.torproject.org running   8.0G
    web-fsn-01.torproject.org         kvm        debootstrap+buster fsn-node-02.torproject.org running   4.0G
    web-fsn-02.torproject.org         kvm        debootstrap+buster fsn-node-02.torproject.org running   4.0G
    root@fsn-node-01:~# hbal -L -X
    Loaded 2 nodes, 5 instances
    Group size 2 nodes, 5 instances
    Selected node group: default
    Initial check done: 0 bad nodes, 0 bad instances.
    Initial score: 8.45007519
    Trying to minimize the CV...
        1. onionoo-backend-01 fsn-node-02:fsn-node-01 => fsn-node-01:fsn-node-02   4.98124611 a=f
        2. loghost01          fsn-node-02:fsn-node-01 => fsn-node-01:fsn-node-02   1.78271883 a=f
    Cluster score improved from 8.45007519 to 1.78271883
    Solution length=2
    Got job IDs 16345
    Got job IDs 16346
    root@fsn-node-01:~# gnt-instance list
    Instance                          Hypervisor OS                 Primary_node               Status  Memory
    loghost01.torproject.org          kvm        debootstrap+buster fsn-node-01.torproject.org running   2.0G
    onionoo-backend-01.torproject.org kvm        debootstrap+buster fsn-node-01.torproject.org running  12.0G
    static-master-fsn.torproject.org  kvm        debootstrap+buster fsn-node-02.torproject.org running   8.0G
    web-fsn-01.torproject.org         kvm        debootstrap+buster fsn-node-02.torproject.org running   4.0G
    web-fsn-02.torproject.org         kvm        debootstrap+buster fsn-node-02.torproject.org running   4.0G

In the above example, you should notice that the `web-fsn` instances both
ended up on the same node. That's because the balancer did not know
that they should be distributed. A special configuration was done,
below, to avoid that problem in the future. But as a workaround,
instances can also be moved by hand and the cluster re-balanced.

### Redundant instances distribution

Some instances are redundant across the cluster and should *not* end up
on the same node. A good example is the `web-fsn-01` and `web-fsn-02`
pair which, in theory, serves similar traffic. If the two end
up on the same node, it might flood the network on that machine or at
least defeat the purpose of having redundant machines.

The way to ensure they get distributed properly by the balancing
algorithm is to "tag" them. For the web nodes, for example, this was
performed on the master:

    gnt-cluster add-tags htools:iextags:service
    gnt-instance add-tags web-fsn-01.torproject.org service:web-fsn
    gnt-instance add-tags web-fsn-02.torproject.org service:web-fsn

This tells Ganeti that tags prefixed with `service:` are "exclusion
tags": the optimizer will not try to schedule two instances carrying
the same such tag (here `service:web-fsn`) on the same node.

To see which tags are present, use:

    # gnt-cluster list-tags
    htools:iextags:service

You can also find which instances are assigned to a tag with:

    # gnt-cluster search-tags service
    /cluster htools:iextags:service
    /instances/web-fsn-01.torproject.org service:web-fsn
    /instances/web-fsn-02.torproject.org service:web-fsn

IMPORTANT: a previous version of this article mistakenly indicated
that a new cluster-level tag had to be created for each service. That
method did *not* work. The [hbal manpage](http://docs.ganeti.org/ganeti/current/man/hbal.html#exclusion-tags) explicitly mentions that
the cluster-level tag is a *prefix* that can be used to create
*multiple* such tags. This configuration also happens to be simpler
and easier to use...

### HDD migration restrictions

Cluster balancing works well until there are inconsistencies between
how nodes are configured. In our case, some nodes have HDDs (Hard Disk
Drives, AKA spinning rust) and others do not. Therefore, it's not
possible to move an instance from a node with a disk allocated on the
HDD to a node that does not have such a disk.

Yet somehow the allocator is not smart enough to tell, and you will
get the following error when doing an automatic rebalancing:

    one of the migrate failed and stopped the cluster balance: Can't create block device: Can't create block device <LogicalVolume(/dev/vg_ganeti_hdd/98d30e7d-0a47-4a7d-aeed-6301645d8469.disk3_data, visible as /dev/, size=102400m)> on node fsn-node-07.torproject.org for instance gitlab-02.torproject.org: Can't create block device: Can't compute PV info for vg vg_ganeti_hdd

In this case, it is trying to migrate the `gitlab-02` server from
`fsn-node-01` (which has an HDD) to `fsn-node-07` (which hasn't),
which naturally fails. This is a known limitation of the Ganeti
code. There has been a [draft design document for multiple storage
unit support](http://docs.ganeti.org/ganeti/master/html/design-multi-storage-htools.html) since 2015, but it has [never been
implemented](https://github.com/ganeti/ganeti/issues/865). There have been multiple issues reported upstream on
the subject:

 * [208: Bad behaviour when multiple volume groups exists on nodes](https://github.com/ganeti/ganeti/issues/208)
 * [1199: unable to mark storage as unavailable for allocation](https://github.com/ganeti/ganeti/issues/1199)
 * [1240: Disk space check with multiple VGs is broken](https://github.com/ganeti/ganeti/issues/1240)
 * [1379: Support for displaying/handling multiple volume groups](https://github.com/ganeti/ganeti/issues/1379)

Unfortunately, there are no known workarounds for this, at least not
that fix the `hbal` command. It *is* possible to exclude the faulty
migration from the pool of possible moves, however, for example in the
above case:

    hbal -L -v --exclude-instances gitlab-02.torproject.org

It's also possible to use the `--no-disk-moves` option to avoid disk
move operations altogether.
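
For example:

    hbal -L -v --no-disk-moves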

Both workarounds obviously do not correctly balance the
cluster... Note that we have also tried to use `htools:migration` tags
to work around that issue, but [those do not work for secondary
instances](https://github.com/ganeti/ganeti/issues/1497). For this we would need to set up [node groups](http://docs.ganeti.org/ganeti/current/html/man-gnt-group.html)
instead.

## Adding and removing addresses on instances

Say you created an instance but forgot to assign an extra
IP. You can still do so with:

    gnt-instance modify --net -1:add,ip=116.202.120.174,network=gnt-fsn test01.torproject.org
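
Removing works the same way, by interface index; for example, to remove the
extra interface added above (assuming it ended up as NIC 1):

    gnt-instance modify --net 1:remove test01.torproject.org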

## Troubleshooting

### I/O overload

In case of excessive I/O, it might be worth looking into which machine
is the cause. The [howto/drbd](howto/drbd) page explains how to map a DRBD device to a
VM. You can also find which logical volume is backing an instance (and
vice versa) with this command:

    lvs -o+tags

This will list all logical volumes and their associated tags. If you
already know which logical volume you're looking for, you can address
it directly:

    root@fsn-node-01:~# lvs -o tags /dev/vg_ganeti_hdd/4091b668-1177-41ac-9310-1eac45b46620.disk2_data
      LV Tags
      originstname+bacula-director-01.torproject.org

### Node failures

Ganeti clusters are designed to be [self-healing](http://docs.ganeti.org/ganeti/2.15/html/admin.html#autorepair). As long as only
one machine disappears, the cluster should be able to recover by
failing over other nodes. This is currently done manually, see the
migrate section above.

This could eventually be automated if such situations occur more
often, by scheduling a [harep](http://docs.ganeti.org/ganeti/2.15/man/harep.html) cron job, which isn't enabled in
Debian by default. See also the [autorepair](http://docs.ganeti.org/ganeti/2.15/html/admin.html#autorepair) section of the admin
manual.

### Bridge configuration failures

If you get the following error while trying to bring up the bridge:

    root@chi-node-02:~# ifup br0
    add bridge failed: Package not installed
    run-parts: /etc/network/if-pre-up.d/bridge exited with return code 1
    ifup: failed to bring up br0

... it might be that the bridge cannot load the required kernel
module, because kernel module loading has been disabled. Reboot with
the `/etc/no_modules_disabled` file present:

    touch /etc/no_modules_disabled
    reboot

It might be that the machine took too long to boot because it's not in
mandos and the operator took too long to enter the LUKS
passphrase. Re-enable the machine with this command on mandos:

    mandos-ctl --enable chi-node-02.torproject

### Other troubleshooting

Riseup has [documentation on various failure scenarios](https://we.riseup.net/riseup+tech/ganeti#failure-scenarios) including
master failover, which we haven't tested. There's also upstream
documentation on [changing node roles](http://docs.ganeti.org/ganeti/2.15/html/admin.html#changing-the-node-role) which might be useful for a
master failover scenario.

The [walkthrough](http://docs.ganeti.org/ganeti/2.15/html/walkthrough.html) also has a few recipes to resolve common
problems.

If things get completely out of hand and the cluster becomes too
unreliable for service, the only solution is to rebuild another one
elsewhere. Since Ganeti 2.2, there is a [move-instance](http://docs.ganeti.org/ganeti/2.15/html/move-instance.html) command to
move instances between clusters, which can be used for that purpose.

If Ganeti is completely destroyed and its APIs don't work anymore, the
last resort is to restore all virtual machines from
[howto/backup](howto/backup). Hopefully, this should not happen except in the case of a
catastrophic data loss bug in Ganeti or [howto/drbd](howto/drbd).

# Reference

## Installation

### New gnt-fsn node
 1. To create a new box, follow [howto/new-machine-hetzner-robot](howto/new-machine-hetzner-robot) but change
    the following settings:
    * Server: [PX62-NVMe][]
    * Location: `FSN1`
    * Operating system: Rescue
    * Additional drives: 2x10TB HDD (update: starting from fsn-node-05,
      we are *not* ordering additional drives to save on costs, see
      [ticket 33083](https://bugs.torproject.org/33083) for rationale)
    * Add in the comment form that the server needs to be in the same
      datacenter as the other machines (FSN1-DC13, but double-check)

 [PX62-NVMe]: https://www.hetzner.com/dedicated-rootserver/px62-nvme?country=OTHER

 2. follow the [howto/new-machine](howto/new-machine) post-install configuration

 3. Add the server to the two `vSwitch` systems in [Hetzner Robot web
    UI](https://robot.your-server.de/vswitch)
 4. install openvswitch and allow modules to be loaded:

        touch /etc/no_modules_disabled
        reboot
        apt install openvswitch-switch

 5. Allocate a private IP address in the `30.172.in-addr.arpa` zone
    (and the `torproject.org` zone) for the node, in the
    `admin/dns/domains.git` repository
 6. copy over the `/etc/network/interfaces` from another ganeti node,
    changing the `address` and `gateway` fields to match the local
    entry.
 7. knock on wood, cross your fingers, pet a cat, help your local
    book store, and reboot:

         reboot

 8. Prepare all the nodes by configuring them in Puppet, adding the
    class `roles::ganeti::fsn` to each node
 9. Re-enable modules disabling:

        rm /etc/no_modules_disabled

 10. run puppet across the ganeti cluster to ensure ipsec tunnels are
     up:

         cumin -p 0 'C:roles::ganeti::fsn' 'puppet agent -t'

 11. reboot again:

         reboot

 12. Then the node is ready to be added to the cluster, by running
     this on the master node:

         gnt-node add \
          --secondary-ip 172.30.135.2 \
          --no-ssh-key-check \
          --no-node-setup \
          fsn-node-02.torproject.org

    If this is an entirely new cluster, you need a different
    procedure; see [the cluster initialization procedure](#gnt-fsn-cluster-initialization) instead.