 2. follow the [howto/new-machine](howto/new-machine) post-install configuration

 3. Allocate a private IP address for the node in the
    `30.172.in-addr.arpa` zone and `torproject.org` zone, in the
    `admin/dns/domains.git` repository

 4. add the private IP address to the `eth1` interface, for example in
    `/etc/network/interfaces.d/eth1`:

        auto eth1
        iface eth1 inet static
            address 172.30.131.101/24

    Again, this IP must be allocated in the reverse DNS zone file
    (`30.172.in-addr.arpa`) and the `torproject.org` zone file in the
    `dns/domains.git` repository.

 5. enable the interface:

        ifup eth1

 6. setup a bridge on the public interface, replacing the `eth0` blocks
    with something like:

        auto eth0
        iface eth0 inet manual

        auto br0
        iface br0 inet static
            address 204.8.99.101/24
            gateway 204.8.99.254
            bridge_ports eth0
            bridge_stp off
            bridge_fd 0

        # IPv6 configuration
        iface br0 inet6 static
            accept_ra 0
            address 2620:7:6002:0:3eec:efff:fed5:6b2a/64
            gateway 2620:7:6002::1

 7. allow modules to be loaded, cross your fingers that you didn't
    screw up the network configuration above, and reboot:

        touch /etc/no_modules_disabled
        reboot

 8. configure the node in Puppet by adding it to the
    `roles::ganeti::dal` class, and run Puppet on the new node:

        puppet agent -t

 9. re-disable module loading:

         rm /etc/no_modules_disabled

 10. run puppet across the Ganeti cluster so firewalls are correctly
     configured:

         cumin -p 0 'C:roles::ganeti::dal' 'puppet agent -t'

 11. partition the extra disks, SSD:

         for disk in /dev/sd[abcdef]; do
              parted -s $disk mklabel gpt;
              parted -s $disk -a optimal mkpart primary 0% 100%;
         done &&
         mdadm --create --verbose --level=10 --metadata=1.2 \
               --raid-devices=6 \
               /dev/md2 \
               /dev/sda1 \
               /dev/sdb1 \
               /dev/sdc1 \
               /dev/sdd1 \
               /dev/sde1 \
               /dev/sdf1 &&
         dd if=/dev/random bs=64 count=128 of=/etc/luks/crypt_dev_md2 &&
         chmod 0 /etc/luks/crypt_dev_md2 &&
         cryptsetup luksFormat --key-file=/etc/luks/crypt_dev_md2 /dev/md2 &&
         cryptsetup luksOpen --key-file=/etc/luks/crypt_dev_md2 /dev/md2 crypt_dev_md2 &&
         pvcreate /dev/mapper/crypt_dev_md2 &&
         vgcreate vg_ganeti /dev/mapper/crypt_dev_md2 &&
         echo crypt_dev_md2 UUID=$(lsblk -n -o UUID /dev/md2 | head -1) /etc/luks/crypt_dev_md2 luks,discard >> /etc/crypttab &&
         update-initramfs -u

    NVMe:

         for disk in /dev/nvme[23]n1; do
             parted -s $disk mklabel gpt;
             parted -s $disk -a optimal mkpart primary 0% 100%;
         done &&
         mdadm --create --verbose --level=1 --metadata=1.2 \
               --raid-devices=2 \
               /dev/md3 \
               /dev/nvme2n1p1 \
               /dev/nvme3n1p1 &&
         dd if=/dev/random bs=64 count=128 of=/etc/luks/crypt_dev_md3 &&
         chmod 0 /etc/luks/crypt_dev_md3 &&
         cryptsetup luksFormat --key-file=/etc/luks/crypt_dev_md3 /dev/md3 &&
         cryptsetup luksOpen --key-file=/etc/luks/crypt_dev_md3 /dev/md3 crypt_dev_md3 &&
         pvcreate /dev/mapper/crypt_dev_md3 &&
         vgcreate vg_ganeti_nvme /dev/mapper/crypt_dev_md3 &&
         echo crypt_dev_md3 UUID=$(lsblk -n -o UUID /dev/md3 | head -1) /etc/luks/crypt_dev_md3 luks,discard >> /etc/crypttab &&
         update-initramfs -u

    Normally, this would have been done in the `setup-storage`
    configuration, but we were in a rush. Note that we create
    partitions because we're worried replacement drives might not have
    exactly the same size as the ones we have. The above gives us a
    1.4MB buffer at the end of the drive, and avoids having to
    hard code disk sizes in bytes.
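     As a sanity check: the key file written by the `dd` above is 64 ×
     128 = 8192 bytes, which can be verified on a throwaway file
     (using `/dev/urandom` here only for speed):

     ```shell
     # bs=64 bytes x count=128 blocks = 8192 bytes
     dd if=/dev/urandom bs=64 count=128 of=/tmp/keycheck status=none
     stat -c %s /tmp/keycheck  # prints 8192
     rm /tmp/keycheck
     ```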

 12. Reboot to test the LUKS configuration:

         reboot

 13. Then the node is ready to be added to the cluster, by running
     this on the master node:

         gnt-node add \
           --secondary-ip 172.30.131.103 \
           --no-ssh-key-check \
           --no-node-setup \
           dal-node-03.torproject.org

    If this is an entirely new cluster, you need a different
    procedure, see [the cluster initialization procedure](#gnt-dal-cluster-initialization) instead.

 14. make sure everything is great in the cluster:

         gnt-cluster verify

If the last step fails with SSH errors, you may need to re-synchronise
the SSH `known_hosts` file, see [SSH key verification failures](#ssh-key-verification-failures).

### gnt-dal cluster initialization

This procedure replaces the `gnt-node add` step in the initial setup
of the first Ganeti node when the `gnt-dal` cluster was set up.

Initialize the ganeti cluster:

    gnt-cluster init \
        --nic-parameters link=br0 \
        --vg-name vg_ganeti \
        --secondary-ip 172.30.131.101 \
        --enabled-hypervisors kvm \
        --mac-prefix 06:66:39 \
        --no-ssh-init \
        --no-etc-hosts \
        dalgnt.torproject.org

The above assumes that `dalgnt` is already in DNS. See the [MAC
address prefix selection](#mac-address-prefix-selection) section for information on how the
`--mac-prefix` argument was selected.

Then the following extra configuration was performed:

```
gnt-cluster modify --reserved-lvs vg_system/root,vg_system/swap
gnt-cluster modify -H kvm:kernel_path=,initrd_path=
gnt-cluster modify -H kvm:security_model=pool
gnt-cluster modify -H kvm:kvm_extra='-device virtio-rng-pci\,bus=pci.0\,addr=0x1e\,max-bytes=1024\,period=1000 -global isa-fdc.fdtypeA=none'
gnt-cluster modify -H kvm:disk_cache=none
gnt-cluster modify -H kvm:disk_discard=unmap
gnt-cluster modify -H kvm:scsi_controller_type=virtio-scsi-pci
gnt-cluster modify -H kvm:disk_type=scsi-hd
gnt-cluster modify -H kvm:migration_bandwidth=950
gnt-cluster modify -H kvm:migration_downtime=500
gnt-cluster modify -H kvm:migration_caps=postcopy-ram
gnt-cluster modify -H kvm:cpu_type=host
gnt-cluster modify -D drbd:c-plan-ahead=0,disk-custom='--c-plan-ahead 0'
gnt-cluster modify -D drbd:net-custom='--verify-alg sha1 --max-buffers 8k'
gnt-cluster modify --uid-pool 4000-4019
```

The upper limits for CPU count and memory size were raised to 32
cores and 300G (307200MB), respectively, with:

```
gnt-cluster modify --ipolicy-bounds-specs \
max:cpu-count=32,disk-count=16,disk-size=1048576,\
memory-size=307200,nic-count=8,spindle-use=12\
/min:cpu-count=1,disk-count=1,disk-size=1024,\
memory-size=128,nic-count=1,spindle-use=1
```

NOTE: watch out for whitespace here. The [original source](https://johnny85v.wordpress.com/2016/06/13/ganeti-commands/) for this
command had too much whitespace, which fails with:

    Failure: unknown/wrong parameter name 'Missing value for key '' in option --ipolicy-bounds-specs'
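The failure comes down to shell word splitting: the whole
`--ipolicy-bounds-specs` value must reach the command as a single
argument. A minimal demonstration of the mechanics (`count_args` is
just a local helper):

```shell
# A backslash-newline glues the continuation onto the same word, but a
# leading space on the continuation line starts a new argument.
count_args() { echo "$#"; }
count_args max:cpu-count=32,\
memory-size=307200/min:cpu-count=1
# -> 1 (single argument, as gnt-cluster expects)
count_args max:cpu-count=32,\
 memory-size=307200/min:cpu-count=1
# -> 2 (the stray space split the value in two)
```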

The [network configuration](#network-configuration) (below) must also be performed for the
address blocks reserved in the cluster. This is the actual initial
configuration performed:

    gnt-network add --network 204.8.99.128/25 --gateway 204.8.99.254 --network6 2620:7:6002::/64 --gateway6 2620:7:6002::1 gnt-dal-01
    gnt-network connect --nic-parameters=link=br0 gnt-dal-01 default

Note that we reserve the first `/25` (204.8.99.0/25) for future
use. The above only uses the second half of the network in case we
need the rest of the network for other operations. A new network will
need to be added if we run out of IPs in the second half.

No IP was reserved as the gateway is already automatically reserved by
Ganeti. The node's public addresses are in the other /25 and also do
not need to be reserved in this allocation.
### Network configuration

IP allocation is managed by Ganeti through the `gnt-network(8)`
system. Say we have `192.0.2.0/24` reserved for the cluster, with
the host IP `192.0.2.100` and the gateway on `192.0.2.1`. You will
create this network with:

    gnt-network add --network 192.0.2.0/24 --gateway 192.0.2.1 example-network

If there's also IPv6, it would look something like this:

    gnt-network add --network 192.0.2.0/24 --gateway 192.0.2.1 --network6 2001:db8::/32 --gateway6 fe80::1 example-network

Note: the actual name of the network (`example-network`) above should
follow the convention established in [doc/naming-scheme](doc/naming-scheme).
Then we associate the new network to the default node group:

    gnt-network connect --nic-parameters=link=br0,vlan=4000,mode=openvswitch example-network default

The arguments to `--nic-parameters` come from the values configured in
the cluster, above. The current values can be found with `gnt-cluster
info`.

For example, the second ganeti network block was assigned with the
following commands:

    gnt-network add --network 49.12.57.128/27 --gateway 49.12.57.129 gnt-fsn13-02
    gnt-network connect --nic-parameters=link=br0,vlan=4000,mode=openvswitch gnt-fsn13-02 default

IP addresses can be reserved with the `--reserved-ips` argument to the
modify command, for example:

    gnt-network modify --add-reserved-ips=38.229.82.2,38.229.82.3,38.229.82.4,38.229.82.5,38.229.82.6,38.229.82.7,38.229.82.8,38.229.82.9,38.229.82.10,38.229.82.11,38.229.82.12,38.229.82.13,38.229.82.14,38.229.82.15,38.229.82.16,38.229.82.17,38.229.82.18,38.229.82.19 gnt-chi-01 gnt-chi-01

Note that the gateway and node IP addresses are automatically
reserved; this list is for hosts outside of the cluster.
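Long runs of consecutive addresses like the above do not need to be
typed by hand; for example, the list for `gnt-chi-01` can be generated
with:

```shell
# Generate the comma-separated list of 38.229.82.2 through .19:
seq -s, -f '38.229.82.%g' 2 19
```

The output is suitable for pasting directly into `--add-reserved-ips`.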

The network name must follow the [naming convention](doc/naming-scheme).

## Upgrades

Ganeti upgrades need to be handled specially, and have their own
documentation in the [howto/upgrades](howto/upgrades) documents.

TODO: move procedures here?

## SLA

As long as the cluster is not over capacity, it should be able to
survive the loss of a node in the cluster unattended.

Justified machines can be provisioned within a few business days
without problems.

New nodes can be provisioned within a week or two, depending on budget
and hardware availability.

## Design and architecture

Our first Ganeti cluster (`gnt-fsn`) is made of multiple machines
hosted with [Hetzner Robot](https://robot.your-server.de/), Hetzner's dedicated server hosting
service. All machines use the same hardware to avoid problems with
live migration. That is currently a customized build of the
[PX62-NVMe][] line.

### Network layout

Machines are interconnected over a [vSwitch](https://wiki.hetzner.de/index.php/Vswitch/en), a "virtual layer 2
network" probably implemented using [Software-defined Networking](https://en.wikipedia.org/wiki/Software-defined_networking)
(SDN) on top of Hetzner's network. The details of that implementation
do not matter much to us, since we do not trust the network and run an
IPsec layer on top of the vswitch. We communicate with the `vSwitch`
through [Open vSwitch](https://en.wikipedia.org/wiki/Open_vSwitch) (OVS), which is (currently manually)
configured on each node of the cluster.

There are two distinct IPsec networks:

 * `gnt-fsn-public`: the public network, which maps to the
   `fsn-gnt-inet-vlan` vSwitch at Hetzner, the `vlan-gntinet` OVS
   network, and the `gnt-fsn` network pool in Ganeti. it provides
   public IP addresses and routing across the network. instances get
   IP allocated in this network.

 * `gnt-fsn-be`: the private ganeti network which maps to the
   `fsn-gnt-backend-vlan` vSwitch at Hetzner and the `vlan-gntbe` OVS
   network. it has no matching `gnt-network` component and IP
   addresses are allocated manually in the 172.30.135.0/24 network
   through DNS. it provides internal routing for Ganeti commands and
   [howto/drbd](howto/drbd) storage mirroring.

### MAC address prefix selection

The MAC address prefix for the gnt-fsn cluster (`00:66:37:...`) seems
to have been picked arbitrarily. While it does not conflict with a
known existing prefix, it could eventually be issued to a manufacturer
and reused, possibly leading to a MAC address clash. The closest is
currently Huawei:

    $ grep ^0066 /var/lib/ieee-data/oui.txt
    00664B     (base 16)		HUAWEI TECHNOLOGIES CO.,LTD

Such a clash is fairly improbable, because that new manufacturer would
need to show up on the local network as well. Still, new clusters
SHOULD use a different MAC address prefix in a [locally administered
address](https://en.wikipedia.org/wiki/MAC_address#Universal_vs._local) (LAA) space, which "are distinguished by setting the
second-least-significant bit of the first octet of the address". In
other words, the MAC address must have 2, 6, A or E as its second
[nibble](https://en.wikipedia.org/wiki/Nibble). That is, the MAC address must look like one of these:

    x2 - xx - xx - xx - xx - xx
    x6 - xx - xx - xx - xx - xx
    xA - xx - xx - xx - xx - xx
    xE - xx - xx - xx - xx - xx
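This can be checked mechanically; here is a small POSIX shell test
that a MAC prefix is a locally administered address (the `is_laa`
helper name is ours):

```shell
# is_laa: true if bit 1 of the first octet is set (LAA space).
is_laa() {
    first=${1%%:*}
    [ $(( 0x$first & 2 )) -ne 0 ]
}
is_laa 06:66:39 && echo "06:66:39 is locally administered"
is_laa 00:66:37 || echo "00:66:37 is universally administered"
```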

We used `06:66:38` in the gnt-chi cluster for that reason. We picked
the `06:66` prefix to resemble the existing `00:66` prefix used in
`gnt-fsn` but varied the last quad (from `:37` to `:38`) to make them
slightly more different-looking.

Obviously, it's unlikely the MAC addresses will be compared across
clusters in the short term. But a MAC address clash could technically
occur if an exotic VPN bridge gets established between the two
networks in the future, so it's good to have some difference.

### Hardware variations

We considered experimenting with the new AX line ([AX51-NVMe](https://www.hetzner.com/dedicated-rootserver/ax51-nvme?country=OTHER)) but
in the past DSA had problems live-migrating (it wouldn't immediately
fail but there were "issues" after). So we might need to [failover](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-instance.html#failover)
instead of migrate between those parts of the cluster. There are also
doubts that the Linux kernel supports those shiny new processors at
all: similar processors had [trouble booting before Linux 5.5](https://www.phoronix.com/scan.php?page=news_item&px=Threadripper-3000-MCE-5.5-Fix) for
example, so it might be worth waiting a little before switching to
that new platform, even if it's cheaper. See the cluster configuration
section below for a larger discussion of CPU emulation.

### CPU emulation

Note that we might want to tweak the `cpu_type` parameter. By default,
it emulates a lot of processing that can be delegated to the host CPU
instead. If we use `kvm:cpu_type=host`, then each node will tailor the
emulation system to the CPU on the node. But that might make the live
migration more brittle: VMs or processes can crash after a live
migrate because of a slightly different configuration (microcode, CPU,
kernel and QEMU versions all play a role). So we need to find the
lowest common denominator in CPU families. The list of available
families supported by QEMU varies between releases, but is visible
with:

    # qemu-system-x86_64 -cpu help
    Available CPUs:
    x86 486
    x86 Broadwell             Intel Core Processor (Broadwell)
    [...]
    x86 Skylake-Client        Intel Core Processor (Skylake)
    x86 Skylake-Client-IBRS   Intel Core Processor (Skylake, IBRS)
    x86 Skylake-Server        Intel Xeon Processor (Skylake)
    x86 Skylake-Server-IBRS   Intel Xeon Processor (Skylake, IBRS)
    [...]

The current [PX62 line][PX62-NVMe] is based on the [Coffee Lake](https://en.wikipedia.org/wiki/Coffee_Lake) Intel
micro-architecture. The closest matching family would be
`Skylake-Server` or `Skylake-Server-IBRS`, [according to wikichip](https://en.wikichip.org/wiki/intel/microarchitectures/coffee_lake#Compiler_support).
Note that newer QEMU releases (4.2, currently in unstable) have more
supported features.

In that context, of course, supporting different CPU manufacturers
(say AMD vs Intel) is impractical: they will have totally different
families that are not compatible with each other. This will break live
migration, which can trigger crashes and problems in the migrated
virtual machines.

If there are problems live-migrating between machines, it is still
possible to "failover" (`gnt-instance failover` instead of `migrate`)
which shuts off the machine, fails over disks, and starts it on the
other side. That's not such a big problem: we often need to reboot
the guests when we reboot the hosts anyways. But it does complicate
our work. Of course, it's also possible that live migrates work fine
if *no* `cpu_type` at all is specified in the cluster, but that needs
to be verified.

Nodes could also be [grouped](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-group.html) to limit (automated) live migration to a
subset of nodes.

Update: this was enabled in the `gnt-dal` cluster.

References:

 * <https://dsa.debian.org/howto/install-ganeti/>
 * <https://qemu.weilnetz.de/doc/qemu-doc.html#recommendations_005fcpu_005fmodels_005fx86>

### Installer
The [ganeti-instance-debootstrap](https://tracker.debian.org/pkg/ganeti-instance-debootstrap) package is used to install
instances. It is configured through Puppet with the [shared ganeti
module](https://forge.puppet.com/smash/ganeti), which deploys a few hooks to automate the install as much
as possible. The installer will:
 1. setup grub to respond on the serial console
 2. setup and log a random root password
 3. make sure SSH is installed and log the public keys and
    fingerprints
 4. create a 512MB file-backed swap volume at `/swapfile`, or
    a swap partition if it finds one labeled `swap`
 5. setup basic static networking through `/etc/network/interfaces.d`
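Step 4 can be sketched as a plain shell sequence. This is a rough
reading of what such a hook boils down to, not its actual code, sized
down and pointed at a temporary file for illustration:

```shell
# Create and format a file-backed swap volume (demo: 4MB, not 512MB).
swapfile=$(mktemp)
dd if=/dev/zero of="$swapfile" bs=1M count=4 status=none
chmod 600 "$swapfile"
mkswap "$swapfile" >/dev/null
# the real hook targets /swapfile in the instance and would add a
# matching fstab line:  /swapfile none swap sw 0 0
rm -f "$swapfile"
```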
We have custom configurations on top of that to:
 1. add a few base packages
 2. do our own custom SSH configuration
 3. fix the hostname to be a FQDN
 4. add a line to `/etc/hosts`
 5. add a tmpfs
There is work underway to refactor and automate the install better,
see [ticket 31239](https://bugs.torproject.org/31239) for details.

## Services

TODO: document a bit how the different Ganeti services interface with
each other

## Storage

TODO: document how DRBD works in general, and how it's setup here in
particular.

See also the [DRBD documentation](howto/drbd).

The Cymru PoP has an iSCSI cluster for large filesystem
storage. Ideally, this would be automated inside Ganeti; some quick
links:

 * [search for iSCSI in the ganeti-devel mailing list](https://www.mail-archive.com/search?l=ganeti-devel@googlegroups.com&q=iscsi&submit.x=0&submit.y=0)
 * in particular a [discussion of integrating SANs into ganeti](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/P7JU_0YGn9s)
   seems to say "just do it manually" (paraphrasing) and [this
   discussion has an actual implementation](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/kkXFDgvg2rY), [gnt-storage-eql](https://github.com/atta/gnt-storage-eql)
 * it could be implemented as an [external storage provider](https://github.com/ganeti/ganeti/wiki/External-Storage-Providers), see
   the [documentation](http://docs.ganeti.org/ganeti/2.10/html/design-shared-storage.html)
 * the DSA docs are in two parts: [iscsi](https://dsa.debian.org/howto/iscsi/) and [export-iscsi](https://dsa.debian.org/howto/export-iscsi/)
 * someone made a [Kubernetes provisioner](https://github.com/nmaupu/dell-provisioner) for our hardware which
   could provide sample code

For now, iSCSI volumes are manually created and passed to new virtual
machines. 

## Queues

TODO: document gnt-job

## Interfaces

TODO: document the RAPI and ssh commandline

## Authentication

## Implementation

Ganeti is implemented in a mix of Python and Haskell, in a mature
codebase.

## Related services

Ganeti relies heavily on [DRBD](howto/drbd) for live migrations.
There is no issue tracker specifically for this project. [File][] or
[search][] for issues in the [team issue tracker][search] with the
~Ganeti label.

 [File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
 [search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues?label_name%5B%5D=Ganeti

Upstream Ganeti has of course its own [issue tracker on GitHub](https://github.com/ganeti/ganeti/issues).

TPA are the main direct operators of the service, but most if not all
TPI teams use its services either directly or indirectly.

Ganeti used to be a Google project until it was abandoned and spun off
to a separate, standalone free software community. Right now it is
maintained by a mixed collection of organisations and non-profits.

## Monitoring and metrics

Anarcat implemented a Prometheus metrics exporter that writes stats in
the node exporter "textfile" collector. The source code is available
in `tor-puppet.git`, as
`profile/files/ganeti/tpa-ganeti-prometheus-metrics.py`. Those metrics
are in turn displayed in the [Ganeti Health](https://grafana.torproject.org/d/ce2db5a5-b42b-4454-8d81-ee95b09e229a/ganeti-health) Grafana dashboard.

The WMF worked on a [proper Ganeti exporter](https://github.com/ganeti/prometheus-ganeti-exporter) we should probably
switch to, once it is [packaged in Debian](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054138).

To test if a cluster is working properly, the `verify` command can be
run:

    gnt-cluster verify

Creating a VM and migrating it between machines is also a good test.

Ganeti logs a significant amount of information in
`/var/log/ganeti/`. Those logs are of particular interest:

 * `node-daemon.log`: all low-level commands and HTTP requests on the
   node daemon, includes, for example, LVM and DRBD commands
 * `os/*$hostname*.log`: installation log for machine `$hostname`,
   this also includes VM migration logs for the `move-instance` or
   `gnt-instance export` commands

There are no backups of virtual machines directly from Ganeti: each
machine is expected to perform its own backups. The Ganeti
configuration should be backed up as normal by our [backup
systems](service/backup).

## Other documentation

 * [Ganeti](http://www.ganeti.org/)
   * [Ganeti documentation home](http://docs.ganeti.org/)
   * [Main manual](http://docs.ganeti.org/ganeti/master/html/)
   * [Manual pages](http://docs.ganeti.org/ganeti/master/man/)
   * [Wiki](https://github.com/ganeti/ganeti/wiki)
   * [Issues](https://github.com/ganeti/ganeti/issues)
   * [Google group](https://groups.google.com/forum/#!forum/ganeti)
 * [Wikimedia foundation documentation](https://wikitech.wikimedia.org/wiki/Ganeti)
 * [Riseup documentation](https://we.riseup.net/riseup+tech/ganeti)
 * [DSA](https://dsa.debian.org/howto/install-ganeti/)
 * [OSUOSL wiki](https://wiki.osuosl.org/ganeti/)

# Discussion

The Ganeti cluster has served us well over the years. This section
aims at discussing the current limitations and possible future.

Ganeti works well for our purposes, which is hosting generic virtual
machines. It's less efficient at managing mixed-usage or specialized
setups like large file storage or high-performance databases, because
of cross-machine contamination and storage overhead.

## Security and risk assessment

No in-depth security review or risk assessment has been done on the
Ganeti clusters recently. It is believed the cryptography and design
of the Ganeti clusters are sound. There's a concern with the server
host key reuse and, in general, there's some confusion over what goes
over TLS and what goes over SSH.

Deleting VMs is too easy in Ganeti: only one confirmation is needed
before a VM is completely wiped, so there's always a risk of
accidental removal.

## Technical debt and next steps

The ganeti-instance-debootstrap installer is slow and almost abandoned
upstream. It required significant patching to get cross-cluster
migrations working.

There are concerns that the DRBD and memory redundancy required by the
Ganeti allocators lead to resource waste, that is to be investigated
in [tpo/tpa/team#40799](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40799).

## Proposed Solution

No recent proposal was done for the Ganeti clusters, although the
Cymru migration is somewhat relevant:

 - [TPA-RFC-40: Cymru migration](policy/tpa-rfc-40-cymru-migration)
 - [TPA-RFC-43: Cymru migration plan](policy/tpa-rfc-43-cymru-migration-plan)
 - [TPA-RFC-52: Cymru migration timeline](policy/tpa-rfc-52-cymru-migration-timeline)

## Other alternatives

Proxmox is probably the biggest contender here. OpenStack is also
marginally similar.

# Old libvirt cluster retirement

The project of creating a Ganeti cluster for Tor came up in the
summer of 2019. The machines were delivered by Hetzner in July 2019
and set up by weasel by the end of the month.

The goal was to replace the aging group of KVM servers (`kvm[1-5]`, AKA
`textile`, `unifolium`, `macrum`, `kvm4` and `kvm5`).

## Goals

### Must have
 * arbitrary virtual machine provisioning
 * redundant setup
 * automated VM installation
 * replacement of existing infrastructure

### Nice to have
 * fully configured in Puppet
 * full high availability with automatic failover
 * extra capacity for new projects

### Non-Goals
 * Docker or "container" provisioning - we consider this out of scope
   for now
 * self-provisioning by end-users: TPA remains in control of
   provisioning

## Approvals required

A budget was proposed by weasel in May 2019 and approved by Vegas in
June. An extension to the budget was approved in January 2020 by
Vegas.

## Proposed Solution
Set up a Ganeti cluster of two machines with a Hetzner vSwitch backend.

The design based on the [PX62 line][PX62-NVMe] has the following monthly cost
structure:

 * per server: 118EUR (79EUR + 39EUR for 2x10TB HDDs)
 * IPv4 space: 35.29EUR (/27)
 * IPv6 space: 8.40EUR (/64)
 * bandwidth cost: 1EUR/TB (currently 38EUR)

At three servers, that adds up to around 435EUR/mth. Up to date costs
are available in the [Tor VM hosts.xlsx](https://nc.torproject.net/apps/onlyoffice/5395) spreadsheet.
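The arithmetic behind that estimate, using the figures above, can be
cross-checked quickly:

```shell
# 3 servers at 118EUR, plus IPv4, IPv6 and current bandwidth costs:
awk 'BEGIN { printf "%.2f\n", 3 * 118 + 35.29 + 8.40 + 38 }'
# -> 435.69, i.e. "around 435EUR/mth"
```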

## Alternatives considered

<!-- include benchmarks and procedure if relevant -->

Note that the instance install is also possible [through FAI, see the
Ganeti wiki for examples](https://github.com/ganeti/ganeti/wiki/System-template-with-FAI).

There are GUIs for Ganeti that we are not using, but could, if we want
to grant more users access:

 * [Ganeti Web manager](https://ganeti-webmgr.readthedocs.io/) is a
   "Django based web frontend for managing Ganeti virtualization
   clusters. Since Ganeti only provides a command-line interface,
   Ganeti Web Manager’s goal is to provide a user friendly web
   interface to Ganeti via Ganeti’s Remote API. On top of Ganeti it
   provides a permission system for managing access to clusters and
   virtual machines, an in browser VNC console, and vm state and
   resource visualizations"
 * [Synnefo](https://www.synnefo.org/) is a "complete open source
   cloud stack written in Python that provides Compute, Network,
   Image, Volume and Storage services, similar to the ones offered by
   AWS. Synnefo manages multiple Ganeti clusters at the backend for
   handling of low-level VM operations and uses Archipelago to unify
   cloud storage. To boost 3rd-party compatibility, Synnefo exposes
   the OpenStack APIs to users."