    x86 Broadwell             Intel Core Processor (Broadwell)
    [...]
    x86 Skylake-Client        Intel Core Processor (Skylake)
    x86 Skylake-Client-IBRS   Intel Core Processor (Skylake, IBRS)
    x86 Skylake-Server        Intel Xeon Processor (Skylake)
    x86 Skylake-Server-IBRS   Intel Xeon Processor (Skylake, IBRS)
    [...]

The current [PX62 line][PX62-NVMe] is based on the [Coffee Lake](https://en.wikipedia.org/wiki/Coffee_Lake) Intel
micro-architecture. The closest matching family would be
`Skylake-Server` or `Skylake-Server-IBRS`, [according to wikichip](https://en.wikichip.org/wiki/intel/microarchitectures/coffee_lake#Compiler_support).
Note that newer QEMU releases (4.2, currently in unstable) have more
supported features.
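
As a rough sketch of how such a model could be pinned cluster-wide, assuming it is set through the KVM `cpu_type` hypervisor parameter with `gnt-cluster modify`. The `run` helper and the `APPLY` variable are made up for this example: the command is only printed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch only: pin one CPU model for all KVM instances in the cluster.
# Assumption: Skylake-Server is the model picked from the list above;
# check `qemu-system-x86_64 -cpu help` on every node before applying.
CPU_MODEL="${CPU_MODEL:-Skylake-Server}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Set the cluster-wide default for the KVM hypervisor (run on the master).
run gnt-cluster modify -H "kvm:cpu_type=${CPU_MODEL}"
```

Running instances presumably only pick up the new model after being restarted, so this would need to be checked before relying on it for live migration.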

In that context, of course, supporting different CPU manufacturers
(say AMD vs Intel) is impractical: they will have totally different
families that are not compatible with each other. This will break live
migration, which can trigger crashes and problems in the migrated
virtual machines.

If there are problems live-migrating between machines, it is still
possible to "failover" (`gnt-instance failover` instead of `migrate`)
which shuts off the machine, fails over disks, and starts it on the
other side. That's not such a big problem: we often need to reboot
the guests when we reboot the hosts anyway. But it does complicate
our work. Of course, it's also possible that live migrations work fine
if *no* `cpu_type` at all is specified in the cluster, but that needs
to be verified.
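
The fallback described above could be sketched like this. The instance name is a placeholder, and the `run` helper and `APPLY` variable are assumptions of this example: nothing is executed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch only: try a live migration, fall back to a failover.
# "test.example.org" is a placeholder instance name.
INSTANCE="${1:-test.example.org}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

move_instance() {
    # Live migration keeps the guest running, but may fail across CPU models.
    if run gnt-instance migrate "$1"; then
        return 0
    fi
    # Failover shuts the guest off, fails over disks, and starts it on the
    # other side.
    echo "live migration of $1 failed, falling back to failover" >&2
    run gnt-instance failover "$1"
}

move_instance "$INSTANCE"
```

In dry-run mode only the `migrate` command is printed, since the helper always succeeds. Both commands normally ask for confirmation, so the failover still requires an explicit "yes" from the operator.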

Nodes could also be [grouped](http://docs.ganeti.org/ganeti/2.15/man/gnt-group.html) to limit (automated) live migration to a
subset of nodes.
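
A minimal sketch of such a grouping, using `gnt-group add` and `gnt-group assign-nodes`. The group and node names are placeholders, and the `run` helper and `APPLY` variable are assumptions of this example: commands are only printed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch only: put nodes with the same CPU family in their own node group.
# The group and node names below are placeholders.
GROUP="${GROUP:-intel}"
NODES="${NODES:-node-01.example.org node-02.example.org}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Create the group, then move the nodes into it.
run gnt-group add "$GROUP"
# $NODES is intentionally unquoted so each node becomes its own argument.
run gnt-group assign-nodes "$GROUP" $NODES
```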

References:

 * <https://dsa.debian.org/howto/install-ganeti/>
 * <https://qemu.weilnetz.de/doc/qemu-doc.html#recommendations_005fcpu_005fmodels_005fx86>

### Installer
The [ganeti-instance-debootstrap](https://tracker.debian.org/pkg/ganeti-instance-debootstrap) package is used to install
instances. It is configured through Puppet with the [shared ganeti
module](https://forge.puppet.com/smash/ganeti), which deploys a few hooks to automate the install as much
as possible. The installer will:

 1. set up grub to respond on the serial console
 2. set up and log a random root password
 3. make sure SSH is installed and log the public keys and
    fingerprints
 4. set up swap if a labeled partition is present, or a 512MB swapfile
    otherwise
 5. set up basic static networking through `/etc/network/interfaces.d`

We have custom configurations on top of that to:

 1. add a few base packages
 2. do our own custom SSH configuration
 3. fix the hostname to be a FQDN
 4. add a line to `/etc/hosts`
 5. add a tmpfs

There is work underway to refactor and automate the install better,
see [ticket 31239](https://trac.torproject.org/projects/tor/ticket/31239) for details.

There is no issue tracker specifically for this project. [File][] or
[search][] for issues in the [generic internal services][search] component.

 [File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
 [search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team

The project of creating a Ganeti cluster for Tor started in the
summer of 2019. The machines were delivered by Hetzner in July 2019
and set up by weasel by the end of the month.

The goal was to replace the aging group of KVM servers (kvm[1-5], AKA
textile, unifolium, macrum, kvm4 and kvm5).

 * arbitrary virtual machine provisioning
 * redundant setup
 * automated VM installation
 * replacement of existing infrastructure

 * fully configured in Puppet
 * full high availability with automatic failover
 * extra capacity for new projects

 * Docker or "container" provisioning - we consider this out of scope
   for now
 * self-provisioning by end-users: TPA remains in control of
   provisioning

## Approvals required

A budget was proposed by weasel in May 2019 and approved by Vegas in
June. An extension to the budget was approved in January 2020 by
Vegas.

Set up a Ganeti cluster of two machines with a Hetzner vSwitch backend.

The design based on the [PX62 line][PX62-NVMe] has the following monthly cost
structure:

 * per server: 118EUR (79EUR + 39EUR for 2x10TB HDDs)
 * IPv4 space: 35.29EUR (/27)
 * IPv6 space: 8.40EUR (/64)
 * bandwidth cost: 1EUR/TB (currently 38EUR)

At three servers, that adds up to around 435EUR/month. Up-to-date costs
are available in the [Tor VM hosts.xlsx](https://nc.torproject.net/apps/onlyoffice/5395) spreadsheet.
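
The arithmetic behind that total can be checked with a small sketch. It only uses the figures listed above. The `monthly_cost` function name is made up for this example, and bandwidth is assumed to stay at the current 38EUR regardless of the number of servers.

```shell
#!/bin/sh
# Sketch only: monthly cost in EUR for N servers, using the figures above.
monthly_cost() {
    LC_ALL=C awk -v servers="$1" 'BEGIN {
        per_server = 79 + 39   # base server + 2x10TB HDDs
        ipv4 = 35.29           # /27
        ipv6 = 8.40            # /64
        bandwidth = 38         # 1EUR/TB, current usage (assumed constant)
        printf "%.2f\n", servers * per_server + ipv4 + ipv6 + bandwidth
    }'
}

monthly_cost 3
```

For three servers this prints `435.69`, which matches the rounded figure above.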

## Alternatives considered

<!-- include benchmarks and procedure if relevant -->