Note that newer QEMU releases (4.2, currently in unstable) have more
supported features.
In that context, of course, supporting different CPU manufacturers
(say AMD vs Intel) is impractical: they will have totally different
families that are not compatible with each other. This will break live
migration, which can trigger crashes and problems in the migrated
virtual machines.
If there are problems live-migrating between machines, it is still
possible to "failover" (`gnt-instance failover` instead of `migrate`),
which shuts off the machine, fails over disks, and starts it on the
other side. That's not such a big problem: we often need to reboot
the guests when we reboot the hosts anyway. But it does complicate
our work. Of course, it's also possible that live migrations work fine
if *no* `cpu_type` at all is specified in the cluster, but that needs
to be verified.
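
The "try a live migration, fall back to a failover" procedure described above could be scripted roughly as follows. This is only a sketch, not something deployed in the cluster: the `move_instance` helper and the instance name are hypothetical, and both commands are run without extra flags, so they may still prompt for confirmation.

```python
import subprocess
from typing import Callable, List

# A runner takes a command line and returns its exit code. It is
# injectable so the logic can be exercised without a Ganeti cluster.
Runner = Callable[[List[str]], int]


def run(cmd: List[str]) -> int:
    """Run a command on the Ganeti master and return its exit code."""
    return subprocess.run(cmd).returncode


def move_instance(instance: str, runner: Runner = run) -> str:
    """Try a live migration first, then fall back to a failover.

    The failover shuts the instance down, fails over the disks and
    starts it on the other node, so it implies downtime.
    """
    if runner(["gnt-instance", "migrate", instance]) == 0:
        return "migrated"
    if runner(["gnt-instance", "failover", instance]) == 0:
        return "failed over"
    raise RuntimeError(f"could not migrate or fail over {instance}")


if __name__ == "__main__":
    # "test.example.com" is a placeholder, not a real instance.
    print(move_instance("test.example.com"))
```

Trying the migration first keeps the guest running when the CPU models are compatible, and only pays the reboot cost when they are not.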
Nodes could also be [grouped](http://docs.ganeti.org/ganeti/2.15/man/gnt-group.html) to limit (automated) live migration to a
subset of nodes.
References:
* <https://dsa.debian.org/howto/install-ganeti/>
* <https://qemu.weilnetz.de/doc/qemu-doc.html#recommendations_005fcpu_005fmodels_005fx86>
The [ganeti-instance-debootstrap](https://tracker.debian.org/pkg/ganeti-instance-debootstrap) package is used to install
instances. It is configured through Puppet with the [shared ganeti
module](https://forge.puppet.com/smash/ganeti), which deploys a few hooks to automate the install as much
as possible. The installer will:
1. set up grub to respond on the serial console
2. set up and log a random root password
3. make sure SSH is installed and log the public keys and
fingerprints
4. set up swap if a labeled partition is present, or a 512MB swapfile
otherwise
5. set up basic static networking through `/etc/network/interfaces.d`
1. add a few base packages
2. do our own custom SSH configuration
3. fix the hostname to be a FQDN
4. add a line to `/etc/hosts`
5. add a tmpfs
There is work underway to refactor and automate the install better,
see [ticket 31239](https://trac.torproject.org/projects/tor/ticket/31239) for details.
There is no issue tracker specifically for this project. [File][] or
[search][] for issues in the [generic internal services][search] component.
[File]: https://trac.torproject.org/projects/tor/newticket?component=Internal+Services%2FTor+Sysadmin+Team
[search]: https://trac.torproject.org/projects/tor/query?status=!closed&component=Internal+Services%2FTor+Sysadmin+Team
# Discussion
## Overview
The project of creating a Ganeti cluster for Tor appeared in the
summer of 2019. The machines were delivered by Hetzner in July 2019
and set up by weasel by the end of the month.
The goal was to replace the aging group of KVM servers (kvm[1-5], AKA
textile, unifolium, macrum, kvm4 and kvm5).
* arbitrary virtual machine provisioning
* redundant setup
* automated VM installation
* replacement of existing infrastructure
* fully configured in Puppet
* full high availability with automatic failover
* extra capacity for new projects
* Docker or "container" provisioning - we consider this out of scope
for now
* self-provisioning by end-users: TPA remains in control of
provisioning
A budget was proposed by weasel in May 2019 and approved by Vegas in
June. An extension to the budget was approved in January 2020 by
Vegas.
## Proposed Solution
Set up a Ganeti cluster of two machines with a Hetzner vSwitch backend.
The design based on the [PX62 line][PX62-NVMe] has the following monthly cost
structure:
* per server: 118EUR (79EUR + 39EUR for 2x10TB HDDs)
* IPv4 space: 35.29EUR (/27)
* IPv6 space: 8.40EUR (/64)
* bandwidth cost: 1EUR/TB (currently 38EUR)
At three servers, that adds up to around 435EUR/month. Up-to-date costs
are available in the [Tor VM hosts.xlsx](https://nc.torproject.net/apps/onlyoffice/5395) spreadsheet.
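
As a quick check of the arithmetic behind the three-server estimate, here is a small sketch. It assumes the IP space and bandwidth charges are paid once for the whole cluster rather than per server, which is the reading that matches the total above; the `monthly_cost` function is only illustrative.

```python
from decimal import Decimal

# Monthly figures in EUR, copied from the cost structure above.
SERVER = Decimal("79") + Decimal("39")  # base server + 2x10TB HDDs = 118
IPV4 = Decimal("35.29")                 # /27
IPV6 = Decimal("8.40")                  # /64
BANDWIDTH_PER_TB = Decimal("1")


def monthly_cost(servers: int, bandwidth_tb: int = 38) -> Decimal:
    """Total monthly cost, assuming IP space and bandwidth are cluster-wide."""
    return servers * SERVER + IPV4 + IPV6 + bandwidth_tb * BANDWIDTH_PER_TB


print(monthly_cost(3))  # 435.69
```

The same function gives 317.69EUR for the two-machine cluster in the original proposal, under the same assumptions.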
## Alternatives considered
<!-- include benchmarks and procedure if relevant -->