2. follow the [howto/new-machine](howto/new-machine) post-install configuration
3. Allocate a private IP address for the node in the
`30.172.in-addr.arpa` zone and `torproject.org` zone, in the
`admin/dns/domains.git` repository
4. add the private IP address to the `eth1` interface, for example in
`/etc/network/interfaces.d/eth1`:
auto eth1
iface eth1 inet static
address 172.30.131.101/24
Again, this IP must be allocated in the reverse DNS zone file
(`30.172.in-addr.arpa`) and the `torproject.org` zone file in the
`dns/domains.git` repository.
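For illustration, the resulting records would look roughly like this
(the node name is a hypothetical example; follow the existing entries
in the repository):
dal-node-01    IN    A      172.30.131.101                  ; torproject.org zone
101.131        IN    PTR    dal-node-01.torproject.org.     ; 30.172.in-addr.arpa zone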
5. enable the interface:
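With Debian's standard ifupdown tooling, this is typically just:
ifup eth1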
6. setup a bridge on the public interface, replacing the `eth0` blocks
with something like:
auto eth0
iface eth0 inet manual
auto br0
iface br0 inet static
address 204.8.99.101/24
gateway 204.8.99.254
bridge_ports eth0
bridge_stp off
bridge_fd 0
# IPv6 configuration
iface br0 inet6 static
accept_ra 0
address 2620:7:6002:0:3eec:efff:fed5:6b2a/64
gateway 2620:7:6002::1
7. allow modules to be loaded, cross your fingers that you didn't
screw up the network configuration above, and reboot:
touch /etc/no_modules_disabled
reboot
8. configure the node in Puppet by adding it to the
`roles::ganeti::dal` class, and run Puppet on the new node:
puppet agent -t
9. re-disable module loading:
rm /etc/no_modules_disabled
10. run Puppet across the Ganeti cluster so firewalls are correctly
configured:
cumin -p 0 'C:roles::ganeti::dal' 'puppet agent -t'
11. partition the extra disks, SSD:
for disk in /dev/sd[abcdef]; do
parted -s $disk mklabel gpt;
parted -s $disk -a optimal mkpart primary 0% 100%;
done &&
mdadm --create --verbose --level=10 --metadata=1.2 \
--raid-devices=6 \
/dev/md2 \
/dev/sda1 \
/dev/sdb1 \
/dev/sdc1 \
/dev/sdd1 \
/dev/sde1 \
/dev/sdf1 &&
dd if=/dev/random bs=64 count=128 of=/etc/luks/crypt_dev_md2 &&
chmod 0 /etc/luks/crypt_dev_md2 &&
cryptsetup luksFormat --key-file=/etc/luks/crypt_dev_md2 /dev/md2 &&
cryptsetup luksOpen --key-file=/etc/luks/crypt_dev_md2 /dev/md2 crypt_dev_md2 &&
pvcreate /dev/mapper/crypt_dev_md2 &&
vgcreate vg_ganeti /dev/mapper/crypt_dev_md2 &&
echo crypt_dev_md2 UUID=$(lsblk -n -o UUID /dev/md2 | head -1) /etc/luks/crypt_dev_md2 luks,discard >> /etc/crypttab &&
update-initramfs -u
NVMe:
for disk in /dev/nvme[23]n1; do
parted -s $disk mklabel gpt;
parted -s $disk -a optimal mkpart primary 0% 100%;
done &&
mdadm --create --verbose --level=1 --metadata=1.2 \
--raid-devices=2 \
/dev/md3 \
/dev/nvme2n1p1 \
/dev/nvme3n1p1 &&
dd if=/dev/random bs=64 count=128 of=/etc/luks/crypt_dev_md3 &&
chmod 0 /etc/luks/crypt_dev_md3 &&
cryptsetup luksFormat --key-file=/etc/luks/crypt_dev_md3 /dev/md3 &&
cryptsetup luksOpen --key-file=/etc/luks/crypt_dev_md3 /dev/md3 crypt_dev_md3 &&
pvcreate /dev/mapper/crypt_dev_md3 &&
vgcreate vg_ganeti_nvme /dev/mapper/crypt_dev_md3 &&
echo crypt_dev_md3 UUID=$(lsblk -n -o UUID /dev/md3 | head -1) /etc/luks/crypt_dev_md3 luks,discard >> /etc/crypttab &&
update-initramfs -u
Normally, this would have been done in the `setup-storage`
configuration, but we were in a rush. Note that we create
partitions because we're worried replacement drives might not have
exactly the same size as the ones we have. The above gives us a
1.4MB buffer at the end of the drive, and avoids having to
hard code disk sizes in bytes.
12. Reboot to test the LUKS configuration:
reboot
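A quick sanity check after the reboot, assuming the device names used
above, might be:
cryptsetup status crypt_dev_md2
cryptsetup status crypt_dev_md3
vgs vg_ganeti vg_ganeti_nvme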
13. Then the node is ready to be added to the cluster, by running
this on the master node:
gnt-node add \
--secondary-ip <private-ip-allocated-above> \
--no-ssh-key-check \
--no-node-setup \
<new-node-fqdn>
Here `<private-ip-allocated-above>` is the address configured on
`eth1` earlier and `<new-node-fqdn>` is the new node's fully
qualified name; both are placeholders.
If this is an entirely new cluster, you need a different
procedure; see [the cluster initialization procedure](#gnt-dal-cluster-initialization) instead.
14. make sure everything is great in the cluster:
gnt-cluster verify
If the last step fails with SSH errors, you may need to re-synchronise
the SSH `known_hosts` file, see [SSH key verification failures](#ssh-key-verification-failures).
### gnt-dal cluster initialization
This procedure replaces the `gnt-node add` step in the initial setup
of the first Ganeti node when the `gnt-dal` cluster was setup.
Initialize the ganeti cluster:
gnt-cluster init \
--master-netdev eth1 \
--nic-parameters link=br0 \
--vg-name vg_ganeti \
--secondary-ip 172.30.131.101 \
--enabled-hypervisors kvm \
--mac-prefix 06:66:39 \
--no-ssh-init \
--no-etc-hosts \
dalgnt.torproject.org
The above assumes that `dalgnt` is already in DNS. See the [MAC
address prefix selection](#mac-address-prefix-selection) section for information on how the
`--mac-prefix` argument was selected.
Then the following extra configuration was performed:
```
gnt-cluster modify --reserved-lvs vg_system/root,vg_system/swap
gnt-cluster modify -H kvm:kernel_path=,initrd_path=
gnt-cluster modify -H kvm:security_model=pool
gnt-cluster modify -H kvm:kvm_extra='-device virtio-rng-pci\,bus=pci.0\,addr=0x1e\,max-bytes=1024\,period=1000 -global isa-fdc.fdtypeA=none'
gnt-cluster modify -H kvm:disk_cache=none
gnt-cluster modify -H kvm:disk_discard=unmap
gnt-cluster modify -H kvm:scsi_controller_type=virtio-scsi-pci
gnt-cluster modify -H kvm:disk_type=scsi-hd
gnt-cluster modify -H kvm:migration_bandwidth=950
gnt-cluster modify -H kvm:migration_downtime=500
gnt-cluster modify -H kvm:migration_caps=postcopy-ram
gnt-cluster modify -H kvm:cpu_type=host
gnt-cluster modify -D drbd:c-plan-ahead=0,disk-custom='--c-plan-ahead 0'
gnt-cluster modify -D drbd:net-custom='--verify-alg sha1 --max-buffers 8k'
gnt-cluster modify --uid-pool 4000-4019
```
The upper limits for CPU count and memory size were raised, to 32
CPUs and 300GB respectively, with:
```
gnt-cluster modify --ipolicy-bounds-specs \
max:cpu-count=32,disk-count=16,disk-size=1048576,\
memory-size=307200,nic-count=8,spindle-use=12\
/min:cpu-count=1,disk-count=1,disk-size=1024,\
memory-size=128,nic-count=1,spindle-use=1
```
NOTE: watch out for whitespace here. The [original source](https://johnny85v.wordpress.com/2016/06/13/ganeti-commands/) for this
command had too much whitespace, which fails with:
Failure: unknown/wrong parameter name 'Missing value for key '' in option --ipolicy-bounds-specs'
The [network configuration](#network-configuration) (below) must also be performed for the
address blocks reserved in the cluster. This is the actual initial
configuration performed:
gnt-network add --network 204.8.99.128/25 --gateway 204.8.99.254 --network6 2620:7:6002::/64 --gateway6 2620:7:6002::1 gnt-dal-01
gnt-network connect --nic-parameters=link=br0 gnt-dal-01 default
Note that we reserve the first `/25` (204.8.99.0/25) for future
use. The above only uses the second half of the network in case we
need the rest of the network for other operations. A new network will
need to be added if we run out of IPs in the second half.
No IP was reserved as the gateway is already automatically reserved by
Ganeti. The node's public addresses are in the other /25 and also do
not need to be reserved in this allocation.
### Network configuration
IP allocation is managed by Ganeti through the `gnt-network(8)`
system. Say we have `192.0.2.0/24` reserved for the cluster, with
the host IP `192.0.2.100` and the gateway on `192.0.2.1`. You will
first need to create the network in Ganeti:
gnt-network add --network 192.0.2.0/24 --gateway 192.0.2.1 example-network
If there's also IPv6, it would look something like this:
gnt-network add --network 192.0.2.0/24 --gateway 192.0.2.1 --network6 2001:db8::/32 --gateway6 fe80::1 example-network
Note: the actual name of the network (`example-network` above) should
follow the convention established in [doc/naming-scheme](doc/naming-scheme).
Then we associate the new network to the default node group:
gnt-network connect --nic-parameters=link=br0,vlan=4000,mode=openvswitch example-network default
The arguments to `--nic-parameters` come from the values configured in
the cluster, above. The current values can be found with `gnt-cluster
info`.
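To inspect the resulting configuration and address allocations, the
read-only commands below can be useful (the network name is an
example):
gnt-network list
gnt-network info example-network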
For example, the second ganeti network block was assigned with the
following commands:
gnt-network add --network 49.12.57.128/27 --gateway 49.12.57.129 gnt-fsn13-02
gnt-network connect --nic-parameters=link=br0,vlan=4000,mode=openvswitch gnt-fsn13-02 default
IP addresses can be reserved with the `--reserved-ips` argument to the
modify command, for example:
gnt-network modify --add-reserved-ips=38.229.82.2,38.229.82.3,38.229.82.4,38.229.82.5,38.229.82.6,38.229.82.7,38.229.82.8,38.229.82.9,38.229.82.10,38.229.82.11,38.229.82.12,38.229.82.13,38.229.82.14,38.229.82.15,38.229.82.16,38.229.82.17,38.229.82.18,38.229.82.19 gnt-chi-01
Note that the gateway and the nodes' IP addresses are automatically
reserved; this is only needed for hosts outside of the cluster.
The network name must follow the [naming convention](doc/naming-scheme).
## Upgrades
Ganeti upgrades need to be handled specially, and have their own
documentation in the [howto/upgrades](howto/upgrades) documents.
TODO: move procedures here?
## SLA
As long as the cluster is not over capacity, it should be able to
survive the loss of a node in the cluster unattended.
Justified virtual machines can be provisioned within a few business
days without problems.
New nodes can be provisioned within a week or two, depending on budget
and hardware availability.
Our first Ganeti cluster (`gnt-fsn`) is made of multiple machines
hosted with [Hetzner Robot](https://robot.your-server.de/), Hetzner's dedicated server hosting
service. All machines use the same hardware to avoid problems with
live migration. That is currently a customized build of the [PX62 line][PX62-NVMe].
### Network layout
Machines are interconnected over a [vSwitch](https://wiki.hetzner.de/index.php/Vswitch/en), a "virtual layer 2
network" probably implemented using [Software-defined Networking](https://en.wikipedia.org/wiki/Software-defined_networking)
(SDN) on top of Hetzner's network. The details of that implementation
do not matter much to us, since we do not trust the network and run an
IPsec layer on top of the vswitch. We communicate with the `vSwitch`
through [Open vSwitch](https://en.wikipedia.org/wiki/Open_vSwitch) (OVS), which is (currently manually)
configured on each node of the cluster.
There are two distinct IPsec networks:
* `gnt-fsn-public`: the public network, which maps to the
`fsn-gnt-inet-vlan` vSwitch at Hetzner, the `vlan-gntinet` OVS
network, and the `gnt-fsn` network pool in Ganeti. It provides
public IP addresses and routing across the network; instances get
their IPs allocated in this network.
* `gnt-fsn-be`: the private Ganeti network, which maps to the
`fsn-gnt-backend-vlan` vSwitch at Hetzner and the `vlan-gntbe` OVS
network. It has no matching `gnt-network` component and IP
addresses are allocated manually in the 172.30.135.0/24 network
through DNS. It provides internal routing for Ganeti commands and
DRBD replication traffic.
### MAC address prefix selection
The MAC address prefix for the gnt-fsn cluster (`00:66:37:...`) seems
to have been picked arbitrarily. While it does not conflict with a
known existing prefix, it could eventually be issued to a manufacturer
and reused, possibly leading to a MAC address clash. The closest is
currently Huawei:
$ grep ^0066 /var/lib/ieee-data/oui.txt
00664B (base 16) HUAWEI TECHNOLOGIES CO.,LTD
Such a clash is fairly improbable, because that new manufacturer would
need to show up on the local network as well. Still, new clusters
SHOULD use a different MAC address prefix in a [locally administered
address](https://en.wikipedia.org/wiki/MAC_address#Universal_vs._local) (LAA) space, which "are distinguished by setting the
second-least-significant bit of the first octet of the address". In
other words, the MAC address must have 2, 6, A or E as its second
[quad](https://en.wikipedia.org/wiki/Nibble), which means it must look like one of these:
x2 - xx - xx - xx - xx - xx
x6 - xx - xx - xx - xx - xx
xA - xx - xx - xx - xx - xx
xE - xx - xx - xx - xx - xx
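A quick shell check that a candidate first octet has the
locally-administered bit set (`0x06` here is just an example value):
# a non-zero result means the octet is in the LAA space
printf '%d\n' $(( 0x06 & 0x02 ))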
We used `06:66:38` in the gnt-chi cluster for that reason. We kept the
prefix close to `gnt-fsn`'s but varied the last quad (from `:37` to
`:38`) to make them slightly more different-looking.
Obviously, it's unlikely the MAC addresses will be compared across
clusters in the short term. But it's technically possible a MAC bridge
could be established if an exotic VPN bridge gets established between
the two networks in the future, so it's good to have some difference.
### Hardware variations
We considered experimenting with the new AX line ([AX51-NVMe](https://www.hetzner.com/dedicated-rootserver/ax51-nvme?country=OTHER)) but
in the past DSA had problems live-migrating (it wouldn't immediately
fail but there were "issues" after). So we might need to [failover](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-instance.html#failover)
instead of migrate between those parts of the cluster. There are also
doubts that the Linux kernel supports those shiny new processors at
all: similar processors had [trouble booting before Linux 5.5](https://www.phoronix.com/scan.php?page=news_item&px=Threadripper-3000-MCE-5.5-Fix) for
example, so it might be worth waiting a little before switching to
that new platform, even if it's cheaper. See the cluster configuration
section below for a larger discussion of CPU emulation.
### CPU emulation
Note that we might want to tweak the `cpu_type` parameter. By default,
it emulates a lot of processing that can be delegated to the host CPU
instead. If we use `kvm:cpu_type=host`, then each node will tailor the
emulation system to the CPU on the node. But that might make the live
migration more brittle: VMs or processes can crash after a live
migrate because of a slightly different configuration (microcode, CPU,
kernel and QEMU versions all play a role). So we need to find the
lowest common denominator in CPU families. The list of available
families supported by QEMU varies between releases, but is visible
with:
# qemu-system-x86_64 -cpu help
Available CPUs:
x86 486
x86 Broadwell Intel Core Processor (Broadwell)
[...]
x86 Skylake-Client Intel Core Processor (Skylake)
x86 Skylake-Client-IBRS Intel Core Processor (Skylake, IBRS)
x86 Skylake-Server Intel Xeon Processor (Skylake)
x86 Skylake-Server-IBRS Intel Xeon Processor (Skylake, IBRS)
[...]
The current [PX62 line][PX62-NVMe] is based on the [Coffee Lake](https://en.wikipedia.org/wiki/Coffee_Lake) Intel
micro-architecture. The closest matching family would be
`Skylake-Server` or `Skylake-Server-IBRS`, [according to wikichip](https://en.wikichip.org/wiki/intel/microarchitectures/coffee_lake#Compiler_support).
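To pick a safe common baseline, it can help to compare what the nodes
actually report; a possible sketch, using cumin as elsewhere in this
document (the role class is an example):
cumin 'C:roles::ganeti::dal' 'lscpu | grep -E "^(Model name|Flags)"'
The matching QEMU family is then the newest one whose required flags
are present on every node.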
Note that newer QEMU releases (4.2, currently in unstable) have more
supported features.
In that context, of course, supporting different CPU manufacturers
(say AMD vs Intel) is impractical: they will have totally different
families that are not compatible with each other. This will break live
migration, which can trigger crashes and problems in the migrated
virtual machines.
If there are problems live-migrating between machines, it is still
possible to "failover" (`gnt-instance failover` instead of `migrate`)
which shuts off the machine, fails over disks, and starts it on the
other side. That's not such a big problem: we often need to reboot
the guests when we reboot the hosts anyways. But it does complicate
our work. Of course, it's also possible that live migrates work fine
if *no* `cpu_type` at all is specified in the cluster, but that needs
to be verified.
Nodes could also be [grouped](http://docs.ganeti.org/docs/ganeti/3.0/html/man-gnt-group.html) to limit (automated) live migration to a
subset of nodes with compatible hardware.
Update: this was enabled in the `gnt-dal` cluster.
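For reference, creating such a group and assigning nodes to it looks
roughly like this (group and node names are examples):
gnt-group add newer-hardware
gnt-group assign-nodes newer-hardware dal-node-04.torproject.org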
References:
* <https://dsa.debian.org/howto/install-ganeti/>
* <https://qemu.weilnetz.de/doc/qemu-doc.html#recommendations_005fcpu_005fmodels_005fx86>
The [ganeti-instance-debootstrap](https://tracker.debian.org/pkg/ganeti-instance-debootstrap) package is used to install
instances. It is configured through Puppet with the [shared ganeti
module](https://forge.puppet.com/smash/ganeti), which deploys a few hooks to automate the install as much
as possible. The installer will:
1. setup grub to respond on the serial console
2. setup and log a random root password
3. make sure SSH is installed and log the public keys and
fingerprints
4. create a 512MB file-backed swap volume at `/swapfile`, or
a swap partition if it finds one labeled `swap`
5. setup basic static networking through `/etc/network/interfaces.d`
1. add a few base packages
2. do our own custom SSH configuration
3. fix the hostname to be a FQDN
4. add a line to `/etc/hosts`
5. add a tmpfs
There is work underway to refactor and automate the install better,
see [ticket 31239](https://bugs.torproject.org/31239) for details.
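For reference, an instance install through this OS provider is
typically triggered with something along these lines (a sketch only:
the OS variant, disk sizes, backend parameters, network, nodes and
instance name are all examples):
gnt-instance add \
-o debootstrap+default \
-t drbd --disk 0:size=10G \
-B memory=2g,vcpus=2 \
--net 0:ip=pool,network=gnt-dal-01 \
-n dal-node-01.torproject.org:dal-node-02.torproject.org \
--no-name-check --no-ip-check \
test-01.torproject.org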
## Services
TODO: document a bit how the different Ganeti services interface with
each other
## Storage
TODO: document how DRBD works in general, and how it's setup here in
particular.
See also the [DRBD documentation](howto/drbd).
The Cymru PoP has an iSCSI cluster for large filesystem
storage. Ideally, this would be automated inside Ganeti; some quick
links:
* [search for iSCSI in the ganeti-devel mailing list](https://www.mail-archive.com/search?l=ganeti-devel@googlegroups.com&q=iscsi&submit.x=0&submit.y=0)
* in particular a [discussion of integrating SANs into ganeti](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/P7JU_0YGn9s)
seems to say "just do it manually" (paraphrasing) and [this
discussion has an actual implementation](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/kkXFDgvg2rY), [gnt-storage-eql](https://github.com/atta/gnt-storage-eql)
* it could be implemented as an [external storage provider](https://github.com/ganeti/ganeti/wiki/External-Storage-Providers), see
the [documentation](http://docs.ganeti.org/ganeti/2.10/html/design-shared-storage.html)
* the DSA docs are in two parts: [iscsi](https://dsa.debian.org/howto/iscsi/) and [export-iscsi](https://dsa.debian.org/howto/export-iscsi/)
* someone made a [Kubernetes provisioner](https://github.com/nmaupu/dell-provisioner) for our hardware which
could provide sample code
For now, iSCSI volumes are manually created and passed to new virtual
machines.
## Queues
TODO: document gnt-job
## Interfaces
TODO: document the RAPI and ssh commandline
TODO: X509 certs and SSH
Ganeti is implemented in a mix of Python and Haskell, in a mature
codebase.
Ganeti relies heavily on [DRBD](howto/drbd) for live migrations.
There is no issue tracker specifically for this project; [File][] or
[search][] for issues in the [team issue tracker][search] with the
~Ganeti label.
[File]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/new
[search]: https://gitlab.torproject.org/tpo/tpa/team/-/issues?label_name%5B%5D=Ganeti
Upstream Ganeti has of course its own [issue tracker on GitHub](https://github.com/ganeti/ganeti/issues).
TPA is the main direct operator of the service, but most if not all
TPI teams use it either directly or indirectly.
Ganeti used to be a Google project until it was abandoned and spun off
to a separate, standalone free software community. Right now it is
maintained by a mixed collection of organisations and non-profits.
Anarcat implemented a Prometheus metrics exporter that writes stats in
the node exporter "textfile" collector. The source code is available
in `tor-puppet.git`, as
`profile/files/ganeti/tpa-ganeti-prometheus-metrics.py`. Those metrics
are in turn displayed in the [Ganeti Health](https://grafana.torproject.org/d/ce2db5a5-b42b-4454-8d81-ee95b09e229a/ganeti-health) Grafana dashboard.
The WMF worked on a [proper Ganeti exporter](https://github.com/ganeti/prometheus-ganeti-exporter) we should probably
switch to, once it is [packaged in Debian](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054138).
To test if a cluster is working properly, the `verify` command can be
run:
gnt-cluster verify
Creating a VM and migrating it between machines is also a good test.
Ganeti logs a significant amount of information in
`/var/log/ganeti/`. Those logs are of particular interest:
* `node-daemon.log`: all low-level commands and HTTP requests on the
node daemon, including, for example, LVM and DRBD commands
* `os/*$hostname*.log`: installation log for machine `$hostname`;
this also includes VM migration logs for the `move-instance` or
`gnt-instance export` commands
There are no backups of virtual machines directly from Ganeti: each
machine is expected to perform its own backups. The Ganeti
configuration should be backed up as normal by our [backup
system](howto/backup).
## Other documentation
* [Ganeti](http://www.ganeti.org/)
* [Ganeti documentation home](http://docs.ganeti.org/)
* [Main manual](http://docs.ganeti.org/ganeti/master/html/)
* [Manual pages](http://docs.ganeti.org/ganeti/master/man/)
* [Wiki](https://github.com/ganeti/ganeti/wiki)
* [Issues](https://github.com/ganeti/ganeti/issues)
* [Google group](https://groups.google.com/forum/#!forum/ganeti)
* [Wikimedia foundation documentation](https://wikitech.wikimedia.org/wiki/Ganeti)
* [Riseup documentation](https://we.riseup.net/riseup+tech/ganeti)
* [DSA](https://dsa.debian.org/howto/install-ganeti/)
* [OSUOSL wiki](https://wiki.osuosl.org/ganeti/)
The Ganeti cluster has served us well over the years. This section
aims at discussing the current limitations and possible future.
Ganeti works well for our purposes, which is hosting generic virtual
machines. It's less efficient at managing mixed-usage or specialized
setups like large file storage or high-performance databases, because
of cross-machine contamination and storage overhead.
## Security and risk assessment
No in-depth security review or risk assessment has been done on the
Ganeti clusters recently. It is believed that the cryptography and
design of the Ganeti clusters are sound. There is a concern with
server host key reuse and, in general, some confusion over what goes
over TLS and what goes over SSH.
Deleting VMs is arguably too easy in Ganeti: only one confirmation is
needed before a VM is completely wiped, so there is always a risk of
accidental removal.
## Technical debt and next steps
The ganeti-instance-debootstrap installer is slow and almost abandoned
upstream. It required significant patching to get cross-cluster
migrations working.
There are concerns that the DRBD and memory redundancy required by the
Ganeti allocators leads to resource waste; this is to be investigated
in [tpo/tpa/team#40799](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40799).
## Proposed Solution
No recent proposal has been made for the Ganeti clusters, although the
Cymru migration is somewhat relevant:
- [TPA-RFC-40: Cymru migration](policy/tpa-rfc-40-cymru-migration)
- [TPA-RFC-43: Cymru migration plan](policy/tpa-rfc-43-cymru-migration-plan)
- [TPA-RFC-52: Cymru migration timeline](policy/tpa-rfc-52-cymru-migration-timeline)
## Other alternatives
Proxmox is probably the biggest contender here. OpenStack is also
marginally similar.
# Old libvirt cluster retirement
The project of creating a Ganeti cluster for Tor first appeared in the
summer of 2019. The machines were delivered by Hetzner in July 2019
and set up by weasel by the end of the month.
The goal was to replace the aging group of KVM servers (`kvm[1-5]`, AKA
`textile`, `unifolium`, `macrum`, `kvm4` and `kvm5`).
* arbitrary virtual machine provisioning
* redundant setup
* automated VM installation
* replacement of existing infrastructure
* fully configured in Puppet
* full high availability with automatic failover
* extra capacity for new projects
* Docker or "container" provisioning - we consider this out of scope
for now
* self-provisioning by end-users: TPA remains in control of
provisioning
A budget was proposed by weasel in May 2019 and approved by Vegas in
June. An extension to the budget was approved in January 2020 by
Vegas.
## Proposed Solution
Setup a Ganeti cluster of two machines with a Hetzner vSwitch backend.
The design based on the [PX62 line][PX62-NVMe] has the following monthly cost
structure:
* per server: 118EUR (79EUR + 39EUR for 2x10TB HDDs)
* IPv4 space: 35.29EUR (/27)
* IPv6 space: 8.40EUR (/64)
* bandwidth cost: 1EUR/TB (currently 38EUR)
At three servers, that adds up to around 435EUR/mth. Up to date costs
are available in the [Tor VM hosts.xlsx](https://nc.torproject.net/apps/onlyoffice/5395) spreadsheet.
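For reference, that is 3 × 118 + 35.29 + 8.40 + 38 ≈ 435.69 EUR per
month.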
## Alternatives considered
<!-- include benchmarks and procedure if relevant -->
Note that the instance install is also possible [through FAI, see the
Ganeti wiki for examples](https://github.com/ganeti/ganeti/wiki/System-template-with-FAI).
There are GUIs for Ganeti that we are not using, but could, if we want
to grant more users access:
* [Ganeti Web manager](https://ganeti-webmgr.readthedocs.io/) is a
"Django based web frontend for managing Ganeti virtualization
clusters. Since Ganeti only provides a command-line interface,
Ganeti Web Manager’s goal is to provide a user friendly web
interface to Ganeti via Ganeti’s Remote API. On top of Ganeti it
provides a permission system for managing access to clusters and
virtual machines, an in browser VNC console, and vm state and
resource visualizations"
* [Synnefo](https://www.synnefo.org/) is a "complete open source
cloud stack written in Python that provides Compute, Network,
Image, Volume and Storage services, similar to the ones offered by
AWS. Synnefo manages multiple Ganeti clusters at the backend for
handling of low-level VM operations and uses Archipelago to unify
cloud storage. To boost 3rd-party compatibility, Synnefo exposes
the OpenStack APIs to users."