new machine

  • How to
    • Burn-in
    • Installation
    • Post-install configuration
      • Pre-requisites
      • Main procedure
  • Reference
    • Design
    • Issues
  • Discussion
    • Overview
    • Goals
      • Must have
      • Nice to have
      • Non-Goals
    • Approvals required
    • Proposed Solution
    • Cost
    • Alternatives considered

How to

Burn-in

Before we even install the machine, we should do some sort of stress-testing or burn-in, so that we don't go through the lengthy install process only to put faulty hardware into production.

This implies testing the various components to see if they support a moderate to high load. A tool like stressant can be used for that purpose, but a full procedure still needs to be established.

Example stressant run:

apt install stressant
stressant --email torproject-admin@torproject.org --overwrite --writeSize 10% --diskRuntime 120m --logfile fsn-node-04-sda.log --diskDevice /dev/sda

Stressant is still in development and currently has serious limitations (e.g. it tests one disk at a time) but should be a good way to get started.
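
Until that limitation is lifted, a simple loop over the disk devices is one way to exercise all the disks. This is only a sketch reusing the flags from the example above; adjust the device list, runtime and log file names to the actual hardware:

for disk in /dev/sda /dev/sdb; do
    stressant --email torproject-admin@torproject.org --overwrite --writeSize 10% --diskRuntime 120m --logfile fsn-node-04-$(basename $disk).log --diskDevice $disk
done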

Installation

This document assumes the machine is already installed with a Debian operating system. We preferably install stable or, when close to a release, testing. Here are the site-specific install procedures:

  • Hetzner Cloud
  • Hetzner Robot
  • Ganeti clusters:
    • new virtual machine: new instance procedure
    • new nodes (which host virtual machines): new node procedure, normally done as a post-install configuration
  • linaro: howto/openstack
  • Cymru

The following sites are not documented yet:

  • sunet: possibly similar to Linaro's howto/openstack; each TPA admin has their own account there

The following sites are deprecated:

  • howto/KVM/libvirt (really at Hetzner) - replaced by Ganeti
  • scaleway - see ticket 32920

Post-install configuration

The post-install configuration mostly takes care of bootstrapping Puppet and everything else follows from there. There are, however, still some unrelated manual steps but those should eventually all be automated (see ticket #31239 for details of that work).

Pre-requisites

The procedure below assumes the following steps have already been taken by the installer:

  1. partitions have been correctly set up, including some (>=1GB) swap space (or at least a swap file) and a tmpfs in /tmp (a quick verification sketch follows this list)

  2. a minimal Debian install with security updates has been booted (see also ticket #31957 for upgrade automation)

  3. a hostname has been set, picked from the doc/naming-scheme

  4. a public IP address has been set and the host is available over SSH on that IP address. this can be fixed with:

    fab -H root@88.99.194.57 host.rewrite-interfaces 88.99.194.57 26 88.99.194.1 2a01:4f8:221:2193::2 64 fe80::1

    If the IPv6 address is not known, it might be guessable from the MAC address. Try this:

    ipv6calc --action prefixmac2ipv6 --in prefix+mac --out ipv6 $SUBNET $MAC

    ... where $SUBNET is the (known) subnet from the upstream provider and $MAC is the MAC address as found in ip link show up.

    Make sure reverse DNS is correct as well.

  5. the machine has a short hostname (e.g. test) which resolves to a fully qualified domain name (e.g. test.torproject.org) in the torproject.org domain (i.e. /etc/hosts is correctly configured). this can be fixed with:

    fab -H root@88.99.194.57 host.rewrite-hosts fsn-node-05.torproject.org 88.99.194.57
  6. DNS works on the machine (i.e. /etc/resolv.conf is configured to talk to a working resolver, but not necessarily ours, which Puppet will handle)

  7. a root password has been set in the password manager (TODO: move to trocla? #33332)
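
Most of these prerequisites can be spot-checked on the new machine before going further. This is only a rough verification sketch using standard Debian tools; adjust the expected values to the actual host:

swapon --show              # expect at least 1GB of swap (or a swap file)
findmnt /tmp               # expect a tmpfs mounted on /tmp
hostname; hostname -f      # expect the short name and the torproject.org FQDN
getent hosts $(hostname)   # expect the entry from /etc/hosts
getent hosts debian.org    # expect working DNS resolution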

Main procedure

All commands to be run as root unless otherwise noted.

  1. if the machine is not inside a Ganeti cluster (which has its own inventory), allocate and document the machine in the Nextcloud spreadsheet and, if it's a new service, on the services page

  2. clone the tsa-misc git repository on the machine:

    git clone https://git.torproject.org/admin/tsa-misc.git
  3. add the machine to LDAP on alberti using:

    ldapvi -ZZ --encoding=ASCII --ldap-conf -h db.torproject.org -D "uid=$USER,ou=users,dc=torproject,dc=org"

    To generate the LDAP block, you can use the tor-install-generate-ldap script in tsa-misc/installer. Make sure you review all fields, in particular location (l), physicalHost, description and purpose, which do not have good defaults.

    See the howto/upgrades section for information about the rebootPolicy field.

    See also the ldapvi manual for more information.

  4. generate host snippets for the new node, on alberti:

    sudo -u sshdist ud-generate && sudo -H ud-replicate && sudo puppet agent -t

    This step is necessary to have Puppet open its firewall to the new node.

  5. bootstrap puppet:

    • on the Puppetmaster (currently pauli), run the tpa-puppet-sign-client script, which will stop and prompt you for a checksum; that checksum is generated in the next step

    • on the new machine, run installer/puppet-bootstrap-client from the tsa-misc git repository cloned earlier, and copy-paste the generated checksum literally (including the filename) into the script waiting on the Puppetmaster above

    Note that those scripts are new and haven't been thoroughly tested; see ticket #32914 for details. A sketch of the two-terminal sequence follows this list.

  6. reboot the new machine to make sure that still works:

    reboot
  7. add to howto/nagios, in tor-nagios/config/nagios-master.cfg (TODO: puppetize, in ticket #32901)

  8. if the machine is handling mail, add it to dnswl.org (password in tor-passwords, hosts-extra-info)
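
For reference, the Puppet bootstrap in step 5 boils down to running the two scripts side by side. This is only a sketch of the sequence, not an authoritative transcript; the exact prompts and invocation may differ (see ticket #32914):

# on the Puppetmaster (currently pauli): pauses and prompts for a checksum
tpa-puppet-sign-client

# on the new machine, from the root of the tsa-misc clone: prints the
# checksum (and filename) to paste into the prompt above
./installer/puppet-bootstrap-client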

Reference

Design

If you want to better understand the different installation procedures, there is an install flowchart that was made with Draw.io.

install.png

There are also per-site install graphs:

  • install-hetzner-cloud.png
  • install-hetzner-robot.png
  • install-ganeti.png

To edit those graphics, head to the https://draw.io website (or install their Electron desktop app) and load the install.drawio file.

Those diagrams were created as part of the redesign of the install process, to better understand the various steps of the process and see how they could be refactored. They should not be considered an authoritative version of how the process should be followed.

The text representation in this wiki remains the reference copy.

Issues

Issues regarding installation of new machines are far-ranging and do not have a specific component.

The install system is manual and not completely documented for all sites. It needs to be automated, which is discussed below and in ticket 31239: automate installs.

A good example of the problems that can come up with variations in the install process is ticket 31781: ping fails as a regular user on new VMs.

Discussion

This section discusses the background and implementation details of how machines are installed in the project. It shouldn't be necessary for day-to-day operation.

Overview

The current install procedures work, but have only recently been formalized, mostly because we rarely set up machines. We do expect, however, to set up a significant number of machines in 2019, or at least enough to warrant automating the install process better.

Automating installs is also critical according to Tom Limoncelli, co-author of The Practice of System and Network Administration. In their Ops report card, question 20 explains:

If OS installation is automated then all machines start out the same. Fighting entropy is difficult enough. If each machine is hand-crafted, it is impossible.

If you install the OS manually, you are wasting your time twice: Once when doing the installation and again every time you debug an issue that would have been prevented by having consistently configured machines.

If two people install OSs manually, half are wrong but you don't know which half. Both may claim they use the same procedure but I assure you they are not. Put each in a different room and have them write down their procedure. Now show each sysadmin the other person's list. There will be a fistfight.

In that context, it's critical to automate a reproducible install process. This gives us a consistent platform that Puppet runs on top of, with no manual configuration.

Goals

The project of automating the install is documented in ticket 31239.

Must have

  • unattended installation
  • reproducible results
  • post-installer configuration (i.e. not a full installer, see below)
  • support for running in our different environments (Hetzner Cloud, Robot, bare metal, Ganeti...)

Nice to have

  • packaged in Debian
  • full installer support:
    • RAID, LUKS, etc. filesystem configuration
    • debootstrap, users, etc.

Non-Goals

  • full configuration management stack - that's done by howto/puppet

Approvals required

TBD.

Proposed Solution

The solution being explored right now is to assume the existence of a rescue shell (SSH) of some sort and to use fabric to deploy everything on top of it, up to Puppet. Then everything should be "puppetized" to remove manual configuration steps. See also ticket 31239 for the discussion of alternatives, which are also detailed below.
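
As a rough illustration of that approach (with placeholder addresses and hostname, not an authoritative procedure), the same fabric tasks used in the prerequisites above could be pointed at a host booted into a rescue shell:

fab -H root@203.0.113.10 host.rewrite-interfaces 203.0.113.10 24 203.0.113.1 2001:db8::2 64 fe80::1
fab -H root@203.0.113.10 host.rewrite-hosts test-01.torproject.org 203.0.113.10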

Cost

TBD.

Alternatives considered

  • Ansible - configuration management that duplicates howto/puppet but which we may want to use to bootstrap machines instead of yet another custom thing that operators would need to learn.
  • cloud-init - built into many cloud images (e.g. Amazon), can do rudimentary filesystem setup (no RAID/LUKS/etc but ext4 and disk partitioning is okay), config can be fetched over HTTPS, assumes it runs on first boot, but could be coerced to run manually (e.g. fgrep -r cloud-init /lib/systemd/ | grep Exec)
  • cobbler - takes care of PXE and boot, delegates to kickstart the autoinstall, more relevant to RPM-based distros
  • curtin - "a "fast path" installer designed to install Ubuntu quickly. It is blunt, brief, snappish, snippety and unceremonious." ubuntu-specific, not in Debian, but has strong partitionning support with ZFS, LVM, LUKS, etc support. part of the larger MAAS project
  • FAI - built by a debian developer, used to build live images since buster, might require complex setup (e.g. an NFS server), setup-storage(8) might be reusable on its own. uses Tar-based images created by FAI itself, requires network control or custom ISO boot, requires a "server" (the fai-server package), not directly supported by Ganeti, although there are hacks to make it work
  • himblick has some interesting post-install configure bits in Python, along with pyparted bridges
  • list of debian setup tools, see also AutomatedInstallation
  • livewrapper is also one of those installers, in a way
  • vmdb2 - a rewrite of vmdebootstrap, which uses a YAML file to describe a set of "steps" to take to install Debian; should work on VM images but also on disks; no RAID support, and a significant number of bugs might affect reliability in production
  • MAAS - PXE-based, assumes network control which we don't have and has all sorts of features we don't want
  • howto/puppet - Puppet could bootstrap itself, with puppet apply run from a clone of the git repo. This could be extended as deep as we want.
  • terraform - config management for the cloud kind of thing, supports Hetzner Cloud, but not Hetzner Robot or Ganeti

Unfortunately, I ruled out the official debian-installer because of the complexity of the preseeding system and partman. It also wouldn't work for installs on Hetzner Cloud or Ganeti.
