retire-a-host.mdwn 3.4 KiB
 1. long before (weeks or months) the machine is decommissioned, make
    sure users know it will go away and which services replace it
 1. remove the host from `tor-nagios/config/nagios-master.cfg`
 2. if applicable, stop the VM: `virsh destroy $host`, or at least
    stop the primary service on the machine
 3. if applicable, undefine the VM: `virsh undefine $host`
 4. wipe host data, possibly with a delay:
  
    * if applicable, remove the LVM logical volumes or virtual disk
      files:
      
          echo 'lvremove -y vgname/lvname' | at now + 7 days

    * for a physical machine, or a virtual machine whose parent host
      we do not own, wipe the disks using the method described below

 5. remove it from ud-ldap: the host entry and any `@<host>` group memberships there might be as well as any `sudo` passwords users might have configured for that host
 6. if it has any associated records in `tor-dns/domains` or `auto-dns`, or in upstream's reverse DNS, remove them from there too
 7. on pauli: `read host ; puppet node clean $host.torproject.org && puppet node deactivate $host.torproject.org`
 8. grep the `tor-puppet` repo for the host (and maybe its IP addresses) and clean up
 9. clean host from `tor-passwords`
 10. remove the machine from the [Nextcloud spreadsheet](https://nc.riseup.net/remote.php/webdav/tpa/Tor%20VM%20Hosts.xlsx)
 11. schedule removal of the host's backups on the backup server
     (currently `bungei`):

        echo rm -rf /srv/backups/bacula/$host/ | at now + 30 days
 12. if it's a physical machine or a virtual host we don't control,
     schedule removal from racks or hosts with upstream

TODO: remove the client from the Bacula catalog, see <https://trac.torproject.org/projects/tor/ticket/30880>.
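
The delayed `rm -rf` in step 11 expands `$host` when the job is queued, so an empty or mistyped variable would schedule removal of the whole backup directory. A minimal sketch of a guard, assuming a hypothetical helper named `backup_removal_cmd` (it is not part of the existing tooling) that only prints the command for a plausible host name:

```shell
# Hypothetical helper (not part of the original procedure): print the
# backup removal command for one host, refusing names that would widen
# the `rm -rf` (empty, containing a slash, or starting with a dot).
backup_removal_cmd() {
    case "$1" in
        ''|*/*|.*)
            echo "refusing unsafe host name: '$1'" >&2
            return 1
            ;;
    esac
    printf 'rm -rf /srv/backups/bacula/%s/\n' "$1"
}

# Usage: only schedule the job if the name passed the check.
# cmd=$(backup_removal_cmd "$host") && echo "$cmd" | at now + 30 days
```

The same pattern could wrap the `lvremove` job in step 4. The check is deliberately crude and does not verify that the host actually exists.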

## Wiping disks

To wipe disks on servers without a serial console or management
interface, you need to be a little more creative. If there's a RAID
array, first wipe one of the disks by taking it offline and writing
garbage:

    mdadm --fail /dev/md0 /dev/sdb1 &&
    mdadm --remove /dev/md0 /dev/sdb1 &&
    mdadm --fail /dev/md1 /dev/sdb2 &&
    mdadm --remove /dev/md1 /dev/sdb2 &&
    : etc, for the other RAID elements (see /proc/mdstat) &&
    badblocks -w -s -v -p 2 /dev/sdb
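
The "etc, for the other RAID elements" part can be derived from `/proc/mdstat` rather than typed by hand. The following is a rough sketch, assuming a hypothetical function name `md_fail_cmds` and the usual `mdN : active raid1 sdb1[1] sda1[0]` line layout. It only prints the `mdadm` commands for review and does not run them:

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# the --fail/--remove commands for every partition of the given disk.
# Assumes member names look like sdb1[1] or nvme0n1p1[0].
md_fail_cmds() {
    awk -v disk="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ ("^" disk "p?[0-9]+\\[")) {
                    part = $i
                    sub(/\[.*/, "", part)   # drop "[1]" and flags like "(F)"
                    printf "mdadm --fail /dev/%s /dev/%s\n", $1, part
                    printf "mdadm --remove /dev/%s /dev/%s\n", $1, part
                }
            }
        }'
}

# Usage: md_fail_cmds sdb < /proc/mdstat
```

Check the printed commands against `/proc/mdstat` before pasting them, since unusual array layouts may not match the pattern assumed here.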

The `badblocks` run will take a long time. When you return:

 1. start a `screen` session with a static `busybox` as your `SHELL`
    that will survive disk wiping:

        mkdir /root/tmp
        mount -t tmpfs tmpfs /root/tmp
        cp /bin/busybox /root/tmp/sh
        export SHELL=/root/tmp/sh
        exec screen -s $SHELL

 2. kill all processes but the SSH daemon, your SSH connection and
    shell. This will vary from machine to machine, but a good way is
    to list all processes with `systemctl status` and `systemctl stop`
    the services one by one. Hint: multiple services can be passed on
    the same `stop` command, for example:

        systemctl stop acpid atd bacula-df bind9 cron ntp postfix prometheus-node-exporter prometheus-bind-exporter

 3. disable swap:

        swapoff -a

 4. unmount everything that can be unmounted (except `/proc`):

        umount -a

 5. remount everything else read-only:

        mount -o remount,ro /

 6. sync disks:

        sync

 7. wipe the remaining disk (note the dangerous `-f`) and shut down:

        badblocks -w -s -v -p 2 -f /dev/sda ; \
        echo "SHUTTING DOWN FOREVER IN ONE MINUTE" ; \
        sleep 60 ; \
        echo o > /proc/sysrq-trigger
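
Before the final `badblocks -f` in step 7, it can help to confirm that steps 4 and 5 left no block device mounted read-write. A minimal sketch, assuming a hypothetical function name `rw_block_mounts` and the standard `/proc/mounts` field order (device, mount point, type, options):

```shell
# Hypothetical check: read /proc/mounts-style text on stdin and print
# the mount point of every /dev/* device still mounted read-write.
rw_block_mounts() {
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)rw(,|$)/ { print $2 }'
}

# Usage: rw_block_mounts < /proc/mounts
```

Empty output suggests nothing on disk is still writable. Any mount point it prints is a filesystem that was neither unmounted nor remounted read-only.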