textile is one of the first machines in the KVM* series. weasel proposed we move all its VMs into the new FSN cluster and retire the box to start saving some money, and eventually grow the cluster.
i'm all for making use of the new cluster, but I'm thinking that we should also keep space on the fsn* cluster to alleviate the load on kvm4, which is suffering a bit.
i'd also like to move the bacula director off of moly, which i distrust, see also #29974 (moved).
saxatile removed from ldap, disks queued for removal in 7 days, removed from puppet, nothing left in DNS, entry removed from spreadsheet, nagios, tor-passwords, and scheduled backup deletion in 30 days.
i've disconnected the SVN and chiwui shutdown from this procedure because I don't want to wait for those projects to complete before decommissioning textile.
i've done an import of gayi as well and improved the procedure to support the multiple disks in use there as well as the swap initialization. i've also reversed the sync path so we don't have to trust the old node.
unfortunately, fsn-node-03 has shown (HDD) disk problems which is (obviously) slowing down this work. they replaced the drive but the problem came up again, so i'll put this on hold for now.
the base images are all stored in /srv for now, a logical volume created for that purpose. i'll wait for news from hetzner before going any further.
worst case we set up a second ganeti node or just use the SSDs, which haven't shown signs of problems so far.
There are two old IP addresses, strangely: 138.201.14.212 and 138.201.14.213, respectively chiwui2 and chiwui4 in DNS, not sure why that is.
i migrated a first version of the machine over and things still seem to work, although it's hard to tell if TBB is pinging the old server or the new (probably the former, unfortunately). will start the final migration now.
the two IP addresses are necessary for check to operate, because there are two services on port 80 (the normal webserver and tordnssel). the latter also requires IP changes in /srv, which should be grepped on top of /etc for IP address in the final run.
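a minimal sketch of that final-run check, assuming the only addresses to look for are the two old ones listed above. the function name `find_old_ips` is made up for illustration, it is not part of the actual procedure:

```shell
#!/bin/sh
# Sketch: list files that still mention the old chiwui addresses.
# The two addresses come from this ticket; find_old_ips is a hypothetical helper.
find_old_ips() {
    # -r: recurse, -l: print file names only, -F: fixed strings (dots are literal)
    grep -r -l -F -e 138.201.14.212 -e 138.201.14.213 "$@" 2>/dev/null
}

# In the final run, check both trees mentioned above:
# find_old_ips /etc /srv
```

any file it prints still needs its address changed before the old machine goes away.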
in the meantime, the TTL is hurting us: there are 1h records on that thing, so everything is timing out.... https://check.torproject.org/ works in a regular browser when hacking at my /etc/hosts however, so presumably that part will work once the tor network catches up (through DNS propagation)
so let's wait. i flipped DNS at around Tue Feb 4 14:49:00 2020 -0500 (commit time, pushed shortly after). so things should converge in at most ~32 minutes now.
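for reference, the upper bound can be computed from the commit time and the 1h TTL. this is only a sketch: it assumes GNU `date`, assumes 3600s is the longest TTL involved, and `ttl_deadline` is a made-up helper name. the real deadline is slightly later, since the push happened after the commit:

```shell
#!/bin/sh
# Sketch: latest time a cache could still hold the old record.
# Assumes GNU date; ttl_deadline is a hypothetical helper.
ttl_deadline() {
    flip_epoch=$(date -d "$1" +%s)   # time of the DNS change, as epoch seconds
    date -u -d "@$((flip_epoch + $2))" '+%Y-%m-%d %H:%M:%S UTC'
}

# commit time from this ticket, plus the 1h TTL
ttl_deadline '2020-02-04 14:49:00 -0500' 3600
# prints: 2020-02-04 20:49:00 UTC
```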
"done" (only removed from autostart but kept the xml file in case we want to restore this in a pinch)
done:
root@textile:/etc/libvirt# echo 'rm -r /srv/vmstore/chiwui.torproject.org/' | at now + 7 days
warning: commands will be executed using /bin/sh
job 6 at Tue Feb 11 20:31:00 2020
5. N/A
6. N/A
7. N/A
8. N/A
9. N/A
10. moved to the gnt-fsn cluster in the spreadsheet
11. N/A
12. N/A
13. N/A
14. N/A
15. N/A

chiwui can be considered fully migrated now. next step is to decommission textile, on february 11th.
[...]
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: 60.29% done, 22:53:22 elapsed. (0/0/0 errors)
23 hours! that seems unreasonable. i have interrupted that process, installed nwipe, and changed the procedure to use nwipe instead of badblocks.
we're now at the final wiping stage. i have a screen open with nwipe running without a GUI. last estimates I saw (in the GUI) were about 6 hours for one drive, so we might expect 12 hours from now on for a complete wipe.
once the wipe completes, i'll tell hetzner to decom the server.
6. done, removed from auto-dns and domains
7. done:

```
root@pauli:/home/anarcat# host=textile && puppet node clean $host.torproject.org && puppet node deactivate $host.torproject.org
Notice: Revoked certificate with serial 69
Notice: Removing file Puppet::SSL::Certificate textile.torproject.org at '/var/lib/puppet/ssl/ca/signed/textile.torproject.org.pem'
textile.torproject.org
Submitted 'deactivate node' for textile.torproject.org with UUID fcf5579f-1369-45a3-b230-76382aa1f634
```
removed textile from a bunch of places in puppet (ipsec, hiera, hosters.yaml, tor-install-VM and virt.pp), see ccee6856 in tor-puppet.git. the grep pattern is actually `grep -r -e 138.201.66.71 -e 2a01:4f8:172:1b46::2 -e 138.201.14. -e 2a01:4f8:172:1b46:0:abba: -e 172.30.131.`
cleaned from tor-passwords/hosts and hosts-extra-info
deleted the textile worksheet in the spreadsheet, whoohoo! (it was empty)
removed from nagios
scheduled backup removal:
root@bungei:/srv/backups/bacula# echo rm -rf textile.torproject.org-OLD/ | at now + 30 days
warning: commands will be executed using /bin/sh
job 18 at Fri Mar 13 15:51:00 2020
N/A
"canceled" the server with hetzner, "The earliest possible cancellation date is 17 February 2020."
not a mail host
and we're all done, assuming hetzner does close down the server in 5 days. whoohoo!
Trac:
Status: accepted to closed
Resolution: N/A to fixed