migrate gitlab-02 to new gnt-dal cluster

we're going to host more and more gitlab stuff in object storage (e.g. #41425 (closed)) and already have runners there. it makes sense to move gitlab-02 to the new gnt-dal cluster, which has faster disks and more powerful CPUs.

this should help us deal with the current overload in the gnt-fsn cluster as well (incident #41429 (closed)).

plan:

  • communicate date of outage to all of tor
  • add planned maintenance item on status.tpo
  • lower the TTL to 5 minutes for the DNS records that need it: gitlab-02 (aliases which don't need to change: gitlab, containers, gitaly) and *pages
  • prepare dal cluster for accepting instances from fsn, if needed (e.g. RAPI cert, RAPI passwords, firewall)
  • zerofree on the backup partition (@anarcat)
  • zerofree on the other partitions (not done: it requires downtime, since zerofree needs the partition mounted read-only)
  • wipe free space on volume group (create lv that covers all the remaining free space and wipe it out with dd) (@lelutin)
  • on the day of the maintenance window
    • run steps 5 and 6 of prep
    • stop puppet on ganeti nodes on the dal cluster, finalize RAPI prep
    • enable maintenance mode on gitlab https://docs.gitlab.com/ee/administration/maintenance_mode/ and set message banner https://gitlab.torproject.org/admin/broadcast_messages if necessary to warn people of the read-only state
    • stop the instance and start the transfer
    • change IP in instance after the move
    • test that gitlab and all other websites are replying properly, reconfigure grub-pc, test reboot of instance
    • switch DNS entries to point to new IP
    • disable maintenance mode
    • schedule destruction of old instance
  • after the move is finished
    • communicate to everyone that the move is over and that things are back to normal operation
    • remove password files and cert files created by the prep before migration
    • bring the TTL back to the default value of 1h for gitlab-02 and *pages
    • verify that we still have data for all elements in the gitlab omnibus dashboard on grafana
    • update disaster recovery section of gitlab service docs with our findings
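
the "wipe free space on volume group" step above could be sketched roughly as follows. this is only an illustration, not the exact commands that were run: the volume group name `vg_ganeti` and the scratch LV name `wipe` are placeholders, and the script only prints the commands (dry run) so they can be reviewed before being run by hand as root.

```python
import shlex


def wipe_free_space_commands(vg, lv="wipe"):
    """Return the commands that zero out all free space in volume group `vg`.

    `vg` and `lv` are placeholder names (assumptions), not real cluster values.
    """
    dev = f"/dev/{vg}/{lv}"
    return [
        # allocate every remaining free extent to a scratch LV
        ["lvcreate", "--extents", "100%FREE", "--name", lv, vg],
        # fill it with zeros; dd exits non-zero once the device is full,
        # which is expected here
        ["dd", "if=/dev/zero", f"of={dev}", "bs=1M", "status=progress"],
        # remove the scratch LV to give the space back to the volume group
        ["lvremove", "--yes", dev],
    ]


if __name__ == "__main__":
    # hypothetical volume group name, replace with the real one
    for cmd in wipe_free_space_commands("vg_ganeti"):
        print(shlex.join(cmd))
```

printing instead of executing is deliberate: the `dd` step is destructive if pointed at the wrong device, so the output is meant to be checked first.
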
Edited by lelutin