migrate gitlab-02 to new gnt-dal cluster
we're going to host more and more gitlab stuff in object storage (e.g. #41425 (closed)) and already have runners there. it makes sense to move gitlab-02 to the new gnt-dal cluster, which has faster disks and more powerful CPUs.
this should help us deal with the current overload in the gnt-fsn cluster as well (incident #41429 (closed)).
plan:
- communicate date of outage to all of tor
- add planned maintenance item on status.tpo
- lower TTL for subdomains that need it to 5 minutes: `gitlab-02` (aliases which don't need to change: gitlab, containers, gitaly), `*pages`
- prepare dal cluster for accepting instances from fsn, if needed (e.g. RAPI cert, RAPI passwords, firewall)
- zerofree on the backup partition (@anarcat)
- zerofree on the other partitions (not done because it requires downtime, as it needs a read-only partition)
- wipe free space on volume group (create an LV that covers all the remaining free space and wipe it with dd) (@lelutin)
- on the day of the maintenance window
  - run steps 5 and 6 of prep
  - stop puppet on ganeti nodes on the dal cluster, finalize RAPI prep
  - enable maintenance mode on gitlab https://docs.gitlab.com/ee/administration/maintenance_mode/ and set message banner https://gitlab.torproject.org/admin/broadcast_messages if necessary to warn people of the read-only state
  - stop the instance and start the transfer
  - change IP in instance after the move
  - test that gitlab and all other websites are replying properly, reconfigure grub-pc, test reboot of instance
  - switch DNS entries to point to new IP
  - disable maintenance mode
  - schedule destruction of old instance
- after the move is finished
  - communicate to everyone that the move is over and that things are back to normal operation
  - remove password files and cert files created by the prep before migration
  - bring the TTL back to the default value of 1h for `gitlab-02` and `*pages`
  - verify that we still have data for all elements in the gitlab omnibus dashboard on grafana
  - update disaster recovery section of gitlab service docs with our findings
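for the "wipe free space on volume group" step, here is a rough sketch of what the commands could look like, wrapped in a small Python helper that only prints them unless told otherwise. the volume group name `vg_example` and the scratch LV name `wipefree` are made up for illustration; the actual names on the node will differ, and this is not necessarily how the step was carried out.

```python
import shlex
import subprocess


def wipe_free_space_commands(vg: str, lv: str = "wipefree") -> list:
    """Build the commands that create, zero out, and remove a scratch LV.

    `vg` and `lv` are placeholders: substitute the real volume group name.
    """
    device = f"/dev/{vg}/{lv}"
    return [
        # allocate every remaining free extent of the VG to a temporary LV
        ["lvcreate", "--yes", "--extents", "100%FREE", "--name", lv, vg],
        # overwrite the LV with zeros; dd exits non-zero with
        # "No space left on device" once the LV is full, which is expected
        ["dd", "if=/dev/zero", f"of={device}", "bs=1M", "status=progress"],
        # give the extents back to the volume group
        ["lvremove", "--yes", f"{vg}/{lv}"],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if dry_run:
            continue
        # tolerate dd's non-zero exit, but stop on lvcreate/lvremove failures
        subprocess.run(cmd, check=(cmd[0] != "dd"))


if __name__ == "__main__":
    # dry run: prints the three commands without touching any disk
    run(wipe_free_space_commands("vg_example"))
```

running it with `dry_run=False` needs root on the node and will write to every free extent of the volume group, so it is worth reading the printed commands first.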
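the maintenance mode page linked above lists the application settings API as one way to toggle the mode, next to the admin UI. a minimal sketch of that request in Python follows, assuming an admin personal access token; the host `gitlab.example.com`, the token value and the helper name `maintenance_mode_request` are placeholders, and the plan itself does not say which method to use.

```python
import urllib.parse
import urllib.request


def maintenance_mode_request(base_url, token, enabled, message=None):
    """Build (but do not send) the PUT request that toggles maintenance mode.

    base_url and token are placeholders for the real instance URL and an
    admin access token.
    """
    params = {"maintenance_mode": "true" if enabled else "false"}
    if message is not None:
        # optional text shown to users while the instance is read-only
        params["maintenance_mode_message"] = message
    url = f"{base_url.rstrip('/')}/api/v4/application/settings"
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(params).encode(),
        method="PUT",
        headers={"PRIVATE-TOKEN": token},
    )


# to actually send it (needs network access and a valid admin token):
#   req = maintenance_mode_request("https://gitlab.example.com", "TOKEN", True,
#                                  "read-only during migration")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

calling the same helper with `enabled=False` builds the request for the "disable maintenance mode" step after the DNS switch.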
Edited by lelutin