From 18e807b4f1f0b8074978d67b06ca14267bea2e04 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Wed, 15 Mar 2023 23:16:49 -0400
Subject: [PATCH] prepare announcement for cymru migration timeline (tpo/tpa/team#40972)

---
 policy.md                                     |  1 +
 policy/tpa-rfc-52-cymru-migration-timeline.md | 81 +++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 policy/tpa-rfc-52-cymru-migration-timeline.md

diff --git a/policy.md b/policy.md
index c2432db9..84a4b0a8 100644
--- a/policy.md
+++ b/policy.md
@@ -22,6 +22,7 @@ and add it to the above list.
  * [TPA-RFC-38: Setting Up a Wiki Service](policy/tpa-rfc-38-new-wiki-service)
  * [TPA-RFC-45: Mail architecture](policy/tpa-rfc-45-mail-architecture)
  * [TPA-RFC-47: Email account retirement](policy/tpa-rfc-47-email-account-retirement)
+ * [TPA-RFC-52: Cymru migration timeline](policy/tpa-rfc-52-cymru-migration-timeline)
 
 ## Proposed
 
diff --git a/policy/tpa-rfc-52-cymru-migration-timeline.md b/policy/tpa-rfc-52-cymru-migration-timeline.md
new file mode 100644
index 00000000..0f43da93
--- /dev/null
+++ b/policy/tpa-rfc-52-cymru-migration-timeline.md
@@ -0,0 +1,81 @@
+---
+title: TPA-RFC-52: Cymru migration coming, ~1h expected on some machines
+---
+
+[[_TOC_]]
+
+Summary: migration of the remaining Cymru services in the coming week,
+help needed to test new servers.
+
+# What?
+
+TPA will be migrating a little over a dozen machines off of the old
+Cymru cluster in Chicago to a shiny new cluster in Dallas. The actual
+list of affected machines is here:
+
+https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-43-cymru-migration-plan#instance-table
+
+Only the entries marked "gnt-chi" are being moved at this time.
+
+Members of the anticensorship and metrics teams are particularly
+affected, but services like BTCpayserver, dangerzone, onionbalance, and
+static site deployments from GitLab will also be affected.
+
+# When?
+
+We hope to start migrating the VMs on Monday 2023-03-20, but this is
+likely to continue during the rest of the week, as we may stop the
+migration process if we encounter problems.
+
+# How?
+
+Each VM is migrated one by one, following roughly this process:
+
+ 1. a snapshot is taken on the source cluster, then copied to the target
+ 2. the VM is shut down on the source
+ 3. the target VM is renumbered so it's networked, but DNS still points
+    to the old VM
+ 4. the service is tested
+ 5. if it works, then DNS records are changed to point to the new
+    VM
+ 6. after a week, the old VMs are destroyed
+
+The TTL ("Time To Live") in DNS is currently an hour so the outage will
+last at least that long, for each VM. Depending on the size of the VM,
+the transfer could actually take much longer as well. So far a 20GB VM
+is transferred in about 10 minutes.
+
+Affected team members are encouraged to coordinate with us over IRC
+during the maintenance window to test the new service (step 4
+above). You may also ask for an extension before the destruction of the
+old VM in step 6.
+
+# Why?
+
+The details of that move are discussed briefly in this past proposal:
+
+https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-40-cymru-migration
+
+The migration took longer than expected partly because I hit a snag in
+the VM migration routines, which required some serious debugging and
+patching.
+
+Now we finally have an automated job to batch-migrate VMs between Ganeti
+clusters. This means that not only will we be evacuating the Cymru
+cluster very soon, but we also have a clean mechanism to do this again,
+much faster, the next time we're in such a situation.
+
+# Deadline
+
+This proposal will be enacted on Monday March 20th unless an objection
+is raised.
+
+# Status
+
+This proposal is currently in the `draft` state.
+
+# References
+
+Comments welcome in [tpo/tpa/team#40972][]
+
+[tpo/tpa/team#40972]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40972
-- 
GitLab
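
Note on the "How?" section of the announcement: the one transfer figure it quotes (a 20GB VM copied in about 10 minutes) can be turned into a rough per-VM estimate. The sketch below is only illustrative. It assumes copy time scales linearly with disk size, which the announcement does not claim, and the function name, constant names, and example sizes are hypothetical.

```python
# Rough per-VM transfer estimate from the figures quoted in the announcement.
# Assumption (not from the source): copy time scales linearly with disk size.
# All names and the example sizes below are hypothetical.

OBSERVED_SIZE_GB = 20     # "a 20GB VM is transferred in about 10 minutes"
OBSERVED_MINUTES = 10
DNS_TTL_MINUTES = 60      # "The TTL in DNS is currently an hour"


def estimate_transfer_minutes(disk_gb: float) -> float:
    """Linearly extrapolate copy time from the single observed data point."""
    return disk_gb * OBSERVED_MINUTES / OBSERVED_SIZE_GB


if __name__ == "__main__":
    for size_gb in (20, 100, 500):  # hypothetical VM disk sizes
        minutes = estimate_transfer_minutes(size_gb)
        relation = "longer than" if minutes > DNS_TTL_MINUTES else "within"
        print(f"{size_gb} GB: ~{minutes:.0f} min transfer ({relation} the DNS TTL)")
```

Under this assumption, the transfer only exceeds the one-hour DNS TTL for disks larger than about 120GB, which is one way to read the warning that large VMs "could actually take much longer".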