From 18e807b4f1f0b8074978d67b06ca14267bea2e04 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Wed, 15 Mar 2023 23:16:49 -0400
Subject: [PATCH] prepare announcement for cymru migration timeline
 (tpo/tpa/team#40972)

---
 policy.md                                     |  1 +
 policy/tpa-rfc-52-cymru-migration-timeline.md | 81 +++++++++++++++++++
 2 files changed, 82 insertions(+)
 create mode 100644 policy/tpa-rfc-52-cymru-migration-timeline.md

diff --git a/policy.md b/policy.md
index c2432db9..84a4b0a8 100644
--- a/policy.md
+++ b/policy.md
@@ -22,6 +22,7 @@ and add it to the above list.
  * [TPA-RFC-38: Setting Up a Wiki Service](policy/tpa-rfc-38-new-wiki-service)
  * [TPA-RFC-45: Mail architecture](policy/tpa-rfc-45-mail-architecture)
  * [TPA-RFC-47: Email account retirement](policy/tpa-rfc-47-email-account-retirement)
+ * [TPA-RFC-52: Cymru migration timeline](policy/tpa-rfc-52-cymru-migration-timeline)
 
 ## Proposed
 
diff --git a/policy/tpa-rfc-52-cymru-migration-timeline.md b/policy/tpa-rfc-52-cymru-migration-timeline.md
new file mode 100644
index 00000000..0f43da93
--- /dev/null
+++ b/policy/tpa-rfc-52-cymru-migration-timeline.md
@@ -0,0 +1,81 @@
+---
+title: TPA-RFC-52: Cymru migration coming, ~1h outage expected on some machines
+---
+
+[[_TOC_]]
+
+Summary: the remaining Cymru services will be migrated in the coming
+week, and help is needed to test the new servers.
+
+# What?
+
+TPA will be migrating a little over a dozen machines off of the old
+Cymru cluster in Chicago to a shiny new cluster in Dallas. The actual
+list of affected machines is here:
+
+https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-43-cymru-migration-plan#instance-table
+
+Only the entries marked "gnt-chi" are being moved at this time.
+
+Members of the anticensorship and metrics teams are particularly
+affected, but services like BTCpayserver, dangerzone, onionbalance, and
+static site deployments from GitLab will also be affected.
+
+# When?
+
+We hope to start migrating the VMs on Monday 2023-03-20, but this is
+likely to continue during the rest of the week, as we may stop the
+migration process if we encounter problems.
+
+# How?
+
+Each VM is migrated one by one, following roughly this process:
+
+ 1. a snapshot is taken on the source cluster, then copied to the target
+ 2. the VM is shut down on the source
+ 3. the target VM is renumbered so it's networked, but DNS still points
+    to the old VM
+ 4. the service is tested
+ 5. if it works, then DNS records are changed to point to the new
+    VM
+ 6. after a week, the old VMs are destroyed
+
+The TTL ("Time To Live") in DNS is currently an hour so the outage will
+last at least that long, for each VM. Depending on the size of the VM,
+the transfer could actually take much longer as well. So far a 20GB VM
+is transferred in about 10 minutes.
+
+Affected team members are encouraged to coordinate with us over IRC
+during the maintenance window to test the new service (step 4
+above). You may also ask for an extension before the destruction of the
+old VM in step 6.
+
+# Why?
+
+The details of that move are discussed briefly in this past proposal:
+
+https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-40-cymru-migration
+
+The migration took longer than expected partly because I hit a snag in
+the VM migration routines, which required some serious debugging and
+patching.
+
+Now we finally have an automated job to batch-migrate VMs between Ganeti
+clusters. This means that not only will we be evacuating the Cymru
+cluster very soon, but we also have a clean mechanism to do this again,
+much faster, the next time we're in such a situation.
+
+# Deadline
+
+This proposal will be enacted on Monday March 20th unless an objection
+is raised.
+
+# Status
+
+This proposal is currently in the `draft` state.
+
+# References
+
+Comments are welcome in [tpo/tpa/team#40972][].
+
+[tpo/tpa/team#40972]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40972
-- 
GitLab