Verified Commit 18e807b4 authored by anarcat's avatar anarcat

prepare announcement for cymru migration timeline (team#40972)

parent 04420ea5
......@@ -22,6 +22,7 @@ and add it to the above list.
* [TPA-RFC-38: Setting Up a Wiki Service](policy/tpa-rfc-38-new-wiki-service)
* [TPA-RFC-45: Mail architecture](policy/tpa-rfc-45-mail-architecture)
* [TPA-RFC-47: Email account retirement](policy/tpa-rfc-47-email-account-retirement)
* [TPA-RFC-52: Cymru migration timeline](policy/tpa-rfc-52-cymru-migration-timeline)
## Proposed
......
---
title: TPA-RFC-52: Cymru migration coming, ~1h outage expected on some machines
---
[[_TOC_]]
Summary: migration of the remaining Cymru services in the coming week,
help needed to test new servers.
# What?
TPA will be migrating a little over a dozen machines off of the old
Cymru cluster in Chicago to a shiny new cluster in Dallas. The actual
list of affected machines is here:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-43-cymru-migration-plan#instance-table
Only the entries marked "gnt-chi" are being moved at this time.
Members of the anticensorship and metrics teams are particularly
affected, but services like BTCpayserver, dangerzone, onionbalance, and
static site deployments from GitLab will also be affected.
# When?
We hope to start migrating the VMs on Monday 2023-03-20, but this is
likely to continue during the rest of the week, as we may stop the
migration process if we encounter problems.
# How?
Each VM is migrated one by one, following roughly this process:
1. a snapshot is taken on the source cluster, then copied to the target
2. the VM is shut down on the source
3. the target VM is renumbered so it's networked, but DNS still points
to the old VM
4. the service is tested
5. if it works, then DNS records are changed to point to the new
VM
6. after a week, the old VMs are destroyed
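The ordering of these steps can be sketched in Python as below. This is only an illustrative outline, not the actual TPA migration job: `migrate_vm`, `example_steps`, and the step actions are hypothetical stand-ins. The only behavior taken from the list above is the order of the steps and the rule that DNS is changed only if the service test passes.

```python
from typing import Callable, List, Tuple

# Each step is a (description, action) pair; the action returns True on success.
# All actions used here are hypothetical placeholders, not real Ganeti or DNS calls.
Step = Tuple[str, Callable[[], bool]]


def migrate_vm(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that fails.

    Returns the descriptions of the steps that completed, so a failed
    service test leaves DNS pointing at the old VM.
    """
    completed: List[str] = []
    for description, action in steps:
        if not action():
            break
        completed.append(description)
    return completed


def example_steps(service_ok: bool) -> List[Step]:
    # Steps 1 to 5 from the list above; step 6 (destroying the old VM)
    # happens a week later and is deliberately left out.
    return [
        ("snapshot and copy to target", lambda: True),
        ("shut down source VM", lambda: True),
        ("renumber target VM", lambda: True),
        ("test service", lambda: service_ok),
        ("switch DNS to new VM", lambda: True),
    ]


if __name__ == "__main__":
    print(migrate_vm(example_steps(service_ok=False)))
```

Stopping at the first failed step mirrors the intent that a VM whose service does not pass testing keeps its old DNS records until the problem is resolved.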
The TTL ("Time To Live") in DNS is currently one hour, so the outage will
last at least that long for each VM. Depending on the size of the VM,
the transfer could take much longer as well. So far, a 20GB VM
takes about 10 minutes to transfer.
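For a rough idea of what this means for a given VM, the figures above can be turned into a back-of-the-envelope estimate. The sketch below rests on two assumptions that are not claims from this proposal: transfer time scales linearly with disk size from the single 20GB data point, and the transfer time and the DNS TTL simply add up. The function names are made up for illustration.

```python
# Figures quoted in the text above.
DNS_TTL_MINUTES = 60.0    # the DNS TTL is currently one hour
OBSERVED_SIZE_GB = 20.0   # size of the VM measured so far
OBSERVED_MINUTES = 10.0   # time that transfer took


def estimate_transfer_minutes(size_gb: float) -> float:
    """Estimate transfer time, ASSUMING it scales linearly with VM size."""
    return size_gb * OBSERVED_MINUTES / OBSERVED_SIZE_GB


def estimate_outage_minutes(size_gb: float) -> float:
    """Rough outage estimate, ASSUMING transfer time and DNS TTL add up."""
    return estimate_transfer_minutes(size_gb) + DNS_TTL_MINUTES


if __name__ == "__main__":
    for size_gb in (20, 100, 500):
        print(f"{size_gb} GB: ~{estimate_outage_minutes(size_gb):.0f} minutes")
```

Real transfers depend on network conditions and disk contents, so treat these numbers as an order of magnitude rather than a schedule.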
Affected team members are encouraged to coordinate with us over IRC
during the maintenance window to test the new service (step 4
above). You may also ask for an extension before the destruction of the
old VM in step 6.
# Why?
The details of that move are discussed briefly in this past proposal:
https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-40-cymru-migration
The migration took longer than expected partly because I hit a snag in
the VM migration routines, which required some serious debugging and
patching.
Now we finally have an automated job to batch-migrate VMs between Ganeti
clusters. This means that not only will we be evacuating the Cymru
cluster very soon, but we also have a clean mechanism to do this again,
much faster, the next time we're in such a situation.
# Deadline
This proposal will be enacted on Monday March 20th unless an objection
is raised.
# Status
This proposal is currently in the `draft` state.
# References
Comments welcome in [tpo/tpa/team#40972][]
[tpo/tpa/team#40972]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40972