TPA-RFC-72: move donate-01 VM to gnt-dal cluster

in tpo/web/donate-neo#134 (closed), we've identified severe latency issues with the donation site. @lavamind suspected the tunnel crossing the Atlantic might be causing those issues, and analysis (tpo/web/donate-neo#134 (comment 3083019)) shows there's indeed a 200-400ms latency over that link, which is causing severe disruptions.
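to get a feel for why that latency hurts so much, here's a rough back-of-the-envelope sketch. It assumes the 200-400ms figure is per round trip and that a page load makes its backend calls one after the other; the count of 10 calls is a made-up number for illustration, not something measured on donate-01.

```python
def added_delay_ms(round_trips: int, rtt_ms: float) -> float:
    """Extra wall-clock time spent waiting when `round_trips` requests
    are issued sequentially over a link with round-trip time `rtt_ms`."""
    if round_trips < 0 or rtt_ms < 0:
        raise ValueError("round_trips and rtt_ms must be non-negative")
    return round_trips * rtt_ms


# Hypothetical workload: 10 sequential backend calls per page load.
for rtt in (200, 400):
    seconds = added_delay_ms(10, rtt) / 1000
    print(f"{rtt} ms RTT -> {seconds:.1f} s added per page load")
```

the point is only that sequential round trips multiply the link latency, so even a modest number of calls adds up to seconds.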

at first, @lavamind thought we should move the crm-int-01 machine next to the donate-01 machine in the gnt-dal cluster, but it's actually the other way around. the crm-* machines were moved to gnt-dal over a year ago, in #41109 (closed).

so what we need to do is to move the donate-01 VM instead.

next steps:

make a migration plan (for now, below is an inter-cluster migration plan inspired by #41109 (comment 2900087), ~~but should we just rebuild a donate-02?~~)
review the migration plan (@lavamind)

inter-cluster migration plan

Before:

Schedule an outage with stakeholders
Look for any hard-coded IPs in donate and puppet code

Review cross-cluster transfer procedure, see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ganeti/#cross-cluster-migrations
Announce outage on status.tpo (status-site!66 (merged) waiting for merge)
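
For the "hard-coded IPs" step above, something like the following could help walk a checkout of the donate and Puppet code. This is only a sketch: the function names are made up here, it only looks for IPv4 literals (IPv6 would need a separate pass), and it will also flag unrelated addresses, so the output still needs a human review.

```python
import ipaddress
import re
from pathlib import Path

# Candidate dotted quads that are not part of a longer dotted number;
# real validation is done by the ipaddress module afterwards.
_IPV4_CANDIDATE = re.compile(
    r"(?<!\d)(?<!\d\.)(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)"
)


def find_ipv4_literals(text: str) -> list[str]:
    """Return every valid IPv4 literal found in `text`, in order."""
    found = []
    for match in _IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a real address
        found.append(match.group())
    return found


def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, address) for each literal under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for ip in find_ipv4_literals(line):
                hits.append((str(path), lineno, ip))
    return hits
```

Calling `scan_tree(".")` from the top of a checkout would list each candidate with its file and line number.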

During:

~~Toggle maintenance mode on frontend~~ we don't have one? tpo/web/donate-neo#107
Suspend Puppet on origin and destination clusters
Deploy required firewall rules on origin and destination nodes
Transfer donate-01 (see procedure in https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/ganeti/#actual-vm-migration)
Renumber IP addresses
Fix backend IPSec tunnel IPs
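
For the renumbering step, here is a minimal sketch of rewriting addresses in a config file's text. It is not the procedure from the wiki, just an illustration: the `renumber` helper is hypothetical, and the addresses in the usage note are documentation ranges, not the real donate-01 ones. It replaces whole addresses only (so `192.0.2.1` does not touch `192.0.2.10`) and does everything in a single pass, so an old-to-new mapping cannot chain into itself.

```python
import re


def renumber(text: str, mapping: dict[str, str]) -> str:
    """Replace each old IPv4 address in `text` with its new address.

    `mapping` maps old address -> new address. Only complete addresses
    are replaced, and all replacements happen in one pass.
    """
    if not mapping:
        return text
    alternatives = "|".join(re.escape(old) for old in mapping)
    pattern = re.compile(
        r"(?<!\d)(?<!\d\.)(?:" + alternatives + r")(?!\d|\.\d)"
    )
    return pattern.sub(lambda m: mapping[m.group()], text)
```

For example, `renumber("left=192.0.2.1", {"192.0.2.1": "198.51.100.1"})` gives `"left=198.51.100.1"`. Anything managed by Puppet should of course be changed at the source rather than patched on the host.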

After:

Clear temporary firewall rules
Re-enable Puppet
~~Disable frontend maintenance mode~~?
Validate donate site works, see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/donate#testing-the-donation-site
Mark status.tpo entry as resolved
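
On top of the functional tests linked above, it might be worth confirming the latency actually dropped after the move. A rough sketch, assuming TCP connect time is a good enough proxy for one round trip; the 50ms budget, the host name and the port in the comments are placeholders, not agreed values.

```python
import socket
import statistics
import time


def measure_connect_ms(host: str, port: int, samples: int = 5,
                       timeout: float = 5.0) -> list[float]:
    """Time `samples` TCP connections to host:port, in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def within_budget(samples_ms: list[float], budget_ms: float) -> bool:
    """True if the median of the samples is at or below the budget."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return statistics.median(samples_ms) <= budget_ms


# Usage, run from donate-01 against the backend (placeholder host/port):
#   samples = measure_connect_ms("backend.example.org", 443)
#   print(samples, within_budget(samples, budget_ms=50.0))
```

The median is used so a single slow connection does not fail the check.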

Edited Oct 02, 2024 by Jérôme Charaoui