TPA-RFC-17: establish a global disaster recovery plan
in our service template, we have a "Disaster recovery" section, but it's not very detailed. furthermore, it's per service, and doesn't cover stuff like "Hetzner goes down for a long time", or "we get ransomware'd", or "john's laptop got hacked, now what".
so this ticket is about setting a global disaster recovery policy that takes all of those cases into account. it should also make sure we have disaster recovery scenarios for all services, which will include coordinating with other teams. and, of course, it will require coordinating with everyone to make sure the plan makes sense.
finally, it will probably require budgeting some work in the future so that we not only have an idea of a plan, but also have measures in place to mitigate disasters (e.g. another backup server, everyone gets a YubiKey, etc).
i started brain-dumping ideas in this RFC: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-17-disaster-recovery
this, in turn, could inform the security policy as well, see tpo/team#41