From 7f8c37ee58ad37643c5120d77fea26ca009daa75 Mon Sep 17 00:00:00 2001
From: "Iain R. Learmonth" <irl@fsfe.org>
Date: Tue, 29 Oct 2019 15:34:26 +0000
Subject: [PATCH] onionoo: disaster recovery

---
 metrics/ops/onionoo-ops.mdwn | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/metrics/ops/onionoo-ops.mdwn b/metrics/ops/onionoo-ops.mdwn
index 32ef1795..6193260e 100644
--- a/metrics/ops/onionoo-ops.mdwn
+++ b/metrics/ops/onionoo-ops.mdwn
@@ -141,12 +141,46 @@ Logs for the hourly updater can be found in
 
 # DISASTER RECOVERY
 
-## Single backend failure
+## Single backend data corruption, no hardware failure
+
+```
+sudo -u onionoo -i bash -c 'systemctl --user stop onionoo'
+sudo -u onionoo-unpriv -i bash -c 'systemctl --user stop onionoo-web'
+rm -rf /srv/onionoo.torproject.org/onionoo/home/{.,}*
+rm -rf /srv/onionoo.torproject.org/onionoo/home-unpriv/{.,}*
+rm -rf /srv/onionoo.torproject.org/onionoo/onionoo/{.,}*
+```
+
+Then follow the instructions above for deploying a new backend.
+
+
+## Single backend failure, hardware failure
+
+If a single backend suffers a hardware failure, ask TSA to trash it and make a
+new one. Once Puppet has configured TSA's side of it, follow the instructions
+above for deploying a new backend.
 
 ## Total loss
 
+In the event of a total loss, ask TSA to trash all the backends and make new
+ones. Once Puppet has configured one host, restore the state and out
+directories from the latest good backup. It may be necessary to refer to the
+logs, which should also be backed up, to work out when the latest good backup
+was made. Once state and out are in place, follow the instructions above for
+deploying a new backend.
+
 ## Total loss including all backups
 
+In the event that the backups have also been lost, it will not be possible to
+restore history. The data needed to rebuild it does exist in CollecTor, but
+there is no code that actually does this.
+
+If no out directory is present on the instance when the Ansible playbook is run
+to install and start the service, it will perform a single initial run of the
+updater to bootstrap. History will start from that run.
+
+Try to avoid this happening.
+
 # SERVICE LEVEL AGREEMENT
 
 # SEE ALSO
-- 
GitLab