From e29bf157883d31db4dfb373fb666eef19e1f76c4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Mon, 4 Oct 2021 13:11:29 -0400
Subject: [PATCH] add a pager playbook for today's static mirror outage

Related to tpo/tpa/team#40432.
---
 howto/static-component.md | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/howto/static-component.md b/howto/static-component.md
index 1affb828..e44ced18 100644
--- a/howto/static-component.md
+++ b/howto/static-component.md
@@ -169,11 +169,27 @@ If we do *not* want to keep a vanity site, we should also do this:
 
 ## Pager playbook
 
-TODO: add a pager playbook.
+### Out-of-date mirror
 
-<!-- information about common errors from the monitoring system and -->
-<!-- how to deal with them. this should be easy to follow: think of -->
-<!-- your future self, in a stressful situation, tired and hungry. -->
+If you see an error like this in Nagios:
+
+> mirror static sync - deb: CRITICAL: 1 mirror(s) not in sync (from oldest to newest): 95.216.163.36
+
+It means that Nagios found that the host at that address
+(`hetzner-hel1-03.torproject.org`, in this case) is not in sync for
+the `deb` component, which is <https://deb.torproject.org>.
+
+In this case, this was caused by a prolonged outage on that host, which
+made it unreachable from the master server ([tpo/tpa/team#40432](https://gitlab.torproject.org/tpo/tpa/team/-/issues/incident/40432)).
+
+The solution is to run a manual sync. This can be done by, for
+example, pushing to Jenkins or running `static-update-component` by
+hand; see [doc/static-sites](doc/static-sites).
+
+In this particular case, the solution is simply to run this on the
+static source (`palmeri` at the time of writing):
+
+    static-update-component deb.torproject.org
 
 ## Disaster recovery
 
-- 
GitLab
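For the out-of-date mirror alert described in this patch, the component name and the lagging mirror addresses can be pulled out of the Nagios message mechanically, which helps when triaging under pressure. Below is a minimal POSIX shell sketch. It assumes the alert text always has the shape shown above (`mirror static sync - <component>: ... newest): <addresses>`) and that several mirrors, if listed, are separated by commas and/or spaces. That separator is a guess, since only the single-mirror form appears here. The function name `parse_static_sync_alert` is hypothetical and is not part of the static-component tooling.

```shell
#!/bin/sh
# Hypothetical helper: extract the component and out-of-sync mirror
# addresses from a "mirror static sync" Nagios alert line.
# Assumption: multiple mirrors are separated by commas and/or spaces.

parse_static_sync_alert() {
    alert=$1
    # The component name sits between "mirror static sync - " and the next colon.
    component=$(printf '%s\n' "$alert" |
        sed -n 's/^mirror static sync - \([^:]*\):.*/\1/p')
    # The mirror list is everything after "newest): "; split it one per line.
    mirrors=$(printf '%s\n' "$alert" |
        sed -n 's/.*newest): //p' |
        tr ', ' '\n\n' |
        sed '/^$/d')
    echo "component: $component"
    for ip in $mirrors; do
        echo "out of sync: $ip"
    done
}
```

Fed the alert from this incident, it reports the `deb` component and `95.216.163.36`. The address can then be mapped back to a host name with a reverse lookup such as `getent hosts 95.216.163.36` before running `static-update-component` on the static source as described above.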