From 54ea56d46698e4d62c1d592b23da72c25887f232 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Wed, 8 Jun 2022 15:56:55 -0400
Subject: [PATCH] document how to trace check calls (used in tpo/tpa/team#40795)

---
 howto/nagios.md | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/howto/nagios.md b/howto/nagios.md
index f5ea195b..954c88da 100644
--- a/howto/nagios.md
+++ b/howto/nagios.md
@@ -89,6 +89,41 @@ authorized_for_full_command_resolution=user1,foo,bar,<new user>
 authorized_for_configuration_information=user1,foo,bar,<new user>
 ```
 
+## Pager playbook
+
+### What is this alert anyways?
+
+Say you receive a mysterious alert and you have no idea what it's
+about. Take, for example, [tpo/tpa/team#40795](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40795):
+
+    09:35:23 <nsa> tor-nagios: [gettor-01] application service - gettor status is CRITICAL: 2: b[AUTHENTICATIONFAILED] Invalid credentials (Failure)
+
+To figure out what triggered this error, follow this procedure:
+
+ 1. log into the Nagios web interface at https://nagios.torproject.org
+
+ 2. find the broken service, for example by listing all [unhandled
+    problems](https://nagios.torproject.org/cgi-bin/icinga/status.cgi?allunhandledproblems)
+
+ 3. click on the actual service name to see details
+
+ 4. find the "executed command" field and click on "Command Expander"
+
+ 5. this will show you the "Raw commandline" that Nagios runs to do
+    this check, in this case it is an NRPE check that calls
+    `tor_application_service` on the other end
+
+ 6. if it's an NRPE check, log into the remote host and run the command,
+    otherwise, the command is run on the Nagios host
+
+In this case, the error can be reproduced with:
+
+    root@gettor-01:~# /usr/lib/nagios/plugins/dsa-check-statusfile /srv/gettor.torproject.org/check/status
+    2: b'[AUTHENTICATIONFAILED] Invalid credentials (Failure)'
+
+In this case, it seems like the status file is under the control of
+the service administrator, who should be contacted for follow-up.
+
 # Reference
 
 ## Design
-- 
GitLab
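
When re-running a check by hand as in step 6 of the procedure, it can help to see the exit code next to the plugin output, since Nagios plugins conventionally report their state through it (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following is a minimal sketch, not part of the patch: `nagios_status_name` and `run_check` are hypothetical helper names, and it only assumes that the "Raw commandline" from the Command Expander can be passed as arguments.

```shell
# Hypothetical helpers (not part of the patch or of any Nagios package).

# Map a plugin exit code to the conventional Nagios state name.
# Codes outside 0-2 are reported here as UNKNOWN.
nagios_status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a check command line exactly as given, print its output,
# then print the exit code with its state name.
run_check() {
    output=$("$@")
    code=$?
    printf '%s\n' "$output"
    printf 'exit code %s (%s)\n' "$code" "$(nagios_status_name "$code")"
    return "$code"
}

# Example, with the plugin and status file path quoted in the patch:
# run_check /usr/lib/nagios/plugins/dsa-check-statusfile /srv/gettor.torproject.org/check/status
```

The example call is left commented out because the plugin and the status file only exist on the monitored host; whether that particular plugin's exit code matches the number printed in its output is not something the patch states.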