Revocation procedure problems were discussed in [33587][] and [33446][].
|
|
|
|
|
## Pager playbook
|
|
|
|
|
|
<!-- information about common errors from the monitoring system and -->
<!-- how to deal with them. this should be easy to follow: think of -->
<!-- your future self, in a stressful situation, tired and hungry. -->

### catalog run: PuppetDB warning: did not update since...
|
|
|
|
|
|
|
|
TODO.
|
|
If you see an error like:

    Check last node runs from PuppetDB WARNING - cupani.torproject.org did not update since 2020-05-11T04:38:54.512Z
|
|
|
|
|
|
|
|
It may also be accompanied by the Puppet server reporting the same
problem:
|
|
|
|
|
|
|
|
    Subject: ** PROBLEM Service Alert: pauli/puppet - all catalog runs is WARNING **

    [...]

    Check last node runs from PuppetDB WARNING - cupani.torproject.org did not update since 2020-05-11T04:38:54.512Z
|
|
|
|
|
|
|
|
One of the following is happening, in decreasing order of likelihood:
|
|
|
|
|
|
|
|
1. the node's Puppet manifest has an error of some sort that makes it
   impossible to run the catalog
2. the node is down and has failed to report since the last time
   specified
3. the Puppet **server** is down and **all** nodes will fail to
   report in the same way (in which case a lot more warnings will
   show up, and other warnings about the server will come in)
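To tell cases 1 and 2 apart, it can help to check how stale the node's
last report actually is. A minimal sketch, assuming GNU `date` (as on
Debian) and reusing the timestamp from the example warning above:

```shell
# Compute how old the node's last PuppetDB report is. The timestamp
# below is the one from the example warning; in practice, paste the
# one from the actual alert.
last_report="2020-05-11T04:38:54Z"
last_epoch=$(date -u -d "$last_report" +%s)  # GNU date syntax
now_epoch=$(date -u +%s)
age_hours=$(( (now_epoch - last_epoch) / 3600 ))
echo "last report was ${age_hours}h ago"
```

A report that is only a few hours old may be a transient failure; one
that is days old points at a node that is down or being retired.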
|
|
|
|
|
|
|
|
The first situation will usually happen after someone pushed a commit
introducing the error. We try to keep all manifests compiling all the
time and such errors should be immediately fixed. Look at the history
of the Puppet source tree and try to identify the faulty
commit. Reverting such a commit is acceptable to restore the service.
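The revert itself is plain git. A sketch of the procedure, demonstrated
here in a throwaway repository with a made-up manifest (in the real
Puppet source tree checkout, identify the commit with `git log
--oneline`, revert it by hash, then push):

```shell
# Demonstrate reverting a faulty commit in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "TPA"
git config user.email "tpa@example.org"  # placeholder identity
echo 'package { "vim": ensure => installed }' > site.pp
git add site.pp
git commit -q -m "good manifest"
echo 'package { "vim" ensure => installed }' > site.pp  # faulty change
git commit -q -am "bad manifest"
git revert --no-edit HEAD >/dev/null  # undo the faulty commit
# (in the real repository, follow with "git push")
git log --oneline | sed -n 1p
```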
|
|
|
|
|
|
|
|
The second situation can happen if a node is in maintenance for an
extended duration. Normally, the node will recover when it goes back
online. If a node is to be permanently retired, it should be removed
from Puppet, using the [host retirement procedures][retire-a-host].
|
|
|
|
|
|
|
|
Finally, if the main Puppet **server** is down, it should definitely
be brought back up. See disaster recovery, below.
|
|
|
|
|
|
|
|
In any case, running the Puppet agent on the affected node should give
more information:

    ssh NODE puppet agent -t
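`puppet agent -t` enables `--detailed-exitcodes`, so the exit status
itself is informative. A sketch of how to interpret it (the `rc=2`
value is a stand-in for the real `$?`):

```shell
# Exit codes with --detailed-exitcodes, per the puppet agent manual:
#   0: run succeeded, no changes; 1: run failed;
#   2: run succeeded, changes applied;
#   4: run succeeded, but some resources failed;
#   6: changes applied and some resources failed.
rc=2  # stand-in for $? after: ssh NODE puppet agent -t
case "$rc" in
  0) echo "success, no changes" ;;
  2) echo "success, changes applied" ;;
  4|6) echo "catalog applied, but some resources failed" ;;
  *) echo "run failed before a catalog could be applied" ;;
esac
```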
|
|
|
|
|
|
## Disaster recovery
|
|
|
|
|