Nagios/Icinga service for Tor Project infrastructure
How-to
Getting status updates
- Using a web browser: https://nagios.torproject.org/cgi-bin/icinga/status.cgi?allunhandledproblems&sortobject=services&sorttype=1&sortoption=2
- On IRC: /j #tor-nagios
- Over email: Add your email address to tor-nagios/config/static/objects/contacts.cfg
How to run a Nagios check manually on a host (TARGET.tpo)
NCHECKFILE=$(grep -E -A 4 THE-SERVICE-TEXT-FROM-WEB config/nagios-master.cfg | grep -E '^ *nrpe:' | cut -d : -f 2 | tr -d ' |"')
NCMD=$(ssh -t TARGET.tpo grep "$NCHECKFILE" /etc/nagios -r)
# NCMD is the command that is being run. If it looks sane, run it, adding --verbose if you want more output.
ssh -t TARGET.tpo "$NCMD" --verbose
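A recursive grep prints matches as file:line, and NRPE defines commands as command[NAME]=COMMAND LINE, so the captured value may need trimming before it can be run. The following is a minimal sketch of that trimming step; the function name nrpe_command_from_grep and the sample path in the usage note are illustrative assumptions, not part of the existing tooling.

```shell
# Hypothetical helper: reduce "grep -r" output from the NRPE config
# to the bare command line of the first matching definition.
nrpe_command_from_grep() {
    # ssh -t allocates a terminal, which adds carriage returns; drop them.
    tr -d '\r' |
        # Strip the "FILE:command[NAME]=" prefix and print only lines that had one.
        sed -n 's/^[^=]*command\[[^]]*]=//p' |
        # Keep the first definition if several lines matched.
        head -n 1
}
```

For example, a line such as /etc/nagios/nrpe.d/nrpe_tor.cfg:command[tor_check_disk]=/usr/lib/nagios/plugins/check_disk -w 10% (a made-up sample) would be reduced to /usr/lib/nagios/plugins/check_disk -w 10%.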
Changing the Nagios configuration
Hosts and services are managed in the config/nagios-master.cfg YAML configuration file, kept in the admin/tor-nagios.git repository. Make changes with a normal text editor, commit and push:
$EDITOR config/nagios-master.cfg
git commit -a
git push
Carefully watch the output of the git push command! If there is an error, your changes won't show up (and the commit is still accepted).
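Because the push itself succeeds even when the server-side build fails, the exit status of git push is not enough to tell the two cases apart. Git relays hook output to the client prefixed with "remote: ", so one option is to scan for it. This is only a sketch: the function name is made up, and it assumes the build prints the word "error" on failure, which is not confirmed here.

```shell
# Hypothetical helper: succeed (exit 0) if the push output on stdin
# contains a line relayed from the server that mentions an error.
push_has_remote_error() {
    # Assumption: a failed build prints "error" (any case) in its output.
    grep -qi '^remote:.*error'
}

# Possible usage, from a checkout of the repository:
#   out=$(git push 2>&1); printf '%s\n' "$out"
#   if printf '%s\n' "$out" | push_has_remote_error; then
#       echo "the remote build reported an error" >&2
#   fi
```

Reading the output by eye remains the reliable check; a filter like this only catches messages that match the assumed wording.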
Forcing a rebuild of the configuration
If the Nagios configuration seems out of sync with the YAML config, a rebuild of the configuration can be forced with this command on the Nagios server:
touch /home/nagiosadm/tor-nagios/config/nagios-master.cfg && sudo -u nagiosadm make -C /home/nagiosadm/tor-nagios/config
Alternatively, changing the .cfg file and pushing a new commit should trigger this as well.
Batch jobs
You can run batch commands from the web interface, thanks to Icinga's changes to the UI. There is also a command-line client called icli, which can do the same from a shell on the Icinga server.
This, for example, will queue recheck jobs on all problem hosts:
icli -z '!o,!A,!S,!D' -a recheck
This will run the dsa-update-apt-status command on all problem hosts:
cumin "$(ssh hetzner-hel1-01.torproject.org "icli -z'"'!o,!A,!S,!D'"'" | grep ^[a-z] | sed 's/$/.torproject.org or/') false" dsa-update-apt-status
It's kind of an awful hack (take some time to appreciate the quoting required for those ! characters), and one that might not be necessary with later Icinga releases. Icinga 2 has a REST API and its own command line console, which make icli completely obsolete.
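The host-selection half of the cumin command above is easier to follow in isolation: it keeps the lines that start with a lowercase letter (the host names), appends the domain and an "or" to each, and ends the chain with "false" so the trailing "or" stays valid. A minimal sketch of that transformation, with a made-up function name and joining everything on one line for readability:

```shell
# Hypothetical helper: turn short host names on stdin (one per line)
# into a query of the form "a.torproject.org or b.torproject.org or false".
build_cumin_query() {
    # Keep only lines starting with a lowercase letter and append the suffix.
    sed -n '/^[a-z]/s/$/.torproject.org or/p' |
        # Join the lines with spaces.
        tr '\n' ' '
    # Terminate the "or" chain; with no hosts the query is just "false".
    printf 'false\n'
}
```

With this, the icli output could be piped through build_cumin_query on the local side, which leaves only the ! characters needing special quoting in the remote command.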
Reference
Design
Config generation
The Nagios/Icinga configuration gets generated from the config/nagios-master.cfg YAML configuration file stored in the admin/tor-nagios.git repository. The generation works like this:
- the git server has a post-receive hook (in /srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%tor-nagios/trigger-nagios-build)
- ... which launches a "trigger" on the Nagios server, like so:
  ssh -i ~/.ssh/gitweb -l nagiosadm hetzner-hel1-01 -- -trigger-
- that SSH key, deployed from Puppet (so in /etc/ssh/puppetkeys/nagiosadm), calls /home/nagiosadm/bin/from-git-rw, which then...
- creates or updates (git clone or git pull) the git repository in ~/tor-nagios/config ...
- then calls make in the directory, which calls ./build-nagios to generate the files in ~/tor-nagios/config/generated/
- then calls make install in the config directory, which deploys the config file (using rsync) in /etc/icinga/from-git and also pushes the NRPE config to the Puppet server in nagiospush@pauli.torproject.org:/etc/puppet/modules/nagios/files/tor-nagios/generated/nrpe_tor.cfg
- then finally reloads Icinga
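The server-side steps above (clone or pull, build, install) can be sketched as a small shell function. This is an illustration only, not the contents of the real from-git-rw: the function name, the RUN dry-run variable and the repository URL in the usage note are assumptions, and the final Icinga reload is left out because the exact command is not given here.

```shell
# Hypothetical sketch of the trigger steps; not the real from-git-rw.
# RUN defaults to "echo", so commands are printed instead of executed.
# Set RUN= (empty) to actually run them.
RUN="${RUN-echo}"

sync_and_build() {
    repo_url=$1   # where to clone from
    checkout=$2   # local working copy, for example ~/tor-nagios

    # Create the working copy on first use, otherwise update it.
    if [ -d "$checkout/.git" ]; then
        $RUN git -C "$checkout" pull --ff-only || return 1
    else
        $RUN git clone "$repo_url" "$checkout" || return 1
    fi

    # Generate the configuration, then deploy it.
    $RUN make -C "$checkout/config" || return 1
    $RUN make -C "$checkout/config" install || return 1
}
```

Called as sync_and_build https://git.example.org/admin/tor-nagios.git "$HOME/tor-nagios" (placeholder URL), the default dry-run mode prints the git and make commands it would run, which is a safe way to check the sequence before running anything for real.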