
Nagios/Icinga service for Tor Project infrastructure

How-to

Getting status updates

How to run a nagios check manually on a host (TARGET.tpo)

# Run this from a checkout of admin/tor-nagios.git (the config file argument is assumed from the section below).
NCHECKFILE=$(grep -E -A 4 THE-SERVICE-TEXT-FROM-WEB config/nagios-master.cfg | grep -E '^ *nrpe:' | cut -d : -f 2 | tr -d ' |"')
NCMD=$(ssh -t TARGET.tpo grep "$NCHECKFILE" /etc/nagios -r)
# NCMD is the command that's being run. If it looks sane, run it. With --verbose if you like more output.
ssh -t TARGET.tpo "$NCMD" --verbose
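
Because `grep -r` prefixes each match with the file name, and NRPE defines commands as `command[<name>]=<command line>`, the value captured in `$NCMD` above typically needs trimming before it can be run. A minimal sketch of that trimming, assuming the standard NRPE syntax; the helper name `nrpe_command_line` and the sample entry are made up for illustration:

```shell
# Hypothetical helper: reduce `grep -r` output lines of the form
#   /path/to/file.cfg:command[check_name]=/some/plugin args
# to just the command line after "]=". Carriage returns (added by `ssh -t`)
# are dropped first.
nrpe_command_line() {
    tr -d '\r' | sed -n 's/^[^:]*:command\[[^]]*]=//p'
}

# Example with a made-up NRPE entry:
printf '%s\n' '/etc/nagios/nrpe.d/example.cfg:command[check_example]=/usr/lib/nagios/plugins/check_load -w 5 -c 10' \
    | nrpe_command_line
```

With this, something like `NCMD=$(ssh -t TARGET.tpo grep "$NCHECKFILE" /etc/nagios -r | nrpe_command_line)` would leave only the plugin invocation in `$NCMD`. Still read it before running it.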

Changing the Nagios configuration

Hosts and services are managed in the config/nagios-master.cfg YAML configuration file, kept in the admin/tor-nagios.git repository. Make changes with a normal text editor, commit and push:

$EDITOR config/nagios-master.cfg
git commit -a
git push

Carefully watch the output of the git push command! If the configuration build fails, the error only shows up there: the push is still accepted, but your changes won't show up in Nagios.
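
Since a failed build does not make the push fail, one way to avoid missing an error is to save the push output and scan it. This is only a sketch: the helper name `push_output_has_errors` is hypothetical, and the pattern is a guess, because the exact messages printed by the build are not documented here.

```shell
# Hypothetical helper: succeed (exit 0) if the saved `git push` output in
# file $1 contains a server-side ("remote:") line that looks like a failure.
# The keyword list is an assumption; adjust it to the real build output.
push_output_has_errors() {
    grep -E -i '^remote:.*(error|fail|fatal)' "$1" >/dev/null
}

# Usage sketch (not executed here):
#   git push 2>&1 | tee /tmp/push.log
#   if push_output_has_errors /tmp/push.log; then
#       echo "build reported a problem, read /tmp/push.log"
#   fi
```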

Forcing a rebuild of the configuration

If the Nagios configuration seems out of sync with the YAML config, a rebuild of the configuration can be forced with this command on the Nagios server:

touch /home/nagiosadm/tor-nagios/config/nagios-master.cfg && sudo -u nagiosadm make -C /home/nagiosadm/tor-nagios/config

Alternatively, changing the .cfg file and pushing a new commit should trigger this as well.

Batch jobs

You can run batch commands from the web interface, thanks to Icinga's changes to the UI. There is also a command-line client called icli, which can do the same from a shell on the Icinga server.

This, for example, will queue recheck jobs on all problem hosts:

icli -z '!o,!A,!S,!D' -a recheck

This will run the dsa-update-apt-status command on all problem hosts:

cumin "$(ssh hetzner-hel1-01.torproject.org "icli -z'"'!o,!A,!S,!D'"'" | grep '^[a-z]' | sed 's/$/.torproject.org or/') false" dsa-update-apt-status

It's kind of an awful hack -- take some time to appreciate the quoting required for those ! -- which might not be necessary with later Icinga releases. Icinga 2 has a REST API and its own command line console which makes icli completely obsolete.
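
To make the cumin one-liner above easier to follow, here is a sketch of just the query-building part, separated from the ssh and icli calls. It assumes, as the one-liner does, that host names are the lines starting with a lowercase letter, and that the trailing `false` is there to close the last dangling `or`. The function name `hosts_to_query` is made up for this example.

```shell
# Hypothetical helper: turn short host names (one per line on stdin) into a
# query like "a.torproject.org or b.torproject.org or false".
hosts_to_query() {
    # Keep only lines that look like host names, append the domain and "or",
    # add the closing "false", then join everything on one line.
    { grep '^[a-z]' | sed 's/$/.torproject.org or/'; echo false; } \
        | tr '\n' ' ' | sed 's/ $//'
}

# Example with made-up host names:
printf 'alpha\nbeta\n' | hosts_to_query
```

Printing the query first, before handing it to cumin, is a cheap way to check which hosts would be targeted.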

Reference

Design

Config generation

The Nagios/Icinga configuration gets generated from the config/nagios-master.cfg YAML configuration file stored in the admin/tor-nagios.git repository. The generation works like this:

  1. the git server has a post-receive hook (in /srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%tor-nagios/trigger-nagios-build)

  2. ... which launches a "trigger" on the Nagios server, like so:

    ssh -i ~/.ssh/gitweb -l nagiosadm hetzner-hel1-01 -- -trigger-

  3. that SSH key, deployed from Puppet (so in /etc/ssh/puppetkeys/nagiosadm), calls the /home/nagiosadm/bin/from-git-rw which then...

  4. creates or updates (git clone or git pull) the git repository in ~/tor-nagios/config...

  5. then calls make in the directory, which calls ./build-nagios to generate the files in ~/tor-nagios/config/generated/

  6. then calls make install in the config directory, which deploys the config file (using rsync) in /etc/icinga/from-git and also pushes the NRPE config to the Puppet server in nagiospush@pauli.torproject.org:/etc/puppet/modules/nagios/files/tor-nagios/generated/nrpe_tor.cfg

  7. then finally reloads Icinga
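
Steps 4 to 7 above can be sketched as a dry-run script. This is not the real /home/nagiosadm/bin/from-git-rw, which is not reproduced in this page: the function name, the repository location, the `REPO_URL` placeholder and the reload command are all assumptions. `RUN` defaults to `echo`, so the sketch only prints what it would do.

```shell
# Sketch only: every command below is inferred from the numbered steps,
# not copied from the real from-git-rw script.
RUN=${RUN:-echo}                          # "echo" = dry run, print commands only
REPO_DIR=${REPO_DIR:-$HOME/tor-nagios}    # assumed checkout location
REPO_URL=${REPO_URL:-REPOSITORY-URL}      # placeholder, not given in this page

rebuild_sketch() {
    if [ -d "$REPO_DIR/.git" ]; then
        $RUN git -C "$REPO_DIR" pull              # step 4: update existing clone
    else
        $RUN git clone "$REPO_URL" "$REPO_DIR"    # step 4: first-time clone
    fi &&
    $RUN make -C "$REPO_DIR/config" &&            # step 5: runs ./build-nagios
    $RUN make -C "$REPO_DIR/config" install &&    # step 6: rsync and NRPE push
    $RUN service icinga reload                    # step 7: reload command is a guess
}

rebuild_sketch
```

Chaining the steps with `&&` means a failed build stops the chain before anything is installed or reloaded, which matches the order described above.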