Nagios/Icinga service for Tor Project infrastructure

[[_TOC_]]

# How-to

## Getting status updates

- Using a web browser: https://nagios.torproject.org/cgi-bin/icinga/status.cgi?allunhandledproblems&sortobject=services&sorttype=1&sortoption=2
- On IRC: /j #tor-nagios
- Over email: Add your email address to `tor-nagios/config/static/objects/contacts.cfg`

## How to run a nagios check manually on a host (TARGET.tpo)

    # Run from a checkout of tor-nagios. The file argument is an assumption:
    # the `nrpe:` entries are expected to live in the YAML configuration.
    NCHECKFILE=$(egrep -A 4 THE-SERVICE-TEXT-FROM-WEB config/nagios-master.cfg | egrep '^ *nrpe:' | cut -d : -f 2 | tr -d ' |"')
    NCMD=$(ssh -t TARGET.tpo grep "$NCHECKFILE" /etc/nagios -r)
    # NCMD is the command that's being run. If it looks sane, run it,
    # with --verbose if you like more output.
    ssh -t TARGET.tpo "$NCMD" --verbose

## Changing the Nagios configuration

Hosts and services are managed in the `config/nagios-master.cfg` YAML configuration file, kept in the `admin/tor-nagios.git` repository. Make changes with a normal text editor, commit and push:

    $EDITOR config/nagios-master.cfg
    git commit -a
    git push

Carefully watch the output of the `git push` command! If there is an error, your changes won't show up (and the commit is still accepted).

## Forcing a rebuild of the configuration

If the Nagios configuration seems out of sync with the YAML config, a rebuild of the configuration can be forced with this command on the Nagios server:

    touch /home/nagiosadm/tor-nagios/config/nagios-master.cfg && sudo -u nagiosadm make -C /home/nagiosadm/tor-nagios/config

Alternatively, changing the `.cfg` file and pushing a new commit should trigger this as well.

## Batch jobs

You can run batch commands from the web interface, thanks to Icinga's changes to the UI. There is also a command-line client called [icli](https://tracker.debian.org/pkg/icli), which can do the same from the command line on the Icinga server.
This, for example, will queue recheck jobs on all problem hosts:

    icli -z '!o,!A,!S,!D' -a recheck

This will run the `dsa-update-apt-status` command on all problem hosts:

    cumin "$(ssh hetzner-hel1-01.torproject.org "icli -z'"'!o,!A,!S,!D'"'" | grep ^[a-z] | sed 's/$/.torproject.org or/') false" dsa-update-apt-status

It's kind of an awful hack -- take some time to appreciate the quoting required for those `!` -- which might not be necessary with later Icinga releases. Icinga 2 has a [REST API](https://icinga.com/docs/icinga-2/latest/doc/12-icinga2-api/) and its own [command line console](https://icinga.com/docs/icinga-2/latest/doc/11-cli-commands/#cli-command-console) which makes `icli` completely obsolete.

## Adding a new admin user

When a user needs to be added to the admin group, follow the steps below in the `tor-nagios.git` repository

1. Create a new contact for the user in `config/static/objects/contacts.cfg`:

   ```
   define contact{
          contact_name                    <username>
          alias                           <username>
          service_notification_period     24x7
          host_notification_period        24x7
          service_notification_options    w,u,c,r
          host_notification_options       d,r
          service_notification_commands   notify-service-by-email
          host_notification_commands      notify-host-by-email
          email                           <email>+nagios@torproject.org
          }
   ```

2. Add the user to `authorized_for_full_command_resolution` and `authorized_for_configuration_information` in `config/static/cgi.cfg`:

   ```
   authorized_for_full_command_resolution=user1,foo,bar,<new user>
   authorized_for_configuration_information=user1,foo,bar,<new user>
   ```

# Reference

## Design

### Config generation

The Nagios/Icinga configuration gets generated from the `config/nagios-master.cfg` YAML configuration file stored in the `admin/tor-nagios.git` repository. The generation works like this:

1. the [git server](git) has a post-receive hook (in `/srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%tor-nagios/trigger-nagios-build`)
2. ...
   which launches a "trigger" on the Nagios server, like so:

       ssh -i ~/.ssh/gitweb -l nagiosadm hetzner-hel1-01 -- -trigger-

3. that SSH key, deployed from Puppet (so in `/etc/ssh/puppetkeys/nagiosadm`), calls `/home/nagiosadm/bin/from-git-rw`, which then...
4. creates or updates (`git clone` or `git pull`) the git repository in `~/tor-nagios/config`...
5. then calls `make` in the directory, which calls `./build-nagios` to generate the files in `~/tor-nagios/config/generated/`
6. then calls `make install` in the `config` directory, which deploys the config file (using `rsync`) to `/etc/icinga/from-git` and also pushes the NRPE config to the [Puppet server](puppet) in `nagiospush@pauli.torproject.org:/etc/puppet/modules/nagios/files/tor-nagios/generated/nrpe_tor.cfg`
7. then finally reloads Icinga
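
The sequence that runs on the Nagios server after the trigger can be sketched as a dry-run shell function. This is not the actual `from-git-rw` script, whose contents are not shown on this page: the function name `plan_rebuild`, the `REPO_URL` placeholder, and the `service icinga reload` command are all assumptions made for illustration. The function only prints the commands each stage would run, so it is safe to try anywhere.

```shell
#!/bin/sh
# Hypothetical dry-run sketch of the rebuild sequence; NOT the real from-git-rw.
# It prints the commands instead of executing them.

# Placeholder: the real clone URL is not stated in this page.
REPO_URL="${REPO_URL:-REPLACE-WITH-REPO-URL}"

plan_rebuild() {
    config_dir="$1"   # for example "$HOME/tor-nagios/config"

    # Create or update the checkout: clone on first run, pull afterwards.
    if [ -d "$config_dir/.git" ]; then
        echo "git -C $config_dir pull"
    else
        echo "git clone $REPO_URL $config_dir"
    fi

    # Build the generated files (the Makefile calls ./build-nagios).
    echo "make -C $config_dir"

    # Deploy the generated config and push the NRPE config.
    echo "make -C $config_dir install"

    # Reload the daemon. The exact reload command is an assumption.
    echo "service icinga reload"
}
```

Calling `plan_rebuild "$HOME/tor-nagios/config"` prints four lines, one per stage. Comparing that plan with what the server actually did can help narrow down which stage failed when the deployed configuration looks out of sync.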