Nagios/Icinga service for Tor Project infrastructure
## Getting status updates
- Using a web browser: https://nagios.torproject.org/cgi-bin/icinga/status.cgi?allunhandledproblems&sortobject=services&sorttype=1&sortoption=2
- On IRC: /j #tor-nagios
- Over email: Add your email address to `tor-nagios/config/static/objects/contacts.cfg`
## How to run a nagios check manually on a host (TARGET.tpo)
    # Run from a checkout of the tor-nagios repository.
    NCHECKFILE=$(grep -E -A 4 THE-SERVICE-TEXT-FROM-WEB config/nagios-master.cfg | grep -E '^ *nrpe:' | cut -d : -f 2 | tr -d ' |"')
    NCMD=$(ssh -t TARGET.tpo grep "$NCHECKFILE" /etc/nagios -r)
    # NCMD is the command that is being run. If it looks sane, run it. Add --verbose if you want more output.
    ssh -t TARGET.tpo "$NCMD" --verbose
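
To make the extraction step less opaque, here is a small sketch that runs the same `grep`/`cut`/`tr` pipeline against a made-up YAML fragment. The service name, the `check_example` value and the layout of the fragment are assumptions for illustration only; real entries in `config/nagios-master.cfg` may look different.

```shell
# Hypothetical fragment standing in for config/nagios-master.cfg.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  -
    name: "example - service"
    hosts: TARGET
    nrpe: "check_example"
EOF

# Find the service entry, keep its nrpe line, take the value after the
# colon, then strip spaces, pipes and double quotes.
NCHECKFILE=$(grep -E -A 4 'example - service' "$sample" | grep -E '^ *nrpe:' | cut -d : -f 2 | tr -d ' |"')
echo "$NCHECKFILE"
rm -f "$sample"
```

Note that `cut -d : -f 2` keeps only the text between the first and second colon, so a value that itself contains a colon would be truncated.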
Hosts and services are managed in the `config/nagios-master.cfg` YAML
configuration file, kept in the `admin/tor-nagios.git`
repository. Make changes with a normal text editor, commit and push:

    $EDITOR config/nagios-master.cfg
    git commit -a
    git push

Carefully watch the output of the `git push` command! If it reports an
error, your changes will not be deployed, even though the push itself
is still accepted.
## Forcing a rebuild of the configuration
If the Nagios configuration seems out of sync with the YAML config, a
rebuild of the configuration can be forced with this command on the
Nagios server:

    touch /home/nagiosadm/tor-nagios/config/nagios-master.cfg && sudo -u nagiosadm make -C /home/nagiosadm/tor-nagios/config

Alternatively, changing the `.cfg` file and pushing a new commit
should trigger this as well.
## Batch jobs
You can run batch commands from the web interface, thanks to Icinga's
changes to the UI. There is also a command-line client called
[icli](https://tracker.debian.org/pkg/icli), which can do the same from a shell on the Icinga
server.
This, for example, will queue recheck jobs on all problem hosts:

    icli -z '!o,!A,!S,!D' -a recheck

This will run the `dsa-update-apt-status` command on all problem
hosts:

    cumin "$(ssh hetzner-hel1-01.torproject.org "icli -z'"'!o,!A,!S,!D'"'" | grep ^[a-z] | sed 's/$/.torproject.org or/') false" dsa-update-apt-status

It's kind of an awful hack -- take some time to appreciate the quoting
required for those `!` -- which might not be necessary with later
Icinga releases. Icinga 2 has a [REST API](https://icinga.com/docs/icinga-2/latest/doc/12-icinga2-api/) and its own [command
line console](https://icinga.com/docs/icinga-2/latest/doc/11-cli-commands/#cli-command-console), which make `icli` completely obsolete.
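
To see what the `cumin` command above actually passes as a host query, the following sketch replays the `grep` and `sed` stages on a made-up host list. The host names `alpha` and `beta`, and the indented detail line, are assumptions standing in for whatever `icli` prints on the Icinga server.

```shell
# Hypothetical icli output: host names at the start of a line,
# other details indented.
hosts='alpha
  (some indented detail)
beta'

# Keep lines starting with a lowercase letter, append
# ".torproject.org or" to each, then close the dangling "or" with "false".
query="$(printf '%s\n' "$hosts" | grep '^[a-z]' | sed 's/$/.torproject.org or/') false"
echo "$query"
```

The trailing `false` is there only so the last `or` has a right-hand side. The host terms stay on separate lines inside the query string, exactly as they do in the original command.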
## Adding a new admin user
When a user needs to be added to the admin group, follow the steps below in the `tor-nagios.git` repository:
1. Create a new contact for the user in `config/static/objects/contacts.cfg`:
```
define contact{
    contact_name                    <username>
    alias                           <username>
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           <email>+nagios@torproject.org
}
```
2. Add the user to `authorized_for_full_command_resolution` and `authorized_for_configuration_information` in `config/static/cgi.cfg`:
```
authorized_for_full_command_resolution=user1,foo,bar,<new user>
authorized_for_configuration_information=user1,foo,bar,<new user>
```
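
If you prefer not to edit the two directives by hand, the change in step 2 could be scripted along these lines. This is only a sketch: the user name `alice` is hypothetical, and it works on a scratch file with made-up contents rather than on the real `config/static/cgi.cfg`.

```shell
# Hypothetical user name and a scratch copy standing in for config/static/cgi.cfg.
NEW_USER=alice
CGI_CFG=$(mktemp)
cat > "$CGI_CFG" <<'EOF'
authorized_for_full_command_resolution=user1,foo,bar
authorized_for_configuration_information=user1,foo,bar
authorized_for_system_commands=user1
EOF

# Append ",<new user>" to the two directives from step 2 and leave
# every other line untouched.
sed -E "s/^(authorized_for_(full_command_resolution|configuration_information)=.*)\$/\1,${NEW_USER}/" "$CGI_CFG" > "$CGI_CFG.new"
mv "$CGI_CFG.new" "$CGI_CFG"
cat "$CGI_CFG"
```

Review the resulting diff before committing, since the script does not check whether the user is already listed.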
# Reference
## Design
### Config generation
The Nagios/Icinga configuration gets generated from the
`config/nagios-master.cfg` YAML configuration file stored in the
`admin/tor-nagios.git` repository. The generation works like this:
1. the [git server](git) has a post-receive hook (in
   `/srv/git.torproject.org/git-helpers/post-receive-per-repo.d/admin%tor-nagios/trigger-nagios-build`)
2. ... which launches a "trigger" on the Nagios server, like so:

       ssh -i ~/.ssh/gitweb -l nagiosadm hetzner-hel1-01 -- -trigger-

3. that SSH key, deployed from Puppet (so in
   `/etc/ssh/puppetkeys/nagiosadm`), calls the
   `/home/nagiosadm/bin/from-git-rw` script, which then...
4. creates or updates (`git clone` or `git pull`) the git repository
   in `~/tor-nagios/config`...
5. then calls `make` in the directory, which calls `./build-nagios`
   to generate the files in `~/tor-nagios/config/generated/`
6. then calls `make install` in the `config` directory, which deploys
   the config file (using `rsync`) in `/etc/icinga/from-git` and also
   pushes the NRPE config to the [Puppet server](puppet) in
   `nagiospush@pauli.torproject.org:/etc/puppet/modules/nagios/files/tor-nagios/generated/nrpe_tor.cfg`
7. then finally reloads Icinga
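
The server-side part of this chain, from the repository update to the reload, can be sketched as a short shell function. This is not the real `from-git-rw` script: `REPO_URL`, the `run` helper, the `DRY_RUN` switch and the `service icinga reload` command are all assumptions for illustration, and the checkout location simply follows the list above as written. By default it only prints what it would do.

```shell
# Hypothetical sketch of the rebuild sequence; not the real from-git-rw.
CONFIG_DIR="${CONFIG_DIR:-$HOME/tor-nagios/config}"
REPO_URL="${REPO_URL:-REPLACE-WITH-REPOSITORY-URL}"   # assumed placeholder
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rebuild() {
    # Create or update the checkout.
    if [ -d "$CONFIG_DIR/.git" ]; then
        run git -C "$CONFIG_DIR" pull
    else
        run git clone "$REPO_URL" "$CONFIG_DIR"
    fi
    # Generate the configuration, then deploy it.
    run make -C "$CONFIG_DIR"
    run make -C "$CONFIG_DIR" install
    # Assumed reload command; the real mechanism is not documented here.
    run service icinga reload
}

rebuild
```

Keeping every step behind `run` makes it easy to read the sequence in dry-run mode before letting it touch a live server.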