Check OnionPerf instances from Nagios
There are a few things that we can check; some are easier than others.
- Is the host up and the webserver running? (this is easy with built-in checks)
- Is the tgen server running on the Internet? (this is easy with built-in checks)
- Is the analyze task running? (needs a plugin)
- Is the tgen server running on an Onion service? (needs a plugin)
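
For the checks that need a plugin, here is a minimal sketch of what an "is the analyze task running?" check could look like. It is not an existing plugin. It assumes the analyze task writes its output files into one directory at least once a day, so a stale newest file means the task has stopped. The directory path, the 26h/50h thresholds, and all function names are hypothetical. The only fixed part is the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), with a one-line status message on stdout.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def status_for_age(age_seconds, warn_seconds, crit_seconds):
    """Map the age of the newest output file to a Nagios status code."""
    if age_seconds >= crit_seconds:
        return CRITICAL
    if age_seconds >= warn_seconds:
        return WARNING
    return OK


def check_newest_file(directory, warn_seconds, crit_seconds, now=None):
    """Return (status, message) based on the newest regular file in directory."""
    now = time.time() if now is None else now
    try:
        with os.scandir(directory) as entries:
            mtimes = [e.stat().st_mtime for e in entries if e.is_file()]
    except OSError as exc:
        # Unreadable or missing directory: we cannot tell, so report UNKNOWN.
        return UNKNOWN, f"cannot read {directory}: {exc}"
    if not mtimes:
        return CRITICAL, f"no files in {directory}"
    age = now - max(mtimes)
    status = status_for_age(age, warn_seconds, crit_seconds)
    return status, f"newest file is {age / 3600:.1f}h old"


def main(argv):
    """Print the Nagios status line and return the exit code."""
    # Hypothetical default path; the real output directory is an assumption.
    directory = argv[1] if len(argv) > 1 else "/path/to/analysis/output"
    # Assumed thresholds for a daily task: warn after 26h, critical after 50h.
    status, message = check_newest_file(directory, 26 * 3600, 50 * 3600)
    print(f"ANALYZE {LABELS[status]} - {message}")
    return status
```

To use this as a plugin, a wrapper script would finish with `sys.exit(main(sys.argv))` so that Nagios sees the exit code.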
For monitoring the Onion service, I'm looking at reusable plugins, so there are two tests: one checks how old the descriptor is, and the second actually tries connecting to the service. The first of these tests is affected (but not blocked) by legacy/trac#28269, and both are blocked by [redacted].
As a workaround for monitoring the Onion service, which really is the bit that is breaking, we can instead monitor the analysis of timeouts from Tor Metrics' CSV files.
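
A rough sketch of how that workaround could be evaluated follows; it is not a finished plugin. It assumes the CSV file has already been fetched as text and that it has `date`, `source`, `server`, `timeouts` and `requests` columns, with optional `#` comment lines at the top. Those column names, the thresholds, and the function names are assumptions and should be checked against the actual file format before use.

```python
import csv

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def timeout_fraction(csv_text, source, server="onion", since=None):
    """Return timeouts/requests for one source and server, or None if no data.

    `since` is an optional ISO date string (YYYY-MM-DD); older rows are skipped.
    Column names are assumed, not confirmed against the real file.
    """
    # Drop blank lines and '#' comment lines before parsing the header row.
    lines = [l for l in csv_text.splitlines() if l and not l.startswith("#")]
    timeouts = requests = 0
    for row in csv.DictReader(lines):
        if row["source"] != source or row["server"] != server:
            continue
        if since is not None and row["date"] < since:
            continue  # ISO dates compare correctly as plain strings
        if not row["timeouts"] or not row["requests"]:
            continue  # skip rows with missing values
        timeouts += int(row["timeouts"])
        requests += int(row["requests"])
    if requests == 0:
        return None
    return timeouts / requests


def check_timeouts(csv_text, source, server="onion", since=None,
                   warn=0.05, crit=0.2):
    """Return (status, message); the warn/crit fractions are placeholders."""
    fraction = timeout_fraction(csv_text, source, server, since)
    if fraction is None:
        return UNKNOWN, f"no requests found for {source}/{server}"
    message = f"{fraction:.1%} of requests to {source}/{server} timed out"
    if fraction >= crit:
        return CRITICAL, message
    if fraction >= warn:
        return WARNING, message
    return OK, message
```

One limitation of this approach is that the published statistics lag behind the measurements, so a check like this would flag a broken Onion service later than a direct connection test would.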