provide relay health prometheus metrics via MetricsPort
tor recently got support for MetricsPort in v0.4.5.1-alpha (#40063).
For more context on this feature request, see: https://lists.torproject.org/pipermail/tor-dev/2019-February/013655.html
I'm proposing to add the following Prometheus metrics (including labels). All metrics are absolute counters since tor started. (Feel free to add constraints for safety reasons, such as reducing the granularity of counters or only updating counters once every x minutes.)
On exit relays (DNS-related metrics)
- tor_relay_exit_dns_errors{reason="timeout"}
- tor_relay_exit_dns_errors{reason="SERVFAIL"}
- tor_relay_exit_dns_errors{reason="REFUSED"}
DNS RCODEs: https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-6
I'm not sure whether this is even visible to tor (unless ServerDNSResolvConfFile is used), but if possible, this kind of data would ideally be available for each resolver IP so that the relay operator can detect and disable a faulty resolver:
- tor_relay_exit_dns_errors{reason="timeout", resolver="1.1.1.1"}
- tor_relay_exit_dns_errors{reason="timeout", resolver="8.8.8.8"}
- ...
- tor_relay_exit_maxdnsqueriespercircuit maximum number of DNS queries caused by a single circuit since tor started
- exit stats (if enabled in torrc) as defined in: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1197
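To illustrate how an operator might use per-resolver error counters, here is a minimal Python sketch that parses Prometheus text output and picks the resolver with the most errors. The metric and label names are the ones proposed above and are not something tor emits today. The sample values and the helper names `errors_per_resolver` and `worst_resolver` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical scrape output. The metric and label names follow the proposal
# above; the numbers are invented for this example.
SAMPLE = '''\
tor_relay_exit_dns_errors{reason="timeout", resolver="1.1.1.1"} 12
tor_relay_exit_dns_errors{reason="timeout", resolver="8.8.8.8"} 340
tor_relay_exit_dns_errors{reason="SERVFAIL", resolver="8.8.8.8"} 5
'''

# Matches lines of the form: name{label="value", ...} number
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def errors_per_resolver(text, metric="tor_relay_exit_dns_errors"):
    """Sum the error counters of `metric` per `resolver` label, across all reasons."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("name") != metric:
            continue  # skip comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "resolver" in labels:
            totals[labels["resolver"]] += float(match.group("value"))
    return dict(totals)


def worst_resolver(text):
    """Return the resolver with the highest error total, or None if there is no data."""
    totals = errors_per_resolver(text)
    return max(totals, key=totals.get) if totals else None


if __name__ == "__main__":
    print(errors_per_resolver(SAMPLE))
    print(worst_resolver(SAMPLE))
```

In practice the raw error count would probably need to be compared against the number of queries sent to each resolver, which the proposal above does not (yet) include.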
Other relay metrics
- tor_memory_bytes total amount of memory used by the tor process in bytes
- tor_relay_dos_circuitskilledwithtoomanycells
- tor_relay_dos_circuitsrejected
- tor_relay_dos_markedaddress
- tor_relay_dos_connectionsclosed
- tor_relay_dos_singlehopclientsrefused
- tor_relay_dos_introduce2rejected
- tor_relay_opencircuits currently open tor circuits
- tor_relay_connections{v="v1",direction="initiated"}
- tor_relay_connections{v="v1",direction="received"}
- tor_relay_connections{v="v2",direction="initiated"}
- tor_relay_connections{v="v2",direction="received"}
- tor_relay_connections{v="v3"...
- tor_relay_traffic{direction="sent"} total traffic sent in bytes
- tor_relay_traffic{direction="received"} total traffic received in bytes
- tor_relay_circuit_handshakes{proto="TAP"}
- tor_relay_circuit_handshakes{proto="NTor"}
- tor_relay_uptime tor process uptime in seconds
- tor_relay_version tor version in use
- tor_relay_version_recommended boolean indicating whether the version in use is recommended
- ...
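Since the counters above are meant to be absolute values since tor started, a consumer would usually turn two scrapes into a rate. The following Python sketch shows one way to do that, including the case where the counter drops because tor restarted. The function name `counter_rate` and the example numbers are assumptions for illustration. It only makes sense for counter-style metrics such as tor_relay_traffic, not for point-in-time values such as tor_relay_opencircuits.

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Per-second rate between two scrapes of a counter that only grows.

    If the current value is lower than the previous one, we assume tor
    restarted and the counter began again at zero, so the whole current
    value is treated as the increase.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("scrapes must be in chronological order")
    if cur_value < prev_value:
        increase = cur_value  # counter reset, e.g. after a restart
    else:
        increase = cur_value - prev_value
    return increase / elapsed


if __name__ == "__main__":
    # Hypothetical tor_relay_traffic{direction="sent"} samples taken 60 s apart.
    print(counter_rate(1_000_000, 0, 7_000_000, 60))  # bytes per second
```

This is also why restarts do not make absolute counters useless: the reset is detectable from the values themselves.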
Flags
- tor_relay_flag_stable
- tor_relay_flag_guard
- tor_relay_flag_exit
- tor_relay_flag_...
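One possible encoding for the flag metrics is a 0/1 value per flag. The proposal above does not specify the values, so this is an assumption, as is the helper name `render_flag_metrics`. A short Python sketch of what the resulting output lines could look like:

```python
# Only the flags explicitly named above; the list is intentionally incomplete.
KNOWN_FLAGS = ("stable", "guard", "exit")


def render_flag_metrics(current_flags, known_flags=KNOWN_FLAGS):
    """Render one `tor_relay_flag_<name> <0|1>` line per known flag.

    Assumption: a flag the relay currently holds is exported as 1 and a
    missing flag as 0, so a lost flag shows up as a change in value instead
    of a metric that disappears.
    """
    current = {flag.lower() for flag in current_flags}
    return "\n".join(
        f"tor_relay_flag_{flag} {1 if flag in current else 0}"
        for flag in known_flags
    )


if __name__ == "__main__":
    print(render_flag_metrics(["Guard", "Stable"]))
```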
Some more:
- number of closed/failed circuits broken down by their reason value https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1402
- number of closed/failed OR connections broken down by their reason value https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n2202
- cell stats (if enabled in torrc) as defined in: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1137