onionprobe exporter uses too much disk space
In #41070 (closed), the Prometheus server was threatening to fill up its 160GB disk. Part of the problem was the high cardinality of the systemd and logind components of the node exporter, which have since been disabled, but we're still seeing high cardinality in another exporter, onionprobe.
It's unclear if we're still going to run out of disk space or when, so this is not an emergency yet, but it would be nice if the onionprobe folks (@rhatto) could look at this issue and see if we could reduce the cardinality in labels.
Here's part of the output of https://prometheus.torproject.org/classic/status (u: tor-guest, no password), copied here for convenience. You can see the `updated_at` label has a lot of instances. I suspect that's part of onionprobe, but I haven't checked, so it would be worth double-checking. You can definitely see it's rivaling the node exporter in terms of usage: `job=onionprobe` is using roughly half as many pairs as `job=node` (51473 vs 98023), and the latter runs on every TPA server (~100 machines), so that's a lot!
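One way to double-check which job owns the `updated_at` label is to count the series carrying it, grouped by `job`. The following is only a sketch: it assumes the standard Prometheus HTTP API (`/api/v1/query`) is exposed on the same host and accepts the same guest credentials, which has not been verified here. The helper names (`build_query_url`, `series_per_job`, `fetch`) are made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: the standard Prometheus HTTP API is reachable under this host.
BASE_URL = "https://prometheus.torproject.org"

# PromQL: count the series that carry a non-empty updated_at label, per job.
QUERY = 'count by (job) ({updated_at!=""})'


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for the given PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def series_per_job(response: dict) -> dict:
    """Map each job to its series count, given a decoded instant-query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    counts = {}
    for sample in response["data"]["result"]:
        job = sample["metric"].get("job", "<no job>")
        # Each sample value is a [timestamp, "value-as-string"] pair.
        counts[job] = int(float(sample["value"][1]))
    return counts


def fetch(url: str, user: str = "tor-guest", password: str = "") -> dict:
    """Fetch a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.load(reply)
```

Calling `series_per_job(fetch(build_query_url(BASE_URL, QUERY)))` should return a dictionary keyed by job name. If `onionprobe` is the only key, or dominates the counts, that would confirm the suspicion above.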
Highest Cardinality Labels
Name | Count |
---|---|
updated_at | 50736 |
hsdir | 1703 |
name | 1594 |
relname | 1018 |
name | 659 |
address | 411 |
device | 326 |
instance | 165 |
grpc_method | 151 |
endpoint | 149 |
Highest Cardinality Metric Names
Name | Count |
---|---|
onion_service_descriptor_fetch_attempts | 6826 |
onion_service_descriptor_reachable | 6798 |
node_cpu_seconds_total | 6424 |
onion_service_descriptor_latency | 6350 |
onion_service_introduction_points_number | 6350 |
onion_service_connection_attempts | 5984 |
onion_service_reachable | 5984 |
onion_service_status_code | 5864 |
onion_service_latency | 5864 |
node_scrape_collector_success | 3744 |
Label Names With Highest Cumulative Label Value Length
Name | Length |
---|---|
updated_at | 856780 |
hsdir | 88423 |
name | 52941 |
relname | 26462 |
name | 19333 |
filename | 11764 |
address | 9867 |
instance | 4997 |
endpoint | 4923 |
device | 3186 |
Most Common Label Pairs
Name | Count |
---|---|
job=node | 98023 |
alias=hetzner-nbg1-01.torproject.org | 52030 |
job=onionprobe | 51473 |
instance=hetzner-nbg1-01.torproject.org:9935 | 51473 |
classes=role::undefined | 27916 |
classes=role::ganeti::chi | 24791 |
protocol=http | 24176 |
port=80 | 24176 |
classes=role::ganeti::fsn | 22911 |
reachable=1 | 12214 |
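
To put the numbers above in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it reads the "Count" column of the label pairs table as a number of series, takes the "~100 machines" estimate at face value for the node exporter, and treats onionprobe as a single target because `job=onionprobe` and its `instance` pair share the same count. The variable and function names are made up for illustration.

```python
# Figures copied from the tables above.
LABEL_VALUE_COUNT = {"updated_at": 50736, "hsdir": 1703}
LABEL_VALUE_LENGTH = {"updated_at": 856780, "hsdir": 88423}
PAIR_COUNT = {"job=node": 98023, "job=onionprobe": 51473}

# Assumption: about 100 node exporter targets, per the "~100 machines" estimate.
NODE_TARGETS = 100
# Assumption: one onionprobe target, since job=onionprobe and its instance
# pair both show a count of 51473.
ONIONPROBE_TARGETS = 1


def per_target(pair: str, targets: int) -> float:
    """Average number of label pair occurrences per scrape target."""
    return PAIR_COUNT[pair] / targets


def average_value_length(label: str) -> float:
    """Average length, in characters, of one value of the given label."""
    return LABEL_VALUE_LENGTH[label] / LABEL_VALUE_COUNT[label]


node_per_target = per_target("job=node", NODE_TARGETS)
onionprobe_per_target = per_target("job=onionprobe", ONIONPROBE_TARGETS)
ratio = onionprobe_per_target / node_per_target

print(f"node exporter: about {node_per_target:.0f} per target")
print(f"onionprobe: about {onionprobe_per_target:.0f} per target")
print(f"onionprobe / node, per target: about {ratio:.1f}x")
print(f"average updated_at value length: {average_value_length('updated_at'):.1f} characters")
```

Under those assumptions, the single onionprobe target weighs far more than an average node exporter target, which is why the `updated_at` label looks like the first thing to examine.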
/cc @rhatto