Evaluate Loki for parsing metrics services logs

We currently parse the metrics services logs and spot specific errors manually.

We also have a few scripts under https://gitlab.torproject.org/tpo/network-health/metrics/monitoring-and-alerting/-/tree/main/services that count the warn and error messages in the services logs and send those counts to Prometheus.
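For context, here is a minimal sketch of what this kind of counting script can look like. It is not the code from the repository above: the log format, the metric name `service_log_messages_total`, and the `service`/`level` labels are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: each log line carries a bare WARN or ERROR level token.
# The real services may use a different format (e.g. "WARNING").
LEVEL_RE = re.compile(r"\b(WARN|ERROR)\b")


def count_levels(lines):
    """Count the lines that contain a WARN or ERROR level token."""
    counts = Counter({"WARN": 0, "ERROR": 0})
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_prometheus_text(service, counts):
    """Render the counts in the Prometheus text exposition format.

    The metric and label names here are hypothetical.
    """
    out = ["# TYPE service_log_messages_total counter"]
    for level in sorted(counts):
        out.append(
            f'service_log_messages_total{{service="{service}",'
            f'level="{level.lower()}"}} {counts[level]}'
        )
    return "\n".join(out) + "\n"
```

The output of `to_prometheus_text("example-service", count_levels(open(path)))` could then be exposed to Prometheus, for example through a textfile that an exporter picks up. This only yields totals per level, which is why it does not help with seeing the individual error or its stack trace.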

In https://grafana2.torproject.org/d/8eKgI1knz/tor-metrics-services?orgId=1 I also have a few lines that track warnings and errors and show where they are generated.

I have been thinking about evaluating Loki (https://grafana.com/oss/loki/) and using it within Grafana to analyze logs and spot issues that I am currently missing. Something I have found useful in the past is being able to see the stack trace of an error directly from the logs and assign labels to it, without having to grep through the log manually.
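To make the idea a bit more concrete, here is a rough sketch of how error lines could be pulled out of Loki through its HTTP API instead of with grep. The base URL, the `service` label, and the helper name are assumptions; nothing here reflects an existing deployment.

```python
from urllib.parse import urlencode


def build_loki_query_url(base_url, service, needle, start_ns, end_ns, limit=100):
    """Build a URL for Loki's /loki/api/v1/query_range endpoint.

    The LogQL query selects one log stream by a hypothetical `service`
    label and keeps only the lines containing `needle`.
    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    logql = f'{{service="{service}"}} |= "{needle}"'
    params = {
        "query": logql,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"
```

The same LogQL expression, for example `{service="example-service"} |= "ERROR"`, could be pasted into Grafana's Explore view once Loki is added as a data source. Whether a stack trace shows up together with the matching line would depend on how multi-line entries are grouped when the logs are shipped, which would be part of the evaluation.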

I have also been thinking about adding more labels to the log traces so they can be tracked in Prometheus, but that may not be the best way to use Prometheus for this.

CC: @anarcat

Edited by Hiro