Unverified Commit 305361d3 authored by anarcat

expand on the work done

parent 9586db68
@@ -22,20 +22,33 @@ alerts, metrics pipeline experience.
1. Documentation!
- document what prometheus2 is doing https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/prometheus#monitored-services
- document all of our anti-censorship alerts in one place (where?) https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/prometheus
- [documented what the prometheus2 server is doing](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/prometheus#monitored-services)
- document all of our anti-censorship alerts in one place
(where?): not completed?
2. Expand our prometheus metrics for anti-censorship services
- export existing snowflake metrics for prometheus - see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/prometheus#adding-metrics-for-users
- add disk space/RAM/CPU monitoring for anti-censorship services (isn't this already covered? i'm not sure :) not for snowflake, this is why documentation is the first step i guess XD) - just install a node exporter and tell me the endpoint :)
- expand the metrics tor exports for prometheus
- [export existing snowflake metrics for prometheus](https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/32) -
in general see [those guidelines for adding metrics](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/prometheus#adding-metrics-for-users)
- add disk space/RAM/CPU monitoring for anti-censorship services:
some of those are already covered by TPA, on TPA
machines. external services should be monitored explicitly:
install the [Prometheus node exporter](https://github.com/prometheus/node_exporter) ([Debian package](https://tracker.debian.org/pkg/prometheus-node-exporter))
and tell TPA which URL to scrape
- expand the metrics tor exports for Prometheus: not done?
3. Play around with prometheus alert rules to recognize both outages and trends
3. Play around with prometheus alert rules to recognize both outages
and trends
- tor exports prometheus data out of the metrics port now!
- we did some work on alerting
- we set up basic alerts on bridgestrap metrics to monitor bridges
4. Figure out where to send all of our alerts
- We could send emails to our existing anti-censorship alerts mailing list: https://lists.torproject.org/cgi-bin/mailman/listinfo/anti-censorship-alerts
- Make sure we're also noticing logged errors for our services (we currently only use those for debugging)
- emails are sent to [our existing anti-censorship alerts mailing list](https://lists.torproject.org/cgi-bin/mailman/listinfo/anti-censorship-alerts)
- Make sure we're also noticing logged errors for our services (we
currently only use those for debugging) - advice from anarcat:
log analysis is hard and annoying; instead, export error- or
warning-specific counters in metrics and do alerting on that,
you can dig in the logs to see the exact errors afterwards
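
The advice about counting errors instead of parsing logs could look roughly like the sketch below. It is only an illustration under stated assumptions: the metric name `service_log_messages_total`, the `severity` label, and the `ErrorCounters` and `make_handler` helpers are hypothetical and not taken from any anti-censorship service. It uses only the Python standard library and renders the counters in the Prometheus text exposition format, so a `/metrics` endpoint can be scraped like any other target.

```python
import threading
from http.server import BaseHTTPRequestHandler


class ErrorCounters:
    """Thread-safe counters keyed by severity (hypothetical helper)."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._counts = {}
        self._lock = threading.Lock()

    def inc(self, severity):
        # Call this next to the existing log statement for an error or warning.
        with self._lock:
            self._counts[severity] = self._counts.get(severity, 0) + 1

    def render(self):
        # Produce the Prometheus text exposition format for this counter.
        with self._lock:
            lines = [
                f"# HELP {self.name} {self.help_text}",
                f"# TYPE {self.name} counter",
            ]
            for severity in sorted(self._counts):
                count = self._counts[severity]
                lines.append(f'{self.name}{{severity="{severity}"}} {count}')
            return "\n".join(lines) + "\n"


def make_handler(counters):
    """Build a request handler that serves the counters on /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = counters.render().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler
```

A service would call `counters.inc("error")` wherever it already logs an error, and serve `make_handler(counters)` with `http.server.HTTPServer` on a port of its choosing. An alert rule could then fire on an expression such as `rate(service_log_messages_total{severity="error"}[5m]) > 0` (again assuming the hypothetical metric name), and the logs remain available for digging into the exact errors afterwards.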