Commit aa629739 authored by Gaba 🦋

Merge branch 'Burnleydev-main-patch-45879' into 'main'

Add visualize Tor metric data notes

See merge request !12
parents bca4edab acf656bb
# Tor Hackweek Project: Visualize Tor Metrics data in ways that are useful for the community
Summary: The goal is a dashboard that shows Tor usage per country in a way that makes big changes easy to spot. Right now we need to select each country individually to see its Tor usage. It would be good to see all the countries at once, and in particular the ones where usage is increasing (via bridges and direct connections).
Skills Needed: data visualization
# Team:
* gus (UTC-3)
* gaba (UTC-3)
* tara(?) (UTC +1)
* joydeep (UTC +5:30)
* djackson (UTC +1)
# Merging with the other project with a similar goal:
- [hackweek-prometheus-alerts onion link](http://kfahv6wfkbezjyg4r6mlhpmieydbebr5vkok5r34ya464gqz6c44bnyd.onion/p/2021-hackweek-prometheus-alerts)
# PLAN THURSDAY MARCH 31ST
- get a domain for mb.torproject.org
- add shutdown data to metabase and add a dashboard - done
- import user relays data to metabase - done
- anomaly detection queries
- alert system [link here](https://www.metabase.com/docs/latest/users-guide/15-alerts.html)
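
A minimal sketch of what an anomaly detection query could look like, using the `users`, `lower` and `upper` columns of the userstats-relay-country data set. The table layout, the placeholder country codes and the sample numbers are assumptions for illustration. It also assumes empty `lower`/`upper` values were loaded as NULL and the columns are numeric, which is not what a plain `.import` with the sqlite3 CLI produces.

```python
import sqlite3

# Hypothetical sample rows: (date, country, users, lower, upper, frac).
# "aa", "bb" and "cc" are placeholder country codes, not real data.
SAMPLE = [
    ("2022-03-30", "aa", 1000, 800, 1200, 90),
    ("2022-03-31", "aa", 500, 800, 1200, 90),    # below the expected range
    ("2022-03-31", "bb", 3000, 1500, 2500, 90),  # above the expected range
    ("2022-03-31", "cc", 700, None, None, 90),   # no expectation available
]

# Rows where the estimate falls outside the expected [lower, upper] range.
QUERY = """
SELECT date, country, users,
       CASE WHEN users < lower THEN 'below' ELSE 'above' END AS direction
FROM userstats
WHERE lower IS NOT NULL AND upper IS NOT NULL
  AND (users < lower OR users > upper)
ORDER BY date, country
"""


def find_anomalies(rows):
    """Load rows into an in-memory table and run the anomaly query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userstats (date TEXT, country TEXT, users INTEGER, "
        "lower INTEGER, upper INTEGER, frac INTEGER)"
    )
    conn.executemany("INSERT INTO userstats VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_anomalies(SAMPLE):
        print(row)
```

The same SELECT could probably be pasted into a metabase question and used as the basis for an alert, but that has not been tried here.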
# RESOURCES
## USE CASES (what do we want to do?):
- how Tor usage is increasing
* show countries that have these cases:
- "big" decrease in user stats in the last 24 hours
- increase in bridge usage in the last 24 hours
- which kinds of transports are increasing/decreasing
- anomalies in bridge usage by country
- ideas in [contribute to Tor metrics timeline](https://blog.torproject.org/contribute-to-tor-metrics-timeline)
- more in [exploring the Tor dataset with metabase](https://dustri.org/b/exploring-the-tor-dataset-with-metabase.html)
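
A rough sketch of the "big change in the last 24 hours" use case: compare each country's user estimate with its previous row and flag large relative moves. The 30% threshold and the function name are assumptions, and since the data is daily, consecutive rows are assumed to be consecutive days.

```python
from collections import defaultdict


def big_changes(rows, threshold=0.3):
    """Return (country, date, relative_change) for day-over-day moves
    larger than `threshold` (0.3 means 30%, an arbitrary choice)."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append((row["date"], int(row["users"])))

    flagged = []
    for country, series in sorted(by_country.items()):
        series.sort()  # YYYY-MM-DD strings sort chronologically
        for (_, prev), (date, cur) in zip(series, series[1:]):
            if prev == 0:
                continue  # avoid dividing by zero
            change = (cur - prev) / prev
            if abs(change) > threshold:
                flagged.append((country, date, round(change, 2)))
    return flagged


if __name__ == "__main__":
    # Placeholder country codes and made-up numbers.
    sample = [
        {"date": "2022-03-30", "country": "aa", "users": "1000"},
        {"date": "2022-03-31", "country": "aa", "users": "400"},
        {"date": "2022-03-30", "country": "bb", "users": "1000"},
        {"date": "2022-03-31", "country": "bb", "users": "1100"},
    ]
    print(big_changes(sample))
```

A negative change is a drop and a positive one is an increase, so the same function covers both the "decreasing" and "increasing" cases.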
# DATA SETS:
* user stats per country: [userstats relay country](https://metrics.torproject.org/userstats-relay-country.csv) (the estimated number of directly-connecting clients)
* date: UTC date (YYYY-MM-DD) for which user numbers are estimated.
* country: Two-letter lower-case country code as found in a GeoIP database by resolving clients' IP addresses, or "??" if client IP addresses could not be resolved. If this column contains the empty string, all clients are included, regardless of their country code.
* users: Estimated number of clients.
* lower: Lower number of expected clients under the assumption that there has been no censorship event. If users < lower, a censorship-related event might have happened in this country on the given day. If this column contains the empty string, there are no expectations on the number of clients.
* upper: Upper number of expected clients under the assumption that there has been no release of censorship. If users > upper, a censorship-related event might have happened in this country on the given day. If this column contains the empty string, there are no expectations on the number of clients.
* frac: Fraction of relays in percent that the estimate is based on.
* bridge user per country / transport: [userstats-bridge-combined](https://metrics.torproject.org/userstats-bridge-combined.csv)
* date: UTC date (YYYY-MM-DD) for which user numbers are estimated.
* country: Two-letter lower-case country code as found in a GeoIP database by resolving clients' IP addresses, or "??" if client IP addresses could not be resolved.
* transport: Transport name used by clients to connect to the Tor network using bridges. Examples are "obfs4", "websocket" for Flash proxy/websocket, "fte" for FTE, "<??>" for unknown pluggable transport(s), or "<OR>" for the default OR protocol.
* high: Upper bound of estimated users from the given country and transport.
* low: Lower bound of estimated users from the given country and transport.
* frac: Fraction of bridges in percent that the estimate is based on.
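
To see which transports are going up or down, the `low` and `high` columns of the bridge data set could be collapsed into a single number per transport and day. The sketch below uses the midpoint of `low` and `high` as the estimate, which is an assumption of this example and not something the data set documentation prescribes. The function names are made up.

```python
from collections import defaultdict


def transport_totals(rows):
    """Sum the (low + high) / 2 midpoint per (date, transport)."""
    totals = defaultdict(float)
    for row in rows:
        midpoint = (float(row["low"]) + float(row["high"])) / 2
        totals[(row["date"], row["transport"])] += midpoint
    return dict(totals)


def transport_change(rows, earlier, later):
    """Difference in estimated users per transport between two dates."""
    totals = transport_totals(rows)
    transports = sorted({transport for (_, transport) in totals})
    return {
        t: totals.get((later, t), 0.0) - totals.get((earlier, t), 0.0)
        for t in transports
    }


if __name__ == "__main__":
    # Made-up numbers with a placeholder country code.
    sample = [
        {"date": "2022-03-30", "country": "aa", "transport": "obfs4", "low": "100", "high": "200"},
        {"date": "2022-03-31", "country": "aa", "transport": "obfs4", "low": "300", "high": "500"},
        {"date": "2022-03-30", "country": "aa", "transport": "<OR>", "low": "50", "high": "50"},
        {"date": "2022-03-31", "country": "aa", "transport": "<OR>", "low": "20", "high": "40"},
    ]
    print(transport_change(sample, "2022-03-30", "2022-03-31"))
```

Filtering `rows` by country before calling `transport_change` would give the per-country view.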
## format for tor metrics data [userstats-relay-country](https://metrics.torproject.org/stats.html#userstats-relay-country)
* prometheus metrics that tor exports [issues-40063](https://gitlab.torproject.org/tpo/core/tor/-/issues/40063)
* KeepItOn data on internet shutdowns
- link to download data: [keepiton-stop-data-2020](https://www.accessnow.org/keepiton-stop-data-2020)
- ID
- start_date
- end_date
- duration
- Info_source
- news_link
- continent
- country
- State/India
- geo_scope
- area_name
- ordered_by
- decision_maker
- shutdown_type_new
- affected_network
- full or service-based
- Facebook_affected
- Twitter_affected
- WhatsApp_affected
- Instagram_affected
- Telegram_affected
- other_service_details (specify)
- SMS_affected
- phone_call_affected
- telcos_involved
- gov_ack
- official_just
- other_just_details
- off_statement
- actual_cause
- other_cause_details
- election
- violence
- hr_abuse_reported
- users_notified
- users_affected/targetted
- legal_justif
- legal_method
- telco_resp
- telco_ ack
- econ_impact
- event
- an_link
- notes
# Todo:
- csv needs some cleaning
- country needs to be converted to country codes
* OONI data on censorship events
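
For the Todo item about converting the shutdown data's `country` column to country codes, a sketch could look like the one below. The mapping is a tiny illustrative subset and would need to be replaced with a complete ISO 3166-1 table. The function names and the `country_code` output column are assumptions.

```python
# Illustrative subset only: a real run needs a full ISO 3166-1 mapping.
NAME_TO_CODE = {
    "india": "in",
    "ethiopia": "et",
    "iran": "ir",
    "belarus": "by",
}


def to_country_code(name, mapping=NAME_TO_CODE):
    """Return the two-letter lower-case code for a country name,
    or None when the name is not in the mapping."""
    key = " ".join(name.strip().lower().split())
    return mapping.get(key)


def convert_countries(rows, mapping=NAME_TO_CODE):
    """Add a `country_code` field to each row and collect names that
    could not be mapped, so they can be cleaned by hand."""
    converted, unmapped = [], set()
    for row in rows:
        code = to_country_code(row["country"], mapping)
        if code is None:
            unmapped.add(row["country"])
        converted.append({**row, "country_code": code})
    return converted, sorted(unmapped)


if __name__ == "__main__":
    rows = [{"country": " India "}, {"country": "Atlantis"}]
    print(convert_countries(rows))
```

Lower-case codes are used so the result lines up with the `country` column of the Tor metrics data sets.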
# TOOLS
- [dash-plotly](https://dash.plot.ly)
- jupyter
- grafana: for the dashboard
- metabase: straightforward to use
- apache superset: to explore data
- csvkit
- https://pgloader.readthedocs.io/en/latest/ref/csv.html
- prometheus: for collecting data and alerts (alerts-manager)
- https://www.bamsoftware.com/git/tor-metrics-country.git/
- infolabe-anomalies: [link](http://lists.infolabe.net/lists/listinfo/infolabe-anomalies)
# Tasks:
- share the db (gus):
echo "Downloading userstats"
curl -O https://metrics.torproject.org/userstats-relay-country.csv
echo "Removing the first 5 lines"
sed -i '1,5d' userstats-relay-country.csv
echo "Import db userstats-relay-country"
(echo .separator ,; echo .import userstats-relay-country.csv userstats) | sqlite3 userstats-relay-country.db
- a downloader to get data from metrics regularly to get into the db for the tool/s to use it.
- try grafana locally and gus will use his vps to install grafana
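
The regular downloader could be a small script that repeats the shell steps above and is run from cron or a similar scheduler. This is only a sketch. It assumes the leading lines removed by `sed` are `#`-prefixed comments followed by a header row. The table schema and the function names are made up for the example.

```python
import csv
import sqlite3
import urllib.request

URL = "https://metrics.torproject.org/userstats-relay-country.csv"


def parse_userstats(text):
    """Drop blank and '#'-prefixed lines, then parse the rest as CSV.
    Assumes the first remaining line is the header row."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines))


def store(conn, rows):
    """Insert rows, replacing any existing (date, country) pair so the
    script can be re-run without creating duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS userstats (date TEXT, country TEXT, "
        "users INTEGER, lower INTEGER, upper INTEGER, frac INTEGER, "
        "PRIMARY KEY (date, country))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO userstats "
        "VALUES (:date, :country, :users, :lower, :upper, :frac)",
        rows,
    )
    conn.commit()


def refresh(db_path="userstats-relay-country.db", url=URL):
    """Download the CSV and load it into the SQLite database."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    conn = sqlite3.connect(db_path)
    try:
        store(conn, parse_userstats(text))
    finally:
        conn.close()


if __name__ == "__main__":
    refresh()
```

Keeping parsing and storing separate from the download makes it possible to test the import on a local file without hitting the metrics server.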
# How Prometheus works at TPI
[prometheus-monitored-services link](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/prometheus#monitored-services)
## monitoring:
* connectivity test (through blackbox exporter)
* rdsys
* bridgestrap
- prometheus is configured by puppet [prometheus-torproject](https://prometheus.torproject.org)
- grafana is configured by hand
# Tasks to work on:
- metrics data into prometheus
- install and configure component push gateway
- getting csv into prometheus
- snowflake metrics data into prometheus
- grafana to visualize data
### [hackweek-metrics](https://gitlab.torproject.org/juga/hackweek_metrics/): to launch prometheus locally, parse a csv and send a silly metric to the pushgateway
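
As an illustration of the "parse a csv and send a metric to the pushgateway" idea (not the code from the hackweek_metrics repository), rows can be turned into the Prometheus text exposition format and sent with an HTTP PUT to the Pushgateway's `/metrics/job/<job>` endpoint. The metric name, job name, gateway address and the `all` label for the empty country are assumptions of this sketch.

```python
import urllib.request


def to_exposition(rows, metric="tor_userstats_relay_users"):
    """Render rows as Prometheus text exposition format, one gauge
    sample per country. An empty country (all clients) becomes 'all'."""
    lines = [f"# TYPE {metric} gauge"]
    for row in rows:
        country = row["country"] or "all"
        lines.append(f'{metric}{{country="{country}"}} {int(row["users"])}')
    return "\n".join(lines) + "\n"  # the body has to end with a newline


def push(body, gateway="http://localhost:9091", job="hackweek_metrics"):
    """PUT the metrics to a Pushgateway, replacing the job's group.
    The gateway address and job name are placeholders."""
    req = urllib.request.Request(
        f"{gateway}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    sample = [{"country": "aa", "users": "1200"}, {"country": "", "users": "5000"}]
    print(to_exposition(sample), end="")
    # push(to_exposition(sample))  # needs a running Pushgateway
```

Only the latest value per country is pushed, since the Pushgateway holds the most recent sample and Prometheus attaches its own scrape timestamp.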