Update Dashboard Graphs, authored by Barkin Simsek
 This document aims to describe how to produce the graphs that will be on the CAPTCHA Monitor's dashboard at [dashboard.captcha.wtf](https://dashboard.captcha.wtf/). If you have any suggestions/feedback, please mention it under [ticket #41](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/issues/41) of this repository.
 The following graph style will be used for all graphs unless otherwise specified:
 * Type
@@ -23,7 +23,7 @@ The following graph style will be used for all graphs unless otherwise specified
 - [Weighted CAPTCHA rate by exit probability](#weighted-captcha-rate-by-exit-probability)
 - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age)
 - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location)
-- [Graphs for understanding the Cloudflare firewall](#graphs-about-understanding-the-cloudflare-firewall)
+- [Graphs for understanding the Cloudflare firewall](#graphs-for-understanding-the-cloudflare-firewall)
 - [CAPTCHA rate by Cloudflare security level/firewall settings](#captcha-rate-by-cloudflare-security-levelfirewall-settings)
 - [CAPTCHA rate by traffic origin](#captcha-rate-by-traffic-origin)
 - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age-1)
@@ -40,7 +40,7 @@ The following graph style will be used for all graphs unless otherwise specified
 ## Weighted CAPTCHA rate by method
 ### Purpose
 Understanding the effect of using different methods (for example using
 web browsers like Tor Browser, Firefox over Tor, Brave's Tor Tabs, etc.) on the
 probability of seeing a CAPTCHA
 ### Steps to produce
@@ -102,7 +102,7 @@ change over time? [ticket:33010]
 ## Weighted CAPTCHA rate by connection security
 ### Purpose
-Understanding the effect of using https and not using https on the probability
+Understanding the effect of using TLS and not using TLS on the probability
 of seeing a CAPTCHA
 ### Steps to produce
@@ -188,9 +188,7 @@ multiple HTTP requests to load on the probability of seeing a CAPTCHA
 completed using this exit relay and have `is_captcha_found` field
 set to `1`
 3. Calculate the percentage of measurements that received CAPTCHA using
-$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
-exit relay exists in the consensus but there are no corresponding
-measurements)
+$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
 3. Calculate the weighted average of the percentage values (obtained in
 Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
 scaling factor
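The per-relay percentage and the weighted average described in the hunk above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's implementation: the function name `weighted_captcha_rate`, the dict keys `fingerprint` and `is_captcha_found` as used here, and the input shapes are hypothetical. Since the change drops the "assume `0%`" rule without stating a replacement, the sketch assumes that relays with no measurements are simply left out of the average.

```python
def weighted_captcha_rate(measurements, exit_probabilities):
    """Return the exit-probability-weighted CAPTCHA percentage, or None.

    measurements: iterable of dicts with the (hypothetical) keys
        'fingerprint' and 'is_captcha_found' (0 or 1).
    exit_probabilities: dict mapping relay fingerprint -> exit probability.
    """
    totals = {}    # fingerprint -> number of measurements
    captchas = {}  # fingerprint -> measurements with is_captcha_found == 1
    for m in measurements:
        fp = m["fingerprint"]
        totals[fp] = totals.get(fp, 0) + 1
        captchas[fp] = captchas.get(fp, 0) + (1 if m["is_captcha_found"] == 1 else 0)

    weighted_sum = 0.0
    weight_total = 0.0
    for fp, probability in exit_probabilities.items():
        if totals.get(fp, 0) == 0:
            # Assumption: relays without measurements are skipped.
            continue
        percentage = captchas[fp] / totals[fp] * 100
        weighted_sum += percentage * probability
        weight_total += probability

    if weight_total == 0:
        return None  # nothing to average
    return weighted_sum / weight_total


example_measurements = [
    {"fingerprint": "A", "is_captcha_found": 1},
    {"fingerprint": "A", "is_captcha_found": 0},
    {"fingerprint": "B", "is_captcha_found": 1},
]
example_probabilities = {"A": 0.25, "B": 0.75, "C": 0.5}
rate = weighted_captcha_rate(example_measurements, example_probabilities)
print(rate)  # 87.5
```

In the example, relay `A` sees a CAPTCHA in 50% of its measurements and relay `B` in 100%, while relay `C` has no measurements and is skipped, so the result is `(50 * 0.25 + 100 * 0.75) / (0.25 + 0.75)`.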
@@ -245,9 +243,7 @@ CAPTCHA
 completed using this exit relay and have `is_captcha_found` field
 set to `1`
 3. Calculate the percentage of measurements that received CAPTCHA using
-$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
-exit relay exists in the consensus but there are no corresponding
-measurements)
+$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
 3. Calculate the weighted average of the percentage values (obtained in
 Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
 scaling factor
@@ -437,7 +433,7 @@ Understanding the effect of using older or younger exit relays
 https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
 6. Calculate the age of the exit relays in days using the `first_seen` field
 of the "details document" and `valid-after` timestamp of the consensus
-(`exit_age` = ceil_days(`valid-after` - `first_seen`))
+`exit_age = ceil_days(valid-after - first_seen)`
 7. Distribute the exit relay entries from the consensus into
 `(max(exit_age) - min(exit_age)) / 365` bins based on their ages (calculated in Step 2.6)
 8. Repeat the following for each bin:
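The age calculation and binning in the hunk above could look roughly like the Python sketch below. It assumes both timestamps arrive as `YYYY-MM-DD hh:mm:ss` strings in UTC. The source only gives the expression for the number of bins, so the bin boundaries used here (365-day-wide bins counted from the youngest relay's age) are one possible reading. The helper names `parse_timestamp` and `bin_by_age` are hypothetical.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 86400


def parse_timestamp(text):
    """Parse a 'YYYY-MM-DD hh:mm:ss' timestamp (assumed format)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def ceil_days(delta):
    """Round a timedelta up to a whole number of days."""
    return math.ceil(delta.total_seconds() / SECONDS_PER_DAY)


def exit_age(valid_after, first_seen):
    """exit_age = ceil_days(valid-after - first_seen)."""
    return ceil_days(parse_timestamp(valid_after) - parse_timestamp(first_seen))


def bin_by_age(ages, bin_width_days=365):
    """Group fingerprints into bins of bin_width_days, starting at the youngest age."""
    youngest = min(ages.values())
    bins = {}
    for fingerprint, age in ages.items():
        index = (age - youngest) // bin_width_days
        bins.setdefault(index, []).append(fingerprint)
    return bins


valid_after = "2020-08-01 12:00:00"
first_seen = {
    "A": "2020-07-31 13:00:00",  # 23 hours old, rounds up to 1 day
    "B": "2019-07-01 00:00:00",  # 397.5 days old, rounds up to 398 days
    "C": "2020-07-01 12:00:00",  # exactly 31 days old
}
ages = {fp: exit_age(valid_after, seen) for fp, seen in first_seen.items()}
bins = bin_by_age(ages)
print(ages)  # {'A': 1, 'B': 398, 'C': 31}
print(bins)  # {0: ['A', 'C'], 1: ['B']}
```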
@@ -525,7 +521,7 @@ Cloudflare's blocking practices?
 <!-- ####################################################################### -->
 <!-- ####################################################################### -->
-# Graphs about understanding the Cloudflare firewall
+# Graphs for understanding the Cloudflare firewall
 ## CAPTCHA rate by Cloudflare security level/firewall settings
 ### Purpose
 Understanding the effect of different Cloudflare security levels and firewall
@@ -553,7 +549,7 @@ We have a few different domains to test different configurations. Here they are:
 0. Determine a date range and granularity to plot. Here, we will plot last 30 days
 with a granularity of 1 hour.
 1. Use CAPTCHA Monitor API to get measurements that were *completed
-using domains specified above* and during the chosen date range and
+using only domains specified above* and during the chosen date range and
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
 1. Distribute the measurements that were completed within the interval of
@@ -594,7 +590,7 @@ experiments. This list contains the metadata about the URLs.
 3. Join the measurements and URL list using the `URL` fields. Typically each
 URL maps to multiple measurements.
 4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-fields
+field
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
 1. Distribute the measurements that were completed within the interval of
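The join, the `cdn_provider` filter, and the per-interval grouping from the hunk above might be sketched in Python like this. The record shapes are assumptions made for illustration: measurements are dicts with hypothetical `url` and `timestamp` keys, and URL list entries are dicts with `url` and `cdn_provider`. The function names are likewise hypothetical.

```python
from datetime import datetime, timedelta


def cloudflare_measurements(measurements, url_list):
    """Join measurements to the URL list by URL and keep only Cloudflare ones."""
    cdn_by_url = {entry["url"]: entry["cdn_provider"] for entry in url_list}
    return [
        m for m in measurements
        # A URL that is missing from the list, or has no provider, is discarded.
        if "cloudflare" in (cdn_by_url.get(m["url"]) or "")
    ]


def group_by_interval(measurements, start, end, step):
    """Map each interval start in [start, end) to the measurements completed in it."""
    buckets = {}
    current = start
    while current < end:
        buckets[current] = []
        current += step
    for m in measurements:
        completed = m["timestamp"]
        if start <= completed < end:
            offset = (completed - start) // step  # whole intervals since start
            buckets[start + offset * step].append(m)
    return buckets


url_list = [
    {"url": "https://a.example/", "cdn_provider": "cloudflare"},
    {"url": "https://b.example/", "cdn_provider": "fastly"},
]
start = datetime(2020, 8, 1, 0, 0)
measurements = [
    {"url": "https://a.example/", "timestamp": start + timedelta(minutes=10)},
    {"url": "https://b.example/", "timestamp": start + timedelta(minutes=20)},
    {"url": "https://a.example/", "timestamp": start + timedelta(minutes=90)},
]
kept = cloudflare_measurements(measurements, url_list)
buckets = group_by_interval(kept, start, start + timedelta(hours=2), timedelta(hours=1))
print([len(bucket) for bucket in buckets.values()])  # [1, 1]
```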
@@ -639,7 +635,7 @@ different treatment for older relays
 6. Join the measurements and URL list using the `URL` fields. Typically each
 URL maps to multiple measurements.
 7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-fields
+field
 8. Obtain the "details document" from Onionoo and match the Onionoo data
 with the relay entries from consensus using the relay fingerprints. The following query is
 recommended for obtaining the "details document":
@@ -703,13 +699,13 @@ certain countries
 6. Join the measurements and URL list using the `URL` fields. Typically each
 URL maps to multiple measurements.
 7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-fields
+field
 8. Obtain the "details document" from Onionoo and match the Onionoo data
 with the relay entries from consensus using the relay fingerprints. The following query is
 recommended for obtaining the "details document":
 https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
 9. Distribute the exit relay entries from the consensus into bins based on
-their `country_name` value (obtained in Step 2.5)
+their `country_name` value (obtained in Step 2.8)
 10. Repeat the following for each bin:
 1. Repeat the following for each exit relay in the bin:
 1. Count the total number of measurements that were completed using
@@ -746,7 +742,7 @@ experiments. This list contains the metadata about the URLs.
 3. Join the measurements and URL list using the `URL` fields. Typically each
 URL maps to multiple measurements.
 4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-fields
+field
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
 1. Distribute the measurements that were completed within the
@@ -801,9 +797,7 @@ probability of seeing a CAPTCHA
 completed using this exit relay and have `is_captcha_found` field
 set to `1`
 3. Calculate the percentage of measurements that received CAPTCHA using
-$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
-exit relay exists in the consensus but there are no corresponding
-measurements)
+$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
 3. Calculate the weighted average of the percentage values (obtained in
 Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
 scaling factor
@@ -852,9 +846,7 @@ Understanding the effect of using Tor Browser at different security levels
 completed using this exit relay and have `is_captcha_found` field
 set to `1`
 3. Calculate the percentage of measurements that received CAPTCHA using
-$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
-exit relay exists in the consensus but there are no corresponding
-measurements)
+$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
 3. Calculate the weighted average of the percentage values (obtained in
 Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
 scaling factor