This document describes how to produce the graphs that will appear on the CAPTCHA Monitor dashboard at [dashboard.captcha.wtf](https://dashboard.captcha.wtf/). If you have suggestions or feedback, please add them to [ticket #41](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/issues/41) of this repository.

The following graph style will be used for all graphs unless otherwise specified:
* Type
... | ... | @@ -23,7 +23,7 @@ The following graph style will be used for all graphs unless otherwise specified |
  - [Weighted CAPTCHA rate by exit probability](#weighted-captcha-rate-by-exit-probability)
  - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age)
  - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location)
- [Graphs for understanding the Cloudflare firewall](#graphs-for-understanding-the-cloudflare-firewall)
  - [CAPTCHA rate by Cloudflare security level/firewall settings](#captcha-rate-by-cloudflare-security-levelfirewall-settings)
  - [CAPTCHA rate by traffic origin](#captcha-rate-by-traffic-origin)
  - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age-1)
... | ... | @@ -40,7 +40,7 @@ The following graph style will be used for all graphs unless otherwise specified |
## Weighted CAPTCHA rate by method
### Purpose
Understanding the effect of using different methods (for example, web browsers
such as Tor Browser, Firefox over Tor, or Brave's Tor Tabs) on the
probability of seeing a CAPTCHA

### Steps to produce
... | ... | @@ -102,7 +102,7 @@ change over time? [ticket:33010] |
## Weighted CAPTCHA rate by connection security
### Purpose
Understanding the effect of using or not using TLS on the probability
of seeing a CAPTCHA

### Steps to produce
... | ... | @@ -188,9 +188,7 @@ multiple HTTP requests to load on the probability of seeing a CAPTCHA |
         completed using this exit relay and have the `is_captcha_found` field
         set to `1`
      3. Calculate the percentage of measurements that received a CAPTCHA using
         $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$
   3. Calculate the weighted average of the percentage values (obtained in
      Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
      scaling factor
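
The per-relay percentage and the weighted average described in the steps above can be sketched in Python as follows. This is a minimal sketch rather than the project's implementation: only the `is_captcha_found` field comes from the text, while the function names, the dictionary layout keyed by relay fingerprint, and the choice to skip relays that have no measurements are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def captcha_percentage(measurements: List[dict]) -> Optional[float]:
    """Percentage of measurements whose `is_captcha_found` is 1.

    Returns None when there are no measurements, so the caller decides
    how to treat such relays (an assumption of this sketch).
    """
    total = len(measurements)
    if total == 0:
        return None
    with_captcha = sum(1 for m in measurements if m["is_captcha_found"] == 1)
    return with_captcha / total * 100


def weighted_captcha_rate(
    per_relay: Dict[str, List[dict]],
    exit_probability: Dict[str, float],
) -> Optional[float]:
    """Weighted average of per-relay percentages, weighted by exit probability.

    `per_relay` maps a relay fingerprint to its measurements and
    `exit_probability` maps the same fingerprints to their weights
    (both layouts are hypothetical).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for fingerprint, measurements in per_relay.items():
        percentage = captcha_percentage(measurements)
        if percentage is None:
            continue  # assumption: relays without measurements are skipped
        weight = exit_probability[fingerprint]
        weighted_sum += percentage * weight
        weight_total += weight
    if weight_total == 0:
        return None
    # Normalize by the sum of the weights that were actually used.
    return weighted_sum / weight_total
```

Dividing by the sum of the weights that were used keeps the result on the 0 to 100 scale even when some relays in the bin contribute no measurements.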
... | ... | @@ -245,9 +243,7 @@ CAPTCHA |
         completed using this exit relay and have the `is_captcha_found` field
         set to `1`
      3. Calculate the percentage of measurements that received a CAPTCHA using
         $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$
   3. Calculate the weighted average of the percentage values (obtained in
      Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
      scaling factor
... | ... | @@ -437,7 +433,7 @@ Understanding the effect of using older or younger exit relays |
   https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
6. Calculate the age of the exit relays in days using the `first_seen` field
   of the "details document" and the `valid-after` timestamp of the consensus:
   `exit_age = ceil_days(valid-after - first_seen)`
7. Distribute the exit relay entries from the consensus into
   `(max(exit_age) - min(exit_age)) / 365` bins based on their ages (calculated in Step 2.6)
8. Repeat the following for each bin:
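
The age calculation and the binning in Steps 6 and 7 above can be sketched as follows. The names `ceil_days` and `exit_age` follow the formula in the text; everything else is an assumption of this sketch: the timestamps are taken to be already parsed into `datetime` objects, the bin count is rounded up with a minimum of one bin, and each bin spans 365 days counted from the youngest relay.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, List

SECONDS_PER_DAY = 86400


def ceil_days(delta: timedelta) -> int:
    """Round a time difference up to a whole number of days."""
    return math.ceil(delta.total_seconds() / SECONDS_PER_DAY)


def exit_age(valid_after: datetime, first_seen: datetime) -> int:
    """exit_age = ceil_days(valid-after - first_seen)."""
    return ceil_days(valid_after - first_seen)


def bin_by_age(ages: Dict[str, int]) -> Dict[int, List[str]]:
    """Group relay fingerprints into roughly one-year bins by age in days.

    `ages` maps a relay fingerprint to its age in days (hypothetical layout).
    """
    youngest = min(ages.values())
    oldest = max(ages.values())
    # Assumption: round the bin count up and always keep at least one bin.
    n_bins = max(1, math.ceil((oldest - youngest) / 365))
    bins: Dict[int, List[str]] = {i: [] for i in range(n_bins)}
    for fingerprint, age in ages.items():
        # Clamp so the oldest relay falls into the last bin.
        index = min((age - youngest) // 365, n_bins - 1)
        bins[index].append(fingerprint)
    return bins
```

For example, relays aged 10, 400, and 740 days span 730 days, which gives two bins: the first holds the 10-day-old relay and the second holds the other two.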
... | ... | @@ -525,7 +521,7 @@ Cloudflare's blocking practices? |
<!-- ####################################################################### -->
<!-- ####################################################################### -->

# Graphs for understanding the Cloudflare firewall
## CAPTCHA rate by Cloudflare security level/firewall settings
### Purpose
Understanding the effect of different Cloudflare security levels and firewall
... | ... | @@ -553,7 +549,7 @@ We have a few different domains to test different configurations. Here they are: |
0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
   with a granularity of 1 hour.
1. Use the CAPTCHA Monitor API to get measurements that were *completed
   using only the domains specified above* and during the chosen date range and
5. Iterate over the chosen date range with the chosen time intervals. Repeat
   the following for each iteration:
   1. Distribute the measurements that were completed within the interval of
... | ... | @@ -594,7 +590,7 @@ experiments. This list contains the metadata about the URLs. |
3. Join the measurements and the URL list on their `URL` fields. Typically, each
   URL maps to multiple measurements.
4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
   field
5. Iterate over the chosen date range with the chosen time intervals. Repeat
   the following for each iteration:
   1. Distribute the measurements that were completed within the interval of
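
Steps 3 to 5 above (joining on the URL, keeping only Cloudflare-fronted URLs, and grouping by time interval) can be sketched as follows. This is an illustrative sketch under stated assumptions: only the `cdn_provider` field name and the `cloudflare` value come from the text, while the `url` and `timestamp` keys, the case-insensitive match, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List


def join_and_filter(measurements: List[dict], url_list: List[dict]) -> List[dict]:
    """Attach URL metadata to each measurement and keep Cloudflare URLs only."""
    metadata_by_url = {entry["url"]: entry for entry in url_list}
    joined = []
    for measurement in measurements:
        metadata = metadata_by_url.get(measurement["url"])
        if metadata is None:
            continue  # no metadata for this URL, so it cannot be classified
        provider = metadata.get("cdn_provider") or ""
        if "cloudflare" not in provider.lower():
            continue  # discard measurements for non-Cloudflare URLs
        joined.append({**measurement, **metadata})
    return joined


def bucket_by_interval(
    measurements: List[dict],
    start: datetime,
    end: datetime,
    step: timedelta,
) -> Dict[datetime, List[dict]]:
    """Group measurements by the interval of length `step` that contains them."""
    buckets = defaultdict(list)
    for measurement in measurements:
        timestamp = measurement["timestamp"]
        if start <= timestamp < end:
            # timedelta // timedelta yields the integer index of the interval.
            index = (timestamp - start) // step
            buckets[start + index * step].append(measurement)
    return dict(buckets)
```

Building the URL lookup once keeps the join linear in the number of measurements, which matters because each URL typically maps to many measurements.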
... | ... | @@ -639,7 +635,7 @@ different treatment for older relays |
6. Join the measurements and the URL list on their `URL` fields. Typically, each
   URL maps to multiple measurements.
7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
   field
8. Obtain the "details document" from Onionoo and match the Onionoo data
   with the relay entries from the consensus using the relay fingerprints. The following query is
   recommended for obtaining the "details document":
... | ... | @@ -703,13 +699,13 @@ certain countries |
6. Join the measurements and the URL list on their `URL` fields. Typically, each
   URL maps to multiple measurements.
7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
   field
8. Obtain the "details document" from Onionoo and match the Onionoo data
   with the relay entries from the consensus using the relay fingerprints. The following query is
   recommended for obtaining the "details document":
   https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
9. Distribute the exit relay entries from the consensus into bins based on
   their `country_name` value (obtained in Step 2.8)
10. Repeat the following for each bin:
    1. Repeat the following for each exit relay in the bin:
       1. Count the total number of measurements that were completed using
... | ... | @@ -746,7 +742,7 @@ experiments. This list contains the metadata about the URLs. |
3. Join the measurements and the URL list on their `URL` fields. Typically, each
   URL maps to multiple measurements.
4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
   field
5. Iterate over the chosen date range with the chosen time intervals. Repeat
   the following for each iteration:
   1. Distribute the measurements that were completed within the
... | ... | @@ -801,9 +797,7 @@ probability of seeing a CAPTCHA |
         completed using this exit relay and have the `is_captcha_found` field
         set to `1`
      3. Calculate the percentage of measurements that received a CAPTCHA using
         $`\frac{\text{Step 2.7.2.2}}{\text{Step 2.7.2.1}} \times 100`$
   3. Calculate the weighted average of the percentage values (obtained in
      Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
      scaling factor
... | ... | @@ -852,9 +846,7 @@ Understanding the effect of using Tor Browser at different security levels |
         completed using this exit relay and have the `is_captcha_found` field
         set to `1`
      3. Calculate the percentage of measurements that received a CAPTCHA using
         $`\frac{\text{Step 2.7.2.2}}{\text{Step 2.7.2.1}} \times 100`$
   3. Calculate the weighted average of the percentage values (obtained in
      Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
      scaling factor