Update Dashboard Graphs authored by Barkin Simsek
@@ -23,7 +23,7 @@ The following graph style will be used for all graphs unless otherwise specified
- [Weighted CAPTCHA rate by exit probability](#weighted-captcha-rate-by-exit-probability)
- [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age)
- [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location)
- [Graphs for understanding the Cloudflare firewall](#graphs-about-understanding-the-cloudflare-firewall)
- [Graphs for understanding the Cloudflare firewall](#graphs-for-understanding-the-cloudflare-firewall)
- [CAPTCHA rate by Cloudflare security level/firewall settings](#captcha-rate-by-cloudflare-security-levelfirewall-settings)
- [CAPTCHA rate by traffic origin](#captcha-rate-by-traffic-origin)
- [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age-1)
@@ -102,7 +102,7 @@ change over time? [ticket:33010]
## Weighted CAPTCHA rate by connection security
### Purpose
Understanding the effect of using https and not using https on the probability
Understanding the effect of using TLS and not using TLS on the probability
of seeing a CAPTCHA
### Steps to produce
@@ -188,9 +188,7 @@ multiple HTTP requests to load on the probability of seeing a CAPTCHA
completed using this exit relay and have `is_captcha_found` field
set to `1`
3. Calculate the percentage of measurements that received CAPTCHA using
$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
exit relay exists in the consensus but there are no corresponding
measurements)
$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
3. Calculate the weighted average of the percentage values (obtained in
Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
scaling factor
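
The per-relay percentage and the weighted average described in this hunk can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name and the `fingerprint` key are hypothetical, and only `is_captcha_found` comes from the text. Since the revised step no longer says how to treat relays without measurements, the sketch assumes such relays are skipped and the remaining weights are renormalized.

```python
from collections import defaultdict


def weighted_captcha_rate(measurements, exit_probabilities):
    """Return the exit-probability-weighted CAPTCHA percentage.

    measurements: iterable of dicts with a hypothetical "fingerprint" key
        and an "is_captcha_found" key (0 or 1).
    exit_probabilities: dict mapping fingerprint -> exit probability.
    """
    totals = defaultdict(int)    # all measurements per exit relay
    captchas = defaultdict(int)  # measurements with is_captcha_found == 1

    for m in measurements:
        totals[m["fingerprint"]] += 1
        if m["is_captcha_found"] == 1:
            captchas[m["fingerprint"]] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for fingerprint, weight in exit_probabilities.items():
        total = totals.get(fingerprint, 0)
        if total == 0:
            # Assumption: relays without measurements are skipped.
            continue
        percentage = captchas.get(fingerprint, 0) / total * 100
        weighted_sum += weight * percentage
        weight_total += weight

    if weight_total == 0:
        return None  # no relay had any measurement
    return weighted_sum / weight_total
```

Dividing by the sum of the weights actually used keeps the result a percentage even when some relays in the consensus have no measurements.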
@@ -245,9 +243,7 @@ CAPTCHA
completed using this exit relay and have `is_captcha_found` field
set to `1`
3. Calculate the percentage of measurements that received CAPTCHA using
$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
exit relay exists in the consensus but there are no corresponding
measurements)
$`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
3. Calculate the weighted average of the percentage values (obtained in
Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
scaling factor
@@ -437,7 +433,7 @@ Understanding the effect of using older or younger exit relays
https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
6. Calculate the age of the exit relays in days using the `first_seen` field
of the "details document" and `valid-after` timestamp of the consensus
(`exit_age` = ceil_days(`valid-after` - `first_seen`))
`exit_age = ceil_days(valid-after - first_seen)`
7. Distribute the exit relay entries from the consensus into
`(max(exit_age) - min(exit_age)) / 365` bins based on their ages (calculated in Step 2.6)
8. Repeat the following for each bin:
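
The age calculation and binning in this hunk could look roughly like the sketch below. It assumes both `valid-after` and `first_seen` are UTC strings in the `YYYY-MM-DD hh:mm:ss` form, and it interprets the binning rule as year-wide bins counted from the youngest relay. The function names and the `ages` mapping are hypothetical.

```python
import math
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of both timestamps


def exit_age_days(valid_after, first_seen):
    """exit_age = ceil_days(valid-after - first_seen)."""
    delta = (datetime.strptime(valid_after, TIMESTAMP_FORMAT)
             - datetime.strptime(first_seen, TIMESTAMP_FORMAT))
    return math.ceil(delta.total_seconds() / 86400)


def bin_by_age(ages, bin_width_days=365):
    """Group fingerprints into year-wide bins.

    ages: dict mapping fingerprint -> age in days.
    Returns a dict mapping bin index -> list of fingerprints, where
    bin 0 starts at the age of the youngest relay.
    """
    youngest = min(ages.values())
    bins = {}
    for fingerprint, age in ages.items():
        index = (age - youngest) // bin_width_days
        bins.setdefault(index, []).append(fingerprint)
    return bins
```

For example, ages of 10, 300, 400 and 1000 days fall into bins 0, 0, 1 and 2 respectively.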
@@ -525,7 +521,7 @@ Cloudflare's blocking practices?
<!-- ####################################################################### -->
<!-- ####################################################################### -->
# Graphs about understanding the Cloudflare firewall
# Graphs for understanding the Cloudflare firewall
## CAPTCHA rate by Cloudflare security level/firewall settings
### Purpose
Understanding the effect of different Cloudflare security levels and firewall
@@ -553,7 +549,7 @@ We have a few different domains to test different configurations. Here they are:
0. Determine a date range and granularity to plot. Here, we will plot last 30 days
with a granularity of 1 hour.
1. Use CAPTCHA Monitor API to get measurements that were *completed
using domains specified above* and during the chosen date range and
using only domains specified above* and during the chosen date range and
5. Iterate over the chosen date range with the chosen time intervals. Repeat
the following for each iteration:
1. Distribute the measurements that were completed within the interval of
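
As a rough illustration of iterating over the date range at a fixed granularity, the sketch below distributes measurements into one-hour intervals. The `timestamp` key (a `datetime` object) and the function name are assumptions made for this example; the text does not name the field that holds a measurement's completion time.

```python
from datetime import datetime, timedelta


def bucket_by_interval(measurements, start, end, step=timedelta(hours=1)):
    """Distribute measurements into consecutive intervals of length `step`.

    measurements: iterable of dicts with a hypothetical "timestamp" key
        holding a datetime.
    Returns a dict mapping each interval start -> list of measurements
    with start <= timestamp < start + step. Empty intervals are kept so
    that gaps stay visible in a plot.
    """
    buckets = {}
    current = start
    while current < end:
        buckets[current] = []
        current += step

    for m in measurements:
        ts = m["timestamp"]
        if start <= ts < end:
            index = (ts - start) // step  # timedelta // timedelta gives an int
            buckets[start + index * step].append(m)
    return buckets
```

A 30-day range with a one-hour step yields 720 intervals, each of which can then be summarized separately.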
@@ -594,7 +590,7 @@ experiments. This list contains the metadata about the URLs.
3. Join the measurements and URL list using the `URL` fields. Typically each
URL maps to multiple measurements.
4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
fields
field
5. Iterate over the chosen date range with the chosen time intervals. Repeat
the following for each iteration:
1. Distribute the measurements that were completed within the interval of
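
The join and filter steps in this hunk might be sketched as below. The sketch assumes that the measurements and the URL list are lists of dicts, that `cdn_provider` comes from the URL list, and that measurements without a matching URL entry are dropped. The function name is hypothetical.

```python
def join_and_filter_cloudflare(measurements, url_list):
    """Join measurements with URL metadata and keep Cloudflare rows.

    measurements: list of dicts that each have a "URL" key.
    url_list: list of dicts that each have "URL" and "cdn_provider" keys.
    """
    # One metadata entry per URL; each URL may map to many measurements.
    metadata_by_url = {entry["URL"]: entry for entry in url_list}

    joined = []
    for m in measurements:
        meta = metadata_by_url.get(m["URL"])
        if meta is None:
            continue  # assumption: no metadata means the row is dropped
        row = {**meta, **m}
        provider = str(row.get("cdn_provider") or "")
        if "cloudflare" in provider.lower():
            joined.append(row)
    return joined
```

The case-insensitive substring check is a design choice of this sketch; an exact match may be more appropriate if the field holds a single normalized value.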
@@ -639,7 +635,7 @@ different treatment for older relays
6. Join the measurements and URL list using the `URL` fields. Typically each
URL maps to multiple measurements.
7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
fields
field
8. Obtain the "details document" from Onionoo and match the Onionoo data
with the relay entries from consensus using the relay fingerprints. The following query is
recommended for obtaining the "details document":
@@ -703,13 +699,13 @@ certain countries
6. Join the measurements and URL list using the `URL` fields. Typically each
URL maps to multiple measurements.
7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
fields
field
8. Obtain the "details document" from Onionoo and match the Onionoo data
with the relay entries from consensus using the relay fingerprints. The following query is
recommended for obtaining the "details document":
https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
9. Distribute the exit relay entries from the consensus into bins based on
their `country_name` value (obtained in Step 2.5)
their `country_name` value (obtained in Step 2.8)
10. Repeat the following for each bin:
1. Repeat the following for each exit relay in the bin:
1. Count the total number of measurements that were completed using
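
Matching consensus relays against the Onionoo "details document" by fingerprint and grouping them by `country_name` could be sketched as follows. The sketch assumes the document has already been fetched and parsed into a dict with a top-level `relays` list. The function name and the `"Unknown"` fallback label are choices made for this example.

```python
from collections import defaultdict


def bin_by_country(consensus_fingerprints, details_document):
    """Group exit relay fingerprints by their Onionoo country_name.

    consensus_fingerprints: iterable of fingerprints from the consensus.
    details_document: parsed Onionoo details JSON with a "relays" list.
    """
    country_by_fingerprint = {
        relay["fingerprint"]: relay.get("country_name", "Unknown")
        for relay in details_document["relays"]
    }

    bins = defaultdict(list)
    for fingerprint in consensus_fingerprints:
        # Relays missing from Onionoo, or without a country, share one bin.
        country = country_by_fingerprint.get(fingerprint, "Unknown")
        bins[country].append(fingerprint)
    return dict(bins)
```

Each resulting bin can then be processed relay by relay, as the nested steps describe.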
@@ -746,7 +742,7 @@ experiments. This list contains the metadata about the URLs.
3. Join the measurements and URL list using the `URL` fields. Typically each
URL maps to multiple measurements.
4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
fields
field
5. Iterate over the chosen date range with the chosen time intervals. Repeat
the following for each iteration:
1. Distribute the measurements that were completed within the
@@ -801,9 +797,7 @@ probability of seeing a CAPTCHA
completed using this exit relay and have `is_captcha_found` field
set to `1`
3. Calculate the percentage of measurements that received CAPTCHA using
$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
exit relay exists in the consensus but there are no corresponding
measurements)
$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
3. Calculate the weighted average of the percentage values (obtained in
Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
scaling factor
@@ -852,9 +846,7 @@ Understanding the effect of using Tor Browser at different security levels
completed using this exit relay and have `is_captcha_found` field
set to `1`
3. Calculate the percentage of measurements that received CAPTCHA using
$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
exit relay exists in the consensus but there are no corresponding
measurements)
$`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
3. Calculate the weighted average of the percentage values (obtained in
Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
scaling factor