Changes

Barkin Simsek · 13ee05ae
--- a/Dashboard-Graphs.md
+++ b/Dashboard-Graphs.md
-This document aims to describe how to produce the graphs that will be on the CAPTCHA Monitor's dashboard at [dashboard.captcha.wtf](https://dashboard.captcha.wtf/). If you have any suggestions/feedback, please mention it under [ticket #41](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/issues/41) of this repository. 
+This document aims to describe how to produce the graphs that will be on the CAPTCHA Monitor's dashboard at [dashboard.captcha.wtf](https://dashboard.captcha.wtf/). If you have any suggestions/feedback, please mention it under [ticket #41](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/issues/41) of this repository.
 The following graph style will be used for all graphs unless otherwise specified:
 * Type
@@ -23,7 +23,7 @@ The following graph style will be used for all graphs unless otherwise specified
    - [Weighted CAPTCHA rate by exit probability](#weighted-captcha-rate-by-exit-probability)
    - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age)
    - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location)
- [Graphs for understanding the Cloudflare firewall](#graphs-about-understanding-the-cloudflare-firewall)
+- [Graphs for understanding the Cloudflare firewall](#graphs-for-understanding-the-cloudflare-firewall)
    - [CAPTCHA rate by Cloudflare security level/firewall settings](#captcha-rate-by-cloudflare-security-levelfirewall-settings)
    - [CAPTCHA rate by traffic origin](#captcha-rate-by-traffic-origin)
    - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age-1)
@@ -40,7 +40,7 @@ The following graph style will be used for all graphs unless otherwise specified
 ## Weighted CAPTCHA rate by method
 ### Purpose
 Understanding the effect of using different methods (for example using
-web browsers like Tor Browser, Firefox over Tor, Brave's Tor Tabs, etc.) on the 
+web browsers like Tor Browser, Firefox over Tor, Brave's Tor Tabs, etc.) on the
 probability of seeing a CAPTCHA
 ### Steps to produce
@@ -102,7 +102,7 @@ change over time? [ticket:33010]
 ## Weighted CAPTCHA rate by connection security
 ### Purpose
-Understanding the effect of using https and not using https on the probability
+Understanding the effect of using TLS and not using TLS on the probability
 of seeing a CAPTCHA
 ### Steps to produce
@@ -188,9 +188,7 @@ multiple HTTP requests to load on the probability of seeing a CAPTCHA
            completed using this exit relay and have `is_captcha_found` field
            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
+            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
-            exit relay exists in the consensus but there are no corresponding
-            measurements)
        3. Calculate the weighted average of the percentage values (obtained in
        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
        scaling factor
@@ -245,9 +243,7 @@ CAPTCHA
            completed using this exit relay and have `is_captcha_found` field
            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
+            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
-            exit relay exists in the consensus but there are no corresponding
-            measurements)
        3. Calculate the weighted average of the percentage values (obtained in
        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
        scaling factor
@@ -437,7 +433,7 @@ Understanding the effect of using older or younger exit relays
    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
    6. Calculate the age of the exit relays in days using the `first_seen` field
    of the "details document" and `valid-after` timestamp of the consensus
-    (`exit_age` = ceil_days(`valid-after` - `first_seen`))
+    `exit_age = ceil_days(valid-after - first_seen)`
    7. Distribute the exit relay entries from the consensus into
    `(max(exit_age) - min(exit_age)) / 365` bins based on their ages (calculated in Step 2.6)
    8. Repeat the following for each bin:
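
The age calculation and binning in Steps 2.6 and 2.7 could be sketched roughly as follows. This is only an illustration: the function names `exit_age_days` and `bin_relays_by_age` are hypothetical, and the choices of 365-day-wide bins starting at the youngest relay, of rounding the bin count up, and of placing the oldest relay in the last bin are assumptions that the steps above do not spell out.

```python
import math
from datetime import datetime
from typing import Dict, List


def exit_age_days(valid_after: datetime, first_seen: datetime) -> int:
    """exit_age = ceil_days(valid-after - first_seen)."""
    seconds = (valid_after - first_seen).total_seconds()
    return math.ceil(seconds / 86400)


def bin_relays_by_age(ages: Dict[str, int]) -> Dict[int, List[str]]:
    """Group relay fingerprints into year-wide bins by age in days.

    Assumption: the bin count is (max - min) / 365 rounded up, with at
    least one bin, and the oldest relay is clamped into the last bin.
    """
    min_age = min(ages.values())
    max_age = max(ages.values())
    bin_count = max(1, math.ceil((max_age - min_age) / 365))
    bins: Dict[int, List[str]] = {i: [] for i in range(bin_count)}
    for fingerprint, age in ages.items():
        index = min((age - min_age) // 365, bin_count - 1)
        bins[index].append(fingerprint)
    return bins
```

For example, a relay first seen 23 hours before the consensus `valid-after` time gets an age of 1 day, because partial days are rounded up.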
@@ -525,7 +521,7 @@ Cloudflare's blocking practices?
 <!-- ####################################################################### -->
 <!-- ####################################################################### -->
-# Graphs about understanding the Cloudflare firewall
+# Graphs for understanding the Cloudflare firewall
 ## CAPTCHA rate by Cloudflare security level/firewall settings
 ### Purpose
 Understanding the effect of different Cloudflare security levels and firewall
@@ -553,7 +549,7 @@ We have a few different domains to test different configurations. Here they are:
 0. Determine a date range and granularity to plot. Here, we will plot last 30 days
 with a granularity of 1 hour.
 1. Use CAPTCHA Monitor API to get measurements that were *completed
-using domains specified above* and during the chosen date range and
+using only domains specified above* and during the chosen date range and
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
    1. Distribute the measurements that were completed within the interval of
@@ -594,7 +590,7 @@ experiments. This list contains the metadata about the URLs.
 3. Join the measurements and URL list using the `URL` fields. Typically each
 URL maps to multiple measurements.
 4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-fields
+field
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
    1. Distribute the measurements that were completed within the interval of
@@ -639,7 +635,7 @@ different treatment for older relays
    6. Join the measurements and URL list using the `URL` fields. Typically each
    URL maps to multiple measurements.
    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-    fields
+    field
    8. Obtain the "details document" from Onionoo and match the Onionoo data
    with the relay entries from consensus using the relay fingerprints. The following query is
    recommended for obtaining the "details document":
@@ -703,13 +699,13 @@ certain countries
    6. Join the measurements and URL list using the `URL` fields. Typically each
    URL maps to multiple measurements.
    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-    fields
+    field
    8. Obtain the "details document" from Onionoo and match the Onionoo data
    with the relay entries from consensus using the relay fingerprints. The following query is
    recommended for obtaining the "details document":
    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
    9. Distribute the exit relay entries from the consensus into bins based on
-    their `country_name` value (obtained in Step 2.5)
+    their `country_name` value (obtained in Step 2.8)
    10. Repeat the following for each bin:
        1. Repeat the following for each exit relay in the bin:
            1. Count the total number of measurements that were completed using
@@ -746,7 +742,7 @@ experiments. This list contains the metadata about the URLs.
 3. Join the measurements and URL list using the `URL` fields. Typically each
 URL maps to multiple measurements.
 4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-fields
+field
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
    1. Distribute the measurements that were completed within the
@@ -801,9 +797,7 @@ probability of seeing a CAPTCHA
            completed using this exit relay and have `is_captcha_found` field
            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
+            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
-            exit relay exists in the consensus but there are no corresponding
-            measurements)
        3. Calculate the weighted average of the percentage values (obtained in
        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
        scaling factor
@@ -852,9 +846,7 @@ Understanding the effect of using Tor Browser at different security levels
            completed using this exit relay and have `is_captcha_found` field
            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
+            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
-            exit relay exists in the consensus but there are no corresponding
-            measurements)
        3. Calculate the weighted average of the percentage values (obtained in
        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
        scaling factor
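
The per-relay percentage and the weighted average described in the steps above could be sketched as follows. This is a minimal illustration and not the dashboard's actual code: the function name `weighted_captcha_rate` and the input shapes are hypothetical. Because this change removes the "assume `0%`" rule for relays without measurements, the sketch assumes such relays are skipped and the remaining weights are renormalized; the document does not state what should happen instead.

```python
from typing import Dict, Optional, Tuple


def weighted_captcha_rate(
    counts: Dict[str, Tuple[int, int]],
    exit_probabilities: Dict[str, float],
) -> Optional[float]:
    """Return the CAPTCHA percentage weighted by exit probability.

    counts maps a relay fingerprint to (total measurements,
    measurements with is_captcha_found set to 1).
    Returns None when no relay has any measurements.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for fingerprint, probability in exit_probabilities.items():
        total, with_captcha = counts.get(fingerprint, (0, 0))
        if total == 0:
            # Assumption: relays without measurements are skipped
            # instead of being counted as 0%.
            continue
        percentage = with_captcha / total * 100
        weighted_sum += percentage * probability
        total_weight += probability
    if total_weight == 0:
        return None
    # Dividing by the summed weights renormalizes over measured relays.
    return weighted_sum / total_weight
```

With two measured relays at 50% and 25% and exit probabilities of 0.6 and 0.2, the result is (50 × 0.6 + 25 × 0.2) / 0.8 = 43.75, regardless of any unmeasured relays in the consensus.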