Changes

Barkin Simsek · 4242452b
--- a/Dashboard-Graphs.md
+++ b/Dashboard-Graphs.md
@@ -51,225 +51,7 @@ The following graph style will be used for all graphs unless otherwise specified
 * Sample Graph (Number of data points is reduced for simplicity)
 ![graph-style](uploads/e62c2716de6cd64e3a6bf949d1bd0726/graph-style.png)

-# Graphs for understanding CAPTCHA rates related to user decisions
-## Weighted CAPTCHA rate by method
-### Purpose
-Understanding the effect of using different methods (for example using
-web browsers like Tor Browser, Firefox over Tor, Brave's Tor Tabs, etc.) on the
-probability of seeing a CAPTCHA
-
-### Steps to produce
-1. Get consensuses from CollecTor
-2. Repeat the following for each consensus:
-    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
-    consensus header and `bandwidth-weights` values from the footer
-    2. Repeat the following for each *running exit relay* entry within the consensus:
-        1. Parse the `r` line and memorize the IPv4 address and identity
-        2. Parse the `w` line and memorize the bandwidth
-        3. Parse the `s` line and memorize the relay flags
-    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
-    from the consensus, `bandwidth` values, and `flags` for each exit relay
-    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
-    4. Use CAPTCHA Monitor API to get measurements that were completed
-    using Tor and between the `valid-after` & `fresh-until` timestamps of the
-    consensus
-    5. Join the measurements and relay data using the relay fingerprints.
-    Typically each relay maps to multiple measurements.
-    6. Distribute the joined data into bins based on `method` field's value
-    7. Repeat the following for each bin:
-        1. Further bin the measurements into sub-bins based on the exit relay used
-        to perform the measurement
-        2. Repeat the following for each exit relay in each sub-bin:
-            1. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay
-            2. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay and have `is_captcha_found` field
-            set to `1`
-            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
-        3. Calculate the weighted average of the percentage values (obtained in
-        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
-        scaling factor
-    8. Plot the weighted percentage values for each `method` bin in the Y-axis and
-    the `valid-after` timestamp of the consensus in the X-axis
-3. Merge the graphs created for each consensus
-
-### Related questions
- [(2)](home#metrics-to-track) How does the HTTP request headers affect
-Cloudflare's decision-making mechanism? [ticket:33010#comment:4]
-    - [(2.1)](home#metrics-to-track) Is there a difference between using the
-    actual Tor Browser itself and tor-browser-selenium in terms of the HTTP headers?
-    - [(2.2)](home#metrics-to-track) How does Cloudflare react differently if the
-    browser doesn't support alt-svc headers? [ticket:32915]
- [(3)](home#metrics-to-track) How do different browsers with different
-User Agents get affected? [ticket:33010#comment:2], [ticket:32924], [ticket:31404]
-    - [(3.1)](home#metrics-to-track) Is there a difference between using a web
-    browser or fetching web pages via cURL or other HTTP libraries?
- [(7)](home#metrics-to-track) How does the time of the day affect the
-Cloudflare's blocking mechanism? Does it matter the day of the week or the time
-of the day? [ticket:33010#comment:15]
- [(15)](home#metrics-to-track) If browsers that should not face CAPTCHA face
-CAPTCHA, why does this happen?
- [(16)](home#metrics-to-track) How do the observed patterns in the results
-change over time? [ticket:33010]
-
-<!-- ####################################################################### -->
-
-## Weighted CAPTCHA rate by connection security
-### Purpose
-Understanding the effect of using TLS and not using TLS on the probability
-of seeing a CAPTCHA
-
-### Steps to produce
-1. Get consensuses from CollecTor
-2. Repeat the following for each consensus:
-    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
-    consensus header and `bandwidth-weights` values from the footer
-    2. Repeat the following for each *running exit relay* entry within the consensus:
-        1. Parse the `r` line and memorize the IPv4 address and identity
-        2. Parse the `w` line and memorize the bandwidth
-        3. Parse the `s` line and memorize the relay flags
-    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
-    from the consensus, `bandwidth` values, and `flags` for each exit relay
-    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
-    4. Use CAPTCHA Monitor API to get measurements that were completed
-    using Tor and between the `valid-after` & `fresh-until` timestamps of the
-    consensus
-    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
-    experiments. This list contains the metadata about the URLs.
-    6. Join the measurements, URL list, and relay data using the relay
-    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
-    7. Distribute the joined data into 2 bins based on whether the
-    `is_https` field of each entry is `1` or `0`
-    8. Repeat the following for each bin:
-        1. Further bin the measurements into sub-bins based on the exit relay used
-        to perform the measurement
-        2. Repeat the following for each exit relay in each sub-bin:
-            1. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay
-            2. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay and have `is_captcha_found` field
-            set to `1`
-            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
-        3. Calculate the weighted average of the percentage values (obtained in
-        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
-        scaling factor
-    9. Plot the weighted percentage values for each bin in the Y-axis and
-    the `valid-after` timestamp of the consensus in the X-axis
-3. Merge the graphs created for each consensus
-
-### Related questions
- [(14)](home#metrics-to-track) Is there a difference if the origin server has
-an SSL certificate or not?
-    - [(14.1)](home#metrics-to-track) Does the blocking change if the SSL
-    certificate is issued by Cloudflare or by another entity?
-
-<!-- ####################################################################### -->
-
-## Weighted CAPTCHA rate by HTTP request quantity
-### Purpose
-Understanding the effect of connecting to websites that require single or
-multiple HTTP requests to load on the probability of seeing a CAPTCHA
-
-### Steps to produce
-1. Get consensuses from CollecTor
-2. Repeat the following for each consensus:
-    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
-    consensus header and `bandwidth-weights` values from the footer
-    2. Repeat the following for each *running exit relay* entry within the consensus:
-        1. Parse the `r` line and memorize the IPv4 address and identity
-        2. Parse the `w` line and memorize the bandwidth
-        3. Parse the `s` line and memorize the relay flags
-    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
-    from the consensus, `bandwidth` values, and `flags` for each exit relay
-    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
-    4. Use CAPTCHA Monitor API to get measurements that were completed
-    using Tor and between the `valid-after` & `fresh-until` timestamps of the
-    consensus
-    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
-    experiments. This list contains the metadata about the URLs.
-    6. Join the measurements, URL list, and relay data using the relay
-    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
-    7. Distribute the joined data into 2 bins based on whether the
-    `requires_multiple_reqs` field of each entry is `1` or `0`
-    8. Repeat the following for each bin:
-        1. Further bin the measurements into sub-bins based on the exit relay used
-        to perform the measurement
-        2. Repeat the following for each exit relay in each sub-bin:
-            1. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay
-            2. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay and have `is_captcha_found` field
-            set to `1`
-            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
-        3. Calculate the weighted average of the percentage values (obtained in
-        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
-        scaling factor
-    9. Plot the weighted percentage values for each bin in the Y-axis and
-    the `valid-after` timestamp of the consensus in the X-axis
-3. Merge the graphs created for each consensus
-
-### Related questions
- [(13)](home#metrics-to-track) Is there a difference between websites that load
-resources from third-party resources and websites that contain all resources on
-the origin server? [ticket:33010#comment:6]
-    - [(13.1)](home#metrics-to-track) How do users of websites get affected if
-    the main website is not fronted by Cloudflare, but some of the resources are
-    fetched from a Cloudflare fronted web server? [ticket:33010#comment:6], [ticket:15450]
-
-
-<!-- ####################################################################### -->
-
-## Weighted CAPTCHA rate by CDN provider
-### Purpose
-Understanding the effect of connecting to websites that use CDN providers such
-as Cloudflare, Akamai, Amazon Cloudfront, etc. on the probability of seeing a
-CAPTCHA
-
-### Steps to produce
-1. Get consensuses from CollecTor
-2. Repeat the following for each consensus:
-    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
-    consensus header and `bandwidth-weights` values from the footer
-    2. Repeat the following for each *running exit relay* entry within the consensus:
-        1. Parse the `r` line and memorize the IPv4 address and identity
-        2. Parse the `w` line and memorize the bandwidth
-        3. Parse the `s` line and memorize the relay flags
-    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
-    from the consensus, `bandwidth` values, and `flags` for each exit relay
-    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
-    4. Use CAPTCHA Monitor API to get measurements that were completed
-    using Tor and between the `valid-after` & `fresh-until` timestamps of the
-    consensus
-    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
-    experiments. This list contains the metadata about the URLs.
-    6. Join the measurements, URL list, and relay data using the relay
-    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
-    7. Distribute the joined data into bins based on `cdn_provider` field's value
-    8. Repeat the following for each bin:
-        1. Further bin the measurements into sub-bins based on the exit relay used
-        to perform the measurement
-        2. Repeat the following for each exit relay in each sub-bin:
-            1. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay
-            2. Count the total number of measurements in this sub-bin that were
-            completed using this exit relay and have `is_captcha_found` field
-            set to `1`
-            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$
-        3. Calculate the weighted average of the percentage values (obtained in
-        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
-        scaling factor
-    9. Plot the weighted percentage values for each bin in the Y-axis and
-    the `valid-after` timestamp of the consensus in the X-axis
-3. Merge the graphs created for each consensus
-
-<!-- ####################################################################### -->
-<!-- ####################################################################### -->
-
-# Graphs for understanding the overall network status
+# Graphs for understanding the overall network status (by CDN)
 ## Probability of a Tor client receiving CAPTCHA
 ### Purpose
 Understanding the probability of a Tor client choosing an exit relay in the normal
@@ -534,41 +316,29 @@ Cloudflare's blocking practices?
    blocked?

 <!-- ####################################################################### -->
-<!-- ####################################################################### -->

-# Graphs for understanding the Cloudflare firewall
-## CAPTCHA rate by Cloudflare security level/firewall settings
+## CAPTCHA rate by traffic origin (Tor traffic vs Non-Tor traffic)
 ### Purpose
-Understanding the effect of different Cloudflare security levels and firewall
-configurations on the probability of seeing a CAPTCHA.
-
-We have a few different domains to test different configurations. Here they are:
- captcha.wtf
-    - IPv4 only domain, no additional Cloudflare firewall rules
- yearlight.buzz
-    - IPv4 only domain, Cloudflare firewall is set to present "JS Challenge" for
-    traffic originating from the Tor network
- bottomlesspit.xyz
-    - IPv4 only domain, Cloudflare firewall is set to present "CAPTCHA Challenge" for
-    traffic originating from the Tor network
- broccolipizza.monster
-    - IPv4 only domain, Cloudflare firewall is set to block all traffic
-    originating from the Tor network
- exit11.online
-    - IPv6 only domain, no additional Cloudflare firewall rules
- icanhazcaptcha.xyz
-    - IPv6 only domain, Cloudflare firewall is set to present "CAPTCHA Challenge" for
-    traffic originating from the Tor network
+Understanding how Cloudflare treats Tor traffic vs. non-Tor traffic (this states
+the obvious, but it is still good to have data to back it up)

 ### Steps to produce
 0. Determine a date range and granularity to plot. Here, we will plot the last 30
 days with a granularity of 1 hour.
-1. Use CAPTCHA Monitor API to get measurements that were *completed
-using only domains specified above* and during the chosen date range and
+1. Use CAPTCHA Monitor API to get measurements that were completed during the
+chosen date range
+2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+experiments. This list contains the metadata about the URLs.
+3. Join the measurements and URL list using the `URL` fields. Typically each
+URL maps to multiple measurements.
+4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
+field
 5. Iterate over the chosen date range with the chosen time intervals. Repeat
 the following for each iteration:
    1. Distribute the measurements that were completed within the interval of
-    this iteration into bins based on `url` field's value
+    this iteration into 2 bins based on the `method` field's value. Put the methods
+    without "tor" in their name (e.g. "firefox") into the `Non-Tor Traffic` bin and
+    the rest (e.g. "firefox_over_tor") into the `Tor Traffic` bin.
    2. Repeat the following for each bin:
        1. Count the total number of measurements in this bin
        2. Count the total number of measurements in this bin that have
@@ -580,55 +350,68 @@ the following for each iteration:
    time of this interval in the X-axis
 6. Merge the graphs created for each iteration
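
The per-interval binning described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the CAPTCHA Monitor implementation: the `timestamp` field name, the helper names, and the in-memory list of measurement dicts are assumptions, and the measurements are assumed to be already filtered down to Cloudflare-fronted URLs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

BINS = ("Tor Traffic", "Non-Tor Traffic")


def traffic_bin(method):
    """Methods with "tor" in their name go to the Tor bin, the rest to non-Tor."""
    return "Tor Traffic" if "tor" in method.lower() else "Non-Tor Traffic"


def captcha_rate_by_origin(measurements, start, end, step=timedelta(hours=1)):
    """Return {interval_start: {bin: percentage or None}} for [start, end)."""
    # counts[interval_index][bin] = [total, with_captcha]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for m in measurements:
        ts = m["timestamp"]  # hypothetical field name for the completion time
        if not (start <= ts < end):
            continue
        counter = counts[(ts - start) // step][traffic_bin(m["method"])]
        counter[0] += 1
        if m["is_captcha_found"] == 1:
            counter[1] += 1

    result = {}
    index, interval_start = 0, start
    while interval_start < end:
        row = {}
        for name in BINS:
            total, hits = counts[index][name]
            # Leave the value empty (None) when the bin has no measurements
            row[name] = hits / total * 100 if total else None
        result[interval_start] = row
        index += 1
        interval_start += step
    return result
```

Each key of the returned dictionary is the beginning of an interval (the X-axis value), and each inner value is the percentage to plot for that bin.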

-### Related questions
-<!-- - [(3.4)](home#metrics-to-track) How does Cloudflare react to browsers with
-and without JavaScript enabled? [ticket:31404] -->
- [(6)](home#metrics-to-track) How do different security levels of Cloudflare
-affect the blocking mechanism? [ticket:33010#comment:5]
-    - [(6.1)](home#metrics-to-track) Do some of the Cloudflare security levels
-    block users immediately without presenting a CAPTCHA challenge at all?

 <!-- ####################################################################### -->
+<!-- ####################################################################### -->
+

-## CAPTCHA rate by traffic origin
+# Graphs for understanding CAPTCHA rates related to website decisions
+## Weighted CAPTCHA rate by connection security
 ### Purpose
-Understanding how Cloudflare treats to Tor traffic vs. non-Tor traffic (this one
-is stating the obvious but still good to have data to back up the obvious)
+Understanding the effect of using TLS and not using TLS on the probability
+of seeing a CAPTCHA

 ### Steps to produce
-0. Determine a date range and granularity to plot. Here, we will plot last 30 days
-with a granularity of 1 hour.
-1. Use CAPTCHA Monitor API to get measurements that were completed during the
-chosen date range
-2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
    experiments. This list contains the metadata about the URLs.
-3. Join the measurements and URL list using the `URL` fields. Typically each
-URL maps to multiple measurements.
-4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-field
-5. Iterate over the chosen date range with the chosen time intervals. Repeat
-the following for each iteration:
-    1. Distribute the measurements that were completed within the interval of
-    this iteration into 2 bins based on `method` field's value. Put the methods
-    without "tor" (ex. "firefox") into the `Non-Tor Traffic` bin and the rest
-    (ex. "firefox_over_tor") into the `Tor Traffic` bin.
-    2. Repeat the following for each bin:
-        1. Count the total number of measurements in this bin
-        2. Count the total number of measurements in this bin that have
-        `is_captcha_found` field set to `1`
+    6. Join the measurements, URL list, and relay data using the relay
+    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
+    7. Distribute the joined data into 2 bins based on whether the
+    `is_https` field of each entry is `1` or `0`
+    8. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-        $`\frac{Step 5.2.2}{Step 5.2.1} \times 100`$ (Leave this bin's value
-        empty if there are no corresponding measurements)
-    3. Plot the percentage values for each bin in the Y-axis and the beginning
-    time of this interval in the X-axis
-5. Merge the graphs created for each iteration
+            $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    9. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
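
As a rough illustration of the weighted average in the steps above, the sketch below computes one bin's value for a single consensus. The `fingerprint` field name and the function name are assumptions. Normalizing only over relays that have at least one measurement is also an assumption, since the steps do not say how relays without measurements should be treated.

```python
from collections import defaultdict


def weighted_captcha_rate(measurements, exit_probability):
    """Weighted average CAPTCHA percentage for one bin of one consensus.

    measurements: dicts with "fingerprint" (assumed name) and "is_captcha_found"
    exit_probability: {fingerprint: exit probability from the consensus}
    """
    # Sub-bin by exit relay: fingerprint -> [total, with_captcha]
    per_relay = defaultdict(lambda: [0, 0])
    for m in measurements:
        counter = per_relay[m["fingerprint"]]
        counter[0] += 1
        if m["is_captcha_found"] == 1:
            counter[1] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for fingerprint, (total, hits) in per_relay.items():
        weight = exit_probability.get(fingerprint, 0.0)
        weighted_sum += weight * (hits / total * 100)
        weight_total += weight

    # No usable measurements: leave the data point empty
    return weighted_sum / weight_total if weight_total else None
```

Calling this once per bin and per consensus gives the Y-axis values, with the consensus `valid-after` timestamp as the X-axis value.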
+
+### Related questions
+- [(14)](home#metrics-to-track) Is there a difference if the origin server has
+an SSL certificate or not?
+    - [(14.1)](home#metrics-to-track) Does the blocking change if the SSL
+    certificate is issued by Cloudflare or by another entity?

 <!-- ####################################################################### -->

-## Weighted CAPTCHA rate by exit relay age
+## Weighted CAPTCHA rate by HTTP request quantity
 ### Purpose
-Understanding how quickly Cloudflare blocks the newer relays and if there is a
-different treatment for older relays
+Understanding the effect of connecting to websites that require single or
+multiple HTTP requests to load on the probability of seeing a CAPTCHA

 ### Steps to produce
 1. Get consensuses from CollecTor
@@ -647,52 +430,44 @@ different treatment for older relays
    consensus
    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
    experiments. This list contains the metadata about the URLs.
-    6. Join the measurements and URL list using the `URL` fields. Typically each
-    URL maps to multiple measurements.
-    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-    field
-    8. Obtain the "details document" from Onionoo and match the Onionoo data
-    with the relay entries from consensus using the relay fingerprints. The following query is
-    recommended for obtaining the "details document":
-    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
-    9. Calculate the age of the exit relays in days using the `first_seen` field
-    of the "details document" and `valid-after` timestamp of the consensus
-    (`exit_age` = ceil_days(`valid-after` - `first_seen`))
-    10. Distribute the exit relay entries from the consensus into
-    `(max(exit_age) - min(exit_age)) / 365` bins based on their ages
-    (calculated in Step 2.9)
-    11. Repeat the following for each bin:
-        1. Repeat the following for each exit relay in the bin:
-            1. Count the total number of measurements that were
+    6. Join the measurements, URL list, and relay data using the relay
+    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
+    7. Distribute the joined data into 2 bins based on whether the
+    `requires_multiple_reqs` field of each entry is `1` or `0`
+    8. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
            completed using this exit relay
-            2. Count the total number of measurements that were
+            2. Count the total number of measurements in this sub-bin that were
            completed using this exit relay and have `is_captcha_found` field
            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.8.1.2}{Step 2.8.1.1} \times 100`$ (Assume `0%` if an
-            exit relay exists in the consensus but there are no corresponding
-            measurements)
-        2. Calculate the weighted average of the percentage values (obtained in
-        Step 2.8.1.3) using exit probabilities (obtained in Step 2.3) as the
+            $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
        scaling factor
-    7. Plot the weighted percentage values for each bin in the Y-axis and
+    9. Plot the weighted percentage values for each bin in the Y-axis and
    the `valid-after` timestamp of the consensus in the X-axis
 3. Merge the graphs created for each consensus
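
The exit probabilities referenced as Step 2.3 could be derived along these lines. This is a simplified sketch of how exit weighting is commonly described (apply `Wed` to relays with both `Guard` and `Exit`, `Wee` to exit-only relays, and zero otherwise); it is not a copy of the linked Onionoo code, so verify it against that source before relying on it. The relay dict layout and the function name are assumptions.

```python
def exit_probabilities(relays, bandwidth_weights):
    """Return {fingerprint: exit probability} for one consensus.

    relays: dicts with "fingerprint", "bandwidth" and "flags" (a set of strings)
    bandwidth_weights: e.g. {"Wee": 10000, "Wed": 5000}; values are scaled by 10000
    """
    wee = bandwidth_weights["Wee"] / 10000
    wed = bandwidth_weights["Wed"] / 10000

    weights = {}
    for relay in relays:
        flags = relay["flags"]
        if "Exit" not in flags or "BadExit" in flags:
            weight = 0.0  # never chosen for the exit position
        elif "Guard" in flags:
            weight = relay["bandwidth"] * wed
        else:
            weight = relay["bandwidth"] * wee
        weights[relay["fingerprint"]] = weight

    total = sum(weights.values())
    return {fp: (w / total if total else 0.0) for fp, w in weights.items()}
```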

 ### Related questions
- [(8)](home#metrics-to-track) How often does Cloudflare's blocking mechanism
-change/update itself?
- [(10)](home#metrics-to-track) How well does Cloudflare keep track of the new
-or old Tor exit nodes?
- [(10.1)](home#metrics-to-track) How frequently Cloudflare updates its Tor exit
-node list?
+- [(13)](home#metrics-to-track) Is there a difference between websites that load
+resources from third-party sources and websites that contain all resources on
+the origin server? [ticket:33010#comment:6]
+    - [(13.1)](home#metrics-to-track) How are users of a website affected if
+    the main website is not fronted by Cloudflare, but some of the resources are
+    fetched from a Cloudflare-fronted web server? [ticket:33010#comment:6], [ticket:15450]
+

 <!-- ####################################################################### -->

-## Weighted CAPTCHA rate by exit relay location
+## Weighted CAPTCHA rate by CDN provider
 ### Purpose
-Understanding if Cloudflare prefers to block requests more from exit relays in
-certain countries
+Understanding the effect of connecting to websites that use CDN providers such
+as Cloudflare, Akamai, Amazon CloudFront, etc. on the probability of seeing a
+CAPTCHA

 ### Steps to produce
 1. Get consensuses from CollecTor
@@ -711,73 +486,96 @@ certain countries
    consensus
    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
    experiments. This list contains the metadata about the URLs.
-    6. Join the measurements and URL list using the `URL` fields. Typically each
-    URL maps to multiple measurements.
-    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-    field
-    8. Obtain the "details document" from Onionoo and match the Onionoo data
-    with the relay entries from consensus using the relay fingerprints. The following query is
-    recommended for obtaining the "details document":
-    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
-    9. Distribute the exit relay entries from the consensus into bins based on
-    their `country_name` value (obtained in Step 2.8)
-    10. Repeat the following for each bin:
-        1. Repeat the following for each exit relay in the bin:
-            1. Count the total number of measurements that were completed using
-            this exit relay
-            2. Count the total number of measurements that were completed using
-            this exit relay and have `is_captcha_found` field set to `1`
+    6. Join the measurements, URL list, and relay data using the relay
+    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
+    7. Distribute the joined data into bins based on `cdn_provider` field's value
+    8. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-            $`\frac{Step 2.10.1.2}{Step 2.10.1.1} \times 100`$ (Assume `0%` if an
-            exit relay exists in the consensus but there are no corresponding
-            measurements)
-        2. Calculate the weighted average of the percentage values (obtained in
-        Step 2.10.1.3) using exit probabilities (obtained in Step 2.3) as the
+            $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
        scaling factor
-    7. Plot the weighted percentage values for each bin in the Y-axis and
+    9. Plot the weighted percentage values for each bin in the Y-axis and
    the `valid-after` timestamp of the consensus in the X-axis
-3. Merge the graphs with top 10 highest percentage values and discard the rest
-(or keep if you want to have them as well)
+3. Merge the graphs created for each consensus
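
The relay parsing in Step 2.2 might look like the sketch below, assuming the standard (non-microdesc) consensus layout in which the `r` line carries nickname, identity, digest, publication date and time, IPv4 address, ORPort and DirPort. The function name and the returned dict layout are hypothetical; a maintained parser such as Stem may be a better choice in practice.

```python
import base64


def parse_running_exit_relays(consensus_text):
    """Extract fingerprint, IPv4 address, bandwidth and flags of running exits."""
    relays = []
    current = None
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            parts = line.split()
            identity = parts[2]
            # The identity is base64 without padding; restore it before decoding
            padded = identity + "=" * (-len(identity) % 4)
            current = {
                "fingerprint": base64.b64decode(padded).hex().upper(),
                "address": parts[6],
                "bandwidth": 0,
                "flags": set(),
            }
            relays.append(current)
        elif line.startswith("s ") and current is not None:
            current["flags"] = set(line.split()[1:])
        elif line.startswith("w ") and current is not None:
            for item in line.split()[1:]:
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    current["bandwidth"] = int(value)
    # Keep only relays that are both running and flagged as exits
    return [r for r in relays if {"Running", "Exit"} <= r["flags"]]
```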

+
+<!-- ####################################################################### -->
 <!-- ####################################################################### -->

-## Code injection rate
+
+# Graphs for understanding CAPTCHA rates related to user decisions
+## Weighted CAPTCHA rate by method
 ### Purpose
-Cloudflare sometimes injects third-party code to the websites without letting the
-users know. This graph aims to visualize the percentage of measurements were
-affected by third-party code injection over time.
+Understanding the effect of using different methods (for example using
+web browsers like Tor Browser, Firefox over Tor, Brave's Tor Tabs, etc.) on the
+probability of seeing a CAPTCHA

 ### Steps to produce
-0. Determine a date range and granularity to plot. Here, we will plot last 30 days
-with a granularity of 1 hour.
-1. Use CAPTCHA Monitor API to get measurements that were during between the
-chosen date range
-2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
-experiments. This list contains the metadata about the URLs.
-3. Join the measurements and URL list using the `URL` fields. Typically each
-URL maps to multiple measurements.
-4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
-field
-5. Iterate over the chosen date range with the chosen time intervals. Repeat
-the following for each iteration:
-    1. Distribute the measurements that were completed within the
-    interval of this iteration into 2 bins based on `is_data_modified` field's
-    value. Skip the measurements that do not have `is_data_modified` field.
-    2. Repeat the following for each bin:
-        1. Count the total number of measurements in this bin
-        2. Count the total number of measurements in this bin that have
-        `is_captcha_found` field set to `1`
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Join the measurements and relay data using the relay fingerprints.
+    Typically each relay maps to multiple measurements.
+    6. Distribute the joined data into bins based on the `method` field's value
+    7. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
            3. Calculate the percentage of measurements that received CAPTCHA using
-        $`\frac{Step 5.2.2}{Step 5.2.1} \times 100`$ (Leave this bin's value
-        empty if there are no corresponding measurements)
-    3. Plot the percentage values for each bin in the Y-axis and the beginning
-    time of this interval in the X-axis
-5. Merge the graphs created for each iteration
+            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    8. Plot the weighted percentage values for each `method` bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
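
The per-consensus weighting in Steps 2.5 to 2.7 can be sketched roughly as follows. This is a minimal sketch rather than the project's implementation: the measurement keys `exit_fingerprint`, `method`, and `is_captcha_found` are assumed names for the joined records, `exit_probabilities` stands for the output of Step 2.3, and the weighted average is normalized by the exit probabilities of the relays that actually have measurements, which is one possible reading of "scaling factor".

```python
from collections import defaultdict


def weighted_captcha_rate_by_method(measurements, exit_probabilities):
    """Return {method: weighted CAPTCHA percentage} for a single consensus.

    measurements: iterable of dicts with the assumed keys 'method',
        'exit_fingerprint' and 'is_captcha_found' (0 or 1).
    exit_probabilities: {fingerprint: exit probability} from Step 2.3.
    """
    # counts[method][fingerprint] = [total measurements, measurements with CAPTCHA]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for m in measurements:
        fingerprint = m["exit_fingerprint"]
        if fingerprint not in exit_probabilities:
            continue  # cannot be joined to a relay in this consensus
        cell = counts[m["method"]][fingerprint]
        cell[0] += 1
        if m["is_captcha_found"] == 1:
            cell[1] += 1

    result = {}
    for method, per_relay in counts.items():
        weighted_sum = 0.0
        weight_total = 0.0
        for fingerprint, (total, with_captcha) in per_relay.items():
            percentage = with_captcha / total * 100  # Step 2.7.2.3
            weight = exit_probabilities[fingerprint]
            weighted_sum += weight * percentage
            weight_total += weight
        if weight_total > 0:
            # Assumption: normalize by the weights of the measured relays only
            result[method] = weighted_sum / weight_total
    return result
```

Calling this once per consensus and pairing each result with the `valid-after` timestamp yields the points plotted in Step 2.8.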
+
+### Related questions
+- [(2)](home#metrics-to-track) How do the HTTP request headers affect
+Cloudflare's decision-making mechanism? [ticket:33010#comment:4]
+    - [(2.1)](home#metrics-to-track) Is there a difference between using the
+    actual Tor Browser itself and tor-browser-selenium in terms of the HTTP headers?
+    - [(2.2)](home#metrics-to-track) How does Cloudflare react differently if the
+    browser doesn't support alt-svc headers? [ticket:32915]
+- [(3)](home#metrics-to-track) How do different browsers with different
+User Agents get affected? [ticket:33010#comment:2], [ticket:32924], [ticket:31404]
+    - [(3.1)](home#metrics-to-track) Is there a difference between using a web
+    browser or fetching web pages via cURL or other HTTP libraries?
+- [(7)](home#metrics-to-track) How does the time of the day affect
+Cloudflare's blocking mechanism? Does the day of the week or the time
+of the day matter? [ticket:33010#comment:15]
+- [(15)](home#metrics-to-track) If browsers that should not face CAPTCHA face
+CAPTCHA, why does this happen?
+- [(16)](home#metrics-to-track) How do the observed patterns in the results
+change over time? [ticket:33010]

-<!-- ####################################################################### -->
 <!-- ####################################################################### -->

-# Graphs about Tor Browser centric data
 ## Weighted CAPTCHA rate by Tor Browser version
 ### Purpose
 Understanding the effect of using different Tor Browser versions on the
@@ -873,9 +671,103 @@ Understanding the effect of using Tor Browser at different security levels
 - [(3.3)](home#metrics-to-track) What about the different security levels of Tor
 Browser?

+
 <!-- ####################################################################### -->
 <!-- ####################################################################### -->

+
+# Graphs for understanding the Cloudflare firewall
+## CAPTCHA rate by Cloudflare security level/firewall settings
+### Purpose
+Understanding the effect of different Cloudflare security levels and firewall
+configurations on the probability of seeing a CAPTCHA.
+
+We have a few different domains to test different configurations. Here they are:
+- captcha.wtf
+    - IPv4 only domain, no additional Cloudflare firewall rules
+- yearlight.buzz
+    - IPv4 only domain, Cloudflare firewall is set to present "JS Challenge" for
+    traffic originating from the Tor network
+- bottomlesspit.xyz
+    - IPv4 only domain, Cloudflare firewall is set to present "CAPTCHA Challenge" for
+    traffic originating from the Tor network
+- broccolipizza.monster
+    - IPv4 only domain, Cloudflare firewall is set to block all traffic
+    originating from the Tor network
+- exit11.online
+    - IPv6 only domain, no additional Cloudflare firewall rules
+- icanhazcaptcha.xyz
+    - IPv6 only domain, Cloudflare firewall is set to present "CAPTCHA Challenge" for
+    traffic originating from the Tor network
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were *completed
+using only the domains specified above* and during the chosen date range
+2. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Distribute the measurements that were completed within the interval of
+    this iteration into bins based on the `url` field's value
+    2. Repeat the following for each bin:
+        1. Count the total number of measurements in this bin
+        2. Count the total number of measurements in this bin that have
+        `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 2.2.2}{Step 2.2.1} \times 100`$ (Leave this bin's value
+        empty if there are no corresponding measurements)
+    3. Plot the percentage values for each bin in the Y-axis and the beginning
+    time of this interval in the X-axis
+3. Merge the graphs created for each iteration
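
As an illustration, the per-interval binning described above could look like the sketch below. It assumes each measurement is a dict with a `timestamp` (a `datetime`), a `url`, and `is_captcha_found`. The `timestamp` key is a hypothetical name, and the API's actual field names may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def captcha_rate_per_interval(measurements, start, end, step=timedelta(hours=1)):
    """Return a list of (interval_start, {url: CAPTCHA percentage}) tuples.

    A url with no measurements in an interval is simply absent from that
    interval's dict, which corresponds to leaving the bin's value empty.
    """
    series = []
    interval_start = start
    while interval_start < end:
        interval_end = interval_start + step
        # counts[url] = [total measurements, measurements with CAPTCHA]
        counts = defaultdict(lambda: [0, 0])
        for m in measurements:
            if interval_start <= m["timestamp"] < interval_end:
                cell = counts[m["url"]]
                cell[0] += 1
                if m["is_captcha_found"] == 1:
                    cell[1] += 1
        rates = {url: captcha / total * 100 for url, (total, captcha) in counts.items()}
        series.append((interval_start, rates))
        interval_start = interval_end
    return series
```

Each tuple gives one X-axis position (the beginning of the interval) and one Y value per domain. Scanning all measurements for every interval keeps the sketch short, but sorting by timestamp first would scale better over 30 days of hourly bins.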
+
+### Related questions
+<!-- - [(3.4)](home#metrics-to-track) How does Cloudflare react to browsers with
+and without JavaScript enabled? [ticket:31404] -->
+- [(6)](home#metrics-to-track) How do different security levels of Cloudflare
+affect the blocking mechanism? [ticket:33010#comment:5]
+    - [(6.1)](home#metrics-to-track) Do some of the Cloudflare security levels
+    block users immediately without presenting a CAPTCHA challenge at all?
+
+<!-- ####################################################################### -->
+
+## Code injection rate
+### Purpose
+Cloudflare sometimes injects third-party code into websites without letting the
+users know. This graph aims to visualize the percentage of measurements that were
+affected by third-party code injection over time.
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were completed during the
+chosen date range
+2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+experiments. This list contains the metadata about the URLs.
+3. Join the measurements and URL list using the `URL` fields. Typically each
+URL maps to multiple measurements.
+4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
+field
+5. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Distribute the measurements that were completed within the
+    interval of this iteration into 2 bins based on `is_data_modified` field's
+    value. Skip the measurements that do not have `is_data_modified` field.
+    2. Repeat the following for each bin:
+        1. Count the total number of measurements in this bin
+        2. Count the total number of measurements in this bin that have
+        `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 5.2.2}{Step 5.2.1} \times 100`$ (Leave this bin's value
+        empty if there are no corresponding measurements)
+    3. Plot the percentage values for each bin in the Y-axis and the beginning
+    time of this interval in the X-axis
+6. Merge the graphs created for each iteration
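
The join, filter, and binning in Steps 3 to 5.1 might be sketched as below. The record shapes are assumptions: measurements and URL-list entries are taken to be dicts sharing a `url` key, and `is_data_modified` is assumed to hold `0` or `1` in the same way as `is_captcha_found`.

```python
def cloudflare_bins(measurements, url_list):
    """Split Cloudflare-served measurements into bins keyed by is_data_modified."""
    # Step 3: look up URL metadata by the (assumed) shared 'url' key
    cdn_by_url = {u["url"]: u.get("cdn_provider") or "" for u in url_list}

    bins = {0: [], 1: []}
    for m in measurements:
        # Step 4: discard measurements whose URL is not served through Cloudflare
        if "cloudflare" not in cdn_by_url.get(m["url"], ""):
            continue
        # Step 5.1: skip measurements without a usable is_data_modified value
        flag = m.get("is_data_modified")
        if flag not in (0, 1):
            continue
        bins[flag].append(m)
    return bins
```

In the full procedure this would be applied to the measurements of one time interval at a time, and the per-bin counts of Step 5.2 would then be taken over `bins[0]` and `bins[1]`.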
+
+
+<!-- ####################################################################### -->
+<!-- ####################################################################### -->
+
+
 # Graphs about individual exit relays
 ## Overall CAPTCHA rate
 ### Purpose