Changes

Barkin Simsek · 99a11679
--- a/Dashboard-Graphs.md
+++ b/Dashboard-Graphs.md
+This document aims to describe how to produce the graphs that will be on the CAPTCHA Monitor's dashboard at [dashboard.captcha.wtf](https://dashboard.captcha.wtf/). If you have any suggestions/feedback, please mention it under [ticket #41](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/issues/41) of this repository. 
+
+The following graph style will be used for all graphs unless otherwise specified:
+* Type
+    * Line chart
+* Axes
+    * X-axis: The dates of the last 30*24 consensuses (last 30 days), each tick
+    representing a single consensus (The plotting tool automatically omits the
+    overlapping labels but keeps the data points in the chart)
+    * Y-axis: The percentage values from 0% to 100%, uses a linear scale
+* Sample Graph
+![graph-style](uploads/e62c2716de6cd64e3a6bf949d1bd0726/graph-style.png)
+
+**Table of contents**
+- [Graphs for understanding CAPTCHA rates related to user decisions](#graphs-for-understanding-captcha-rates-related-to-user-decisions)
+    - [Weighted CAPTCHA rate by method](#weighted-captcha-rate-by-method)
+    - [Weighted CAPTCHA rate by connection security](#weighted-captcha-rate-by-connection-security)
+    - [Weighted CAPTCHA rate by HTTP request quantity](#weighted-captcha-rate-by-http-request-quantity)
+    - [Weighted CAPTCHA rate by CDN provider](#weighted-captcha-rate-by-cdn-provider)
+- [Graphs for understanding the overall network status](#graphs-for-understanding-the-overall-network-status)
+    - [Probability of a Tor client receiving CAPTCHA](#probability-of-a-tor-client-receiving-captcha)
+    - [Weighted CAPTCHA rate by IP version](#weighted-captcha-rate-by-ip-version)
+    - [Weighted CAPTCHA rate by exit probability](#weighted-captcha-rate-by-exit-probability)
+    - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age)
+    - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location)
+- [Graphs about understanding the Cloudflare firewall](#graphs-about-understanding-the-cloudflare-firewall)
+    - [CAPTCHA rate by Cloudflare security level/firewall settings](#captcha-rate-by-cloudflare-security-levelfirewall-settings)
+    - [CAPTCHA rate by traffic origin](#captcha-rate-by-traffic-origin)
+    - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age-1)
+    - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location-1)
+    - [Code injection rate](#code-injection-rate)
+- [Graphs about Tor Browser centric data](#graphs-about-tor-browser-centric-data)
+    - [Weighted CAPTCHA rate by Tor Browser version](#weighted-captcha-rate-by-tor-browser-version)
+    - [Weighted CAPTCHA rate by Tor Browser security level](#weighted-captcha-rate-by-tor-browser-security-level)
+- [Graphs about individual exit relays](#graphs-about-individual-exit-relays)
+    - [Overall CAPTCHA rate](#overall-captcha-rate)
+    - [CAPTCHA rate by CDN provider](#captcha-rate-by-cdn-provider)
+
+# Graphs for understanding CAPTCHA rates related to user decisions
+## Weighted CAPTCHA rate by method
+### Purpose
+Understanding the effect of using different methods (for example using
+web browsers like Tor Browser, Firefox over Tor, Brave, etc.) on the probability
+of seeing a CAPTCHA while browsing the internet using the public Tor network.
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Join the measurements and relay data using the relay fingerprints.
+    Typically each relay maps to multiple measurements.
+    6. Distribute the joined data into bins based on `method` field's value
+    7. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    8. Plot the weighted percentage values for each `method` bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
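
The per-bin calculation in Steps 2.6 to 2.7 can be sketched as follows. This is a minimal illustration, not the project's implementation: the measurement keys `fingerprint`, `method`, and `is_captcha_found` and both function names are assumptions made for the example, and the exit probabilities from Step 2.3 are taken as a ready-made dictionary.

```python
from collections import defaultdict


def weighted_captcha_rate(measurements, exit_probabilities):
    """Return the weighted CAPTCHA percentage for one bin of measurements.

    measurements: iterable of dicts with the (assumed) keys "fingerprint"
        and "is_captcha_found" (1 or 0).
    exit_probabilities: dict mapping the fingerprint of every running exit
        relay in the consensus to its exit probability (Step 2.3).
    """
    totals = defaultdict(int)
    captchas = defaultdict(int)
    for m in measurements:
        totals[m["fingerprint"]] += 1
        captchas[m["fingerprint"]] += int(m["is_captcha_found"])

    weighted_sum = 0.0
    weight_total = 0.0
    for fingerprint, probability in exit_probabilities.items():
        if totals[fingerprint]:
            percentage = captchas[fingerprint] / totals[fingerprint] * 100
        else:
            # Relay is in the consensus but has no measurements: assume 0%.
            percentage = 0.0
        weighted_sum += percentage * probability
        weight_total += probability
    return weighted_sum / weight_total if weight_total else 0.0


def rates_by_method(measurements, exit_probabilities):
    """Bin measurements by their "method" value and rate each bin."""
    bins = defaultdict(list)
    for m in measurements:
        bins[m["method"]].append(m)
    return {
        method: weighted_captcha_rate(bin_measurements, exit_probabilities)
        for method, bin_measurements in bins.items()
    }
```

Each returned value is one Y-axis point for the consensus being processed; repeating the call per consensus yields the time series for each method.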
+
+### Related metrics
+- [(2)](home#metrics-to-track) How do the HTTP request headers affect
+Cloudflare's decision-making mechanism? [ticket:33010#comment:4]
+    - [(2.1)](home#metrics-to-track) Is there a difference between using the
+    actual Tor Browser itself and tor-browser-selenium in terms of the HTTP headers?
+    - [(2.2)](home#metrics-to-track) How does Cloudflare react differently if the
+    browser doesn't support alt-svc headers? [ticket:32915]
+- [(3)](home#metrics-to-track) How do different browsers with different
+User Agents get affected? [ticket:33010#comment:2], [ticket:32924], [ticket:31404]
+    - [(3.1)](home#metrics-to-track) Is there a difference between using a web
+    browser or fetching web pages via cURL or other HTTP libraries?
+- [(7)](home#metrics-to-track) How does the time of day affect
+Cloudflare's blocking mechanism? Does the day of the week or the time of day
+matter? [ticket:33010#comment:15]
+- [(15)](home#metrics-to-track) If browsers that should not face CAPTCHA face
+CAPTCHA, why does this happen?
+- [(16)](home#metrics-to-track) How do the observed patterns in the results
+change over time? [ticket:33010]
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by connection security
+### Purpose
+Understanding the effect of using HTTPS versus plain HTTP on the probability
+of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+    experiments. This list contains the metadata about the URLs.
+    6. Join the measurements, URL list, and relay data using the relay
+    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
+    7. Distribute the joined data into 2 bins based on whether the
+    `is_https` field of each entry is `1` or `0`
+    8. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    9. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
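
The join and binning in Steps 2.6 and 2.7 might look like the sketch below. It assumes that both the measurements and the URL list carry a `url` key to join on; that key, the function name, and the decision to skip measurements without metadata are illustrative assumptions rather than documented API behavior.

```python
from collections import defaultdict


def bin_by_url_field(measurements, url_list, field):
    """Join measurements with URL metadata and group them by one field.

    measurements: list of dicts with an (assumed) "url" key.
    url_list: list of URL metadata dicts, each with a "url" key and
        metadata fields such as "is_https".
    field: name of the metadata field to bin on.
    """
    metadata_by_url = {entry["url"]: entry for entry in url_list}
    bins = defaultdict(list)
    for m in measurements:
        metadata = metadata_by_url.get(m["url"])
        if metadata is None:
            # Assumption: measurements without URL metadata are skipped.
            continue
        # Merge the metadata into a copy of the measurement.
        bins[metadata[field]].append({**m, **metadata})
    return dict(bins)
```

Calling `bin_by_url_field(measurements, url_list, "is_https")` gives the two bins for this graph; the per-relay percentages are then computed inside each bin.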
+
+### Related metrics
+- [(14)](home#metrics-to-track) Is there a difference if the origin server has
+an SSL certificate or not?
+    - [(14.1)](home#metrics-to-track) Does the blocking change if the SSL
+    certificate is issued by Cloudflare or by another entity?
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by HTTP request quantity
+### Purpose
+Understanding the effect of connecting to websites that require single or
+multiple HTTP requests to load on the probability of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+    experiments. This list contains the metadata about the URLs.
+    6. Join the measurements, URL list, and relay data using the relay
+    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
+    7. Distribute the joined data into 2 bins based on whether the
+    `requires_multiple_reqs` field of each entry is `1` or `0`
+    8. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    9. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
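
Step 2.2 (parsing the `r`, `w`, and `s` lines) can be sketched as below. The sketch assumes the standard consensus layout in which the `r` line holds the nickname, the unpadded base64 identity, the digest, the publication date and time, and then the IPv4 address; the function name and the returned dictionary keys are made up for the example. A parsing library such as Stem may be a better choice in practice.

```python
import base64


def parse_running_exits(consensus_text):
    """Return the running exit relay entries found in a consensus document."""
    relays = []
    current = None
    for line in consensus_text.splitlines():
        if line.startswith("r "):
            parts = line.split()
            identity = parts[2]
            # The identity is base64 without padding; restore it to decode.
            raw = base64.b64decode(identity + "=" * (-len(identity) % 4))
            current = {
                "fingerprint": raw.hex().upper(),
                "address": parts[6],
                "bandwidth": 0,
                "flags": set(),
            }
            relays.append(current)
        elif current is not None and line.startswith("s "):
            current["flags"] = set(line.split()[1:])
        elif current is not None and line.startswith("w "):
            for item in line.split()[1:]:
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    current["bandwidth"] = int(value)
        elif line.startswith("directory-footer"):
            # Relay entries end where the footer begins.
            current = None
    # Keep only relays that carry both the Exit and the Running flag.
    return [r for r in relays if {"Exit", "Running"} <= r["flags"]]
```

The hex fingerprint produced here is the key used later to join relay data with measurements.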
+
+### Related metrics
+- [(13)](home#metrics-to-track) Is there a difference between websites that load
+resources from third-party resources and websites that contain all resources on
+the origin server? [ticket:33010#comment:6]
+    - [(13.1)](home#metrics-to-track) How do users of websites get affected if
+    the main website is not fronted by Cloudflare, but some of the resources are
+    fetched from a Cloudflare fronted web server? [ticket:33010#comment:6], [ticket:15450]
+
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by CDN provider
+### Purpose
+Understanding the effect of connecting to websites that use CDN providers such
+as Cloudflare, Akamai, Amazon Cloudfront, etc. on the probability of seeing a
+CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+    experiments. This list contains the metadata about the URLs.
+    6. Join the measurements, URL list, and relay data using the relay
+    fingerprints and URLs. Typically each relay and URL map to multiple measurements.
+    7. Distribute the joined data into bins based on `cdn_provider` field's value
+    8. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.8.2.2}{Step 2.8.2.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    9. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
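
Step 2.3 (weighted exit probabilities) can be approximated as shown below. This sketch assumes the weighting convention of the Tor directory specification, where `Wee` scales relays that have only the Exit flag and `Wed` scales relays that have both Guard and Exit, with weights expressed in units of 1/10000. Excluding `BadExit` relays and defaulting missing weights to 10000 are further assumptions; the linked Onionoo code remains the reference.

```python
def parse_bandwidth_weights(consensus_text):
    """Extract the bandwidth-weights footer line as a dict of integers."""
    for line in consensus_text.splitlines():
        if line.startswith("bandwidth-weights "):
            pairs = (item.split("=", 1) for item in line.split()[1:])
            return {key: int(value) for key, value in pairs}
    return {}


def exit_probabilities(relays, bandwidth_weights):
    """Map each relay fingerprint to its share of the total exit weight.

    relays: dicts with "fingerprint", "bandwidth", and "flags" (a set).
    bandwidth_weights: dict such as {"Wee": 10000, "Wed": 10000}.
    """
    wee = bandwidth_weights.get("Wee", 10000) / 10000
    wed = bandwidth_weights.get("Wed", 10000) / 10000
    weights = {}
    for relay in relays:
        flags = relay["flags"]
        if "Exit" not in flags or "BadExit" in flags:
            weight = 0.0  # assumption: such relays are never chosen as exits
        elif "Guard" in flags:
            weight = relay["bandwidth"] * wed
        else:
            weight = relay["bandwidth"] * wee
        weights[relay["fingerprint"]] = weight
    total = sum(weights.values())
    return {fp: (w / total if total else 0.0) for fp, w in weights.items()}
```

The resulting probabilities sum to 1 across the exit relays of one consensus, which is what makes them usable as scaling factors for the weighted average.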
+
+<!-- ####################################################################### -->
+<!-- ####################################################################### -->
+
+# Graphs for understanding the overall network status
+## Probability of a Tor client receiving CAPTCHA
+### Purpose
+Understanding the probability that a Tor client, choosing an exit relay in the
+normal weighted way, receives a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Repeat the following for each running exit relay entry within the consensus:
+        1. Count the total number of measurements that were completed using this
+        exit relay
+        2. Count the total number of measurements that were completed using this
+        exit relay and have `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 2.5.2}{Step 2.5.1} \times 100`$ (Assume `0%` if an exit relay
+        exists in the consensus but there are no corresponding measurements)
+    6. Calculate the weighted average of the percentage values (obtained in
+    Step 2.5.3) using exit probabilities (obtained in Step 2.3) as the scaling
+    factor
+    7. Map and memorize the consensus's `valid-after` timestamp to the
+    weighted average of the percentages
+3. Plot the weighted percentage values for each consensus in the Y-axis and
+the `valid-after` timestamps in the X-axis
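
Steps 2.1 and 2.4 rely on matching measurements to the validity window of a consensus. A minimal sketch of that matching is given below; the measurement key `timestamp` (a timezone-aware `datetime`) and the half-open interval are assumptions for the example, and in practice the filtering would more likely be done through the CAPTCHA Monitor API query itself.

```python
from datetime import datetime, timezone


def parse_consensus_window(consensus_text):
    """Return the (valid-after, fresh-until) timestamps of a consensus."""
    window = {}
    for line in consensus_text.splitlines():
        keyword, _, value = line.partition(" ")
        if keyword in ("valid-after", "fresh-until"):
            parsed = datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
            # Consensus timestamps are in UTC.
            window[keyword] = parsed.replace(tzinfo=timezone.utc)
        if len(window) == 2:
            break
    return window["valid-after"], window["fresh-until"]


def measurements_in_window(measurements, valid_after, fresh_until):
    """Keep measurements completed while the consensus was fresh.

    The interval is half-open so that a measurement on the boundary is
    assigned to exactly one consensus (an assumption of this sketch).
    """
    return [
        m for m in measurements
        if valid_after <= m["timestamp"] < fresh_until
    ]
```

Mapping each `valid-after` value to the weighted average computed from the filtered measurements gives the points plotted in Step 3.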
+
+### Related metrics
+- [(12)](home#metrics-to-track) What is the chance of a Tor client getting affected
+by Cloudflare's blocking practices when choosing a Tor exit node? [ticket:33010]
+- [(17)](home#metrics-to-track) Is whether you get a CAPTCHA much more probabilistic
+and transient? [ticket:33010]
+- [(18)](home#metrics-to-track) The chance that a Tor client, choosing an exit
+relay in the normal weighted fashion, will get hit by a CAPTCHA [ticket:33010]
+
+
+## Weighted CAPTCHA rate by IP version
+### Purpose
+Understanding the effect of connecting to web servers
+(and consequently exit relays) that support IPv4 vs IPv6 on the probability
+of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Obtain the "details document" from Onionoo and match the Onionoo data
+    with the relay entries from consensus using the relay fingerprints. The following query is
+    recommended for obtaining the "details document":
+    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,exit_policy_v6_summary
+    6. Distribute the exit relay entries from the consensus into 2 bins based on
+    whether they support IPv6 exiting or not. This should be decided based on
+    the `exit_policy_v6_summary` field obtained from the "details document"
+    7. Repeat the following for each bin:
+        1. Repeat the following for each exit relay in the bin:
+            1. Count the total number of measurements that were
+            completed using this exit relay
+            2. Count the total number of measurements that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.7.1.2}{Step 2.7.1.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        2. Calculate the weighted average of the percentage values (obtained in
+        Step 2.7.1.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    8. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
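
The binning in Step 2.6 could be sketched as below. It assumes that Onionoo omits `exit_policy_v6_summary` for relays that reject all IPv6 exit traffic, so the presence of the field is treated as IPv6 exiting support; the function name and bin labels are illustrative.

```python
def bin_by_ipv6_exiting(details_relays):
    """Split Onionoo relay entries into IPv6-exiting and IPv4-only bins.

    details_relays: the "relays" list from an Onionoo details document,
        requested with the fingerprint and exit_policy_v6_summary fields.
    Returns a dict of two sets of fingerprints.
    """
    bins = {"ipv6": set(), "ipv4_only": set()}
    for relay in details_relays:
        # Assumption: a present, non-empty summary means IPv6 exiting.
        if relay.get("exit_policy_v6_summary"):
            bins["ipv6"].add(relay["fingerprint"])
        else:
            bins["ipv4_only"].add(relay["fingerprint"])
    return bins
```

The fingerprints in each set select which consensus entries and measurements contribute to the corresponding line on the graph.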
+
+### Related metrics
+- [(1)](home#metrics-to-track) Does Cloudflare treat IPv4 and IPv6 addresses
+differently? [ticket:33010#comment:2]
+- [(9)](home#metrics-to-track) How do specific exit nodes get affected by
+Cloudflare's blocking practices?
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by exit probability
+### Purpose
+Understanding the effect of using smaller or larger exit relays on the
+probability of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Distribute the exit relay entries from the consensus into 10 bins (each
+    bin containing probability values between n and n+0.1) based on their
+    exit probabilities (calculated in Step 2.3)
+    6. Repeat the following for each bin:
+        1. Repeat the following for each exit relay in the bin:
+            1. Count the total number of measurements that were
+            completed using this exit relay
+            2. Count the total number of measurements that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.6.1.2}{Step 2.6.1.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        2. Calculate the weighted average of the percentage values (obtained in
+        Step 2.6.1.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    7. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
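
Step 2.5 assigns every exit relay to one of 10 equal-width probability bins. One way to express that, with hypothetical function names and with a probability of exactly 1.0 placed in the last bin by assumption:

```python
from collections import defaultdict


def probability_bin(probability, bin_count=10):
    """Return the index of the bin [n, n + 1/bin_count) for a probability."""
    index = int(probability * bin_count)
    # Clamp so that 1.0 falls into the last bin instead of a new one.
    return min(max(index, 0), bin_count - 1)


def bin_relays_by_probability(exit_probabilities, bin_count=10):
    """Group relay fingerprints by the bin of their exit probability."""
    bins = defaultdict(list)
    for fingerprint, probability in exit_probabilities.items():
        bins[probability_bin(probability, bin_count)].append(fingerprint)
    return dict(bins)
```

Bin index `k` covers exit probabilities from `k * 0.1` up to, but not including, `(k + 1) * 0.1`.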
+
+### Related metrics
+- [(9)](home#metrics-to-track) How do specific exit nodes get affected by
+Cloudflare's blocking practices?
+  - [(9.1)](home#metrics-to-track) Does the size/age/location of the exit node
+  play a role? [ticket:33010#comment:15]
+  - [(9.2)](home#metrics-to-track) Is it always the same Tor exit nodes that get
+  blocked?
+- [(11)](home#metrics-to-track) What fraction of the Tor exit nodes get affected
+by Cloudflare's blocking practices? [ticket:33010], [ticket:23840#comment:22]
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by exit relay age
+### Purpose
+Understanding the effect of using older or younger exit relays
+(based on `first_seen` date) on the probability of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Obtain the "details document" from Onionoo and match the Onionoo data
+    with the relay entries from consensus using the relay fingerprints. The following query is
+    recommended for obtaining the "details document":
+    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
+    6. Calculate the age of the exit relays in days using the `first_seen` field
+    of the "details document" and `valid-after` timestamp of the consensus
+    (`exit_age` = ceil_days(`valid-after` - `first_seen`))
+    7. Distribute the exit relay entries from the consensus into
+    `(max(exit_age) - min(exit_age)) / 365` bins based on their ages (calculated in Step 2.6)
+    8. Repeat the following for each bin:
+        1. Repeat the following for each exit relay in the bin:
+            1. Count the total number of measurements that were
+            completed using this exit relay
+            2. Count the total number of measurements that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.8.1.2}{Step 2.8.1.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        2. Calculate the weighted average of the percentage values (obtained in
+        Step 2.8.1.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    9. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs created for each consensus
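
The age calculation in Step 2.6 and the yearly binning in Step 2.7 can be sketched as follows. The sketch assumes both timestamps use the `YYYY-MM-DD hh:mm:ss` UTC format and interprets the bins as consecutive 365-day ranges starting at the minimum age; the function names are hypothetical.

```python
import math
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def exit_age_days(valid_after, first_seen):
    """Return ceil_days(valid-after - first_seen) for two timestamp strings."""
    delta = (
        datetime.strptime(valid_after, TIMESTAMP_FORMAT)
        - datetime.strptime(first_seen, TIMESTAMP_FORMAT)
    )
    return math.ceil(delta.total_seconds() / 86400)


def age_bin(exit_age, min_age):
    """Return the index of the 365-day bin an age falls into."""
    return (exit_age - min_age) // 365
```

For example, a relay first seen 25 hours before `valid-after` has an age of 2 days under this rounding.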
+
+### Related metrics
+- [(9)](home#metrics-to-track) How do specific exit nodes get affected by
+Cloudflare's blocking practices?
+    - [(9.1)](home#metrics-to-track) Does the size/age/location of the exit node
+    play a role? [ticket:33010#comment:15]
+    - [(9.2)](home#metrics-to-track) Is it always the same Tor exit nodes that
+    get blocked?
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by exit relay location
+### Purpose
+Understanding the effect of the exit relay's physical location on the
+probability of seeing a CAPTCHA. This graph will show the top 10 countries
+with the highest CAPTCHA rates.
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Obtain the "details document" from Onionoo and match the Onionoo data
+    with the relay entries from consensus using the relay fingerprints. The following query is
+    recommended for obtaining the "details document":
+    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
+    6. Distribute the exit relay entries from the consensus into bins based on
+    their `country_name` value (obtained in Step 2.5)
+    7. Repeat the following for each bin:
+        1. Repeat the following for each exit relay in the bin:
+            1. Count the total number of measurements that were
+            completed using this exit relay
+            2. Count the total number of measurements that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.7.1.2}{Step 2.7.1.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        2. Calculate the weighted average of the percentage values (obtained in
+        Step 2.7.1.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    8. Plot the weighted percentage values for each bin in the Y-axis and
+    the `valid-after` timestamp of the consensus in the X-axis
+3. Merge the graphs with top 10 highest percentage values and discard the rest
+(or keep if you want to have them as well)
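
Selecting the countries for Step 3 requires a ranking criterion, which the steps above leave open. The sketch below assumes the ranking is by the mean weighted percentage over the plotted period; that choice and the function name are assumptions, and ranking by the most recent value would be an equally plausible reading.

```python
from statistics import mean


def top_countries(series_by_country, limit=10):
    """Return the country names with the highest mean CAPTCHA percentage.

    series_by_country: dict mapping a country_name to the list of weighted
        percentage values computed for it, one per consensus.
    """
    averages = {
        country: mean(values)
        for country, values in series_by_country.items()
        if values  # countries without any data points are ignored
    }
    ranked = sorted(averages, key=lambda country: averages[country], reverse=True)
    return ranked[:limit]
```

Only the series of the returned countries would then be merged into the final graph.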
+
+### Related metrics
+- [(9)](home#metrics-to-track) How do specific exit nodes get affected by
+Cloudflare's blocking practices?
+    - [(9.1)](home#metrics-to-track) Does the size/age/location of the exit node
+    play a role? [ticket:33010#comment:15]
+    - [(9.2)](home#metrics-to-track) Is it always the same Tor exit nodes that get
+    blocked?
+
+<!-- ####################################################################### -->
+<!-- ####################################################################### -->
+
+# Graphs about understanding the Cloudflare firewall
+## CAPTCHA rate by Cloudflare security level/firewall settings
+### Purpose
+Understanding the effect of different Cloudflare security levels and firewall
+configurations on the probability of seeing a CAPTCHA.
+
+We have a few different domains to test different configurations. Here they are:
+- captcha.wtf
+    - IPv4 only domain, no additional Cloudflare firewall rules
+- yearlight.buzz
+    - IPv4 only domain, Cloudflare firewall is set to present "JS Challenge" for
+    traffic originating from the Tor network
+- bottomlesspit.xyz
+    - IPv4 only domain, Cloudflare firewall is set to present "CAPTCHA Challenge" for
+    traffic originating from the Tor network
+- broccolipizza.monster
+    - IPv4 only domain, Cloudflare firewall is set to block all traffic
+    originating from the Tor network
+- exit11.online
+    - IPv6 only domain, no additional Cloudflare firewall rules
+- icanhazcaptcha.xyz
+    - IPv6 only domain, Cloudflare firewall is set to present "CAPTCHA Challenge" for
+    traffic originating from the Tor network
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were *completed
+using the domains specified above* and during the chosen date range
+2. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Distribute the measurements that were completed within the interval of
+    this iteration into bins based on `url` field's value
+    2. Repeat the following for each bin:
+        1. Count the total number of measurements in this bin
+        2. Count the total number of measurements in this bin that have
+        `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 2.2.2}{Step 2.2.1} \times 100`$ (Leave this bin's value
+        empty if there are no corresponding measurements)
+    3. Plot the percentage values for each bin on the Y-axis and the beginning
+    time of this interval on the X-axis
+3. Merge the graphs created for each iteration
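
As a rough illustration of the interval loop above, the sketch below groups measurements into fixed-width intervals and computes the CAPTCHA percentage per bin. The `timestamp` key and the `key` callback are assumptions made for this example; only `url` and `is_captcha_found` come from the steps above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def captcha_rate_per_interval(measurements, start, end, step, key):
    """Return {interval_start: {bin: percentage}} for start <= timestamp < end.

    key: function mapping a measurement to its bin, e.g. lambda m: m["url"].
    Bins without measurements are simply absent (left empty on the graph).
    """
    # counts[interval_start][bin] = [total, captcha_found]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for m in measurements:
        ts = m["timestamp"]  # hypothetical field holding a datetime
        if not (start <= ts < end):
            continue
        interval_start = start + ((ts - start) // step) * step
        cell = counts[interval_start][key(m)]
        cell[0] += 1
        if m["is_captcha_found"] == 1:
            cell[1] += 1
    return {
        t: {b: 100.0 * found / total for b, (total, found) in bins.items()}
        for t, bins in sorted(counts.items())
    }
```

Calling it with `step=timedelta(hours=1)` and `key=lambda m: m["url"]` would give one series per test domain, keyed by the beginning time of each interval.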
+
+### Related metrics
+<!-- - [(3.4)](home#metrics-to-track) How does Cloudflare react to browsers with
+and without JavaScript enabled? [ticket:31404] -->
+- [(6)](home#metrics-to-track) How do different security levels of Cloudflare
+affect the blocking mechanism? [ticket:33010#comment:5]
+    - [(6.1)](home#metrics-to-track) Do some of the Cloudflare security levels
+    block users immediately without presenting a CAPTCHA challenge at all?
+
+<!-- ####################################################################### -->
+
+## CAPTCHA rate by traffic origin
+### Purpose
+Understanding how Cloudflare treats Tor traffic vs. non-Tor traffic (this one
+is stating the obvious, but it is still good to have data to back up the obvious)
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were completed during the
+chosen date range
+2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+experiments. This list contains the metadata about the URLs.
+3. Join the measurements and URL list using the `URL` fields. Typically each
+URL maps to multiple measurements.
+4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
+fields
+5. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Distribute the measurements that were completed within the interval of
+    this iteration into 2 bins based on `method` field's value. Put the methods
+    without "tor" (ex. "firefox") into the `Non-Tor Traffic` bin and the rest
+    (ex. "firefox_over_tor") into the `Tor Traffic` bin.
+    2. Repeat the following for each bin:
+        1. Count the total number of measurements in this bin
+        2. Count the total number of measurements in this bin that have
+        `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 5.2.2}{Step 5.2.1} \times 100`$ (Leave this bin's value
+        empty if there are no corresponding measurements)
+    3. Plot the percentage values for each bin in the Y-axis and the beginning
+    time of this interval in the X-axis
+6. Merge the graphs created for each iteration
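
The join, filter, and Tor/non-Tor split from the steps above might look like the sketch below. It assumes both the measurements and the URL list are lists of dictionaries sharing a lowercase `url` key, and that `cdn_provider` is a string; the real API responses may be shaped differently.

```python
def join_and_filter(measurements, url_list, cdn="cloudflare"):
    """Attach URL metadata to each measurement and keep only the given CDN."""
    metadata = {u["url"]: u for u in url_list}
    joined = []
    for m in measurements:
        info = metadata.get(m["url"])
        if info is None:
            continue  # no metadata for this URL, cannot classify it
        if cdn not in (info.get("cdn_provider") or ""):
            continue  # discard measurements served by other CDN providers
        joined.append({**info, **m})
    return joined


def traffic_origin(method):
    """Map a `method` value to one of the two bins used in this graph."""
    return "Tor Traffic" if "tor" in method else "Non-Tor Traffic"
```

The substring check on `method` mirrors the rule in the steps: any method name containing "tor" is treated as Tor traffic.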
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by exit relay age
+### Purpose
+Understanding how quickly Cloudflare blocks the newer relays and if there is a
+different treatment for older relays
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+    experiments. This list contains the metadata about the URLs.
+    6. Join the measurements and URL list using the `URL` fields. Typically each
+    URL maps to multiple measurements.
+    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
+    fields
+    8. Obtain the "details document" from Onionoo and match the Onionoo data
+    with the relay entries from consensus using the relay fingerprints. The following query is
+    recommended for obtaining the "details document":
+    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,first_seen
+    9. Calculate the age of the exit relays in days using the `first_seen` field
+    of the "details document" and `valid-after` timestamp of the consensus
+    (`exit_age` = ceil_days(`valid-after` - `first_seen`))
+    10. Distribute the exit relay entries from the consensus into
+    `(max(exit_age) - min(exit_age)) / 365` bins based on their ages
+    (calculated in Step 2.9)
+    11. Repeat the following for each bin:
+        1. Repeat the following for each exit relay in the bin:
+            1. Count the total number of measurements that were
+            completed using this exit relay
+            2. Count the total number of measurements that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.11.1.2}{Step 2.11.1.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        2. Calculate the weighted average of the percentage values (obtained in
+        Step 2.11.1.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    12. Plot the weighted percentage values for each bin on the Y-axis and
+    the `valid-after` timestamp of the consensus on the X-axis
+3. Merge the graphs created for each consensus
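
The age calculation and binning in the steps above could be sketched as follows. It assumes both timestamps have already been parsed into `datetime` objects and that bins are 365 days wide, counted from the youngest relay in the consensus; the steps give only the number of bins, so the bin boundaries here are an assumption.

```python
import math
from datetime import datetime


def exit_age_days(valid_after, first_seen):
    """ceil_days(valid-after - first_seen), as described in the steps."""
    delta = valid_after - first_seen
    return math.ceil(delta.total_seconds() / 86400)


def age_bins(ages, width_days=365):
    """Group relay fingerprints into bins of `width_days`, relative to the youngest relay.

    ages: {fingerprint: exit_age in days}
    Returns {bin_index: [fingerprints]}.
    """
    youngest = min(ages.values())
    bins = {}
    for fingerprint, age in ages.items():
        bins.setdefault((age - youngest) // width_days, []).append(fingerprint)
    return bins
```

For example, a relay first seen 23 hours before `valid-after` gets an age of 1 day because of the ceiling, and relays whose ages differ by less than a year from the youngest relay share bin `0`.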
+
+### Related metrics
+- [(8)](home#metrics-to-track) How often does Cloudflare's blocking mechanism
+change/update itself?
+- [(10)](home#metrics-to-track) How well does Cloudflare keep track of the new
+or old Tor exit nodes?
+    - [(10.1)](home#metrics-to-track) How frequently does Cloudflare update its
+    Tor exit node list?
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by exit relay location
+### Purpose
+Understanding if Cloudflare prefers to block requests more from exit relays in
+certain countries
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor and between the `valid-after` & `fresh-until` timestamps of the
+    consensus
+    5. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+    experiments. This list contains the metadata about the URLs.
+    6. Join the measurements and URL list using the `URL` fields. Typically each
+    URL maps to multiple measurements.
+    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
+    fields
+    8. Obtain the "details document" from Onionoo and match the Onionoo data
+    with the relay entries from consensus using the relay fingerprints. The following query is
+    recommended for obtaining the "details document":
+    https://onionoo.torproject.org/details?type=relay&flag=Exit&fields=exit_addresses,fingerprint,country_name
+    9. Distribute the exit relay entries from the consensus into bins based on
+    their `country_name` value (obtained in Step 2.8)
+    10. Repeat the following for each bin:
+        1. Repeat the following for each exit relay in the bin:
+            1. Count the total number of measurements that were completed using
+            this exit relay
+            2. Count the total number of measurements that were completed using
+            this exit relay and have `is_captcha_found` field set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.10.1.2}{Step 2.10.1.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        2. Calculate the weighted average of the percentage values (obtained in
+        Step 2.10.1.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    11. Plot the weighted percentage values for each bin on the Y-axis and
+    the `valid-after` timestamp of the consensus on the X-axis
+3. Merge the graphs with top 10 highest percentage values and discard the rest
+(or keep if you want to have them as well)
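
One possible way to pick the top 10 countries is sketched below. The steps do not say how "highest" is measured over a whole time series, so this example ranks each country by the peak value of its series; that ranking rule is an assumption, and the mean or the latest value would be equally plausible choices.

```python
def top_n_series(series_by_bin, n=10):
    """Keep the n bins whose series reach the highest peak value.

    series_by_bin: {bin_name: [weighted percentage per consensus]}
    """
    ranked = sorted(
        series_by_bin,
        key=lambda b: max(series_by_bin[b], default=0.0),
        reverse=True,
    )
    return {b: series_by_bin[b] for b in ranked[:n]}
```

The returned dictionary preserves the ranking order, so it can be passed straight to the plotting code.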
+
+<!-- ####################################################################### -->
+
+## Code injection rate
+### Purpose
+Cloudflare sometimes injects third-party code into websites without letting the
+users know. This graph aims to visualize the percentage of measurements that were
+affected by third-party code injection over time.
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were completed during the
+chosen date range
+2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+experiments. This list contains the metadata about the URLs.
+3. Join the measurements and URL list using the `URL` fields. Typically each
+URL maps to multiple measurements.
+4. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
+fields
+5. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Distribute the measurements that were completed within the
+    interval of this iteration into 2 bins based on `is_data_modified` field's
+    value. Skip the measurements that do not have `is_data_modified` field.
+    2. Repeat the following for each bin:
+        1. Count the total number of measurements in this bin
+        2. Count the total number of measurements in this bin that have
+        `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 5.2.2}{Step 5.2.1} \times 100`$ (Leave this bin's value
+        empty if there are no corresponding measurements)
+    3. Plot the percentage values for each bin in the Y-axis and the beginning
+    time of this interval in the X-axis
+6. Merge the graphs created for each iteration
+
+<!-- ####################################################################### -->
+<!-- ####################################################################### -->
+
+# Graphs about Tor Browser centric data
+## Weighted CAPTCHA rate by Tor Browser version
+### Purpose
+Understanding the effect of using different Tor Browser versions on the
+probability of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor Browser (`method` field is equal to `tor_browser`) and between
+    the `valid-after` & `fresh-until` timestamps of the consensus
+    5. Join the measurements and relay data using the relay fingerprints.
+    Typically each relay maps to multiple measurements.
+    6. Distribute the joined data into bins based on `browser_version`
+    field's value
+    7. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    8. Plot the weighted percentage values for each `browser_version` bin on the Y-axis and
+    the `valid-after` timestamp of the consensus on the X-axis
+3. Merge the graphs created for each consensus
+
+### Related metrics
+- [(3.2)](home#metrics-to-track) What about different versions of the
+Tor Browser? Does Cloudflare behave differently to different versions of the
+same browser?
+
+<!-- ####################################################################### -->
+
+## Weighted CAPTCHA rate by Tor Browser security level
+### Purpose
+Understanding the effect of using Tor Browser at different security levels
+(Standard, Safer, Safest) on the probability of seeing a CAPTCHA
+
+### Steps to produce
+1. Get consensuses from CollecTor
+2. Repeat the following for each consensus:
+    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
+    consensus header and `bandwidth-weights` values from the footer
+    2. Repeat the following for each *running exit relay* entry within the consensus:
+        1. Parse the `r` line and memorize the IPv4 address and identity
+        2. Parse the `w` line and memorize the bandwidth
+        3. Parse the `s` line and memorize the relay flags
+    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
+    from the consensus, `bandwidth` values, and `flags` for each exit relay
+    (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
+    4. Use CAPTCHA Monitor API to get measurements that were completed
+    using Tor Browser (`method` field is equal to `tor_browser`) and between
+    the `valid-after` & `fresh-until` timestamps of the consensus
+    5. Join the measurements and relay data using the relay fingerprints.
+    Typically each relay maps to multiple measurements.
+    6. Distribute the joined data into 3 bins based on `tbb_security_level`
+    field's value
+    7. Repeat the following for each bin:
+        1. Further bin the measurements into sub-bins based on the exit relay used
+        to perform the measurement
+        2. Repeat the following for each exit relay in each sub-bin:
+            1. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay
+            2. Count the total number of measurements in this sub-bin that were
+            completed using this exit relay and have `is_captcha_found` field
+            set to `1`
+            3. Calculate the percentage of measurements that received CAPTCHA using
+            $`\frac{Step 2.7.2.2}{Step 2.7.2.1} \times 100`$ (Assume `0%` if an
+            exit relay exists in the consensus but there are no corresponding
+            measurements)
+        3. Calculate the weighted average of the percentage values (obtained in
+        Step 2.7.2.3) using exit probabilities (obtained in Step 2.3) as the
+        scaling factor
+    8. Plot the weighted percentage values for each `tbb_security_level` bin on the Y-axis and
+    the `valid-after` timestamp of the consensus on the X-axis
+3. Merge the graphs created for each consensus
+
+### Related metrics
+- [(3.3)](home#metrics-to-track) What about the different security levels of Tor
+Browser?
+
+<!-- ####################################################################### -->
+<!-- ####################################################################### -->
+
+# Graphs about individual exit relays
+## Overall CAPTCHA rate
+### Purpose
+Seeing the overall CAPTCHA rate for a specific exit relay
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were completed using the
+target exit relay and during the chosen date range
+2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+experiments. This list contains the metadata about the URLs.
+3. Join the measurements and URL list using the `URL` fields. Typically each
+URL maps to multiple measurements.
+4. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Count the total number of measurements completed within this interval
+    2. Count the total number of measurements completed within this interval
+    that have `is_captcha_found` field set to `1`
+    3. Calculate the percentage of measurements that received CAPTCHA using
+    $`\frac{Step 4.2}{Step 4.1} \times 100`$ (Leave this interval's value
+    empty if there are no corresponding measurements)
+5. Plot the percentage values for each iteration on the Y-axis and the beginning
+time for each iteration on the X-axis
+
+<!-- ####################################################################### -->
+
+## CAPTCHA rate by CDN provider
+### Purpose
+Understanding how different CDN providers such as Cloudflare, Akamai,
+Amazon CloudFront, etc. treat requests coming from a specific exit relay
+
+### Steps to produce
+0. Determine a date range and granularity to plot. Here, we will plot the last 30 days
+with a granularity of 1 hour.
+1. Use CAPTCHA Monitor API to get measurements that were completed using the
+target exit relay and during the chosen date range
+2. Use CAPTCHA Monitor API to get the list of URLs that are used in the
+experiments. This list contains the metadata about the URLs.
+3. Join the measurements and URL list using the `URL` fields. Typically each
+URL maps to multiple measurements.
+4. Iterate over the chosen date range with the chosen time intervals. Repeat
+the following for each iteration:
+    1. Distribute the measurements that were completed within the
+    interval of this iteration into bins based on `cdn_provider` field's value
+    2. Repeat the following for each bin:
+        1. Count the total number of measurements in this bin
+        2. Count the total number of measurements in this bin that have
+        `is_captcha_found` field set to `1`
+        3. Calculate the percentage of measurements that received CAPTCHA using
+        $`\frac{Step 4.2.2}{Step 4.2.1} \times 100`$ (Leave this bin's value
+        empty if there are no corresponding measurements)
+    3. Plot the percentage values for each bin in the Y-axis and the beginning
+    time of this interval in the X-axis
+5. Merge the graphs created for each iteration