This document describes how to produce the graphs that will appear on the CAPTCHA Monitor's dashboard at [dashboard.captcha.wtf](https://dashboard.captcha.wtf/). If you have any suggestions or feedback, please leave them under [ticket #41](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/issues/41) of this repository.
The following graph style will be used for all graphs unless otherwise specified:
* Type
    * Line chart
* Axes
    * X-axis: The `valid-after` dates of the last 30 × 24 consensuses (the last 30
      days, as a new consensus is published every hour), each tick
      representing a single consensus (the plotting tool automatically omits
      overlapping labels but keeps the data points in the chart)
    * Y-axis: The percentage values from 0% to 100%, using a linear scale
- [Graphs for understanding CAPTCHA rates related to user decisions](#graphs-for-understanding-captcha-rates-related-to-user-decisions)
    - [Weighted CAPTCHA rate by method](#weighted-captcha-rate-by-method)
    - [Weighted CAPTCHA rate by connection security](#weighted-captcha-rate-by-connection-security)
    - [Weighted CAPTCHA rate by HTTP request quantity](#weighted-captcha-rate-by-http-request-quantity)
    - [Weighted CAPTCHA rate by CDN provider](#weighted-captcha-rate-by-cdn-provider)
- [Graphs for understanding the overall network status](#graphs-for-understanding-the-overall-network-status)
    - [Probability of a Tor client receiving CAPTCHA](#probability-of-a-tor-client-receiving-captcha)
    - [Weighted CAPTCHA rate by IP version](#weighted-captcha-rate-by-ip-version)
    - [Weighted CAPTCHA rate by exit probability](#weighted-captcha-rate-by-exit-probability)
    - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age)
    - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location)
- [Graphs for understanding the Cloudflare firewall](#graphs-about-understanding-the-cloudflare-firewall)
    - [CAPTCHA rate by Cloudflare security level/firewall settings](#captcha-rate-by-cloudflare-security-levelfirewall-settings)
    - [CAPTCHA rate by traffic origin](#captcha-rate-by-traffic-origin)
    - [Weighted CAPTCHA rate by exit relay age](#weighted-captcha-rate-by-exit-relay-age-1)
    - [Weighted CAPTCHA rate by exit relay location](#weighted-captcha-rate-by-exit-relay-location-1)
    - [Code injection rate](#code-injection-rate)
- [Graphs about Tor Browser centric data](#graphs-about-tor-browser-centric-data)
    - [Weighted CAPTCHA rate by Tor Browser version](#weighted-captcha-rate-by-tor-browser-version)
    - [Weighted CAPTCHA rate by Tor Browser security level](#weighted-captcha-rate-by-tor-browser-security-level)
- [Graphs about individual exit relays](#graphs-about-individual-exit-relays)
    - [Overall CAPTCHA rate](#overall-captcha-rate)
    - [CAPTCHA rate by CDN provider](#captcha-rate-by-cdn-provider)
# Graphs for understanding CAPTCHA rates related to user decisions
## Weighted CAPTCHA rate by method
### Purpose
Understanding the effect of using different methods (for example, web
browsers like Tor Browser, Firefox over Tor, Brave, etc.) on the probability
of seeing a CAPTCHA while browsing the internet over the public Tor network.
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Join the measurements and relay data using the relay fingerprints.
       Typically each relay maps to multiple measurements.
    6. Distribute the joined data into bins based on the `method` field's value
    7. Repeat the following for each bin:
        1. Further bin the measurements into sub-bins based on the exit relay used
           to perform the measurement
        2. Repeat the following for each exit relay in each sub-bin:
            1. Count the total number of measurements in this sub-bin that were
               completed using this exit relay
            2. Count the total number of measurements in this sub-bin that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.7.2.2}}{\text{Step 2.7.2.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        3. Calculate the weighted average of the percentage values (obtained in
           Step 2.7.2.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    8. Plot the weighted percentage values for each `method` bin on the Y-axis and
       the `valid-after` timestamp of the consensus on the X-axis
3. Merge the graphs created for each consensus
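
The per-bin computation in Steps 2.5 to 2.7 can be sketched in Python as follows. This is a minimal illustration, not the CAPTCHA Monitor's own code: it assumes each measurement is a dict with the hypothetical keys `exit_fingerprint` and `is_captcha_found` plus the field used for binning (here `method`), and that the Step 2.3 exit probabilities are available as a `{fingerprint: probability}` dict.

```python
from collections import defaultdict


def weighted_captcha_rate_by_bin(measurements, exit_probabilities, bin_field):
    """Weighted CAPTCHA rate (in %) per bin for a single consensus.

    measurements: list of dicts with the (hypothetical) keys "exit_fingerprint",
        "is_captcha_found" (0 or 1) and the key named by bin_field.
    exit_probabilities: {fingerprint: weighted exit probability} for every
        running exit relay in the consensus (Step 2.3).
    """
    # Steps 2.6 and 2.7.1: bin by bin_field, then sub-bin by exit relay.
    # counts[bin_value][fingerprint] == [total measurements, CAPTCHA measurements]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for m in measurements:
        fingerprint = m["exit_fingerprint"]
        if fingerprint not in exit_probabilities:
            continue  # relay is not in this consensus, so it has no weight
        cell = counts[m[bin_field]][fingerprint]
        cell[0] += 1
        cell[1] += 1 if m["is_captcha_found"] == 1 else 0

    total_probability = sum(exit_probabilities.values())
    if not total_probability:
        return {}
    result = {}
    for bin_value, per_relay in counts.items():
        weighted_sum = 0.0
        for fingerprint, probability in exit_probabilities.items():
            total, captcha = per_relay.get(fingerprint, (0, 0))
            # Steps 2.7.2.1 to 2.7.2.3; relays without measurements count as 0%
            percentage = captcha / total * 100 if total else 0.0
            weighted_sum += probability * percentage
        # Step 2.7.3: weighted average with exit probabilities as weights
        result[bin_value] = weighted_sum / total_probability
    return result
```

Dividing by the sum of the probabilities keeps the result a true weighted average even if the probabilities do not add up to exactly 1. The same function could be reused for the other binned graphs by passing a different `bin_field`.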
### Related metrics
- [(2)](home#metrics-to-track) How does the HTTP request headers affect
## Weighted CAPTCHA rate by connection security
### Purpose
Understanding the effect of using HTTPS versus plain HTTP on the probability
of seeing a CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Use the CAPTCHA Monitor API to get the list of URLs that are used in the
       experiments. This list contains the metadata about the URLs.
    6. Join the measurements, URL list, and relay data using the relay
       fingerprints and URLs. Typically each relay and URL map to multiple measurements.
    7. Distribute the joined data into 2 bins based on whether the
       `is_https` field of each entry is `1` or `0`
    8. Repeat the following for each bin:
        1. Further bin the measurements into sub-bins based on the exit relay used
           to perform the measurement
        2. Repeat the following for each exit relay in each sub-bin:
            1. Count the total number of measurements in this sub-bin that were
               completed using this exit relay
            2. Count the total number of measurements in this sub-bin that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        3. Calculate the weighted average of the percentage values (obtained in
           Step 2.8.2.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    9. Plot the weighted percentage values for each bin on the Y-axis and
       the `valid-after` timestamp of the consensus on the X-axis
3. Merge the graphs created for each consensus
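
Steps 2.1 and 2.2 only need a handful of lines from each consensus, so a minimal line-based parser is enough to illustrate them. This sketch assumes the full (unflavored) consensus format, where an `r` line reads `r nickname identity digest date time IPv4 ORPort DirPort` and the identity is the unpadded base64 encoding of the 20-byte fingerprint; microdescriptor consensuses omit the digest field and would need different indices. The dict keys it returns are illustrative choices. An existing descriptor library such as Stem is likely a sturdier option in practice.

```python
import base64


def parse_consensus(text):
    """Extract the header/footer values and running exit relays (Steps 2.1-2.2)."""
    header = {}
    relays = []
    current = None  # relay entry whose "s"/"w" lines we are reading
    for line in text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword in ("valid-after", "fresh-until"):
            header[keyword] = rest  # e.g. "2020-06-01 00:00:00"
        elif keyword == "bandwidth-weights":
            header["bandwidth-weights"] = {
                key: int(value)
                for key, value in (item.split("=") for item in rest.split())
            }
        elif keyword == "r":
            fields = rest.split()
            # Re-add the single "=" padding to decode the 20-byte identity
            fingerprint = base64.b64decode(fields[1] + "=").hex().upper()
            current = {"nickname": fields[0], "fingerprint": fingerprint,
                       "address": fields[5], "flags": set(), "bandwidth": 0}
            relays.append(current)
        elif keyword == "s" and current is not None:
            current["flags"] = set(rest.split())
        elif keyword == "w" and current is not None:
            for item in rest.split():
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    current["bandwidth"] = int(value)
    # Keep only the relays that are both running and exits
    running_exits = [r for r in relays if {"Running", "Exit"} <= r["flags"]]
    return header, running_exits
```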
### Related metrics
- [(14)](home#metrics-to-track) Is there a difference if the origin server has
  an SSL certificate or not?
    - [(14.1)](home#metrics-to-track) Does the blocking change if the SSL
      certificate is issued by Cloudflare or by another entity?
## Weighted CAPTCHA rate by HTTP request quantity
### Purpose
Understanding the effect of connecting to websites that require a single HTTP
request versus multiple HTTP requests to load on the probability of seeing a CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Use the CAPTCHA Monitor API to get the list of URLs that are used in the
       experiments. This list contains the metadata about the URLs.
    6. Join the measurements, URL list, and relay data using the relay
       fingerprints and URLs. Typically each relay and URL map to multiple measurements.
    7. Distribute the joined data into 2 bins based on whether the
       `requires_multiple_reqs` field of each entry is `1` or `0`
    8. Repeat the following for each bin:
        1. Further bin the measurements into sub-bins based on the exit relay used
           to perform the measurement
        2. Repeat the following for each exit relay in each sub-bin:
            1. Count the total number of measurements in this sub-bin that were
               completed using this exit relay
            2. Count the total number of measurements in this sub-bin that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        3. Calculate the weighted average of the percentage values (obtained in
           Step 2.8.2.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    9. Plot the weighted percentage values for each bin on the Y-axis and
       the `valid-after` timestamp of the consensus on the X-axis
3. Merge the graphs created for each consensus
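
The three-way join in Step 2.6 and the two-bin split in Step 2.7 can be sketched as an inner join over dicts. The key names `url`, `exit_fingerprint`, `fingerprint`, and `requires_multiple_reqs` are assumptions about the record layout, not the documented CAPTCHA Monitor API schema.

```python
def join_measurements(measurements, urls, relays):
    """Inner-join measurements with URL metadata and relay data (Step 2.6)."""
    url_metadata = {u["url"]: u for u in urls}
    relay_by_fingerprint = {r["fingerprint"]: r for r in relays}
    joined = []
    for m in measurements:
        url = url_metadata.get(m["url"])
        relay = relay_by_fingerprint.get(m["exit_fingerprint"])
        if url is None or relay is None:
            continue  # drop rows without a match on both sides
        # Later dicts win on key clashes; shared keys hold equal values here
        joined.append({**m, **url, **relay})
    return joined


def split_by_flag(joined, field):
    """Step 2.7: two bins keyed by whether `field` is 1 or 0."""
    bins = {0: [], 1: []}
    for row in joined:
        bins[int(row[field])].append(row)
    return bins
```

Each measurement row keeps its own fields, so the later per-relay counting can still read `is_captcha_found` after the join.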
### Related metrics
- [(13)](home#metrics-to-track) Is there a difference between websites that load
  resources from third-party resources and websites that contain all resources on
  the origin server? [ticket:33010#comment:6]
    - [(13.1)](home#metrics-to-track) How do users of websites get affected if
      the main website is not fronted by Cloudflare, but some of the resources are
      fetched from a Cloudflare fronted web server? [ticket:33010#comment:6], [ticket:15450]
## Weighted CAPTCHA rate by CDN provider
### Purpose
Understanding the effect of connecting to websites that use CDN providers such
as Cloudflare, Akamai, Amazon CloudFront, etc. on the probability of seeing a
CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Use the CAPTCHA Monitor API to get the list of URLs that are used in the
       experiments. This list contains the metadata about the URLs.
    6. Join the measurements, URL list, and relay data using the relay
       fingerprints and URLs. Typically each relay and URL map to multiple measurements.
    7. Distribute the joined data into bins based on the `cdn_provider` field's value
    8. Repeat the following for each bin:
        1. Further bin the measurements into sub-bins based on the exit relay used
           to perform the measurement
        2. Repeat the following for each exit relay in each sub-bin:
            1. Count the total number of measurements in this sub-bin that were
               completed using this exit relay
            2. Count the total number of measurements in this sub-bin that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.8.2.2}}{\text{Step 2.8.2.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        3. Calculate the weighted average of the percentage values (obtained in
           Step 2.8.2.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    9. Plot the weighted percentage values for each bin on the Y-axis and
       the `valid-after` timestamp of the consensus on the X-axis
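
Step 2.3, which every graph relies on, can be approximated as below. This sketch follows the exit-position weights from Tor's directory specification as we understand them: `Wed` for relays flagged both `Guard` and `Exit`, `Wee` for relays flagged `Exit` only, and zero weight for `BadExit` relays, which clients avoid as exits. It omits any further details the linked Onionoo code may handle, so treat it as an approximation. The relay dicts match the shape of a simple consensus parser (`fingerprint`, `flags`, `bandwidth`), which is an assumption.

```python
def exit_probabilities(relays, bandwidth_weights):
    """Approximate weighted exit probability of each relay (Step 2.3).

    relays: dicts with "fingerprint", "flags" (a set) and "bandwidth".
    bandwidth_weights: the consensus footer values, e.g. {"Wed": ..., "Wee": ...}.
    """
    weights = {}
    for relay in relays:
        flags = relay["flags"]
        if "Exit" not in flags or "BadExit" in flags:
            factor = 0  # not usable in the exit position
        elif "Guard" in flags:
            factor = bandwidth_weights["Wed"]  # Guard+Exit relay used as exit
        else:
            factor = bandwidth_weights["Wee"]  # Exit-only relay used as exit
        weights[relay["fingerprint"]] = relay["bandwidth"] * factor
    total = sum(weights.values())
    # Normalize so that the probabilities of all relays sum to 1
    return {fp: (w / total if total else 0.0) for fp, w in weights.items()}
```

The consensus scales the bandwidth weights by 10000, but that constant cancels out during normalization.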
# Graphs for understanding the overall network status
## Probability of a Tor client receiving CAPTCHA
### Purpose
Understanding the probability that a Tor client, choosing its exit relay in the
normal bandwidth-weighted way, receives a CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Repeat the following for each running exit relay entry within the consensus:
        1. Count the total number of measurements that were completed using this
           exit relay
        2. Count the total number of measurements that were completed using this
           exit relay and have the `is_captcha_found` field set to `1`
        3. Calculate the percentage of measurements that received a CAPTCHA using
           $`\frac{\text{Step 2.5.2}}{\text{Step 2.5.1}} \times 100`$ (assume `0%` if an exit relay
           exists in the consensus but there are no corresponding measurements)
    6. Calculate the weighted average of the percentage values (obtained in
       Step 2.5.3) using the exit probabilities (obtained in Step 2.3) as the
       weights
    7. Map and memorize the consensus's `valid-after` timestamp to the
       weighted average of the percentages
3. Plot the weighted percentage values for each consensus on the Y-axis and
   the `valid-after` timestamps on the X-axis
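
Written as a formula, the value plotted for a consensus $`c`$ is the exit-probability-weighted mean below. Here $`E_c`$ is the set of running exit relays in $`c`$, $`p_i`$ is the exit probability of relay $`i`$ from Step 2.3, $`n_i`$ is the count from Step 2.5.1, and $`k_i`$ is the count from Step 2.5.2; these symbols are introduced for illustration only.

```math
P_{\text{CAPTCHA}}(c) = \frac{\sum_{i \in E_c} p_i \, r_i}{\sum_{i \in E_c} p_i},
\qquad
r_i = \begin{cases}
  \dfrac{k_i}{n_i} \times 100 & \text{if } n_i > 0 \\
  0 & \text{if } n_i = 0
\end{cases}
```

When the exit probabilities are normalized over the same set of relays, the denominator equals 1 and the value reduces to $`\sum_{i \in E_c} p_i \, r_i`$.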
### Related metrics
- [(12)](home#metrics-to-track) What is the chance of a Tor client getting affected
  by Cloudflare's blocking practices when choosing a Tor exit node? [ticket:33010]
- [(17)](home#metrics-to-track) Is whether you get a CAPTCHA much more probabilistic
  and transient? [ticket:33010]
- [(18)](home#metrics-to-track) The chance that a Tor client, choosing an exit
  relay in the normal weighted fashion, will get hit by a CAPTCHA [ticket:33010]
## Weighted CAPTCHA rate by IP version
### Purpose
Understanding the effect of connecting to web servers
(and consequently exit relays) that support IPv4 vs IPv6 on the probability
of seeing a CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Obtain the "details document" from Onionoo and match the Onionoo data
       with the relay entries from the consensus using the relay fingerprints. The following query is
## Weighted CAPTCHA rate by exit probability
### Purpose
Understanding the effect of using smaller or larger exit relays (in terms of
exit probability) on the probability of seeing a CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Distribute the exit relay entries from the consensus into 10 bins based on
       their exit probabilities (calculated in Step 2.3), each bin containing the
       exit probabilities from n to n+0.1, for n = 0, 0.1, ..., 0.9
    6. Repeat the following for each bin:
        1. Repeat the following for each exit relay in the bin:
            1. Count the total number of measurements that were
               completed using this exit relay
            2. Count the total number of measurements that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.6.1.2}}{\text{Step 2.6.1.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        2. Calculate the weighted average of the percentage values (obtained in
           Step 2.6.1.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    7. Plot the weighted percentage values for each bin on the Y-axis and
       the `valid-after` timestamp of the consensus on the X-axis
3. Merge the graphs created for each consensus
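
Steps 2.5 and 2.6 can be sketched as follows, assuming the per-relay CAPTCHA percentages from Step 2.6.1.3 have already been computed into a `{fingerprint: percentage}` dict; relays missing from that dict are treated as 0%, matching the rule above. The function name and argument layout are illustrative.

```python
def weighted_rate_by_probability_bin(exit_probabilities, captcha_percentages,
                                     bin_count=10):
    """Weighted CAPTCHA rate (in %) per exit-probability bin (Steps 2.5-2.6).

    exit_probabilities: {fingerprint: exit probability from Step 2.3}
    captcha_percentages: {fingerprint: CAPTCHA percentage from Step 2.6.1.3}
    Returns {lower bound of the bin: weighted percentage} for every bin with
    a non-zero total weight.
    """
    weighted_sums = [0.0] * bin_count
    weight_totals = [0.0] * bin_count
    for fingerprint, probability in exit_probabilities.items():
        # Bin n covers [n / bin_count, (n + 1) / bin_count); a probability of
        # exactly 1.0 is clamped into the last bin.
        index = min(int(probability * bin_count), bin_count - 1)
        weighted_sums[index] += probability * captcha_percentages.get(fingerprint, 0.0)
        weight_totals[index] += probability
    return {
        index / bin_count: weighted_sums[index] / weight_totals[index]
        for index in range(bin_count)
        if weight_totals[index] > 0
    }
```

Individual exit probabilities are typically small, so with a bin width of 0.1 most relays may land in the first bin; `bin_count` is kept as a parameter so the width is easy to revisit.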
### Related metrics
- [(9)](home#metrics-to-track) How do specific exit nodes get affected by
  Cloudflare's blocking practices?
    - [(9.1)](home#metrics-to-track) Does the size/age/location of the exit node
      play a role? [ticket:33010#comment:15]
    - [(9.2)](home#metrics-to-track) Is it always the same Tor exit nodes that get
      blocked?
- [(11)](home#metrics-to-track) What fraction of the Tor exit nodes get affected
  by Cloudflare's blocking practices? [ticket:33010], [ticket:23840#comment:22]
## Weighted CAPTCHA rate by exit relay age
### Purpose
Understanding the effect of using older or younger exit relays
(based on the `first_seen` date) on the probability of seeing a CAPTCHA
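
The relay age used by this graph can be derived from the Onionoo `first_seen` value (a UTC timestamp in `YYYY-MM-DD hh:mm:ss` format) relative to the consensus `valid-after` time. The steps for this graph are cut off in this copy of the document, so the age buckets below are purely hypothetical placeholders chosen for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical age buckets in days: (lower bound, upper bound or None, label)
AGE_BINS = [
    (0, 7, "under 1 week"),
    (7, 30, "1 week to 1 month"),
    (30, 365, "1 month to 1 year"),
    (365, None, "1 year or more"),
]


def relay_age_days(first_seen, valid_after):
    """Whole days between a relay's first_seen date and a consensus."""
    delta = (datetime.strptime(valid_after, TIMESTAMP_FORMAT)
             - datetime.strptime(first_seen, TIMESTAMP_FORMAT))
    return delta.days


def age_bin(days):
    """Return the label of the hypothetical age bucket containing `days`."""
    for low, high, label in AGE_BINS:
        if days >= low and (high is None or days < high):
            return label
    raise ValueError("relay age cannot be negative")
```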
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Obtain the "details document" from Onionoo and match the Onionoo data
       with the relay entries from the consensus using the relay fingerprints. The following query is
## Weighted CAPTCHA rate by exit relay location
### Purpose
Understanding the effect of the exit relay's physical location on the
probability of seeing a CAPTCHA. This graph will show the top 10 countries
with the highest CAPTCHA rates.
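
Selecting the top 10 countries could look like the sketch below. Since the steps for this graph are cut off in this copy, it assumes (without confirmation from the document) that each country's rate is an exit-probability-weighted average of its relays' CAPTCHA percentages, analogous to the other graphs. Relay dicts are assumed to carry `fingerprint`, `exit_probability` (from Step 2.3), and `country`, the two-letter code found in Onionoo details documents, which may be missing when the GeoIP lookup fails.

```python
from collections import defaultdict


def top_countries(exit_relays, captcha_percentages, limit=10):
    """Rank countries by exit-probability-weighted CAPTCHA rate (in %)."""
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for relay in exit_relays:
        country = relay.get("country")
        if country is None:
            continue  # no GeoIP result, so the relay cannot be attributed
        probability = relay["exit_probability"]
        # Relays without measurements count as 0%, as in the other graphs
        percentage = captcha_percentages.get(relay["fingerprint"], 0.0)
        weighted_sums[country] += probability * percentage
        weight_totals[country] += probability
    rates = {
        country: weighted_sums[country] / weight_totals[country]
        for country in weight_totals
        if weight_totals[country] > 0
    }
    # Highest rates first, truncated to the requested number of countries
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:limit]
```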
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Obtain the "details document" from Onionoo and match the Onionoo data
       with the relay entries from the consensus using the relay fingerprints. The following query is
## Weighted CAPTCHA rate by exit relay age
### Purpose
Understanding how quickly Cloudflare blocks newer relays and whether older
relays are treated differently
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Use the CAPTCHA Monitor API to get the list of URLs that are used in the
       experiments. This list contains the metadata about the URLs.
    6. Join the measurements and URL list using the `URL` fields. Typically each
       URL maps to multiple measurements.
    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
       fields
    8. Obtain the "details document" from Onionoo and match the Onionoo data
       with the relay entries from the consensus using the relay fingerprints. The following query is
## Weighted CAPTCHA rate by exit relay location
### Purpose
Understanding whether Cloudflare blocks requests from exit relays in certain
countries more often than others
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor and between the `valid-after` & `fresh-until` timestamps of the
       consensus
    5. Use the CAPTCHA Monitor API to get the list of URLs that are used in the
       experiments. This list contains the metadata about the URLs.
    6. Join the measurements and URL list using the `URL` fields. Typically each
       URL maps to multiple measurements.
    7. Discard the measurements that do not have `cloudflare` in their `cdn_provider`
       fields
    8. Obtain the "details document" from Onionoo and match the Onionoo data
       with the relay entries from the consensus using the relay fingerprints. The following query is
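# Graphs about Tor Browser centric data
## Weighted CAPTCHA rate by Tor Browser version
### Purpose
Understanding the effect of using different Tor Browser versions on the
probability of seeing a CAPTCHA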
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor Browser (the `method` field equals `tor_browser`) and between
       the `valid-after` & `fresh-until` timestamps of the consensus
    5. Join the measurements and relay data using the relay fingerprints.
       Typically each relay maps to multiple measurements.
    6. Distribute the joined data into bins based on the `browser_version`
       field's value
    7. Repeat the following for each bin:
        1. Further bin the measurements into sub-bins based on the exit relay used
           to perform the measurement
        2. Repeat the following for each exit relay in each sub-bin:
            1. Count the total number of measurements in this sub-bin that were
               completed using this exit relay
            2. Count the total number of measurements in this sub-bin that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.7.2.2}}{\text{Step 2.7.2.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        3. Calculate the weighted average of the percentage values (obtained in
           Step 2.7.2.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    8. Plot the weighted percentage values for each `browser_version` bin on the
       Y-axis and the `valid-after` timestamp of the consensus on the X-axis
3. Merge the graphs created for each consensus
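
The final "merge" step can be sketched as turning the per-consensus results into one time series per bin, trimmed to the 30 × 24 most recent consensuses from the graph style. The input shape (`{valid_after: {bin value: percentage}}`) and the version strings in any example data are assumptions for illustration.

```python
def merge_into_series(per_consensus_results, max_consensuses=30 * 24):
    """Merge per-consensus bin values (Step 2) into one series per bin (Step 3).

    per_consensus_results maps a consensus `valid-after` timestamp string
    ("YYYY-MM-DD hh:mm:ss") to {bin value: weighted percentage}.
    Returns {bin value: [(valid_after, percentage), ...]} in time order.
    """
    series = {}
    # Timestamps in this format sort chronologically as plain strings; keep
    # only the most recent consensuses (30 days of hourly consensuses).
    recent = sorted(per_consensus_results.items())[-max_consensuses:]
    for valid_after, bins in recent:
        for bin_value, percentage in bins.items():
            series.setdefault(bin_value, []).append((valid_after, percentage))
    return series
```

A bin that has no measurements in some consensus simply has no point at that timestamp, which leaves a gap in its line rather than a misleading 0%.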
### Related metrics
- [(3.2)](home#metrics-to-track) What about different versions of the
  Tor Browser? Does Cloudflare behave differently to different versions of the
## Weighted CAPTCHA rate by Tor Browser security level
### Purpose
Understanding the effect of using Tor Browser at different security levels
(Standard, Safer, Safest) on the probability of seeing a CAPTCHA
### Steps to produce
1. Get consensuses from CollecTor
2. Repeat the following for each consensus:
    1. Parse and memorize the `valid-after` & `fresh-until` timestamps from the
       consensus header and the `bandwidth-weights` values from the footer
    2. Repeat the following for each *running exit relay* entry within the consensus:
        1. Parse the `r` line and memorize the IPv4 address and identity
        2. Parse the `w` line and memorize the bandwidth
        3. Parse the `s` line and memorize the relay flags
    3. Calculate the weighted exit probabilities using the `bandwidth-weights`
       from the consensus, `bandwidth` values, and `flags` for each exit relay
       (see an example calculation [here](https://gitweb.torproject.org/onionoo.git/tree/src/main/java/org/torproject/metrics/onionoo/updater/NodeDetailsStatusUpdater.java#n597))
    4. Use the CAPTCHA Monitor API to get the measurements that were completed
       using Tor Browser (the `method` field equals `tor_browser`) and between
       the `valid-after` & `fresh-until` timestamps of the consensus
    5. Join the measurements and relay data using the relay fingerprints.
       Typically each relay maps to multiple measurements.
    6. Distribute the joined data into 3 bins based on the `tbb_security_level`
       field's value
    7. Repeat the following for each bin:
        1. Further bin the measurements into sub-bins based on the exit relay used
           to perform the measurement
        2. Repeat the following for each exit relay in each sub-bin:
            1. Count the total number of measurements in this sub-bin that were
               completed using this exit relay
            2. Count the total number of measurements in this sub-bin that were
               completed using this exit relay and have the `is_captcha_found` field
               set to `1`
            3. Calculate the percentage of measurements that received a CAPTCHA using
               $`\frac{\text{Step 2.7.2.2}}{\text{Step 2.7.2.1}} \times 100`$ (assume `0%` if an
               exit relay exists in the consensus but there are no corresponding
               measurements)
        3. Calculate the weighted average of the percentage values (obtained in
           Step 2.7.2.3) using the exit probabilities (obtained in Step 2.3) as the
           weights
    8. Plot the weighted percentage values for each `tbb_security_level` bin on the
       Y-axis and the `valid-after` timestamp of the consensus on the X-axis
3. Merge the graphs created for each consensus
### Related metrics
- [(3.3)](home#metrics-to-track) What about the different security levels of Tor