Apply conversion script to all *.md files. Authored by Alexander Hansen Færøy.
**_WARNING: Trac is retired as of June 12, 2020 and this wiki page has been moved to Tor Project's GitLab at https://gitlab.torproject.org_**
**_Please view this page on Tor Project's GitLab for getting the most up to date information_**
# Cloudflare CAPTCHA Monitoring
[[_TOC_]]
The **Cloudflare CAPTCHA Monitoring** project aims to track how often Cloudflare-fronted webpages return CAPTCHAs to Tor clients. To do so, it fetches the same webpages both through Tor and through mainstream web browsers that do not use Tor, and compares the results. The tests are repeated periodically to reveal patterns over time. The collected metadata, metrics, and results are analyzed and displayed on a dashboard to show how Cloudflare manipulates internet traffic and affects people's access to the internet.
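
As a rough illustration of the comparison step, the Python sketch below computes the fraction of fetches that returned a CAPTCHA for each fetching method. The `Measurement` record and the method labels are hypothetical and are not taken from the CAPTCHA Monitor codebase.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Measurement:
    """One fetch of a Cloudflare-fronted page (hypothetical record layout)."""
    url: str
    method: str       # illustrative labels such as "tor_browser" or "firefox"
    is_captcha: bool  # True if this fetch returned a CAPTCHA


def captcha_rates(measurements):
    """Return {method: fraction of that method's fetches that got a CAPTCHA}."""
    totals = defaultdict(int)
    captchas = defaultdict(int)
    for m in measurements:
        totals[m.method] += 1
        captchas[m.method] += int(m.is_captcha)
    return {method: captchas[method] / totals[method] for method in totals}


results = [
    Measurement("https://captcha.wtf/", "tor_browser", True),
    Measurement("https://captcha.wtf/", "tor_browser", False),
    Measurement("https://captcha.wtf/", "firefox", False),
    Measurement("https://captcha.wtf/", "firefox", False),
]
print(captcha_rates(results))  # {'tor_browser': 0.5, 'firefox': 0.0}
```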
## Code
The codebase is under active development and is currently hosted in [this GitHub repository](https://github.com/woswos/CAPTCHA-Monitor).
## Documentation
You can find the documentation on [this Read the Docs page](https://captcha-monitor.readthedocs.io).
## Dataset
The data collected so far can be viewed on [this dashboard](http://dashboard.captcha.wtf/). The data is not downloadable yet; a download link will be added here once it is available.
## Detailed description
By design, Cloudflare sits between web servers and internet users and alters the traffic between them. It modifies this traffic to protect Cloudflare-fronted web servers from various attacks by malicious users. Although this looks like a good-faith practice to protect servers, it harms millions of users more than it helps. Cloudflare decides whether to block a user based on multiple factors, such as the visitor's IP address, the resources requested, the request payload and frequency, and customer-defined firewall rules [(1)](#references). Cloudflare does not share the specifics of its decision-making mechanism, which changes over time and is not open source. However, this does not stop us from experimenting with the algorithm to understand how it decides whether to block a user.
Cloudflare states that IP-address-based rules have the highest precedence, followed by Firewall Rules, Zone (URL) Lockdown, User Agent Blocking, and the Web Application Firewall [(2)](#references). Cloudflare's documentation therefore makes it clear that the user's IP address and the browser's User Agent are considered when deciding whether to block a user. Unfortunately, Cloudflare's algorithms raise every red flag when these two parameters (IP address and User Agent) match those of a typical Tor user. This matching is easy for Cloudflare because Tor Browser follows a "one fingerprint for all" philosophy, and the list of Tor exit nodes is publicly available. In trac ticket #18361 (comment 23), the Cloudflare CTO himself explains that they fetch the list of Tor exit nodes and assign a reputation to the nodes, which is then used to block certain users.
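
The precedence order described above can be pictured with a toy model: rule categories are checked in the documented order, and the first matching rule decides the outcome. The sketch below assumes a one-address-per-line exit list, a hypothetical Tor Browser User Agent string, and documentation-only IP addresses. It only illustrates why IP and User Agent matching singles out Tor users; it is not a description of Cloudflare's actual (closed) algorithm.

```python
import ipaddress

# Toy model (assumption): rule categories in the precedence order listed in
# Cloudflare's documentation (2); the first category with a matching rule wins.
PRECEDENCE = ["ip_access", "firewall", "zone_lockdown", "user_agent", "waf"]


def parse_exit_list(text):
    """Parse one IP address per line, skipping blank lines and '#' comments."""
    exits = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            exits.add(ipaddress.ip_address(line))
    return exits


def decide(request, rules):
    """Return the action of the highest-precedence matching rule, else 'allow'.

    `rules` maps a category name to a list of (predicate, action) pairs.
    """
    for category in PRECEDENCE:
        for predicate, action in rules.get(category, []):
            if predicate(request):
                return action
    return "allow"


# Illustrative exit list using RFC 5737 documentation addresses.
exits = parse_exit_list("# illustrative exit list\n192.0.2.10\n198.51.100.7\n")
# Hypothetical shared User Agent string, standing in for Tor Browser's.
TOR_UA = "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
rules = {
    "ip_access": [(lambda r: ipaddress.ip_address(r["ip"]) in exits, "captcha")],
    "user_agent": [(lambda r: r["ua"] == TOR_UA, "captcha")],
}
print(decide({"ip": "192.0.2.10", "ua": TOR_UA}, rules))          # captcha
print(decide({"ip": "203.0.113.5", "ua": "curl/7.68.0"}, rules))  # allow
```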
There are currently a few research projects on Tor user blocking practices (such as [Khattak et al.](https://www.freehaven.net/anonbib/#differential-ndss2016) and [Singh et al.](https://www.freehaven.net/anonbib/#exit-blocking2017)), but, to the best of my knowledge, there is no public tool or database that regularly collects data on Cloudflare's Tor user blocking practices. This project therefore aims to develop tools to monitor this issue and to build a database for public use. Once enough data has accumulated, the tool is intended to serve as a data source for the [Tor Metrics](https://metrics.torproject.org/) project. Many users also struggle to reliably reproduce Cloudflare's behavior when reporting it in tickets, because too many variables are involved. Since this project controls many of those variables, it can serve as a standardized toolset for reproducing Cloudflare's behavior, and the collected data might serve as a reference point for measurements made by individual users.
### Expected long-term impact
* Creating an up-to-date and reliable data source for further research on the topic
* Integrating the collected data into [Tor Metrics](https://metrics.torproject.org/)
* Reducing and relaxing Cloudflare's CAPTCHA policies
* Helping Tor users browse the internet without sacrificing their privacy or being discriminated against
## Approach
1. Setting up Cloudflare-fronted websites ([captcha.wtf](https://captcha.wtf/) and [exit11.online](https://exit11.online/)) to simulate the various configurations that Cloudflare customers can apply
2. Periodically fetching these websites via Tor and other mainstream web browsers that are not using Tor
3. Recording whether a CAPTCHA is returned during each fetch, along with other predefined [metrics](#metrics-to-track)
4. Visualizing the results on a dashboard ([dashboard.captcha.wtf](http://dashboard.captcha.wtf/)) and analyzing the collected data
5. Tracking the dataset and the results and making them publicly available
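
The recording step depends on deciding whether a fetched page is a CAPTCHA. A minimal heuristic sketch follows. The status codes and marker strings are assumptions that would need to be checked against real Cloudflare challenge pages (which change over time), and this is not necessarily how CAPTCHA Monitor itself detects CAPTCHAs.

```python
# Assumed challenge status codes and marker strings (hypothetical; verify
# against real responses, since Cloudflare's challenge pages change over time).
CHALLENGE_STATUS_CODES = (403, 503)
CHALLENGE_MARKERS = (
    "cf-captcha-container",
    "cf_chl_",
    "Attention Required! | Cloudflare",
)


def looks_like_captcha(status_code, headers, body):
    """Heuristically decide whether a response is a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "cloudflare" not in lowered.get("server", "").lower():
        return False
    if status_code not in CHALLENGE_STATUS_CODES:
        return False
    return any(marker in body for marker in CHALLENGE_MARKERS)


print(looks_like_captcha(403, {"Server": "cloudflare"}, '<div id="cf-captcha-container">'))  # True
print(looks_like_captcha(200, {"Server": "cloudflare"}, "<html>Hello</html>"))  # False
```

Checking the `Server` header first keeps pages from non-Cloudflare origins from being misclassified when they happen to return 403 or 503.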
Here is a diagram that explains the approach in detail:

![CAPTCHA Monitoring Project diagram](https://trac.torproject.org/projects/tor/raw-attachment/wiki/doc/CAPTCHAMonitor/CAPTCHA_Monitoring_Project_Diagram.png)
## Metrics to track
Here are some of the questions that the project will try to answer by tracking r…

17. Is whether you get a CAPTCHA much more probabilistic and transient? (#33010)
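
One way to approach the question of whether getting a CAPTCHA is probabilistic is to repeat the same fetch many times under a fixed configuration and estimate the CAPTCHA probability with a confidence interval. The sketch below uses the Wilson score interval, a standard choice for binomial proportions; it is an illustrative technique, not one the project is documented to use. A rate well inside (0, 1) with a narrow interval would be consistent with probabilistic behavior, while a rate near 0 or 1 would point toward a deterministic rule.

```python
import math


def wilson_interval(captchas, fetches, z=1.96):
    """Wilson score interval for the per-fetch CAPTCHA probability.

    `captchas` is how many of `fetches` identical fetches returned a CAPTCHA;
    z=1.96 gives an approximately 95% interval.
    """
    if fetches <= 0 or not 0 <= captchas <= fetches:
        raise ValueError("need fetches > 0 and 0 <= captchas <= fetches")
    p = captchas / fetches
    denom = 1 + z**2 / fetches
    center = (p + z**2 / (2 * fetches)) / denom
    half = z * math.sqrt(p * (1 - p) / fetches + z**2 / (4 * fetches**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Example: 7 of 20 repeated fetches through one fixed configuration hit a CAPTCHA.
low, high = wilson_interval(7, 20)
print(f"{low:.2f}-{high:.2f}")  # 0.18-0.57
```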
## Related trac tickets
The original ticket that initiated this project is #33010.
* #18361 - Issues with corporate censorship and mass surveillance
* #23840 - Google's reCAPTCHA fails 100%
* #24351 - Block Global Active Adversary Cloudflare; [The Great Cloudwall](./doc/TheGreatCloudwall)
* #31404 - Unsolvable reCAPTCHAs
* #32915 - Cloudflare alt-svc failures cause spurious "DNS resolution error" in Tor Browser
## Roadmap
* [X] Create Cloudflare-fronted websites
  * [X] IPv4-only and IPv6-only domains (as suggested by ticket #33010, comment 2)
    * captcha.wtf -> IPv4 only
* [X] Create a simple dashboard for displaying collected data
* [ ] Make the dataset downloadable
* [X] Have a working minimum viable product
* [X] Integrate [Tor Stem](https://stem.torproject.org/)
* [X] Integrate more web browsers
* [ ] Integrate older versions of the web browsers as well
* [X] Integrate the Cloudflare API to avoid changing the websites' Cloudflare settings manually
* [ ] Create an API for running the system on user-provided websites
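
The roadmap item on integrating the Cloudflare API could, for example, switch a test website's Security Level (see reference (3)) between measurements. The sketch below only builds the HTTP request with Python's standard library and never sends it. The endpoint path, the `value` field, the accepted level names, and Bearer token authentication are assumptions based on Cloudflare's v4 API and should be verified against Cloudflare's API documentation; the zone ID and token are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"
# Assumed Security Level values accepted by the v4 API (verify before use).
SECURITY_LEVELS = {"essentially_off", "low", "medium", "high", "under_attack"}


def security_level_request(zone_id, level, api_token):
    """Build (but do not send) a request that sets a zone's Security Level."""
    if level not in SECURITY_LEVELS:
        raise ValueError(f"unknown security level: {level}")
    body = json.dumps({"value": level}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/settings/security_level",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholders only; sending would require urllib.request.urlopen(req)
# with a real zone ID and API token.
req = security_level_request("ZONE_ID_PLACEHOLDER", "high", "TOKEN_PLACEHOLDER")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it possible to check the request offline before touching a live zone.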
## Tasks
Tasks are tracked in tickets tagged with the `CAPTCHAMonitor` keyword.
## Development
GeKo and arma are the mentors of this project, and I (woswos) am currently its only developer. I am developing this project as part of the Google Summer of Code program.
### Contact
If you have any questions, concerns, or feedback, you can reach me in the #tor-dev or #tor-project channels on IRC. My IRC handle is woswos. If you need help connecting to IRC, you can follow [this tutorial](https://support.torproject.org/get-in-touch/#irc-help).
You can also email me at <barkin(at)nyu(dot)edu>
### Contributing and Reporting Bugs
I use Trac tickets to keep track of issues and of the project's progress. You can use ticket #33010 and its child tickets to follow development, share contributions, and report bugs. Code contributions should be submitted through the project's GitHub repository.
## References
* (1) https://web.archive.org/web/20200328165212/https://support.cloudflare.com/hc/en-us/articles/205177068-How-does-Cloudflare-work-
* (2) https://web.archive.org/web/20200328143759/https://support.cloudflare.com/hc/en-us/articles/115002059131-Understanding-your-site-protection-options
* (3) https://web.archive.org/web/20200328183738/https://support.cloudflare.com/hc/en-us/articles/200170056-Understanding-the-Cloudflare-Security-Level