If you want a copy of the whole database, I would be very happy to share it; please [contact](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/home#contact) me.
|
|
|
|
|
|
## Detailed description
|
|
|
By design, Cloudflare sits between web servers and internet users and alters the traffic that passes between them. It modifies this traffic to protect the web servers it fronts from attacks by malicious users. Although this looks like a good-faith practice on the surface, it does more harm than good for millions of users. Cloudflare decides whether to block a user based on multiple factors, such as the visitor's IP address, the resources requested, the request payload and frequency, and customer-defined firewall rules ([Source](https://web.archive.org/web/20200328165212/https://support.cloudflare.com/hc/en-us/articles/205177068-How-does-Cloudflare-work)). Cloudflare does not share the specifics of its decision-making mechanism, since it keeps changing over time and is not open source. However, this doesn't stop us from experimenting with the algorithm to understand how it decides whether to block users.
|
|
|
|
|
|
Cloudflare mentions that IP address based rules take the highest precedence, followed by Firewall Rules, Zone (URL) Lockdown, User Agent Blocking, and the Web Application Firewall ([Source](https://web.archive.org/web/20200328143759/https://support.cloudflare.com/hc/en-us/articles/115002059131-Understanding-your-site-protection-options)). Thus, Cloudflare clearly states in its documentation that it considers both the user's IP address and the web browser's User Agent when deciding whether to block a user. Unfortunately, Cloudflare's algorithms raise every red flag when these two parameters (IP address and user agent) match those of a typical Tor user. This is easy for Cloudflare to do because Tor Browser follows a "one fingerprint for all" philosophy and the list of Tor exit nodes is publicly available. In trac ticket:18361#comment:23, Cloudflare's CTO himself explains that they fetch the list of Tor exit nodes and assign a reputation to the nodes in order to block certain users.
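
To make the two ideas above concrete, here is a minimal Python sketch. It is not Cloudflare's implementation, which is not public; it only illustrates (1) walking the protection layers in the precedence order listed in the cited documentation and (2) how little is needed to single out a Tor Browser user once an exit list and a shared User Agent are known. The function names, the example IP address (taken from a documentation-only range), and the User Agent string are all assumptions made for illustration.

```python
from ipaddress import ip_address

# Precedence order as listed in the Cloudflare documentation cited above,
# highest first. The evaluation logic itself is a guess for illustration.
RULE_PRECEDENCE = [
    "IP address based rules",
    "Firewall Rules",
    "Zone Lockdown",
    "User Agent Blocking",
    "Web Application Firewall",
]


def first_matching_layer(matches):
    """Return the highest-precedence layer whose rule matched, or None.

    `matches` is a hypothetical dict mapping layer name -> bool.
    """
    for layer in RULE_PRECEDENCE:
        if matches.get(layer):
            return layer
    return None


def looks_like_tor_browser(ip, user_agent, exit_ips, tor_user_agent):
    """True when the IP is a known exit node AND the User Agent is the
    single value shared by all Tor Browser users."""
    return ip_address(ip) in exit_ips and user_agent == tor_user_agent


# Illustrative values only: 192.0.2.0/24 is reserved for documentation,
# and the User Agent below is an assumed example string.
EXIT_IPS = {ip_address("192.0.2.1")}
TOR_UA = "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"

print(looks_like_tor_browser("192.0.2.1", TOR_UA, EXIT_IPS, TOR_UA))
print(first_matching_layer({"User Agent Blocking": True,
                            "IP address based rules": True}))
```

Because every Tor Browser user presents the same User Agent and exits through a publicly listed address, both checks reduce to simple lookups, which is why this kind of matching is cheap to perform at scale.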
|
|
|
|
|
|
Currently, there are a few research projects on the blocking of Tor users (such as [Khattak et al.](https://www.freehaven.net/anonbib/%23differential-ndss2016) and [Singh et al.](https://www.freehaven.net/anonbib/%23exit-blocking2017)), but, to the best of my knowledge, there is no public tool or database that regularly collects data on Cloudflare's blocking of Tor users. Thus, this project aims to develop tools to monitor this issue and to create a database for public use. Eventually, once enough data has accumulated, the tool is intended to serve as a data source for the [Tor Metrics](https://metrics.torproject.org/) project. It was also observed that many users struggle to reliably reproduce Cloudflare's behavior when reporting it in tickets, since too many variables are involved in the process. Because many of these variables are controlled within the project, it can be used as a standardized toolset for reproducing Cloudflare's behavior. The collected data might also serve as a reference point for measurements made by individual users.