# WARNING: This page is a working draft
# Welcome to CAPTCHA Monitoring project's wiki!
This wiki page contains the final report for the "Tor Project: Cloudflare CAPTCHA Monitoring" project for Google Summer of Code 2020. It is a broad overview of the work completed during the GSoC period; more detailed and up-to-date information is available on the [home wiki page](home).
## What is this project about?
The **CAPTCHA Monitoring** project aims to track how often CDN (for ex. Cloudfla...
## What work has been completed during the GSoC period?
### Background
I have been personally annoyed by receiving CAPTCHAs while using Tor, and going through the Tor Project's issue tickets, especially ticket [#33010](https://gitlab.torproject.org/tpo/metrics/ideas/-/issues/33010), showed that I wasn't alone in this. After years of user complaints and research papers published on the topic, it was clear that a public database and data collection tool was needed to back up the claims and let CDN companies take action. So, the CAPTCHA Monitor was born. Since this project didn't exist before, I designed the whole system from scratch and built it during GSoC. The designs of similar tools, such as [OONI](https://ooni.org/), [Tor Metrics](https://metrics.torproject.org/), and [ExitMap](https://github.com/NullHypothesis/exitmap/), influenced the decisions I made.

Next, I compiled a list of related tickets and comments from the Tor Project's bug tracking system (see the [metrics to track section](home#metrics-to-track)) to understand which metrics are valuable to collect and what the community wants to learn. These findings helped me further tune my design.

Here is a high-level overview of the design I implemented:
```mermaid
flowchart LR
    %% ...
    api --> public
```
There are five separate repositories dedicated to different parts of the project. I will explain the work completed for each repository separately. The hierarchy of the repositories is as follows:
```
CAPTCHA Monitor
|-- Core
...
`-- Dashboard
```
### CAPTCHA Monitor Core
The core is responsible for performing the measurements, analyzing the results, and storing them. The `compose` submodule periodically fetches the list of URLs from the database and new exit relays from the consensus, then schedules measurement jobs using the URL and exit relay lists. Meanwhile, the `run` submodule runs multiple workers in parallel to process the measurement jobs: Tor connects to the requested exit relay, and the URL is fetched via Tor Browser (or another requested web browser). Finally, the results are stored in the database. The code, issues, and documentation related to the CAPTCHA Monitor Core can be found in [this repository](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor).

Additionally, the CAPTCHA Monitor Core relies on two other repositories to function. The first is the [HTTP Header Live repository](https://gitlab.torproject.org/woswos/HTTP-Header-Live), which contains a modified version of the [HTTP Header Live web browser extension by Martin Antrag](https://github.com/Nitrama/HTTP-Header-Live). HTTP Header Live supports both Firefox and Chromium and records a copy of the HTTP headers while pages are fetched. The modified version of the extension can interface with the CAPTCHA Monitor Core and export the headers in a specific JSON format.

The other is the [CAPTCHA Monitor Web repository](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor-Web). It contains the code for the websites served by Cloudflare (see the [domains used for testing](home#domains-used-for-testing) section) and the Nginx configuration of the web server. These websites are used during the measurements to test certain properties of the Cloudflare blocking algorithm.
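To make the division of labor between the `compose` and `run` submodules more concrete, here is a minimal Python sketch of that flow. It is not the project's actual code: the `Job` and `Result` classes, the function signatures, the relay names, and the `fake_fetch` stand-in are all hypothetical, and pairing every URL with every exit relay is an assumption about how jobs are scheduled. The real system drives Tor and a web browser and stores the results in a database, both of which are stubbed out here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Job:
    url: str
    exit_relay: str  # placeholder relay identifier, not a real fingerprint


@dataclass(frozen=True)
class Result:
    job: Job
    captcha_seen: bool


def compose(urls: Iterable[str], exit_relays: Iterable[str]) -> List[Job]:
    """Build one job per (URL, exit relay) pair; this pairing rule is an assumption."""
    return [Job(url, relay) for url, relay in product(urls, exit_relays)]


def run(jobs: List[Job], fetch: Callable[[Job], bool], workers: int = 4) -> List[Result]:
    """Process jobs with parallel workers; `fetch` stands in for the Tor + browser step."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(fetch, jobs))  # map() preserves the job order
    return [Result(job, seen) for job, seen in zip(jobs, outcomes)]


def fake_fetch(job: Job) -> bool:
    """Hypothetical stand-in: pretend that one relay always receives a CAPTCHA."""
    return job.exit_relay == "RELAY_B"


jobs = compose(["https://example.com"], ["RELAY_A", "RELAY_B"])
results = run(jobs, fake_fetch)  # a real deployment would write these to the database
```

Keeping job creation separate from job execution, as the two submodules do, means the worker count can change without touching the scheduling logic.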
### CAPTCHA Monitor API
The API is responsible for serving the collected data.

The code, issues, and documentation related to the CAPTCHA Monitor API can be found in [this repository](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor-API).
### CAPTCHA Monitor Dashboard
The code, issues, and documentation related to the CAPTCHA Monitor Dashboard can be found in [this repository](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor-Dashboard).
## Challenges