|
|
|
# Warning: This page is a working draft
|
|
|
|
|
|
# Welcome!
|
|
This wiki page contains the final report for the "Tor Project: Cloudflare CAPTCHA Monitoring" project for Google Summer of Code 2020.
|
|
|
|
|
|
### Findings
|
|
|
|
|
|
## What would you do differently if you did it all again?
|
|
<!--
|
|
Before starting to work on this project, I was using Tor Browser as is and I didn't have detailed technical knowledge of how the whole system works. I only had a rough idea of how Tor works, and my knowledge of Tor Browser and the Tor software grew organically as I asked questions on IRC, read the spec files, and wrote code. As you may have guessed, I made a few bad decisions at the beginning of the project because of my limited initial knowledge of the inner workings of Tor.
|
|
Before starting to work on this project, I was using Tor Browser as is and I didn't know the technical details of how the whole system works. I only had a rough idea of how things work, and my knowledge of Tor Browser and the Tor software grew organically as I asked questions on IRC and read the spec files. As you may have guessed, I made a few bad decisions at the beginning of the project because of my limited knowledge.
|
|
|
|
|
|
|
|
For example, I initially decided to index relays by their OR addresses, and I assumed that all relays use their OR addresses as their exit addresses. Later, I learned that indexing by OR address is not a good idea, and I switched to relay fingerprints. Of course, I needed to edit or remove some parts of the codebase to fix this. This is only a single example; I made many mistakes like this one and spent time fixing them afterward.
|
|
|
|
|
|
|
|
So, if I did it all again, I would read all of the spec files and learn more about how things work before starting to code. That said, I learn better when I see things in action, and I would probably end up making similar mistakes while actually learning how the internals of Tor work.
|
|
For example, I initially decided to index relays in the database by their OR addresses, and I assumed that all relays use their OR addresses as their exit addresses. Later, I learned that indexing by OR address is not a good idea, and I switched to relay fingerprints. I needed to edit or remove some parts of the codebase to make this change.
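As a rough illustration of why the fingerprint makes a more stable key, here is a minimal sketch. It is not the project's actual code: the `RelayObservation` class, its field names, and the documentation-range IP addresses are all hypothetical. It only assumes that a relay's fingerprint stays the same while its OR address can change, and that its exit address can differ from its OR address.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class RelayObservation:
    """One observation of a relay (hypothetical structure for illustration)."""
    fingerprint: str                    # 40 hex characters identifying the relay
    or_address: str                     # address the relay listens on
    exit_address: Optional[str] = None  # may differ from or_address


def index_by_fingerprint(
    observations: Iterable[RelayObservation],
) -> Dict[str, RelayObservation]:
    """Key observations by normalized fingerprint; the latest observation wins."""
    index: Dict[str, RelayObservation] = {}
    for obs in observations:
        index[obs.fingerprint.upper()] = obs
    return index


# The same (made-up) relay seen twice, after its OR address changed.
observations = [
    RelayObservation("A" * 40, "192.0.2.10", "192.0.2.10"),
    RelayObservation("a" * 40, "198.51.100.7", "203.0.113.5"),
]

by_fingerprint = index_by_fingerprint(observations)
by_or_address = {obs.or_address: obs for obs in observations}

# Keyed by fingerprint there is one relay; keyed by OR address there appear to be two.
print(len(by_fingerprint), len(by_or_address))
```

Keying by OR address would also bake in the assumption that exit traffic leaves from that same address, which the second observation in the sketch deliberately violates.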
|
|
|
|
|
|
In addition to this, I underestimated the size of my operations.
|
|
Another example is my initial tool selection. I underestimated how much the project would grow and started with a modest SQLite database to store the data I collect. It did an OK job until the database passed the 1 GB threshold and I added the web API and parallel web page fetchers. At that point the database needed to handle long simultaneous connections, which turned out to be very problematic with SQLite. I solved these issues by switching to PostgreSQL, but once again I needed to edit the code to make this change. Luckily, I was expecting to make this upgrade at some point in the future (though not during the GSoC period), so I had built the database connection class to be modular. As a result, I only needed to edit that class, and the rest of the code worked just fine.
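The idea of a modular database connection class can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's real class: `ResultStore`, `SQLiteResultStore`, `record_fetch`, and the `results` table layout are hypothetical names chosen for the example. The calling code depends only on the abstract interface, so a PostgreSQL-backed class implementing the same two methods could replace the SQLite one without touching the callers.

```python
import sqlite3
from abc import ABC, abstractmethod


class ResultStore(ABC):
    """Storage interface that the rest of the code depends on (hypothetical)."""

    @abstractmethod
    def add_result(self, fingerprint: str, url: str, captcha: bool) -> None:
        """Persist the outcome of one page fetch."""

    @abstractmethod
    def count_captchas(self) -> int:
        """Return how many stored fetches hit a CAPTCHA."""


class SQLiteResultStore(ResultStore):
    """SQLite backend; a PostgreSQL backend would implement the same methods."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "fingerprint TEXT NOT NULL, url TEXT NOT NULL, captcha INTEGER NOT NULL)"
        )

    def add_result(self, fingerprint: str, url: str, captcha: bool) -> None:
        # Using the connection as a context manager commits on success.
        with self._conn:
            self._conn.execute(
                "INSERT INTO results VALUES (?, ?, ?)",
                (fingerprint, url, int(captcha)),
            )

    def count_captchas(self) -> int:
        row = self._conn.execute(
            "SELECT COUNT(*) FROM results WHERE captcha = 1"
        ).fetchone()
        return row[0]


def record_fetch(store: ResultStore, fingerprint: str, url: str, captcha: bool) -> None:
    """Caller code: it never needs to know which backend is in use."""
    store.add_result(fingerprint, url, captcha)


store = SQLiteResultStore()
record_fetch(store, "A" * 40, "https://example.com", True)
record_fetch(store, "B" * 40, "https://example.com", False)
print(store.count_captchas())
```

With this shape, switching backends means writing one new subclass and changing the single line that constructs the store.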
|
|
|
|
|
|
SQLite to PostgreSQL
|
|
So, if I did it all again, I would read all of the spec files, learn more about how things work in detail, and better plan the project's future trajectory before starting to code. That said, I learn better when I see things in action, and I would probably end up making similar mistakes in other ways. I guess that is part of the learning experience :)
|
|
|
|
|
|
-->
|
|
|
|
|
|
|
|
## What is left and next?
|
|
|
|
I pretty much finished everything I planned to work on (see [roadmap](home#roadmap)). I'm still working on the second version of the dashboard (see #41). It wasn't originally part of the roadmap, but it turned out to be a necessity after the feedback I received from the community. I wanted to finish it during the GSoC period, but once again I underestimated the complexity of the task. I plan to finish it in September. After that, I will ask the community for feedback and add new things based on it.
|
|
|
|
|
|
|
|
I will also work on [Tor Metrics](https://metrics.torproject.org/) (see [#tpo/metrics/website/40002](https://gitlab.torproject.org/tpo/metrics/website/-/issues/40002)). I'm committed to this project, and I'm not planning to stop until we achieve everything on the [expected long-term impact](home#expected-long-term-impact) agenda. New items will probably be added to the agenda as well.
|
|
|
|
|
|
|
|
|
|
## Acknowledgments
|
|
|
|
I want to acknowledge my mentors Georg Koppen (@gk) and Roger Dingledine (@arma) for being very helpful, tirelessly answering my questions, and guiding me as I figured out the pieces of this puzzle. I wouldn't have learned as much as I have without you. Thank you both!
|
|
|
|
|
|
|
|
I also want to thank Dennis Jackson (@djackson) for his extensive feedback on the [Dashboard Graphs](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/Dashboard-Graphs) page and for helping me make the graphs as scientifically sound as possible from an academic point of view.
|
|
|
|
|
|
|
|
And finally, a huge thanks to the folks who replied to my questions on IRC. Those replies were very important in helping me correct my errors and extend my knowledge.
|
|
|
|
|
|
|
|
|
|
|
|
|