August 2020
August 24
- Refactored the code that generates the graphs
- Documented the missing parts of the codebase
- Final GSoC report and catching up with last week's tasks
August 17
- Implemented the code for generating the data that will feed the new graphs
- Worked on documenting the code
- Experimented with D3.js to draw graphs but settled on using the already existing library (Chart.js)
- Fixed the blurry graph rendering that occurred in Tor Browser on Retina screens
- Adjustments to the backend code and writing the frontend code for the graphs
August 10
- Finished working on the dashboard design document
- Got feedback from the Tor community
- Worked on documentation
- Implementing the graphs designed
- More documentation
August 3
- Started working on a design document for the dashboard
- Added more query parameters to the API
- Updated the Dockerfile to use a non-root user to run the code in the container
- Finalizing the dashboard design document and implementing the graphs
- Working on the documentation for the core code
July 2020
July 27
- Added Brave Browser’s Tor Tabs as a new fetcher
- Created a new API (https://api.captcha.wtf/)
- Adding more graphs to the dashboard
July 20
- Migrated to a 2-CPU VPS and got a 500% performance increase
- Migrated the SQLite database to a PostgreSQL database to benefit from multi-core CPU performance
- Finished the "Relay Search" section
- Made a few cosmetic changes to the dashboard
- Moved back to using Docker since the application is mature enough and relies on other tools like PostgreSQL
- Added more URLs for testing
- Adding Brave Browser's Tor Tabs as a new fetcher
- Working on producing more meaningful graphs
- More cosmetic updates to the dashboard
July 13
- Last week's blog post
- Sent an e-mail to tor-dev
- Added "Relay Search" section
- Added a "JavaScript required" warning for Tor Browser users who browse at the "safest" security level
- Created an onion service mirror for the dashboard
- Got feedback from people on Reddit about different ways to expand the project such as including Google's reCAPTCHA
- Finishing the "Relay Search" section
- Adding the "Tests" or "Experiments" section for showing a brief explanation of the tests
- Further improving the look of the dashboard by fixing the UI elements that break in corner cases
July 6
- Completed the algorithm for deciding which test to run for exit relays
- Added GeoIP information to produce graphs for CAPTCHA rate per country
- Solved the memory leak issue
- Added annotations to the data
- Started versioning the codebase
- Added new tests for fetching with "firefox_over_tor" and additional websites like https://www.fiverr.com
- Finishing implementing the Cloudflare API module to carry out tests with different Cloudflare security levels
- Sending an email to the tor-dev mailing list to convey the updates on the project
- Updating the dashboard to show new features like annotations, versions, etc.
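The June 29 entry describes the test-selection algorithm as adding missing tests to the queue when a new relay appears and refreshing the measurements for existing relays. A minimal sketch of that idea follows. It is an illustration only, not the project's code: the name `plan_jobs`, the `(fingerprint, test)` job tuples, and the 24-hour refresh interval are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical refresh interval; the real value is not stated in these notes.
REFRESH_AFTER = timedelta(hours=24)


def plan_jobs(relays, tests, last_measured, now):
    """Return the (fingerprint, test) pairs that should be queued.

    relays:        exit relay fingerprints currently known
    tests:         test names (for example fetcher/URL combinations)
    last_measured: dict mapping (fingerprint, test) -> datetime of last result
    now:           current time as a datetime
    """
    missing, stale = [], []
    for fingerprint in relays:
        for test in tests:
            when = last_measured.get((fingerprint, test))
            if when is None:
                # New relay or new test: never measured, so queue it first.
                missing.append((fingerprint, test))
            elif now - when >= REFRESH_AFTER:
                # Existing relay whose measurement is old enough to refresh.
                stale.append(((fingerprint, test), when))
    # Refresh the oldest measurements first.
    stale.sort(key=lambda item: item[1])
    return missing + [job for job, _ in stale]
```

Putting never-measured pairs ahead of stale ones is one possible priority order; the notes do not say how the real queue is ordered.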
June 2020
June 29
- Switched to using HTTP Header Live extension to collect HTTP headers instead of using seleniumwire
- Seleniumwire was triggering MITM detection on Cloudflare's end, which caused an unrealistic increase in the CAPTCHA rate
- Added support for testing with different Tor Browser versions
- Added support for checking the webpage integrity
- Cloudflare sometimes inserts its own JavaScript code into the customer's webpage without letting customers know
- I check for these changes by comparing the MD5 hashes of the page content
- Added 'Measurement Search' section to the dashboard to see individual data points
- Added color indicators for each row to quickly highlight the situation of the measurement
- Green if there was no CAPTCHA and the page integrity was protected
- Orange if CAPTCHA was detected or page integrity wasn't protected
- Red if both CAPTCHA was detected and page integrity wasn't protected
- Added support for sharing custom searches by copying the dashboard's URL
- Added an algorithm that assigns IPv6-only domains exclusively to exit nodes that support IPv6 exiting, to increase efficiency
- Creating the algorithm for deciding which test to run for exit relays. This algorithm will add missing tests to the queue when a new relay appears and refresh the measurements for existing relays.
- Adding GeoIP information to produce graphs for CAPTCHA rate per country
- Utilizing the earlier implemented Cloudflare API module to carry out tests with different Cloudflare security levels
- I have a memory leak issue. I don't know how I managed to have a memory leak while using Python but I did :)
- Sometimes Tor Browser doesn't quit properly and these 'zombie' instances of Tor Browser keep accumulating and occupying space in the memory. Currently, I'm not sure if this is related to selenium, Tor Browser, or both. I need to solve this issue to keep collecting data without any down time. Otherwise, I need to manually remove the zombie instances and it is not a good solution at all.
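The integrity check and the row colors described in this entry can be sketched together. This is a hedged illustration: the function names and the idea of comparing against a stored reference hash are assumptions, since the notes only say that MD5 hashes of the page content are compared.

```python
import hashlib


def page_md5(content: bytes) -> str:
    """Hex MD5 digest of the raw page content."""
    return hashlib.md5(content).hexdigest()


def row_color(captcha_detected: bool, fetched: bytes, reference_md5: str) -> str:
    """Map one measurement to the dashboard colors described above."""
    integrity_ok = page_md5(fetched) == reference_md5
    # Count how many of the two problems occurred: 0, 1, or 2.
    problems = int(captcha_detected) + int(not integrity_ok)
    return ("green", "orange", "red")[problems]


reference = page_md5(b"<html>hello</html>")
print(row_color(False, b"<html>hello</html>", reference))  # green
print(row_color(True, b"<html>changed</html>", reference))  # red
```

MD5 is used here only to detect changes in content, not as a security measure.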
June 22
- Updated the dashboard at https://dashboard.captcha.wtf/
- Implemented the multiprocess-based parallelism mentioned last week
- Started collecting data with the new code; the collected data is available on the dashboard
- Moved the codebase to Tor Project's Gitlab
- I will work on further decreasing the measurement times
- Using the exit_policy_v6_summary tag from Onionoo to identify exit nodes that support IPv6 and using only these exit nodes for IPv6 tests
- Adding the ability to use different versions/releases of the Tor Browser
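A small sketch of the exit_policy_v6_summary filtering mentioned above, working on an already parsed Onionoo details document. Onionoo omits this field for relays that reject all IPv6 exiting, so the sketch treats the presence of the field as "supports IPv6 exiting". The function name and the sample fingerprints are made up for illustration; the real code may apply stricter checks, such as requiring specific ports.

```python
def ipv6_exit_fingerprints(details: dict) -> list:
    """Fingerprints of relays that publish an IPv6 exit policy summary.

    `details` is a parsed Onionoo details document (a dict with a
    "relays" list). Relays without exit_policy_v6_summary are skipped.
    """
    return [
        relay["fingerprint"]
        for relay in details.get("relays", [])
        if relay.get("exit_policy_v6_summary")
    ]


# Made-up sample data in the shape of an Onionoo details response.
sample = {
    "relays": [
        {"fingerprint": "AAAA", "exit_policy_v6_summary": {"accept": ["80", "443"]}},
        {"fingerprint": "BBBB"},
    ]
}
print(ipv6_exit_fingerprints(sample))  # ['AAAA']
```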
June 15
- Updated the Stem integration to set 2-hop circuits for the measurements
- The first hop is chosen randomly and the final hop is the target exit node
- Managed to decrease individual test time to the 10-14 second range with this update
- Experimented with using the "New Identity" button instead of fully restarting the browser
- Selenium had issues with reattaching to the browser when I used the "New Identity" button
- Experimented with Docker swarm to run isolated Tor and Tor Browser instances but encountered problems
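The 2-hop path selection described above (random first hop, target exit as the final hop) can be sketched in plain Python. The helper name `two_hop_path` is hypothetical, and the notes do not say which Stem call the project uses; with Stem, a path like this could be passed to `Controller.new_circuit(path, await_build=True)`.

```python
import random


def two_hop_path(relay_fingerprints, exit_fingerprint, rng=random):
    """Pick a random first hop and end the path at the target exit."""
    # The exit itself cannot also serve as the first hop.
    candidates = [fp for fp in relay_fingerprints if fp != exit_fingerprint]
    if not candidates:
        raise ValueError("no relay available for the first hop")
    return [rng.choice(candidates), exit_fingerprint]
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.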
June 8
- Integrated Tor Stem to specify exit nodes
- Integrated Cloudflare API to change security levels
- Added the feature to change Tor Browser's security levels
- Got the dashboard and data collection system up and running
- Started using the pytest framework for testing
June 1
- Worked on restructuring the codebase to achieve some of the goals set earlier
- Created "fetchers" for different web browsers
- Worked on making seleniumwire work with the Tor Browser Bundle
- Spent time on finding correct settings to flip in the browser and finding the correct way to configure the proxy. This is the resulting script that can capture and modify HTTP headers between Tor and Tor Browser.
- Wrote a test for the existing code
May 2020
May 25
- Created the trac tickets for milestones for my project
- Used the community feedback to update certain aspects of the project
- Modified the previously registered domains to have IPv4 and IPv6 records only [suggested by ticket:33010#comment:2]
- captcha.wtf -> IPv4 only
- exit11.online -> IPv6 only
- Updated the project diagram and fixed the wrong wording about DNS & CDN usage [suggested by ticket:33010#comment:28]
- Updated the captcha string from "Attention Required! | Cloudflare" to "Cloudflare" to accommodate possible localizations by Cloudflare [suggested by ticket:33010#comment:25]
- Added Let's Encrypt issued SSL certificates to the bypass subdomains on the domains
- Added a Let's Encrypt issued SSL certificate to my IRC bouncer
- Switched to the Docker versions of the modules/software used in the project
- Switched from a Grafana dashboard to a Metabase dashboard to visualize collected data
- Switched to using an SQLite database to store collected data. Previously, InfluxDB was used, and exporting data to other formats was very cumbersome. Now, the SQLite database can be easily exported to other formats.
- Added an SQLite example to the base project code
- Created the template for the Read the Docs documentation for the project
- Connected the Read the Docs page to GitHub via webhooks to automate the documentation generation process
- https://captcha-monitor.readthedocs.io/
- Making the collected data downloadable
- Having a fully working (hopefully dockerized) proof of concept
- I already had one working, but it was very poorly implemented since I was trying to do my university work at the same time
- Creating better documentation for the code I have at the moment
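To illustrate two of the changes above, the shorter "Cloudflare" captcha string and SQLite storage, here is a minimal sketch. The table name, column names, and the choice to search the fetched page text for the string are assumptions for illustration, not the project's actual schema or detection logic.

```python
import sqlite3

CAPTCHA_MARKER = "Cloudflare"  # the captcha string named in the notes above


def is_captcha(page_text: str) -> bool:
    """True if the marker appears in the text (where to look is assumed)."""
    return CAPTCHA_MARKER in page_text


def store_result(conn, url: str, page_text: str) -> None:
    """Save one measurement in a hypothetical `results` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (url TEXT, is_captcha INTEGER)"
    )
    conn.execute(
        "INSERT INTO results VALUES (?, ?)",
        (url, int(is_captcha(page_text))),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_result(conn, "https://captcha.wtf", "Attention Required! | Cloudflare")
store_result(conn, "https://captcha.wtf", "Example page")
```

Because SQLite keeps everything in a single file, the stored rows can later be exported with ordinary SQL queries.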