Table of Contents

August 2020
July 2020
- July 27
- July 20
- July 13
- July 6
June 2020
- June 29
- June 22
- June 15
- June 8
- June 1
May 2020
- May 25
- May 18
- May 11

August 2020

August 24

Past Week:

Refactored the code that generates the graphs
Documented the missing parts of the codebase

Week Ahead:

Writing the final GSoC report and catching up on last week's tasks

Current Blockers: None

August 17

Past Week:

Implemented the code for generating the data that will feed the new graphs
Worked on documenting the code
Experimented with D3.js to draw graphs but settled on using the already existing library (Chart.js)
Fixed the blurry graph rendering in Tor Browser on Retina screens

Week Ahead:

Adjustments to the backend code and writing the frontend code for the graphs

Current Blockers: None

August 10

Past Week:

Finished working on the dashboard design document
Got feedback from the Tor community
Worked on documentation

Week Ahead:

Implementing the graphs designed
More documentation

Current Blockers: None

August 3

Past Week:

Started working on a design document for the dashboard
Added more query parameters to the API
Updated the Dockerfile to use a non-root user to run the code in the container

Week Ahead:

Finalizing the dashboard design document and implementing the graphs
Working on the documentation for the core code

Current Blockers: None

July 2020

July 27

Past Week:

Added Brave Browser’s Tor Tabs as a new fetcher
Created a new API (https://api.captcha.wtf/)

Week Ahead:

Adding more graphs to the dashboard

Current Blockers: None

July 20

Past Week:

Migrated to a 2-CPU VPS and got a 500% performance increase
Migrated the SQLite database to a PostgreSQL database to benefit from multi-core CPU performance
Finished the "Relay Search" section
Made a few cosmetic changes to the dashboard
Moved back to using Docker since the application is mature enough and relies on other tools like PostgreSQL
Added more URLs for testing

Week Ahead:

Adding Brave Browser's Tor Tabs as a new fetcher
Working on producing more meaningful graphs
More cosmetic updates to the dashboard

Current Blockers: None

July 13

Past Week:

Last week's blog post
Sent an e-mail to tor-dev
Added "Relay Search" section
Added a "JavaScript required" warning for Tor Browser users who browse at the "safest" security level
Created an onion service mirror for the dashboard
- Available at http://5yalu72ryu4xu457kmcze5kxb4on6xh2vkom35jnu4s3respg7hsguqd.onion/
Got feedback from people on Reddit about different ways to expand the project such as including Google's reCAPTCHA

Week Ahead:

Finishing the "Relay Search" section
Adding the "Tests" or "Experiments" section for showing a brief explanation of the tests
Further improving the dashboard's appearance and fixing the UI elements that break in corner cases

Current Blockers: None

July 6

Past Week:

Completed the algorithm for deciding which test to run for exit relays
Added GeoIP information to produce graphs for CAPTCHA rate per country
Solved the memory leak issue
Added annotations to the data
Started versioning the codebase
Added new tests for fetching with "firefox_over_tor" and for additional websites like https://www.fiverr.com
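A minimal sketch of how a CAPTCHA rate per country could be computed once each measurement carries a GeoIP country code. The function name and the `(country_code, captcha_seen)` record shape are assumptions made for illustration, not the project's actual schema.

```python
from collections import defaultdict


def captcha_rate_per_country(measurements):
    """Return {country_code: fraction of measurements that hit a CAPTCHA}.

    `measurements` is an iterable of (country_code, captcha_seen) pairs.
    This record shape is hypothetical.
    """
    totals = defaultdict(int)
    captchas = defaultdict(int)
    for country, captcha_seen in measurements:
        totals[country] += 1
        if captcha_seen:
            captchas[country] += 1
    return {country: captchas[country] / totals[country] for country in totals}


# Made-up sample data: exit country and whether a CAPTCHA was served
sample = [("DE", True), ("DE", False), ("US", False), ("DE", True), ("US", False)]
rates = captcha_rate_per_country(sample)
```

The resulting dictionary can be fed directly to a charting library as one value per country.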

Week Ahead:

Finishing implementing the Cloudflare API module to carry out tests with different Cloudflare security levels
Sending an email to tor-dev mailing list to convey the updates on the project
Updating the dashboard to show new features like annotations, versions, etc.

Current Blockers: None

June 2020

June 29

Past Week:

Switched to using HTTP Header Live extension to collect HTTP headers instead of using seleniumwire
- Seleniumwire was triggering the MITM detection on the Cloudflare end, which caused an unrealistic increase in the CAPTCHA rate
Added the support for testing with different Tor Browser versions
Added the support for checking the webpage integrity
- Cloudflare sometimes inserts its own JavaScript code into the customer's webpage without letting customers know
- I check for these changes by comparing the MD5 hashes of the page content
Added 'Measurement Search' section to the dashboard to see individual data points
- Added color indicators for each row to quickly highlight the situation of the measurement
  - Green if there was no CAPTCHA and the page integrity was protected
  - Orange if CAPTCHA was detected or page integrity wasn't protected
  - Red if both CAPTCHA was detected and page integrity wasn't protected
- Added the support for sharing the custom searches by copying the dashboard's URL
Added an algorithm that assigns IPv6-only domains only to exit nodes that support IPv6 exiting, to increase efficiency
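The integrity check and the row colors described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and it assumes the expected page content is available as bytes for comparison.

```python
import hashlib


def md5_of(content: bytes) -> str:
    """Hex MD5 digest of the page content."""
    return hashlib.md5(content).hexdigest()


def integrity_protected(expected: bytes, fetched: bytes) -> bool:
    """True if the fetched page hashes to the same value as the expected page."""
    return md5_of(expected) == md5_of(fetched)


def row_color(captcha_detected: bool, integrity_ok: bool) -> str:
    """Green: no problems, orange: exactly one problem, red: both problems."""
    problems = int(captcha_detected) + int(not integrity_ok)
    return ("green", "orange", "red")[problems]
```

MD5 is used here only to detect changes in the page, as in the report; it is not meant as a security guarantee.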

Week Ahead:

Creating the algorithm for deciding which test to run for exit relays. This algorithm will add missing tests to the queue when a new relay appears and refresh the measurements for existing relays.
Adding GeoIP information to produce graphs for CAPTCHA rate per country
Utilizing the earlier implemented Cloudflare API module to carry out tests with different Cloudflare security levels
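A rough sketch of the planned scheduling logic: queue every (relay, test) pair that has never been measured, then refresh the pairs whose last measurement is too old. The function name, the `last_measured` mapping, and the one-day refresh interval are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def plan_jobs(relays, tests, last_measured, now, max_age=timedelta(days=1)):
    """Return the (relay, test) pairs to queue: missing first, then stale.

    `last_measured` maps (relay, test) to the datetime of the latest
    measurement. `max_age` is an assumed refresh interval.
    """
    missing, stale = [], []
    for relay in relays:
        for test in tests:
            measured_at = last_measured.get((relay, test))
            if measured_at is None:
                missing.append((relay, test))  # new relay or new test
            elif now - measured_at > max_age:
                stale.append((measured_at, relay, test))
    stale.sort()  # refresh the oldest measurements first
    return missing + [(relay, test) for _, relay, test in stale]
```

Putting the missing pairs first means a newly appeared relay gets measured before existing measurements are refreshed.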

Current Blockers:

I have a memory leak issue. I don't know how I managed to have a memory leak while using Python but I did :)
Sometimes Tor Browser doesn't quit properly, and these 'zombie' instances of Tor Browser keep accumulating and occupying memory. Currently, I'm not sure if this is related to selenium, Tor Browser, or both. I need to solve this issue to keep collecting data without any downtime. Otherwise, I need to remove the zombie instances manually, which is not a good solution at all.

June 22

Past Week:

Updated the dashboard at https://dashboard.captcha.wtf/
Implemented the multi-process parallelism mentioned last week
Started collecting data with the new code; the collected data is available on the dashboard
Moved the codebase to Tor Project's Gitlab

Week Ahead:

I will work on further decreasing the measurement times
- Using exit_policy_v6_summary tag from Onionoo to identify exit nodes that support IPv6 and using only these exit nodes for IPv6 tests
Adding the ability to use different versions/releases of the Tor Browser
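A minimal sketch of the IPv6 filtering idea, working on relay entries that were already downloaded from Onionoo and parsed into dictionaries. It assumes that a relay without an `exit_policy_v6_summary` field does not allow IPv6 exiting; the function names and the sample fingerprints are made up.

```python
def supports_ipv6_exit(relay: dict) -> bool:
    """True if the relay entry carries a non-empty exit_policy_v6_summary."""
    return bool(relay.get("exit_policy_v6_summary"))


def ipv6_capable_exits(relays):
    """Fingerprints of the relays that can be used for IPv6 tests."""
    return [relay["fingerprint"] for relay in relays if supports_ipv6_exit(relay)]


# Hypothetical relay entries in the shape of parsed Onionoo details documents
relays = [
    {"fingerprint": "A" * 40, "exit_policy_v6_summary": {"accept": ["80", "443"]}},
    {"fingerprint": "B" * 40},  # no IPv6 summary, so it is skipped
]
ipv6_exits = ipv6_capable_exits(relays)
```

Only the relays in `ipv6_exits` would then be scheduled for the IPv6-only domain, which avoids spending time on measurements that cannot succeed.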

Current Blockers: None

June 15

Past Week:

Updated the Stem integration to set 2 hop circuits for the measurements
- The first hop is chosen randomly and the final hop is the target exit node
- Managed to decrease individual test time to 10-14 seconds range with this update
Experimented with using the "New Identity" button instead of fully restarting the browser
- Selenium had issues with reattaching to the browser when I used the "New Identity" button
Experimented with Docker swarm to run isolated Tor and Tor Browser instances but encountered problems
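The 2-hop circuit idea can be sketched as below. It assumes `controller` is an already authenticated `stem.control.Controller` and that a list of candidate first-hop fingerprints is available; the helper names are hypothetical and this is not the project's actual integration code.

```python
import random


def two_hop_path(candidate_first_hops, exit_fingerprint, rng=random):
    """Pick a random first hop (never the exit itself) and end at the target exit."""
    choices = [fp for fp in candidate_first_hops if fp != exit_fingerprint]
    if not choices:
        raise ValueError("no usable first hop")
    return [rng.choice(choices), exit_fingerprint]


def build_two_hop_circuit(controller, candidate_first_hops, exit_fingerprint):
    """Ask Tor to build the circuit through Stem and return the circuit id."""
    path = two_hop_path(candidate_first_hops, exit_fingerprint)
    # Controller.new_circuit blocks until the circuit is built when await_build=True
    return controller.new_circuit(path, await_build=True)
```

Keeping the path selection separate from the controller call makes the random choice easy to test without a running Tor process.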

Week Ahead: I was using Docker swarm to run multiple measurements in parallel, but that method started becoming unnecessarily complex, memory consuming, and difficult to debug. I decided to use multiple processes on the host machine instead. So, I will be coding it.
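A minimal sketch of the multi-process approach using Python's standard library. The `measure` function is a placeholder for a real measurement, and the job format and worker count are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor


def measure(job):
    """Placeholder for one measurement; a real one would drive a browser over Tor."""
    exit_fingerprint, url = job
    return {"exit": exit_fingerprint, "url": url, "done": True}


def run_parallel(jobs, workers=4):
    """Run the measurements in separate processes and collect the results in order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(measure, jobs))
```

When used in a script, `run_parallel(jobs)` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.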

Current Blockers: None

June 8

Past Week:

Integrated Tor Stem to specify exit nodes
Integrated Cloudflare API to change security levels
Added the feature to change Tor Browser's security levels
Got the dashboard and data collection system up and running
Started using the pytest framework for testing

Week Ahead: Currently, it takes about 40 hours to complete the measurements for all exit nodes. The initial plan was to perform these measurements every day. The measurements need to take less time to fit them into a day. So, I will be working on assigning different processes to different metrics to run them in parallel, which should decrease the processing time.

Current Blockers: None

June 1

Past Week:

Worked on restructuring the codebase to achieve some of the goals set earlier
Created "fetchers" for different web browsers
Worked on making seleniumwire work with the Tor Browser Bundle
- Spent time on finding correct settings to flip in the browser and finding the correct way to configure the proxy. This is the resulting script that can capture and modify HTTP headers between Tor and Tor Browser.
Wrote a test for the existing code

Week Ahead: Finally got the code for the first version to work. So, I plan to have the whole system (including the dashboard) up and running tomorrow. After that, I will work on integrating the Tor Stem and Cloudflare API into the system.

Current Blockers: None

May 2020

May 25

Past Week:

Created the trac tickets for milestones for my project
Used the community feedback to update certain aspects of the project
- Modified the previously registered domains so that one has only IPv4 records and the other has only IPv6 records [suggested by ticket:33010#comment:2]
  - captcha.wtf -> IPv4 only
  - exit11.online -> IPv6 only
- Updated the project diagram and fixed the wrong wording about DNS & CDN usage [suggested by ticket:33010#comment:28]
- Updated the captcha string from "Attention Required! | Cloudflare" to "Cloudflare" to accommodate possible localizations by Cloudflare [suggested by ticket:33010#comment:25]
Added Let's Encrypt issued SSL certificates to the bypass subdomains on the domains
Added a Let's Encrypt issued SSL certificate to my IRC bouncer
Switched to the Docker versions of the modules/software used in the project
Switched from a Grafana dashboard to a Metabase dashboard to visualize collected data
Switched to using an SQLite database to store collected data. Previously, influxdb was used and it was a very cumbersome process to export data to other formats. Now, the SQLite database can be easily exported to other formats.
Added an SQLite example to the base project code
Created the template for the Read the Docs documentation for the project
- Connected the Read the Docs page to GitHub via webhooks to automate the documentation generation process
- https://captcha-monitor.readthedocs.io/
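As an illustration of why the move to SQLite makes exporting easier, here is a minimal sketch that dumps one table to CSV with only the standard library. The table and column names are hypothetical and do not reflect the project's real schema.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to the file-like `out` as CSV, header first.

    `table` is interpolated into the query, so it must be a trusted name.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# In-memory database with a made-up schema, for demonstration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (url TEXT, exit_node TEXT, is_captcha_found INTEGER)")
conn.execute("INSERT INTO results VALUES (?, ?, ?)", ("https://captcha.wtf", "EXIT_A", 1))
buffer = io.StringIO()
export_table_to_csv(conn, "results", buffer)
```

Passing an open file instead of `io.StringIO()` writes the same output to disk.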

Week Ahead:

Making the collected data downloadable
Having a fully working (hopefully dockerized) proof of concept
- I already had one working, but it was very poorly implemented since I was trying to do my university work at the same time
Creating better documentation for the code I have at the moment

Current Blockers: None

May 18

Past Week: I spent some of my time setting up the IRC “bouncer” infrastructure to receive IRC messages all the time. I started talking to the OONI people about my project. I also took a very long rescue flight to return home from my university location. Meanwhile, I had finals, and I’m done with my final exams, finally.

Week Ahead: I plan to actually open the trac tickets to define individual tasks for my project. I planned to do it last week, but I couldn’t do it because of the last-minute developments in my life. I will also keep discussing the details of my project with the external researchers I mentioned.

Current Blockers: None

May 11

Past Week: I spent my time getting used to IRC and getting to know my mentors. I wrote a wiki article on the Tor Project’s trac to explain my project. The wiki article can be found here 1. My mentors introduced me to a few external researchers that might be helpful for my project. My previous week’s blog post can be found here 1.

Week Ahead: I plan to open trac tickets to define individual tasks for my project so that the wider community can comment on them and watch the progress. I will also discuss the details of my project with the external researchers I mentioned.

Current Blockers: I have my university finals this week. They don’t really block my progress but they do slow it down.
