Last edited by Barkin Simsek Aug 25, 2020

Monthly Reports

Table of Contents

  • August 2020
    • August 24
    • August 17
    • August 10
    • August 3
  • July 2020
    • July 27
    • July 20
    • July 13
    • July 6
  • June 2020
    • June 29
    • June 22
    • June 15
    • June 8
    • June 1
  • May 2020
    • May 25
    • May 18
    • May 11

August 2020

August 24

☑ Past Week:

  • Refactored the code that generates the graphs
  • Documented the missing parts of the codebase

🔲 Week Ahead:

  • Writing the final GSoC report and catching up on last week's tasks

🛑 Current Blockers: None

August 17

☑ Past Week:

  • Implemented the code for generating the data that will feed the new graphs
  • Worked on documenting the code
  • Experimented with D3.js to draw graphs but settled on using the already existing library (Chart.js)
  • Fixed the blurry graphs that appeared in Tor Browser on Retina screens

🔲 Week Ahead:

  • Adjustments to the backend code and writing the frontend code for the graphs

🛑 Current Blockers: None

August 10

☑ Past Week:

  • Finished working on the dashboard design document
  • Got feedback from the Tor community
  • Worked on documentation

🔲 Week Ahead:

  • Implementing the graphs designed
  • More documentation

🛑 Current Blockers: None

August 3

☑ Past Week:

  • Started working on a design document for the dashboard
  • Added more query parameters to the API
  • Updated the Dockerfile to use a non-root user to run the code in the container

🔲 Week Ahead:

  • Finalizing the dashboard design document and implementing the graphs
  • Working on the documentation for the core code

🛑 Current Blockers: None

July 2020

July 27

☑ Past Week:

  • Added Brave Browser’s Tor Tabs as a new fetcher
  • Created a new API (https://api.captcha.wtf/)

🔲 Week Ahead:

  • Adding more graphs to the dashboard

🛑 Current Blockers: None

July 20

☑ Past Week:

  • Migrated to a 2-CPU VPS and got a 500% performance increase
  • Migrated the SQLite database to a PostgreSQL database to benefit from multi-core CPU performance
  • Finished the "Relay Search" section
  • Made a few cosmetic changes to the dashboard
  • Moved back to using Docker since the application is mature enough and relies on other tools like PostgreSQL
  • Added more URLs for testing

🔲 Week Ahead:

  • Adding Brave Browser's Tor Tabs as a new fetcher
  • Working on producing more meaningful graphs
  • More cosmetic updates to the dashboard

🛑 Current Blockers: None

July 13

☑ Past Week:

  • Last week's blog post
  • Sent an e-mail to tor-dev
  • Added "Relay Search" section
  • Added a "JavaScript required" warning for Tor Browser users who browse at the "safest" security level
  • Created an onion service mirror for the dashboard
    • Available at http://5yalu72ryu4xu457kmcze5kxb4on6xh2vkom35jnu4s3respg7hsguqd.onion/
  • Got feedback from people on Reddit about different ways to expand the project such as including Google's reCAPTCHA

🔲 Week Ahead:

  • Finishing the "Relay Search" section
  • Adding the "Tests" or "Experiments" section for showing a brief explanation of the tests
  • Further polishing the dashboard to fix UI elements that break in corner cases

🛑 Current Blockers: None

July 6

☑ Past Week:

  • Completed the algorithm for deciding which test to run for exit relays
  • Added GeoIP information to produce graphs for CAPTCHA rate per country
  • Solved the memory leak issue
  • Added annotations to the data
  • Started versioning the codebase
  • Added new tests for fetching with "firefox_over_tor" and additional websites like https://www.fiverr.com
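
The per-country CAPTCHA rate mentioned above can be sketched roughly as follows. This is only an illustration: the record layout (a "country" code and a "captcha" flag per measurement) and the function name are assumptions, not the project's actual schema.

```python
from collections import defaultdict


def captcha_rate_by_country(measurements):
    """Return {country_code: fraction of measurements that hit a CAPTCHA}."""
    totals = defaultdict(int)
    captchas = defaultdict(int)
    for m in measurements:
        totals[m["country"]] += 1
        if m["captcha"]:
            captchas[m["country"]] += 1
    return {country: captchas[country] / totals[country] for country in totals}


# Hypothetical sample data; real records would come from the measurement
# database, with the country resolved from the exit relay's IP via GeoIP.
sample = [
    {"country": "DE", "captcha": True},
    {"country": "DE", "captcha": False},
    {"country": "US", "captcha": False},
    {"country": "US", "captcha": False},
]
rates = captcha_rate_by_country(sample)
```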

🔲 Week Ahead:

  • Finishing implementing the Cloudflare API module to carry out tests with different Cloudflare security levels
  • Sending an email to tor-dev mailing list to convey the updates on the project
  • Updating the dashboard to show new features like annotations, versions, etc.

🛑 Current Blockers: None

June 2020

June 29

☑ Past Week:

  • Switched to using HTTP Header Live extension to collect HTTP headers instead of using seleniumwire
    • Seleniumwire was triggering Cloudflare's MITM detection, which caused an unrealistic increase in the CAPTCHA rate
  • Added support for testing with different Tor Browser versions
  • Added support for checking the webpage integrity
    • Cloudflare sometimes inserts its own JavaScript code into the customer's webpage without letting customers know
    • I check for these changes by comparing the MD5 hashes of the page content
  • Added 'Measurement Search' section to the dashboard to see individual data points
    • Added color indicators for each row to quickly highlight the situation of the measurement
      • Green if there was no CAPTCHA and the page integrity was protected
      • Orange if CAPTCHA was detected or page integrity wasn't protected
      • Red if both CAPTCHA was detected and page integrity wasn't protected
    • Added support for sharing custom searches by copying the dashboard's URL
  • Added an algorithm that assigns IPv6-only domains only to exit nodes that support IPv6 exiting, to increase efficiency
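
The integrity check and the row colors described above could look something like the sketch below. It assumes the comparison is between a known-good copy of the page and the copy fetched through Tor; the function names are hypothetical, and orange is read here as "exactly one of the two problems".

```python
import hashlib


def page_md5(content: bytes) -> str:
    """MD5 hex digest of the raw page content."""
    return hashlib.md5(content).hexdigest()


def integrity_ok(reference: bytes, fetched: bytes) -> bool:
    """True if the fetched page hashes to the same value as the reference."""
    return page_md5(reference) == page_md5(fetched)


def row_color(captcha_detected: bool, integrity_preserved: bool) -> str:
    """Map a measurement to the dashboard color described in the report."""
    problems = int(captcha_detected) + int(not integrity_preserved)
    return ("green", "orange", "red")[problems]


# Hypothetical page contents: the second copy has an injected script tag.
original = b"<html><body>hello</body></html>"
modified = b"<html><body>hello<script src='x.js'></script></body></html>"
color = row_color(captcha_detected=False,
                  integrity_preserved=integrity_ok(original, modified))
```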

🔲 Week Ahead:

  • Creating the algorithm for deciding which test to run for exit relays. This algorithm will add missing tests to the queue when a new relay appears and refresh the measurements for existing relays.
  • Adding GeoIP information to produce graphs for CAPTCHA rate per country
  • Utilizing the earlier implemented Cloudflare API module to carry out tests with different Cloudflare security levels
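
One way the planned scheduling algorithm could work is sketched below: a (relay, test) pair is queued if it has never been measured (a new relay) or if its last measurement is older than a refresh interval. The one-day interval, the function name, and the "tor_browser" test name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def plan_tests(relays, test_names, last_measured, now,
               max_age=timedelta(days=1)):
    """Return the (relay, test) pairs that should be added to the queue.

    last_measured maps (relay, test) to the time of the latest measurement.
    """
    queue = []
    for relay in relays:
        for test in test_names:
            seen = last_measured.get((relay, test))
            # Missing entry: new relay or new test. Old entry: needs refresh.
            if seen is None or now - seen > max_age:
                queue.append((relay, test))
    return queue


now = datetime(2020, 6, 29, 12, 0)
last = {
    ("A", "tor_browser"): now - timedelta(hours=1),      # still fresh
    ("A", "firefox_over_tor"): now - timedelta(days=2),  # stale
}
queue = plan_tests(["A", "B"], ["tor_browser", "firefox_over_tor"], last, now)
```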

🛑 Current Blockers:

  • I have a memory leak issue. I don't know how I managed to have a memory leak while using Python but I did :)
  • Sometimes Tor Browser doesn't quit properly, and these 'zombie' instances keep accumulating and occupying memory. Currently, I'm not sure if this is related to selenium, Tor Browser, or both. I need to solve this issue to keep collecting data without any downtime. Otherwise, I have to remove the zombie instances manually, which is not a good solution at all.

June 22

☑ Past Week:

  • Updated the dashboard at https://dashboard.captcha.wtf/
  • Implemented the multi-process parallelism mentioned last week
  • Started collecting data with the new code; the collected data is available on the dashboard
  • Moved the codebase to Tor Project's GitLab

🔲 Week Ahead:

  • I will work on further decreasing the measurement times
    • Using the exit_policy_v6_summary field from Onionoo to identify exit nodes that support IPv6 and using only these exit nodes for IPv6 tests
  • Adding the ability to use different versions/releases of the Tor Browser
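
The IPv6 filtering idea above can be sketched as follows. Onionoo omits exit_policy_v6_summary for relays that reject all IPv6 exit traffic, so checking for the field's presence is enough here. The relay entries are hand-written placeholders rather than real Onionoo output, and the function name is hypothetical.

```python
def ipv6_capable_exits(relays):
    """Return fingerprints of relays that carry an IPv6 exit policy summary."""
    return [r["fingerprint"] for r in relays if "exit_policy_v6_summary" in r]


# Placeholder relay entries shaped like Onionoo details documents.
relays = [
    {"fingerprint": "AAAA", "exit_policy_summary": {"accept": ["80", "443"]}},
    {
        "fingerprint": "BBBB",
        "exit_policy_summary": {"accept": ["80", "443"]},
        "exit_policy_v6_summary": {"accept": ["80", "443"]},
    },
]

ipv6_exits = ipv6_capable_exits(relays)
# Pair the IPv6-only test domain with IPv6-capable exits only.
ipv6_jobs = [(fp, "exit11.online") for fp in ipv6_exits]
```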

🛑 Current Blockers: None

June 15

☑ Past Week:

  • Updated the Stem integration to set 2-hop circuits for the measurements
    • The first hop is chosen randomly and the final hop is the target exit node
    • Managed to decrease individual test time to the 10-14 second range with this update
  • Experimented with using the "New Identity" button instead of fully restarting the browser
    • Selenium had issues with reattaching to the browser when I used the "New Identity" button
  • Experimented with Docker swarm to run isolated Tor and Tor Browser instances but encountered problems
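
A minimal sketch of how the two-hop path described above might be chosen: a random first hop followed by the target exit. The function name and the placeholder fingerprints are hypothetical, and this is not the project's actual code. With a running Tor, a path like this could be passed to Stem's Controller.new_circuit; that call is left out so the sketch runs without a Tor process.

```python
import random


def two_hop_path(candidate_first_hops, exit_fingerprint, rng=random):
    """Return [first_hop, exit] with a randomly chosen first hop."""
    # The exit itself must not be picked as the first hop.
    choices = [fp for fp in candidate_first_hops if fp != exit_fingerprint]
    if not choices:
        raise ValueError("no usable first hop")
    return [rng.choice(choices), exit_fingerprint]


# Placeholder fingerprints; a seeded generator keeps the example repeatable.
path = two_hop_path(["RELAY_A", "RELAY_B", "EXIT_X"], "EXIT_X",
                    rng=random.Random(0))
```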

🔲 Week Ahead: I was using Docker swarm to run multiple measurements in parallel, but that method started becoming unnecessarily complex, memory consuming, and difficult to debug. I decided to use multiple processes on the host machine instead. So, I will be implementing that.

🛑 Current Blockers: None

June 8

☑ Past Week:

  • Integrated Tor Stem to specify exit nodes
  • Integrated Cloudflare API to change security levels
  • Added the feature to change Tor Browser's security levels
  • Got the dashboard and data collection system up and running
  • Started using the pytest framework for testing

🔲 Week Ahead: Currently, it takes about 40 hours to complete the measurements for all exit nodes. The initial plan was to perform these measurements every day. The measurements need to take less time to fit them into a day. So, I will be working on assigning different processes to different metrics to run them in parallel, which should decrease the processing time.
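
As a rough back-of-the-envelope check of the numbers above, and assuming (optimistically) that the work splits evenly across processes with no overhead, the minimum number of parallel processes needed to fit a 40-hour run into one day is:

```python
import math


def min_parallel_workers(serial_hours, window_hours):
    """Smallest worker count that fits the run in the window, assuming an
    ideal linear speedup (an optimistic simplification)."""
    return math.ceil(serial_hours / window_hours)


workers = min_parallel_workers(40, 24)
ideal_wall_clock_hours = 40 / workers
```

Real speedups are usually lower than this ideal, so the result is a lower bound rather than a target.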

🛑 Current Blockers: None

June 1

☑ Past Week:

  • Worked on restructuring the codebase to achieve some of the goals set earlier
  • Created "fetchers" for different web browsers
  • Worked on making seleniumwire work with the Tor Browser Bundle
    • Spent time on finding correct settings to flip in the browser and finding the correct way to configure the proxy. This is the resulting script that can capture and modify HTTP headers between Tor and Tor Browser.
  • Wrote a test for the existing code

🔲 Week Ahead: I finally got the code for the first version to work. So, I plan to have the whole system (including the dashboard) up and running tomorrow. After that, I will work on integrating the Tor Stem and Cloudflare API into the system.

🛑 Current Blockers: None

May 2020

May 25

☑ Past Week:

  • Created the trac tickets for milestones for my project
  • Used the community feedback to update certain aspects of the project
    • Modified the previously registered domains so that one has only IPv4 records and the other has only IPv6 records [suggested by ticket:33010#comment:2]
      • captcha.wtf -> IPv4 only
      • exit11.online -> IPv6 only
    • Updated the project diagram and fixed the wrong wording about DNS & CDN usage [suggested by ticket:33010#comment:28]
    • Updated the captcha string from "Attention Required! | Cloudflare" to "Cloudflare" to accommodate possible localizations by Cloudflare [suggested by ticket:33010#comment:25]
  • Added Let's Encrypt issued SSL certificates to the bypass subdomains on the domains
  • Added a Let's Encrypt issued SSL certificate to my IRC bouncer
  • Switched to the Docker versions of the modules/software used in the project
  • Switched from a Grafana dashboard to a Metabase dashboard to visualize collected data
  • Switched to using an SQLite database to store collected data. Previously, influxdb was used and it was a very cumbersome process to export data to other formats. Now, the SQLite database can be easily exported to other formats.
  • Added an SQLite example to the base project code
  • Created the template for the Read the Docs documentation for the project
    • Connected the Read the Docs page to GitHub via webhooks to automate the documentation generation process
    • https://captcha-monitor.readthedocs.io/
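
To illustrate two of the changes above (matching on the shorter "Cloudflare" string, and storing results in SQLite so they export easily), here is a small sketch. The table layout, function names, URLs, and page texts are assumptions for illustration; the project's real schema and detection code may differ.

```python
import csv
import io
import sqlite3

CAPTCHA_MARKER = "Cloudflare"  # shorter marker string from the report


def has_captcha(page_text: str) -> bool:
    """Assumed check: the marker appears somewhere in the fetched text."""
    return CAPTCHA_MARKER in page_text


def export_csv(conn) -> str:
    """Dump the (hypothetical) measurements table as CSV text."""
    cur = conn.execute("SELECT url, captcha FROM measurements ORDER BY id")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements "
    "(id INTEGER PRIMARY KEY, url TEXT, captcha INTEGER)"
)
# Hypothetical fetched texts: one block page, one normal page.
pages = {
    "https://captcha.wtf": "Attention Required! | Cloudflare",
    "https://exit11.online": "Hello world",
}
for url, text in pages.items():
    conn.execute(
        "INSERT INTO measurements (url, captcha) VALUES (?, ?)",
        (url, int(has_captcha(text))),
    )
csv_text = export_csv(conn)
```

A translated block-page title that still contains "Cloudflare" would match the shorter marker, which is the point of the change.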

🔲 Week Ahead:

  • Making the collected data downloadable
  • Having a fully working (hopefully dockerized) proof of concept
    • I already had one working, but it was very poorly implemented since I was trying to do my university work at the same time
  • Creating better documentation for the code I have at the moment

🛑 Current Blockers: None

May 18

☑ Past Week: I spent some of my time setting up the IRC “bouncer” infrastructure to receive IRC messages all the time. I started talking to the OONI people about my project. I also took a very long rescue flight to return home from my university location. Meanwhile, I had finals, and I’m done with my final exams, finally.

🔲 Week Ahead: I plan to actually open the trac tickets to define individual tasks for my project. I planned to do it last week, but I couldn’t do it because of the last-minute developments in my life. I will also keep discussing the details of my project with the external researchers I mentioned.

🛑 Current Blockers: None

May 11

☑ Past Week: I spent my time getting used to IRC and getting to know my mentors. I wrote a wiki article on the Tor Project’s trac to explain my project. The wiki article can be found here. My mentors introduced me to a few external researchers that might be helpful for my project. My previous week’s blog post can be found here.

🔲 Week Ahead: I plan to open trac tickets to define individual tasks for my project, so that the wider community can comment on them and follow the progress. I will also discuss the details of my project with the external researchers I mentioned.

🛑 Current Blockers: I have my university finals this week. They don’t really block my progress but they do slow it down.
