_Following blog posts are mirrored from [DIAL's blog](https://hub.osc.dial.community)._
|
[[_TOC_]]
|
|
|
|
|
|
|
|
# July 2020
|
|
|
|
|
|
## July 31
|
|
|
|
This week I released CAPTCHA Monitor v0.2.0, and it mostly contains changes from last week. I wrote the changelog and made changes to the README file. This release contains major changes (like the migration to PostgreSQL), so the installation steps needed to be updated.
|
|
|
|
|
|
|
|
After finalizing this release, I started working on a design document for the dashboard. I contacted a few people in addition to my mentors, and they will give me feedback once I'm done with the document. This document mainly aims for the reproducibility of the graphs, and it makes sure that the graphs I'm designing are correct and make sense. Also, trying to build the dashboard without a solid plan turned into a nightmare, since there are too many variables attached to a single measurement. So, this design document will help me stay organized and not waste time on graphs that would turn out to be useless.
|
|
|
|
|
|
|
|
While working on the design document, I realized that my API didn't offer a proper way to count the number of rows that a query returns. This is essential for performing statistics for the dashboard. So, I added a meta parameter called `count_only` to the API calls. When this parameter is set to `1`, the API only returns the number of rows that match the requested criteria. This simple but effective change saves a lot of bandwidth and decreases API call durations dramatically in cases where megabytes of HTML data are not needed for the required operation.
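A minimal sketch of how such a `count_only` meta parameter might be handled server-side (the row structure and function name are illustrative, not the project's actual code):

```python
# Illustrative in-memory stand-in for the measurement results table.
ROWS = [
    {"url": "https://example.com", "captcha_found": 1, "html": "<html>...</html>"},
    {"url": "https://example.org", "captcha_found": 0, "html": "<html>...</html>"},
    {"url": "https://example.net", "captcha_found": 1, "html": "<html>...</html>"},
]

def query(captcha_found=None, count_only=0):
    matches = [r for r in ROWS
               if captcha_found is None or r["captcha_found"] == captcha_found]
    if count_only:
        # Only the row count is returned -- no HTML payload is transferred.
        return {"count": len(matches)}
    return matches

print(query(captcha_found=1, count_only=1))  # {'count': 2}
```

The caller gets the same filtering semantics either way; `count_only=1` just drops the payload.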
|
|
|
|
|
|
|
|
I also worked on migrating the user in the Dockerfile to a non-root user and decreasing the Docker image size. By default, the active user is the root user for docker containers, which might pose security issues. Thus, I worked on creating a new non-root user and running CAPTCHA Monitor with that user.
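A sketch of that pattern (the base image, username, and paths here are assumptions, not the project's actual Dockerfile):

```dockerfile
FROM python:3.8-slim

# Create an unprivileged user so the container does not run as root
RUN useradd --create-home captchamonitor
USER captchamonitor
WORKDIR /home/captchamonitor

# --chown makes the copied files belong to the non-root user
COPY --chown=captchamonitor:captchamonitor . .
RUN pip install --user -r requirements.txt
CMD ["python", "-m", "captcha_monitor"]
```

Using a slim base image and installing only what the application needs also helps keep the image size down.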
|
|
|
|
|
|
|
|
Finally, I completed my second GSoC evaluation and finalized the second coding stage. My experience has been amazing so far, and I'm on track with coding. The dashboard and visualizations turned out to be more problematic than I expected, but I will overcome these issues with the help of the design document that I'm working on.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## July 24
|
|
|
|
Until this point, the "API" was a small part of the dashboard code, and it wasn't a real API. This week I spent time on creating a fully fledged and properly documented API, which is located at [api.captcha.wtf](https://api.captcha.wtf/).
|
|
|
|
|
|
|
|
![5e7c959b69293c444a5b86998470986f3074d451](uploads/2139540fb0f80e420f367a7b7b684576/5e7c959b69293c444a5b86998470986f3074d451.png)
|
|
|
|
|
|
|
|
The new API can perform complex filtering on the data at the database level before transmitting any data to users. The old version was literally transferring all rows in the database in a single call. This worked OK when I had less data in the database. Now, having the ability to filter results before fetching the data saves a lot of bandwidth and processing power.
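The difference between the two approaches can be sketched with an in-memory SQLite table (column names here are assumptions):

```python
import sqlite3

# In-memory table standing in for the measurement results.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (url TEXT, captcha_found INTEGER)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [("a.example", 1), ("b.example", 0), ("c.example", 1)])

# Old approach: pull every row over the wire, then filter client-side.
all_rows = conn.execute("SELECT url, captcha_found FROM results").fetchall()
client_filtered = [r for r in all_rows if r[1] == 1]

# New approach: let the database filter before any data is transferred.
db_filtered = conn.execute(
    "SELECT url, captcha_found FROM results WHERE captcha_found = ?",
    (1,)).fetchall()

assert client_filtered == db_filtered  # same result, far less data moved
```

With only three rows the difference is invisible, but once each row carries megabytes of HTML, filtering in the `WHERE` clause is the only approach that scales.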
|
|
|
|
|
|
|
|
Also, the documentation framework I used shows the actual API calls, and I think it is pretty useful for people who are interested in using the API to learn how it works.
|
|
|
|
|
|
|
|
![f9ce08d036dfbd744454f45146aee8931c8c9954](uploads/2a0cf978778aa7c022ebbb34836a2223/f9ce08d036dfbd744454f45146aee8931c8c9954.png)
|
|
|
|
|
|
|
|
The other significant "achievement" was getting Brave Browser's "Private Window with Tor" to work with my current system. Last week, I thought I could easily add it to the system, but later, I realized that the `chromedriver` I was using was not capable of sending keyboard shortcuts, which are the only way to open "Private Window with Tor" in Brave Browser. After thinking about it more, I decided to send the keyboard shortcuts to the Brave Browser process directly instead of going through a high-level tool like `chromedriver`, and it worked! Sometimes you need to think outside of the box to make things work :) The rest of the code wasn't that difficult after completing this small step.
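On an X11 desktop, a tool like `xdotool` can deliver a shortcut straight to a browser window; a small helper that builds the command is easy to test in isolation (the function name is mine, and Alt+Shift+N is Brave's documented shortcut for the Tor window):

```python
import subprocess

def tor_window_keypress(window_id):
    # Alt+Shift+N opens "Private Window with Tor" in Brave Browser.
    # Building the argv list separately keeps this logic testable
    # without a running X session.
    return ["xdotool", "key", "--window", str(window_id), "alt+shift+n"]

# Against a live window this would be executed as, e.g.:
#   subprocess.run(tor_window_keypress(window_id), check=True)
```

Because the shortcut goes to the process's window directly, it bypasses `chromedriver` entirely.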
|
|
|
|
|
|
|
|
|
|
|
|
## July 17
|
|
|
|
This week I worked on tasks that I wasn't planning or expecting to work on. I first finished the "Relays Search" section by adding the code for calculating the CAPTCHA probabilities. After this, I wanted to work on the "Experiment Search" section of the dashboard, but I realized that I had forgotten to put the Chromium fetchers in production. So, I spent some time updating the old Chromium fetcher to be compatible with the HTTP Header Live extension based system, and I put it in production. My mentor, Roger, also suggested that I add a new fetcher based on Brave Browser's "Tor Tabs". I figured out how to make it happen and spent time experimenting with it.
|
|
|
|
|
|
|
|
Additionally, I found more URLs to track by simply going through the Alexa Top 500 list. I identified a lot of websites that show CAPTCHAs to Tor users and put these sites into the system. Meanwhile, I realized that the SQLite database I was using wasn't able to handle the increased demand. It was frequently locking itself because of the concurrent connections. Thus, I decided to switch to a more robust solution, PostgreSQL, and I worked on adapting the parts of the code that interface with the database to use PostgreSQL.
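One small wrinkle in such a migration: Python's `sqlite3` uses `?` placeholders in parameterized queries, while PostgreSQL drivers like `psycopg2` use `%s`. A thin helper can keep existing query strings working during the transition (a naive sketch of my own, not the project's code -- it would also rewrite any literal `?` characters):

```python
def to_postgres_placeholders(query):
    # Convert sqlite3-style "?" placeholders to psycopg2-style "%s".
    # Naive: assumes "?" only appears as a placeholder in the query.
    return query.replace("?", "%s")

print(to_postgres_placeholders("SELECT * FROM results WHERE url = ?"))
# SELECT * FROM results WHERE url = %s
```

PostgreSQL's row-level locking is what removes the "database is locked" errors: writers no longer block the whole database file the way SQLite does.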
|
|
|
|
|
|
|
|
Finally, I presented my project at the Tor Metrics Team meeting. I got good feedback, and the Metrics Team liked the project. I hope and plan to integrate my project into Tor's metrics dashboard to reach a broader audience.
|
|
|
|
|
|
|
|
|
|
## July 10
|
|
|
|
This week I spent a lot of time working on the dashboard to add new graphs and features. That said, I needed to add a few new features in the backend to support the updates in the dashboard. First, I used the freely available [GeoLite2](https://hub.osc.dial.community/t/weekly-gsoc-standups-for-2020-w28/1770/12) database to get the physical locations (country and continent) of the exit relays and I recorded that information in the database.
|
|
|
|
|
|
|