Developer Guide, authored by Barkin Simsek
# Getting the system running
1. Start all of the containers with `make up`, as shown below. If you then run `docker ps`, you will see that we just started a bunch of containers.
![Screen_Shot_2021-05-20_at_15.16.33](uploads/5c4e532f636425df35a45e74bf799c27/Screen_Shot_2021-05-20_at_15.16.33.png)
- `captchamonitor-tor-container` runs a copy of a Tor client. A new `captchamonitor-tor-container` container is created for each worker in the system.
- `captchamonitor-tor-browser-container` runs an instance of [Docker Selenium](https://github.com/SeleniumHQ/docker-selenium) that was compiled to use Tor Browser to fetch websites on demand
- `selenium/standalone-firefox` runs an instance of [Docker Selenium](https://github.com/SeleniumHQ/docker-selenium) that was compiled to use Firefox Browser to fetch websites on demand
- `selenium/standalone-chrome` runs an instance of [Docker Selenium](https://github.com/SeleniumHQ/docker-selenium) that was compiled to use Chrome Browser to fetch websites on demand
- `postgres:9.6` runs a copy of the PostgreSQL database server. As you can see above, this container exposes port `5432` outside of the container network, so we can connect to it and inspect the data inside.
- `captchamonitor` runs the code we write
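
To confirm that `make up` brought up everything in the list above, a small check along these lines may help. This is only a sketch: it assumes the names listed above appear in the IMAGE column of `docker ps` (possibly with a tag appended), and the helper names `running_images` and `missing_images` are made up for illustration.

```python
import subprocess

# Names taken from the list above. Whether they match the IMAGE column of
# `docker ps` exactly on your machine is an assumption.
EXPECTED_IMAGES = [
    "captchamonitor-tor-container",
    "captchamonitor-tor-browser-container",
    "selenium/standalone-firefox",
    "selenium/standalone-chrome",
    "postgres:9.6",
    "captchamonitor",
]


def running_images():
    """Return the image name of every running container (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Image}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def missing_images(running, expected=EXPECTED_IMAGES):
    """Return the expected names that no running container was started from."""

    def matches(image, name):
        # Accept an exact match or the same name followed by a tag.
        return image == name or image.startswith(name + ":")

    return [
        name
        for name in expected
        if not any(matches(image, name) for image in running)
    ]
```

Calling `missing_images(running_images())` after `make up` should return an empty list if every expected container is running.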
1. All containers now run in the background and, by design, do not require any user interaction. If you want to view the output of our code, use the `make logs` command as shown below. These logs exist only to help us debug things if needed. The program should place all useful information into the database rather than just printing it to the screen.
![Screen_Shot_2021-05-20_at_15.51.17](uploads/c3d0403f20d84dc043772208ce71a02b/Screen_Shot_2021-05-20_at_15.51.17.png)
1. If we connect to the database using a database client (such as [DBeaver](https://dbeaver.io/)), we will see that six tables were already created for us: `url`, `relay`, `fetcher`, `fetch_queue`, `fetch_completed`, and `fetch_failed`. See [this file](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/blob/master/src/captchamonitor/utils/models.py) for more details. We can and should add more tables for other functionality (for example, the consensus).
1. The [worker](https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/blob/master/src/captchamonitor/core/worker.py) first spins up a new `captchamonitor-tor-container`. It then claims a job from the `fetch_queue` table, processes it using the specified web browser, and places the results back into the database. However, before adding any job to the `fetch_queue` table, we first need to add some fetchers, URLs, and relays. Here, I used the database client to add the data manually, but this should be automated later.
![Screen_Shot_2021-05-20_at_15.25.06](uploads/c1e22cca8768e103feabc3a59de78d65/Screen_Shot_2021-05-20_at_15.25.06.png)
![Screen_Shot_2021-05-20_at_15.25.08](uploads/d35bf27ba2d7c8796bf6cc52f4f607fb/Screen_Shot_2021-05-20_at_15.25.08.png)
![Screen_Shot_2021-05-20_at_15.25.11](uploads/073fff32652e4f7e8b7258724c2e121c/Screen_Shot_2021-05-20_at_15.25.11.png)
Now, we can add some jobs that use the relays, URLs, and fetchers we just added:
![Screen_Shot_2021-05-20_at_15.25.56](uploads/8033947bae1eaf01ed6858a9208b36fc/Screen_Shot_2021-05-20_at_15.25.56.png)
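
The manual steps above could eventually be scripted. The sketch below shows the ordering only: it uses an in-memory SQLite database as a stand-in for PostgreSQL, the table names come from this guide, and every column name, the foreign keys, and the sample values are hypothetical. Check `models.py` for the real schema before adapting it.

```python
import sqlite3

# Stand-in schema: table names are from the guide, all columns are hypothetical.
SCHEMA = """
CREATE TABLE url     (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE relay   (id INTEGER PRIMARY KEY, fingerprint TEXT NOT NULL);
CREATE TABLE fetcher (id INTEGER PRIMARY KEY, method TEXT NOT NULL);
CREATE TABLE fetch_queue (
    id         INTEGER PRIMARY KEY,
    url_id     INTEGER NOT NULL REFERENCES url(id),
    relay_id   INTEGER NOT NULL REFERENCES relay(id),
    fetcher_id INTEGER NOT NULL REFERENCES fetcher(id)
);
"""


def seed(conn):
    """Add a URL, a relay, and some fetchers, then queue one job per fetcher."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs on request
    conn.executescript(SCHEMA)

    # Step 1: the rows that jobs will point to (sample values are made up).
    url_id = conn.execute(
        "INSERT INTO url (url) VALUES (?)", ("https://example.com",)
    ).lastrowid
    relay_id = conn.execute(
        "INSERT INTO relay (fingerprint) VALUES (?)", ("A" * 40,)
    ).lastrowid
    fetcher_ids = [
        conn.execute("INSERT INTO fetcher (method) VALUES (?)", (method,)).lastrowid
        for method in ("tor_browser", "firefox", "chrome")
    ]

    # Step 2: only now can jobs reference those rows.
    for fetcher_id in fetcher_ids:
        conn.execute(
            "INSERT INTO fetch_queue (url_id, relay_id, fetcher_id) VALUES (?, ?, ?)",
            (url_id, relay_id, fetcher_id),
        )
    conn.commit()
```

In this stand-in, queueing a job that points at a fetcher, URL, or relay that does not exist raises `sqlite3.IntegrityError`, which illustrates why those rows have to be added first.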
1. After a few seconds, one of the workers will claim the job and process it:
![Screen_Shot_2021-05-20_at_15.26.03](uploads/5212987c621e55d17213e06239d6a1c4/Screen_Shot_2021-05-20_at_15.26.03.png)
1. Next, the worker will keep processing the rest of the jobs and it will remove completed jobs from the queue:
![Screen_Shot_2021-05-20_at_15.26.21](uploads/80c421c2643472e616227e359e438ec8/Screen_Shot_2021-05-20_at_15.26.21.png)
![Screen_Shot_2021-05-20_at_15.26.38](uploads/aac63f12048e5e3fbe9b8158791396aa/Screen_Shot_2021-05-20_at_15.26.38.png)
1. Finally, the successful fetches will show up in the `fetch_completed` table:
![Screen_Shot_2021-05-20_at_15.26.48](uploads/d935547a51eb561daed9d53100bd6c62/Screen_Shot_2021-05-20_at_15.26.48.png)
The HTML data received by the web browser and a dump of the HAR will be available in each result:
![Screen_Shot_2021-05-20_at_15.27.01](uploads/bbe5b7f1c1643f8820671178c2a5819a/Screen_Shot_2021-05-20_at_15.27.01.png)
![Screen_Shot_2021-05-20_at_15.27.07](uploads/546136e0f981afb081af3e3af93a83f8/Screen_Shot_2021-05-20_at_15.27.07.png)
1. If there is a problem with a job, the worker moves it into the `fetch_failed` table.
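
The job lifecycle described in the steps above (claim, process, then move to `fetch_completed` or `fetch_failed`) can be summarized in a short sketch. It is not the actual worker code: it again uses in-memory SQLite as a stand-in, the column names and the `process_next_job` and `fetch` names are hypothetical, and starting the Tor container and the browser is left out entirely.

```python
import sqlite3

# Stand-in schema: table names are from the guide, all columns are hypothetical.
SCHEMA = """
CREATE TABLE fetch_queue     (id INTEGER PRIMARY KEY, url TEXT NOT NULL, claimed_by TEXT);
CREATE TABLE fetch_completed (id INTEGER PRIMARY KEY, url TEXT NOT NULL, html_data TEXT, har_dump TEXT);
CREATE TABLE fetch_failed    (id INTEGER PRIMARY KEY, url TEXT NOT NULL, fail_reason TEXT);
"""


def process_next_job(conn, worker_id, fetch):
    """Claim one unclaimed job, run `fetch` on it, and file the outcome.

    `fetch` is any callable that takes a URL and returns (html, har) or raises.
    Returns False when there is nothing left to claim.
    """
    row = conn.execute(
        "SELECT id, url FROM fetch_queue WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, url = row

    # Mark the job as claimed so that other workers skip it.
    conn.execute("UPDATE fetch_queue SET claimed_by = ? WHERE id = ?", (worker_id, job_id))

    try:
        html, har = fetch(url)
        conn.execute(
            "INSERT INTO fetch_completed (url, html_data, har_dump) VALUES (?, ?, ?)",
            (url, html, har),
        )
    except Exception as exc:
        # Any problem sends the job to fetch_failed instead.
        conn.execute(
            "INSERT INTO fetch_failed (url, fail_reason) VALUES (?, ?)",
            (url, str(exc)),
        )

    # Either way, the job leaves the queue.
    conn.execute("DELETE FROM fetch_queue WHERE id = ?", (job_id,))
    conn.commit()
    return True
```

The select-then-update claim here is not safe with several concurrent workers. On PostgreSQL, one common way to make the claim atomic is `SELECT ... FOR UPDATE SKIP LOCKED`; whether the real worker does this is not covered by this guide.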
# Stopping the system
1. Use `make down` to stop the system. Otherwise, you might leave some of the containers behind or even corrupt data:
![Screen_Shot_2021-05-20_at_16.13.56](uploads/6acc9787459e88f303e678ec15d7e4ec/Screen_Shot_2021-05-20_at_16.13.56.png)