There was a discussion about not using the `request` library at all, because it might skew the data provided by the Captcha Monitor.
This week I worked on the items listed below:
I implemented the logic of `analyser.py` and wrote scripts to store its results in the database. In short, `analyser.py` is now implemented and integrated into the Captcha Monitor.
- [X] Logic for the Captcha-Monitor has been implemented (`Analyser.py`).
The details are as follows:
- [X] Add flags and integral results (`dom_analyse`, `captcha_checker`, `status_check`) to insert into the database.
- [X] Change the existing database (`models.py`): create a new table with `id`, `created_at`, `updated_at`, `fetch_completed_id`, `captcha_checker`, `dom_analyse`, `status_check`.
- [X] Logic for Captcha has been implemented (`Analyser.py`)
- [X] Write tests to check if `analyze_completed` returns the queried results.
- [X] Changed some of the logic and added flags and integral results (`dom_analyse`, `captcha_checker`, check) for insertion into the DB.
- [ ] Write more tests for `analyser.py` and improve the coverage of it.
- [X] Data from the analyser needs to be inserted into the DB.
- [X] Add an `analyser_completed` table to `models.py` with fields such as `captcha_checker`, `dom_analyse`, and `status_check`, so the details can be read back from the database entries.
- [X] Wrote tests for `Analyser.py`
- [ ] I personally feel more tests could be added to get better coverage of the code.
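
To make the new table a bit more concrete, here is a minimal sketch of what storing and reading back an analyser result could look like. It uses Python's built-in `sqlite3` as a stand-in and is not the project's actual `models.py`. The column names come from the list above; the column types, the helper names `insert_result` and `get_result`, and the use of integers for the three flags are assumptions made only for illustration.

```python
import sqlite3

# Hypothetical stand-in for the table described above. The real definition
# lives in models.py and may use different types and constraints.
SCHEMA = """
CREATE TABLE analyser_completed (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    fetch_completed_id INTEGER NOT NULL,
    captcha_checker    INTEGER,  -- assumed to be stored as an integer flag
    dom_analyse        INTEGER,  -- assumed to be stored as an integer flag
    status_check       INTEGER   -- assumed to be stored as an integer flag
);
"""


def insert_result(conn, fetch_completed_id, captcha_checker, dom_analyse, status_check):
    """Insert one analyser result (hypothetical helper) and return its row id."""
    cur = conn.execute(
        "INSERT INTO analyser_completed "
        "(fetch_completed_id, captcha_checker, dom_analyse, status_check) "
        "VALUES (?, ?, ?, ?)",
        (fetch_completed_id, captcha_checker, dom_analyse, status_check),
    )
    conn.commit()
    return cur.lastrowid


def get_result(conn, fetch_completed_id):
    """Return (captcha_checker, dom_analyse, status_check) for a fetch, or None."""
    return conn.execute(
        "SELECT captcha_checker, dom_analyse, status_check "
        "FROM analyser_completed WHERE fetch_completed_id = ?",
        (fetch_completed_id,),
    ).fetchone()


# Usage: an in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
row_id = insert_result(conn, fetch_completed_id=42, captcha_checker=1,
                       dom_analyse=0, status_check=0)
stored = get_result(conn, 42)
```

A test along the lines of the one mentioned for `analyze_completed` would then insert a known row and check that the query returns exactly those values.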
#### Week Ahead:
This week I'll experiment with and try to implement the `Consensus Module`, or the [`Senser Paper`](https://dl.acm.org/doi/10.1145/2523649.2523669) approach. The basic idea of the paper is that, by using multiple proxies, one can get a rough estimate of a website and its behavior, which we can then use to check whether websites are blocked or not.
With this approach, one can compare the pages produced by the `Consensus Module` with the pages we get through the `exit relays`, which should give better results than the `stop words` or `filter list` we use now. For the `Consensus Module` to work, we also need to use multiple different proxies or VPNs to get that rough estimate.
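
As a rough illustration of the comparison idea, the sketch below picks a "consensus" page from several proxy fetches and checks how similar the exit relay's page is to it. This is not the algorithm from the paper and not code from the Captcha Monitor. It swaps in a simple text similarity from Python's `difflib`, and the function names, the medoid-style choice of the consensus page, and the `0.8` threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1]; autojunk is disabled so long pages are compared fully."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def consensus_page(proxy_pages):
    """Pick the page that is, in total, most similar to the others (hypothetical helper)."""
    if not proxy_pages:
        raise ValueError("need at least one proxy page")

    def total_similarity(index):
        return sum(
            similarity(proxy_pages[index], other)
            for j, other in enumerate(proxy_pages)
            if j != index
        )

    best_index = max(range(len(proxy_pages)), key=total_similarity)
    return proxy_pages[best_index]


def looks_blocked(exit_page, proxy_pages, threshold=0.8):
    """Flag the exit relay's page if it differs too much from the consensus page.

    The threshold is an arbitrary assumption and would need tuning on real data.
    """
    reference = consensus_page(proxy_pages)
    return similarity(exit_page, reference) < threshold


# Usage with made-up page contents (no network access involved).
proxy_pages = [
    "Welcome to Example News. Top story: new release. Served at 10:01",
    "Welcome to Example News. Top story: new release. Served at 10:02",
    "Welcome to Example News. Top story: new release. Served at 10:03",
]
normal_exit_page = "Welcome to Example News. Top story: new release. Served at 10:04"
blocked_exit_page = "Access denied. Solve the CAPTCHA to continue."

normal_flag = looks_blocked(normal_exit_page, proxy_pages)
blocked_flag = looks_blocked(blocked_exit_page, proxy_pages)
```

Comparing against a page chosen from several proxies, rather than against a single fetch, is meant to reduce the effect of one proxy returning an unusual page.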