Update GSoC 2021, authored by hackhard
Blog: https://hackhard.github.io/my-blog/
If you have any queries or feedback regarding the project, you can reach me on the Tor channels (#tor-dev or #tor-project on [OFTC](https://webchat.oftc.net/?channels=tor) IRC). My IRC handle is **\_ranchak\_**.
You can also reach me at: <abishekhmjee(at)gmail(dot)com>
## Updated Logic:
```mermaid
%% Enable JS to see this
%% Use of dashed lines and boxes show the things that haven't been implemented as of now.
graph TD
subgraph Fetch
A[Fetch url using Tor client]
B[Fetch url using Non-Tor client]
end
subgraph DOM Analysis
E;E1;E2;G;G1;G2;H
end
A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--Yes---No&nbspredirection,&nbspDOM&nbspchecks&nbsprequired-->D1[Remove GDPR popups]
B[Fetch url using Non-Tor client] -->C--NO--->D{check whether <br> tor returns 4xx or 5xx <br>error codes}--YES---------->F(Tor Block Error)
D--NO---Could&nbspbe&nbspGDPR,&nbspredirection,&nbspCaptcha-->E4-.No-->D1-->E[/Additional Tests <br> DOM checks/]
E-->E1[DOM Checks <br> Percentage of difference in DOM nodes]
E-.->E2[Consensus Module]
E-.....->E3[Captcha Check] %% This will use the fact of the request path containing captcha in the url itself
E4{If the <br> Redirected website <br>returns error}-.yes.->F
E1-->G{if <br> score >0}
G--YES-->G1{if <br> score >k}--YES-->H(Tor returns Error) %% For most cases it returns error or it might be possible that the page hasn't been loaded.
G1--NO-->G2(Filter list)
G1-.No-..->E2
G--NO---->H1[Denotes pop-ups, <br>or in some cases <br>when the difference is large <br>denotes another page.]
A------------->Q(Websites without error, but different pages)
click Q href "http://www.dominos.com" _blank
style D1 stroke-width:1px,stroke-dasharray: 5 8
style E4 stroke-width:1px,stroke-dasharray: 5 8
style E3 stroke-width:1px,stroke-dasharray: 5 8
style E2 stroke-width:1px,stroke-dasharray: 5 8
style F stroke-width:3px,fill:#f04
```
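
The solid-line part of the flowchart can be read as two small decision functions. The sketch below is only one way to write that reading down, not the project's actual code: the `Verdict` names, the function names, and the idea that the DOM score arrives as a plain number are all assumptions made for illustration. The default `k = 150` is the experimental value mentioned in the notes on this page. The dashed (not yet implemented) steps are left out.

```python
from enum import Enum


class Verdict(Enum):
    """Outcomes named after the flowchart's leaf nodes (names are hypothetical)."""
    NEEDS_DOM_CHECKS = "run DOM checks"
    TOR_BLOCK_ERROR = "Tor block error"
    TOR_RETURNS_ERROR = "Tor returns error"
    FILTER_LIST = "filter list"
    POPUP_OR_OTHER_PAGE = "pop-ups or another page"


def compare_status(tor_status: int, direct_status: int) -> Verdict:
    """First half of the chart: compare the Tor and non-Tor status codes."""
    if tor_status == direct_status:
        # Same status code: no redirection, DOM checks are required.
        return Verdict.NEEDS_DOM_CHECKS
    if 400 <= tor_status <= 599:
        # Codes differ and Tor got a 4xx or 5xx: treated as a Tor block.
        return Verdict.TOR_BLOCK_ERROR
    # Codes differ but Tor did not get an error: could be GDPR,
    # redirection, or a captcha, so fall through to the DOM checks.
    return Verdict.NEEDS_DOM_CHECKS


def classify_dom_score(score: float, k: float = 150) -> Verdict:
    """Second half of the chart: interpret the DOM difference score."""
    if score > 0:
        if score > k:
            return Verdict.TOR_RETURNS_ERROR
        return Verdict.FILTER_LIST
    return Verdict.POPUP_OR_OTHER_PAGE


if __name__ == "__main__":
    print(compare_status(403, 200))
    print(classify_dom_score(42))
```

Keeping the status-code step and the score step separate mirrors the chart's two subgraphs and makes it easy to slot the dashed modules in between later.
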
+ Dashed lines and boxes mark the parts that have not been implemented yet.
+ The threshold `k` is currently set to 150, a value chosen through experimental analysis.
+ The Captcha Checking Module was proposed recently. It looks for the string "captcha" in the request paths of the responses received while a website loads.
+ I've also proposed checking the final status code when a website redirects to another URL. For example, [Mastercard](https://www.mastercard.de/) redirects to [this link](https://www.mastercard.de/de-de.html), so the status code of "this link" is the one that gets checked. Similarly, http://adsabs.harvard.edu/ redirects to [here1](https://ui.adsabs.harvard.edu/), but over Tor it sometimes redirects to [here2](http://adsabs.harvard.edu/cgi-bin/access_denied).
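
The two proposed checks (captcha in the request path, and the status code of the final redirect target) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the implemented module: the function names are hypothetical, the input shapes (a list of request URLs, and a redirect chain of `(url, status)` pairs) are assumed, and the case-insensitive match is my own choice. The URLs and status codes in the demo are made-up placeholders, not measurements.

```python
from urllib.parse import urlparse


def has_captcha_path(request_urls):
    """Return True if any request URL has 'captcha' in its path.

    Only the path component is inspected, so a query string such as
    '?q=captcha' does not trigger the check (an assumption of this sketch).
    """
    return any("captcha" in urlparse(url).path.lower() for url in request_urls)


def final_status(redirect_chain):
    """Return the status code of the last hop in a redirect chain.

    redirect_chain is assumed to be a list of (url, status_code) pairs
    in the order the client followed them.
    """
    if not redirect_chain:
        raise ValueError("redirect chain is empty")
    return redirect_chain[-1][1]


def final_hop_is_error(redirect_chain):
    """True if the page finally reached answered with a 4xx or 5xx code."""
    return 400 <= final_status(redirect_chain) <= 599


if __name__ == "__main__":
    # Placeholder data for illustration only.
    requests_seen = [
        "https://example.org/",
        "https://example.org/captcha/verify?id=1",
    ]
    chain = [
        ("http://example.org/", 302),
        ("http://example.org/denied", 403),
    ]
    print(has_captcha_path(requests_seen))
    print(final_hop_is_error(chain))
```

Checking the last hop rather than the first response is what lets a redirect to an "access denied" page count as an error even when the initial request itself succeeds.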