Update GSoC 2021 authored by hackhard
......@@ -90,9 +90,9 @@ end
subgraph DOM Analysis
E;E1;E2;G;G1;G2;H
end
A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--Yes---No&nbspredirection,&nbspDOM&nbspchecks&nbsprequired-->D1[Remove GDPR popups]
A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--YES: <br> No redirection, DOM checks required-->D1[Remove GDPR popups]
B[Fetch url using Non-Tor client] -->C--NO--->D{check whether <br> tor returns 4xx or 5xx <br>error codes}--YES---------->F(Tor Block Error)
D--NO---Could&nbspbe&nbspGDPR,&nbspredirection,&nbspCaptcha-->E4-.No-->D1-->E[/Additional Tests <br> DOM checks/]
D--NO: <br> Could be GDPR, redirection, Captcha-->E4-.No-->D1-->E[/Additional Tests <br> DOM checks/]
E-->E1[DOM Checks <br> Percentage of difference in DOM nodes]
E-.->E2[Consensus Module]
E-.....->E3[Captcha Check] %% Checks whether the request path in the URL itself contains captcha
......@@ -102,7 +102,9 @@ G--YES-->G1{if <br> score >k}--YES-->H(Tor returns Error) %% For most cases it r
G1--NO-->G2(Filter list)
G1-.No-..->E2
G--NO---->H[Denotes pop-ups, <br>or in some cases, <br>when the difference is large, <br>denotes another page.]
A------------->Q(Website's without error, but different pages)
A------------->Q(Websites without error, but different pages)
click Q href "http://www.dominos.com" _blank
style D1 stroke-width:1px,stroke-dasharray: 5 8
......@@ -110,8 +112,10 @@ style E4 stroke-width:1px,stroke-dasharray: 5 8
style E3 stroke-width:1px,stroke-dasharray: 5 8
style E2 stroke-width:1px,stroke-dasharray: 5 8
style F stroke-width:3px,fill:#f04
```
```
+ Dashed lines and boxes mark the parts that have not been implemented yet.
+ K is currently set to 150, a value chosen through experimental analysis.
+ The Captcha Checking Module was proposed recently. It looks for "captcha" in the request paths seen in the responses while a website loads.
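
The flow above can be sketched roughly in Python. This is only an illustration under stated assumptions, not the project's actual code: the function names, the result labels, and the way the score is computed (absolute difference in opening-tag counts between the two responses) are all hypothetical. Only the threshold K = 150, the 4xx/5xx check, and the "captcha" path check come from the notes and the chart.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

K = 150  # threshold from the notes above; what the score measures is assumed here


class _NodeCounter(HTMLParser):
    """Counts opening tags as a rough proxy for DOM nodes."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_dom_nodes(html: str) -> int:
    """Return the number of opening tags found in an HTML string."""
    counter = _NodeCounter()
    counter.feed(html)
    counter.close()
    return counter.count


def path_mentions_captcha(url: str) -> bool:
    """True when the path component of the URL contains 'captcha'."""
    return "captcha" in urlsplit(url).path.lower()


def classify(tor_status, plain_status, tor_html, plain_html, tor_final_url):
    """Hypothetical, simplified version of the decision flow in the chart."""
    # Status codes differ and Tor got a 4xx/5xx: treat as a Tor block.
    if tor_status != plain_status and 400 <= tor_status < 600:
        return "tor_block_error"
    # Proposed captcha check: look at the request path only.
    if path_mentions_captcha(tor_final_url):
        return "captcha"
    # DOM check: assumed score is the absolute difference in node counts.
    score = abs(count_dom_nodes(plain_html) - count_dom_nodes(tor_html))
    return "dom_mismatch" if score > K else "similar"
```

The GDPR popup removal, the consensus module, and the filter list are left out here, since the chart does not give enough detail to sketch them.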
......
......