A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--Yes---No redirection, DOM checks required-->D1[Remove GDPR popups]
A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--YES: <br> No redirection, DOM checks required-->D1[Remove GDPR popups]
B[Fetch url using Non-Tor client] -->C--NO--->D{check whether <br> tor returns 4xx or 5xx <br>error codes}--YES---------->F(Tor Block Error)
D--NO---Could be GDPR, redirection, Captcha-->E4-.No-->D1-->E[/Additional Tests <br> DOM checks/]
D--NO: <br> Could be GDPR, redirection, Captcha-->E4-.No-->D1-->E[/Additional Tests <br> DOM checks/]
E-->E1[DOM Checks <br> Percentage of difference in DOM nodes]
E-.->E2[Consensus Module]
E-.....->E3[Captcha Check] %% Relies on the request path containing captcha in the URL itself
...
...
@@ -102,7 +102,9 @@ G--YES-->G1{if <br> score >k}--YES-->H(Tor returns Error) %% For most cases it r
G1--NO-->G2(Filter list)
G1-.No-..->E2
G--NO---->H[Denotes pop-ups, <br>or in some cases, <br>when the difference is large, <br>denotes another page.]
A------------->Q(Website's without error, but different pages)
A------------->Q(Websites without error, but different pages)
+ Dashed lines and boxes mark components that have not been implemented yet.
+ The threshold K is currently set to 150, a value chosen through experimental analysis.
+ The Captcha Check module was proposed recently. It looks for "captcha" in the path of the requests made while a website loads.
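
The first branches of the flowchart and the notes above can be sketched in Python as follows. This is only an illustration under assumptions: `classify`, `has_captcha_path`, and the returned labels are hypothetical names, not taken from the project's code. The score is assumed to be a number compared against K, and the Captcha check is still a proposal.

```python
from urllib.parse import urlparse

K = 150  # threshold from the notes above (chosen experimentally)


def has_captcha_path(request_urls):
    """Return True if any request URL contains 'captcha' in its path."""
    return any("captcha" in urlparse(url).path.lower() for url in request_urls)


def classify(tor_status, plain_status, request_urls=(), score=None):
    """Hypothetical helper mirroring the early decisions in the flowchart."""
    # Status codes differ and Tor received a 4xx/5xx: Tor Block Error.
    if tor_status != plain_status and 400 <= tor_status < 600:
        return "tor_block_error"
    # Proposed Captcha check: look at the paths of the requests made on load.
    if has_captcha_path(request_urls):
        return "captcha"
    # Score above the threshold K: treated as an error returned over Tor.
    if score is not None and score > K:
        return "tor_error"
    # Anything else needs the remaining checks (DOM checks, filter list, ...).
    return "undecided"
```

For example, `classify(403, 200)` yields `"tor_block_error"`, while a score of exactly 150 stays `"undecided"` because the comparison is strict.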