Update GSoC 2021 authored by hackhard
...@@ -89,7 +89,7 @@ A[Fetch url using Tor client] ...@@ -89,7 +89,7 @@ A[Fetch url using Tor client]
B[Fetch url using Non-Tor client] B[Fetch url using Non-Tor client]
end end
subgraph DOM Analysis subgraph DOM Analysis
E;E1;E2;G;G1;G2;H E;E1;E2;G;G1;G2;H;H2;X
end end
A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--YES: <br> No redirection, DOM checks required-->D1[Remove GDPR popups] A[Fetch url using Tor client] --> C{Is the <br> status code <br> same?}--YES: <br> No redirection, DOM checks required-->D1[Remove GDPR popups]
B[Fetch url using Non-Tor client] -->C--NO--->D{check whether <br> tor returns 4xx or 5xx <br>error codes}--YES---------->F(Tor Block Error) B[Fetch url using Non-Tor client] -->C--NO--->D{check whether <br> tor returns 4xx or 5xx <br>error codes}--YES---------->F(Tor Block Error)
...@@ -99,10 +99,11 @@ E-.->E2[Consensus Module] ...@@ -99,10 +99,11 @@ E-.->E2[Consensus Module]
E-.....->E3[Captcha Check] %% This will use the fact of the request path containing captcha in the url itself E-.....->E3[Captcha Check] %% This will use the fact of the request path containing captcha in the url itself
E4{If the <br> Redirected website <br>returns error}-.yes.->F E4{If the <br> Redirected website <br>returns error}-.yes.->F
E1-->G{if <br> score > 0} E1-->G{if <br> score > 0}
G--YES-->G1{if <br> score >k}--YES-->H(Tor returns Error) %% For most cases it returns error or it might be possible that the page hasn't been loaded. G--NO, if score = 0--->X(Matched!<br>no errors)
G--YES-->G1{if <br> score > K%}--YES---->H2(Tor returns Error <br> or in some cases, denotes a different page) %% For most cases it returns error or it might be possible that the page hasn't been loaded.
G1--NO-->G2(Filter list) G1--NO-->G2(Filter list)
G1-.No-..->E2 G1-.No-..->E2
G--NO---->H[Denotes Pop-ups, <br>or in some cases <br>when the difference is lot <br>denotes another page.] G--NO----->H[Denotes Pop-ups, <br>or in some cases <br>when the difference is lot in negative terms- tbb > nbb<br>denotes another page, which might have more DOM nodes.]
A------------->Q(Websites without error, but different pages) A------------->Q(Websites without error, but different pages)
...@@ -114,8 +115,6 @@ style E3 stroke-width:1px,stroke-dasharray: 5 8 ...@@ -114,8 +115,6 @@ style E3 stroke-width:1px,stroke-dasharray: 5 8
style E2 stroke-width:1px,stroke-dasharray: 5 8 style E2 stroke-width:1px,stroke-dasharray: 5 8
style F stroke-width:3px,fill:#f04 style F stroke-width:3px,fill:#f04
``` ```
+ Dashed lines and boxes mark the parts that have not been implemented yet.
+ K is currently set to 150, a value based on experimental analysis.
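
The decision path in the updated flowchart can be sketched roughly as follows. This is an illustrative sketch only, not the project's code: the names `Verdict` and `classify` are hypothetical, `score` is assumed to be a signed DOM-difference value computed elsewhere (negative when the Tor page has more DOM nodes, i.e. tbb > nbb), and the result for a status mismatch without a Tor 4xx/5xx code is a placeholder because that branch is not fully visible in the diff.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical labels for the terminal nodes of the flowchart."""
    TOR_BLOCK_ERROR = "F: Tor Block Error"
    MATCHED = "X: Matched, no errors"
    TOR_ERROR_OR_DIFFERENT_PAGE = "H2: Tor returns error or a different page"
    FILTER_LIST = "G2: Filter list"
    POPUP_OR_LARGER_PAGE = "H: Pop-ups or another page with more DOM nodes"
    UNDETERMINED = "Branch not shown in the diff"


def classify(tor_status: int, non_tor_status: int, score: float,
             k: float = 150.0) -> Verdict:
    """Walk the flowchart for one URL.

    Assumptions: `score` is precomputed after GDPR pop-ups are removed,
    and `k` defaults to the experimental value of 150 mentioned above.
    """
    # C: are the status codes the same?
    if tor_status != non_tor_status:
        # D: does Tor return a 4xx or 5xx error code?
        if 400 <= tor_status < 600:
            return Verdict.TOR_BLOCK_ERROR
        # Redirect and captcha handling is not covered by this sketch.
        return Verdict.UNDETERMINED

    # G: DOM score checks (status codes matched, no redirection).
    if score == 0:
        return Verdict.MATCHED
    if score < 0:
        return Verdict.POPUP_OR_LARGER_PAGE

    # G1: positive score compared against the threshold K.
    if score > k:
        return Verdict.TOR_ERROR_OR_DIFFERENT_PAGE
    return Verdict.FILTER_LIST
```

With the default threshold, `classify(200, 200, 200)` falls through to the G1 check and returns `Verdict.TOR_ERROR_OR_DIFFERENT_PAGE`, while `classify(200, 200, 0)` returns `Verdict.MATCHED`.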