Make a list of the tests that we feel confident deploying on M-Lab, and test them
In the Brussels meeting we took these notes on the tests that we can deploy on M-Lab:
HTTPHost: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/HTTPHost (the linked page is inaccurate about what the test does)
How does it work, briefly? It uses the Host header to identify a transparent proxy when the back-end server is under our control. The client connects to an HTTP server running on M-Lab and sends a GET request for / whose Host field contains the hostname we are trying to determine is or isn’t censored. If what comes back is a block page, that is conclusive evidence of censorship, and the block page could also lead to a vendor signature.
Pending questions for future work: What if we cache data (e.g., store the Facebook homepage)? Would that be a copyright problem?
What data must be collected on the M-Lab server and published? The HTTP requests being made and the responses.
What data does it gather from the client? The requests the client makes.
What data does it gather on the server? The requests the server receives.
What is logged on the client? NA
Who has access to the client’s logs? NA
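A minimal sketch of the HTTPHost client side, assuming our M-Lab back-end puts a known marker string in every response body (the marker value and the helper names `build_request`, `fetch`, and `looks_tampered` are hypothetical, not taken from ooni-probe). Because we control the back-end, a response that lacks the marker must have been produced by something on the path, such as a transparent proxy serving a block page.

```python
import socket

# Hypothetical marker that our own M-Lab back-end is assumed to put in every response body.
BACKEND_MARKER = b"ooni-httphost-backend"


def build_request(test_host: str, path: str = "/") -> bytes:
    """Build a raw GET request whose Host header carries the hostname under test."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {test_host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(backend_addr: str, port: int, test_host: str, timeout: float = 10.0) -> bytes:
    """Connect to the back-end's own address (not to test_host) and read the full reply."""
    chunks = []
    with socket.create_connection((backend_addr, port), timeout=timeout) as sock:
        sock.sendall(build_request(test_host))
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def looks_tampered(response: bytes, marker: bytes = BACKEND_MARKER) -> bool:
    """True if the body did not come from our back-end (e.g. a block page was served)."""
    _headers, _sep, body = response.partition(b"\r\n\r\n")
    return marker not in body
```

For example, `looks_tampered(fetch("192.0.2.10", 80, "www.facebook.com"))`, where 192.0.2.10 is a placeholder for the M-Lab server. Checking for a marker, rather than diffing whole pages, keeps the verdict independent of any caching question above.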
Two Way Traceroute: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/TwoWayTraceroute
How does it work, briefly? A multi-protocol, multi-port traceroute that detects discrepancies in paths depending on source/destination port and protocol. It is performed both from the client to the backend and from the backend to the client, with one traceroute per protocol/port pair.
What data must be collected on M-Lab and published? One traceroute per protocol/port pair.
What data does it gather from the client? Traceroutes.
What data does it gather on the server? Traceroutes.
What is logged? NA
Who has access to logs? NA
Note: we may need to mask IP addresses in the first and last hops. The client IP address will also be present in the IP header embedded in the ICMP Time Exceeded messages sent by each intervening router. This is a TCP/UDP/ICMP traceroute; it performs multi-protocol traces simultaneously on the client side.
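Sending the probes requires raw sockets, so this sketch only covers the analysis described above: comparing hop lists per (protocol, port) pair against a baseline trace, and masking the first and last hop before publication. The data layout, the choice of baseline, and the rule that an unanswered hop (`None`) matches anything are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One entry per TTL; None means no ICMP Time Exceeded reply arrived for that TTL.
Hops = List[Optional[str]]
# A trace is identified by its (protocol, destination port) pair.
TraceKey = Tuple[str, int]


def mask_endpoints(hops: Hops, mask: str = "MASKED") -> Hops:
    """Hide the first and last hop, which may reveal the client's or server's address."""
    masked = list(hops)
    if masked:
        masked[0] = mask
        masked[-1] = mask
    return masked


def paths_differ(a: Hops, b: Hops) -> bool:
    """Paths differ if their lengths differ or two responding hops disagree at the same TTL."""
    if len(a) != len(b):
        return True
    return any(x is not None and y is not None and x != y for x, y in zip(a, b))


def divergent_traces(traces: Dict[TraceKey, Hops], baseline: TraceKey) -> List[TraceKey]:
    """Return the protocol/port pairs whose path differs from the baseline trace."""
    reference = traces[baseline]
    return sorted(
        key for key, hops in traces.items()
        if key != baseline and paths_differ(hops, reference)
    )
```

A trace such as `("tcp", 443)` that leaves the baseline path at hop 2 would be reported, which is the kind of port-dependent routing a middlebox can cause.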
Keyword Filtering: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/KeywordFiltering
How does it work, briefly? We establish a connection to the backend and, over HTTP or another plain-text protocol, send a set of keywords to be tested for filtering. Censorship may be detected either because the keyword does not reach the backend or because the client and/or backend receive a RST packet. The client has a secure channel with the backend over which it signals which keywords are being sent.
What data must be collected on M-Lab and published? The keywords that were sent, received, and blocked.
What data does it gather from the client? The keywords that were sent, and which event indicated censorship: did the packet fail to arrive at its destination, or was a RST packet received? If so, was it received by the client, the server, or both?
What data does it gather on the server? The keywords that were blocked.
What is logged? NA
Who has access to logs? NA
Notes: This data must be anonymized very carefully.
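One way the client’s and backend’s views could be combined into a per-keyword verdict is sketched below. It assumes the backend reports the keywords it actually received over the secure channel, and that each observed RST can be attributed to the keyword in flight; the class, function, and label names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class KeywordResult:
    keyword: str
    reached_backend: bool  # as reported by the backend over the secure side channel
    rst_at_client: bool
    rst_at_backend: bool

    @property
    def censored(self) -> bool:
        return not self.reached_backend or self.rst_at_client or self.rst_at_backend

    def trigger(self) -> str:
        """Name the event that indicated censorship, or 'none'."""
        if self.rst_at_client and self.rst_at_backend:
            return "rst_both"
        if self.rst_at_client:
            return "rst_client"
        if self.rst_at_backend:
            return "rst_backend"
        if not self.reached_backend:
            return "not_delivered"
        return "none"


def evaluate(sent: Iterable[str], received: Set[str],
             rst_client: Set[str], rst_backend: Set[str]) -> List[KeywordResult]:
    """Produce one result per keyword sent, in the order they were sent."""
    return [
        KeywordResult(kw, kw in received, kw in rst_client, kw in rst_backend)
        for kw in sent
    ]
```

Keeping the trigger as a separate field matches the notes above: the published data should say not only that a keyword was blocked but also how.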
RST Packet Detection: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/RSTPacketDetection
How does it work, briefly? This is a dependent test, in the sense that it requires some other test to trigger it. We listen for RST packets on both the client and the server while trying to get the censor to block the connection with RST packets. This can be done by injecting keywords into a TCP connection or by contacting certain sites.
Open questions: Is it possible on M-Lab to ignore RST packets?
What data must be collected on M-Lab and published? The RST packets, and the requests that triggered them.
What data does it gather from the client? The RST packets, and what kind of client request was made to trigger them.
What data does it gather on the server? The RST packets, and what server-side request was made to trigger them.
What is logged? NA
Who has access to logs? NA
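Capturing the packets (pcap or a raw socket) is not shown here. This sketch covers the two steps above that work on captured data: recognizing a RST from the TCP header (the RST bit is 0x04 in the flags byte) and linking each RST to the request that likely triggered it. Attribution by a fixed time window is an assumption, not necessarily how ooni-probe does it, and the function names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple

TCP_RST = 0x04  # RST bit in the TCP flags byte (RFC 793)


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    flags: int


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    src, dst, seq, ack, _offset, flags, _win, _csum, _urg = struct.unpack(
        "!HHIIBBHHH", segment[:20]
    )
    return TcpHeader(src, dst, seq, ack, flags)


def is_rst(segment: bytes) -> bool:
    return bool(parse_tcp_header(segment).flags & TCP_RST)


def attribute_rsts(triggers: List[Tuple[float, str]],
                   rst_times: List[float],
                   window: float = 2.0) -> List[Tuple[str, float]]:
    """Pair each RST with the most recent trigger sent at most `window` seconds before it.

    triggers: (timestamp, description of the request) pairs.
    RSTs with no trigger inside the window are left unattributed.
    """
    pairs = []
    for t_rst in rst_times:
        candidates = [(t, label) for t, label in triggers if 0 <= t_rst - t <= window]
        if candidates:
            pairs.append((max(candidates)[1], t_rst))
    return pairs
```

The same code can run on both client and server captures, which matches the symmetric client/server data listed above.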
Daphn3: https://speakerdeck.com/u/hellais/p/ooni-and-daphn3
How does it work, briefly? This is a dependent test. We have a packet that we know is being censored, and our objective is to figure out where in the packet the censor is matching to trigger censorship. We build a state machine of the messages sent by the client and by the server. We start by mutating the first byte and walk through the state machine until a certain mutation no longer triggers the censor; this tells us that the censor is fingerprinting on that byte.
What data must be collected on M-Lab and published? The capture of the censored packet, and the fingerprint that the censor is matching against.
What data does it gather from the client? Which (packet, mutation) pairs it received and sent.
What data does it gather on the server? Which (packet, mutation) pairs it received and sent.
What is logged? NA
Who has access to logs? NA
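The mutation walk can be sketched independently of the network. In the real test each mutated transcript is replayed between client and server and the censor’s reaction is observed; here that step is a hypothetical `is_censored` callback. The mutation rule (add 1 to the byte, mod 256) is an assumption, and unlike the description above this sketch does not stop at the first byte that evades the censor: it keeps scanning so that the whole matched region is reported.

```python
from typing import Callable, List, Sequence, Tuple

# A transcript is the ordered list of (sender, payload) messages, e.g. ("client", b"...").
Transcript = Sequence[Tuple[str, bytes]]


def mutate(payload: bytes, offset: int) -> bytes:
    """Return a copy of payload with one byte changed (assumed rule: +1 mod 256)."""
    changed = bytearray(payload)
    changed[offset] = (changed[offset] + 1) % 256
    return bytes(changed)


def find_fingerprint(transcript: Transcript,
                     is_censored: Callable[[Transcript], bool]) -> List[Tuple[int, int]]:
    """Mutate every byte of every message in turn and return the
    (message_index, offset) pairs whose mutation stops the censor from triggering."""
    if not is_censored(transcript):
        raise ValueError("the unmodified transcript must trigger the censor")
    hits = []
    for msg_index, (sender, payload) in enumerate(transcript):
        for offset in range(len(payload)):
            candidate = list(transcript)
            candidate[msg_index] = (sender, mutate(payload, offset))
            if not is_censored(candidate):
                hits.append((msg_index, offset))
    return hits
```

With a censor that matches the string `torproject` in client messages, the returned offsets are exactly the bytes of that string inside the Host header, i.e. the censor’s fingerprint.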
Header Field Manipulation: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/HeaderFieldManipulation
How does it work, briefly? The client establishes a connection to the backend system and sends specially crafted HTTP header fields (varying capitalization, adding some special HTTP headers, etc.). The backend then checks whether they arrive exactly as the client created them.
What data must be collected on M-Lab and published? The HTTP requests created by the client and the requests received by the server.
What data does it gather from the client? The requests it sends.
What data does it gather on the server? The requests it receives.
What is logged? NA
Who has access to logs? NA
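Below is a sketch of one capitalization variant and of the exact comparison the backend could perform, assuming the backend knows which header lines the client intended to send. `alternate_case` and `header_differences` are hypothetical names; a real test would use more variants than this single one.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence, Tuple

Header = Tuple[str, str]


def alternate_case(name: str) -> str:
    """'host' -> 'HoSt': one simple capitalization variant to send."""
    return "".join(c.upper() if i % 2 == 0 else c.lower() for i, c in enumerate(name))


def to_lines(headers: Sequence[Header]) -> List[str]:
    return [f"{name}: {value}" for name, value in headers]


def header_differences(sent: Sequence[Header],
                       received: Sequence[Header]) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Compare header lines position by position, case-sensitively.

    Returns (index, sent_line, received_line) for every mismatch; None marks a
    line that was dropped or added in transit.
    """
    return [
        (i, s, r)
        for i, (s, r) in enumerate(zip_longest(to_lines(sent), to_lines(received)))
        if s != r
    ]
```

A proxy that normalizes `HoSt` back to `Host`, or appends a `Via` header, shows up as a mismatch even though the request is semantically unchanged, which is exactly the signal this test looks for.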
Captive Portal: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/CaptivePortal
How does it work, briefly? It mimics vendor captive portal tests (e.g., Chrome, Apple, Firefox, Microsoft) and attempts to detect, via DNS lookups that should return a bad value, whether the network instead returns a login page (as in the case of Chrome, etc.).
What data must be collected on M-Lab and published? The domains tested and the results, plus reverse resolutions.
What data does it gather from the client? The resolved domain as seen by the client, and a boolean for whether or not it matched the control result. If it doesn’t match exactly, fuzzy matching and reverse resolutions are used (this is useful for geolocalized services), and this is also specified in the returned results.
What data does it gather on the server? None, although it could be useful to run the control resolver on M-Lab; this could also be done over TCP through Tor with unbound.
What is logged? The domains tested, the sets of resolved IPs from both the experimental DNS resolver and the control resolver, and a boolean for whether or not there was an intersection between the two sets (True, a non-empty intersection, means that DNS was not tampered with; False, an empty intersection, means that DNS was tampered with).
Who has access to logs? Depending on the captive portal test, upstream servers (Microsoft, Apple, etc.) may have log data.
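The logged entry described above could be assembled roughly as follows. The intersection rule comes from the notes; the reverse-resolution fallback is only a simplistic stand-in for the “fuzzy matching and reverse resolutions” step, and the field and function names are hypothetical. The reverse lookup is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup via the system resolver; None if there is no reverse entry."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def dns_record(domain: str,
               experiment_ips: Iterable[str],
               control_ips: Iterable[str],
               reverse: Callable[[str], Optional[str]] = reverse_lookup) -> Dict[str, object]:
    """Build one log entry comparing the experiment and control resolvers."""
    experiment = set(experiment_ips)
    control = set(control_ips)
    record: Dict[str, object] = {
        "domain": domain,
        "experiment_ips": sorted(experiment),
        "control_ips": sorted(control),
        # True: non-empty intersection, DNS not tampered with; False: tampered with.
        "consistent": bool(experiment & control),
    }
    if not record["consistent"]:
        # Assumed fuzzy fallback: do both answer sets reverse-resolve to a common name?
        experiment_names = {reverse(ip) for ip in experiment} - {None}
        control_names = {reverse(ip) for ip in control} - {None}
        record["reverse_match"] = bool(experiment_names & control_names)
    return record
```

The fallback matters for geolocalized services, where two honest resolvers can return disjoint CDN addresses that still reverse-resolve to the same operator.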
Network Latency: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/Networklatency
How does it work, briefly? Measures differences in RTT between different protocols at sub-100-ms granularity, in an attempt to identify inline manipulation and/or inspection of data. NOT YET IMPLEMENTED. Are BISmark-related tests relevant here? Potentially we could contact Nick directly.
What data must be collected on M-Lab and published? The packets sent and received, and their timings.
What data does it gather from the client? The exact timing of received and sent packets, and the packets being sent and received.
What data does it gather on the server? The exact timing of received and sent packets, and the packets being sent and received.
What is logged?
Who has access to logs?
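Since this test is not yet implemented, the following is only a sketch of the comparison step, assuming RTT samples (in milliseconds) have already been collected per protocol. Using the median and a 10 ms threshold are assumptions for illustration, not choices made in these notes; the function names are hypothetical.

```python
from statistics import median
from typing import Dict, List


def latency_deltas(samples_ms: Dict[str, List[float]], baseline: str) -> Dict[str, float]:
    """Median RTT of each protocol minus the baseline protocol's median RTT (ms)."""
    base = median(samples_ms[baseline])
    return {
        proto: median(rtts) - base
        for proto, rtts in samples_ms.items()
        if proto != baseline
    }


def suspicious(deltas: Dict[str, float], threshold_ms: float = 10.0) -> List[str]:
    """Protocols whose extra latency over the baseline exceeds an (assumed) threshold."""
    return sorted(p for p, d in deltas.items() if d > threshold_ms)
```

Medians are used so that a few outlier RTTs do not dominate; an inspecting middlebox on one port would show up as a consistent positive delta for that protocol only.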
DNS Lookup: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/DNSLookup
How does it work, briefly? Performs DNS lookups of the hostnames to be tested against the DNS resolver to be tested, and compares them with lookups to a known-good DNS server (e.g., Google’s 8.8.8.8). Additionally, we may want to make DNS queries over a known “clean” channel.
What data must be collected on M-Lab and published? The domain names requested and the results from the experiment and control resolvers; possibly also whether the query was seen by the name server, if that server is under our control.
What data does it gather from the client? The domain names requested, the results from the experiment resolver, and the results from the control resolver.
What data does it gather on the server? If running on a slice, which queries are seen by the server.
What is logged? NA
Who has access to logs? NA
Note: This could run on a slice if the slice is running a resolver, but that is not necessary.

Tests for which we can store data, but which don’t need to run on a slice:
Bridge Tor: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/BridgeT
How does it work, briefly? A Tor client uses a GeoIP database to determine the likelihood of connections to the Tor network being blocked, and then automatically iterates through a set of connection types to Tor bridges, ranging from ICMP Echo (ping) to a full Tor protocol connection.
What data must be collected on M-Lab and published? The status per bridge per connection type: what kind of measurements were made and their results.
What data does it gather from the client? Which Tor bridges were reachable or unreachable using which connection type, and the application-level view of what request was made (e.g., a ping or a Tor connection). For the Tor test we want to collect Tor’s info-level log.
What data does it gather on the server? None; there is no server component. However, it could be useful to run a “canary” Tor bridge on a server together with a mechanism that triggers a DPI probe, and then log connections made by probes to the canaries.
What is logged? Which Tor bridges were reachable or unreachable using which connection type.
Who has access to logs?
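The per-bridge, per-connection-type status table could be produced as sketched below. Only the TCP connect probe is implemented; ICMP Echo needs raw sockets and a full Tor connection needs a Tor client, so those are assumed to be supplied as callbacks, and the GeoIP step is omitted. The probe names and `scan_bridges` are hypothetical.

```python
import socket
from typing import Callable, Dict, List, Tuple

# A probe takes (host, port) and returns True if the bridge was reachable that way.
Probe = Callable[[str, int], bool]


def tcp_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Plain TCP handshake to the bridge's port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan_bridges(bridges: List[Tuple[str, int]],
                 probes: List[Tuple[str, Probe]]) -> Dict[str, Dict[str, bool]]:
    """Run every probe, in the order given (cheapest first), against every bridge
    and record the status per bridge per connection type."""
    results: Dict[str, Dict[str, bool]] = {}
    for host, port in bridges:
        per_type = {}
        for name, probe in probes:
            per_type[name] = probe(host, port)
        results[f"{host}:{port}"] = per_type
    return results
```

A typical call might look like `scan_bridges(bridges, [("icmp_echo", ping), ("tcp_connect", tcp_connect), ("tor_bootstrap", tor_check)])`, where `ping` and `tor_check` are hypothetical helpers. Recording every probe, rather than stopping at the first failure, shows at which layer (IP, TCP, or Tor protocol) blocking begins.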
HTTP scan: https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/HTTPScan
How does it work, briefly? Queries a list of sites to detect blocking. It collects the request made to each website and the website’s content, and takes measurements to determine how much the structure of the page differs from the expected page (https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/plugins/domclass.py).
What data must be collected on M-Lab and published? The content of the sites being contacted, the request being made, and the HTTP headers in the response, as well as the eigenvector and eigenvalue.
What data does it gather from the client? The request being made and the response.
What data does it gather on the server? None.
What is logged? NA
Who has access to logs? NA
Note: This doesn’t need to run on an M-Lab server, but the results could be sent to a server and stored in the M-Lab repository. If we need something running on the server, we could implement it.
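To illustrate the idea of comparing page structure, here is a deliberately simpler technique than the one in domclass.py: instead of the eigenvector/eigenvalue method, it compares the sequences of opening tags with `difflib.SequenceMatcher`. This is a swapped-in stand-in, not the ooni-probe algorithm, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List


class _TagCollector(HTMLParser):
    """Collect opening tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> List[str]:
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def structural_similarity(expected_html: str, observed_html: str) -> float:
    """Ratio in [0, 1]; 1.0 means identical tag structure, ignoring text content."""
    return SequenceMatcher(None, tag_sequence(expected_html), tag_sequence(observed_html)).ratio()
```

Because only the tag structure is compared, pages whose text changes often (news sites, for example) still score high, while a block page with a different layout scores noticeably lower.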
Based on this list, we should figure out which tests we feel confident deploying on their machines, and update the shared Google Doc with links to the implementations of the backend component and the client nettest component.