Team issues
https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues
2023-11-15T11:46:08Z

https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/13
Understand the "long tail" of unclassifiable network traffic
2023-11-15T11:46:08Z
Philipp Winter (phw@torproject.org)
The obfs family of obfuscation protocols strives to "look like nothing" and falls into the long tail of network traffic that is meant to be unclassifiable. That is, if an ISP is monitoring its uplink, it shouldn't be able to figure out that one of its users is talking obfs4 to a Tor bridge. Instead, the obfs4 connection should show up as "unknown" in the log files.
We know next to nothing about this long tail that the obfs family hides in. What fraction of flows does it constitute? What fraction of bytes? What kind of protocols and implementations are difficult to classify? How does the long tail differ across uplinks?
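As a rough illustration of the first two questions, the sketch below computes the share of flows and the share of bytes that a classifier left unlabeled. The `Flow` record, the `"unknown"` label and the sample numbers are all hypothetical; real traces would first need a classifier to produce such labels.

```go
package main

import "fmt"

// Flow is a hypothetical per-flow record: the label a traffic classifier
// assigned to the flow and the number of bytes the flow carried.
type Flow struct {
	Label string
	Bytes uint64
}

// unknownShare returns the fraction of flows and the fraction of bytes
// whose label equals unknownLabel. Both fractions are 0 for empty input.
func unknownShare(flows []Flow, unknownLabel string) (flowFrac, byteFrac float64) {
	var unknownFlows, unknownBytes, totalBytes uint64
	for _, f := range flows {
		totalBytes += f.Bytes
		if f.Label == unknownLabel {
			unknownFlows++
			unknownBytes += f.Bytes
		}
	}
	if len(flows) > 0 {
		flowFrac = float64(unknownFlows) / float64(len(flows))
	}
	if totalBytes > 0 {
		byteFrac = float64(unknownBytes) / float64(totalBytes)
	}
	return flowFrac, byteFrac
}

func main() {
	// Made-up sample data, for illustration only.
	flows := []Flow{
		{Label: "tls", Bytes: 6000},
		{Label: "dns", Bytes: 1000},
		{Label: "unknown", Bytes: 2000},
		{Label: "unknown", Bytes: 1000},
	}
	ff, bf := unknownShare(flows, "unknown")
	fmt.Printf("unknown: %.2f of flows, %.2f of bytes\n", ff, bf)
}
```

Reporting both numbers matters because the two can diverge: many short unclassifiable flows may account for few bytes, or the reverse.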
Over at legacy/trac#30716 we're brainstorming features for obfs4's successor, but before moving forward with obfs5 we should get a better understanding of this long tail because it allows us to make informed design decisions. Packet traces from the [WIDE backbone](http://mawi.wide.ad.jp/mawi/) are one of the data sets that may be helpful here.
Let's use this ticket to track progress and collect insights.

https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/18
Improve the PT spec and how PTs interface with Tor
2022-02-10T19:59:17Z
Cecylia Bocovich
We want to make it easier for developers (and academics) to design and implement new pluggable transports and get them easily integrated with Tor so that we can have a well-functioning PT integration pipeline.
This is a large project that will consist of several things:
- We need to assess pain points with the current PT spec and desired features from a variety of PT developers.
- We might want to take a look at the PTv2 specification to see where features differ from our v1 and also which features seem to be liked or used by PT developers.
- We should think about how bridge distribution should factor into the PT specification. For example, some transports such as meek and snowflake handle "bridge" information differently than transports whose bridges are distributed through BridgeDB. This results in a different interaction with Tor, and we might consider modifying the spec with the snowflake/broker model in mind (ticket legacy/trac#29296).
In general, we should improve our communication with the pluggable transports community to see what they need and figure out how to get more PTs integrated with Tor.

https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/19
Intent to create Pluggable Transport: HTTPS proxy
2021-07-29T15:06:08Z
Trac
# httpsproxy
The HTTP CONNECT method is one of the standard ways to proxy internet traffic and is available in both [HTTP/1.1](https://tools.ietf.org/html/rfc2616#section-9.9) and [HTTP/2](https://http2.github.io/http2-spec/#CONNECT). HTTPS traffic is very popular on the web, and pluggable transports could benefit from this fact: blocking HTTPS entirely would cause very high collateral damage, and an HTTPS-based transport adds diversity to PTs’ shapes because most current PTs do not resemble HTTPS.
Using HTTPS proxies also helps against active probing: a proxy can be an actual web server that serves content, as opposed to circumvention technologies that present no apparent collateral damage and do not respond in any way when probed. To a prober without the correct credentials, an httpsproxy server can look like a real web server, provided that it is one.
## Ways to use HTTPS proxies with Tor
### Naive proxy
Given correct credentials, a user can ask any standard forward proxy on the web to connect to Tor. The client establishes a TLS connection to the web proxy and sends a request of the form
```
CONNECT 0.1.2.3:9001 HTTP/1.1
Host: 0.1.2.3
Proxy-Authorization: Basic dXNlcjpwYXNz
```
where 0.1.2.3:9001 is the address of an arbitrary vanilla Tor entry node. The web server would establish a TCP connection to this address and relay subsequent traffic to it.
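A minimal Go sketch of how a client might assemble that request is shown here. The function name `buildConnectRequest` is hypothetical and not taken from the prototype; the only facts relied on are the request shape above and the standard Basic scheme (base64 of `user:pass`).

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net"
)

// buildConnectRequest (hypothetical helper) returns the raw HTTP/1.1 CONNECT
// request for the given entry node address ("host:port") and proxy credentials.
// It mirrors the example request: the Host header carries the host only.
func buildConnectRequest(target, user, pass string) (string, error) {
	host, _, err := net.SplitHostPort(target)
	if err != nil {
		return "", fmt.Errorf("target must be host:port: %w", err)
	}
	// Basic credentials are the base64 encoding of "user:pass".
	cred := base64.StdEncoding.EncodeToString([]byte(user + ":" + pass))
	return "CONNECT " + target + " HTTP/1.1\r\n" +
		"Host: " + host + "\r\n" +
		"Proxy-Authorization: Basic " + cred + "\r\n" +
		"\r\n", nil
}

func main() {
	req, err := buildConnectRequest("0.1.2.3:9001", "user", "pass")
	if err != nil {
		panic(err)
	}
	// In a real client this string would be written to the TLS connection
	// that was established with the web proxy.
	fmt.Print(req)
}
```

With the credentials `user` / `pass`, the header value matches the `dXNlcjpwYXNz` shown in the request above.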
Such an approach allows us to use a diverse set of standard proxies: a webproxy is easy to set up and does not need to speak Tor. However, the web proxy operator will likely want to whitelist Tor entrance nodes in order to prevent abuse. As such, they would benefit from talking to some sort of https-proxy-authority, which would provide an entrance node(s) to whitelist, and allow proxies to let Tor Project know that their servers could be used as a proxy.
While the lack of a server-side PT makes this approach easier to deploy, it also means we cannot collect metrics.
### Full Bridge
A full bridge runs a Tor entry node, a pluggable transport and an upstreaming frontend webserver. The upstreaming webserver would check credentials, and, instead of consuming CONNECT requests, it would upstream them into the pluggable transport ExtORPort, while also stapling client’s IP to it in a header. The PT would parse the IP from the HTTP request header, and pass it to ExtORPort, thus enabling metrics collection.
## Registering with BridgeDB
As it currently stands, bridges have to have an ORPort open to be registered with BridgeDB (legacy/trac#7349).
This leads to easy identification and blocking of bridges. However, we can still register bridge lines with BridgeDB, if we add an additional hop to an intermediate proxy before entering a bridge. A censor would only be able to observe the address of the intermediate proxy.
Having such a 2-hop setup is a natural property of Naive Proxy, as described above. Bridge line example:
```
httpsproxy [vanilla entry addr] [entry fingerprint] url=https://username:password@naiveproxy.org
```
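To make the shape of such a bridge line concrete, here is a hedged Go sketch that splits it into its parts. The `BridgeLine` type and `parseBridgeLine` function are hypothetical, the fingerprint in the example is a placeholder, and the 40-hex-digit check assumes the usual Tor relay fingerprint format.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// BridgeLine is a hypothetical parsed form of the bridge line shown above.
type BridgeLine struct {
	Transport   string   // e.g. "httpsproxy"
	EntryAddr   string   // address of the vanilla entry node
	Fingerprint string   // 40 hex digits
	ProxyURL    *url.URL // intermediate proxy, including credentials
}

// parseBridgeLine splits "transport addr fingerprint url=https://user:pass@host".
func parseBridgeLine(line string) (*BridgeLine, error) {
	fields := strings.Fields(line)
	if len(fields) != 4 || !strings.HasPrefix(fields[3], "url=") {
		return nil, errors.New("expected: transport addr fingerprint url=...")
	}
	if raw, err := hex.DecodeString(fields[2]); err != nil || len(raw) != 20 {
		return nil, errors.New("fingerprint must be 40 hex digits")
	}
	u, err := url.Parse(strings.TrimPrefix(fields[3], "url="))
	if err != nil {
		return nil, err
	}
	if u.Scheme != "https" || u.Host == "" {
		return nil, errors.New("proxy URL must be an https URL with a host")
	}
	return &BridgeLine{Transport: fields[0], EntryAddr: fields[1], Fingerprint: fields[2], ProxyURL: u}, nil
}

func main() {
	// Placeholder entry address and fingerprint, for illustration only.
	line := "httpsproxy 0.1.2.3:9001 0123456789ABCDEF0123456789ABCDEF01234567 url=https://username:password@naiveproxy.org"
	bl, err := parseBridgeLine(line)
	if err != nil {
		panic(err)
	}
	pass, _ := bl.ProxyURL.User.Password()
	fmt.Println(bl.Transport, bl.EntryAddr, bl.ProxyURL.Host, bl.ProxyURL.User.Username(), pass)
}
```

Note that only the proxy's host in `ProxyURL` would be visible to a censor on the wire; the entry address is reached from the proxy.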
We can use the 2-hop approach with full bridges as well: the intermediate proxy would forward the HTTP request (preferably with the client IP in a “Forwarded: for=IP:port” header). In this case, the intermediate proxy just redirects all requests (as long as credentials are correct) to the chosen full bridge(s), which makes it essentially a reverse proxy -- a widely supported technology.
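On the full-bridge side, extracting the client IP from that header could look like the Go sketch below. The function name `clientIPFromForwarded` is hypothetical, and the parsing is deliberately simplified: it looks only at the first list element and accepts both the quoted form that RFC 7239 requires for values containing a colon and the unquoted form written above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// clientIPFromForwarded (hypothetical helper) returns the IP address from the
// "for" parameter of the first element of a Forwarded header value.
// Simplification: commas or semicolons inside quoted strings are not handled.
func clientIPFromForwarded(header string) (string, error) {
	first := strings.Split(header, ",")[0]
	for _, pair := range strings.Split(first, ";") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) != 2 || !strings.EqualFold(kv[0], "for") {
			continue
		}
		v := strings.Trim(kv[1], `"`)
		host, _, err := net.SplitHostPort(v)
		if err != nil {
			// No port present: strip the brackets of a bare IPv6 literal.
			host = strings.Trim(v, "[]")
		}
		ip := net.ParseIP(host)
		if ip == nil {
			return "", fmt.Errorf("not an IP address: %q", host)
		}
		return ip.String(), nil
	}
	return "", errors.New("no for= parameter found")
}

func main() {
	for _, h := range []string{
		`for=192.0.2.43:4711`,
		`for="[2001:db8::1]:4711";proto=https`,
		`for=unknown`,
	} {
		ip, err := clientIPFromForwarded(h)
		fmt.Println(ip, err)
	}
}
```

Such a header is only as trustworthy as the proxy that set it, so a bridge would presumably accept it only from intermediate proxies it knows.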
While the second hop adds overhead, there's a benefit in not requiring would-be proxy operators to run a full bridge: configuring a proxy becomes substantially easier and, ideally, would amount to adding a few lines to a web server config file and registering with BridgeDB via some script. Not requiring operators to install, configure and run both PT and Tor daemons may allow us to attract more volunteers for the entrance servers.
However, it’s unclear which party would actually register the bridge line, and how. Perhaps a separate https-proxy-authority could do that (and provide web proxies with entries to use).
## Current prototype
The prototype works with standard HTTP/1.1 and HTTP/2.0 proxies, both as naive proxies and as full bridges. If there is interest in seeing the current prototype, I would gladly share it; @dcf already created a ticket for the repo creation, legacy/trac#26793.
### Language
Both client and server are implemented in Go, a relatively safe, cross-platform language.
### Overhead
Bandwidth overhead depends on aggressiveness of padding, but I would not expect goodput to drop below 80%, especially for high-bandwidth workloads, which should mostly consist of MTU-sized packets. Detailed evaluation would be done after padding is implemented.
Computational overhead amounts to TLS handshake per flow plus the usual connection management.
## Fingerprinting
Running a real web server helps; however, several potential fingerprinting vectors remain. Those include:
### Probing web server with proxy requests without a secret
By default, web servers with this sort of forward proxying enabled will respond to unauthenticated proxy requests with “407 Proxy Authentication Required”, whereas a web server without forwardproxying enabled will respond differently, stating that it's not a proxy and doesn't want your CONNECT requests.
It would be beneficial to hide the fact of proxying (although note that a 407 doesn't reveal the proxy to be a Tor proxy, just that forward proxying is enabled). This feature is already supported by the [Caddy web server](https://github.com/caddyserver/forwardproxy/blob/master/README.md#caddyfile-syntax-server-configuration) (see the "probe_resistance" option), which is used for the current implementation.
### TLS ClientHello fingerprinting
meek has been blocked at least twice based on its TLS ClientHello. There is a library called [utls](https://github.com/refraction-networking/utls) that provides the ability to mimic arbitrary ClientHello messages. It uses real-world data from https://tlsfingerprint.io/ to learn what it should mimic based on the collateral damage each fingerprint provides, and allows developers to confirm the correctness of their mimicking. Because any particular "fingerprint" may be blocked or incorrectly mimicked, this transport would use multiple "fingerprints" and cycle through them until an unblocked one is found.
### Other TLS fingerprinting
Evaluation of other TLS handshake messages and TLS records, and how they may differ from mimicked implementations remains a TODO.
### Traffic Size Patterns
The current prototype doesn't use padding yet, and the traces it generates look extremely fingerprintable because it constantly produces packets of size CELL_SIZE * N + constant overhead.
We intend to address this problem shortly by splitting and padding HTTP/2 frames to resemble common web traffic.
There is no standard way to pad HTTP/1.1 that will work with standard web proxies, but we can probably split the cells.
### Connection establishment traffic patterns
This is especially relevant to 2-hop approaches: the client might have to wait a long time for the first response while the proxy establishes the connection. Many proxies share this issue, and it can be solved; this is just a note that it requires attention and a solution.
### Connection lifetime
Being connected to the same server for prolonged periods of time (HTTPS tunnel may work fine for hours, if not days) could be a distinguishing feature. Client should redial at least once an hour. TODO
**Trac**:
**Username**: sf

https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/45
Proposal: Push Notification Based Signaling Channel
2022-01-10T18:54:43Z
shelikhoo
Modern [Operating Systems](https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/APNSOverview.html#//apple_ref/doc/uid/TP40008194-CH8-SW1) and [Browsers](https://caniuse.com/push-api) support push notifications that can include data in their payload. This provides a uni-directional communication channel from the client to the server if the client does not have a push receiver running, or a bidirectional communication channel if the client has a push receiver running.
As we have seen in the recent block that [happened](https://gitlab.torproject.org/tpo/community/support/-/issues/40050) in Russia, even with domain fronting, adversaries are able to block fronting domains without significant consequence, as in the case of [meek-azure](https://ntc.party/t/ooni-reports-of-tor-blocking-in-certain-isps-since-2021-12-01/1477/3). We might want to diversify our signalling channels to prevent blocking.
## Advantages
### Collateral Damage Maximization
These push channels have no substitutes and offer functionality that is visible to users. Once a push service is blocked, all apps (with servers hosted in the region) and websites that rely on it will be affected.
### Asymmetrical
When WebPush is used, a single codebase that interacts with a browser in a standardized way can be used on all browsers. The adversary would need to block a significant number of services, while we don't need to do anything vendor-specific.
### Plausible Fingerprint
A push notification sender is expected to be an application that sends requests from a non-browser environment, so our sender does not stand out. On the receiving side, the proxy software does not interact with the push service directly, so there is no fingerprint by which it could be recognized.
## Disadvantages
### Special Environment for Receiver
A push notification receiver requires a running operating system or browser, which makes it difficult to ship with the client. Messages from server to client may need an alternative channel, such as replying with a forged source address.
### Special Setup Requirement
iOS-based push notification requires an Apple Developer account. This is not required by Web Push in most cases.