Intent to create Pluggable Transport: HTTPS proxy

httpsproxy

The HTTP CONNECT method is one of the standard ways to proxy internet traffic and is used in both HTTP/1.1 and HTTP/2. HTTPS traffic is very common on the web, and pluggable transports could benefit from this fact: blocking HTTPS outright would cause very high collateral damage. An HTTPS-based transport also adds diversity to PTs’ shapes, because most current PTs do not resemble HTTPS.

Using HTTPS proxies also helps against active probing: a proxy can be an actual web server that serves content, as opposed to circumvention technologies whose blocking causes no apparent collateral damage and which do not respond in any way when probed. To a prober that doesn't have the correct credentials, an httpsproxy server can look like a real web server, provided it actually is one.

Ways to use HTTPS proxies with Tor

Naive proxy

Given correct credentials, a user can ask any standard forward proxy on the web to connect to Tor. The client establishes a TLS connection to the web proxy and sends a request of the form

CONNECT 0.1.2.3:9001 HTTP/1.1
Host: 0.1.2.3
Proxy-Authorization: Basic dXNlcjpwYXNz

where 0.1.2.3:9001 is the address of an arbitrary vanilla Tor entry node. The web server then establishes a TCP connection to this address and relays subsequent traffic to it.
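The exchange above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the function names connectRequest and dialViaProxy and the proxy address passed to them are made up for this example.

```go
package main

import (
	"bufio"
	"crypto/tls"
	"encoding/base64"
	"fmt"
	"net"
	"net/http"
)

// connectRequest builds the CONNECT request shown above.
// target is the host:port of the Tor entry node.
func connectRequest(target, user, pass string) (string, error) {
	host, _, err := net.SplitHostPort(target)
	if err != nil {
		return "", err
	}
	cred := base64.StdEncoding.EncodeToString([]byte(user + ":" + pass))
	return "CONNECT " + target + " HTTP/1.1\r\n" +
		"Host: " + host + "\r\n" +
		"Proxy-Authorization: Basic " + cred + "\r\n\r\n", nil
}

// dialViaProxy (hypothetical helper) opens a TLS connection to the web proxy
// at proxyAddr, asks it to connect to target, and returns the tunnel.
func dialViaProxy(proxyAddr, target, user, pass string) (net.Conn, error) {
	req, err := connectRequest(target, user, pass)
	if err != nil {
		return nil, err
	}
	conn, err := tls.Dial("tcp", proxyAddr, &tls.Config{})
	if err != nil {
		return nil, err
	}
	if _, err := conn.Write([]byte(req)); err != nil {
		conn.Close()
		return nil, err
	}
	// Simplification: a production client must keep reading through this
	// buffered reader, since it may already hold bytes sent after the headers.
	resp, err := http.ReadResponse(bufio.NewReader(conn), &http.Request{Method: http.MethodConnect})
	if err != nil {
		conn.Close()
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		conn.Close()
		return nil, fmt.Errorf("proxy refused CONNECT: %s", resp.Status)
	}
	return conn, nil
}

func main() {
	// dialViaProxy is not called here because it needs a reachable proxy.
	req, err := connectRequest("0.1.2.3:9001", "user", "pass")
	if err != nil {
		panic(err)
	}
	fmt.Print(req)
}
```

Once dialViaProxy returns, the connection would carry the ordinary Tor link protocol to the entry node, with the web proxy relaying bytes in both directions.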

Such an approach allows us to use a diverse set of standard proxies: a web proxy is easy to set up and does not need to speak Tor. However, the web proxy operator will likely want to whitelist Tor entry nodes in order to prevent abuse. As such, operators would benefit from talking to some sort of https-proxy-authority, which would provide one or more entry nodes to whitelist and let operators tell the Tor Project that their servers can be used as proxies.
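As an illustration of the whitelisting idea, a proxy could check each CONNECT target against a list of permitted entry nodes before relaying. The sketch below is hypothetical: the type entryAllowlist and its methods are invented for this example, and how the list would be obtained from an https-proxy-authority is left open, as in the text.

```go
package main

import (
	"fmt"
	"net"
)

// entryAllowlist holds the IP:port addresses a proxy operator is willing
// to relay CONNECT requests to.
type entryAllowlist map[string]bool

// normalize accepts only literal IP:port targets and returns a canonical form,
// so that equivalent spellings of one address compare equal.
func normalize(addr string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", fmt.Errorf("not an IP literal: %q", host)
	}
	return net.JoinHostPort(ip.String(), port), nil
}

// newEntryAllowlist builds the list from the given entry node addresses.
func newEntryAllowlist(addrs ...string) (entryAllowlist, error) {
	al := entryAllowlist{}
	for _, a := range addrs {
		key, err := normalize(a)
		if err != nil {
			return nil, err
		}
		al[key] = true
	}
	return al, nil
}

// permits reports whether a CONNECT to target should be relayed.
func (al entryAllowlist) permits(target string) bool {
	key, err := normalize(target)
	return err == nil && al[key]
}

func main() {
	al, err := newEntryAllowlist("0.1.2.3:9001")
	if err != nil {
		panic(err)
	}
	for _, target := range []string{"0.1.2.3:9001", "0.1.2.3:22", "example.com:9001"} {
		fmt.Println(target, al.permits(target))
	}
}
```

Rejecting hostnames and matching on the port as well as the address keeps the proxy from being used to reach other services on the same host.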

While the lack of a server-side PT makes a naive proxy easier to deploy, it also means we cannot collect metrics.

Full Bridge

A full bridge runs a Tor entry node, a pluggable transport, and an upstreaming frontend web server. The web server checks credentials and, instead of handling CONNECT requests itself, forwards them upstream to the pluggable transport, attaching the client’s IP in a header. The PT parses the IP from the HTTP request header and passes it to the ExtORPort, thus enabling metrics collection.

Registering with BridgeDB

As it currently stands, bridges must have an open ORPort to be registered with BridgeDB (legacy/trac#7349). This makes bridges easy to identify and block. However, we can still register bridge lines with BridgeDB if we add an additional hop through an intermediate proxy before entering a bridge. A censor would only be able to observe the address of the intermediate proxy.

Having such a 2-hop setup is a natural property of Naive Proxy, as described above. Bridge line example:

httpsproxy [vanilla entry addr] [entry fingerprint] url=https://username:password@naiveproxy.org
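A client would need to split such a bridge line into its parts. The following Go sketch shows one way to do that; the struct bridgeLine, the function parseBridgeLine, and the 40-character fingerprint used in main are placeholders for illustration, not part of the proposal.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// bridgeLine holds the fields of an httpsproxy bridge line.
type bridgeLine struct {
	EntryAddr   string // vanilla Tor entry node, host:port
	Fingerprint string // entry node fingerprint
	ProxyHost   string // intermediate HTTPS proxy
	Username    string
	Password    string
}

// parseBridgeLine parses
// "httpsproxy <entry addr> <fingerprint> url=https://user:pass@host".
func parseBridgeLine(line string) (*bridgeLine, error) {
	f := strings.Fields(line)
	if len(f) != 4 || f[0] != "httpsproxy" || !strings.HasPrefix(f[3], "url=") {
		return nil, errors.New("not an httpsproxy bridge line")
	}
	u, err := url.Parse(strings.TrimPrefix(f[3], "url="))
	if err != nil {
		return nil, err
	}
	if u.Scheme != "https" || u.User == nil {
		return nil, errors.New("url must use https and carry credentials")
	}
	pass, _ := u.User.Password()
	return &bridgeLine{
		EntryAddr:   f[1],
		Fingerprint: f[2],
		ProxyHost:   u.Host,
		Username:    u.User.Username(),
		Password:    pass,
	}, nil
}

func main() {
	// The fingerprint below is a made-up placeholder.
	bl, err := parseBridgeLine("httpsproxy 0.1.2.3:9001 0123456789ABCDEF0123456789ABCDEF01234567 url=https://username:password@naiveproxy.org")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *bl)
}
```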

We can use the 2-hop approach with full bridges as well: the intermediate proxy would forward the HTTP request (preferably with the client IP in a “Forwarded: for=IP:port” header). In this case, the intermediate proxy simply forwards all requests with correct credentials to the chosen full bridge(s); in other words, it acts as a reverse proxy, a widely supported technology.
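On the bridge side, the PT would then have to extract the client IP from that header. A possible parser is sketched below; the function name clientIPFromForwarded is hypothetical, and the sketch assumes the header was set by a trusted frontend, since a value supplied by the client itself could be spoofed. It accepts both the unquoted form written above and the quoted form that the Forwarded header syntax uses when a port is present.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// clientIPFromForwarded extracts the IP address from the "for" parameter of
// the first element of a Forwarded header value.
func clientIPFromForwarded(header string) (string, bool) {
	// Elements are comma-separated, one per hop; the first is the original client.
	first := strings.SplitN(header, ",", 2)[0]
	for _, pair := range strings.Split(first, ";") {
		key, val, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || !strings.EqualFold(key, "for") {
			continue
		}
		val = strings.Trim(val, `"`)
		// Drop the port if one is present.
		if host, _, err := net.SplitHostPort(val); err == nil {
			val = host
		}
		// A bare IPv6 address without a port is still bracketed.
		val = strings.Trim(val, "[]")
		if ip := net.ParseIP(val); ip != nil {
			return ip.String(), true
		}
		return "", false
	}
	return "", false
}

func main() {
	for _, h := range []string{
		`for="192.0.2.43:4711";proto=https`,
		`for="[2001:db8::1]:4711"`,
		`for=unknown`,
	} {
		ip, ok := clientIPFromForwarded(h)
		fmt.Printf("%q -> %q %v\n", h, ip, ok)
	}
}
```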

While the second hop adds overhead, there is a benefit in not requiring would-be proxy operators to run a full bridge: configuring a proxy becomes substantially easier and, ideally, would amount to adding a few lines to a web server config file and registering with BridgeDB via some script. Not requiring operators to install, configure, and run both PT and Tor daemons may allow us to attract a larger number of volunteers for the entrance servers.

However, it is unclear which party would actually register the bridge line, and how. Perhaps a separate https-proxy-authority could do that (and provide web proxies with entries to use).

Current prototype

The prototype works with standard HTTP/1.1 and HTTP/2 proxies, both as naive proxies and as full bridges. If there is interest in seeing the current prototype, I would gladly share it; @dcf has already created a ticket for the repository creation (legacy/trac#26793).

Language

Both client and server are implemented in Go, a relatively safe, cross-platform language.

Overhead

Bandwidth overhead depends on the aggressiveness of padding, but I would not expect goodput to drop below 80%, especially for high-bandwidth workloads, which should mostly consist of MTU-sized packets. A detailed evaluation will be done after padding is implemented. Computational overhead amounts to one TLS handshake per flow plus the usual connection management.

Fingerprinting

Running a real web server helps; however, there are multiple potential fingerprinting vectors. Those include:

Probing web server with proxy requests without a secret

By default, web servers with this sort of forward proxying enabled respond to unauthenticated proxy requests with “407 Proxy Authentication Required”, whereas a web server without forward proxying enabled responds differently, stating that it is not a proxy and does not accept CONNECT requests. It would be beneficial to hide the fact that the server proxies at all (although note that a 407 response does not reveal the proxy as a Tor proxy, only that forward proxying is enabled). This feature is already supported by the Caddy web server (see the "probe_resistance" option), which is used for the current implementation.
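The idea of hiding the proxy from unauthenticated probes can be sketched with Go's net/http as below. This is an illustration of the general technique only and makes no claim about how Caddy implements "probe_resistance"; the names probeResistant, statusFor, and demoHandler, and the stand-in site and proxy handlers, are assumptions for this example.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// probeResistant routes a request to the proxy handler only when it is a
// CONNECT carrying the expected Proxy-Authorization value. Every other
// request, including an unauthenticated CONNECT, is answered by the ordinary
// site handler, so this wrapper never sends "407 Proxy Authentication Required".
func probeResistant(site, proxy http.Handler, wantAuth string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("Proxy-Authorization")
		authed := subtle.ConstantTimeCompare([]byte(got), []byte(wantAuth)) == 1
		if r.Method == http.MethodConnect && authed {
			proxy.ServeHTTP(w, r)
			return
		}
		site.ServeHTTP(w, r)
	})
}

// statusFor replays a CONNECT with the given Proxy-Authorization value
// against h and returns the response status code.
func statusFor(h http.Handler, auth string) int {
	req := &http.Request{
		Method: http.MethodConnect,
		URL:    &url.URL{Host: "0.1.2.3:9001"},
		Host:   "0.1.2.3:9001",
		Header: http.Header{},
	}
	if auth != "" {
		req.Header.Set("Proxy-Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

// demoHandler wires up stand-ins: a site that rejects CONNECT the way a server
// without proxying might, and a proxy stub that only acknowledges the request.
func demoHandler() http.Handler {
	site := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodConnect {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		fmt.Fprintln(w, "hello")
	})
	proxy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // a real proxy would hijack and relay here
	})
	return probeResistant(site, proxy, "Basic dXNlcjpwYXNz")
}

func main() {
	h := demoHandler()
	fmt.Println("no credentials:", statusFor(h, ""))
	fmt.Println("correct credentials:", statusFor(h, "Basic dXNlcjpwYXNz"))
}
```

The point of the design is that a prober without credentials receives exactly what the underlying site would have sent anyway. Matching response timing between the two paths is a separate concern that this sketch does not address.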

TLS ClientHello fingerprinting

meek has been blocked on the basis of its TLS ClientHello at least twice. There is a library called utls that provides the ability to mimic arbitrary ClientHello messages. It uses real-world data from https://tlsfingerprint.io/ to decide which ClientHello messages to mimic, based on the collateral damage that blocking them would cause, and lets developers confirm the correctness of their mimicking. In case a particular "fingerprint" is blocked or incorrectly mimicked, this transport would use multiple "fingerprints" and cycle through them until an unblocked one is found.
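The cycling behaviour described above might look like the following sketch. It deliberately does not call utls: the dialFunc hook stands in for a TLS handshake performed with a mimicked ClientHello, and the fingerprint names and the simulateCensor helper are invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// dialFunc attempts one connection using the named ClientHello fingerprint.
// In a real client it would perform the TLS handshake with that mimicked
// ClientHello; here it is a hypothetical hook.
type dialFunc func(fingerprint string) error

// firstWorking tries each fingerprint in order and returns the first one
// whose dial succeeds, or the last error if none does.
func firstWorking(fingerprints []string, dial dialFunc) (string, error) {
	lastErr := errors.New("no fingerprints configured")
	for _, fp := range fingerprints {
		if err := dial(fp); err != nil {
			lastErr = fmt.Errorf("%s: %w", fp, err)
			continue
		}
		return fp, nil
	}
	return "", lastErr
}

// simulateCensor returns a dialFunc that fails for the blocked fingerprints.
func simulateCensor(blocked ...string) dialFunc {
	set := map[string]bool{}
	for _, b := range blocked {
		set[b] = true
	}
	return func(fp string) error {
		if set[fp] {
			return errors.New("connection reset")
		}
		return nil
	}
}

func main() {
	fp, err := firstWorking([]string{"fp-a", "fp-b", "fp-c"}, simulateCensor("fp-a"))
	fmt.Println(fp, err)
}
```

A client would presumably remember the fingerprint that worked, so that later connections do not repeat failed attempts that could themselves be observable.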

Other TLS fingerprinting

Evaluating other TLS handshake messages and TLS records, and how they may differ from the mimicked implementations, remains a TODO.

Traffic Size Patterns

The current prototype doesn't use padding yet, and the traces it generates look extremely fingerprintable: it constantly produces packets of size CELL_SIZE * N + constant overhead.

We intend to address this problem shortly by splitting and padding HTTP/2 frames to resemble common web traffic. There is no standard way to pad HTTP/1.1 that works with standard web proxies, but we can probably split the cells.
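Since padding is not implemented yet, the following Go sketch only illustrates the shape of the idea: cut the outgoing byte stream into chunks of varying size and attach a padding length to each, as an HTTP/2 DATA frame allows (its pad length field is one byte, so at most 255 bytes). The uniform random sizes are a placeholder assumption; resembling common web traffic would need a distribution derived from real traces. The names paddedFrame, splitAndPad, reassemble, and seeded are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maxPad is the largest padding length an HTTP/2 DATA frame can declare.
const maxPad = 255

// paddedFrame models one DATA frame: a slice of the payload plus the number
// of padding bytes to send with it.
type paddedFrame struct {
	Payload []byte
	PadLen  uint8
}

// seeded returns a deterministic random source, convenient for demos.
func seeded(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// splitAndPad cuts data into chunks of 1..maxPayload bytes and assigns each
// chunk a padding length of 0..maxPad bytes.
func splitAndPad(data []byte, maxPayload int, rng *rand.Rand) []paddedFrame {
	if maxPayload < 1 {
		maxPayload = 1
	}
	var frames []paddedFrame
	for len(data) > 0 {
		n := 1 + rng.Intn(maxPayload)
		if n > len(data) {
			n = len(data)
		}
		frames = append(frames, paddedFrame{
			Payload: data[:n],
			PadLen:  uint8(rng.Intn(maxPad + 1)),
		})
		data = data[n:]
	}
	return frames
}

// reassemble concatenates the payloads, as the receiving side would after
// discarding the padding.
func reassemble(frames []paddedFrame) []byte {
	var out []byte
	for _, f := range frames {
		out = append(out, f.Payload...)
	}
	return out
}

func main() {
	data := make([]byte, 2048) // stand-in for a run of Tor cells
	for _, f := range splitAndPad(data, 400, seeded(1)) {
		fmt.Printf("payload=%d pad=%d\n", len(f.Payload), f.PadLen)
	}
}
```

With chunk boundaries decoupled from cell boundaries, the on-wire sizes no longer follow the CELL_SIZE * N pattern, at the cost of the padding bytes.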

Connection establishment traffic patterns

This is especially relevant to 2-hop approaches: the client might have to wait a long time for the first response while the proxy establishes its connection. Many proxies share this issue, and it can be solved; it is noted here because it still requires attention and a solution.

Connection lifetime

Being connected to the same server for prolonged periods (an HTTPS tunnel may work fine for hours, if not days) could be a distinguishing feature. The client should redial at least once an hour. TODO
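One possible shape for the redial rule is sketched below. The one-hour bound comes from the text; the randomized lifetime between half an hour and an hour is an added assumption, chosen so that redials do not happen at a fixed and therefore recognizable interval. The names nextLifetime, shouldRedial, and seeded are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// maxLifetime is the upper bound on tunnel age suggested in the text.
const maxLifetime = time.Hour

// seeded returns a deterministic random source, convenient for demos.
func seeded(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// nextLifetime picks how long the next tunnel may live: a random duration in
// [maxLifetime/2, maxLifetime]. The jitter is an assumption of this sketch.
func nextLifetime(rng *rand.Rand) time.Duration {
	half := maxLifetime / 2
	return half + time.Duration(rng.Int63n(int64(half)+1))
}

// shouldRedial reports whether a tunnel of the given age has outlived the
// lifetime chosen for it.
func shouldRedial(age, lifetime time.Duration) bool {
	return age >= lifetime
}

func main() {
	lifetime := nextLifetime(seeded(7))
	fmt.Println("redial after:", lifetime)
	fmt.Println("redial at 10 minutes?", shouldRedial(10*time.Minute, lifetime))
	fmt.Println("redial at 1 hour?", shouldRedial(time.Hour, lifetime))
}
```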

Trac:
Username: sf