AMP cache rendezvous
This merge request adds a -ampcache command-line option to the Snowflake client, and an /amp/client/ route to the broker. The client can register itself via an AMP cache, using AMP-specific data encodings, as an alternative to domain fronting.
Implements #25985.
Cc @twim.
Demo
I've set up a temporary AMP-capable broker at https://snowflake.rinsed-tinsel.site/, and a standalone proxy polling it. To try it, create a torrc.ampcache file:
DataDirectory datadir
UseBridges 1
Bridge snowflake 192.0.2.3:1
ClientTransportPlugin snowflake exec ./client \
-url https://snowflake.rinsed-tinsel.site/ \
-ampcache https://cdn.ampproject.org/ \
-front www.google.com \
-ice stun:stun.voip.blackberry.com:3478,stun:stun.altar.com.pl:3478,stun:stun.antisip.com:3478,stun:stun.bluesip.net:3478,stun:stun.dus.net:3478,stun:stun.epygi.com:3478,stun:stun.sonetel.com:3478,stun:stun.sonetel.net:3478,stun:stun.stunprotocol.org:3478,stun:stun.uls.co.za:3478,stun:stun.voipgate.com:3478,stun:stun.voys.nl:3478 \
-max 1 \
-log snowflake.log
Then run tor:
$ tor -f torrc.ampcache SocksPort 9250
You should see in snowflake.log:
2021/07/19 18:03:05 Negotiating via AMP cache rendezvous...
2021/07/19 18:03:05 Broker URL: https://snowflake.rinsed-tinsel.site/
2021/07/19 18:03:05 AMP cache URL: https://cdn.ampproject.org/
2021/07/19 18:03:05 Front domain: www.google.com
Overview
An AMP cache works like an HTTP proxy with restrictions:
- Only GET requests work, not POST.
- Responses must be written in a restricted dialect of HTML. The AMP cache validates the server response and will not send it to the requester if it does not conform to AMP requirements.
- The AMP cache can modify responses in certain ways, such as by normalizing HTML and compressing images.
Here is what happens in an AMP cache rendezvous, step by step. For the client logic, see ampCacheRendezvous.Exchange. For the broker logic, see ampClientOffers.
- The client forms an encoded client poll message (just as in any other form of rendezvous).
1.0\n{"offer":"{\"type\":\"offer\",\"sdp\":\"...\"}","nat":"unrestricted"}
- The client calls amp.EncodePath to further encode the client poll message into a broker URL, under the /amp/client/ route.
https://snowflake-broker.torproject.net/amp/client/0Sh2kIitCf34O/MS4wCnsib2ZmZXIiOiJ7XCJ0eXBlXCI6XCJvZmZlclwiLFwic2RwXCI6XCIuLi5cIn0iLCJuYXQiOiJ1bnJlc3RyaWN0ZWQifQ
0 is a format version number for the rest of the path, Sh2kIitCf34O is random padding to prevent cache collisions, and MS4w... is the base64 of the client poll request message.
- The client calls amp.CacheURL to make the URL relative to the AMP cache, using the AMP cache URL format. See how the domain changes to a subdomain of cdn.ampproject.org.
https://snowflake--broker-torproject-net.cdn.ampproject.org/c/s/snowflake-broker.torproject.net/amp/client/0Sh2kIitCf34O/MS4wCnsib2ZmZXIiOiJ7XCJ0eXBlXCI6XCJvZmZlclwiLFwic2RwXCI6XCIuLi5cIn0iLCJuYXQiOiJ1bnJlc3RyaWN0ZWQifQ
- The client domain-fronts its request to hide the AMP cache subdomain in the URL, which would otherwise reveal the domain of the broker being accessed through the cache. It changes the domain in the URL to something like www.google.com, but leaves the Host header set to snowflake--broker-torproject-net.cdn.ampproject.org.
https://www.google.com/c/s/snowflake-broker.torproject.net/amp/client/0Sh2kIitCf34O/MS4wCnsib2ZmZXIiOiJ7XCJ0eXBlXCI6XCJvZmZlclwiLFwic2RwXCI6XCIuLi5cIn0iLCJuYXQiOiJ1bnJlc3RyaWN0ZWQifQ
- The client sends its request to the AMP cache, the cache forwards the request to the broker, and the broker receives it. The broker calls amp.DecodePath to recover the client poll message from the URL path, passes the message to IPC.ClientOffers, and forms a client poll response message:
{"answer":"{\"type\":\"answer\",\"sdp\":\"...\"}"}
- The broker encodes the client poll response message as AMP HTML using AMP armor encoding, and sends it back in an HTTP response to the AMP cache:
<!doctype html>
<html amp>
<head>
<meta charset="utf-8">
<script async src="https://cdn.ampproject.org/v0.js"></script>
<link rel="canonical" href="#">
<meta name="viewport" content="width=device-width">
<style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}</style><noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript>
</head>
<body>
<pre>
0eyJhbnN3ZXIiOiJ7XCJ0eXBlXCI6XCJ
hbnN3ZXJcIixcInNkcFwiOlwiLi4uXCJ
9In0=
</pre>
</body>
</html>
The <link>, <meta>, <style>, etc. at the top are required AMP boilerplate. The actual data is encoded into <pre> elements. The first byte, 0, is a format version number for the rest of the data. eyJh... is the base64 of the client poll response message.
- The AMP cache receives the broker's response, applies various transformations to the HTML that nevertheless leave the contents of the <pre> elements intact, and forwards the response to the client.
- The client searches for <pre> elements, extracts and decodes their contents, and recovers the client poll response message.
The process is similar to the existing domain-fronted rendezvous, except that the client sends its request message in the URL path, rather than in an HTTP POST body, and the broker sends its response message as AMP-armored HTML.
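On the client side, recovering the response amounts to collecting the <pre> contents, discarding whitespace, checking the version byte, and base64-decoding the rest. The sketch below assumes exactly the layout of the example response (standard base64 with = padding, a single leading 0, <pre> elements without attributes); decodeArmor is a hypothetical name, and the real armor decoder may be more tolerant of what the AMP cache does to the HTML.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// preRE captures the contents of each <pre> element. A regular expression is
// enough for this sketch, which assumes <pre> carries no attributes.
var preRE = regexp.MustCompile(`(?s)<pre>(.*?)</pre>`)

// decodeArmor (hypothetical) concatenates all <pre> contents, removes
// whitespace, checks the "0" version byte, and base64-decodes the remainder.
func decodeArmor(html string) ([]byte, error) {
	var sb strings.Builder
	for _, m := range preRE.FindAllStringSubmatch(html, -1) {
		sb.WriteString(m[1])
	}
	text := strings.Join(strings.Fields(sb.String()), "")
	if !strings.HasPrefix(text, "0") {
		return nil, errors.New("missing or unknown armor version")
	}
	return base64.StdEncoding.DecodeString(text[1:])
}

func main() {
	// The <pre> element from the example broker response.
	body := "<body>\n<pre>\n0eyJhbnN3ZXIiOiJ7XCJ0eXBlXCI6XCJ\nhbnN3ZXJcIixcInNkcFwiOlwiLi4uXCJ\n9In0=\n</pre>\n</body>"
	msg, err := decodeArmor(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```

Run as is, it prints the example client poll response message, {"answer":"{\"type\":\"answer\",\"sdp\":\"...\"}"}.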
Walkthrough of commits
Refactoring
dcf/snowflake@db243b91 improves some BrokerChannel tests to use a non-empty Host component in the broker URL. This is to improve the tests for the next commit.
dcf/snowflake@41c70f63 changes BrokerChannel's internal representation of domain fronting into a form that makes more sense to me.
dcf/snowflake@b706e9c7 makes BrokerChannel abstract over rendezvous methods. The url, front, and transport fields of BrokerChannel are removed and placed into a new httpRendezvous type.
AMP-related support
dcf/snowflake@cd56247a adds AMP-related support code, copied from my AMP cache tunnel. This includes the CacheURL function that modifies a URL to be accessed through an AMP cache, EncodePath and DecodePath, and the AMP armor encoder and decoder.
Client -ampcache option
dcf/snowflake@06ca0e86 adds an -ampcache client command-line option and an ampCacheRendezvous type, which at this point functions the same as httpRendezvous.
dcf/snowflake@a2af6fb4 makes -ampcache and ampCacheRendezvous actually use AMP encoding.
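The domain-fronting part of an AMP cache request can be sketched with the standard net/http package: put the front domain in the request URL, which determines DNS and the TLS SNI, and carry the AMP cache subdomain only in the Host header. newFrontedRequest is a hypothetical helper for illustration, not part of ampCacheRendezvous; it builds the request without sending it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newFrontedRequest (hypothetical) builds a domain-fronted GET request for an
// AMP cache URL: the URL host is replaced by front, and the original host is
// sent only in the Host header.
func newFrontedRequest(cacheURL, front string) (*http.Request, error) {
	u, err := url.Parse(cacheURL)
	if err != nil {
		return nil, err
	}
	realHost := u.Host
	u.Host = front
	// GET with no body: the AMP cache does not support POST.
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// For client requests, a non-empty req.Host overrides the Host header
	// that would otherwise be taken from req.URL.Host.
	req.Host = realHost
	return req, nil
}

func main() {
	req, err := newFrontedRequest(
		"https://snowflake--broker-torproject-net.cdn.ampproject.org/c/s/snowflake-broker.torproject.net/amp/client/0Sh2kIitCf34O/MS4w",
		"www.google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("URL: ", req.URL)
	fmt.Println("Host:", req.Host)
	// Sending it would be http.DefaultClient.Do(req); omitted here.
}
```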
Broker /amp/client route
dcf/snowflake@f7505cc2 adds a new ampClientOffers function, which is like the existing clientOffers, but it takes the client poll request from the URL path and writes the client poll response as AMP HTML.
Documentation
dcf/snowflake@010f11c8 adds information about registration methods to the client README: -url/-front for domain fronting; -url/-ampcache/-front for AMP cache.
Discussion
The AMP cache rendezvous happens to coexist nicely with the domain-fronting rendezvous with respect to command-line options, because it needs the same two pieces of information (-url and -front), in addition to the new -ampcache. But conceptually, other forms of rendezvous (like DNS) may need completely different information. We may end up with a lot of options if we add a new option for every parameter of every registration method, and the program will need to check that no two options of different methods are used at the same time.
@twim's branch split out an overarching -codec option (e.g. -codec=post, -codec=amp), but it was still dependent on the -url and -front options, and there was no option for controlling the AMP cache URL.
Similarly, NewSnowflakeClient and NewBrokerChannel effectively take all the rendezvous-related command-line options as string arguments. If we add more rendezvous methods, we may want to factor this out into a Config struct or similar.
Alternatively, the rendezvous method (which at this point is a private type, rendezvousMethod) could be specified at a higher level and passed into NewSnowflakeClient or NewBrokerChannel. The change to NewSnowflakeClient's signature requires a major version upgrade to v2, I believe.
The decision of what rendezvous method to use is made in NewBrokerChannel. The function looks at its broker, ampCache, and front arguments, and decides whether to use httpRendezvous or ampCacheRendezvous.
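A sketch of that decision, assuming the rule is simply that a non-empty AMP cache URL selects AMP cache rendezvous and that a broker URL is always required. The types httpMethod and ampCacheMethod are hypothetical stand-ins for the private httpRendezvous and ampCacheRendezvous types, reduced to the arguments they are built from.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the two rendezvous methods; the real private
// types may hold different fields.
type httpMethod struct{ broker, front string }
type ampCacheMethod struct{ broker, ampCache, front string }

func (m httpMethod) String() string {
	return fmt.Sprintf("domain-fronted HTTP (broker=%s, front=%s)", m.broker, m.front)
}

func (m ampCacheMethod) String() string {
	return fmt.Sprintf("AMP cache (broker=%s, cache=%s, front=%s)", m.broker, m.ampCache, m.front)
}

// chooseRendezvous (hypothetical): a non-empty ampCache selects AMP cache
// rendezvous; otherwise domain-fronted HTTP rendezvous is used.
func chooseRendezvous(broker, ampCache, front string) (fmt.Stringer, error) {
	if broker == "" {
		return nil, errors.New("a broker URL is required")
	}
	if ampCache != "" {
		return ampCacheMethod{broker, ampCache, front}, nil
	}
	return httpMethod{broker, front}, nil
}

func main() {
	m, err := chooseRendezvous("https://snowflake.rinsed-tinsel.site/", "https://cdn.ampproject.org/", "www.google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(m)
}
```

A Config struct, as suggested above, would replace the three string parameters with one value but could keep the same selection logic.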
ampClientOffers needs to know its own /amp/client/ route in order to know where to start looking for information encoded in the URL path. One way to avoid that would be to change the path encoding to look only at the final path component, i.e., everything after the final slash. As it is, the version 0 encoding uses a slash as a delimiter for where the actual data starts, but the encoding begins before that point.
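To make the trade-off concrete, here is a sketch contrasting the two ways of locating the encoded data, assuming a broker mounted at /amp/client/. With the version 0 encoding, only the route-aware afterRoute recovers the version and padding; finalComponent returns just the base64 data, which is why the encoding itself would have to change for the route-agnostic approach to work. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// ampClientRoute is the route prefix the broker is assumed to be mounted at.
const ampClientRoute = "/amp/client/"

// afterRoute models the current requirement: the version 0 encoding starts
// ("0" + padding) before the final slash, so the decoder must know where the
// route prefix ends.
func afterRoute(urlPath string) (string, bool) {
	if !strings.HasPrefix(urlPath, ampClientRoute) {
		return "", false
	}
	return strings.TrimPrefix(urlPath, ampClientRoute), true
}

// finalComponent models the proposed alternative: take everything after the
// final slash, regardless of the route. It only works if the whole encoding
// (version, padding, data) lives in that last component.
func finalComponent(urlPath string) string {
	return urlPath[strings.LastIndexByte(urlPath, '/')+1:]
}

func main() {
	p := "/amp/client/0Sh2kIitCf34O/MS4w"
	rest, ok := afterRoute(p)
	fmt.Println(rest, ok)
	fmt.Println(finalComponent(p))
}
```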
Although the -ampcache option lets you specify the URL of any AMP cache, there's only one AMP cache that actually works: https://cdn.ampproject.org/. There used to be a Cloudflare one at https://amp.cloudflare.com/, but it doesn't work anymore. Bing's AMP cache doesn't fetch origin pages ("publisher pages") on demand.