AMP cache rendezvous

David Fifield requested to merge dcf/snowflake:ampcache into main

This merge request adds an -ampcache command-line option to the Snowflake client, and an /amp/client/ route to the broker. The client can register itself via an AMP cache, using AMP-specific data encodings, as an alternative to domain fronting.

Implements #25985 (closed).

Cc @twim.

Demo

I've set up a temporary AMP-capable broker at https://snowflake.rinsed-tinsel.site/, and a standalone proxy polling it. To try it, create a torrc.ampcache file:

DataDirectory datadir
UseBridges 1
Bridge snowflake 192.0.2.3:1

ClientTransportPlugin snowflake exec ./client \
-url https://snowflake.rinsed-tinsel.site/ \
-ampcache https://cdn.ampproject.org/ \
-front www.google.com \
-ice stun:stun.voip.blackberry.com:3478,stun:stun.altar.com.pl:3478,stun:stun.antisip.com:3478,stun:stun.bluesip.net:3478,stun:stun.dus.net:3478,stun:stun.epygi.com:3478,stun:stun.sonetel.com:3478,stun:stun.sonetel.net:3478,stun:stun.stunprotocol.org:3478,stun:stun.uls.co.za:3478,stun:stun.voipgate.com:3478,stun:stun.voys.nl:3478 \
-max 1 \
-log snowflake.log

Then run tor:

$ tor -f torrc.ampcache SocksPort 9250

You should see in snowflake.log:

2021/07/19 18:03:05 Negotiating via AMP cache rendezvous...
2021/07/19 18:03:05 Broker URL: https://snowflake.rinsed-tinsel.site/
2021/07/19 18:03:05 AMP cache URL: https://cdn.ampproject.org/
2021/07/19 18:03:05 Front domain: www.google.com

Overview

An AMP cache works like an HTTP proxy with restrictions:

  • Only GET requests work, not POST.
  • Responses must be written in a restricted dialect of HTML. The AMP cache validates the server response and will not send it to the requester if it does not conform to AMP requirements.
  • The AMP cache can modify responses in certain ways, such as by normalizing HTML and compressing images.

Here is what happens in an AMP cache rendezvous, step by step. For the client logic, see ampCacheRendezvous.Exchange. For the broker logic, see ampClientOffers.

  • The client forms an encoded client poll message (just as in any other form of rendezvous).
    1.0\n{"offer":"{\"type\":\"offer\",\"sdp\":\"...\"}","nat":"unrestricted"}
  • The client calls amp.EncodePath to further encode the client poll message into a broker URL, under the /amp/client/ route.
    https://snowflake-broker.torproject.net/amp/client/0Sh2kIitCf34O/MS4wCnsib2ZmZXI
    iOiJ7XCJ0eXBlXCI6XCJvZmZlclwiLFwic2RwXCI6XCIuLi5cIn0iLCJuYXQiOiJ1bnJlc3RyaWN0ZWQ
    ifQ
    The URL encoding scheme is described here. Briefly, 0 is a format version number for the rest of the path, Sh2kIitCf34O is random padding to prevent cache collisions, and MS4w... is the base64 of the client poll request message.
  • The client calls amp.CacheURL to rewrite the broker URL so that it is fetched through the AMP cache, using the AMP cache URL format. Note how the domain changes to a subdomain of cdn.ampproject.org.
    https://snowflake--broker-torproject-net.cdn.ampproject.org/c/s/snowflake-broker
    .torproject.net/amp/client/0Sh2kIitCf34O/MS4wCnsib2ZmZXIiOiJ7XCJ0eXBlXCI6XCJvZmZ
    lclwiLFwic2RwXCI6XCIuLi5cIn0iLCJuYXQiOiJ1bnJlc3RyaWN0ZWQifQ
  • The client domain-fronts its request to hide the URL's domain, which would otherwise reveal the domain of the broker being accessed through the cache. It changes the domain in the URL to something like www.google.com, but leaves the Host header set to snowflake--broker-torproject-net.cdn.ampproject.org.
    https://www.google.com/c/s/snowflake-broker.torproject.net/amp/client/0Sh2kIitCf
    34O/MS4wCnsib2ZmZXIiOiJ7XCJ0eXBlXCI6XCJvZmZlclwiLFwic2RwXCI6XCIuLi5cIn0iLCJuYXQi
    OiJ1bnJlc3RyaWN0ZWQifQ
  • The client sends its request to the AMP cache, the cache forwards the request to the broker, and the broker receives it. The broker calls amp.DecodePath to recover the client poll message from the URL path, passes the message to IPC.ClientOffers, and forms a client poll response message:
    {"answer":"{\"type\":\"answer\",\"sdp\":\"...\"}"}
  • The broker encodes the client poll response message as AMP HTML using AMP armor encoding, and sends it back in an HTTP response to the AMP cache:
    <!doctype html>
    <html amp>
    <head>
    <meta charset="utf-8">
    <script async src="https://cdn.ampproject.org/v0.js"></script>
    <link rel="canonical" href="#">
    <meta name="viewport" content="width=device-width">
    <style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 no
    rmal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animati
    on:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,e
    nd) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{vis
    ibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility
    :visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible
    }}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyfra
    mes -amp-start{from{visibility:hidden}to{visibility:visible}}</style><noscript><
    style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animat
    ion:none;animation:none}</style></noscript>
    </head>
    <body>
    <pre>
    0eyJhbnN3ZXIiOiJ7XCJ0eXBlXCI6XCJ
    hbnN3ZXJcIixcInNkcFwiOlwiLi4uXCJ
    9In0=
    </pre>
    </body>
    </html>
    The <link>, <meta>, <style>, and other elements at the top are required AMP boilerplate. The actual data is encoded in the <pre> elements. The first byte, 0, is a format version number for the rest of the data; eyJh... is the base64 of the client poll response message.
  • The AMP cache receives the broker's response, applies a number of transformations to the HTML that nevertheless leave the contents of the <pre> elements intact, and forwards the response to the client.
  • The client searches for <pre> elements, extracts and decodes their contents, and recovers the client poll response message.
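
The path encoding used in the first steps above can be sketched in Go as follows. This is a minimal sketch, not the code from this merge request: the names encodePathSketch and decodePathSketch are hypothetical stand-ins for amp.EncodePath and amp.DecodePath, and the 9-byte padding length and the unpadded URL-safe base64 alphabet are assumptions inferred from the example URL.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// encodePathSketch lays a message out as: version "0", random padding,
// a slash, then the base64 of the message. Hypothetical stand-in for
// amp.EncodePath.
func encodePathSketch(msg []byte) (string, error) {
	padding := make([]byte, 9) // assumed length; 9 bytes encode to 12 characters
	if _, err := rand.Read(padding); err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding // assumed: URL-safe alphabet, no "=" padding
	return "0" + enc.EncodeToString(padding) + "/" + enc.EncodeToString(msg), nil
}

// decodePathSketch reverses encodePathSketch: it checks the version,
// skips the padding up to the slash, and decodes what follows.
func decodePathSketch(path string) ([]byte, error) {
	if !strings.HasPrefix(path, "0") {
		return nil, errors.New("missing or unknown format version")
	}
	i := strings.IndexByte(path, '/')
	if i < 0 {
		return nil, errors.New("missing slash before the data")
	}
	return base64.RawURLEncoding.DecodeString(path[i+1:])
}

func main() {
	msg := []byte("1.0\n" + `{"offer":"{\"type\":\"offer\",\"sdp\":\"...\"}","nat":"unrestricted"}`)
	path, err := encodePathSketch(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("/amp/client/" + path)

	decoded, err := decodePathSketch(path)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", decoded)
}
```

Because the padding is random, the printed path differs on every run, which is what keeps two identical poll messages from mapping to the same cached URL.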

The process is similar to the existing domain-fronted rendezvous, except that the client sends its request message in the URL path, rather than in an HTTP POST body, and the broker sends its response message as AMP-armored HTML.
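
On the response side, the client's job of recovering the message from AMP-armored HTML might look roughly like the following. This is a simplified sketch under stated assumptions, not the decoder from this merge request: armorDecodeSketch is a hypothetical name, the scan only recognizes literal lowercase <pre> and </pre> tags without attributes, and the payload is assumed to be standard padded base64, as the trailing = in the example suggests.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// armorDecodeSketch concatenates the contents of every <pre> element,
// drops the whitespace used for line wrapping, checks the version byte,
// and base64-decodes the rest.
func armorDecodeSketch(page string) ([]byte, error) {
	var payload strings.Builder
	rest := page
	for {
		start := strings.Index(rest, "<pre>")
		if start < 0 {
			break
		}
		rest = rest[start+len("<pre>"):]
		end := strings.Index(rest, "</pre>")
		if end < 0 {
			return nil, errors.New("unterminated <pre> element")
		}
		payload.WriteString(strings.Join(strings.Fields(rest[:end]), ""))
		rest = rest[end+len("</pre>"):]
	}
	s := payload.String()
	if s == "" {
		return nil, errors.New("no <pre> data found")
	}
	if s[0] != '0' {
		return nil, fmt.Errorf("unknown format version %q", s[0])
	}
	return base64.StdEncoding.DecodeString(s[1:])
}

func main() {
	// The <pre> element from the example response, without the boilerplate.
	page := "<body>\n<pre>\n0eyJhbnN3ZXIiOiJ7XCJ0eXBlXCI6XCJ\n" +
		"hbnN3ZXJcIixcInNkcFwiOlwiLi4uXCJ\n9In0=\n</pre>\n</body>"
	msg, err := armorDecodeSketch(page)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```

A real decoder would more plausibly use an HTML tokenizer than substring search, because the AMP cache is free to rewrite the markup around the <pre> elements.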

Walkthrough of commits

Refactoring

dcf/snowflake@db243b91 improves some BrokerChannel tests to use a non-empty Host component in the broker URL. This is to improve the tests for the next commit.

dcf/snowflake@41c70f63 changes BrokerChannel's internal representation of domain fronting into a form that makes more sense to me.

dcf/snowflake@b706e9c7 makes BrokerChannel abstract over rendezvous methods. The url, front, and transport fields of BrokerChannel are removed and placed into a new httpRendezvous type.

AMP-related support

dcf/snowflake@cd56247a adds AMP-related support code, copied from my AMP cache tunnel. This includes the CacheURL function that modifies a URL to be accessed through an AMP cache, EncodePath and DecodePath, and the AMP armor encoder and decoder.
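
To give a feel for what the URL rewriting involves, here is a simplified sketch of the AMP cache URL transformation followed by the domain-fronting step described in the overview. It is not the CacheURL implementation from this commit: cacheURLSketch and frontRequestSketch are hypothetical names, and the sketch assumes an https origin with a short ASCII domain, ignoring the internationalized-domain and long-domain cases that the AMP cache URL format also covers.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// cacheURLSketch rewrites an origin URL so that it is fetched through
// the given AMP cache.
func cacheURLSketch(pubURL, cacheURL string) (string, error) {
	pub, err := url.Parse(pubURL)
	if err != nil {
		return "", err
	}
	cache, err := url.Parse(cacheURL)
	if err != nil {
		return "", err
	}
	if pub.Scheme != "https" {
		return "", errors.New("this sketch only handles https origins")
	}
	// "-" becomes "--" and "." becomes "-", in that order.
	prefix := strings.ReplaceAll(pub.Host, "-", "--")
	prefix = strings.ReplaceAll(prefix, ".", "-")
	cache.Host = prefix + "." + cache.Host
	// "/c" marks a document fetch and "/s" an https origin.
	cache.Path = strings.TrimSuffix(cache.Path, "/") + "/c/s/" + pub.Host + pub.Path
	return cache.String(), nil
}

// frontRequestSketch builds a GET request whose URL names the front
// domain while the Host header still names the real cache subdomain.
func frontRequestSketch(target, front string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Host = req.URL.Host // Host header: the real cache subdomain
	req.URL.Host = front    // DNS lookup and TLS SNI: the front domain
	return req, nil
}

func main() {
	u, err := cacheURLSketch(
		"https://snowflake-broker.torproject.net/amp/client/0Sh2kIitCf34O/MS4wCg",
		"https://cdn.ampproject.org/")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)

	req, err := frontRequestSketch(u, "www.google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
	fmt.Println("Host:", req.Host)
}
```

The payload MS4wCg in the usage example is a shortened placeholder (the base64 of "1.0\n"), not a full client poll message. No request is actually sent.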

Client -ampcache option

dcf/snowflake@06ca0e86 adds an -ampcache client command-line option and an ampCacheRendezvous type, which at this point functions the same as httpRendezvous.

dcf/snowflake@a2af6fb4 makes -ampcache and ampCacheRendezvous actually use AMP encoding.

Broker /amp/client route

dcf/snowflake@f7505cc2 adds a new ampClientOffers function, which is like the existing clientOffers, but it takes the client poll request from the URL path and writes the client poll response as AMP HTML.

Documentation

dcf/snowflake@010f11c8 adds information about registration methods to the client README: -url/-front for domain fronting; -url/-ampcache/-front for AMP cache.

Discussion

The AMP cache rendezvous happens to coexist nicely with domain fronting rendezvous with respect to command-line options, because it needs the same two pieces of information (-url and -front), in addition to the new -ampcache. But conceptually, other forms of rendezvous (like DNS) may need completely different information. We may end up with a lot of options if we add a new option for every parameter of every registration method—and the program will need to check that no two options of different methods are used at the same time. @twim's branch split out an overarching -codec option (e.g. -codec=post, -codec=amp), but it was still dependent on -url and -front options, and there was no option for controlling the AMP cache URL.

Similarly, NewSnowflakeClient and NewBrokerChannel effectively take all the rendezvous-related command-line options as string arguments. If we add more rendezvous methods, we may want to factor this out into a Config struct or similar. Alternatively, the rendezvous method (which at this point is a private type, rendezvousMethod) could be specified at a higher level and be passed into NewSnowflakeClient or NewBrokerChannel.
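
As a rough illustration of the second alternative, the rendezvous method could be chosen by the caller and handed over inside a configuration struct. Everything in this sketch is hypothetical: RendezvousMethod, ClientConfig, NewClientSketch, and the Exchange signature are assumptions made for illustration, not the API in this merge request.

```go
package main

import (
	"errors"
	"fmt"
)

// RendezvousMethod is a hypothetical exported counterpart of the private
// rendezvousMethod type; the Exchange signature is assumed.
type RendezvousMethod interface {
	Exchange(pollRequest []byte) ([]byte, error)
}

// ClientConfig is a hypothetical replacement for the string arguments.
type ClientConfig struct {
	Rendezvous   RendezvousMethod
	ICEAddresses []string
	Max          int
}

// Client is a placeholder for whatever the constructor would return.
type Client struct{ cfg ClientConfig }

// NewClientSketch validates the configuration once, in one place.
func NewClientSketch(cfg ClientConfig) (*Client, error) {
	if cfg.Rendezvous == nil {
		return nil, errors.New("no rendezvous method configured")
	}
	if cfg.Max < 1 {
		return nil, errors.New("max must be at least 1")
	}
	return &Client{cfg: cfg}, nil
}

// cannedRendezvous stands in for a real method such as an AMP cache or
// domain-fronted exchange; it returns a fixed answer.
type cannedRendezvous struct{ answer string }

func (c cannedRendezvous) Exchange([]byte) ([]byte, error) {
	return []byte(c.answer), nil
}

func main() {
	client, err := NewClientSketch(ClientConfig{
		Rendezvous:   cannedRendezvous{answer: `{"answer":"..."}`},
		ICEAddresses: []string{"stun:stun.stunprotocol.org:3478"},
		Max:          1,
	})
	if err != nil {
		panic(err)
	}
	resp, err := client.cfg.Rendezvous.Exchange([]byte("1.0\n{}"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(resp))
}
```

With this shape, adding a rendezvous method means adding a type rather than another constructor argument.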

The change to NewSnowflakeClient's signature requires a major version upgrade to v2, I believe.

The decision of which rendezvous method to use is made in NewBrokerChannel. The function looks at its broker, ampCache, and front arguments, and decides whether to use httpRendezvous or ampCacheRendezvous.
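
In outline, that decision could be expressed like the following sketch. The function chooseRendezvousSketch and the rendezvousChoice type are hypothetical, and the exact conditions (a required broker URL, an optional front) are assumptions rather than a transcription of NewBrokerChannel.

```go
package main

import (
	"errors"
	"fmt"
)

// rendezvousChoice records which method was selected and whether the
// request would be domain-fronted.
type rendezvousChoice struct {
	Method  string // "httpRendezvous" or "ampCacheRendezvous"
	Fronted bool
}

// chooseRendezvousSketch picks the AMP cache method whenever an AMP
// cache URL is given, and falls back to plain HTTP rendezvous otherwise.
func chooseRendezvousSketch(broker, ampCache, front string) (rendezvousChoice, error) {
	if broker == "" {
		return rendezvousChoice{}, errors.New("a broker URL is required")
	}
	choice := rendezvousChoice{Method: "httpRendezvous", Fronted: front != ""}
	if ampCache != "" {
		choice.Method = "ampCacheRendezvous"
	}
	return choice, nil
}

func main() {
	choice, err := chooseRendezvousSketch(
		"https://snowflake.rinsed-tinsel.site/",
		"https://cdn.ampproject.org/",
		"www.google.com")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", choice)
}
```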

ampClientOffers needs to know its own /amp/client/ route, in order to know where to start looking for information encoded in the URL path. One way to avoid that would be to change the path encoding to only look at the final path component; i.e., everything after the final slash. As it is, the version 0 encoding uses a slash as a delimiter for where the actual data starts, but the encoding begins before that point.
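
The difference between the two approaches can be seen in a small sketch. The helper names afterRoute and finalComponent are hypothetical, and the payload in the example path is a shortened placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// route is the path the broker handler is mounted on.
const route = "/amp/client/"

// afterRoute mirrors the current approach: the handler has to know its
// own route in order to find where the encoded data begins.
func afterRoute(urlPath string) (string, bool) {
	if !strings.HasPrefix(urlPath, route) {
		return "", false
	}
	return strings.TrimPrefix(urlPath, route), true
}

// finalComponent mirrors the alternative: take everything after the
// final slash, whatever the route is.
func finalComponent(urlPath string) string {
	return urlPath[strings.LastIndexByte(urlPath, '/')+1:]
}

func main() {
	urlPath := "/amp/client/0Sh2kIitCf34O/MS4wCg"

	encoded, ok := afterRoute(urlPath)
	fmt.Println(encoded, ok)

	fmt.Println(finalComponent(urlPath))
}
```

With the version 0 layout, the final component holds only the base64 data. The version indicator and padding sit before the last slash, so a final-component scheme would need a changed encoding, for example one that moves the version indicator into the last component.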

Although the -ampcache option lets you specify the URL of any AMP cache, there's only one AMP cache that actually works, which is https://cdn.ampproject.org/. There used to be a Cloudflare one at https://amp.cloudflare.com/, but it doesn't work anymore. Bing's AMP cache doesn't fetch origin pages ("publisher pages") on demand.

Edited by David Fifield
