GoAgent asks users to install their own copy of the software on Google App Engine and use that copy as their personal proxy. That is, App Engine has a service where you can run your webapp on Google's hardware, for free, subject to some bandwidth quotas. GoAgent is set up to be easily uploadable to App Engine. The idea is that you create your own free App Engine account and upload your own copy of GoAgent. GoAgent is the most widely used circumvention tool in China, according to OpenITP's April 2013 "Collateral Freedom" report (see particularly section 2).
This short video in English shows how to create a Google Account, use it to create an App Engine instance, upload the GoAgent software, and configure your local browser.
- https://www.youtube.com/watch?v=qqlmX0ws-1s "How to set up your own free Proxy / VPN with GoAgent (How to use Facebook in China)"
- http://mrjetlee.com/facebook-from-china/ Accompanying web page
It may be that there are other deployment methods besides App Engine. I have seen some references to PHP and something called PaaS.
The main trick behind GoAgent is in how it communicates with App Engine, despite appspot.com being blocked. It makes an HTTPS connection not to appspot.com, but to www.google.com or one of a handful of other unblocked Google domains. The TLS client hello does not contain any SNI that might be pattern-matched by a censor. The HTTP request inside the TLS layer has a header "Host: myapp.appspot.com", which causes the Google frontend server to send the traffic to App Engine, even though from outward appearances it is addressed to the search page. This is the same trick used by flashproxy-reg-appspot.
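As a rough illustration of the trick described above (not GoAgent's actual code), the following Python sketch connects to a front domain without sending SNI and names the App Engine app only in the Host header inside the TLS tunnel. The function names build_fronted_request and fetch_via_front are made up for this example, and "myapp.appspot.com" is the placeholder appid from the text. Whether Google's frontends still route such requests is not something this sketch assumes or checks.

```python
import socket
import ssl


def build_fronted_request(app_host, path="/"):
    """Return a raw HTTP/1.1 GET whose Host header names the App Engine app."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(app_host),  # visible only inside the TLS tunnel
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_via_front(front_host, app_host, path="/", timeout=10):
    """Connect to front_host over TLS without SNI and request app_host."""
    ctx = ssl.create_default_context()
    # Leaving server_hostname unset means no SNI extension is sent. Python
    # requires check_hostname to be off in that case; the certificate chain
    # is still verified, but the name on the certificate is not checked.
    ctx.check_hostname = False
    with socket.create_connection((front_host, 443), timeout=timeout) as raw:
        with ctx.wrap_socket(raw) as tls:
            tls.sendall(build_fronted_request(app_host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

A call would look like fetch_via_front("www.google.com", "myapp.appspot.com"). To an observer the connection is addressed to www.google.com; the appspot.com name appears only after encryption.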
git clone https://github.com/goagent/goagent.git
Notes on the client source code
GoAgent appears to be an HTTP proxy only (not SOCKS). For HTTPS, it appears to do local MITM that replaces the site's cert with one of its own.
do_CONNECT "handle CONNECT cmmand, socket forward or deploy a fake cert"
Here's where they switch "www.google.com" for "*.appspot.com":
TLS characteristics. They are using Python to make their TLS connections.
- TLS cipher list
- "set_ciphers as Modern Browsers"
- randomly discard about half of ciphers(!) when ssl_obfuscate is true.
- One blacklist, don't know what that's about.
- Another DNS blacklist, might be the same.
- from the source code, it seems these are considered "bad IPs", even though they belong to innocent-looking blocks (e.g. Google). My guess is that GFW DNS poisoning sometimes returns real-looking addresses that are actually fake, e.g. from ZH-wikipedia's page on DNS poisoning, translated by Google: "A particular example is the comparison Google+ plus.google.com domain is contaminated to Google's own servers 22.214.171.124,126.96.36.199, 188.8.131.52 there on 184.108.40.206 ip address in the form of blockade blockade".
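The random cipher discarding noted in the list above could be sketched as follows. This is a guess at the general idea, not a copy of GoAgent's code: the function name obfuscate_ciphers and the example cipher string are assumptions, and the fallback that keeps at least one cipher is an addition of this sketch.

```python
import random


def obfuscate_ciphers(cipher_string, rng=random):
    """Keep each cipher in an OpenSSL-style list with probability 1/2."""
    ciphers = cipher_string.split(":")
    kept = [c for c in ciphers if rng.random() < 0.5]
    if not kept:
        # Never return an empty list; an empty cipher string is unusable.
        kept = [rng.choice(ciphers)]
    return ":".join(kept)


# Example cipher string (illustrative only, not GoAgent's list).
EXAMPLE_CIPHERS = (
    "ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:"
    "AES128-SHA:AES256-SHA"
)
```

The result could be passed to ssl.SSLContext.set_ciphers so that each connection offers a different subset, which makes the client hello harder to match against a fixed fingerprint, at the cost of sometimes dropping the ciphers the server prefers.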
The App Engine URL Fetch service embeds the appid in the User-Agent and X-Appengine-Inbound-Appid headers. Since GoAgent users typically upload their own copy of the server code to their own appid, a target web site can track an individual user over time. The User-Agent is commonly logged, so such tracking doesn't even require a special web server configuration. The suffix appended to the User-Agent string looks like: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~_appid_)".
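To show how little effort this tracking takes, here is a small Python sketch that pulls the appid out of logged User-Agent strings and counts requests per appid. The regular expression follows the suffix format quoted above; the function names and the sample appid "s~example-app" are invented for the example.

```python
import re
from collections import Counter

# Matches the suffix that URL Fetch appends to the User-Agent.
APPID_RE = re.compile(
    r"AppEngine-Google; \(\+http://code\.google\.com/appengine; "
    r"appid: ([^)\s]+)\)"
)


def extract_appid(user_agent):
    """Return the appid embedded in a User-Agent string, or None."""
    match = APPID_RE.search(user_agent)
    return match.group(1) if match else None


def count_requests_by_appid(user_agents):
    """Count requests per appid, ignoring User-Agents without one."""
    counts = Counter()
    for ua in user_agents:
        appid = extract_appid(ua)
        if appid is not None:
            counts[appid] += 1
    return counts
```

Because each GoAgent user normally has a unique appid, each key in the resulting counter corresponds to one user.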
Ideas for a similar pluggable transport
Maybe we could make a transport that is compatible with the actual GoAgent protocol, so that anyone with their own existing deployment can continue to use it, and perhaps we can operate an instance open to the public just by deploying the existing GoAgent code. On the other hand, we could merely take inspiration from GoAgent and run our own server with its own protocol.
The idea of reflecting HTTP requests through a third-party web server/client is similar to the OSS design. The situation with App Engine is better than what is assumed in the paper: Because we control the web app (we are the OSS), we can return downstream data in response bodies, and not have to rely on being able to make HTTP requests back to the censored client.
We would want a general TCP proxy, not only something that shows you web pages. It will be necessary to encode the stream data into HTTP requests and responses. Transporting data is easy through HTTP request and response bodies. What is also needed is a way to say that a later HTTP request belongs to the same logical stream as an earlier one. The OSS paper did this by associating with each request a stream ID, which is generated randomly for the first message in each new stream. It may also be necessary to have some kind of reliability layer, for when a request to or from App Engine is lost. The OSS paper used seq and ack numbers for this.
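A minimal sketch of the stream ID and seq/ack idea, in Python. Everything here is an assumption made for illustration: the header names X-Stream-Id, X-Seq and X-Ack are hypothetical, and sequence numbers count whole segments rather than bytes, which may differ from what the OSS paper does. The receiver delivers payloads in order and drops duplicates, which is the part of a reliability layer needed when a request is lost and retried.

```python
import os
from dataclasses import dataclass, field


def new_stream_id():
    """Generate a random stream ID for the first message of a new stream."""
    return os.urandom(8).hex()


@dataclass
class Segment:
    stream_id: str
    seq: int        # index of this segment within its stream
    ack: int        # next seq the sender expects from its peer
    payload: bytes  # carried in the HTTP request or response body

    def to_headers(self):
        """Hypothetical HTTP headers carrying the stream metadata."""
        return {
            "X-Stream-Id": self.stream_id,
            "X-Seq": str(self.seq),
            "X-Ack": str(self.ack),
        }


@dataclass
class Reassembler:
    """Per-stream receive state: reorders segments and drops duplicates."""
    next_seq: int = 0
    pending: dict = field(default_factory=dict)

    def receive(self, seg):
        """Accept a segment; return the bytes now deliverable in order."""
        if seg.seq >= self.next_seq:
            # Keep the first copy of any not-yet-delivered segment.
            self.pending.setdefault(seg.seq, seg.payload)
        out = []
        while self.next_seq in self.pending:
            out.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return b"".join(out)
```

A server would keep one Reassembler per stream ID and report its next_seq as the ack value in each response, so the client knows which segments to resend.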
The GoAgent website and GitHub source were deleted around August 24, 2015: https://en.greatfire.org/blog/2015/aug/chinese-developers-forced-delete-softwares-police