Raw import from Trac using Trac markup language. Authored by Alexander Hansen Færøy.
Notes about [https://code.google.com/p/goagent/ GoAgent] ([https://github.com/goagent/goagent GitHub]) ([https://en.wikipedia.org/wiki/GoAgent Wikipedia en]) ([https://zh.wikipedia.org/wiki/GoAgent Wikipedia zh]). Some of what's written here is speculative or false because it's hard to find good English documentation.
!GoAgent asks users to install their own copy of the software on [https://developers.google.com/appengine/ Google App Engine] and use that copy as their personal proxy. That is, App Engine has a service where you can run your webapp on Google's hardware, for free, subject to some bandwidth quotas. !GoAgent is set up to be easily uploadable to App Engine. The idea is that you create your own free App Engine account and upload your own copy of !GoAgent. !GoAgent is the most widely used circumvention tool in China, according to [https://openitp.org/pdfs/CollateralFreedom.pdf OpenITP's April 2013 "Collateral Freedom" report] (see particularly section 2).
This short English-language video shows how to create a Google Account, use it to create an App Engine instance, upload the !GoAgent software, and configure your local browser.
* https://www.youtube.com/watch?v=qqlmX0ws-1s "How to set up your own free Proxy / VPN with GoAgent (How to use Facebook in China)"
* http://mrjetlee.com/facebook-from-china/ Accompanying web page
It may be that there are other deployment methods besides App Engine. I have seen some references to [https://github.com/goagent/goagent/blob/3.0/server/php/index.php PHP] and something called PaaS.
The main trick behind !GoAgent is in how it communicates with App Engine, despite appspot.com being blocked. It makes an HTTPS connection not to appspot.com, but to www.google.com or one of a handful of other unblocked Google domains. The TLS client hello does not contain any SNI that might be pattern-matched by a censor. The HTTP request inside the TLS layer has a header `Host: myapp.appspot.com`, which causes the Google frontend server to send the traffic to App Engine, even though from outward appearances it is addressed to the search page. This is the same trick used by [https://gitweb.torproject.org/flashproxy.git/tree/facilitator/doc/appspot-howto.txt flashproxy-reg-appspot].
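The trick described above can be sketched in Python roughly as follows. This is a minimal illustration, not !GoAgent's actual code: the function names `build_fronted_request` and `fetch_via_front` are made up for this example, and the sketch turns off hostname checking only because Python refuses to omit SNI otherwise (the certificate chain is still verified).

```python
import socket
import ssl


def build_fronted_request(app_host, path="/"):
    """Return the bytes of an HTTP/1.1 GET whose Host header names the App Engine app."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(app_host),  # the only place the real destination appears
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_via_front(front_host, app_host, path="/", timeout=10.0):
    """Connect to front_host over TLS without SNI, then ask for app_host inside the tunnel."""
    context = ssl.create_default_context()
    # Leaving out server_hostname below means no SNI is sent. Python only
    # allows that when hostname checking is disabled (an assumption of this
    # sketch; a real client should check the certificate name itself).
    context.check_hostname = False
    with socket.create_connection((front_host, 443), timeout=timeout) as raw:
        with context.wrap_socket(raw) as tls:
            tls.sendall(build_fronted_request(app_host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

A caller would use something like `fetch_via_front("www.google.com", "myapp.appspot.com")`. An observer on the network path sees only a TLS connection to the front host; the `Host` header travels inside the encrypted stream.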
== Source code ==
The only source code at the [https://code.google.com/p/goagent/ main Google Code site] is a zip download. The [https://code.google.com/p/goagent/source/browse/ source code browser] there contains only a README file. It looks like the source code is actually developed at GitHub:
* https://github.com/goagent/goagent
* `git clone https://github.com/goagent/goagent.git`
The local proxy, the software you run on your desktop, is in the [https://github.com/goagent/goagent/tree/3.0/local local] directory, and the main program is [https://github.com/goagent/goagent/blob/3.0/local/proxy.py proxy.py].
The Internet proxy server, which runs on App Engine, is in the [https://github.com/goagent/goagent/tree/3.0/server/python server/python] directory, specifically [https://github.com/goagent/goagent/blob/3.0/server/python/wsgi.py wsgi.py].
== Notes on the client source code ==
!GoAgent appears to be an HTTP proxy only (not SOCKS). For HTTPS, it appears to do local MITM that replaces the site's cert with one of its own.
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L2052 do_CONNECT] "handle CONNECT cmmand, socket forward or deploy a fake cert"
* It seems that [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L2062 if the destination is in common.GOOGLE_SITES], then it calls `do_CONNECT_FWD` to make the request via App Engine.
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L2065 Otherwise] it does the MITM with `do_CONNECT_AGENT`.
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L6 "based on mitmproxy"]
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L269 import_ca] adds the !GoAgent cert to the local trusted certs(!). See [https://www.bamsoftware.com/sec/goagent-advisory.html#en-cacert security advisory].
Here's where they switch "www.google.com" for "*.appspot.com":
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L1004 pick up the certificate]
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L1024 special case for appspot.com in validation]
TLS characteristics. They are using Python to make their TLS connections.
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L802 TLS cipher list]
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L844 "set_ciphers as Modern Browsers"]
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L863 randomly discard about half of ciphers](!) when ssl_obfuscate is true.
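To illustrate what randomly discarding about half of the ciphers could look like, here is a small sketch. It is a guess at the general idea, not a copy of the !GoAgent code; the function name `obfuscate_cipher_list` and the example cipher names are assumptions chosen for this example.

```python
import random


def obfuscate_cipher_list(ciphers, rng=random):
    """Keep each cipher with probability 1/2, preserving order.

    Always keeps at least one cipher so the result stays usable.
    Returns an OpenSSL-style colon-separated cipher string.
    """
    kept = [c for c in ciphers if rng.random() < 0.5]
    if not kept:
        kept = [rng.choice(ciphers)]
    return ":".join(kept)


EXAMPLE_CIPHERS = [
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "AES128-SHA",
    "AES256-SHA",
]

cipher_string = obfuscate_cipher_list(EXAMPLE_CIPHERS)
```

The resulting string is in the format accepted by `ssl.SSLContext.set_ciphers`. Each connection then offers a different cipher list, which makes the client hello harder to match against one fixed fingerprint, though a randomly varying list could itself be unusual.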
DNS blacklists:
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L664 One blacklist], don't know what that's about.
* [https://github.com/goagent/goagent/blob/24cff86308d0fa8eb913ea46fb1f3d0aa5ea4be6/local/proxy.py#L2348 Another DNS blacklist], might be the same.
 * From the source code, it seems these are considered "bad IPs", even though they belong to innocent-looking blocks (e.g. Google's). My guess is that GFW DNS poisoning sometimes returns real-looking addresses that are actually fake. For example, from [http://zh.wikipedia.org/wiki/%E5%9F%9F%E5%90%8D%E6%9C%8D%E5%8A%A1%E5%99%A8%E7%BC%93%E5%AD%98%E6%B1%A1%E6%9F%93 the Chinese Wikipedia page on DNS poisoning], translated by Google: "A particular example is the comparison Google+ plus.google.com domain is contaminated to Google's own servers 74.125.127.102,74.125.155.102, 74.125.39.113 there on 209.85.229.138 ip address in the form of blockade blockade".
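If the guess above is right, the blacklists would be applied along these lines. This is only a sketch under that assumption: the function name `filter_poisoned` is hypothetical, and the addresses are documentation placeholders (192.0.2.0/24), not entries from the real lists.

```python
import ipaddress


def filter_poisoned(addresses, blacklist):
    """Drop resolved addresses that appear in a list of known-poisoned answers."""
    bad = {ipaddress.ip_address(a) for a in blacklist}
    return [a for a in addresses if ipaddress.ip_address(a) not in bad]


# Placeholder data: pretend the resolver returned three answers and one is
# a known forged answer.
resolved = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]
known_bad = ["192.0.2.20"]
usable = filter_poisoned(resolved, known_bad)
```

A client that discards blacklisted answers can fall back to the remaining addresses or to another resolver instead of connecting to a forged address.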
The App Engine URL Fetch service [https://developers.google.com/appengine/docs/python/urlfetch/#headers_identifying_request_source embeds the appid] in the User-Agent and X-Appengine-Inbound-Appid headers. Since !GoAgent users typically upload their own copy of the server code to their own appid, users can be tracked by the target web site over time. The User-Agent is commonly logged, so such tracking doesn't even require a special web server configuration. The suffix appended to the User-Agent string looks like: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~''appid'')".
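To show how little effort this kind of tracking takes, the following sketch pulls the appid out of logged User-Agent strings and counts requests per appid. The regular expression is written against the suffix format quoted above; the function name `extract_appid` and the sample log entries are made up for this example.

```python
import re
from collections import Counter

# Matches the suffix that App Engine's URL Fetch service appends.
APPID_RE = re.compile(r"AppEngine-Google;.*?appid:\s*([^\s;)]+)")


def extract_appid(user_agent):
    """Return the appid embedded in a User-Agent string, or None if absent."""
    match = APPID_RE.search(user_agent)
    return match.group(1) if match else None


# Hypothetical User-Agent values as they might appear in a web server log.
logged_agents = [
    "Mozilla/5.0 AppEngine-Google; (+http://code.google.com/appengine; appid: s~example-app)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 AppEngine-Google; (+http://code.google.com/appengine; appid: s~example-app)",
]

visits_per_appid = Counter(
    appid for appid in map(extract_appid, logged_agents) if appid is not None
)
```

Because each user normally has a distinct appid, the counter keys act as persistent per-user identifiers across visits.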
== Ideas for a similar pluggable transport ==
See [[meek]] for development along these lines.
Maybe we could make a transport that is compatible with the actual !GoAgent protocol, so that anyone with their own existing deployment can continue to use it, and perhaps we can operate an instance open to the public just by deploying the existing !GoAgent code. On the other hand, we could merely take inspiration from !GoAgent and run our own server with its own protocol.
The idea of reflecting HTTP requests through a third-party web server/client is similar to the [https://www.bamsoftware.com/papers/oss.pdf OSS] design. The situation with App Engine is better than what is assumed in the paper: Because we control the web app (we ''are'' the OSS), we can return downstream data in response bodies, and not have to rely on being able to make HTTP requests back to the censored client.
We would want a general TCP proxy, not only something that shows you web pages. It will be necessary to encode the stream data into HTTP requests and responses. Transporting data is easy through HTTP request and response bodies. What is also needed is a way to say that a later HTTP request belongs to the same logical stream as an earlier one. The OSS paper did this by associating with each request a stream ID, which is generated randomly for the first message in each new stream. It may also be necessary to have some kind of reliability layer, for when a request to or from App Engine is lost. The OSS paper used seq and ack numbers for this.
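A rough sketch of the stream ID and sequence number idea follows. It is only meant to make the bookkeeping concrete, not to propose a wire format: the header names `X-Stream-Id` and `X-Seq`, the class names, and the choice of byte-offset sequence numbers are all assumptions made for this example.

```python
import os


class Sender:
    """Client side: tags each chunk with a random stream ID and a byte offset."""

    def __init__(self):
        self.stream_id = os.urandom(8).hex()  # generated once per logical stream
        self.next_seq = 0

    def make_request(self, payload):
        headers = {"X-Stream-Id": self.stream_id, "X-Seq": str(self.next_seq)}
        self.next_seq += len(payload)
        return headers, payload


class Receiver:
    """Server side: reassembles chunks in order and reports the ack offset."""

    def __init__(self):
        self.streams = {}

    def handle(self, headers, body):
        state = self.streams.setdefault(
            headers["X-Stream-Id"], {"ack": 0, "pending": {}, "data": bytearray()}
        )
        seq = int(headers["X-Seq"])
        if seq >= state["ack"]:  # ignore chunks that were already delivered
            state["pending"][seq] = body
        # Deliver every chunk that is now contiguous with what we have.
        while state["ack"] in state["pending"]:
            chunk = state["pending"].pop(state["ack"])
            state["data"] += chunk
            state["ack"] += len(chunk)
        return state["ack"]  # would be sent back in the HTTP response
```

The value returned by `handle` plays the role of the ack number: the sender can retransmit anything at or beyond that offset if a request to or from App Engine is lost. The sketch does not handle partially overlapping chunks or stream expiry.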
== Project deleted ==
The website and GitHub source were deleted around August 24, 2015.
https://en.greatfire.org/blog/2015/aug/chinese-developers-forced-delete-softwares-police
== Similar projects ==
* [https://code.google.com/p/g-proxy/ g-proxy] ([https://g-proxy.appspot.com/ demo page])
* [https://github.com/madeye/gaeproxy gaeproxy] for Android
* !GoAgent says it was forked from [https://code.google.com/p/gappproxy/ GAppProxy]/[https://code.google.com/p/wallproxy/ wallproxy].
* Our own [https://gitweb.torproject.org/flashproxy.git/tree/facilitator/doc/appspot-howto.txt flashproxy-reg-appspot] uses the `Host` trick to talk to App Engine while appearing to talk to www.google.com.