Help BridgeDB see client IP addresses of moat requests
BridgeDB sees the following HTTP headers for incoming moat requests:
Content-Length: ['504']
Via: ['1.1 bridges.torproject.org']
X-Forwarded-Host: ['bridges.torproject.org']
X-Forwarded-For: ['127.0.0.1']
Connection: ['Keep-Alive']
Host: ['127.0.0.1:3881']
X-Forwarded-Server: ['bridges.torproject.org']
Content-Type: ['application/vnd.api+json']
As part of our BridgeDB metrics (#9316), it would be useful to see the client IP address of incoming requests. As the headers above show, X-Forwarded-For currently contains only 127.0.0.1.
Here's an IRC conversation in which dcf explains the big picture and proposes a way to fix this issue:
dcf1│ phw: there are 2 layers of HTTP in Moat -- one is the CDN-traversal (meek) layer, and one is the end-to-end tunnelled HTTP to the web
server itself.
dcf1│ The CDN will set XFF on the outer meek layer, but not on the inner layer which is what BridgeDB sees (and it couldn't touch the inner
layer because it's HTTPS, which is the whole reason for the complicated proxypass setup).
dcf1│ phw: in other words, XFF is being set on the connection to port 2000, but then that layer is stripped off (only meek-server sees it)
and the tunneled contents go to port 3881.
dcf1│ Even though BridgeDB is an HTTP service, we go to the trouble of tunnelling a whole independent HTTP+TLS stream *inside* the
domain-fronted layer, just to prevent the CDN from tampering with end-to-end traffic.
dcf1│ That's why there are 2 proxypass: the first terminates the meek CDN layer, and the second terminates the tunnelled actual BridgeDB
exchange.
dcf1│ One way to solve this would be yet another shim that understands ExtORPort, parses out the USERADDR, and inserts it into XFF into an
HTTP header. As if the pipeline weren't long enough...
phw│ dcf1: gotcha. i'll create a ticket for this. do you mind if i quote your above explanation?
dcf1│ go for it
phw│ is meek-server exposed to USERADDR? i don't think i follow
dcf1│ meek-server can provide USERADDR to whatever port it forwards to. Currently I'm sure it's set up not to do that, to just forward the
contents without any prefix.
dcf1│ meek-server looks at Meek-IP, X-Forwarded-For, or request source IP address, and uses those to construct a USERADDR, which normally it
would provide to tor on tor's ExtORPort.
dcf1│ The purpose of ExtORPort, as opposed to ordinary ORPort, is to allow passing extra metadata like that, before the actual stream data.
dcf1│ So currently meek-server in Moat is set up to treat the local HTTPS server as its ORPort (not ExtORPort, because Apache wouldn't
understand that).
phw│ dcf1: ah, i understand. thanks for elaborating
dcf1│ Just thought of a research idea: see if any bridges are wrongly exposing their ExtORPort, which would conceivably permit manipulating
statistics.
phw│ we should call this new shim rube-goldberg-machine ;)
dcf1│ Take a subset of bridges, then port-scan them to see if any other port understands ExtORPort-prefixed Tor connections.
dcf1│ Moat is a concrete case where pluggable transports fall short of what you might want them to do. We're trying to "plug" meek-server
into another pipeline, but it's difficult because it's talking to something other than Tor. It's possible, but it quickly turns into a
big pile of shims and adapters.
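To make the proposed shim a bit more concrete, here is a minimal sketch of the parsing half: reading the ExtORPort command stream that meek-server would send, pulling out USERADDR, and turning it into an X-Forwarded-For value. The command framing (2-byte command, 2-byte length, body) and the command codes are taken from my reading of the Extended ORPort spec. The function names are made up for illustration. The sketch deliberately skips the ExtORPort authentication handshake and the reply the shim would have to send after DONE, and it does not address where the header gets injected, since the inner stream is HTTPS.

```python
import io
import struct

# Command codes as I understand them from the Extended ORPort spec.
EXT_OR_CMD_DONE = 0x0000
EXT_OR_CMD_USERADDR = 0x0001
EXT_OR_CMD_TRANSPORT = 0x0002


def read_exact(stream, n):
    """Read exactly n bytes from a buffered binary stream or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside an ExtORPort command")
    return data


def read_ext_or_commands(stream):
    """Consume commands up to and including DONE.

    Returns (useraddr, transport); either may be None if not sent.
    Whatever follows DONE in the stream is the tunnelled payload.
    """
    useraddr = transport = None
    while True:
        # Each command is: 2-byte command, 2-byte body length, body.
        cmd, length = struct.unpack("!HH", read_exact(stream, 4))
        body = read_exact(stream, length)
        if cmd == EXT_OR_CMD_DONE:
            return useraddr, transport
        if cmd == EXT_OR_CMD_USERADDR:
            useraddr = body.decode("ascii")
        elif cmd == EXT_OR_CMD_TRANSPORT:
            transport = body.decode("ascii")
        # Unknown commands are skipped in this sketch.


def client_ip(useraddr):
    """Strip the port from 'a.b.c.d:port' or '[v6addr]:port'."""
    host, _, _port = useraddr.rpartition(":")
    return host.strip("[]")


def xff_header(useraddr):
    """Build the header line the shim would hand to the web server."""
    return "X-Forwarded-For: " + client_ip(useraddr)
```

With a socket, `stream` could be the object returned by `sock.makefile("rb")`. A real shim would also have to authenticate meek-server, acknowledge the commands, and sit at a point in the pipeline where it can actually add a header to the inner request.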