Non-vanilla bridges currently have no way to automatically test their reachability. Vanilla bridges self-test the reachability of their ORPort by creating a circuit that includes themselves, but we cannot do this for, say, obfs4. This is a problem in practice because obfs4 operators have no way of knowing that their bridge is unreachable, for example because it sits behind NAT. In fact, BridgeDB is distributing obfs4 bridges that aren't actually reachable.
We need to build a mechanism that allows non-vanilla bridges to test their reachability. Ideally, something would create a circuit over the bridge while speaking its respective transport protocol, but even a simple TCP- or UDP-based reachability test would already go a long way.
Looking at the discussion over in #30331, tor seems to be the right component to trigger the reachability test. In its log files, it can then yell at the operator if the test failed. The question is: how should we design the mechanism that implements the reachability test?
One solution would be a simple HTTP API that takes as input an address, a port, a transport type, and optional parameters, and then tells you if the given bridge is reachable. For example, a request to https://pt-reachable.torproject.org/obfs4/1.2.3.4/9002 may return something along the lines of obfs4_reachable: true. Ideally, if the reachability test fails, we should provide details to help the operator figure out what went wrong.
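For illustration, here is a rough Go sketch of how such a service might pull the transport, address, and port out of a path shaped like the example URL. The names probeRequest and parseProbePath are made up for this sketch, and the /<transport>/<ip>/<port> layout is only inferred from the example above; nothing here is an agreed-upon API.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// probeRequest holds the fields encoded in a path such as /obfs4/1.2.3.4/9002.
// (Hypothetical type, for illustration only.)
type probeRequest struct {
	Transport string // e.g. "obfs4"
	Target    string // "host:port", ready to pass to net.Dial
}

// parseProbePath splits "/<transport>/<ip>/<port>" and validates the IP
// address and the port range.
func parseProbePath(path string) (probeRequest, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) != 3 {
		return probeRequest{}, fmt.Errorf("expected /<transport>/<ip>/<port>, got %q", path)
	}
	transport, host, portStr := parts[0], parts[1], parts[2]
	if net.ParseIP(host) == nil {
		return probeRequest{}, fmt.Errorf("invalid IP address %q", host)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return probeRequest{}, fmt.Errorf("invalid port %q", portStr)
	}
	target := net.JoinHostPort(host, strconv.Itoa(port))
	return probeRequest{Transport: transport, Target: target}, nil
}

func main() {
	req, err := parseProbePath("/obfs4/1.2.3.4/9002")
	fmt.Println(req.Transport, req.Target, err) // prints: obfs4 1.2.3.4:9002 <nil>
}
```

Validating the address before dialing also gives the service a natural place to return a helpful error message to the operator.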
We had a short discussion on IRC and concluded the following:
We don't want another central service that collects all the data.
A bridge can self-test by having its tor client establish a TCP connection with its obfs4 port (see #30477 (moved)). Tor can then warn the operator in its log file if the test fails. Unfortunately, this won't help if we ever deploy a PT that speaks UDP.
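To make the TCP-level check concrete, here is a minimal sketch in Go. It is only an illustration: tor itself is written in C, and #30477 may end up doing this differently. The function names and the 3-second timeout are assumptions, and the local listener merely stands in for a bridge's obfs4 port.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialTimeout bounds how long a single probe may take (assumed value).
const dialTimeout = 3 * time.Second

// checkReachable returns nil if a TCP handshake with addr ("host:port")
// completes within dialTimeout, and the dial error otherwise.
func checkReachable(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// startLocalListener opens a TCP listener on an ephemeral loopback port and
// returns its address plus a function that closes it. It stands in for the
// bridge's obfs4 port in this demo.
func startLocalListener() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := startLocalListener()
	if err != nil {
		panic(err)
	}
	fmt.Println("while listening, reachable:", checkReachable(addr) == nil)
	stop()
	fmt.Println("after closing, reachable:", checkReachable(addr) == nil)
}
```

Note that this only proves that something accepts TCP connections on the port. It says nothing about whether the peer actually speaks obfs4.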
Some operators will ignore their log files, though, so we will still be collecting unreachable obfs4 bridges. BridgeDB should therefore learn how to test all of its bridges by speaking their respective transport protocol. It should not hand out bridges that are unreachable or otherwise broken.
We were left wondering what obfs4 operators should do in the short term, before #30477 (moved) is done, to figure out if their bridge is reachable. One way forward would be a simple web page, hosted by us, that asks for an IP address and a port as input. The service then tries to establish a TCP connection to the given tuple and lets the user know if it succeeded or failed. The service doesn't need to log or remember anything, and we can run it on polyanthum, the host that also runs BridgeDB.
I was wary at first of a little web app to help with testing, because it's yet another place to go break into or watch; but I think so long as we know it is a short term fix until the proper reachability testing gets into the version of Tor that people have, the usability boost makes it an acceptable risk.
I expect we have quite a few obfs4 bridges right now that are firewalled off -- and if we do a campaign to get more people to run obfs4 bridges, without a good easy intuitive step for "then check if it works" we'll have even more.
I set up a demo at https://nymity.ch:8081. After entering your bridge's IP address and port, the service tells you if the port is reachable or not. If the port is unreachable, the service tells you the error message it got. The tool also has a simple rate limiter that limits requests to an average of one per second, with bursts of up to five per second.
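For readers curious what "an average of one per second, with bursts of up to five" means in practice, here is a small token-bucket sketch in Go using only the standard library. This is not necessarily how the demo implements its limiter; the tokenBucket type and the simulate helper are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket allows `rate` requests per second on average, with bursts of
// up to `burst` requests. (Illustrative type, not taken from the demo.)
type tokenBucket struct {
	mu     sync.Mutex
	rate   float64   // tokens added per second
	burst  float64   // bucket capacity
	tokens float64   // tokens currently available
	last   time.Time // time of the last refill
}

// newTokenBucket returns a bucket that starts out full.
func newTokenBucket(rate, burst float64, now time.Time) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// allow refills the bucket for the time elapsed since the last call and
// reports whether a request arriving at `now` may proceed.
func (b *tokenBucket) allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate feeds a fresh bucket requests arriving offsetsMs milliseconds
// after a common start time and returns the decision for each request.
func simulate(rate, burst float64, offsetsMs ...int) []bool {
	start := time.Now()
	b := newTokenBucket(rate, burst, start)
	decisions := make([]bool, 0, len(offsetsMs))
	for _, ms := range offsetsMs {
		at := start.Add(time.Duration(ms) * time.Millisecond)
		decisions = append(decisions, b.allow(at))
	}
	return decisions
}

func main() {
	// Six requests at once, then one more a second later.
	fmt.Println(simulate(1, 5, 0, 0, 0, 0, 0, 0, 1000))
	// prints: [true true true true true false true]
}
```

The first five requests drain the bucket, the sixth is rejected, and after one second a single token has been refilled.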
Awesome! It worked for me :)
I just have a few minor comments:
A nicer way to express the timeout here would be timeout := 3 * time.Second, but even better would be to set a commented constant at the beginning of the file.
In main() you could have the certificate and key files passed in as specific arguments such as --cert or --key as the broker does here. The advantage of this is making sure they are passed in the correct order (which should be documented outside of the usage function).
Do you also want timestamps in your logs?
As a more general note, is this meant to be used in an automated way for bridge operators to log and report to themselves when their port isn't reachable? Or as an occasional manual check? I know this is something temporary so maybe not a large consideration, but returning something other than a 200 OK if the port is unreachable would make it easier to write a client-side Go program that performs this check automatically.
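To make these suggestions concrete, here is a rough Go sketch that combines a commented timeout constant, named --cert and --key flags, and a non-200 status for unreachable ports. It is not the tool's actual code: the function names, the address and port query parameters, and the choice of 502 Bad Gateway are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"net/http"
	"time"
)

// dialTimeout is how long we wait for the TCP handshake before giving up.
const dialTimeout = 3 * time.Second

// parseArgs reads the certificate and key paths from named flags, so their
// order on the command line no longer matters.
func parseArgs(args []string) (cert, key string, err error) {
	fs := flag.NewFlagSet("scan", flag.ContinueOnError)
	fs.StringVar(&cert, "cert", "", "path to the TLS certificate file")
	fs.StringVar(&key, "key", "", "path to the TLS private key file")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	if cert == "" || key == "" {
		return "", "", fmt.Errorf("both --cert and --key are required")
	}
	return cert, key, nil
}

// probeStatus maps the probe result to an HTTP status: 200 if the port
// accepted a TCP connection, 502 otherwise (502 is an arbitrary choice).
func probeStatus(addr string) (int, error) {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return http.StatusBadGateway, err
	}
	conn.Close()
	return http.StatusOK, nil
}

// scanHandler expects ?address=<ip>&port=<port> (hypothetical parameter
// names) and reports the probe result in both the status code and the body.
func scanHandler(w http.ResponseWriter, r *http.Request) {
	addr := net.JoinHostPort(r.FormValue("address"), r.FormValue("port"))
	status, err := probeStatus(addr)
	w.WriteHeader(status)
	if err != nil {
		fmt.Fprintf(w, "unreachable: %v\n", err)
		return
	}
	fmt.Fprintln(w, "reachable")
}

func main() {
	// The flag order is deliberately swapped to show that it doesn't matter.
	cert, key, err := parseArgs([]string{"--key", "key.pem", "--cert", "cert.pem"})
	fmt.Println(cert, key, err)

	// Port 1 on localhost is almost certainly closed.
	status, err := probeStatus("127.0.0.1:1")
	fmt.Println(status, err)
}
```

In a real server, main would pass os.Args[1:] to parseArgs, register scanHandler with http.HandleFunc, and hand the certificate and key to http.ListenAndServeTLS. A client can then decide based on the status code alone, without parsing the response body.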
A nicer way to express the timeout here would be timeout := 3 * time.Second, but even better would be to set a commented constant at the beginning of the file.
In main() you could have the certificate and key files passed in as specific arguments such as --cert or --key as the broker does here. The advantage of this is making sure they are passed in the correct order (which should be documented outside of the usage function).
Also fixed in the same branch.
Do you also want timestamps in your logs?
I would like to keep timestamps because they tell us how much (ab)use the service is seeing. Do you see any issues with timestamps?
On a related note: I noticed that the http package can log error messages that include the client's IP address. I included snowflake's safe logger to prevent this from happening.
As a more general note, is this meant to be used in an automated way for bridge operators to log and report to themselves when their port isn't reachable? Or as an occasional manual check? I know this is something temporary so maybe not a large consideration, but returning something other than a 200 OK if the port is unreachable would make it easier to write a client-side Go program that performs this check automatically.
At this point it's meant for occasional manual checks. I plan to add a link to the service to our obfs4proxy installation guide. I originally intended this service to be used as an API (see the bottom paragraph of the ticket's description) but it's not clear if we want yet another service that deals with bridge data. The better way forward may be to improve BridgeDB.
A nicer way to express the timeout here would be timeout := 3 * time.Second, but even better would be to set a commented constant at the beginning of the file.
I would like to keep timestamps because they tell us how much (ab)use the service is seeing. Do you see any issues with timestamps?
As long as you're not logging IP addresses, this seems fine to me. You're also not exporting the data, so it's mostly a consideration in case the machine or service is compromised. I don't see issues with an attacker getting hold of the number of requests made and the times at which they are made. There are probably easier ways to find out whatever information they would hope to find out from these logs anyway.
On a related note: I noticed that the http package can log error messages that include the client's IP address. I included snowflake's safe logger to prevent this from happening.
Oh good point, I'm glad the package is useful here.
As long as you're not logging IP addresses, this seems fine to me. You're also not exporting the data, so it's mostly a consideration in case the machine or service is compromised. I don't see issues with an attacker getting hold of the number of requests made and the times at which they are made. There are probably easier ways to find out whatever information they would hope to find out from these logs anyway.
Yes, agreed.
Ok, I'll move forward with setting up the service on polyanthum. Thanks for the reviews!
On IRC, we concluded that we should add another ProxyPass directive to our Apache config on polyanthum, so bridge operators can access this service over a URL such as bridges.torproject.org/scan/.
We should also make the service start automatically at boot -- perhaps by creating a systemd service unit?