Bridge 892CF98FB2151BA4C1AD519E63B1F84C48238616 advertises:* obfs4: dysfunctional Error: We got an ECONNREFUSED while connecting to the target OR. Last tested: 2021-03-22 22:51:46.804048996 +0000 UTC (8h4m5.867658121s ago)
I'm guessing something got wedged. Maybe the Tor client connected to bridgestrap crashed? Maybe we want some sort of monitoring for this service? Maybe we already have it? :)
Designs
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Ok, so bridgestrap is running. And bridgestrap's Tor is running. It really is testing its new bridges, and finding a "connect refused" response to each test.
Weirdly, the logs seem to indicate that the bridge line it's using has the string "[scrubbed]" where the bridge's address is supposed to be. Maybe that's just an oddity of the logs and the actual strings have actual addresses in them? Because if it's trying to connect to the bridge address "[scrubbed]" then that would be a good reason why it can't reach any of them. :)
If scrubbed is just something that happens in the logs, and actually real addresses are being used, then the next steps to look at are
(a) what are those addresses? do they look right? when you connect, from that server, does it work?
and then
(b) what do the tor logs look like? it looks like there aren't any kept in the bridgestrap service, and I wonder what Tor is actually trying to connect to. Or if Tor has been writing out sad pitiful things to its logs about something that has gone wrong that nobody will notice and fix. :)
14:38 <+phw> tor's data directory should be in /tmp/tor-datadir-...
Looking at these logs, we can see exactly when this failed and the errors we're getting:
Mar 04 15:28:47.000 [warn] Pluggable Transport process terminated with status code 512Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.
and this just continues forever.
I've saved the entire data directory (including the tor log) in /home/bridgestrap/2021-03-21-datadir.tar.gz but it looks like obfs4 just failed and we couldn't recover from that.
Right now the only solution on our side is to restart Tor, so what we should address in this ticket is learning when this happens. A month is too long to go without noticing. I'm adding this to the list of motivating examples for our work on alerts next week.
We tracked down the source of this problem, and we now have prometheus alerts for bridgestrap to detect it in the future, so I'm going to close this issue.
Yeah it's failing at a more frequent rate recently it seems. The best I can do is the existing cronjob. Until tpo/core/tor#33669 (closed) is fixed we're going to have to consider this unreliable. Unless we figure out why the PT process is exiting.
I believe @cohosh added a cron job to restart bridgestrap + bridgestrap's Tor every so often, which should be an adequate workaround while we wait for a fix on the Tor side.
There is also the mystery of "what is making obfs4proxy mysteriously exit?"
That mystery might also be interesting to the Tor Browser folks (cc @sysrqb so he knows about this topic), since if Tor Browser's obfs4proxy dies, then obfs4 bridges won't work anymore, and the user will probably not ever figure out what happened. (A full Tor Browser restart should fix it each time.)