bridgestrap returning ECONNREFUSED for all bridges?

Cc'ing @cohosh and @phw so they know about the ticket.

Maybe there is a bridgestrap survival guide tucked away in the wiki somewhere?

Found it!

https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guides/Bridgestrap-Survival-Guide

OK, so bridgestrap is running, and so is bridgestrap's Tor. It really is testing its new bridges, and every test gets a "connection refused" response.

Weirdly, the logs seem to indicate that the bridge line it's using has the string "[scrubbed]" where the bridge's address is supposed to be. Maybe that's just an oddity of the logging, and the actual bridge lines contain real addresses? Because if it's trying to connect to the bridge address "[scrubbed]", that would be a good reason why it can't reach any of them. :)

If "[scrubbed]" only appears in the logs and real addresses are actually being used, then the next steps are:

(a) What are those addresses? Do they look right? If you connect to them from that server, does it work?

and then

(b) What do the Tor logs look like? It seems the bridgestrap service doesn't keep any, and I wonder what Tor is actually trying to connect to, or whether Tor has been writing sad, pitiful things to its logs about something that has gone wrong that nobody will notice and fix. :)
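For step (a), a quick way to tell "refused" apart from other failures is a plain TCP connect from the server. This is a minimal sketch, not part of bridgestrap; the function name `probe` and the example address are hypothetical. Note that it only checks whether the port accepts TCP connections, not whether an obfs4 handshake succeeds, which is what bridgestrap actually tests through Tor.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a plain TCP connect and classify the result.

    Returns "open" if the TCP handshake succeeds, "refused" if the peer
    rejected the connection (ECONNREFUSED), or "error: <reason>" for
    anything else (timeouts, unreachable networks, DNS failures, ...).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Python raises ConnectionRefusedError for ECONNREFUSED.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

For example, `probe("192.0.2.10", 443)` (a placeholder documentation address) would return "refused" if nothing is listening on that port, the same error bridgestrap reports.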

assigned to @cohosh

added Next label

Yeah, we use a log scrubber for most of our services for privacy. I'm looking into this now.
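That would explain the "[scrubbed]" placeholders: addresses are replaced before lines are written to the log, so the bridge lines bridgestrap actually uses can still contain real addresses. As an illustration only, here is a minimal regex-based scrubber that replaces IPv4 addresses (with an optional port) by "[scrubbed]". It is not the team's actual scrubber; real ones typically also cover IPv6, and this naive pattern would also hit dotted-quad version strings such as "0.4.5.7".

```python
import re

# Illustrative pattern: a dotted-quad IPv4 address, optionally followed
# by ":port". Does not handle IPv6.
IPV4_WITH_PORT = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d{1,5})?\b")


def scrub(line: str) -> str:
    """Replace every IPv4 address (and its port, if present) with "[scrubbed]"."""
    return IPV4_WITH_PORT.sub("[scrubbed]", line)
```

For instance, a bridge line such as `obfs4 192.0.2.10:443 ABCD cert=x iat-mode=0` would be logged as `obfs4 [scrubbed] ABCD cert=x iat-mode=0`.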

It looks like this has been happening for a while.

From phw in IRC just now:

14:38 <+phw> tor's data directory should be in /tmp/tor-datadir-...

Looking at these logs, we can see exactly when this failed and the errors we're getting:

Mar 04 15:28:47.000 [warn] Pluggable Transport process terminated with status code 512
Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.
Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.
Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.
Mar 04 15:28:48.000 [warn] The connection to the SOCKS5 proxy server at 127.0.0.1:46439 just failed. Make sure that the proxy server is up and running.

And this just continues forever.

I've saved the entire data directory (including the Tor log) in /home/bridgestrap/2021-03-21-datadir.tar.gz, but it looks like obfs4 just failed and we couldn't recover from that.

This is exactly the problem described in tpo/core/tor#33669 (closed).

Right now the only solution on our side is to restart Tor, so what this ticket should address is how we learn when this happens. A month is too long to go without noticing. I'm adding this to the list of motivating examples for our work on alerts next week.
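One way to learn about this automatically is to watch the Tor log for the pattern in the excerpt above: a "Pluggable Transport process terminated" warning followed by a run of SOCKS5 proxy failures. The following is a minimal, hypothetical sketch of such a check, assuming the log format shown in that excerpt; the function name and the failure threshold are made up for illustration and are not the team's actual alerting.

```python
import re
from typing import Iterable, Optional

# Patterns taken from the warnings in the excerpt above; other Tor
# versions may word these messages differently.
PT_DIED = re.compile(r"\[warn\] Pluggable Transport process terminated")
SOCKS_FAILED = re.compile(
    r"\[warn\] The connection to the SOCKS5 proxy server at \S+ just failed"
)


def pt_looks_dead(lines: Iterable[str], min_failures: int = 3) -> Optional[str]:
    """Return the timestamp of a PT-termination warning that is followed by
    at least `min_failures` SOCKS5 proxy failures, or None if there is none."""
    died_at: Optional[str] = None
    failures = 0
    for line in lines:
        if PT_DIED.search(line):
            # The timestamp is everything before the " [warn]" severity tag.
            died_at = line.split(" [", 1)[0]
            failures = 0
        elif died_at is not None and SOCKS_FAILED.search(line):
            failures += 1
            if failures >= min_failures:
                return died_at
    return None
```

Since the function accepts any iterable of lines, an open file object for the Tor log can be passed directly, and a non-None result is the signal to alert on or to restart Tor.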

I also just updated the survival guide with info on how to find the Tor log: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guides/Bridgestrap-Survival-Guide

Nice detective work, @cohosh!

We tracked down the source of this problem, and we now have prometheus alerts for bridgestrap to detect it in the future, so I'm going to close this issue.

closed

This seems to have reappeared: https://bridges.torproject.org/status?id=662D4E4DE2C883625C543DFA3C4EE466899E6C85

Thanks @toralf. We keep manually restarting it, but these issues will continue until tpo/core/tor#33669 (closed) is fixed :/

I just restarted it again, so it should work now.

Not really, I fear.

Yeah, it seems to be failing more frequently recently. The best I can do is the existing cron job. Until tpo/core/tor#33669 (closed) is fixed, we're going to have to consider this unreliable, unless we figure out why the PT process is exiting.

mentioned in issue tpo/core/tor#33669 (closed)

I believe @cohosh added a cron job to restart bridgestrap and bridgestrap's Tor periodically, which should be an adequate workaround while we wait for a fix on the Tor side.

There is also the mystery of what is making obfs4proxy exit in the first place.

That mystery might also be interesting to the Tor Browser folks (cc @sysrqb so he knows about this topic), since if Tor Browser's obfs4proxy dies, obfs4 bridges won't work anymore, and the user will probably never figure out what happened. (A full Tor Browser restart should fix it each time.)

marked this issue as related to tpo/core/tor#33669 (closed)

mentioned in issue #20 (closed)

I was wondering why the cron job wasn't working. It turns out you can't just run systemctl --user from a crontab and expect it to work: https://unix.stackexchange.com/questions/474439/using-cron-to-restart-a-systemd-user-service/474440#474440
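A common explanation is that cron runs jobs with a minimal environment, without the XDG_RUNTIME_DIR (and session bus address) that systemctl --user uses to reach the user's own service manager under /run/user/UID. The sketch below shows that workaround from a Python wrapper; it assumes the standard /run/user/UID layout, and the helper names and the unit name passed to `restart_user_unit` are hypothetical.

```python
import os
import subprocess
from typing import Dict, Mapping


def user_systemctl_env(uid: int, base: Mapping[str, str]) -> Dict[str, str]:
    """Copy `base` and fill in the variables `systemctl --user` relies on to
    find the per-user service manager, which cron's environment lacks.
    Values already present in `base` are kept."""
    env = dict(base)
    runtime_dir = f"/run/user/{uid}"
    env.setdefault("XDG_RUNTIME_DIR", runtime_dir)
    env.setdefault("DBUS_SESSION_BUS_ADDRESS", f"unix:path={runtime_dir}/bus")
    return env


def restart_user_unit(unit: str) -> None:
    """Restart a systemd user unit, e.g. a hypothetical "bridgestrap.service"."""
    subprocess.run(
        ["systemctl", "--user", "restart", unit],
        env=user_systemctl_env(os.getuid(), os.environ),
        check=True,  # raise CalledProcessError if systemctl fails
    )
```

A systemd timer avoids this problem altogether, because the restart is then triggered by the user's service manager itself rather than by cron.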

So I set up a systemd timer instead. You can see the changes here: bridgestrap-admin@9f7c7b75

Let's see if this works!

mentioned in issue tpo/tpa/team#41676 (closed)
