brade in legacy/trac#10418 (moved) gave some reproducible instructions on how to trigger the "can't find a pluggable transport proxy" error, even when obfsproxy has no trouble starting up.
You can find brade's post in comment:40:ticket:10418
We were supposed to connect to bridge '1.3.4.5:12345' using pluggable transport 'obfs3', but we can't find a pluggable transport proxy supporting 'obfs3'. This can happen if you haven't provided a ClientTransportPlugin line, or if your pluggable transport proxy stopped running.
After a bit of debugging, I found out that the errors were thrown because we tried to create an OR connection before the managed proxy had finished configuring (managed proxies are configured via a configuration protocol, which takes a bit of time).
ATM, we are blocking bridge descriptor fetches when PTs haven't finished configuring, by doing:
if (pt_proxies_configuration_pending()) return;
in fetch_bridge_descriptors(). But apparently, blocking that function is not sufficient to postpone all OR connections.
For example, the errors in brade's bug were triggered by the following codepath:
...
#12 0x0000555555633c43 in directory_initiate_command (if_modified_since=0, payload_len=0, payload=0x0, resource=0x55555568a2c7 "microdesc", indirection=DIRIND_ONEHOP, router_purpose=0 '\000', dir_purpose=14 '\016', digest=0x555555c5f27c "\240\235Sm\321u-T.\037\273<\234\344D\235Q)\202\071\020\313\031S", dir_port=0, or_port=<optimized out>, _addr=0x7fffffffdfa0, address=<optimized out>) at src/or/directory.c:878
#13 directory_get_from_dirserver (dir_purpose=dir_purpose@entry=14 '\016', router_purpose=router_purpose@entry=0 '\000', resource=<optimized out>, resource@entry=0x55555568a2c7 "microdesc", pds_flags=pds_flags@entry=2) at src/or/directory.c:467
#14 0x000055555558c354 in update_consensus_networkstatus_downloads (now=now@entry=1394201439) at src/or/networkstatus.c:767
#15 0x000055555558dd40 in update_networkstatus_downloads (now=1394201439) at src/or/networkstatus.c:906
#16 0x0000555555586c0d in run_scheduled_events (now=1394201439) at src/or/main.c:1468
...
This suggests that we should also block entry to update_consensus_networkstatus_downloads() if pt_proxies_configuration_pending() is true. I made a small patch that does that, and it seems to make the error messages go away in this case.
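For reference, the patch is essentially an early-return guard like the one already in fetch_bridge_descriptors(). A rough sketch (not verbatim from the branch; the exact placement inside update_consensus_networkstatus_downloads() in src/or/networkstatus.c may differ):

void
update_consensus_networkstatus_downloads(time_t now)
{
  /* Don't launch consensus fetches (which would open an OR connection
   * through a bridge) while managed PT proxies are still running their
   * configuration protocol. */
  if (pt_proxies_configuration_pending())
    return;

  /* ... existing consensus-download logic ... */
}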
But how can we be sure that we have blocked all the relevant codepaths that might launch an OR connection before PTs have been configured?
If I'm not mistaken, on a client, two things can cause us to launch an OR connection: an attempt to fetch directory info, and an attempt to build a circuit for traffic.
I believe that if should_delay_dir_fetches() is true, we won't try to launch any directory fetches.
If we don't have enough directory info, then we won't try to build circuits, because router_have_minimum_dir_info() will return false. Also, if should_delay_dir_fetches() is true, we won't update have_minimum_dir_info in update_router_have_minimum_dir_info() [*], so even if we do have sufficient directory info in the cache, I believe we won't acknowledge that until should_delay_dir_fetches() returns false.
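Concretely, the gating I have in mind looks roughly like the sketch below (not verbatim from the branch, and leaving the notice/message plumbing for the note below; any_bridge_descriptors_known() is my reading of the existing "no bridge descriptors known" check):

int
should_delay_dir_fetches(const or_options_t *options)
{
  if (options->UseBridges) {
    /* Hold off while managed PT proxies are still running their
     * configuration protocol: we can't reach any bridge without them. */
    if (pt_proxies_configuration_pending())
      return 1;
    /* ...and while we don't have a descriptor for any bridge yet. */
    if (!any_bridge_descriptors_known())
      return 1;
  }
  return 0;
}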
So the approach above seems sound-ish to me. I'm fine with merging this patch if you've tested it and it comes with a changes file.
[*] We will need to update the notice message and status message that come from update_router_have_minimum_dir_info() when should_delay_dir_fetches() returns true.
QOI: msg_out should be set to NULL if we aren't delaying.
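In other words, roughly this convention (just a sketch, not the actual fixup; the two-argument signature is my assumption of how msg_out gets plumbed through):

int
should_delay_dir_fetches(const or_options_t *options, const char **msg_out)
{
  /* Clear the reason string up front, so callers never see a stale
   * value when we return 0 (i.e. when we aren't delaying). */
  if (msg_out)
    *msg_out = NULL;

  /* ... checks that set *msg_out to a reason and return 1 when we
   * are delaying ... */

  return 0;
}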
Fixed in fixup commit. Please check the branch again.
What limits the frequency with which we log this message?
Hm, you mean the frequency of the log_notice() in update_router_have_minimum_dir_info()? Good question.
IIUC, we normally don't reach the log_notice() if we can't connect to bridges, because we don't have a consensus at all (it will fail the checks above).
If we have a previously cached consensus, we might reach the log_notice(), in which case it will print "Delaying dir fetches: PT proxies are still configuring" until the PTs finish configuring. This usually takes two run_scheduled_events() ticks, so that message should not appear too many times. If you want, we can turn it into an info-level log in the PT case (as opposed to the "no bridge descriptors known" case).
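On the caller side that would look roughly like this (just a sketch; it assumes the msg_out plumbing from above, and delay_msg is a made-up local name):

  const char *delay_msg = NULL;
  if (should_delay_dir_fetches(get_options(), &delay_msg)) {
    /* Demote to info if the only thing we're waiting on is PT
     * configuration (it clears after a couple of run_scheduled_events()
     * ticks); keep notice for the "no bridge descriptors known" case. */
    int severity = pt_proxies_configuration_pending() ? LOG_INFO : LOG_NOTICE;
    tor_log(severity, LD_DIR, "Delaying dir fetches: %s", delay_msg);
    return;
  }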