Identify and reduce circuit failures with some relays
Among the relays that @gk found that longclaw fails to measure but Torflow does measure, there is CA94704217260E7483DA88719CACD7A94C564D5C.
We first suspected that the 10-second timeout to build the circuit was too low.
Torflow doesn't specify a timeout, neither in the TorCtl code (circ.circ_id = self.extend_circuit(0, circ.id_path())) nor in the configuration. Stem added the timeout option to new_circuit only in version 1.7.
CircuitBuildTimeout defaults to 60 seconds, but sbws sets it to 10, and we don't let tor learn the circuit build timeout.
I attempted to manually build the same circuits with CA94704217260E7483DA88719CACD7A94C564D5C that failed in the log, and they failed again. They also failed when raising the timeout to 60, or when not passing the timeout argument to new_circuit, in which case tor returns TIMEOUT or DESTROYED.
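For reference, the manual test above can be sketched roughly like this. It is only a sketch: try_timeouts is a name made up here, and it assumes a controller object that behaves like stem's Controller (stem 1.7 or later, so that new_circuit accepts timeout). The control port in the usage comment is also an assumption.

```python
def try_timeouts(controller, path, timeouts=(10, 60, None)):
    """Build the same path once per timeout and record each outcome.

    `controller` is assumed to behave like stem.control.Controller
    (stem >= 1.7, where new_circuit accepts `timeout`).
    `path` is a list of relay fingerprints, first hop first.
    """
    results = {}
    for timeout in timeouts:
        try:
            # await_build=True blocks until the circuit is built or fails.
            circ_id = controller.new_circuit(
                path, await_build=True, timeout=timeout
            )
        except Exception as exc:
            # With stem this would be a stem.ControllerError subclass;
            # the broad except keeps the sketch free of imports.
            results[timeout] = "failed: {}: {}".format(
                type(exc).__name__, exc
            )
        else:
            results[timeout] = "built"
            # Don't leave test circuits open.
            controller.close_circuit(circ_id)
    return results


# Hypothetical usage against a running tor (control port 9051 assumed):
#   from stem.control import Controller
#   with Controller.from_port(port=9051) as controller:
#       controller.authenticate()
#       print(try_timeouts(controller, [first_hop_fp, second_hop_fp]))
```

A timeout of None here means the timeout argument is effectively not used, which matches the third case I tried by hand.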
Then I attempted to build the circuit using CA94704217260E7483DA88719CACD7A94C564D5C as an exit, and it succeeded.
The reason CA94704217260E7483DA88719CACD7A94C564D5C is being used as an entry is that it cannot exit to ALL public IPs.
We added that check in #40006 (closed) because there were relays whose policy rejected exiting to the IP of our Web server.
I checked that, with the current code and consensus, there are 131 exits that would be measured as first hop because of the patch added in #40006 (closed) (.exit_policy.strip_private().can_exit_to(port=443, strict=True)]). 0 if we'd remove
Of the 131, if we check IPv6, it would only be 63.
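A rough sketch of how such a count can be obtained, assuming each relay object exposes an exit_policy that behaves like stem's ExitPolicy (strip_private() and can_exit_to(port=..., strict=...)). The function name split_by_exit_position is made up for this example and this is not the actual sbws code.

```python
def split_by_exit_position(relays, port=443):
    """Split exits into those usable as exit and those used as first hop.

    Each relay is assumed to have an `exit_policy` attribute that behaves
    like stem.exit_policy.ExitPolicy.
    """
    as_exit, as_first_hop = [], []
    for relay in relays:
        # Ignore rules about private addresses, then require (strict=True)
        # that the policy allows exiting to the port on all addresses.
        policy = relay.exit_policy.strip_private()
        if policy.can_exit_to(port=port, strict=True):
            as_exit.append(relay)
        else:
            as_first_hop.append(relay)
    return as_exit, as_first_hop
```

The length of the second list would correspond to the number of exits that end up being measured as first hop.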
We don't know yet why this relay fails to build circuits as an entry. We could try to build circuits with those 131 exits as 1st hop and see which ones fail.
I'm also curious to see whether most of those exits are in the list of ~150 relays that @gk found are not being measured by longclaw.
A simple patch we can try is to measure an exit that fails to be measured in a different position.
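The idea could look something like the sketch below. It is not a patch against sbws: build_with_fallback and the build_circuit callable are hypothetical names, and build_circuit is assumed to take a list of fingerprints, return a circuit id on success and raise on failure.

```python
def build_with_fallback(build_circuit, relay_fp, helper_fp,
                        positions=("entry", "exit")):
    """Try the relay in each position in order; return the first that works.

    `build_circuit` is a hypothetical callable: it takes a path (list of
    fingerprints, first hop first), returns a circuit id, and raises if
    the circuit cannot be built.
    Returns (position, circuit_id), or (None, None) if every attempt fails.
    """
    paths = {
        "entry": [relay_fp, helper_fp],  # relay to measure as first hop
        "exit": [helper_fp, relay_fp],   # relay to measure as last hop
    }
    for position in positions:
        try:
            circ_id = build_circuit(paths[position])
        except Exception:
            # This position failed, try the next one.
            continue
        return position, circ_id
    return None, None
```

Whether the relay can actually be measured as an exit still depends on its policy allowing it to reach our Web server, so the second attempt may also fail.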
Torflow seems to check exit policies too. Looking at the logs of torflow running for 1 day, it measured CA94704217260E7483DA88719CACD7A94C564D5C only as an exit. I don't know if it would be measured as entry at some point.