Automate measuring connection timeouts per exit
I have been investigating connection timeouts manually, using Tor Browser, in #21394.
My manual test is as follows. I set Tor Browser's pref "extensions.torbutton.loglevel" to 3 and filter the Browser Console for the word "TIMEOUT". Then I attempt to connect to a website and count the number of TIMEOUTs displayed in the console, such as this:
[10-26 06:25:47] Torbutton INFO: controlPort >> 650 STREAM 532 DETACHED 833 2606:2800:220:1:248:1893:25c8:1946:80 REASON=TIMEOUT
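The counting step could probably be scripted rather than done by eye. Below is a minimal sketch that tallies timeouts per circuit from saved console lines like the one above. It assumes the logged text follows the control-port STREAM event layout (stream ID, status, circuit ID, target, then optional key=value fields); the function name `count_timeouts` is only illustrative.

```python
import re
from collections import Counter

# Matches a STREAM event embedded in a Torbutton log line, e.g.
# "... 650 STREAM 532 DETACHED 833 <target>:80 REASON=TIMEOUT"
STREAM_RE = re.compile(
    r"650 STREAM (?P<stream>\d+) (?P<status>\S+) (?P<circuit>\d+) "
    r"(?P<target>\S+)(?P<rest>.*)"
)

def count_timeouts(lines):
    """Return a Counter mapping circuit ID -> number of TIMEOUT detaches."""
    counts = Counter()
    for line in lines:
        match = STREAM_RE.search(line)
        if match is None:
            continue  # not a STREAM event
        fields = match.group("rest").split()
        if match.group("status") == "DETACHED" and "REASON=TIMEOUT" in fields:
            counts[match.group("circuit")] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "[10-26 06:25:47] Torbutton INFO: controlPort >> 650 STREAM 532 "
        "DETACHED 833 2606:2800:220:1:248:1893:25c8:1946:80 REASON=TIMEOUT",
    ]
    for circuit, n in count_timeouts(sample).items():
        print(f"circuit {circuit}: {n} timeout(s)")
```

This only replaces the tallying; requesting a new circuit for each attempt would still have to be driven separately.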
I repeatedly hit "New Tor Circuit for this Site" in the torbutton menu and manually write down how many timeouts were observed for each circuit. Here's my data from when I attempted to connect to example.com 50 times:
According to arma, this sort of stream timeout has the following cause:
it means you sent your begin cell, and then you didn't get an end cell or a connected cell after 10 seconds
The dominant source of timeouts appears to be DNS resolution failures at the exit nodes. I observed almost no timeouts connecting directly to IPv4 or IPv6 addresses instead of a domain name (see ticket:21394#comment:20).
Regardless of the cause, I think these timeouts are causing serious damage to Tor Browser usability and we should try hard to fix them.
teor suggested some fixes to tor. In the meantime it would be great if we had an automated test that can measure the frequency of connection timeouts on a daily basis. I imagine it could generate several circuits through each exit node (both to domains and to bare IP addresses) and produce summary statistics. That would also help us know if the fixes are working or if we have any regressions in the future.
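To make "summary statistics" a bit more concrete, here is a rough sketch of the aggregation side only. It assumes some separate collector (for example, a script driving tor through its control port) has already recorded one entry per connection attempt; the `Attempt` record, its field names, and `timeout_rates` are all hypothetical, not an existing tool.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Attempt:
    exit_fingerprint: str  # identity of the exit relay used (hypothetical field)
    target_kind: str       # "domain" or "ip"
    timeouts: int          # TIMEOUT detaches observed during this attempt

def timeout_rates(attempts):
    """Map (exit_fingerprint, target_kind) -> fraction of attempts with >= 1 timeout."""
    totals = defaultdict(int)
    affected = defaultdict(int)
    for attempt in attempts:
        key = (attempt.exit_fingerprint, attempt.target_kind)
        totals[key] += 1
        if attempt.timeouts > 0:
            affected[key] += 1
    return {key: affected[key] / totals[key] for key in totals}

if __name__ == "__main__":
    # Placeholder records purely to show the call shape; not real measurements.
    demo = [
        Attempt("EXIT_A", "domain", 2),
        Attempt("EXIT_A", "domain", 0),
        Attempt("EXIT_A", "ip", 0),
    ]
    for (exit_fp, kind), rate in sorted(timeout_rates(demo).items()):
        print(f"{exit_fp} {kind}: {rate:.0%} of attempts saw a timeout")
```

Keeping domain and bare-IP attempts in separate buckets per exit is what would let a daily run show whether the timeouts track DNS resolution at particular exits.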
Is this something the Metrics team would be interested in working on? I see the timeout statistics on https://metrics.torproject.org/torperf-failures.html but I don't think that is measuring exactly the same thing.