Tor descriptor fetching is unreliable
Hi all. Over the weekend I wrote a small script that validates Tor's descriptors every hour and checks that all of the directory authorities are reachable. All the script does is...
- Download and validate the server descriptors from a random authority.
- Download and validate the extrainfo descriptors from a random authority.
- Download and validate the current consensus from each authority.
This is all well and good, but I've found that downloading from Tor's directory authorities is surprisingly flaky. The most troubled authority looks to be maatuska, though all of them show issues (especially for server/extrainfo descriptors, which are by far the largest requests)...
atagar@odin:~/Desktop/tor/tor-utils$ grep "Unable to download descriptors" logs/descriptor_checker.stem_debug | cut -f 9 -d ' ' | sort | uniq -c | sort -rn
58 'http://171.25.193.9:443/tor/status-vote/current/consensus.z'
29 'http://171.25.193.9:443/tor/status-vote/current/consensus.z':
15 'http://76.73.17.194:9030/tor/server/all.z'
15 'http://208.83.223.34:443/tor/server/all.z'
12 'http://76.73.17.194:9030/tor/extra/all.z'
10 'http://171.25.193.9:443/tor/server/all.z'
10 'http://128.31.0.39:9131/tor/extra/all.z'
9 'http://171.25.193.9:443/tor/extra/all.z'
8 'http://208.83.223.34:443/tor/extra/all.z'
8 'http://154.35.32.5:80/tor/server/all.z'
8 'http://128.31.0.39:9131/tor/server/all.z'
6 'http://208.83.223.34:443/tor/status-vote/current/consensus.z'
6 'http://193.23.244.244:80/tor/server/all.z'
5 'http://193.23.244.244:80/tor/extra/all.z'
5 'http://154.35.32.5:80/tor/extra/all.z'
4 'http://212.112.245.170:80/tor/extra/all.z'
4 'http://208.83.223.34:443/tor/server/all.z':
3 'http://76.73.17.194:9030/tor/server/all.z':
3 'http://208.83.223.34:443/tor/status-vote/current/consensus.z':
3 'http://208.83.223.34:443/tor/extra/all.z':
3 'http://193.23.244.244:80/tor/server/all.z':
3 'http://154.35.32.5:80/tor/extra/all.z':
2 'http://76.73.17.194:9030/tor/status-vote/current/consensus.z'
2 'http://76.73.17.194:9030/tor/extra/all.z':
2 'http://171.25.193.9:443/tor/server/all.z':
2 'http://171.25.193.9:443/tor/extra/all.z':
2 'http://128.31.0.39:9131/tor/server/all.z':
1 'http://76.73.17.194:9030/tor/status-vote/current/consensus.z':
1 'http://212.112.245.170:80/tor/server/all.z'
1 'http://212.112.245.170:80/tor/extra/all.z':
1 'http://154.35.32.5:80/tor/server/all.z':
1 'http://128.31.0.39:9131/tor/extra/all.z':
Stem attempts three requests before giving up, and very frequently hits that limit. Some errors are connection resets and timeouts, but the most frequent is a 'successful' connection where the compressed content is simply truncated (and hence corrupt). Karsten, Kostas, and I ran into this earlier too on tor-dev@...
atagar@odin:~/Desktop/tor/tor-utils$ grep "Unable to download descriptors" logs/descriptor_checker.stem_debug | cut -f 10- -d ' ' | sort | uniq -c | sort -nr
56 (2 retries remaining): Error -5 while decompressing data: incomplete or truncated stream
44 (2 retries remaining): <urlopen error [Errno 111] Connection refused>
36 (1 retries remaining): Error -5 while decompressing data: incomplete or truncated stream
33 <urlopen error [Errno 111] Connection refused>
33 (1 retries remaining): <urlopen error [Errno 111] Connection refused>
22 Error -5 while decompressing data: incomplete or truncated stream
6 (2 retries remaining): [Errno 104] Connection reset by peer
4 [Errno 104] Connection reset by peer
4 (1 retries remaining): [Errno 104] Connection reset by peer
1 <urlopen error timed out>
1 (2 retries remaining): <urlopen error timed out>
1 (2 retries remaining): timed out
1 (1 retries remaining): <urlopen error timed out>
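For anyone who wants to reproduce the most common failure without touching the network, here is a minimal sketch. It assumes the '.z' responses are plain zlib streams and that the checker inflates them with Python's zlib module. The helper name try_decompress and the sample payload are made up for illustration and are not part of Stem.

```python
import zlib


def try_decompress(payload):
    """Return (data, None) on success, or (None, error message) on failure."""
    try:
        return zlib.decompress(payload), None
    except zlib.error as exc:
        return None, str(exc)


# Hypothetical stand-in for a descriptor document; real responses are far larger.
original = b"router example 127.0.0.1 9001 0 0\n" * 200
compressed = zlib.compress(original)

# A complete stream inflates cleanly.
full_data, full_error = try_decompress(compressed)

# Cutting the stream short imitates a connection that closed early
# while still looking 'successful' to the HTTP layer.
cut_data, cut_error = try_decompress(compressed[:len(compressed) // 2])

print(full_error)
print(cut_error)
```

The point is that the decompression error by itself only says the body ended early. It does not say whether the authority, a proxy in between, or the client closed the connection.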
I could simply raise the retry count to something ridiculously high (say, ten?) to paper over these issues, but it would be better if we figured out the root cause.
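To make the retry option concrete, here is a rough sketch of a retry loop with exponential backoff. This is not Stem's actual implementation. The function download_with_retries, its parameters, and the fetch callable are all hypothetical, and it again assumes the response body is a zlib stream.

```python
import time
import zlib


def download_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times and return the decompressed body.

    fetch is any callable returning the raw compressed bytes (hypothetical;
    in practice it would wrap the HTTP request to an authority).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return zlib.decompress(fetch())
        except (zlib.error, OSError) as exc:
            # OSError covers refused connections, resets, and timeouts.
            last_error = exc
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x, ... base_delay between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Raising attempts would hide the symptom, and spacing the attempts out might help if the failures are load related, but neither would tell us why the streams are being cut short.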
Any thoughts on how best to approach this?
Thanks! -Damian