Streams sometimes stall for up to 1 hour without making any progress
We're measuring Tor performance using our OnionPerf tool by regularly downloading 5 MiB files over Tor. Some of these measurements either run for more than 1 hour, at which point an OnionPerf timeout aborts them, or take up to 30 minutes to complete. (For comparison, 99% of successful runs complete within roughly two minutes.)
I noticed one particular source of slowness that I think explains the application timeouts after 1 hour and some of the 1% slowest successful runs. Streams stall for seconds or minutes without making any progress, and they would even stall for hours if we let them. Then they suddenly make progress until they complete or stall again.
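One way to find such stalls automatically, rather than by eye, is to look for long gaps between the moments when a stream receives data. The following is a minimal sketch. It assumes per-stream (timestamp in seconds, bytes read) samples, such as those derived from STREAM_BW events. `find_stalls`, `Stall`, and the 10-second default threshold are hypothetical choices, not part of OnionPerf.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Stall:
    start: float  # time of the last sample that carried data (seconds)
    end: float    # time of the next sample with data, or end_time (seconds)

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_stalls(
    samples: Iterable[Tuple[float, int]],
    min_gap: float = 10.0,  # hypothetical threshold, tune as needed
    end_time: Optional[float] = None,
) -> List[Stall]:
    """Return periods of at least min_gap seconds without any bytes read.

    samples: (timestamp_seconds, bytes_read) pairs for one stream.
    end_time: optional time the measurement ended (e.g. when a timeout
    aborted it); a trailing gap up to end_time is reported as a stall too.
    """
    stalls: List[Stall] = []
    last_progress: Optional[float] = None
    for ts, nbytes in sorted(samples):
        if nbytes <= 0:
            continue  # a sample without data does not count as progress
        if last_progress is not None and ts - last_progress >= min_gap:
            stalls.append(Stall(last_progress, ts))
        last_progress = ts
    # A stream that never resumes (e.g. aborted after 1 hour) ends in a stall.
    if (
        end_time is not None
        and last_progress is not None
        and end_time - last_progress >= min_gap
    ):
        stalls.append(Stall(last_progress, end_time))
    return stalls
```

For example, `find_stalls([(0, 1000), (1, 1000), (40, 1000)])` reports a single stall from t=1 to t=40. Passing `end_time` also catches runs that hit the application timeout while still stalled. A stream that is merely slow, receiving a little data every second, yields no stalls at all.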
I'm attaching four graphs that show this problem. Each graph plots download progress over time, with time on the x axis and progress on the y axis. Each gray bar is one measurement. The black line starts at the bottom of its gray bar and rises to the top of that bar as more data is received. The number on the right is the stream ID.
The first two graphs show runs that hit the application timeout, and the last two show the slowest 1% of successful runs. The first and third graphs show downloads from a public server; the second and fourth show downloads from an onion server.
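The progress lines in these graphs can be approximated from per-interval byte counts. The following is a rough sketch, not OnionPerf's actual plotting code. It assumes per-stream (timestamp, bytes read) samples, and `progress_curve` is a hypothetical helper name. The fraction is capped at 1.0 because bytes read on a stream may include protocol overhead such as HTTP headers, on top of the 5 MiB body. During a stall no new bytes arrive, so the line stays flat until the next point.

```python
from typing import Iterable, List, Tuple

FILE_SIZE = 5 * 1024 * 1024  # 5 MiB, the download size used in these measurements


def progress_curve(
    samples: Iterable[Tuple[float, int]],
    total_bytes: int = FILE_SIZE,
) -> List[Tuple[float, float]]:
    """Turn per-interval byte counts into (time, fraction received) points.

    The fraction corresponds to the bottom (0.0) and top (1.0) of a gray
    bar; it is capped at 1.0 in case more than total_bytes were read.
    """
    points: List[Tuple[float, float]] = []
    received = 0
    for ts, nbytes in sorted(samples):
        received += nbytes  # cumulative bytes read so far
        points.append((ts, min(received, total_bytes) / total_bytes))
    return points
```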
Note that not all runs exhibit the stalling described above. Some of the more obvious cases are:
- Page 3, stream ID 436971: this stream makes essentially no progress for over half an hour and then completes within seconds.
- Page 3, stream ID 436986: same as before, just with a shorter stalling period.
Other cases have different issues. For example, stream ID 34117 on page 3 is slow for most of the download and then suddenly gets faster at the end, but it never stalls.
I do have tor logs and tor controller event logs for these cases. Here's a log containing many relevant STREAM and STREAM_BW events: https://people.torproject.org/~karsten/volatile/streams-2019-02-18.log.xz (61.1K)
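A log like this can be turned into per-stream samples with a small parser. The following is a minimal sketch. It assumes the STREAM_BW event layout from the Tor control protocol specification (`STREAM_BW StreamID BytesWritten BytesRead`, possibly followed by further fields). Timestamp extraction is left to the caller rather than assuming the exact line format of the linked log. `parse_stream_bw` and `samples_by_stream` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed layout: "... STREAM_BW <StreamID> <BytesWritten> <BytesRead> [...]"
STREAM_BW_RE = re.compile(r"\bSTREAM_BW (\w+) (\d+) (\d+)\b")


def parse_stream_bw(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (stream_id, bytes_written, bytes_read), or None if no match."""
    m = STREAM_BW_RE.search(line)
    if m is None:
        return None
    stream_id, written, read = m.groups()
    return stream_id, int(written), int(read)


def samples_by_stream(
    timestamped_lines: Iterable[Tuple[float, str]],
) -> Dict[str, List[Tuple[float, int]]]:
    """Group (timestamp, bytes_read) samples by stream ID.

    timestamped_lines: (timestamp_seconds, raw_event_line) pairs; how the
    timestamp is obtained depends on the log format and is left open here.
    """
    per_stream: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for ts, line in timestamped_lines:
        parsed = parse_stream_bw(line)
        if parsed is None:
            continue  # skip STREAM and other non-bandwidth events
        stream_id, _written, read = parsed
        per_stream[stream_id].append((ts, read))
    return dict(per_stream)
```

Long gaps between consecutive samples of the same stream are then candidates for the stalls described above.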
These measurements were made using tor versions 0.2.9.11-dev and 0.3.0.7-dev.
I can provide more data. But rather than uploading everything, please let me know what data would be most useful, and I'll provide just that.