Improve Shadow's network models
Shadow heavily overestimated the performance benefits that congestion control would provide. Compare the onionperf graphs at tpo/network-health/analysis#37 to the graphs in https://blog.torproject.org/congestion-contrl-047/
I have been running some simulations, and the bulk of the difference appears to come from using a network model from a "flooding" period (2021-09). If we use a network model from a non-flooding period (2021-07), then we're much closer, but still not there.
Here are some guesses as to the remaining sources of the discrepancy:
- Only 80% of exits are upgraded (sims currently running with a non-flooding model to check this)
- Shadow may be sampling relays oddly to build its network models, and not choosing enough slow relays, especially in the flooding models.
- Shadow does not simulate shared relay links
- Shadow does not simulate CPU
- There is an active onion service DoS: https://metrics.torproject.org/hidserv-rend-relayed-cells.html
- Our sbws load balancing has not been upgraded to CC yet, and/or otherwise behaves differently from Shadow's
My personal bet is that the first few items are more likely than the lower ones.
For the last two, I will be working with others to investigate them. But in the meantime, we should see what is easy to do to improve Shadow's model and bring its performance results closer to the live network.
In terms of the goal here: I suspect that some Guards and/or slow relays are bottlenecking the live network's performance. We can improve this situation by changing the Fast and Guard cutoffs, lowering CBT, and increasing the number of Guard relays in use by clients to 2. After these changes, we may be able to improve performance further by setting higher congestion control parameters, since there will be less likelihood of overload after these changes.
In order to test this, we need more confidence that Shadow is properly sampling and modeling the live network, so that as we experiment with new Fast and Guard cutoffs, we can be confident that those improvements are realistic before we change the network.
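One quick way to check the sampling guess above is to compare the share of slow relays in a sampled model against the share in the full relay population. The following is only a sketch under assumptions: the function names, the 25% "slow" threshold, and the input format (a plain list of per-relay bandwidth values in any consistent unit) are hypothetical, and this is not how Shadow actually builds or exposes its network models.

```python
def slow_relay_share(bandwidths, cutoff):
    """Return the fraction of relays whose bandwidth is below `cutoff`."""
    if not bandwidths:
        raise ValueError("empty relay list")
    return sum(1 for bw in bandwidths if bw < cutoff) / len(bandwidths)


def compare_sample(population, sample, slow_fraction=0.25):
    """Compare slow-relay shares between the full population and a sample.

    `slow_fraction` is an assumed definition of "slow": the cutoff is the
    bandwidth found at that position in the sorted population.
    """
    ordered = sorted(population)
    cutoff = ordered[int(slow_fraction * len(ordered))]
    return {
        "cutoff": cutoff,
        "population_share": slow_relay_share(population, cutoff),
        "sample_share": slow_relay_share(sample, cutoff),
    }


if __name__ == "__main__":
    # Made-up bandwidth values, for illustration only.
    population = list(range(1, 101))
    sample = [10, 50, 60, 70, 80]
    print(compare_sample(population, sample))
```

If `sample_share` comes out well below `population_share` for a given model, that would suggest the sampled network has too few slow relays. The same comparison could be repeated for the flooding and non-flooding models.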