v3bw files with weights that are too large lead to relays being selected nearly uniformly at random

As part of working on the FlashFlow paper (currently under submission), we ran Shadow simulations to compare FlashFlow with TorFlow, unsurprisingly.

Summary of the Shadow network used:

- 5% of the real Tor network in size:
  - 44 exits
  - 104 guards
  - 180 middles
  - 3 auths
- 10 "markov" clients. What they are doing is not terribly important, other than that they make lots of 3-hop exit circuits and exchange traffic with servers. 2 of the clients have tor debug logs. All 10 contribute to the relay selection data.
- Tor version a63b4148 (master branch as of March 5th, 2020) plus a small logging patch. Branch here. This problem existed in 0.3.5.7 as well. I don't know when it started because I don't know exactly what the problem is.
- Shadow 292cd89ba52fc2972fdd9d2e27e384db9601663b (as of Jan 10th, 2020).
- Shadow-plugin-tor 8deab15a032f5173ba7c12ad6dd0bcb1cb0c3463 (as of Oct 2019) plus a patch so that it works with newer Tor versions. Branch here.

The only difference between the simulations is the v3bw file used.

There are three simulations:

- TorFlow-derived weights (TF)
- FlashFlow-derived weights (FF init)
- FlashFlow-derived weights that have all been divided by 136 (FF scaled)
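
For anyone who wants to reproduce the scaled variant, here is a rough sketch of how the bw= value on a single v3bw relay line could be divided by 136. This is not necessarily how the FF scaled file was produced: the helper name scale_relay_line, the rounding rule (round down, never below 1), and the example relay line are all assumptions made for illustration.

```python
import re

# Matches the standalone "bw=<integer>" key on a relay line. The word
# boundary keeps keys such as "bw_mean=" or "desc_bw_avg=" untouched.
BW_RE = re.compile(r"\bbw=(\d+)\b")


def scale_relay_line(line, divisor=136):
    """Return the relay line with its bw= value divided by divisor.

    Assumption: the result is rounded down and clamped to at least 1,
    so that no relay ends up with a weight of zero.
    """
    def repl(match):
        return "bw=%d" % max(1, int(match.group(1)) // divisor)

    return BW_RE.sub(repl, line, count=1)


# Hypothetical relay line; the fingerprint and nickname are made up.
example = "node_id=$" + "A" * 40 + " bw=13600 nick=relay1"
scaled_example = scale_relay_line(example)
```

Header lines carry no bw= key, so they pass through this function unchanged.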

weight-dist.pdf shows the distribution of the weights in the v3bw files, both as raw absolute weights and as normalized weights (norm_weight = weight / total_weight). Despite all three having nearly identical normalized weight distributions (note: FF init and FF scaled are obviously identical after normalization), FF init results in (1) relays being selected seemingly uniformly at random, and (2) significantly worse performance as a consequence.
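
To spell out why FF init and FF scaled normalize to the same distribution: dividing every weight by a constant also divides the total by that constant, so the ratio is unchanged. A minimal sketch with made-up weights (not taken from the attached v3bw files), chosen as multiples of 136 so that the division is exact:

```python
from fractions import Fraction


def normalize(weights):
    """Return norm_weight = weight / total_weight for each weight."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


# Hypothetical raw weights, not the real FlashFlow values.
ff_init = [136 * w for w in (5, 20, 75, 400)]
ff_scaled = [w // 136 for w in ff_init]

# Exact rational arithmetic: the two normalized lists are equal.
same = normalize(ff_init) == normalize(ff_scaled)
```

With real weights that are not exact multiples of 136, integer rounding would introduce small differences, but nothing that would change the shape of the distribution.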

selection-v-weight.pdf shows how often the 10 markov clients picked each relay. Focus on the scatter plots. Notice how in TF and FF scaled there is basically a 1:1 linear relationship between additional weight and selection frequency, while in FF init the selection frequency is roughly the same regardless of the relay's weight.
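
For reference, the linear relationship seen in TF and FF scaled is what an idealized weighted sampler produces at any scale. The sketch below is not Tor's path selection code (which also applies position weights and other constraints); it only uses Python's random.choices with hypothetical weights to show that selection frequency should track normalized weight whether or not the weights are multiplied by 136.

```python
import random
from collections import Counter


def selection_frequencies(weights, n_picks, seed=0):
    """Pick n_picks indices with probability proportional to weights
    and return the observed selection frequency of each index."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(weights)), weights=weights, k=n_picks)
    counts = Counter(picks)
    return [counts[i] / n_picks for i in range(len(weights))]


# Hypothetical weights; the second list is the same set scaled by 136.
small = [1, 2, 3, 4]
large = [136 * w for w in small]

freq_small = selection_frequencies(small, 100_000)
freq_large = selection_frequencies(large, 100_000)
expected = [w / sum(small) for w in small]
```

Both freq_small and freq_large should land close to expected, which is the behaviour FF init fails to show in the simulations.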

I am also attaching the three v3bw files, combined into one file to reduce email spam.

I am also attaching small snippets from the debug logs (again: combined into one file) of one of the markov clients. The snippets show some of the relay weights the client is using when deciding which relays to use. You can see that in the FF init one the weights are much more similar to each other than in FF scaled and TF.