Reproduce Rob's FlashFlood experiment
Shadow was unable to reproduce the 95th percentile perf degradation for FlashFlood (seen in tpo/metrics/analysis#33076 (comment 2569011)).
I suspect this is due to a Tor bug in something that Shadow is emulating rather than simulating. DNS is in this category, so one candidate is DNS timeouts due to overload.
We should get some fast machines and reproduce the experiment on the live network, with more frequent OnionPerf measurements, so we get more data points on which relays were involved in the 95th percentile slowdown.
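A rough sketch of how the extra measurements could be narrowed down to the relays involved in the slow tail. It assumes each measurement has already been reduced to a `(path, seconds)` pair, where `path` is a tuple of relay identifiers. That record shape and both function names are hypothetical, and this is not OnionPerf's own output format or API.

```python
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def relays_in_slow_tail(measurements, pct=95):
    """Count how often each relay appears in measurements at or above the percentile.

    measurements: list of (path, seconds), where path is a tuple of relay
    identifiers (hypothetical shape, to be adapted to the real data).
    """
    threshold = percentile([secs for _, secs in measurements], pct)
    counts = Counter()
    for path, secs in measurements:
        if secs >= threshold:
            counts.update(path)
    return threshold, counts
```

Relays that show up in the tail far more often than their share of all measured paths would be the first ones to look at. More frequent measurements matter here because only about 5% of the samples land in the tail.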
Rob has released code.
Python script to run the experiment: https://gist.github.com/robgjansen/ebd7f8ba019dbef2af4877122281cf3b
Notes from Rob:
- The git log in the Tor branch has some details.
- Several important items are hard-coded in the Python "speedtester" script, e.g., the 2nd hop test relay fingerprints and the path to a latest consensus file.
- The 2nd hop test relays should set MaxAdvertisedBandwidth to the minimum allowed value so they are reserved as much as possible for the speed testing.
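For a rerun on different machines, the hard-coded items from the notes above could be lifted into command-line arguments. This is only a sketch and is not taken from Rob's script: the flag names `--test-relay` and `--consensus` and the helper names are hypothetical. The only outside fact it relies on is that a relay fingerprint is 40 hex digits, sometimes written with a leading `$`.

```python
import argparse
import re

# A relay fingerprint is 40 hex digits.
FINGERPRINT_RE = re.compile(r"^[0-9A-Fa-f]{40}$")


def fingerprint(value):
    """argparse type: validate and normalize a relay fingerprint."""
    value = value.strip().lstrip("$")
    if not FINGERPRINT_RE.match(value):
        raise argparse.ArgumentTypeError(
            f"not a 40-hex-digit relay fingerprint: {value!r}"
        )
    return value.upper()


def build_parser():
    """Build a parser for the values that are currently hard-coded (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="speed test configuration sketch")
    parser.add_argument(
        "--test-relay",
        dest="test_relays",
        action="append",
        type=fingerprint,
        required=True,
        help="fingerprint of a 2nd hop test relay (repeat for each relay)",
    )
    parser.add_argument(
        "--consensus",
        required=True,
        help="path to a recent consensus file",
    )
    return parser
```

Usage would look like `build_parser().parse_args()` at the top of the script, with the parsed `test_relays` list and `consensus` path replacing the constants.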
Rob also gave me the Tor Safety Board submission. I will attach that. He also gave me result files, which I will attach to https://gitlab.torproject.org/tpo/metrics/analysis/-/issues/3307 in case we can still figure out the issue from the last run's data.