sbws bandwidth scanner presentation
See also juga's slides: https://juga0.github.io/tor_hackweek_bandwidth_slides/
Why bandwidth scanners?
Relays can lie about their bandwidths.
How sbws works
sbws runs these threads:
- main thread
- Tor event listener (stem)
- ResultDump (stores measurements to result files)
- standard Python threads
- scanner threads: target of 3 threads, to measure 3 relays at a time
Critical sections for threads:
- refresh: relay list
- relay priority: relay list
- measure relay: relay list, etc.
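The thread layout and critical sections above can be sketched roughly as follows. This is a minimal illustration and not sbws's actual code: `RelayList`, `measure_relay`, and the fingerprint strings are hypothetical, and a single lock stands in for whatever synchronization sbws really uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_SCANNER_THREADS = 3  # target from the notes: measure 3 relays at a time


class RelayList:
    """Hypothetical shared relay list; every access takes the same lock."""

    def __init__(self, relays):
        self._lock = threading.Lock()
        self._relays = list(relays)

    def refresh(self, new_relays):
        # Critical section: replace the list when a new consensus arrives.
        with self._lock:
            self._relays = list(new_relays)

    def snapshot(self):
        # Critical section: copy the list so callers work on a stable view.
        with self._lock:
            return list(self._relays)


def measure_relay(fingerprint):
    # Placeholder for a real measurement; returns a fake result record.
    return {"fingerprint": fingerprint, "ok": True}


relay_list = RelayList(["AAAA", "BBBB", "CCCC", "DDDD"])
with ThreadPoolExecutor(max_workers=NUM_SCANNER_THREADS) as pool:
    # At most 3 measurements run concurrently; map() keeps input order.
    results = list(pool.map(measure_relay, relay_list.snapshot()))
```

Handing each scanner thread a snapshot keeps the lock held only briefly, at the cost of measuring against a list that may be slightly stale.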
sbws gets a list of relays from the consensus, and scans those relays. It refreshes the list every few minutes.
Build two-hop paths from the scanner to a web server via an entry and an exit.
- select a target relay
- select the other half of the 2-hop path (exit for entry, or entry for exit)
- choose a faster relay than the target
- exits must allow exiting to port 443 and must not be bad exits; otherwise they are used as entries
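The path selection rules above might look something like this. It is a sketch under assumptions: the `Relay` fields, `usable_as_exit`, and `build_path` are hypothetical names, the helper is picked uniformly at random among faster candidates, and returning `None` when no faster helper exists is a choice made here, not something the notes specify.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    # Hypothetical relay record; field names are illustrative only.
    fingerprint: str
    bandwidth: int          # consensus or self-reported bandwidth
    exits_to_443: bool      # exit policy allows port 443
    bad_exit: bool = False  # flagged as a bad exit


def usable_as_exit(relay):
    return relay.exits_to_443 and not relay.bad_exit


def build_path(target, relays, rng=random):
    """Return (entry, exit) for a two-hop path that includes the target."""
    others = [r for r in relays if r.fingerprint != target.fingerprint]
    target_is_exit = usable_as_exit(target)
    if target_is_exit:
        # Target is measured as the exit, so the helper is the entry.
        candidates = others
    else:
        # Target is measured as the entry, so the helper must be a usable exit.
        candidates = [r for r in others if usable_as_exit(r)]
    # The helper should be faster than the target, so it is not the bottleneck.
    faster = [r for r in candidates if r.bandwidth > target.bandwidth]
    if not faster:
        return None  # no suitable helper; the caller decides what to do
    helper = rng.choice(faster)
    return (helper, target) if target_is_exit else (target, helper)


relays = [
    Relay("A", 100, False),
    Relay("B", 500, True),
    Relay("C", 50, True),
    Relay("D", 900, True, bad_exit=True),
]
path_for_a = build_path(relays[0], relays)  # A is a non-exit, so it is the entry
```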
Measure the speed
- find the right file size to get a reasonable measurement (16 MB - 1 GB)
- measure and store results
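One way to find a suitable file size is to grow the request until the download lasts long enough to be meaningful. The sketch below assumes a doubling strategy and a 5-second target, neither of which comes from the notes; only the 16 MB to 1 GB range does. `time_download` is a hypothetical callable, and the sizes are treated as binary megabytes for simplicity.

```python
MIN_BYTES = 16 * 1024 * 1024     # 16 MB lower bound from the notes
MAX_BYTES = 1024 * 1024 * 1024   # 1 GB upper bound from the notes
TARGET_SECONDS = 5.0             # assumed minimum useful download time


def pick_download_size(time_download, target=TARGET_SECONDS):
    """Double the request size until a download lasts at least `target`.

    `time_download(nbytes)` is a hypothetical callable that downloads
    nbytes through the circuit and returns the elapsed seconds.
    """
    size = MIN_BYTES
    while True:
        elapsed = time_download(size)
        if elapsed >= target or size >= MAX_BYTES:
            return size, elapsed
        size = min(size * 2, MAX_BYTES)


# Usage with a simulated relay that delivers 20 MiB per second.
def fake_relay(nbytes):
    return nbytes / (20 * 1024 * 1024)


size, elapsed = pick_download_size(fake_relay)
speed = size / elapsed  # measured bytes per second
```

A very fast relay simply ends up at the 1 GB cap, and a very slow one is measured with the 16 MB minimum.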
The results are stored as lines of JSON.
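Storing results as lines of JSON means one JSON object per line, which makes appending and re-reading cheap. A small sketch, with hypothetical field names (`fingerprint`, `time`, `bytes_per_sec`) that are not taken from sbws's real result format:

```python
import io
import json


def append_result(fileobj, result):
    # One JSON object per line, so results can be appended one at a time.
    fileobj.write(json.dumps(result, sort_keys=True) + "\n")


def read_results(fileobj):
    # Skip blank lines; parse every other line as one result.
    return [json.loads(line) for line in fileobj if line.strip()]


buf = io.StringIO()  # stands in for a result file on disk
append_result(buf, {"fingerprint": "AAAA", "time": 1500000000, "bytes_per_sec": 250000})
append_result(buf, {"fingerprint": "BBBB", "time": 1500000060, "bytes_per_sec": 90000})
buf.seek(0)
loaded = read_results(buf)
```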
Every hour, the scanner generates a results file according to the Tor bandwidth file spec.
The results are filtered:
- ignore results older than 5 days
- ignore relays with fewer than 2 measurements
- ignore relays where the first and last measurements are less than 24 hours apart
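The three filters above can be expressed in a few lines. This sketch assumes results are `(timestamp, bytes_per_sec)` tuples keyed by fingerprint, and that the age filter is applied before the count and time-span checks; the order and the data layout are assumptions, not sbws's implementation.

```python
DAY = 24 * 60 * 60
MAX_AGE = 5 * DAY    # ignore results older than 5 days
MIN_RESULTS = 2      # ignore relays with fewer than 2 measurements
MIN_SPAN = DAY       # first and last results must be at least 24 hours apart


def filter_results(results_by_relay, now):
    """results_by_relay maps fingerprint -> list of (timestamp, bytes_per_sec)."""
    kept = {}
    for fingerprint, results in results_by_relay.items():
        # Drop old results first, then sort the rest by timestamp.
        recent = sorted(r for r in results if now - r[0] <= MAX_AGE)
        if len(recent) < MIN_RESULTS:
            continue
        if recent[-1][0] - recent[0][0] < MIN_SPAN:
            continue
        kept[fingerprint] = recent
    return kept


now = 10 * DAY
kept = filter_results(
    {
        "A": [(8 * DAY, 100), (9 * DAY + DAY // 2, 120)],  # kept
        "B": [(9 * DAY, 100)],                             # only one result
        "C": [(9 * DAY, 100), (9 * DAY + 3600, 110)],      # only 1 hour apart
        "D": [(2 * DAY, 100), (9 * DAY, 100)],             # one result too old
    },
    now,
)
```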
Scale the relay's self-reported bandwidth by the measured bandwidth.
See the bandwidth file spec: "Torflow Scaling".
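As a rough illustration of the idea, a relay's self-reported bandwidth can be multiplied by the ratio of its measured speed to the network-wide mean. This is a deliberate simplification that omits details of the full procedure in the spec; `scale_bandwidths` and its inputs are hypothetical.

```python
from statistics import mean


def scale_bandwidths(measured, self_reported):
    """Scale self-reported bandwidth by measured speed relative to the mean.

    measured: fingerprint -> list of measured speeds
    self_reported: fingerprint -> self-reported bandwidth
    """
    relay_means = {fp: mean(speeds) for fp, speeds in measured.items()}
    network_mean = mean(list(relay_means.values()))
    return {
        fp: round(self_reported[fp] * relay_means[fp] / network_mean)
        for fp in relay_means
    }


scaled = scale_bandwidths(
    measured={"A": [100, 300], "B": [50, 50], "C": [350, 350]},
    self_reported={"A": 1000, "B": 1000, "C": 2000},
)
# A measures exactly at the network mean, so its value is unchanged;
# B measures below the mean and is scaled down; C is scaled up.
```

A relay that overstates its bandwidth but measures poorly ends up with a lower weight, which addresses the original concern that relays can lie about their bandwidths.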
Format the bandwidth file
- optional metadata
- one relay per line: id and bandwidth, and other keys
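A sketch of writing such a file follows. The layout used here (timestamp first, `key=value` header lines, a `=====` terminator, then `node_id=` and `bw=` on each relay line) is written from memory of the bandwidth file spec and should be checked against it; `format_bandwidth_file` and the input dictionaries are hypothetical.

```python
def format_bandwidth_file(timestamp, headers, relays):
    """Return the file contents as one string.

    headers: optional metadata as key -> value
    relays: list of dicts with "fingerprint", "bw", and optional "extra" keys
    """
    lines = [str(timestamp)]
    lines += [f"{key}={value}" for key, value in headers.items()]
    lines.append("=====")  # assumed header terminator; verify against the spec
    for relay in relays:
        line = f"node_id=${relay['fingerprint']} bw={relay['bw']}"
        extra = " ".join(f"{k}={v}" for k, v in relay.get("extra", {}).items())
        # One relay per line: id and bandwidth first, then any other keys.
        lines.append(f"{line} {extra}".rstrip())
    return "\n".join(lines) + "\n"


out = format_bandwidth_file(
    1523911758,
    {"version": "1.1.0"},
    [
        {"fingerprint": "A" * 40, "bw": 760, "extra": {"nick": "Test"}},
        {"fingerprint": "B" * 40, "bw": 12},
    ],
)
```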
It takes 24 hours to scan the entire network.
How many measurements should we have for a relay before we vote for it?
Against: one result can be inaccurate, and we don't want to load lots of clients onto a new relay
For: it takes a long time to measure a new relay, and relay operators are disappointed
Proposal: vote for all relays, but cap early measurements (and cap few measurements?)
Proposal: start with a file size that depends on the relay bandwidth
Proposal: stop the download when you have learned enough, or the file takes too long
What is the minimum number of relays in a bandwidth file?
Against: A network with one measured relay is a sad network
For: A network with no bandwidth votes is a sad network
Proposal: you must be running for at least 24 hours before you publish
Proposal: try to keep a result for every download, even if it was too fast or too slow
Proposal: speed up relay measurements by reducing retries
Proposal: increase the number of threads, based on the available bandwidth
Proposal: deploy sbws on every directory authority
What diagnostic information do we need for failed relays?
- list categories of failures in relay bandwidth lines
- votes contain bandwidth file headers and hash
- DirPort URL for downloading the current bandwidth file
- Proposal: a tool to analyse OnionPerf logs and Bandwidth files to tell relay operators what is wrong with their relay
How much bandwidth?
- 100 Mbps peak; scaling compensates for higher-bandwidth relays
- One scanner per directory authority, multiple servers per scanner
How can we:
- Hide scanning from relays
- Make sure exit and non-exit bandwidths are equivalent, because they're measured differently
- Remove reliance on self-reported relay bandwidths
- That's hard, because we measure residual bandwidth, but we want to know overall capacity