= sbws bandwidth scanner presentation =

See also juga's slides: https://juga0.github.io/tor_hackweek_bandwidth_slides/

== Why bandwidth scanners? ==

Relays can lie about their bandwidths, so the directory authorities need independent measurements.

== How sbws works ==

=== Threads ===

sbws runs these threads:

* main thread
* Tor event listener (stem)
* ResultDump (stores measurements to result files)
* standard Python threads
* scanner threads: a target of 3 threads, to measure 3 relays at a time

Critical sections for threads:

* refresh: relay list
* relay priority: relay list
* measure relay: relay list, etc.

=== Measurement ===

sbws gets a list of relays from the consensus, and scans those relays. The list updates every few minutes.

sbws builds two-hop paths from the scanner to a web server via an entry and an exit:

* select a target relay
* select the other half of the 2-hop path (an exit for an entry target, or an entry for an exit target)
* choose a helper relay that is faster than the target
* exits must exit to port 443 and must not be a BadExit; otherwise they are used as entries

Measure the speed:

* find the right file size to get a reasonable measurement (16 MB - 1 GB)
* measure and store the results

The results are stored as lines of JSON, one measurement per line.

=== Generate ===

Every hour, the scanner generates a results file according to the Tor bandwidth file spec.

The results are filtered:

* ignore results older than 5 days
* ignore relays with fewer than 2 measurements
* ignore relays whose first and last measurements are less than 24 hours apart

==== Scaling ====

Scale each relay's self-reported bandwidth by its measured bandwidth, relative to the rest of the network.

See the bandwidth file spec, section "Torflow Scaling".

==== Format the bandwidth file ====

Header:

* timestamp
* optional metadata

Results:

* one relay per line: relay id and bandwidth, plus other keys

== Questions ==

It takes 24 hours to scan the entire network.

How many measurements should we have for a relay before we vote for it?

* Against voting early: one result can be inaccurate, and we don't want to send lots of clients to a new relay
* For voting early: it takes a long time to measure a new relay, and relay operators get disappointed waiting

* Proposal: vote for all relays, but cap early measurements (and cap relays with few measurements?)
* Proposal: start with a file size that depends on the relay's bandwidth
* Proposal: stop the download when you have learned enough, or when the file takes too long

What is the minimum number of relays in a bandwidth file?

* Against: a network with one measured relay is a sad network
* For: a network with no bandwidth votes is a sad network

* Proposal: the scanner must be running for at least 24 hours before it publishes a file

* Proposal: try to keep a result for every download, even if it was too fast or too slow
* Proposal: speed up relay measurements by reducing retries
* Proposal: increase the number of threads, based on the available bandwidth
* Proposal: deploy sbws on every directory authority

What diagnostic information do we need for failed relays?

* list categories of failures in relay bandwidth lines
* votes contain the bandwidth file headers and a hash of the file
* a DirPort URL for downloading the current bandwidth file
* Proposal: a tool that analyses OnionPerf logs and bandwidth files, to tell relay operators what is wrong with their relay

How much bandwidth does a scanner need?

* 100 Mbps at peak; scaling compensates for higher-bandwidth relays
* one scanner per directory authority, multiple servers per scanner

How can we:

* hide scanning from relays?
* make sure exit and non-exit bandwidths are equivalent, because they're measured differently?
* remove the reliance on self-reported relay bandwidths?
** that's hard, because we measure residual bandwidth, but we want to know overall capacity