= sbws bandwidth scanner presentation =

See also juga's slides: https://juga0.github.io/tor_hackweek_bandwidth_slides/

== Why bandwidth scanners? ==

Relays can lie about their bandwidths, so the directory authorities need independent measurements.

== How sbws works ==

=== Threads ===

sbws runs these threads:

* main thread
* Tor event listener (stem)
* ResultDump (stores measurements to result files)
* standard Python threads
* scanner threads: a target of 3 threads, to measure 3 relays at a time

Critical sections for threads:

* refresh: relay list
* relay priority: relay list
* measure relay: relay list, etc.

=== Measurement ===

sbws gets a list of relays from the consensus, and scans those relays. The list updates every few minutes.

sbws builds two-hop paths from the scanner to a web server via an entry and an exit:

* select a target relay
* select the other half of the 2-hop path (an exit for an entry target, or an entry for an exit target)
* choose a helper relay that is faster than the target
* exits must exit to port 443 and must not be a BadExit; otherwise they are used as entries

Measure the speed:

* find the right file size to get a reasonable measurement (16 MB - 1 GB)
* measure and store the results

The results are stored as lines of JSON, one measurement per line.

=== Generate ===

Every hour, the scanner generates a results file according to the Tor bandwidth file spec.

The results are filtered:

* ignore results older than 5 days
* ignore relays with fewer than 2 measurements
* ignore relays whose first and last measurements are less than 24 hours apart

==== Scaling ====

Scale each relay's self-reported bandwidth by its measured bandwidth, relative to the rest of the network.

See the bandwidth file spec, section "Torflow Scaling".

==== Format the bandwidth file ====

Header:

* timestamp
* optional metadata

Results:

* one relay per line: relay id and bandwidth, plus other keys

== Questions ==

It takes 24 hours to scan the entire network.

How many measurements should we have for a relay before we vote for it?

* Against voting early: one result can be inaccurate, and we don't want to send lots of clients to a new relay
* For voting early: it takes a long time to measure a new relay, and relay operators get disappointed waiting

* Proposal: vote for all relays, but cap early measurements (and cap relays with few measurements?)
* Proposal: start with a file size that depends on the relay's bandwidth
* Proposal: stop the download when you have learned enough, or when the file takes too long

What is the minimum number of relays in a bandwidth file?

* Against: a network with one measured relay is a sad network
* For: a network with no bandwidth votes is a sad network

* Proposal: the scanner must be running for at least 24 hours before it publishes a file

* Proposal: try to keep a result for every download, even if it was too fast or too slow
* Proposal: speed up relay measurements by reducing retries
* Proposal: increase the number of threads, based on the available bandwidth
* Proposal: deploy sbws on every directory authority

What diagnostic information do we need for failed relays?

* list categories of failures in relay bandwidth lines
* votes contain the bandwidth file headers and a hash of the file
* a DirPort URL for downloading the current bandwidth file
* Proposal: a tool that analyses OnionPerf logs and bandwidth files, to tell relay operators what is wrong with their relay

How much bandwidth does a scanner need?

* 100 Mbps at peak; scaling compensates for higher-bandwidth relays
* one scanner per directory authority, multiple servers per scanner

How can we:

* hide scanning from relays?
* make sure exit and non-exit bandwidths are equivalent, because they're measured differently?
* remove the reliance on self-reported relay bandwidths?
** that's hard, because we measure residual bandwidth, but we want to know overall capacity