Commit 09ea0377 authored by juga

Merge branch 'bug29149_squashed'

Solved conflicts:
parents 2a08a7ba 0fb01281
......@@ -19,7 +19,7 @@ Continue reading to install ``sbws`` in other ways.
System requirements
- Tor
- Tor (last stable version is recommended)
- Python 3 (>= 3.5)
- virtualenv_ (while there is not ``stem`` release > 1.6.0, it is
recommended to install the required python dependencies in a virtualenv)
......@@ -19,7 +19,7 @@ of the Tor bandwidth authorities, to avoid creating unnecessary traffic.
**ADVICE**: It is recommended to read this documentation at
[Read the Docs]( In
[Github]( some links won't be properly
It can also be read after installing the Debian package ``sbws-doc`` in
``/usr/share/doc/sbws`` or after building it locally as explained in
while (no SIGINT/SIGTERM?)
while (next relay to measure?)
:Select a destination;
:Select a second relay;
:Build a circuit;
:HTTP GET (Range-Bytes);
:Store measurement;
\ No newline at end of file
if (relay to measure is exit?) then (yes)
:obtain non-exits;
else (no)
:obtain exits
without bad flag
that can exit
to port 443;
:potential second relays;
:obtain a relay
from potential
second relays
if (second relay has 2x bandwidth?) then (yes)
elseif (other second relay has 1.5x bandwidth?) then (yes)
elseif (other second relay has 1x bandwidth?) then (yes)
else (nothing)
:second relay selected!;
:Build a circuit
with exit as
second hop;
\ No newline at end of file
Bandwidth authorities in metrics
Current bandwidth authorities
.. image:: images/bwauth.*
:alt: bandwidth authorities in metrics
Bandwidth Authorities - Measured Relays past 7 days
.. image:: images/bwauth_measured_7days.png
:alt: bandwidth measured in the past 7 days
Bandwidth Authorities - Measured Relays past 90 days
.. image:: images/bwauth_measured_90days.png
:alt: bandwidth measured in the past 90 days
Relays' bandwidth distribution
sbws raw measurements compared to Torflow measurements
.. image:: images/43710932-ac1eeea8-9960-11e8-9e7e-21fddff2f7a3.png
:alt: sbws and torflow raw measurements distribution
.. image:: images/43710933-ac95e0bc-9960-11e8-9aaf-0bb1f83b65e2.png
:alt: sbws and torflow raw measurements distribution 2
sbws linear scaling
Multiply each relay bandwidth by ``7500/median``
See bandwidth_file_spec_ appendix B to learn about linear scaling.
Code: :func:`sbws.lib.v3bwfile.sbws_scale`
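As a rough illustration of the rule above, the following sketch multiplies each bandwidth by ``7500/median``. The helper name ``linear_scale``, the sample values and the rounding to integers are assumptions made for this example; they are not taken from :func:`sbws.lib.v3bwfile.sbws_scale`.

```python
from statistics import median

SCALE_CONSTANT = 7500  # constant from the rule above


def linear_scale(bandwidths, scale_constant=SCALE_CONSTANT):
    """Multiply every bandwidth by ``scale_constant / median``.

    Hypothetical helper for illustration; rounding is an assumption.
    """
    if not bandwidths:
        return []
    m = median(bandwidths)
    if m == 0:
        raise ValueError("median bandwidth is zero, cannot scale")
    return [round(bw * scale_constant / m) for bw in bandwidths]


raw = [100, 200, 400]        # made-up raw measurements
scaled = linear_scale(raw)   # the median relay ends up at 7500
```

With this rule the relay at the median always maps to 7500, and every other relay keeps its ratio to the median.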
.. image:: images/20180901_163442.png
:alt: sbws linear scaling
sbws Torflow scaling
See bandwidth_file_spec_ appendix B to learn about Torflow scaling.
Code: :func:`sbws.lib.v3bwfile.torflow_scale`
.. image:: images/20180901_164014.png
:alt: sbws torflow scaling
.. _bandwidth_file_spec:
class RelayList {
stem.Controller _controller
Lock _refresh_lock
int _last_refresh
list @p relays
list @p bad_exits
list @p exits
list @p non_exits
list @p authorities
bool _need_refresh()
list _relays_with_flag(int flag)
list _relays_without_flag(int flag)
list exits_not_bad_can_exit_to_port(int port)
RelayList *-- Relay
class Relay {
stem.RouterStatusEntryV3 _from_ns
stem.RelayDescriptor _from_desc
str @p nickname
str @p fingerprint
list @p flags
ExitPolicy @p exit_policy
str @p address
str @p master_key_ed25519
int @p observed_bandwidth
int @p average_bandwidth
int @p burst_bandwidth
int @p consensus_bandwidth
int @p consensus_bandwidth_is_unmeasured
obj _from_ns(attr)
obj _from_desc(attr)
bool can_exit_to_port(int port)
bool is_exit_not_bad_allowing_port(int port)
class RelayPrioritizer {
int fresh_seconds
ResultDump result_dump
RelayList relay_list
bool measure_authorities
generator best_priority()
RelayPrioritizer *-- RelayList
RelayPrioritizer *-- ResultDump
Result ^-- ResultError
Result ^-- ResultSuccess
Result -- Destination
class Result {
Result.Relay _relay
list @p circ
str @p dest_url
str @p scanner
int @p time
str @p type
int @p version
str @p nickname
str @p fingerprint
str @p address
str @p master_key_ed25519
int @p relay_observed_bandwidth
int @p relay_average_bandwidth
int @p relay_burst_bandwidth
int @p consensus_bandwidth
int @p consensus_bandwidth_is_unmeasured
dict to_dict()
Result from_dict(dict d)
Result -- Relay
Result *-- Result.Relay
class Result.Relay {
str nickname
str fingerprint
str address
str master_key_ed25519
int observed_bandwidth
int average_bandwidth
int burst_bandwidth
int consensus_bandwidth
int consensus_bandwidth_is_unmeasured
class ResultError {
str @p msg
ResultError ^-- ResultErrorCircuit
class ResultErrorCircuit {
ResultError ^-- ResultErrorStream
class ResultSuccess {
list @p rtts
list @p downloads
ResultDump *-- Result
ResultDump -- Relay
class ResultDump {
dict data
int fresh_days
str datadir
Lock data_lock
Thread thread
Queue queue
store_result(Result result)
handle_result(Result result)
list results_for_relay(Relay relay)
class DestinationList {
list _rl
Destination next()
DestinationList @sm from_config(...)
DestinationList *-- Destination
class Destination {
str @p hostname
int @p port
str @p url
bool @p verify
bool is_usable()
Destination @sm from_config(str conf_section,int max_dl)
V3BWHeader -- Result
class V3BWHeader {
int timestamp
str version
str file_created
str latest_bandwidth
int num_lines
str software
str software_version
str generator_started
int number_eligible_relays
int minimum_number_eligible_relays
int number_consensus_relays
int percent_eligible_relays
int minimum_percent_eligible_relays
int @p num_lines
V3BWHeader @cm from_results(dict results)
int @sm earliest_bandwidth_from_results(dict results)
str @sm generator_started_from_file(dict results)
int @sm latest_bandwidth_from_results(dict results)
V3BWLine -- Result
class V3BWLine {
int bw
str node_id
str master_key_ed25519
str nick
int rtt
str time
int success
int error_stream
int error_circ
int error_misc
int bw_median
int bw_mean
int desc_bw_avg
int desc_bw_bur
int desc_bw_obs_last
int desc_bw_obs_mean
int @sm bw_mean_from_results(list results)
int @sm bw_median_from_results(list results)
int @sm desc_bw_obs_last_from_results(list results)
int @sm desc_bw_obs_mean_from_results(list results)
V3BWLine @cm from_results(list results)
str @sm last_time_from_results(list results)
dict @sm result_types_from_results(list results)
list @sm results_away_each_other(list results)
list @sm results_recent_than(list results)
V3BWFile *-- V3BWHeader
V3BWFile *-- V3BWLine
V3BWFile -- Result
class V3BWFile {
V3BWHeader header
list bw_lines
@p info_stats
bool @p is_min_perc
int @p max_bw
int @p mean_bw
int @p median_bw
int @p min_bw
int @p num
int @p sum_bw
V3BWFile @cm from_results(dict results, ...)
list @sm bw_kb(bw_lines)
list @sm bw_sbws_scale(bw_lines)
list @sm bw_torflow_scale(bw_lines)
bool @sm is_max_bw_diff_perc_reached(bw_lines)
(dict, bool) @sm measured_progress_stats(bw_lines)
int @sm read_number_consensus_relays(str consensus_path)
(list, list, list) to_plt()
list update_progress(bw_lines, ...)
warn_if_not_accurate_enough(bw_lines, ...)
tuple to_plt(...)
write(str output)
CircuitBuilder *-- RelayList
CircuitBuilder -- Relay
class CircuitBuilder {
set built_circuits
RelayList relay_list
list relays
Controller controller
int build_circuit()
void close_circuit()
CircuitBuilder ^-- GapsCircuitBuilder
class State {
\ No newline at end of file
Code design
.. todo::
- Link to refactor proposal.
- Change this page when refactoring is implemented.
UML classes diagram
.. image:: images/classes_original.*
:alt: UML classes diagram
`classes_original.svg <./_images/classes_original.svg>`_
Packages diagram
.. image:: ./images/packages_sbws.*
:alt: packages diagram
`packages_sbws.svg <./_images/packages_sbws.svg>`_
scanner threads
- `TorEventListener`: the thread that runs Tor and listens for events.
- ResultDump: the thread that gets the measurement results from a queue
every second.
- `multiprocessing.pool.ThreadPool` starts 3 independent threads:
- workers_thread
- tasks_thread
- results_thread
- measurement threads: they execute :func:`sbws.core.scanner.measure_relay`.
By default there are at most 3.
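The way a pool limited to 3 measurement threads can be driven is sketched below. This is only an illustration: ``measure`` is a placeholder that sleeps instead of calling :func:`sbws.core.scanner.measure_relay`, and ``run_measurements`` and the relay names are hypothetical.

```python
import time
from multiprocessing.pool import ThreadPool

MAX_MEASUREMENT_THREADS = 3  # default maximum mentioned above


def measure(relay_nickname):
    """Stand-in for a measurement; sleeps instead of using the network."""
    time.sleep(0.01)
    return relay_nickname, "measured"


def run_measurements(relays, workers=MAX_MEASUREMENT_THREADS):
    """Dispatch one task per relay and collect the results in order."""
    with ThreadPool(workers) as pool:
        pending = [pool.apply_async(measure, (r,)) for r in relays]
        # Wait for every task before the pool is torn down.
        return [p.get(timeout=5) for p in pending]


results = run_measurements(["relayA", "relayB", "relayC", "relayD"])
```

At most ``workers`` calls to ``measure`` run at the same time; the remaining tasks wait in the pool's internal queue.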
.. image:: images/threads.*
:alt: scanner threads
Critical sections
Data types that are read or written by the threads.
.. image:: images/critical_sections.*
:alt: scanner critical sections
:height: 400px
:align: center
Call graph
Initialization calls to the moment where the measurement threads start.
.. image:: images/pycallgraph.png
:alt: call graph
:height: 400px
:align: center
`callgraph.png <./_images/pycallgraph.png>`_
......@@ -49,7 +49,8 @@ extensions = [
# Add any paths that contain templates here, relative to this directory.
......@@ -199,3 +200,5 @@ todo_include_todos = True
source_parsers = {
'.md': 'recommonmark.parser.CommonMarkParser',
numfig = True
.. _config_internal:
How sbws configuration works internally
Internal code configuration files
Sbws has two default config files it reads: one general, and one specific to
They all get combined internally to the same ``conf`` structure.
......@@ -34,8 +34,8 @@ The user example config file provided by ``sbws`` might look like this.
.. _default-config:
Default Config
Default Configuration
.. literalinclude:: config.default.ini
:caption: config.default.ini
.. _config_tor:
sbws scanner tor configuration
Internal Tor configuration for the scanner
At the time of writing, sbws sets the following torrc options for the following
reasons when it launches Tor. You can find them in ``sbws/`` and
The scanner needs a specific Tor configuration.
The following options are either set when launching Tor or required when
connecting to an existing Tor daemon.
Default configuration:
- ``SocksPort auto``: To proxy requests over Tor.
- ``CookieAuthentication 1``: The easiest way to authenticate to Tor.
- ``LearnCircuitBuildTimeout 0``: To keep circuit build timeouts static.
- ``CircuitBuildTimeout 10``: To give up on struggling circuits sooner.
- ``UseEntryGuards 0``: To avoid path bias warnings.
- ``DataDirectory ...``: To set Tor's datadirectory to be inside sbws's.
- ``PidFile ...``: To make it easier to tell if Tor is running.
- ``ControlSocket ...``: To control Tor.
- ``Log notice ...``: To know what the heck is going on.
- ``UseMicrodescriptors 0``: Because full server descriptors are needed.
- ``SafeLogging 0``: Useful for logging, since there's no need for anonymity.
- ``LogTimeGranularity 1``
- ``ProtocolWarnings 1``
- ``LearnCircuitBuildTimeout 0``: To keep circuit build timeouts static.
Configuration that depends on the user configuration file:
- ``CircuitBuildTimeout ...``: The timeout trying to build a circuit.
- ``DataDirectory ...``: The Tor data directory path.
- ``PidFile ...``: The Tor PID file path.
- ``ControlSocket ...``: The Tor control socket path.
- ``Log notice ...``: The Tor log level and path.
Configuration that needs to be set at runtime:
- ``__DisablePredictedCircuits 1``: To build custom circuits.
- ``__LeaveStreamsUnattached 1``
Currently most of the code that sets this configuration is in :func:`sbws.util.stem.launch_tor`
and the default configuration is in ``sbws/``.
.. note:: the location of this code is being refactored.
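A minimal sketch of how the default options and the user-dependent options listed above could be merged into one set of torrc lines. The helper names, the paths and the timeout value are placeholders for this example, not sbws defaults, and this is not the code in :func:`sbws.util.stem.launch_tor`.

```python
# Options from the "Default configuration" list above.
DEFAULT_TORRC = {
    "SocksPort": "auto",
    "CookieAuthentication": "1",
    "UseEntryGuards": "0",
    "UseMicrodescriptors": "0",
    "SafeLogging": "0",
    "LogTimeGranularity": "1",
    "ProtocolWarnings": "1",
    "LearnCircuitBuildTimeout": "0",
}


def build_torrc(user_options):
    """Merge the defaults with user-dependent options; user values win."""
    merged = dict(DEFAULT_TORRC)
    merged.update(user_options)
    return merged


def render_torrc(options):
    """Render options as torrc lines, one ``Key value`` pair per line."""
    return "\n".join("{} {}".format(k, v) for k, v in options.items())


# Placeholder values, chosen only for the example.
user_part = {
    "CircuitBuildTimeout": "10",
    "DataDirectory": "/tmp/example-sbws/tor",
    "ControlSocket": "/tmp/example-sbws/tor/control",
}
torrc = build_torrc(user_part)
text = render_torrc(torrc)
```

A dictionary of this shape is also what ``stem.process.launch_tor_with_config`` accepts as its ``config`` argument. The runtime-only options are left out here because they are set after Tor has started.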
\ No newline at end of file
UML diagrams
Class Diagram
.. image:: ./images/classes_sbws.*
`classes_sbws.svg <./_images/classes_sbws.svg>`_
Packages diagram
.. image:: ./images/packages_sbws.*
`packages_sbws.svg <./_images/packages_sbws.svg>`_
\ No newline at end of file
.. _documenting:
Installing documentation dependendencies and building it
Installing and building the documentation
To build the documentation, extra Python dependencies are needed:
What the scanner and the generator do
Running the scanner
The :term:`scanner` obtains a list of relays from the Tor network.
It measures the bandwidth of each relay by creating a two-hop circuit with the
relay to measure and downloading data from a :term:`destination` Web Server.
The :term:`generator` creates a :term:`bandwidth list file` that is read
by a :term:`directory authority` and used to report relays' bandwidth in its
.. image:: ./images/scanner.svg
:height: 200px
:align: center
.. At some point it should be able to get environment variables
#. Parse the command line arguments and configuration files.
#. Launch a Tor thread with a specific configuration or connect to a
Tor daemon that is already running with a suitable configuration.
#. Obtain the list of relays in the Tor network from the Tor consensus and
descriptor documents.
#. Read and parse the old bandwidth measurements stored in the file system.
#. Select a subset of the relays to be measured next, ordered by:
#. relays not measured.
#. measurements age.
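The ordering in the last step can be sketched as a sort key: relays without any measurement come first, followed by the relays whose latest measurement is oldest. The function ``prioritize``, the fingerprints and the timestamps are hypothetical; the prioritization in sbws may weigh results differently.

```python
import time


def prioritize(relays, last_measured, now=None):
    """Order relays: never measured first, then oldest measurement first.

    ``last_measured`` maps a fingerprint to the Unix time of its latest
    measurement; relays missing from it count as not measured.
    """
    now = time.time() if now is None else now

    def key(fingerprint):
        if fingerprint not in last_measured:
            return (0, 0.0)          # not measured: highest priority
        age = now - last_measured[fingerprint]
        return (1, -age)             # older measurements sort first

    return sorted(relays, key=key)


now = 1000000
order = prioritize(
    ["AAAA", "BBBB", "CCCC", "DDDD"],
    {"AAAA": now - 60, "CCCC": now - 3600},
    now=now,
)
```

Here ``BBBB`` and ``DDDD`` have no measurement and are returned first, then ``CCCC`` (measured an hour ago) and finally ``AAAA`` (measured a minute ago).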
.. image:: ./images/use_cases_data_sources.svg
:alt: data sources
:height: 200px
:align: center
Classes used in the initialization:
.. image:: ./images/use_cases_classes.svg
:alt: classes initializing data
:height: 300px
:align: center
Source code: :func:`sbws.core.scanner.run_speedtest`
Measuring relays
#. For every relay:
#. Select a second relay to build a Tor circuit.
#. Build the circuit.
#. Make HTTPS GET requests to the Web server over the circuit.
#. Store the time the request took and the number of bytes requested.
.. image:: ./images/activity_all.svg
:alt: activity measuring relays
:height: 300px
:align: center
Source code: :func:`sbws.core.scanner.measure_relay`
Selecting a second relay
#. If the relay to measure is an exit, use it as an exit and obtain the
#. If the relay to measure is not an exit, use it as first hop and obtain
the exits.
#. From non-exits or exits, select one at random from those that have
double the consensus bandwidth of the relay to measure.
#. If there are no relays that satisfy this, lower the required bandwidth.
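The fallback described in the last two steps can be sketched as follows. The factors 2, 1.5 and 1 come from the activity diagram; the function name, the dictionary of candidates and the choice of treating each threshold as "at least" are assumptions of this example, not the code in :func:`sbws.core.scanner.measure_relay`.

```python
import random

BANDWIDTH_FACTORS = (2, 1.5, 1)  # factors taken from the activity diagram


def pick_second_relay(relay_bw, candidates, rng=random):
    """Return a random candidate meeting the highest satisfiable factor.

    ``candidates`` maps a nickname to its consensus bandwidth. Treating
    the threshold as "at least" is an assumption of this sketch.
    """
    for factor in BANDWIDTH_FACTORS:
        fast_enough = sorted(
            name for name, bw in candidates.items()
            if bw >= relay_bw * factor
        )
        if fast_enough:
            return rng.choice(fast_enough)
    return None  # nothing suitable, the caller has to handle this


helper = pick_second_relay(100, {"slow": 50, "fast": 250})
```

Only ``fast`` has at least twice the bandwidth of the measured relay, so it is selected at the first factor; with a candidate list of ``{"medium": 160}`` the selection would succeed at the 1.5 factor instead.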
.. image:: ./images/activity_second_relay.svg
:alt: activity select second relay
:height: 400px
:align: center
Source code: :func:`sbws.core.scanner.measure_relay`
Selecting the data to download
#. While the downloaded data is smaller than 1GB or the number of downloads
is lower than 5:
#. Randomly, select a 16MiB range.
#. If it takes less than 5 seconds, select a bigger range and don't keep any
#. If it takes more than 10 seconds, select a smaller range and don't keep any
#. Store the number of bytes downloaded and the time it took.
Source code: :func:`sbws.core.scanner._should_keep_result`
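The size adjustment in the steps above can be sketched like this. The 16 MiB starting size and the 5 and 10 second limits come from the text; doubling and halving the size, as well as the names ``next_size`` and ``random_range_header``, are assumptions of this example and are not taken from :func:`sbws.core.scanner._should_keep_result`.

```python
import random

INITIAL_SIZE = 16 * 1024 * 1024  # 16 MiB, as in the steps above
MIN_SECONDS = 5                  # faster than this: try a bigger range
MAX_SECONDS = 10                 # slower than this: try a smaller range


def next_size(size, seconds):
    """Return (new_size, keep) for a download that took ``seconds``.

    Doubling and halving are assumptions; the steps above only say
    "bigger" and "smaller".
    """
    if seconds < MIN_SECONDS:
        return size * 2, False
    if seconds > MAX_SECONDS:
        return max(1, size // 2), False
    return size, True


def random_range_header(content_length, size, rng=random):
    """Build an HTTP ``Range`` header for a random ``size``-byte window."""
    size = min(size, content_length)
    start = rng.randint(0, content_length - size)
    # Byte ranges are inclusive, hence the ``- 1`` on the end offset.
    return {"Range": "bytes={}-{}".format(start, start + size - 1)}
```

A download that finishes in 7 seconds keeps both its size and its result, while one that finishes in 2 seconds is discarded and retried with a range twice as large.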