Commit 09ea0377 authored by juga

Merge branch 'bug29149_squashed'

Solved conflicts:
	docs/source/specification.rst
parents 2a08a7ba 0fb01281
@@ -19,7 +19,7 @@ Continue reading to install ``sbws`` in other ways.
System requirements
--------------------
- Tor
- Tor (the latest stable version is recommended)
- Python 3 (>= 3.5)
- virtualenv_ (until there is a ``stem`` release newer than 1.6.0, it is
recommended to install the required Python dependencies in a virtualenv)
......
@@ -19,7 +19,7 @@ of the Tor bandwidth authorities, to avoid creating unnecessary traffic.
**ADVICE**: It is recommended to read this documentation at
[Read the Docs](https://sbws.rtfd.io). In
[Github](http://github.com/torproject/sbws) some links won't be properly
[Github](https://github.com/torproject/sbws) some links won't be properly
rendered.
It can also be read after installing the Debian package ``sbws-doc`` in
``/usr/share/doc/sbws`` or after building it locally as explained in
......
@startuml
start
while (no SIGINT/SIGTERM?)
while (next relay to measure?)
:Select a destination;
:Select a second relay;
:Build a circuit;
:HTTP GET (Range-Bytes);
:Store measurement;
endwhile
endwhile
stop
@enduml
\ No newline at end of file
@startuml
start
if (relay to measure is exit?) then (yes)
:obtain non-exits;
else (no)
:obtain exits
without bad flag
that can exit
to port 443;
endif
:potential second relays;
:obtain a relay
from potential
second relays
randomly;
if (second relay has 2x bandwidth?) then (yes)
elseif (other second relay has 1.5x bandwidth?) then (yes)
elseif (other second relay has 1x bandwidth?) then (yes)
else (nothing)
stop
endif
:second relay selected!;
:Build a circuit
with exit as
second hop;
stop
@enduml
\ No newline at end of file
Bandwidth authorities in metrics
=================================
Current bandwidth authorities
-----------------------------
.. image:: images/bwauth.*
:alt: bandwidth authorities in metrics
https://metrics.torproject.org/rs.html
(flag:Authority)
Bandwidth Authorities - Measured Relays past 7 days
---------------------------------------------------
.. image:: images/bwauth_measured_7days.png
:alt: bandwidth measured in the past 7 days
https://consensus-health.torproject.org/graphs.html
Bandwidth Authorities - Measured Relays past 90 days
----------------------------------------------------
.. image:: images/bwauth_measured_90days.png
:alt: bandwidth measured in the past 90 days
https://consensus-health.torproject.org/graphs.html
Relays' bandwidth distribution
===================================
sbws raw measurements compared to Torflow measurements
------------------------------------------------------
.. image:: images/43710932-ac1eeea8-9960-11e8-9e7e-21fddff2f7a3.png
:alt: sbws and torflow raw measurements distribution
.. image:: images/43710933-ac95e0bc-9960-11e8-9aaf-0bb1f83b65e2.png
:alt: sbws and torflow raw measurements distribution 2
sbws linear scaling
--------------------
Each relay's bandwidth is multiplied by ``7500/median``, where ``median`` is
the median bandwidth of all the relays.
See appendix B of the bandwidth_file_spec_ for details about linear scaling.
Code: :func:`sbws.lib.v3bwfile.sbws_scale`
.. image:: images/20180901_163442.png
:alt: sbws linear scaling
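
A minimal Python sketch of the linear scaling described above, assuming the
input is a plain list of per-relay bandwidths. The real
:func:`sbws.lib.v3bwfile.sbws_scale` works on ``V3BWLine`` objects and may
round differently; ``sbws_linear_scale`` is a hypothetical name.

```python
from statistics import median


def sbws_linear_scale(bandwidths, target=7500):
    """Hypothetical sketch: scale bandwidths so that their median becomes `target`.

    Each value is multiplied by target / median, as described above.
    """
    m = median(bandwidths)
    return [round(bw * target / m) for bw in bandwidths]


print(sbws_linear_scale([100, 200, 300]))  # [3750, 7500, 11250]
```

The median relay is always mapped to 7500, so only the relative differences
between relays are preserved, not the absolute speeds measured.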
sbws Torflow scaling
-----------------------
See appendix B of the bandwidth_file_spec_ for details about Torflow scaling.
Code: :func:`sbws.lib.v3bwfile.torflow_scale`
.. image:: images/20180901_164014.png
:alt: sbws torflow scaling
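
The appendix is the authoritative description. As a rough illustration only,
here is a simplified Python sketch of a Torflow-style aggregation, assuming it
works as follows: for each relay, compute the mean of its measurements and a
"filtered" mean (the mean of the measurements at or above that mean), divide
each by the network-wide average of the same quantity, take the larger ratio
and multiply it by the relay's descriptor bandwidth. It omits any capping or
rounding steps that :func:`sbws.lib.v3bwfile.torflow_scale` may apply;
``torflow_style_scale`` and its input format are hypothetical.

```python
from statistics import mean


def torflow_style_scale(relays):
    """Hypothetical sketch of a Torflow-style aggregation.

    `relays` maps a fingerprint to (list of measured bandwidths,
    descriptor bandwidth). Returns fingerprint -> scaled bandwidth.
    """
    stats = {}
    for fp, (measurements, desc_bw) in relays.items():
        bw_mean = mean(measurements)
        # "Filtered" mean: only the measurements at or above the mean.
        bw_filt = mean(m for m in measurements if m >= bw_mean)
        stats[fp] = (bw_mean, bw_filt, desc_bw)
    # Network-wide averages of both quantities.
    mu = mean(s[0] for s in stats.values())
    muf = mean(s[1] for s in stats.values())
    scaled = {}
    for fp, (bw_mean, bw_filt, desc_bw) in stats.items():
        # Use whichever ratio favours the relay more.
        ratio = max(bw_mean / mu, bw_filt / muf)
        scaled[fp] = round(desc_bw * ratio)
    return scaled


print(torflow_style_scale({"A": ([100, 100], 1000), "B": ([300, 300], 1000)}))
# {'A': 500, 'B': 1500}
```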
.. _bandwidth_file_spec: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
@startuml
class RelayList {
stem.Controller _controller
Lock _refresh_lock
int _last_refresh
list @p relays
list @p bad_exits
list @p exits
list @p non_exits
list @p authorities
bool _need_refresh()
_init_relays()
_refresh()
list _relays_with_flag(int flag)
list _relays_without_flag(int flag)
list exits_not_bad_can_exit_to_port(int port)
}
RelayList *-- Relay
class Relay {
stem.RouterStatusEntryV3 _from_ns
stem.RelayDescriptor _from_desc
str @p nickname
str @p fingerprint
list @p flags
ExitPolicy @p exit_policy
str @p address
str @p master_key_ed25519
int @p observed_bandwidth
int @p average_bandwidth
int @p burst_bandwidth
int @p consensus_bandwidth
int @p consensus_bandwidth_is_unmeasured
obj _from_ns(attr)
obj _from_desc(attr)
bool can_exit_to_port(int port)
bool is_exit_not_bad_allowing_port(int port)
}
class RelayPrioritizer {
int fresh_seconds
ResultDump result_dump
RelayList relay_list
bool measure_authorities
generator best_priority()
}
RelayPrioritizer *-- RelayList
RelayPrioritizer *-- ResultDump
Result ^-- ResultError
Result ^-- ResultSuccess
Result -- Destination
class Result {
Result.Relay _relay
list @p circ
str @p dest_url
str @p scanner
int @p time
str @p type
int @p version
str @p nickname
str @p fingerprint
str @p address
str @p master_key_ed25519
int @p relay_observed_bandwidth
int @p relay_average_bandwidth
int @p relay_burst_bandwidth
int @p consensus_bandwidth
int @p consensus_bandwidth_is_unmeasured
dict to_dict()
Result from_dict(dict d)
}
Result -- Relay
Result *-- Result.Relay
class Result.Relay {
str nickname
str fingerprint
str address
str master_key_ed25519
int observed_bandwidth
int average_bandwidth
int burst_bandwidth
int consensus_bandwidth
int consensus_bandwidth_is_unmeasured
}
class ResultError {
str @p msg
}
ResultError ^-- ResultErrorCircuit
class ResultErrorCircuit {
}
ResultError ^-- ResultErrorStream
class ResultSuccess {
list @p rtts
list @p downloads
}
ResultDump *-- Result
ResultDump -- Relay
class ResultDump {
dict data
int fresh_days
str datadir
Lock data_lock
Thread thread
Queue queue
store_result(Result result)
handle_result(Result result)
enter()
list results_for_relay(Relay relay)
}
class DestinationList {
list _rl
Destination next()
DestinationList @sm from_config(...)
}
DestinationList *-- Destination
class Destination {
str @p hostname
int @p port
str @p url
bool @p verify
bool is_usable()
Destination @sm from_config(str conf_section,int max_dl)
}
V3BWHeader -- Result
class V3BWHeader {
int timestamp
str version
str file_created
str latest_bandwidth
int num_lines
str software
str software_version
str generator_started
int number_eligible_relays
int minimum_number_eligible_relays
int number_consensus_relays
int percent_eligible_relays
int minimum_percent_eligible_relays
int @p num_lines
V3BWHeader @cm from_results(dict results)
add_stats(**kwargs)
int @sm earliest_bandwidth_from_results(dict results)
str @sm generator_started_from_file(dict results)
int @sm latest_bandwidth_from_results(dict results)
}
V3BWLine -- Result
class V3BWLine {
int bw
str node_id
str master_key_ed25519
str nick
int rtt
str time
int success
int error_stream
int error_circ
int error_misc
int bw_median
int bw_mean
int desc_bw_avg
int desc_bw_bur
int desc_bw_obs_last
int desc_bw_obs_mean
consensus_bandwidth
consensus_bandwidth_is_unmeasured
int @sm bw_mean_from_results(list results)
int @sm bw_median_from_results(list results)
int @sm desc_bw_obs_last_from_results(list results)
int @sm desc_bw_obs_mean_from_results(list results)
V3BWLine @cm from_results(list results)
str @sm last_time_from_results(list results)
dict @sm result_types_from_results(list results)
list @sm results_away_each_other(list results)
list @sm results_recent_than(list results)
}
V3BWFile *-- V3BWHeader
V3BWFile *-- V3BWLine
V3BWFile -- Result
class V3BWFile {
V3BWHeader header
list bw_lines
@p info_stats
bool @p is_min_perc
int @p max_bw
int @p mean_bw
int @p median_bw
int @p min_bw
int @p num
int @p sum_bw
V3BWFile @cm from_results(dict results, ...)
list @sm bw_kb(bw_lines)
list @sm bw_sbws_scale(bw_lines)
list @sm bw_torflow_scale(bw_lines)
bool @sm is_max_bw_diff_perc_reached(bw_lines)
(dict, bool) @sm measured_progress_stats(bw_lines)
int @sm read_number_consensus_relays(str consensus_path)
(list, list, list) to_plt()
list update_progress(bw_lines, ...)
warn_if_not_accurate_enough(bw_lines, ...)
tuple to_plt(...)
write(str output)
}
CircuitBuilder *-- RelayList
CircuitBuilder -- Relay
class CircuitBuilder {
set built_circuits
RelayList relay_list
list relays
Controller controller
int build_circuit()
void close_circuit()
}
CircuitBuilder ^-- GapsCircuitBuilder
class State {
get()
}
@enduml
\ No newline at end of file
Code design
=================
.. todo::
- Link to refactor proposal.
- Change this page when refactoring is implemented.
UML class diagram
--------------------
.. image:: images/classes_original.*
:alt: UML classes diagram
`classes_original.svg <./_images/classes_original.svg>`_
Packages diagram
-----------------
.. image:: ./images/packages_sbws.*
:alt: packages diagram
`packages_sbws.svg <./_images/packages_sbws.svg>`_
scanner threads
----------------
- `TorEventListener`: the thread that runs Tor and listens for events.
- ResultDump: the thread that gets the measurement results from a queue
every second.
- `multiprocessing.pool.ThreadPool` starts 3 internal threads:
- workers_thread
- tasks_thread
- results_thread
- measurement threads: they execute :func:`sbws.core.scanner.measure_relay`.
By default, there are at most 3 of them.
.. image:: images/threads.*
:alt: scanner threads
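
The thread layout above can be modelled with the standard library alone. This
is a hypothetical, simplified sketch, not sbws code: ``measure_relay`` here is
a stand-in that returns a fake result, the pool size of 3 mirrors the default
mentioned above, and the dump thread plays the role of ResultDump by polling a
queue with a 1 second timeout. ``scan`` and ``result_dump_loop`` are
hypothetical names.

```python
import queue
import threading
import time
from multiprocessing.pool import ThreadPool


def measure_relay(fingerprint):
    """Hypothetical stand-in for sbws.core.scanner.measure_relay."""
    time.sleep(0.01)  # pretend to build a circuit and download data
    return {"fingerprint": fingerprint, "bw": 1000 * len(fingerprint)}


def result_dump_loop(results_queue, data, stop):
    """Play the role of the ResultDump thread.

    Waits at most 1 second per attempt, so it notices `stop` promptly,
    and drains any remaining results before exiting.
    """
    while not stop.is_set() or not results_queue.empty():
        try:
            result = results_queue.get(timeout=1)
        except queue.Empty:
            continue
        data.setdefault(result["fingerprint"], []).append(result)


def scan(fingerprints, max_pending_results=3):
    results_queue = queue.Queue()
    data = {}
    stop = threading.Event()
    dumper = threading.Thread(target=result_dump_loop,
                              args=(results_queue, data, stop))
    dumper.start()
    # ThreadPool runs measure_relay in up to `max_pending_results` worker
    # threads; its internal result-handler thread invokes the callback,
    # which hands each result to the dump thread through the queue.
    with ThreadPool(max_pending_results) as pool:
        pending = [pool.apply_async(measure_relay, (fp,),
                                    callback=results_queue.put)
                   for fp in fingerprints]
        for async_result in pending:
            async_result.wait()
    stop.set()
    dumper.join()
    return data
```

Passing results through a queue means only the dump thread ever writes to
``data``, so the measurement threads never touch it directly.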
Critical sections
-----------------
Data structures that are read or written by the threads.
.. image:: images/critical_sections.*
:alt: scanner critical sections
:height: 400px
:align: center
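
A common way to protect such shared data is a ``threading.Lock`` held by both
writers and readers. A minimal sketch, assuming a dict of results keyed by
relay fingerprint similar to ``ResultDump.data`` and ``data_lock`` in the class
diagram; ``ResultStore`` and ``hammer`` are hypothetical names.

```python
import threading


class ResultStore:
    """Hypothetical stand-in for data shared between scanner threads."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def store_result(self, fingerprint, result):
        # Writer side: mutate the shared dict only while holding the lock.
        with self._lock:
            self._data.setdefault(fingerprint, []).append(result)

    def results_for_relay(self, fingerprint):
        # Reader side: return a copy taken under the lock, so the caller
        # never iterates over a list another thread is appending to.
        with self._lock:
            return list(self._data.get(fingerprint, []))


def hammer(num_threads=4, per_thread=1000):
    """Have several threads write concurrently, then count what was stored."""
    store = ResultStore()

    def writer():
        for i in range(per_thread):
            store.store_result("relay", i)

    threads = [threading.Thread(target=writer) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(store.results_for_relay("relay"))


print(hammer())  # 4000
```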
Call graph
--------------
Calls made from initialization up to the moment the measurement threads start.
.. image:: images/pycallgraph.png
:alt: call graph
:height: 400px
:align: center
`callgraph.png <./_images/pycallgraph.png>`_
@@ -49,7 +49,8 @@ extensions = [
'sphinx.ext.coverage',
'sphinx.ext.githubpages',
'sphinx.ext.imgmath',
'sphinx.ext.intersphinx'
'sphinx.ext.intersphinx',
'sphinx.ext.viewcode'
]
# Add any paths that contain templates here, relative to this directory.
@@ -199,3 +200,5 @@ todo_include_todos = True
source_parsers = {
'.md': 'recommonmark.parser.CommonMarkParser',
}
numfig = True
.. _config_internal:
How sbws configuration works internally
----------------------------------------
Internal code configuration files
==================================
sbws reads two default config files: a general one and one specific to
logging.
Both are combined internally into the same ``conf`` structure.
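
A minimal sketch of how several INI files can be merged into one ``conf``
object with Python's ``configparser``, assuming that is how the files are
combined. The file contents, section and option names, and the logging file
name below are hypothetical.

```python
from configparser import ConfigParser

# Hypothetical file contents, for illustration only.
DEFAULT_CONF = """
[paths]
datadir = ~/.sbws/datadir

[scanner]
num_downloads = 5
"""

DEFAULT_LOG_CONF = """
[loggers]
keys = root
"""


def load_conf(user_conf_text=None):
    """Merge the default files and an optional user file into one object."""
    conf = ConfigParser()
    conf.read_string(DEFAULT_CONF, source="config.default.ini")
    conf.read_string(DEFAULT_LOG_CONF, source="config.log.default.ini")
    if user_conf_text:
        # Values read later override earlier ones for the same section/key.
        conf.read_string(user_conf_text, source="user config")
    return conf


conf = load_conf("[scanner]\nnum_downloads = 10\n")
print(conf.getint("scanner", "num_downloads"))  # 10
print(conf.get("paths", "datadir"))  # ~/.sbws/datadir
```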
@@ -34,8 +34,8 @@ The user example config file provided by ``sbws`` might look like this.
.. _default-config:
Default Config
--------------
Default Configuration
----------------------
.. literalinclude:: config.default.ini
:caption: config.default.ini
......
.. _config_tor:
sbws scanner tor configuration
-------------------------------
Internal Tor configuration for the scanner
------------------------------------------
At the time of writing, sbws sets the following torrc options for the following
reasons when it launches Tor. You can find them in ``sbws/globals.py`` and
``sbws/util/stem.py``.
The scanner needs a specific Tor configuration.
The following options are either set when launching Tor or required when
connecting to an existing Tor daemon.
Default configuration:
- ``SocksPort auto``: To proxy requests over Tor.
- ``CookieAuthentication 1``: The easiest way to authenticate to Tor.
- ``LearnCircuitBuildTimeout 0``: To keep circuit build timeouts static.
- ``CircuitBuildTimeout 10``: To give up on struggling circuits sooner.
- ``UseEntryGuards 0``: To avoid path bias warnings.
- ``DataDirectory ...``: To set Tor's datadirectory to be inside sbws's.
- ``PidFile ...``: To make it easier to tell if Tor is running.
- ``ControlSocket ...``: To control Tor.
- ``Log notice ...``: To know what the heck is going on.
- ``UseMicrodescriptors 0``: Because full server descriptors are needed.
- ``SafeLogging 0``: Useful for logging, since there's no need for anonymity.
- ``LogTimeGranularity 1``
- ``ProtocolWarnings 1``
- ``LearnCircuitBuildTimeout 0``: To keep circuit build timeouts static.
Configuration that depends on the user configuration file:
- ``CircuitBuildTimeout ...``: The timeout trying to build a circuit.
- ``DataDirectory ...``: The Tor data directory path.
- ``PidFile ...``: The Tor PID file path.
- ``ControlSocket ...``: The Tor control socket path.
- ``Log notice ...``: The Tor log level and path.
Configuration that needs to be set on runtime:
- ``__DisablePredictedCircuits 1``: To build custom circuits.
- ``__LeaveStreamsUnattached 1``
Currently most of the code that sets this configuration is in
:func:`sbws.util.stem.launch_tor`, and the default configuration is in
``sbws/globals.py``.
.. note:: The location of this code is being refactored.
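
A hypothetical sketch of how the option groups listed above could be put
together. ``build_torrc`` returns a dict in the shape accepted by the
``config`` argument of stem's ``stem.process.launch_tor_with_config``, and
``apply_runtime_options`` uses ``Controller.set_conf`` for the options that
must be set at runtime. Both function names and the file names inside
``tor_dir`` are assumptions, not sbws's actual code.

```python
import os

# Options that do not depend on the user configuration.
DEFAULT_TORRC = {
    "SocksPort": "auto",
    "CookieAuthentication": "1",
    "UseEntryGuards": "0",
    "UseMicrodescriptors": "0",
    "SafeLogging": "0",
    "LogTimeGranularity": "1",
    "ProtocolWarnings": "1",
    "LearnCircuitBuildTimeout": "0",
}

# Options set through the control port once Tor is running.
RUNTIME_TORRC = {
    "__DisablePredictedCircuits": "1",
    "__LeaveStreamsUnattached": "1",
}


def build_torrc(tor_dir, circuit_build_timeout=10):
    """Combine the defaults with options derived from the user config.

    The file names inside `tor_dir` are hypothetical.
    """
    torrc = dict(DEFAULT_TORRC)
    torrc.update({
        "CircuitBuildTimeout": str(circuit_build_timeout),
        "DataDirectory": tor_dir,
        "PidFile": os.path.join(tor_dir, "tor.pid"),
        "ControlSocket": os.path.join(tor_dir, "control"),
        "Log": "notice file " + os.path.join(tor_dir, "notice.log"),
    })
    return torrc


def apply_runtime_options(controller):
    """`controller` is expected to behave like stem.control.Controller."""
    for name, value in RUNTIME_TORRC.items():
        controller.set_conf(name, value)
```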
\ No newline at end of file
UML diagrams
=============
Class Diagram
--------------------
.. image:: ./images/classes_sbws.*
`classes_sbws.svg <./_images/classes_sbws.svg>`_
Packages diagram
-----------------
.. image:: ./images/packages_sbws.*
`packages_sbws.svg <./_images/packages_sbws.svg>`_
\ No newline at end of file
.. _documenting:
Installing documentation dependencies and building it
---------------------------------------------------------
Installing and building the documentation
-----------------------------------------
To build the documentation, extra Python dependencies are needed:
......
What the scanner and the generator do
======================================
Running the scanner
-----------------------
Overview
~~~~~~~~~
The :term:`scanner` obtains a list of relays from the Tor network.
It measures the bandwidth of each relay by building a two-hop circuit that
includes the relay to measure and downloading data from a
:term:`destination` Web server.
The :term:`generator` creates a :term:`bandwidth list file` that is read
by a :term:`directory authority` and used to report relays' bandwidth in its
vote.
.. image:: ./images/scanner.svg
:height: 200px
:align: center
Initialization
~~~~~~~~~~~~~~
.. At some point it should be able to get environment variables
#. Parse the command line arguments and configuration files.
#. Launch a Tor thread with a specific configuration, or connect to a Tor
daemon that is already running with a suitable configuration.
#. Obtain the list of relays in the Tor network from the Tor consensus and
descriptor documents.
#. Read and parse the old bandwidth measurements stored in the file system.
#. Select a subset of the relays to be measured next, ordered by:
#. relays that have not been measured yet.
#. the age of their measurements.
.. image:: ./images/use_cases_data_sources.svg
:alt: data sources
:height: 200px
:align: center
Classes used in the initialization:
.. image:: ./images/use_cases_classes.svg
:alt: classes initializing data
:height: 300px
:align: center
Source code: :func:`sbws.core.scanner.run_speedtest`
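
The ordering in the last initialization step can be sketched as a sort key.
This is a hypothetical simplification, assuming relays with older measurements
are measured first; the real ``RelayPrioritizer.best_priority`` may take other
factors into account. ``prioritize`` and the timestamp mapping are
hypothetical.

```python
def prioritize(fingerprints, last_measured):
    """Order relays for measurement.

    `last_measured` maps a fingerprint to the Unix timestamp of its most
    recent measurement; relays missing from it have never been measured.
    """
    def priority(fp):
        ts = last_measured.get(fp)
        # Tuples compare element by element: (False, ...) sorts before
        # (True, ...), so unmeasured relays come first, then oldest first.
        return (ts is not None, ts or 0)
    return sorted(fingerprints, key=priority)


print(prioritize(["A", "B", "C", "D"], {"A": 300, "B": 100, "D": 200}))
# ['C', 'B', 'D', 'A']
```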
Measuring relays
~~~~~~~~~~~~~~~~~
#. For every relay:
#. Select a second relay to build a Tor circuit.
#. Build the circuit.
#. Make HTTPS GET requests to the Web server over the circuit.
#. Store the time the request took and the number of bytes requested.
.. image:: ./images/activity_all.svg
:alt: activity measuring relays
:height: 300px
:align: center
Source code: :func:`sbws.core.scanner.measure_relay`
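
The last two steps above amount to timing each request and recording how much
data it returned. A minimal sketch, assuming the HTTPS GET over the circuit is
wrapped in a ``fetch(start, end)`` callable; ``timed_download`` and
``fake_fetch`` are hypothetical, and ``fake_fetch`` only simulates a download.

```python
import time


def timed_download(fetch, start, end):
    """Download bytes start..end (inclusive) with `fetch` and record what
    the step above stores: the amount of data and the time it took."""
    t0 = time.monotonic()
    data = fetch(start, end)
    duration = time.monotonic() - t0
    return {"amount": len(data), "duration": duration}


def fake_fetch(start, end):
    # Stand-in for an HTTPS GET with a "Range: bytes=start-end" header
    # sent over the Tor circuit.
    time.sleep(0.05)
    return b"x" * (end - start + 1)


result = timed_download(fake_fetch, 0, 1023)
print(result["amount"])  # 1024
print(result["amount"] / result["duration"])  # bytes per second (varies)
```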
Selecting a second relay
~~~~~~~~~~~~~~~~~~~~~~~~
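#. If the relay to measure is an exit, use it as the exit and obtain the
non-exits.
#. If the relay to measure is not an exit, use it as the first hop and obtain
the exits that do not have the BadExit flag and can exit to port 443.
#. From the non-exits or exits, randomly select one that has at least twice
the consensus bandwidth of the relay to measure.
#. If no relay satisfies this, lower the required bandwidth (to 1.5 times and
then to 1 time the consensus bandwidth of the relay to measure). If there is
still no suitable relay, give up on measuring it.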
.. image:: ./images/activity_second_relay.svg
:alt: activity select second relay
:height: 400px
:align: center
Source code: :func:`sbws.core.scanner.measure_relay`
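
The selection steps above can be sketched as follows, assuming relays are
simple records with a consensus bandwidth and an exit flag, and that the list
of exits has already been filtered as described. ``Relay`` and
``select_second_relay`` are hypothetical; the 2, 1.5 and 1 multipliers come
from the activity diagram.

```python
import random
from collections import namedtuple

# Hypothetical minimal relay record.
Relay = namedtuple("Relay", "nickname consensus_bandwidth is_exit")


def select_second_relay(relay, exits, non_exits, rng=random):
    """Pick the other hop for a two-hop circuit, or None if none fits.

    `exits` is assumed to be already filtered to exits without the BadExit
    flag that can exit to port 443.
    """
    # An exit under test is used as the exit, so the other hop is a non-exit;
    # otherwise the relay is the first hop and the other hop is an exit.
    candidates = non_exits if relay.is_exit else exits
    for multiplier in (2, 1.5, 1):
        eligible = [c for c in candidates
                    if c.consensus_bandwidth >= multiplier * relay.consensus_bandwidth]
        if eligible:
            return rng.choice(eligible)
    return None
```

Preferring a faster second hop makes it more likely that the relay being
measured, rather than its partner, is the bottleneck of the circuit.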
Selecting the data to download
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#. While the downloaded data is smaller than 1GB or fewer than 5 downloads
have been made:
#. Select a random 16MiB range.
#. If the download takes less than 5 seconds, select a bigger range and
discard the result.
#. If it takes more than 10 seconds, select a smaller range and discard the
result.
#. Store the number of bytes downloaded and the time it took.
Source code: :func:`sbws.core.scanner._should_keep_result`
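
A hypothetical sketch of the keep/adjust logic above and of building the HTTP
``Range`` header for a random range. The 5 and 10 second bounds and the 16 MiB
starting size come from the steps above; doubling or halving the range is an
assumption made for illustration, and the function names only resemble, and
are not, sbws's own.

```python
import random

MIN_DL_TIME = 5    # seconds; faster downloads are discarded
MAX_DL_TIME = 10   # seconds; slower downloads are discarded
INITIAL_SIZE = 16 * 1024 * 1024  # 16 MiB


def should_keep_result(duration):
    """Keep only downloads that took between 5 and 10 seconds."""
    return MIN_DL_TIME <= duration <= MAX_DL_TIME


def next_size(size, duration):
    """Hypothetical adjustment: double the range if too fast, halve if too slow."""
    if duration < MIN_DL_TIME:
        return size * 2
    if duration > MAX_DL_TIME:
        return max(1, size // 2)
    return size


def random_range_header(file_size, size, rng=random):
    """Build an HTTP Range header for `size` random bytes of the file.

    Requires file_size >= size. HTTP byte ranges include both ends.
    """
    start = rng.randint(0, file_size - size)
    return "bytes={}-{}".format(start, start + size - 1)


print(random_range_header(INITIAL_SIZE, INITIAL_SIZE))  # bytes=0-16777215
```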
Writing the measurements to the filesystem
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~