= How to Run and Troubleshoot a Bandwidth-Measuring Directory Authority =
# So You Want to Fix the Tor Network
# - or -
# How to Run and Troubleshoot a Bandwidth-Measuring Directory Authority
== How Do Bandwidth Authorities Work? ==
## How Do Bandwidth Authorities Work?
Bandwidth authorities measure relay capacity. Then they send their results to a directory authority, and the directory authority puts the results in its vote. The directory authority votes change the consensus weights of relays.
If your bandwidth authority isn't used for voting or testing, it's just wasting bandwidth.
== Notes ==
## Notes
These instructions are as of commit e268151aaa1436a8ce2d4959d1a48e69368dbf3d but probably apply anyway.
== Setup ==
## Setup
Check out [https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.BwAuthorities the readme] for setup instructions. On an Ubuntu 16.04 machine, setup.sh worked quite well. (Minor hiccups were encountered due to missing packages, and the recovery from those was only moderately confusing - it amounted to rm-ing directories and just starting the script over again.)
Check out [the readme](https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.BwAuthorities) for setup instructions. On an Ubuntu 16.04 machine, setup.sh worked quite well. (Minor hiccups were encountered due to missing packages, and the recovery from those was only moderately confusing - it amounted to rm-ing directories and just starting the script over again.)
On *BSD or macOS, setup.sh probably won't work at all.
== Configuring and Running ==
## Configuring and Running
Are you using your own data file server, or the default? It would be better to run your own. To do this you'll have to set up a server and then edit this line in bwauthority_child.py:
This list of files gets very big. I manually tar and compress them once a month; I don't have a script to do that yet.
== Sanity Checking ==
## Sanity Checking
Watch the output of 'data/aggregate-debug.log' - you should see the percentages creep upwards over time, and when you hit 60% you'll start producing a file.
So you've got bandwidth values, but how do you know if they're accurate?
You can check your top 25 relays and see if they come close to [https://atlas.torproject.org/#top10 what Atlas has].
You can check your top 25 relays and see if they come close to [what Atlas has](https://atlas.torproject.org/#top10).
{{{
```
analyze_bwauth_thing() { echo $1 $2 `join $1 $2 | cut -d " " -f 2- | sort -n -r | head -n 200 | python -c 'import sys; d=lambda l : (abs(l[0]-l[1]) / ((l[0]+l[1])/2))*100; lines = [l.split(" ") for l in sys.stdin.readlines()]; lines = [(float(l[0]), float(l[1])) for l in lines]; print "\n".join([str(d(l)) for l in lines]);' | awk '{a+=$1} END{print a/NR}'`}
\ls *.data | python -c 'import sys; import itertools; fi = [f.strip() for f in sys.stdin.readlines()]; c= [l for l in itertools.combinations(fi, 2)]; print "\n".join(["analyze_bwauth_thing " + i[0] + " " + i[1] for i in c]) '
}}}
```
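The shell function above is dense, so here is a rough Python sketch of what it appears to compute: the mean percentage difference between two sets of measurements, restricted to the relays both sets contain and to the top entries by the first set's value. The function name `mean_percent_difference`, the dict inputs, and the `top_n` parameter are made up for illustration. Loading the `.data` files into dicts is left out because their exact layout is not described here.

```python
def mean_percent_difference(first, second, top_n=200):
    """Average symmetric percent difference over relays present in both dicts.

    first, second: dicts mapping a relay key to a numeric bandwidth value
    (hypothetical inputs; the loader for the .data files is not shown).
    """
    # Keep only relays measured in both sets, like `join` does.
    shared = [(first[key], second[key]) for key in first if key in second]
    # Largest values from the first set come first, like `sort -n -r | head`.
    shared.sort(reverse=True)
    # Skip pairs that would divide by zero.
    top = [(a, b) for a, b in shared[:top_n] if a + b > 0]
    if not top:
        return None
    diffs = [abs(a - b) / ((a + b) / 2.0) * 100 for a, b in top]
    return sum(diffs) / len(diffs)
```

A lower result means the two measurement sets agree more closely on the largest relays.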
And you can look at [https://consensus-health.torproject.org/graphs.html the Consensus Health graphs], and see if your bwauth seems sane based on that. (Again, this requires your bwauth to be voting.)
And you can look at [the Consensus Health graphs](https://consensus-health.torproject.org/graphs.html), and see if your bwauth seems sane based on that. (Again, this requires your bwauth to be voting.)
== Monitoring ==
## Monitoring
After the bwauth has been running for a few days, you might wish to set up some sanity checks for it. Tom Ritter uses [https://github.com/tomrittervg/checker checker] for his, specifically with [https://github.com/tomrittervg/checker/blob/master/samplejobs/BWAuthChecker.py this script]. The script checks five things:
After the bwauth has been running for a few days, you might wish to set up some sanity checks for it. Tom Ritter uses [checker](https://github.com/tomrittervg/checker) for his, specifically with [this script](https://github.com/tomrittervg/checker/blob/master/samplejobs/BWAuthChecker.py). The script checks five things:
1. Is the bwauth machine still running (checks Apache)
2. Does the bwauth bandwidths file have a sufficiently recent timestamp?
...
...
@@ -70,44 +70,44 @@ After the bwauth has been running for a few days, you might wish to set up some
More details:
=== Timestamp and Number of Relays ===
### Timestamp and Number of Relays
Symlink ~/bwauth/torflow/NetworkScanners/BwAuthority/bwscan.V3BandwidthsFile out to your Apache directory. The top line is a timestamp. I make sure it has a timestamp in the last four hours. I choose a number of relays that is a bit below the current number of measured relays by other bwauths (currently 7600). This number ebbs and flows. I might edit it 5-6 times a year.
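A minimal sketch of this check in Python, not the actual checker script. It assumes the top line is a Unix timestamp in seconds and that every relay line contains a `node_id=` field. The function name and its parameters are hypothetical.

```python
import time


def check_bandwidth_file(text, now=None, max_age_hours=4, min_relays=7600):
    """Return a list of problems found in the contents of a bandwidths file."""
    now = time.time() if now is None else now
    lines = text.splitlines()
    # Assumption: the first line is a Unix timestamp in seconds.
    if not lines or not lines[0].strip().isdigit():
        return ["first line is not a Unix timestamp"]
    problems = []
    age_hours = (now - int(lines[0].strip())) / 3600.0
    if age_hours > max_age_hours:
        problems.append("file is %.1f hours old" % age_hours)
    # Assumption: each measured relay has one line containing "node_id=".
    relays = sum(1 for line in lines[1:] if "node_id=" in line)
    if relays < min_relays:
        problems.append("only %d relays measured" % relays)
    return problems
```

An empty list means both checks passed. The `min_relays` default needs the same occasional manual adjustment as described above.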
=== Percentage of the network measured ===
### Percentage of the network measured
I have a crontab entry:
{{{ 10 * ** * grep "of all tor nodes" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/aggregate-debug.log > /var/www/html/bwauth/AA_percent-measured.txt }}}
` 10 * * * * grep "of all tor nodes" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/aggregate-debug.log > /var/www/html/bwauth/AA_percent-measured.txt `
That outputs the percent measured to https://bwauth.ritter.vg/bwauth/AA_percent-measured.txt and I check the last line to make sure it is reasonably high (> 96).
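The last-line check could look roughly like the following. This is a sketch under the assumption that each grepped log line contains the measured percentage as a number followed by `%`, and that the first such number on the line is the one of interest. The function names are invented for this example.

```python
import re


def last_percent_measured(text):
    """Return the first percentage on the last non-empty line, or None."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    # Assumption: the percentage appears as a number followed by "%".
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", lines[-1])
    return float(match.group(1)) if match else None


def percent_measured_ok(text, threshold=96.0):
    """True if the most recent percentage is above the threshold."""
    percent = last_percent_measured(text)
    return percent is not None and percent > threshold
```

Pass in the contents of AA_percent-measured.txt. A missing or unparseable last line counts as a failure.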
=== Scanner Loop Time ===
### Scanner Loop Time
This one is less intuitive. There are 9 scanners. Sometimes a scanner gets stuck. It's very hard to detect when this happens based on the data output: by the time any of the above checks would fire, the data is excessively stale. So this check is pretty important.
The crontab entry to generate this info is:
{{{ 10 * ** * for i in 1 2 3 4 5 6 7 8 9; do echo "Scanner $i"; egrep "Starting slice for percentiles [0-9]+.0-" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/scanner.$i/bw.log; done }}}
` 10 * * * * for i in 1 2 3 4 5 6 7 8 9; do echo "Scanner $i"; egrep "Starting slice for percentiles [0-9]+.0-" /home/tom/bwauth/torflow/NetworkScanners/BwAuthority/data/scanner.$i/bw.log; done `
It outputs it to https://bwauth.ritter.vg/bwauth/AA_scanner_loop_times.txt. I check that the last line of each scanner is within a reasonable time frame (6 days).
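Here is a hedged sketch of how that last-line check might be automated. It assumes the output consists of `Scanner N` header lines, each followed by that scanner's matching log lines, and that each log line carries a bracketed timestamp such as `[Fri Oct 21 12:00:00 2016]`. The timestamp format is an assumption about the log layout, and `stale_scanners` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. "[Fri Oct 21 12:00:00 2016]".
STAMP = re.compile(r"\[(\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4})\]")


def stale_scanners(text, now, max_age=timedelta(days=6)):
    """Return the scanners whose newest log line is missing or too old."""
    last_seen = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Scanner "):
            current = line.strip()
            last_seen.setdefault(current, None)
            continue
        match = STAMP.search(line)
        if current and match:
            # Later lines overwrite earlier ones, so the last line wins.
            last_seen[current] = datetime.strptime(
                match.group(1), "%a %b %d %H:%M:%S %Y")
    return sorted(name for name, seen in last_seen.items()
                  if seen is None or now - seen > max_age)
```

A scanner with no matching lines at all is reported as stale, since that usually deserves a look as well.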
== Debugging ==
## Debugging
=== Bandwidth Authority Tor Fails to Start ===
### Bandwidth Authority Tor Fails to Start
1. Make the Log, DataDirectory, and PidFile paths absolute paths (#20456)
=== Bandwidth Authority Scripts Fail on BSD / OS X ===
### Bandwidth Authority Scripts Fail on BSD / OS X
1. Install a readlink that supports -f
OR
1. Manually install dependencies from [https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.BwAuthorities the BwAuthority instructions]
1. Manually install dependencies from [the BwAuthority instructions](https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.BwAuthorities)
2. Manually set SCANNER_DIR in cron.sh and run_scan.sh
=== Bandwidth Authority Fails on Small (Test) Networks ===
### Bandwidth Authority Fails on Small (Test) Networks
1. Small networks might be missing Guards, Guards+Exits, Middles, or Exits (#20467)
2. Small networks might have bandwidths below the minimum of 1MByte/second (#20505)
...
...
@@ -116,62 +116,62 @@ On small networks, the following features can lead to no measured bandwidths:
* bandwidth authorities measure the bandwidth of directory authorities, but don't aggregate them in the results,
* the consensus does not include any measured bandwidths until there are at least 3 bandwidth authorities.
=== Bandwidth Authorities use an Old Tor Version ===
### Bandwidth Authorities use an Old Tor Version
1. Update the bwauth to use tor 0.2.9, because it's LTS (#20453)
2. ~~If using tor 0.3.0 or later, add "UseMicrodescriptors 0" to the torrc (#20621)~~ (fixed in tor)
3. You might get some errors using Tor 0.3.0 or later (#24110)
=== Scanner Fails to Import Required Python Libraries ===
### Scanner Fails to Import Required Python Libraries
1. Change the PYTHONPATH in the scripts (#20466)
=== Excessive Log Entries ===
### Excessive Log Entries
1. Remove the download URL that doesn't work (#20580)
2. Turn pathbias off (#20457)
3. Fix the NEWCONSENSUS event code (#20619)
== stretch setup ==
## stretch setup
1: Create a new user to run the bwscanner (I call mine bwscanner)
{{{
```
(root) adduser --system bwscanner
}}}
```
2: Check out torflow from https://git.torproject.org/torflow
{{{
```
(root) apt-get install git ca-certificates
(root) su - bwscanner -s /bin/bash
git clone https://git.torproject.org/torflow
cd torflow
git rev-parse HEAD
}}}
```
The last command shows you what you actually got. There are no signed tags
for torflow, so try to verify as best as you can.
{{{
```
git submodule init
git submodule update
}}}
```
3: Install a system tor
{{{
```
(root) apt-get install tor
}}}
```
4: Provide virtualenv (the sql dependencies are too new on stretch)