Trac issues
https://gitlab.torproject.org/legacy/trac/-/issues
2020-06-13T16:13:59Z

https://gitlab.torproject.org/legacy/trac/-/issues/27788
sbws: weight bandwidths based on the time since the last bandwidth
2020-06-13T16:13:59Z
teor
Split off #27346:
weight bandwidths based on the time since the last bandwidth, because:
* if we only record bandwidths when they change, bandwidths that are updated soon after the last bandwidth are weighted too high
we can either:
* record the bandwidths every hour, even if they haven't changed
* weight each bandwidth by the time since the last bandwidth
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27787
sbws: use at least 3 days of observed bandwidths
2020-06-13T16:13:58Z
teor
Split off #27346:
use at least 3 days of observed bandwidths, because:
* a single download at the changeover point can affect 2 days
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27786
sbws: use at least 4 measurements that are at least 6 hours apart
2020-06-13T16:13:58Z
teor
Split off #27346:
use at least 4 measurements that are at least 6 hours apart, because:
* there is a daily cycle
* each day contains 2 similar points in the cycle (it is an up and down cycle)
* if all 4 measurements happen within a few hours, they will still be biased towards that point in the cycle
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27365
Implement sbws features in the tor authority code
2020-06-13T16:13:52Z
teor
sbws and torflow measure relay bandwidth, and then process the results. Some of this processing could be implemented on tor directory authorities instead. The authority implementation would be more consistent, and reduce the amount of effort required to implement new bandwidth measurement tools.
Here are some potential authority features:
* an absolute node consensus weight cap
* a relative node consensus weight cap
* the MaxAdvertisedBandwidth cap
* other post-processing in the bandwidth file spec:
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27362
(sub-)packages outside of core (cli) should not need to know about confs and args
2020-06-13T16:13:51Z
juga
This would make the design more modular and allow other (sub)packages and (sub)modules to be used without creating ConfigParser and ArgumentParser objects.
It would also help to simplify test configurations.
Additionally, a program should take settings into account in this order:
- cli arguments
- environment variables
- user configuration files
- system configuration files
- program defaults
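The precedence order above can be sketched with `collections.ChainMap`, which looks a key up in each mapping from left to right. This is only an illustration, not how sbws is implemented: the `SBWS_` environment prefix, the `resolve_settings` helper and all setting names are hypothetical.

```python
from collections import ChainMap


def resolve_settings(cli_args, environ, user_conf, system_conf, defaults,
                     env_prefix="SBWS_"):
    """Return a mapping in which earlier sources override later ones.

    The prefix and key names are hypothetical, chosen only for this sketch.
    """
    # Drop CLI options that were not given (argparse stores them as None).
    cli = {k: v for k, v in cli_args.items() if v is not None}
    # Keep only prefixed environment variables: SBWS_LOG_LEVEL -> log_level.
    env = {k[len(env_prefix):].lower(): v
           for k, v in environ.items() if k.startswith(env_prefix)}
    # ChainMap searches its maps from left to right, so the order of the
    # arguments here is the precedence order.
    return ChainMap(cli, env, user_conf, system_conf, defaults)


settings = resolve_settings(
    cli_args={"log_level": None, "data_dir": "/tmp/sbws"},
    environ={"SBWS_LOG_LEVEL": "debug", "HOME": "/home/test"},
    user_conf={"log_level": "info", "nickname": "example"},
    system_conf={"nickname": "system", "timeout": 30},
    defaults={"log_level": "warning", "timeout": 60, "data_dir": "~/.sbws"},
)
print(dict(settings))
```

With a helper like this, other (sub)packages could receive a plain mapping instead of ConfigParser and ArgumentParser objects, and tests could pass dictionaries directly.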
This order is almost what sbws currently does, but it would be better if all of these sources could be taken into account in a simpler way.
This is not for the MVP, but I'm creating the ticket because I'm writing new code with this in mind, and it would be nice to change it at some point.
Some tickets, such as #27358, happen because of this.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27361
Tests that launch sbws in a subprocess
2020-06-13T16:13:50Z
pastly
That way we exercise all the startup code.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27346
Improve sbws bandwidth accuracy
2020-06-13T16:13:49Z
teor
Better designs SHOULD:
* use at least 4 measurements that are at least 6 hours apart, because:
- there is a daily cycle
- each day contains 2 similar points in the cycle (it is an up and down cycle)
- if all 4 measurements happen within a few hours, they will still be biased
* use at least 3 days of observed bandwidths, because:
- a single download at the changeover point can affect 2 days
* weight bandwidths based on the time since the last bandwidth, because:
- if we only record bandwidths when they change, bandwidths that are updated soon after the last bandwidth are weighted too high
- we can either:
* record the bandwidths every hour, even if they haven't changed
* weight each bandwidth by the time since the last bandwidth
* use a decaying average for measured and observed bandwidths, because:
- recent bandwidths are closer to the relay's current capacity
- and we want accurate results
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27343
Dockerfile for sbws basic install
2020-06-13T16:13:48Z
Trac
```
FROM debian
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get -qy install python3-pip git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN useradd --shell /bin/bash -u 1000 -m test
USER test
RUN git clone https://git.torproject.org/stem.git /home/test/stem
RUN git clone https://github.com/pastly/simple-bw-scanner.git /home/test/simple-bw-scanner
USER root
RUN pip3 install /home/test/stem
RUN pip3 install /home/test/simple-bw-scanner
USER test
CMD /usr/local/bin/sbws scanner
```
Just a basic Dockerfile to get sbws installed inside Docker. Note that it needs to be adapted a little bit, as it lacks a config file... Whenever I figure out the config file aspect, I can get that included easily.
**Trac**:
**Username**: gabe
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27339
Work out a policy for resolving differences between torflow and sbws
2020-06-13T16:13:46Z
teor
In #27135, we talk about how to deal with differences between torflow and sbws.
Here are some options:
1. Any difference between sbws and torflow is a bug in sbws that should be fixed.
2. If a sbws deployment is within X% of an existing bandwidth authority, sbws is ok. (The total consensus weights of the existing bandwidth authorities are within 50% of each other, see #25459.)
3. Let's choose an ideal bandwidth distribution for the Tor network, and modify sbws until we get that distribution.
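A check for option 2 could look like the sketch below. It assumes that "within X%" means the absolute difference in total consensus weight is at most X% of the existing bandwidth authority's total, and that matching any one existing authority is enough. Both readings are assumptions, and the function names are hypothetical.

```python
def within_percent(sbws_total, reference_total, x_percent):
    """True if sbws_total differs from reference_total by at most
    x_percent of the reference (assumed meaning of "within X%")."""
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    allowed = reference_total * x_percent / 100.0
    return abs(sbws_total - reference_total) <= allowed


def acceptable_under_policy_2(sbws_total, bwauth_totals, x_percent):
    """True if the sbws total is within x_percent of at least one
    existing bandwidth authority's total consensus weight."""
    return any(within_percent(sbws_total, total, x_percent)
               for total in bwauth_totals)
```

The totals passed in would be the sums of the weights in each authority's bandwidth file; choosing X remains the open policy question.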
Juga suggested that we start with policy 2, and use research to work out when to move to policy 3.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27107
Transition plan from Torflow to sbws
2020-06-13T16:13:39Z
juga
Ticket for the tasks related to getting sbws into production.
These include work we have already been doing, but we will probably need to add more tasks:
1. Get sbws in Debian (#26848)
2. Check that the bandwidth file results are similar to Torflow:
We have been doing this in https://github.com/pastly/simple-bw-scanner/issues/182, though it is growing.
So far we have checked:
- sbws raw results compared to torflow: the shape is quite different
- sbws raw results compared to sbws scaled: big relays get a bigger bw, but the shape is still different from torflow
- sbws raw results compared to sbws raw results with a longer download time: they're similar
- sbws raw results compared to sbws raw results with the RTT subtracted from the download time: they're similar
- implement parsing Torflow raw files
- sbws raw results compared to Torflow results: they are similar, so it is the scaling method that makes the results different
3. I'd create child tickets for the WIP:
- implement torflow scaling
- check sbws using torflow scaling compare to Torflow
- change specification
- ...
4. Get one bwauth to run sbws
5. Archive bw files (#21378)
6. Compare Tor bw files from bwauths running Torflow and from the one running sbws
7. ...
Edit:
- formatting
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/25925
bwauth improvements (ex-parent ticket for SoP planned tasks)
2020-06-13T16:13:27Z
juga
sbws: unspecified
juga

https://gitlab.torproject.org/legacy/trac/-/issues/16559
bwauth code needs to be smarter about failed circuits
2020-06-13T16:13:26Z
TvdW
In the current bandwidth authority code, when a fetch attempt fails, it will still be counted as a circuit that went through all of the nodes -- even if those nodes weren't responsible for the failure.
This has the potential of resulting in a relay not being measured sufficiently, or at all: the code will consider failures from unstable nodes to be relevant for nodes that are perfectly stable.
In slices where exits and entries aren't well-distributed (like, all of them) this can result in some nodes not being measured at all, and losing their consensus weight. This seems to affect exits a lot more than it does other relay types: people on tor-relays@ have mentioned that removing their exit policies gets their consensus weight back, and I have been able to reproduce this.
sbws: unspecified
juga

https://gitlab.torproject.org/legacy/trac/-/issues/10791
Detect overtuned exit relays
2020-06-13T16:13:25Z
cypherpunks
Many relays break usability for users by setting such a short SYN packet retry time that connections time out very quickly when the relay is overloaded and a SYN is lost. The Tor client has no ability to retry with another circuit if the reason in the END cell is timeout or refused. By default, Tor can retry only if there is no answer for 10 (15) seconds, or for some END reasons.
Torflow should test whether exit relays answer with a timeout in less than 15 seconds, or refuse connections to a known working host.
Relays that are overtuned or overfirewalled are still useful as non-exit relays, but should be marked as BadExits if a stable Tor client version can't retry.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/7177
Understand how accurate the bandwidth authority estimates are
2020-06-13T16:13:24Z
Karsten Loesing
(Re-using text from Roger and Mike for this ticket description.)
It would be good to have a better understanding of how accurate the bandwidth authority estimates are. Why do some really fast relays get huge weights, and other really fast relays don't? Does it have to do with location of the measurers? What exactly is the trade-off between having fast nodes all nearby each other (and nearby the bandwidth authorities) in the network, and having nodes in geographically dispersed places?
We probably should figure out a way for the bandwidth authorities to utilize per-node as well as ambient circuit failure (#7023, #7037).
There's a bunch of related stuff for TCP socket exhaustion, too. All of it probably involves some fairly diligent monitoring of results and experimentation though.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/5457
Bw auths don't count circuit failures in descriptor mode
2022-03-10T15:30:22Z
Mike Perry
When we are using descriptor bandwidth (ie no feedback), we are unable to properly use circuit failure statistics to penalize nodes that are either attempting path bias, or are just experiencing CPU overload.
The fix *should* be simple. I think we just need to add another clause in aggregate.py where we check for use_circ_fails to also check for use_desc_bw and properly combine the pid_error and circ_error for that case (perhaps just by multiplying them).
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/4709
Implement bwauth cap for TCP socket exhaustion
2020-06-13T16:13:22Z
Mike Perry
Step 0 is to determine if any of the Gbit+ tor relays (especially Guard+Exit nodes) ever come close to running out of TCP sockets.
Step 1 is to find some way to measure stream failures from the bwauths, compute a stream_error value, and use it. See #4708 for more details on that.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/4708
Implement bwauth cap for latency
2022-03-01T15:29:15Z
Mike Perry
Robert, Sebastian and I hashed out an idea for another feedback mechanism for the bw auths based on latency.
Basically, the idea is to create another cap called latency_error similar to how we use circ_error. If a node's latency exceeds some quantile of the population (by being higher than the latencies of say 75% of all nodes), we would then compute and use a pid_error-style error value based on the distance from this 75% quantile setpoint, and use it if it is a more negative number than pid_error and circ_error.
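A rough sketch of that idea follows. The ticket does not say how latency_error would be computed, so this assumes a relative distance from the setpoint, `(setpoint - latency) / setpoint`, clamped so that nodes below the setpoint get no penalty. The function names are hypothetical and none of this is taken from torflow's aggregate.py.

```python
import statistics


def latency_setpoint(latencies, quantile_percent=75):
    """Return the latency at the given percentile of the population."""
    # With n=100, quantiles() returns the 99 percentile cut points,
    # so index 74 is the 75th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return cuts[quantile_percent - 1]


def latency_error(latency, setpoint):
    """Assumed error formula: relative distance from the setpoint,
    negative only when the node is slower than the setpoint."""
    return min(0.0, (setpoint - latency) / setpoint)


def should_use_latency_error(pid_error, circ_error, lat_error):
    """Use latency_error only if it is more negative than both
    pid_error and circ_error, as the ticket proposes."""
    return lat_error < pid_error and lat_error < circ_error


# Hypothetical latencies in seconds for five nodes.
population = [1, 2, 3, 4, 5]
setpoint = latency_setpoint(population)
print(setpoint, latency_error(5, setpoint))
```

Clamping at zero keeps fast nodes from being rewarded by this cap; whether that matches the intended design is an open question in the ticket.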
I think the way we want to measure this latency is from CREATE to STREAM FAILED EXITPOLICY for a circuit creation + stream failure for a 1-hop stream exit attempt to localhost. This way we measure both cryptoworker queue latency as well as orconn, circuit, and stream latency.
I think the simplest way to build this is as a separate process from bwauthority.py that simply builds 1-hop circuits and attempts to exit to localhost from them. It would then output a separate, additional measurement file for the network that would be read in by aggregate.py, and used to compute latency_error.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/4359
Minimize time between new relay appearing and having some bw vote for it
2020-06-13T16:13:21Z
Roger Dingledine
In #2286 I point out a huge security problem in Tor, which is that new relays can lie about their bandwidth and get away with it.
One of the components of my suggested fix is to minimize the period of time between when a new relay appears in the network, and when we have an opinion about its bandwidth.
So it would be great for the bwauths to recognize new relays and schedule them for high-priority tests.
Or is that already done? What's the expected turnaround time?
This ticket is perhaps related to #2550.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/2888
Testing framework/dataset for bw auths
2020-06-13T16:13:20Z
Mike Perry
We should create a testing framework and dataset for the bw auths, so we can more easily verify and identify issues in sqlalchemy upgrades and database back-end migration.
sbws: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/2550
bwauth should reschedule quicker bandwidth test when bandwidthrate changes?
2022-02-17T19:36:19Z
Roger Dingledine
https://metrics.torproject.org/relay-search.html?search=AEIOUm+2011-02-13
He apparently switches between 100KB/s and 2MB/s bandwidthrate depending on time of day. His bwauth votes ended up being very skewed:
```
moria1 says
w Bandwidth=141 Measured=15
ides says
w Bandwidth=141 Measured=26
urras says
w Bandwidth=141 Measured=1480
gabelmoo says
w Bandwidth=141 Measured=935
```
It's a shame that we're giving really low numbers to a node that wants to be 2MB/s at some times of day. It probably also means we give high numbers to a slow node if the measurement times are different.
sbws: unspecified