# Experimental Plan

We're going to perform a series of experiments based on consensus parameter
changes, both to observe their effects on the performance characteristics of
the live network and to try to reproduce these effects on a testing network or
simulator for use in future experiments.

For each of these experiments, we will enumerate the parameter changes, the
metrics required to measure their effects, the expected results, and any
anonymity risks.

Each experiment will be performed over a period of one week per parameter,
alternating the parameter value between one of the experimental values and the
default every 24 hours for metrics that rely only on output from torperf, and
every 48 hours for metrics that require data from extra-info descriptors
(currently only one experiment requires this).
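
The alternation described above can be sketched as a small schedule generator.
This is a minimal sketch under stated assumptions: each week pairs a single
experimental value with the default, the week starts on the experimental value,
and a trailing partial period (one day, for 48-hour periods) is left out. The
function name `alternating_schedule` and the example values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def alternating_schedule(start, experimental, default, period_hours, days=7):
    """Return (period_start, value) pairs for one experiment window.

    Periods alternate between `experimental` and `default`, beginning with
    the experimental value (an assumption). Only whole periods that fit in
    `days` days are returned, so a trailing partial period is dropped.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    period = timedelta(hours=period_hours)
    n_periods = (days * 24) // period_hours
    return [
        (start + i * period, experimental if i % 2 == 0 else default)
        for i in range(n_periods)
    ]


if __name__ == "__main__":
    week_start = datetime(2019, 1, 1, tzinfo=timezone.utc)  # example date
    # Torperf-only metrics flip every 24 hours; extra-info metrics use 48.
    for when, value in alternating_schedule(week_start, 70, 80, period_hours=24):
        print(when.isoformat(), f"cbtquantile={value}")
```

With `period_hours=48`, the same call yields three whole periods within the
week.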

# Metrics Definitions

See [the performance metrics
page](https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics)
for all of our metrics for performance tuning, performance experiments,
development items, and research tasks.

# Experiments

## Circuit Build Timeout

* **Parameter values to test**:
  * consensus: cbtquantile=80 (current), cbtquantile=70, cbtquantile=60
* **Metrics**:
  * **CDF-TTFB**
  * **Failure rainbow**
  * **Circuit timeout rates**
* **Expected results**:
  * Reducing cbtquantile should improve CDF-TTFB considerably (CDF-TTLB and
    CDF-DL should show less improvement, since congestion changes over time).
  * As we set cbtquantile lower, the CDF-TTFB graph should become a sharper
    cliff and move to the left. This is because it will both reduce latency
    (the left shift) and reduce performance variance (the more cliff-like
    CDF).
  * Because this is a congestion-avoidance mechanism, we should see
    *increasing* returns for each percentile of decrease in the cbtquantile
    parameter: all clients will be avoiding more congested and slow circuits,
    which means fewer congested and slow circuits on the network overall, and
    therefore less overall latency.
  * Tor clients should give up on the slowest (100 - cbtquantile) percent of
    circuit builds (not more, not less).
  * Circuit failure rates for timeout-related failures likely go down.
* **Potential Sources of Model Error**:
  * The circuit build timeout code was designed when we used three guards. It
    may no longer correctly enforce that the intended fraction of circuits
    time out with 1 or 2 guards. This may affect performance positively or
    negatively, and may also have an anonymity impact.
  * Since torperf does not use guards, its results may differ from those of
    clients that do; we may want to perform this experiment in tandem with the
    **Number of Guards** experiment (or at least run an additional torperf
    instance with a short GuardLifetime value, as suggested in the **Number of
    Guards** experiment).
* **Anonymity effects**:
  * Path reduction: clients will only use the fastest 'cbtquantile' percent of
    paths, which means fewer network paths are used.
  * In extreme cases of very low cbtquantile, clients will tend to prefer
    network paths that contain only routers that are geographically close to
    them, which may leak information about their geographical location.
* **Instrumentation Needed To Verify Operation**:
  * On the torperf clients, the CUTOFF_QUANTILE field of the BUILDTIMEOUT_SET
    control port event should match the cbtquantile consensus parameter.
    Additionally, the rate of circuit timeouts on torperf should match
    1.0-CUTOFF_QUANTILE, as well as the TIMEOUT_RATE field of
    BUILDTIMEOUT_SET.
* **Abort Criteria**:
  * If the TIMEOUT_RATE field or the manually counted circuit timeout rate
    exceeds 1.0-CUTOFF_QUANTILE by more than 0.1, the experiment should be
    stopped and we should investigate and debug the circuit build timeout
    code.
  * The failure rates of torperf and onionperf should be closely monitored to
    ensure that onion services do not see unexpected amounts of additional
    failure during this experiment. If failure rates increase, we should
    abort.
  * If Torperf uses exit nodes and rendezvous points out of proportion to
    their consensus weights after this change, we should abort. [The vanguards
    rendguard
    component](https://github.com/mikeperry-tor/vanguards/blob/master/src/vanguards/rendguard.py)
    already has code to monitor rendezvous point use; it can be adapted for
    exits as well.
* **User Impact/What to Tell Users:**
  * So long as the abort criteria are not met, the user impact should be
    minimal for small changes to this parameter: latency should simply
    improve. For very low values, the geolocation concern increases, but we
    should be able to rule it out through the abort criteria.
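
The instrumentation and first abort criterion above can be sketched as a check
over raw BUILDTIMEOUT_SET events. This sketch assumes the event arrives as a
`650 BUILDTIMEOUT_SET <Type> KEY=VALUE ...` line with space-separated fields as
in the control-spec, and that CUTOFF_QUANTILE is reported as a fraction (0.7)
while the cbtquantile consensus parameter is a percentage (70). The function
names, the 0.01 tolerance for matching the quantile, and the example event
values are hypothetical; only the 0.1 abort margin comes from this plan. It
flags rates that are too high, not too low, and does not attempt the
rendguard-style exit monitoring.

```python
def parse_buildtimeout_set(line):
    """Parse a raw '650 BUILDTIMEOUT_SET <Type> KEY=VALUE ...' event line."""
    tokens = line.strip().split()
    if len(tokens) < 3 or tokens[1] != "BUILDTIMEOUT_SET":
        raise ValueError(f"not a BUILDTIMEOUT_SET event: {line!r}")
    fields = {"TYPE": tokens[2]}  # e.g. COMPUTED or RESET
    for token in tokens[3:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def check_cbt_event(fields, cbtquantile, counted_timeout_rate,
                    quantile_tolerance=0.01, abort_margin=0.1):
    """Return a list of problems found; an empty list means all checks pass.

    cbtquantile is the consensus parameter as a percentage (e.g. 70);
    counted_timeout_rate is the circuit timeout rate counted on torperf.
    """
    problems = []
    cutoff = float(fields["CUTOFF_QUANTILE"])  # assumed to be a fraction
    expected_rate = 1.0 - cutoff

    # Instrumentation check: CUTOFF_QUANTILE should track cbtquantile.
    if abs(cutoff - cbtquantile / 100.0) > quantile_tolerance:
        problems.append(
            f"CUTOFF_QUANTILE={cutoff:.2f} does not match cbtquantile={cbtquantile}")

    # Abort criterion: either rate exceeds 1.0-CUTOFF_QUANTILE by more than 0.1.
    rates = {"TIMEOUT_RATE": float(fields["TIMEOUT_RATE"]),
             "counted timeout rate": counted_timeout_rate}
    for name, rate in rates.items():
        if rate - expected_rate > abort_margin:
            problems.append(
                f"{name}={rate:.2f} exceeds {expected_rate:.2f} by more than "
                f"{abort_margin}; abort and debug the CBT code")
    return problems


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    event = ("650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=1000 TIMEOUT_MS=1500 "
             "XM=800 ALPHA=1.500000 CUTOFF_QUANTILE=0.700000 "
             "TIMEOUT_RATE=0.290000 CLOSE_MS=60000 CLOSE_RATE=0.010000")
    print(check_cbt_event(parse_buildtimeout_set(event), 70, 0.31))  # prints []
```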

## Fast Relay Cutoff

* **Parameters**:
  * consensus: FastFlagMinThreshold=$(bandwidth_of_n_percent_fastest_nodes)
  * torrc (directory authorities): AuthDirFastGuarantee 0
* **Metrics**:
  * **Failure rainbow**
  * **CDF-TTLB**
  * **CDF-DL**
  * **Per-Relay Spare Network Capacity CDF**
* **Expected results**:
  * Slow relays in the network are overloaded more than faster relays. Cutting
    them out should reduce the overall rates of timeout-related failures.
  * It should similarly reduce the variance of network performance, to the
    extent that the slow relays would otherwise have been chosen. This should
    make the CDF-TTLB and CDF-DL graphs more cliff-like (but they should not
    shift left overall, as we expected for CBT).
  * The **Per-Relay Spare Network Capacity CDF** should narrow and become
    more cliff-like, since slow relays are more overloaded than the rest of
    the network.
* **Potential Sources of Model Error**:
  * We don't know which relays are slow, or why. Depending on the threshold,
    we may cut out unused relays, overloaded relays, or relays suffering from
    other bugs (see the KIST experiment).
* **Anonymity effects**:
  * Fewer relays means less diversity and fewer possible network paths, in
    proportion to where we set the cutoff.
* **Instrumentation Needed To Verify Operation**:
  * Relays with a measured bandwidth below the cutoff should no longer appear
    in the consensus with the Fast flag.
* **Abort Criteria**:
  * If relays other than the expected cutoff set lose the Fast flag or
    disappear from the consensus, abort.
* **User Impact/What to Tell Users:**
  * Relay operators should be made aware of these plans on the tor-relays
    mailing list, and possibly also by mailing the contact info of affected
    relays, where it is available.
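
The `$(bandwidth_of_n_percent_fastest_nodes)` placeholder and the abort
criterion could be computed along these lines. This is a minimal sketch that
assumes measured bandwidths have already been extracted into a
`{fingerprint: bandwidth}` mapping from the pre-experiment consensus, and that
"n percent fastest" means the n percent of relays with the highest measured
bandwidth. It does not convert units; the result must be expressed in whatever
units FastFlagMinThreshold expects. Both helper names are hypothetical.

```python
import math


def fast_threshold(bandwidths, n_percent):
    """Lowest bandwidth among the n_percent fastest relays.

    `bandwidths` maps relay fingerprint -> measured bandwidth. Relays at or
    above the returned value would keep the Fast flag; ties at the threshold
    can push the kept share slightly above n_percent.
    """
    if not 0 < n_percent <= 100:
        raise ValueError("n_percent must be in (0, 100]")
    ranked = sorted(bandwidths.values(), reverse=True)
    keep = max(1, math.ceil(len(ranked) * n_percent / 100.0))
    return ranked[keep - 1]


def unexpected_fast_losses(bandwidths, threshold, fast_relays):
    """Relays that meet the threshold but lack the Fast flag afterwards.

    `fast_relays` is the set of fingerprints with the Fast flag in the new
    consensus. Relays missing from the new consensus entirely also count as
    cut. A non-empty result corresponds to the abort criterion.
    """
    expected_cut = {fp for fp, bw in bandwidths.items() if bw < threshold}
    observed_cut = set(bandwidths) - set(fast_relays)
    return observed_cut - expected_cut


if __name__ == "__main__":
    example = {"A": 100, "B": 50, "C": 20, "D": 10, "E": 5}  # made-up values
    print(fast_threshold(example, 60))  # prints 20
```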

## KIST

* **Parameters**:
  * Torperf torrc: KISTSchedRunInterval at 2ms
  * consensus: KISTSchedRunInterval at 2ms, 5ms, 10ms (default)
* **Metrics**:
  * **CDF-TTFB**
  * **CDF-DL**
* **Expected Results**:
  * The KIST scheduler interval affects how often we are able to read and
    write data to the network. For relays with lots of TCP connections, a
    larger interval is better. For relays with only very few, a smaller
    interval is better. See #29427.
  * Depending on the number of connections that typical relays have, different
    values of this parameter may increase the performance variance of
    steady-state downloads (CDF-DL), as well as affect latency (CDF-TTFB).
* **Potential Sources of Model Error**:
  * The KIST scheduler consensus value may also apply to the Torperf client
    itself, which could strongly skew the results we see.
* **Anonymity Effects**:
  * This experiment may help us get better performance out of slow relays,
    which would improve anonymity.
* **Instrumentation Needed To Verify Operation**:
  * XXX: dgoulet/pastly?
* **Abort Criteria**:
  * XXX: dgoulet/pastly?
* **User Impact/What to Tell Users:**
  * XXX: Did we even tell our users when we deployed KIST or EWMA apart from
    the changelog lines?
    * Rob: Yes, here is the KIST blog post:
      https://blog.torproject.org/kist-and-tell-tors-new-traffic-scheduling-feature
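
Several expected results in this plan describe a CDF moving left or becoming
more cliff-like. One hypothetical way to put numbers on that when comparing
torperf runs across KISTSchedRunInterval values is to report the median (which
tracks a left or right shift) and the spread between the 10th and 90th
percentiles (which shrinks as the CDF becomes steeper). The percentile choice
and the function names are assumptions, not metrics defined on the
performance metrics page. It uses `statistics.quantiles`, which requires
Python 3.8 or later.

```python
import statistics


def cdf_summary(samples):
    """Summarize torperf timings (e.g. CDF-DL download times in seconds).

    The median tracks a left/right shift of the CDF; the 10th-to-90th
    percentile spread gets smaller as the CDF becomes more cliff-like.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # n=10 returns the 9 cut points separating deciles.
    deciles = statistics.quantiles(samples, n=10)
    return {
        "median": statistics.median(samples),
        "p10_p90_spread": deciles[-1] - deciles[0],
    }


def compare_runs(runs):
    """Map each parameter value (e.g. run interval in ms) to its summary."""
    return {value: cdf_summary(samples) for value, samples in runs.items()}
```

A comparison would then be a call such as `compare_runs({2: times_2ms,
5: times_5ms, 10: times_10ms})`, where the three sample lists are hypothetical
per-interval torperf measurements.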

## Number of Guards

* **Parameters**:
  * consensus: guard-n-primary-guards-to-use=2 (1 is default)
  * consensus: guard-n-primary-guards=2 (1 is default)
  * torperf torrc: UseEntryGuards 1
  * torperf torrc: GuardLifetime 1 day (or less; requires tor patch)
* **Metrics**:
  * **CDF-TTFB**
  * **CDF-TTLB**
  * **CDF-DL**
  * **Failure rainbow**
  * **Circuit timeouts** (maybe)
* **Expected Results**:
  * With only one guard, Torperf's variance for all performance
    characteristics should be much larger than with two. Additionally, with
    more than one guard, Circuit Build Timeout should be able to avoid one of
    the guards if either becomes temporarily overloaded, so we should also see
    an increase in average performance. Switching to two guards should
    therefore make all CDFs more cliff-like and move them all to the left
    (towards the origin, i.e. faster performance).
* **Potential Sources of Model Error**:
  * Torperf doesn't use guards by default; making it do so in a way that
    gives us an idea of the performance variance of different combinations of
    guards will require a much longer run of this experiment than of the
    others.
* **Anonymity Effects**:
  * See Proposal 291.
* **Instrumentation Needed To Verify Operation**:
  * Control port monitoring of the GUARD and ORCONN events to ensure that our
    torperf instances use the expected number of guards.
* **Abort Criteria**:
  * If more guards than we expect are used, abort.
* **User Impact/What to Tell Users:**
  * Probably requires a blog post; guard issues are a deep rabbit hole.
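
The ORCONN half of the monitoring above could start from a sketch like this
one. It assumes raw `650 ORCONN <target> <status> ...` lines as described in
the control-spec, and it uses the peak number of relays with simultaneously
open OR connections as a proxy for the number of guards in use. That proxy is
an assumption: directory fetches and bootstrap connections to fallback
directories also open OR connections, so the count can exceed the number of
guards used for circuits. GUARD events are not handled, and both function
names are hypothetical.

```python
def max_concurrent_orconns(event_lines):
    """Largest number of distinct relays with an open OR connection at once.

    Expects raw '650 ORCONN <target> <status> ...' control port lines.
    CONNECTED opens a connection; CLOSED or FAILED ends it. Other events and
    other statuses (such as NEW or LAUNCHED) are ignored.
    """
    open_targets = set()
    peak = 0
    for line in event_lines:
        tokens = line.split()
        if len(tokens) < 4 or tokens[1] != "ORCONN":
            continue
        # Strip a '~nickname' or '=nickname' suffix so the fingerprint is the key.
        target = tokens[2].split("~")[0].split("=")[0]
        status = tokens[3]
        if status == "CONNECTED":
            open_targets.add(target)
            peak = max(peak, len(open_targets))
        elif status in ("CLOSED", "FAILED"):
            open_targets.discard(target)
    return peak


def guard_count_exceeded(event_lines, expected_guards):
    """Abort check: True if more relays than expected were connected at once."""
    return max_concurrent_orconns(event_lines) > expected_guards
```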

## Preemptive Circuit Building

* **Parameters**:
* **Metrics**:
* **Expected Results**:
* **Potential Sources of Model Error**:
  * If torperf does not use the same number of circuits as we expect most
    clients to use, and in the same patterns, this will bias our results.
* **Anonymity Effects**:
* **Instrumentation Needed To Verify Operation**:
* **Abort Criteria**:
* **User Impact/What to Tell Users:**

## EWMA

* **Parameters**:
* **Metrics**:
* **Expected Results**:
* **Potential Sources of Model Error**:
* **Anonymity Effects**:
* **Instrumentation Needed To Verify Operation**:
* **Abort Criteria**:
* **User Impact/What to Tell Users:**

## Sbws and Torflow Comparison

* **Parameters**:
* **Metrics**:
* **Expected Results**:
* **Potential Sources of Model Error**:
* **Anonymity Effects**:
* **Instrumentation Needed To Verify Operation**:
* **Abort Criteria**:
* **User Impact/What to Tell Users:**

## Experiment Template

* **Parameters**:
* **Metrics**:
* **Expected Results**:
* **Potential Sources of Model Error**:
* **Instrumentation Needed To Verify Operation**:
* **Anonymity Effects**:
* **Abort Criteria**:
* **User Impact/What to Tell Users:**