Signed-off-by: David Goulet <dgoulet@ev0ke.net>

David Goulet · ffd04f11
--- a/NetworkTeam/Sponsor61/PerformanceExperiments.md
+++ b/NetworkTeam/Sponsor61/PerformanceExperiments.md
+# Experimental Plan
+
+We're going to perform a series of experiments based on consensus parameter
+changes, both to observe their effects on the performance characteristics of
+the live network and to try to reproduce these effects on a testing network
+or simulator for use in future experiments.
+
+For each of these experiments, we will enumerate the parameter changes, the
+metrics required to measure their effects, the expected results, and any
+anonymity risks.
+
+Each experiment will be performed over a period of one week per parameter,
+alternating the parameter value between one of the experimental values and the
+default every 24-hour period for metrics that rely only on output from
+torperf, and every 48-hour period for metrics that require data from extrainfo
+descriptors (currently only one experiment requires this).
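
The alternation described above can be sketched as a small schedule
generator. This is only an illustration: the function name, the choice to start
the week on an experimental value, and the cycling through several experimental
values in order are all assumptions, not part of the plan.

```python
from itertools import cycle


def build_schedule(default, experimental_values, period_hours, total_hours=7 * 24):
    """Return (start_hour, end_hour, value) slots for one week.

    Slots alternate between an experimental value and the default.
    Assumption: the week begins with an experimental value, and several
    experimental values are cycled through in the order given.
    """
    experimental = cycle(experimental_values)
    slots = []
    start = 0
    use_experimental = True
    while start < total_hours:
        # The last slot is truncated if the period does not divide the week.
        end = min(start + period_hours, total_hours)
        value = next(experimental) if use_experimental else default
        slots.append((start, end, value))
        use_experimental = not use_experimental
        start = end
    return slots


if __name__ == "__main__":
    # cbtquantile with a 24-hour period (torperf-only metrics).
    for start, end, value in build_schedule(80, [70, 60], 24):
        print(f"hours {start:3d}-{end:3d}: cbtquantile={value}")
```

With a 48-hour period the week does not divide evenly, so this sketch simply
shortens the final slot; extending the run instead would be an equally
reasonable choice.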
+
+# Metrics Definitions
+
+See [the performance metrics
+page](https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics)
+for all of our metrics for performance tuning, performance experiments,
+development items, and research tasks.
+
+# Experiments
+
+## Circuit Build Timeout
+
+* **Parameter values to test**:
+  * consensus: cbtquantile=80 (current), cbtquantile=70, cbtquantile=60
+* **Metrics**:
+  * **CDF-TTFB**
+  * **Failure rainbow**
+  * **Circuit timeout rates**
+* **Expected results**:
+  * Reducing cbtquantile improves CDF-TTFB considerably (CDF-TTLB and CDF-DL
+    would show less improvement, since congestion changes over time)
+  * As we set cbtquantile lower, the CDF-TTFB graph should become a sharper
+    cliff, and move to the left. This is because it will both reduce latency
+    (the left shift) and reduce performance variance (making the CDF more
+    cliff-like).
+  * Because this is a congestion-avoidance mechanism, we should see
+    *increasing* returns for each percentile of decrease of the cbtquantile
+    parameter (this is because all clients will be avoiding more congested
+    and slow circuits, which means fewer congested and slow circuits on the
+    network overall, which means less overall latency).
+  * Tor clients give up on the selected percentage of circuits (not more, not
+    less)
+  * Circuit failure rates likely go down for timeout-related failures
+* **Potential Sources of Model Error**:
+  * The circuit build timeout code was designed when we used three guards. It
+    may no longer actually enforce that a proper cbtquantile of circuits time
+    out with 1 or 2 guards. This may affect performance positively or
+    negatively, as well as have an anonymity impact.
+  * Since torperf does not use guards, it may exhibit different results
+    without them than with them; we may want to perform this experiment in
+    tandem with the **Number of Guards** experiment (or at least run an
+    additional torperf instance with a short GuardLifetime value, as suggested
+    in the **Number of Guards** experiment).
+* **Anonymity effects**:
+  * Path reduction (clients will only use the fastest 'cbtquantile' percent of
+    paths, which means fewer network paths are used)
+  * In extreme cases of very low cbtquantile, clients will tend to prefer
+    network paths that contain only routers that are geographically close to
+    them, which may leak information about their geographical location.
+* **Instrumentation Needed To Verify Operation**:
+  * On the torperf clients, the BUILDTIMEOUT_SET control port event should
+    have a CUTOFF_QUANTILE field value that matches the cbtquantile consensus
+    parameter. Additionally, the rate of circuit timeouts on torperf should
+    match 1.0-CUTOFF_QUANTILE, as well as match the TIMEOUT_RATE field of
+    BUILDTIMEOUT_SET.
+* **Abort Criteria**:
+  * If the TIMEOUT_RATE field or the manually counted circuit timeout rate
+    exceeds 1.0-CUTOFF_QUANTILE by more than 0.1, the experiment should be
+    stopped and we should investigate and debug the circuit build timeout
+    code.
+  * The failure rate of torperf and onionperf should be closely monitored, to
+    ensure that onion services do not have unexpected amounts of additional
+    failure during this experiment. If failure rates increase, we should
+    abort.
+  * If Torperf uses exit nodes and rendezvous points out of proportion to
+    their consensus weights after this change, we should abort. [The vanguards
+    rendguard
+    component](https://github.com/mikeperry-tor/vanguards/blob/master/src/vanguards/rendguard.py)
+    has code to monitor rend point use already; it can be adapted for exits as
+    well.
+* **User Impact/What to Tell Users:**
+  * So long as the abort criteria are not met, the user impact should be
+    minimal for small changes to this parameter: latency should just improve.
+    For very low values, the geolocation concern increases, but we should be
+    able to rule those out through the abort criteria.
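
The verification and abort checks for the timeout rate can be sketched as
follows. This is a minimal illustration, assuming the `BUILDTIMEOUT_SET` event
arrives as a single line of space-separated `KEY=VALUE` fields; the helper
names and the sample values in the usage example are made up for illustration.

```python
def parse_buildtimeout_set(line):
    """Collect the KEY=VALUE fields of a BUILDTIMEOUT_SET event line."""
    fields = {}
    for token in line.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields


def quantile_matches(line, cbtquantile):
    """Check CUTOFF_QUANTILE against the consensus value (a percentage)."""
    fields = parse_buildtimeout_set(line)
    return abs(float(fields["CUTOFF_QUANTILE"]) - cbtquantile / 100.0) < 1e-6


def should_abort(line, counted_timeout_rate=None, tolerance=0.1):
    """Return True if a timeout rate exceeds 1.0 - CUTOFF_QUANTILE by more
    than `tolerance`.

    `counted_timeout_rate` is the manually counted rate, if available.
    """
    fields = parse_buildtimeout_set(line)
    expected = 1.0 - float(fields["CUTOFF_QUANTILE"])
    rates = [float(fields["TIMEOUT_RATE"])]
    if counted_timeout_rate is not None:
        rates.append(counted_timeout_rate)
    return any(rate - expected > tolerance for rate in rates)


if __name__ == "__main__":
    # Illustrative event line; the numbers are invented for this example.
    sample = ("650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=1000 TIMEOUT_MS=815 "
              "CUTOFF_QUANTILE=0.700000 TIMEOUT_RATE=0.450000")
    print(quantile_matches(sample, 70), should_abort(sample))
```

Only an excess over the expected rate triggers the abort here, matching the
wording of the criterion; a rate well below the expected one may still be worth
investigating by hand.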
+
+## Fast Relay Cutoff
+
+* **Parameters**:
+  * consensus: FastFlagMinThreshold=$(bandwidth_of_n_percent_fastest_nodes)
+  * torrc: AuthDirFastGuarantee=0
+* **Metrics**:
+  * **Failure rainbow**
+  * **CDF-TTLB**
+  * **CDF-DL**
+  * **Per-Relay Spare Network Capacity CDF**
+* **Expected results**:
+  * Slow relays in the network are overloaded more than faster relays. Cutting
+    them out should reduce the overall rates of timeout-related failures.
+  * It should similarly reduce the variance of the performance of the network,
+    to the extent that the slow relays would have been chosen. This should
+    mean that the CDF-TTLB and CDF-DL graphs become more cliff-like (but
+    should not shift left overall, like we expected for CBT).
+  * The **Per-Relay Spare Network Capacity CDF** should narrow and become
+    more cliff-like, since slow relays are more overloaded than the rest of
+    the network.
+* **Potential Sources of Model Error**:
+  * We don't know which slow relays are slow, or why. Depending on the
+    threshold, we may cut out unused relays, overloaded relays, or relays
+    suffering from other bugs (see the KIST experiment).
+* **Anonymity effects**:
+  * Fewer relays means less diversity and fewer possible network paths, in
+    proportion to where we set the cutoff.
+* **Instrumentation Needed To Verify Operation**:
+  * Relays with a measured bandwidth below the cutoff should no longer appear
+    in the consensus
+* **Abort Criteria**:
+  * If relays other than the expected cutoff set disappear from the consensus,
+    abort.
+* **User Impact/What to Tell Users:**
+  * Relay operators on tor-relays should be made aware of these plans; we
+    should possibly also email the contact addresses of affected relays,
+    where they are available.
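
As a rough sketch of how the cutoff and the abort check could be computed,
assume we have a mapping from relay fingerprint to measured bandwidth for the
consensus before and after the change. The function names, the input format,
and the rounding rule for "n percent" are assumptions made for illustration.

```python
import math


def fast_cutoff(bandwidths, keep_percent):
    """Bandwidth of the slowest relay among the fastest `keep_percent` percent.

    Assumption: the number of relays kept is rounded up, and at least one
    relay is always kept.
    """
    ranked = sorted(bandwidths, reverse=True)
    keep = max(1, math.ceil(len(ranked) * keep_percent / 100))
    return ranked[keep - 1]


def unexpected_disappearances(before, after, cutoff):
    """Relays that vanished although their bandwidth was at or above the cutoff.

    `before` and `after` map relay fingerprint -> measured bandwidth.
    """
    missing = set(before) - set(after)
    return {fp for fp in missing if before[fp] >= cutoff}


if __name__ == "__main__":
    before = {"A": 100, "B": 50, "C": 20, "D": 10, "E": 5}
    after = {"A": 100, "C": 20}
    cutoff = fast_cutoff(before.values(), 60)
    print(cutoff, unexpected_disappearances(before, after, cutoff))
```

Relays also leave the consensus for unrelated reasons, such as going offline,
so a non-empty result is a prompt to investigate rather than proof that the
cutoff misbehaved.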
+
+## KIST
+
+* **Parameters**:
+  * Torperf torrc: KISTSchedRunInterval at 2ms
+  * consensus: KISTSchedRunInterval at 2ms, 5ms, 10ms (default)
+* **Metrics**:
+  * **CDF-TTFB**
+  * **CDF-DL**
+* **Expected Results**:
+  * The KIST scheduler interval has an effect on how often we are able to read
+    and write data to the network. For relays with lots of TCP connections, a
+    larger interval is better. For relays with only very few, a smaller
+    interval is better. See #29427.
+  * Depending on the number of connections that typical relays have, different
+    values of this parameter may increase performance variance of steady state
+    downloads (CDF-DL), as well as have an impact on latency (CDF-TTFB).
+* **Potential Sources of Model Error**:
+  * The KIST scheduler consensus value may also apply to the Torperf client
+    itself, which will deeply impact the results we see.
+* **Anonymity Effects**:
+  * This experiment may help us get better performance out of slow relays,
+    which will improve anonymity.
+* **Instrumentation Needed To Verify Operation**:
+  * XXX: dgoulet/pastly?
+* **Abort Criteria**:
+  * XXX: dgoulet/pastly?
+* **User Impact/What to Tell Users:**
+  * XXX: Did we even tell our users when we deployed KIST or EWMA apart from
+    the changelog lines?
+    * Rob: Yes, here is the KIST blog post:
+      https://blog.torproject.org/kist-and-tell-tors-new-traffic-scheduling-feature
+
+## Number of Guards
+
+* **Parameters**:
+  * consensus: guard-n-primary-guards-to-use=2 (1 is default)
+  * consensus: guard-n-primary-guards=2 (1 is default)
+  * torperf torrc: UseEntryGuards 1
+  * torperf torrc: GuardLifetime 1 day (or less; requires tor patch)
+* **Metrics**:
+  * **CDF-TTFB**
+  * **CDF-TTLB**
+  * **CDF-DL**
+  * **Failure rainbow**
+  * **Circuit timeouts** (maybe)
+* **Expected Results**:
+  * With only one guard, Torperf's variance for all performance
+    characteristics should be much larger. Additionally, with more than one
+    guard, Circuit Build Timeout should be able to avoid one of the guards if
+    either becomes temporarily overloaded. As a result, we should also see an
+    increase in average performance. So switching to two guards should make
+    all CDFs more cliff-like and move them all to the left (towards
+    origin/faster performance).
+* **Potential Sources of Model Error**:
+  * Torperf doesn't use guards by default; making it do so in a way that
+    allows us to get an idea of the performance variance of different
+    combinations of guards will require a much longer run of this experiment
+    than of the others.
+* **Anonymity Effects**:
+  * See Proposal 291
+* **Instrumentation Needed To Verify Operation**:
+  * Control port monitoring of the GUARD and ORCONN events to ensure that our
+    torperf instances use the expected number of guards
+* **Abort Criteria**:
+  * If more guards than we expect are used, abort.
+* **User Impact/What to Tell Users:**
+  * Probably requires a blog post; guard issues are a deep rabbithole.
+
+## Preemptive Circuit Building
+
+* **Parameters**:
+* **Metrics**:
+* **Expected Results**:
+* **Potential Sources of Model Error**:
+  * If torperf does not use the same number of circuits as we expect most
+    clients to use, and in the same patterns, this will bias our results.
+* **Anonymity Effects**:
+* **Instrumentation Needed To Verify Operation**:
+* **Abort Criteria**:
+* **User Impact/What to Tell Users:**
+
+## EWMA
+
+* **Parameters**:
+* **Metrics**:
+* **Expected Results**:
+* **Potential Sources of Model Error**:
+* **Anonymity Effects**:
+* **Instrumentation Needed To Verify Operation**:
+* **Abort Criteria**:
+* **User Impact/What to Tell Users:**
+
+
+## Sbws and Torflow Comparison
+
+* **Parameters**:
+* **Metrics**:
+* **Expected Results**:
+* **Potential Sources of Model Error**:
+* **Anonymity Effects**:
+* **Instrumentation Needed To Verify Operation**:
+* **Abort Criteria**:
+* **User Impact/What to Tell Users:**
+
+## Experiment Template
+
+* **Parameters**:
+* **Metrics**:
+* **Expected Results**:
+* **Potential Sources of Model Error**:
+* **Instrumentation Needed To Verify Operation**:
+* **Anonymity Effects**:
+* **Abort Criteria**:
+* **User Impact/What to Tell Users:**