Trac issues — https://gitlab.torproject.org/legacy/trac/-/issues (generated 2022-03-04T12:51:36Z)

Issue #33076: Graph consensus and vote information from Rob's experiments (Mike Perry, updated 2022-03-04)
https://gitlab.torproject.org/legacy/trac/-/issues/33076

This is a ticket for the work to graph the historical OnionPerf data from Rob's relay flooding experiment.
Some discussion threads:
https://lists.torproject.org/pipermail/tor-scaling/2019-December/000077.html
https://lists.torproject.org/pipermail/tor-scaling/2020-January/000081.html
Basically, we want to have a standard way to graph results from key metrics from before, during, and after the experiment.
In this case, we want CDF-TTFB, CDF-DL from onionperf results.
We also want CDF-Relay-Stream-Capacity and CDF-Relay-Utilization for the consensus, as well as from the votes, to see if the votes from TorFlow drastically differ from sbws during the experiment.
https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor/PerformanceMetrics
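For the CDF-* metrics, the underlying computation is the same empirical CDF in every case. A minimal sketch, assuming time-to-first-byte samples in seconds have already been extracted from the analysis files (the sample values below are made up):

```python
def cdf_points(samples):
    """Return (value, cumulative fraction) pairs of the empirical CDF."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Made-up TTFB samples in seconds, only to illustrate the computation.
ttfb = [0.41, 0.38, 0.95, 0.52, 0.47, 1.20, 0.44, 0.60]
points = cdf_points(ttfb)
# The median is the smallest value whose cumulative fraction reaches 0.5.
median = next(x for x, frac in points if frac >= 0.5)
```

Plotting `points` as a step function gives the CDF-TTFB shape; the same helper applies to download times and the relay capacity/utilization metrics.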
**Update from June 10, 2020: We finished the CDF-TTFB and CDF-DL portions by adding these graphs to OnionPerf's visualize mode. The remaining parts are the CDF-Relay-* graphs that are based on consensuses and votes. Keep this in mind when reading comments up to June 10, 2020.**

Issue #33421: Track which Guard is used for experimental measurements (Ana Custura, updated 2020-06-16)
https://gitlab.torproject.org/legacy/trac/-/issues/33421

Sometimes tor uses Guards other than its main one; we need to differentiate this in the OnionPerf results from measurement rotation events.

Assigned to Karsten Loesing.

Issue #33434: Allow users to select Onion Service version to measure (Ana Custura, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33434

Currently OP runs measurements using both v2 and v3 Onion Services by default, when the user may wish to test only one of them.

Assigned to Karsten Loesing.

Issue #33391: Add new metadata fields and definitions (Ana Custura, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33391

Define the instance metadata fields to help us differentiate experimental measurements.

Issue #33323: O2.1 Add instance metadata (Gaba <gaba@torproject.org>, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33323

We need a way to distinguish our current four long-term OnionPerf measurements that are automatically published to the Metrics portal from short-term experimental measurements. In this task, we will add instance metadata to OnionPerf's results in order to differentiate each experiment; we will store that data along with the actual measurement data in a separate, single archive. Without this task, we will be unable to distinguish the new experiments from the currently running OnionPerf instances, which makes the data collected unusable.

Issue #33397: Update metrics-web to only plot "official" data (Ana Custura, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33397

Update metrics-web to only plot "official" data based on new metadata field for existing graphs. This depends on implementations described in #33323 and #33391 =>#33393

Issue #24041: Unify Metrics' products operational configuration (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/24041

The method/process of deployment configuration of Metrics' products should be unified.
Afterwards, we look for the 'tool' or lib helping to achieve a simplification.
-----
The thought that triggered this ticket:
Looks like metrics-bot recently switched to using Apache Commons Configuration as configuration library (#23933). We should consider doing the same for other metrics parts, maybe after waiting a bit and hearing how that works out.

Issue #32268: second new onionoo backend (weasel (Peter Palfrader), updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/32268

Now that we know how to set up new onionoo backends, we can make another one. (Then we can retire the old ones.)
We'll probably want basically a copy of onionoo-backend-01, probably also on gnt-fsn.
irl can set it up eventually, but not in the next few days as he's otherwise engaged. However, if we get the instance set up soonish, he may find the time to set up the service in the next few weeks.

Assigned to Karsten Loesing.

Issue #28271: Check OnionPerf instances from Nagios (irl, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/28271

There are a few things that we can check, some are easier than others.
* Is the host up and the webserver running? (this is easy with built-in checks)
* Is the tgen server running on the Internet? (this is easy with built-in checks)
* Is the analyze task running? (needs a plugin)
* Is the tgen server running on an Onion service? (needs a plugin)
For monitoring the Onion service, I'm looking at reusable plugins, so there are two tests. One checks to see how old the descriptor is and a second test actually tries connecting to the service. The first of these tests is affected by #28269 (but not blocked) and both are blocked by [[https://github.com/robgjansen/onionperf/issues/42|onionperf#42]].
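The descriptor-age half of that monitoring could follow the standard Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL). A sketch of that idea, where the thresholds and the use of a file's modification time as a stand-in for descriptor age are assumptions:

```python
import os
import time

# Nagios plugin convention: exit 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2

def check_age(path, warn_secs=3600, crit_secs=7200, now=None):
    """Grade how stale `path` is, as a proxy for descriptor freshness."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return CRITICAL, "CRITICAL: %s not found" % path
    if age > crit_secs:
        return CRITICAL, "CRITICAL: %s is %d seconds old" % (path, age)
    if age > warn_secs:
        return WARNING, "WARNING: %s is %d seconds old" % (path, age)
    return OK, "OK: %s is %d seconds old" % (path, age)
```

A wrapper script would print the message and `sys.exit()` with the returned code, which is all Nagios needs from a plugin.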
As a workaround for monitoring the Onion service, which really is the bit that is breaking, we can instead monitor the analysis of timeouts from Tor Metrics' CSV files.

Issue #28327: Make sure that each service has at least two operators (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/28327

We should make sure that each service has at least two operators.
This is already the case with CollecTor and Onionoo. But we're still missing two services: metrics-web and ExoneraTor. Here are the required steps:
- metrics-web:
  - Go through the process of building the war file (`ant war`), uploading it to meronense, updating the start script, stopping the running Jetty process, and starting the new one.
  - Explain how to update the R code and restart the Rserve process.
  - Update the Java code used by data-processing modules, while keeping necessary local changes, after making sure that there's no update going on at the time.
- ExoneraTor:
  - Build and upload a new jar file and replace the one that is used in the hourly running database importer.
  - Build and upload a new war file and replace the web part of ExoneraTor that runs behind https://exonerator.tp.o/query.json.
We might streamline processes while adding a second operator, though we should be careful not to get distracted too much by that.

Issue #34316: Make -o/-i arguments mutually exclusive (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/34316

Right now, it's possible to specify both `-o` and `-i` at the same time, which doesn't make any sense.
Also, it's rather non-intuitive that both argument defaults are `True` and that setting either of them changes their value to `False`.
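A fix along these lines could presumably use argparse's mutually exclusive groups, which reject combined use outright; the flag semantics below are illustrative assumptions, not OnionPerf's actual option definitions:

```python
import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
# Illustrative flags only; store_true defaults to False, which also avoids
# the confusing both-default-to-True behavior described above.
group.add_argument("-o", dest="onion_only", action="store_true",
                   help="only measure onion service downloads")
group.add_argument("-i", dest="inet_only", action="store_true",
                   help="only measure direct (non-onion) downloads")

args = parser.parse_args([])  # neither flag given: measure both by default
```

With this setup, passing both `-o` and `-i` makes argparse exit with a "not allowed with argument" error instead of silently accepting the combination.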
I found this issue while working on #34216, but this issue seemed sufficiently different to justify having its own ticket. I'll post a patch for review in a minute.

Assigned to Karsten Loesing.

Issue #34257: Analyze unusual distribution of time to extend to first hop in circuit (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/34257

I spent some time looking at OnionPerf measurements today. I found something that I did not expect: It seems like the time required to build the first hop in a circuit has a huge variance and rather unusual distribution. I'll attach a graph shortly that visualizes that.
Looking at that graph, we can see a few things:
- The three OnionPerf instances have very different performance regarding circuit extension. (I checked the earlier instances op-hk, op-nl, and op-us, and they had the same characteristics.)
- There are huge plateaus for op-us2 and op-nl2 in the first hop graph where _some_ circuits have been successfully extended and others not. Typically, we'd expect a distribution like op-nl2's, just pulled to the right. But that's not the case here. The blue line is special at around 0.6 seconds and the red line at around 0.9 seconds. In fact, the green line is also a bit special at around 0.3 seconds when it almost flattens, only to increase linearly until it reaches 100% at around 0.6 seconds.
- If we assume that the U.S. and Hong Kong are simply far away from many relays in a geographical sense, that doesn't explain why extending to the middle node and to the exit goes relatively fast even for those two hosts. Keep in mind that extending to the second hop requires a round-trip to the first hop, and that extending to the third hop requires a round-trip to the first and the second hop.
What's going on here? What properties of these relays should we be looking at? I already looked at:
- consensus weight,
- date/time of building these circuits, and
- whether these are just a small number of guards being reused over and over;
but none of these explained the shape of these ECDFs. I'm going to attach the data file that this graph is based on, if others want to take a look.
And would it make sense to try out running an OnionPerf instance on another set of hosts that are geographically close to our current hosts? Maybe it's related to how these hosts are set up, including their network? (Just in case that other set of hosts produces different results, we would still have to investigate how that affects our overall measurements of things like time to first byte or throughput.)
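One way to quantify those plateaus is to look for the widest gap between consecutive sorted samples, which corresponds to the longest flat stretch of the ECDF. A small sketch (the timing values are made up):

```python
def widest_ecdf_gap(samples):
    """Return (gap_width, left_edge) of the widest flat stretch of the ECDF."""
    xs = sorted(samples)
    gaps = [(b - a, a) for a, b in zip(xs, xs[1:])]
    return max(gaps)

# Made-up first-hop extend times (seconds) with a plateau from ~0.6 to ~0.9,
# mimicking the shape described for op-us2/op-nl2:
times = [0.31, 0.35, 0.42, 0.55, 0.58, 0.61, 0.92, 0.95, 0.97, 1.01]
gap, left = widest_ecdf_gap(times)
```

Running this over each instance's first-hop times would give a simple number to compare the plateaus across op-us2, op-nl2, and op-hk2.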
This is relevant to Sponsor 59, because we need to make sure that our current measurements are going to be a solid baseline for future experiments. Classifying as potential defect.

Issue #34031: Figure out warning about unknown error type when exporting .tpf file (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/34031

I found this warning on an OnionPerf test instance:
```
2020-04-27 13:00:01 1587992401.168824 [onionperf] [INFO] saving analysis results to /home/cloud/onionperf-data/htdocs/op-nl2-51200-2020-04-27.tpf
2020-04-27 13:00:01 1587992401.169561 [onionperf] [WARNING] KeyError while exporting torperf file, missing key _PROXY_END_MISC_, skipping transfer 'transfer50k:2'
2020-04-27 13:00:01 1587992401.170384 [onionperf] [INFO] done!
```
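The warning indicates the exporter skips a transfer whose analysis results lack an expected key; the defensive pattern probably looks something like this (names are illustrative, not OnionPerf's actual code):

```python
def export_rows(transfers, required_keys=("_PROXY_END_MISC_",)):
    """Collect exportable transfers, skipping any with missing keys."""
    rows, skipped = [], []
    for name, data in sorted(transfers.items()):
        missing = [k for k in required_keys if k not in data]
        if missing:
            skipped.append((name, missing[0]))  # would be logged as a WARNING
            continue
        rows.append((name, data))
    return rows, skipped
```

The interesting question for this ticket is why `_PROXY_END_MISC_` ends up missing for some transfers in the first place, which the attached logs should show.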
I don't have time to look into this yet, but I'll attach log files to find out later.

Assigned to Ana Custura.

Issue #34024: Reduce timeout and stallout values (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/34024

On #33974 we discussed a suggestion to reduce timeouts for our three downloads as follows:
- 50 KiB download with 15 seconds timeout rather than 295 seconds,
- 1 MiB download with 60 seconds timeout rather than 1795 seconds, and
- 5 MiB download with 120 seconds timeout rather than 3595 seconds.
Similarly, stallouts would be dropped entirely:
- 50 KiB download with 0 seconds stallout rather than 300 seconds,
- 1 MiB download with 0 seconds stallout rather than 1800 seconds, and
- 5 MiB download with 0 seconds stallout rather than 3600 seconds.
After discussing this with irl, we concluded that we might want to pick values somewhere in the middle. The smaller values above are the ones TGen uses when generating load for Shadow simulations; in that case it makes sense to use timeouts similar to how users would behave. But in the measurements we're doing with OnionPerf we can easily record more data even after a human user would have given up, and later filter out measurements taking longer than whatever timeouts we want to use.
In particular, it would be important for us to use timeouts that are higher than the timeouts used internally by the Tor client, so that we can observe what happens in those cases, even if a human user would long have given up.
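For reference, the middle-ground option would amount to a parameter set like the following (a hypothetical sketch using the ~270-second figure floated in this ticket, not a decided configuration):

```python
# Hypothetical middle-ground values: one timeout just under the 5-minute
# pause for all sizes, and stallout dropped entirely (assuming 0 disables it).
PROPOSED_STREAMS = {
    "stream50k": {"recvsize": "50 KiB", "timeout": "270 seconds", "stallout": "0 seconds"},
    "stream1m":  {"recvsize": "1 MiB",  "timeout": "270 seconds", "stallout": "0 seconds"},
    "stream5m":  {"recvsize": "5 MiB",  "timeout": "270 seconds", "stallout": "0 seconds"},
}
```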
How about we use timeouts and stallouts close to 5 minutes, so that we avoid overlapping measurements? Like 270 seconds for all three download sizes? What would we use as stallout value here? 0?

Assigned to Karsten Loesing.

Issue #34023: Reduce the number of 50 KiB downloads (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/34023

On #33076 we discussed whether we should kill the 50 KiB downloads in deployed OnionPerfs and only keep the 1 MiB and 5 MiB downloads. The primary reason would be that our [throughput](https://metrics.torproject.org/onionperf-throughput.html) graphs would be based on five times as many data points per day, because they only include 1 MiB and 5 MiB downloads, but not 50 KiB downloads. This would not affect our [circuit round-trip latencies graphs](https://metrics.torproject.org/onionperf-latencies.html), which include all three downloaded file sizes.
The main reason against killing 50 KiB downloads is that OnionPerfs would consume more bandwidth and also put more load on the Tor network. Let's consider two scenarios with and without 50 KiB downloads. In both scenarios we're making a new download every 5 minutes, randomly chosen with a weight of 1.0 for 5 MiB runs, 2.0 for 1 MiB runs, and either 12.0 or 0.0 for 50 KiB runs:
- With 50 KiB downloads we're downloading on average `12/15 * 50 KiB + 2/15 * 1 MiB + 1/15 * 5 MiB = 517 KiB` every 5 minutes, or `517 * 8 * 1024 / (300 * 1000) = 14 kbps`.
- Without 50 KiB downloads we're downloading on average `2/3 * 1 MiB + 1/3 * 5 MiB = 2389 KiB` every 5 minutes, or `2389 * 8 * 1024 / (300 * 1000) = 65 kbps`.
These numbers are both tiny in comparison to the overall network capacity and to other services like the bandwidth scanners.
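The averages above can be double-checked mechanically; the weights, sizes, and the 300-second interval are taken directly from the two scenarios:

```python
def avg_kib(weighted_sizes):
    """Weighted average download size in KiB for one 5-minute slot."""
    total_weight = sum(w for w, _ in weighted_sizes)
    return sum(w * size for w, size in weighted_sizes) / total_weight

def kbps(kib_per_slot, slot_secs=300):
    """Average bandwidth in kbps for one download per slot."""
    return kib_per_slot * 8 * 1024 / (slot_secs * 1000)

with_50k = avg_kib([(12, 50), (2, 1024), (1, 5120)])  # ~517 KiB -> ~14 kbps
without_50k = avg_kib([(2, 1024), (1, 5120)])         # ~2389 KiB -> ~65 kbps
```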
I'm going to make this change and deploy it on new OnionPerf instances tomorrow, unless I hear objections here.

Assigned to Karsten Loesing.

Issue #33974: Update OnionPerf to TGen 1.0.0 (Karsten Loesing, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33974

[TGen 1.0.0](https://github.com/shadow/tgen/releases/tag/v1.0.0) comes with a "change in the format of some of the configuration options that breaks compatibility with the previous version 0.0.1."
I tried to update OnionPerf to write out TGen files that TGen 1.0.0 understands. Here's the diff:
```
diff --git a/onionperf/model.py b/onionperf/model.py
index 3c057c5..90c824e 100644
--- a/onionperf/model.py
+++ b/onionperf/model.py
@@ -77,9 +77,9 @@ class TorperfModel(GeneratableTGenModel):
         if self.socksproxy is not None:
             g.node["start"]["socksproxy"] = self.socksproxy
         g.add_node("pause", time="5 minutes")
-        g.add_node("transfer50k", type="get", protocol="tcp", size="50 KiB", timeout="295 seconds", stallout="300 seconds")
-        g.add_node("transfer1m", type="get", protocol="tcp", size="1 MiB", timeout="1795 seconds", stallout="1800 seconds")
-        g.add_node("transfer5m", type="get", protocol="tcp", size="5 MiB", timeout="3595 seconds", stallout="3600 seconds")
+        g.add_node("stream50k", recvsize="50 KiB", timeout="295 seconds", stallout="300 seconds")
+        g.add_node("stream1m", recvsize="1 MiB", timeout="1795 seconds", stallout="1800 seconds")
+        g.add_node("stream5m", recvsize="5 MiB", timeout="3595 seconds", stallout="3600 seconds")

         g.add_edge("start", "pause")
@@ -88,9 +88,9 @@ class TorperfModel(GeneratableTGenModel):
         g.add_edge("pause", "pause")

         # these are chosen with weighted probability, change edge 'weight' attributes to adjust probability
-        g.add_edge("pause", "transfer50k", weight="12.0")
-        g.add_edge("pause", "transfer1m", weight="2.0")
-        g.add_edge("pause", "transfer5m", weight="1.0")
+        g.add_edge("pause", "stream50k", weight="12.0")
+        g.add_edge("pause", "stream1m", weight="2.0")
+        g.add_edge("pause", "stream5m", weight="1.0")

         return g
@@ -109,10 +109,10 @@ class OneshotModel(GeneratableTGenModel):
         g.add_node("start", serverport=self.tgen_port, peers=server_str, loglevel="info", heartbeat="1 minute")
         if self.socksproxy is not None:
             g.node["start"]["socksproxy"] = self.socksproxy
-        g.add_node("transfer5m", type="get", protocol="tcp", size="5 MiB", timeout="15 seconds", stallout="10 seconds")
+        g.add_node("stream5m", recvsize="5 MiB", timeout="15 seconds", stallout="10 seconds")

-        g.add_edge("start", "transfer5m")
-        g.add_edge("transfer5m", "start")
+        g.add_edge("start", "stream5m")
+        g.add_edge("stream5m", "start")

         return g
```
I'll let an OnionPerf instance run for a day to look at the output, and also to see if we need to make adjustments to OnionPerf's analyze mode due to slightly changed log messages.
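The renames in the diff boil down to a small attribute translation; here is that mapping as a standalone sketch, inferred only from the diff (this helper does not exist in OnionPerf):

```python
def to_tgen_1_0(node_attrs):
    """Translate TGen 0.0.1 transfer-node attributes to the 1.0.0 names."""
    new_attrs = dict(node_attrs)
    # 'type' and 'protocol' disappear; 'size' becomes 'recvsize'.
    new_attrs.pop("type", None)
    new_attrs.pop("protocol", None)
    if "size" in new_attrs:
        new_attrs["recvsize"] = new_attrs.pop("size")
    return new_attrs

old = {"type": "get", "protocol": "tcp", "size": "50 KiB",
       "timeout": "295 seconds", "stallout": "300 seconds"}
```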
Until then, do these changes above look reasonable? Or did I miss something? Thanks!

Assigned to Ana Custura.

Issue #33438: OnionPerf: Scalability, Performance, Establishing Baseline Metrics (Gaba <gaba@torproject.org>, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33438

In this project, the Tor Metrics and Tor Network teams will work together to improve OnionPerf so that it is a useful tool for all developers and researchers. As a result of this project, we will be able to conduct more meaningful experiments on the Tor network. Enhancing OnionPerf is a critical foundational step in the work to scale the Tor network.
The goals of this project are to:
* Make operational improvements to existing OnionPerf deployments and make it easier to deploy new OnionPerf instances;
* Expand the kinds of measurements OnionPerf can take by making improvements to its codebase; and
* Make improvements to the way we analyze performance metrics.
Teams involved:
* network health team
* network team
* metrics team
* research director
More information in https://trac.torproject.org/projects/tor/wiki/org/sponsors/Sponsor59

Issue #33435: Document BASETORRC environment variable (Ana Custura, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33435

OP can configure the Tor client through the BASETORRC environment variable. This should be added to the documentation with examples.

Assigned to Philipp Winter <phw@torproject.org>.

Issue #33433: Add error handling for older stem versions (Ana Custura, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33433

Currently OP fails if the python-stem version required for enabling v3 services is not found. Instead, it should warn the user and either continue with v2 services only, or exit and ask the user to manually exclude v3 services on the command line.

Issue #33432: Multiple downloads for oneshot mode (Ana Custura, updated 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/33432

Currently, oneshot mode performs a single 5M download and then exits. We should allow users to request multiple downloads, which will aid future OP testing setups.

Assigned to Philipp Winter <phw@torproject.org>.
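For the oneshot change, a repeat-count command-line option is one plausible shape; the flag name and structure below are assumptions for illustration, not OnionPerf's actual interface:

```python
import argparse

def parse_oneshot_args(argv):
    # Hypothetical flag; today's oneshot mode always performs one download.
    p = argparse.ArgumentParser(prog="oneshot-sketch")
    p.add_argument("--num-downloads", type=int, default=1,
                   help="number of 5 MiB downloads to perform before exiting")
    return p.parse_args(argv)

def run_oneshot(argv, download=lambda i: "download %d done" % i):
    """Run the requested number of downloads; `download` stands in for the real transfer."""
    args = parse_oneshot_args(argv)
    return [download(i) for i in range(args.num_downloads)]
```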