As I understand this, the results should record whether the CBT was set at the time of measurement. This could just be a CBT_SET True/False field in the tpf results file. The CBT resets in certain conditions, and this should be reflected in the results. The events should be parsed from the tor logs.
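To make the proposed CBT_SET field concrete, here's a minimal sketch of deriving a True/False state while scanning tor log lines. It assumes control-spec-style BUILDTIMEOUT_SET event lines (the sample lines below are invented for illustration); COMPUTED means a timeout is set, RESET means the CBT state was discarded:

```python
# Sketch: derive a CBT_SET True/False state from BUILDTIMEOUT_SET lines in a
# tor log. The sample lines below are invented; COMPUTED marks a timeout as
# set, RESET marks it as unset.

def cbt_set_states(log_lines):
    """Yield (line, cbt_set) pairs, tracking whether a timeout is set."""
    cbt_set = False
    for line in log_lines:
        if "BUILDTIMEOUT_SET" in line:
            # the word after the event name is the set type
            set_type = line.split("BUILDTIMEOUT_SET", 1)[1].split()[0]
            if set_type == "COMPUTED":
                cbt_set = True
            elif set_type == "RESET":
                cbt_set = False
        yield line, cbt_set

log = [
    "650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=124 TIMEOUT_MS=9019 CUTOFF_QUANTILE=0.8",
    "650 CIRC 1 LAUNCHED",
    "650 BUILDTIMEOUT_SET RESET TOTAL_TIMES=0 TIMEOUT_MS=60000 CUTOFF_QUANTILE=0.8",
    "650 CIRC 2 LAUNCHED",
]
states = [cbt for _, cbt in cbt_set_states(log)]
# states == [True, True, False, False]
```

Each measurement result would then record the `cbt_set` value current at its timestamp.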
We should check with Mike if this meets the requirements.
I'm afraid I don't understand what needs to be done here.
Here's what we have right now: OnionPerf logs BUILDTIMEOUT_SET and CIRC events and includes TIMEOUT_MS and CUTOFF_QUANTILE values from the last known BUILDTIMEOUT_SET event when parsing a CIRC event with status LAUNCHED. The result is the following structure in the analysis .json file:
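For illustration only (the original structure did not survive the ticket migration): the field names `build_timeout` and `build_quantile` are the ones used later in this ticket, while the surrounding keys and values are assumptions. A circuit object in the analysis .json file looks roughly like this:

```json
{
  "circuit": {
    "12": {
      "circuit_id": 12,
      "unix_ts_start": 1590000000.0,
      "build_timeout": 1500,
      "build_quantile": 0.8
    }
  }
}
```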
Note that not all circuit objects in OnionPerf's analysis .json files contain these two fields. OnionPerf analyzes (processes) log files on a daily basis. If a CIRC event with status LAUNCHED fires before the day's first BUILDTIMEOUT_SET event arrives, OnionPerf cannot tell which quantile and timeout were used for building that circuit. As a result, these two fields are missing, which we should treat as "we don't know what the values were" rather than "there was no timeout set at the time".
Also note that I have not yet seen a BUILDTIMEOUT_SET event of any type other than COMPUTED in the wild. I scanned all past logs from op-hk and op-hk2 but did not find a single one. I know the other types are specified and implemented; I just haven't seen them. I would have expected them to occur on op-hk and op-hk2, because those typically have the most interesting network conditions of all OnionPerf instances.
So, what is this ticket about? Do we want to include more information from the BUILDTIMEOUT_SET event in the circuit object? If so, what are we interested in?
And once we have these additional fields (assuming this is what the ticket is about), what are we going to do with them? Should we visualize this information, or is it sufficient to keep it in the analysis .json files for later use?
Is the goal here to prepare for measuring changes in a future Tor network that fires wildly different BUILDTIMEOUT_SET events than what we see today? If so, what can we expect?
Setting to needs_information and cc'ing mikeperry and gaba to shed light on this before we implement something at random.
Trac: Cc: acute to acute, mikeperry, gaba; Status: accepted to needs_information
It makes sense that BUILDTIMEOUT_SET events other than COMPUTED are rare in onionperf production instances, because CBT only resets after many timeouts, and only SUSPENDs if TLS activity stops.
For the CBT experiments, the goal is to include (or perform) measurements only when there is a build timeout set. This means either discarding measurements taken between BUILDTIMEOUT_SET RESET and BUILDTIMEOUT_SET COMPUTED, or not measuring during that window. We will also want to include TIMEOUT_RATE and CLOSE_RATE, because we will want to see how closely those match CUTOFF_QUANTILE in practice as we vary CUTOFF_QUANTILE from the consensus.
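To make the TIMEOUT_RATE vs. CUTOFF_QUANTILE comparison concrete, here's a sketch with made-up data (not a real OnionPerf API) of computing an empirical timeout rate from measured circuit build times and comparing it to the configured quantile:

```python
# Sketch: compare the empirical circuit timeout rate against the configured
# CUTOFF_QUANTILE. All data below is invented for illustration.

def empirical_timeout_rate(build_times_ms, timeout_ms):
    """Fraction of circuits whose build time hit or exceeded the timeout."""
    timed_out = sum(1 for t in build_times_ms if t >= timeout_ms)
    return timed_out / len(build_times_ms)

build_times_ms = [400, 650, 900, 1200, 5000, 700, 3100, 800, 950, 600]
timeout_ms = 1500      # TIMEOUT_MS from the last BUILDTIMEOUT_SET COMPUTED event
cutoff_quantile = 0.8  # CUTOFF_QUANTILE from the same event

rate = empirical_timeout_rate(build_times_ms, timeout_ms)
# With a well-calibrated CBT, rate should be close to 1 - cutoff_quantile.
# Here 2 of 10 circuits (5000 ms and 3100 ms) exceed the timeout, so rate == 0.2.
```

Aggregating this per consensus CUTOFF_QUANTILE value would show how well the configured quantile tracks reality.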
We will also want to be able to filter the results to include/exclude certain guard identities and sets of guard nodes. But we'll want to do this kind of filtering generally.
I think that not performing measurements between BUILDTIMEOUT_SET RESET and BUILDTIMEOUT_SET COMPUTED will be difficult. But we can exclude those measurements in the visualization step. We'll likely want to add a command-line switch for that. It might be easier to first do #34231 (moved) and #33260 (moved) and then work on this ticket.
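A command-line switch for the visualization step could be as simple as the following sketch; the flag name `--exclude-unset-cbt` and the minimal measurement records are invented for illustration:

```python
# Sketch: a hypothetical --exclude-unset-cbt switch that drops measurements
# recorded while no build timeout was set (i.e. the circuit object lacks
# build_timeout/build_quantile).
import argparse

parser = argparse.ArgumentParser(prog="onionperf-visualize-sketch")
parser.add_argument("--exclude-unset-cbt", action="store_true",
                    help="drop measurements without build_timeout/build_quantile")
args = parser.parse_args(["--exclude-unset-cbt"])

measurements = [
    {"id": 1, "build_timeout": 1500},
    {"id": 2},  # no timeout was set when this circuit launched
]
if args.exclude_unset_cbt:
    measurements = [m for m in measurements if "build_timeout" in m]
# measurements now contains only the record with id 1
```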
I'm moving mikeperry's DROPTIMEOUTS feature branch from #33399 (moved) here, because this ticket is about circuit build timeouts whereas #33399 (moved) is about static guards.
It's possible that we'll want to do both (use static guards and drop build timeouts when dropping guards), but the development effort can happen in separate tickets.
LMK if this looks good and I'll open a sub-ticket for network-team to merge.
mikeperry added a bit later:
I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.
This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.
I'll also add that we'll need a Stem patch in order to send the DROPTIMEOUTS command. We can add that later after adding the Tor patch and after being sure that it does what we need.
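Until Stem grows a dedicated helper, the raw control command could be sent through Stem's generic message interface. The sketch below uses a stub standing in for `stem.control.Controller` so it is self-contained; the stub and its reply are invented, while `msg()` and `is_ok()` mirror Stem's actual interface:

```python
# Sketch: send the DROPTIMEOUTS control command without a dedicated Stem
# helper. In real use, ctrl would be a stem.control.Controller; here a stub
# stands in so the sketch runs on its own.

def drop_timeouts(ctrl):
    """Send DROPTIMEOUTS and raise if tor does not acknowledge it."""
    response = ctrl.msg("DROPTIMEOUTS")
    if not response.is_ok():
        raise RuntimeError("DROPTIMEOUTS failed: %s" % response)
    return response

class StubController:
    """Stands in for stem.control.Controller in this sketch."""
    def msg(self, command):
        class Reply:
            def is_ok(self):
                return command == "DROPTIMEOUTS"
            def __str__(self):
                return "250 OK"
        return Reply()

reply = drop_timeouts(StubController())
# str(reply) == "250 OK"
```

A proper Stem patch would wrap this in a `Controller.drop_timeouts()` method, analogous to the existing `drop_guards()`.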
I briefly ran mikeperry's patch and want to share my experience here.
I had to patch the branch by calling the relevant functions as part of dropping guards, the only reason being that Stem (obviously) doesn't have a drop_timeouts() function yet. On top of mikeperry's branch, my patch makes the following changes:
- use entry guards (which isn't technically required for this branch, but I did this when working on #33399 (moved)),
- increase the NEWNYM interval from 5 minutes to 1 hour and drop guards and timeouts right after sending the NEWNYM command (even though this could have happened separately, with NEWNYM being sent every 5 minutes and the two drop commands happening once per hour), and
- distinguish BUILDTIMEOUT_SET COMPUTED events from other BUILDTIMEOUT_SET events.

Here's the diff:
```diff
diff --git a/onionperf/analysis.py b/onionperf/analysis.py
index 20ca354..7913adc 100644
--- a/onionperf/analysis.py
+++ b/onionperf/analysis.py
@@ -12,7 +12,7 @@
 from socket import gethostname
 from abc import ABCMeta, abstractmethod
 
 # stem imports
-from stem import CircEvent, CircStatus, CircPurpose, StreamStatus
+from stem import CircEvent, CircStatus, CircPurpose, StreamStatus, TimeoutSetType
 from stem.response.events import CircuitEvent, CircMinorEvent, StreamEvent, BandwidthEvent, BuildTimeoutSetEvent
 from stem.response import ControlMessage, convert
@@ -630,8 +630,12 @@ class TorCtlParser(Parser):
         self.bandwidth_summary['bytes_written'][int(arrival_dt)] = event.written
 
     def __handle_buildtimeout(self, event, arrival_dt):
-        self.build_timeout_last = event.timeout
-        self.build_quantile_last = event.quantile
+        if event.set_type == TimeoutSetType.COMPUTED:
+            self.build_timeout_last = event.timeout
+            self.build_quantile_last = event.quantile
+        else:
+            self.build_timeout_last = None
+            self.build_quantile_last = None
 
     def __handle_event(self, event, arrival_dt):
         if isinstance(event, (CircuitEvent, CircMinorEvent)):
diff --git a/onionperf/measurement.py b/onionperf/measurement.py
index 4a58bc4..c8fc9f8 100644
--- a/onionperf/measurement.py
+++ b/onionperf/measurement.py
@@ -388,8 +388,8 @@ WarnUnsafeSocks 0\nSafeLogging 0\nMaxCircuitDirtiness 60 seconds\nDataDirectory
             tor_config = tor_config + f.read()
         if name == "client" and self.additional_client_conf:
             tor_config += self.additional_client_conf
-        if not 'UseEntryGuards' in tor_config and not 'UseBridges' in tor_config:
-            tor_config += "UseEntryGuards 0\n"
+        #if not 'UseEntryGuards' in tor_config and not 'UseBridges' in tor_config:
+        #    tor_config += "UseEntryGuards 0\n"
         if name == "server" and self.single_onion:
             tor_config += "HiddenServiceSingleHopMode 1\nHiddenServiceNonAnonymousMode 1\n"
         return tor_config
@@ -466,7 +466,7 @@ WarnUnsafeSocks 0\nSafeLogging 0\nMaxCircuitDirtiness 60 seconds\nDataDirectory
         time.sleep(3)
         torctl_events = [e for e in monitor.get_supported_torctl_events() if e not in ['DEBUG', 'INFO', 'NOTICE', 'WARN', 'ERR']]
-        newnym_interval_seconds = 300
+        newnym_interval_seconds = 3600
         torctl_args = (control_port, torctl_writable, torctl_events, newnym_interval_seconds, self.done_event)
         torctl_helper = threading.Thread(target=monitor.tor_monitor_run, name="torctl_{0}_helper".format(name), args=torctl_args)
         torctl_helper.start()
diff --git a/onionperf/monitor.py b/onionperf/monitor.py
index 5387bff..074876e 100644
--- a/onionperf/monitor.py
+++ b/onionperf/monitor.py
@@ -64,6 +64,7 @@ class TorMonitor(object):
                 if newnym_interval_seconds is not None and interval_count >= newnym_interval_seconds:
                     interval_count = 0
                     torctl.signal(Signal.NEWNYM)
+                    torctl.drop_guards()
             except KeyboardInterrupt:
                 pass # the user hit ctrl+c
```
I'm attaching the OnionPerf analysis file from running this branch for a couple of hours. When you grep for "build", you'll find alternating batches of circuits with "build_quantile" and "build_timeout" fields followed by batches of circuits without these two fields. In the latter case, there was no build timeout set.
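The grep check can also be done mechanically; this sketch counts circuits with and without the two fields (the minimal circuit records below are a stand-in for a real analysis .json structure):

```python
# Sketch: count circuits in an OnionPerf-style analysis structure that do and
# do not carry build_timeout/build_quantile. The records below are a minimal
# stand-in for a real analysis .json file.

def count_build_fields(circuits):
    """Return (with_fields, without_fields) counts over circuit records."""
    with_fields = sum(1 for c in circuits
                      if "build_timeout" in c and "build_quantile" in c)
    return with_fields, len(circuits) - with_fields

circuits = [
    {"circuit_id": 1, "build_timeout": 1500, "build_quantile": 0.8},
    {"circuit_id": 2, "build_timeout": 1500, "build_quantile": 0.8},
    {"circuit_id": 3},  # launched before the first BUILDTIMEOUT_SET COMPUTED
    {"circuit_id": 4},  # or after a BUILDTIMEOUT_SET RESET
]
have, missing = count_build_fields(circuits)
# have == 2, missing == 2
```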
Reassigning to metrics-team, so that others can look into mikeperry's patch and also think about remaining steps here. This ticket is far from being done.
Trac: Owner: karsten to metrics-team; Status: needs_information to assigned
I have not dug deep into this, but as a quick note: recall that it takes at least 1000 seconds after you get a BUILDTIMEOUT_SET RESET event before you'll get a BUILDTIMEOUT_SET COMPUTED event. This means there will be no build_quantile or build_timeout value during this time.
So without filtering those measurements, you will have plenty of measurements without a build_quantile and a build_timeout; this is normal. If you were already filtering these measurements and still see them, then there is an issue with that filter in the OnionPerf codebase. I hope that is not the case!