As I understand this, the results should record whether the CBT was set at the time of measurement. This could just be a CBT_SET True/False field in the tpf results file. The CBT resets in certain conditions, and this should be reflected in the results. The events should be parsed from the tor logs.
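To make the proposed CBT_SET field concrete, here's a minimal sketch of deriving a True/False state while scanning tor log lines. It assumes control-spec-style BUILDTIMEOUT_SET event lines (the sample lines below are invented for illustration); COMPUTED means a timeout is set, RESET means the CBT state was discarded:

```python
# Sketch: derive a CBT_SET True/False state from BUILDTIMEOUT_SET lines in a
# tor log. The sample lines below are invented; COMPUTED marks a timeout as
# set, RESET marks it as unset.

def cbt_set_states(log_lines):
    """Yield (line, cbt_set) pairs, tracking whether a timeout is set."""
    cbt_set = False
    for line in log_lines:
        if "BUILDTIMEOUT_SET" in line:
            # the word after the event name is the set type
            set_type = line.split("BUILDTIMEOUT_SET", 1)[1].split()[0]
            if set_type == "COMPUTED":
                cbt_set = True
            elif set_type == "RESET":
                cbt_set = False
        yield line, cbt_set

log = [
    "650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=124 TIMEOUT_MS=9019 CUTOFF_QUANTILE=0.8",
    "650 CIRC 1 LAUNCHED",
    "650 BUILDTIMEOUT_SET RESET TOTAL_TIMES=0 TIMEOUT_MS=60000 CUTOFF_QUANTILE=0.8",
    "650 CIRC 2 LAUNCHED",
]
states = [cbt for _, cbt in cbt_set_states(log)]
# states == [True, True, False, False]
```

Each measurement result would then record the `cbt_set` value current at its timestamp.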
We should check with Mike if this meets the requirements.
I'm afraid I don't understand what needs to be done here.
Here's what we have right now: OnionPerf logs BUILDTIMEOUT_SET and CIRC events and includes TIMEOUT_MS and CUTOFF_QUANTILE values from the last known BUILDTIMEOUT_SET event when parsing a CIRC event with status LAUNCHED. The result is the following structure in the analysis .json file:
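For illustration only (the original structure did not survive the ticket migration): the field names `build_timeout` and `build_quantile` are the ones used later in this ticket, while the surrounding keys and values are assumptions. A circuit object in the analysis .json file looks roughly like this:

```json
{
  "circuit": {
    "12": {
      "circuit_id": 12,
      "unix_ts_start": 1590000000.0,
      "build_timeout": 1500,
      "build_quantile": 0.8
    }
  }
}
```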
Note that not all circuit objects in OnionPerf's analysis .json files contain these two fields. OnionPerf analyzes (processes) log files on a daily basis. If a CIRC event with status LAUNCHED fires before the day's first BUILDTIMEOUT_SET event arrives, OnionPerf cannot tell which quantile and timeout were used for building that circuit. As a result, these two fields are missing, which we should treat as "we don't know what the values were" rather than "there was no timeout set at the time".
Also note that I have not yet seen a BUILDTIMEOUT_SET event of any type other than COMPUTED in the wild. I scanned all past logs from op-hk and op-hk2 but did not find a single one. I know the other types are specified and implemented; I just haven't seen them. I would have expected them to occur on op-hk and op-hk2, because those typically have the most interesting network conditions of all OnionPerf instances.
So, what is this ticket about? Do we want to include more information from the BUILDTIMEOUT_SET event in the circuit object? If so, what are we interested in?
And once we have these additional fields (assuming this is what the ticket is about), what are we going to do with them? Should we visualize this information, or is it sufficient to keep it in the analysis .json files for later use?
Is the goal here to prepare for measuring changes in a future Tor network that fires wildly different BUILDTIMEOUT_SET events than what we see today? If so, what can we expect?
Setting to needs_information and cc'ing mikeperry and gaba to shed light on this before we implement something at random.
Trac: Cc: acute to acute, mikeperry, gaba; Status: accepted to needs_information
It makes sense that BUILDTIMEOUT_SET events other than COMPUTED are rare in onionperf production instances, because CBT only resets after many timeouts, and only SUSPENDs if TLS activity stops.
For the CBT experiments, the goal is to include (or perform) measurements only when there is a build timeout set. This means either discarding measurements taken between BUILDTIMEOUT_SET RESET and BUILDTIMEOUT_SET COMPUTED, or not measuring during that window. We will also want to include TIMEOUT_RATE and CLOSE_RATE, because we will want to see how closely those match CUTOFF_QUANTILE in practice as we vary CUTOFF_QUANTILE from the consensus.
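To make the TIMEOUT_RATE vs. CUTOFF_QUANTILE comparison concrete, here's a sketch with made-up data (not a real OnionPerf API) of computing an empirical timeout rate from measured circuit build times and comparing it to the configured quantile:

```python
# Sketch: compare the empirical circuit timeout rate against the configured
# CUTOFF_QUANTILE. All data below is invented for illustration.

def empirical_timeout_rate(build_times_ms, timeout_ms):
    """Fraction of circuits whose build time hit or exceeded the timeout."""
    timed_out = sum(1 for t in build_times_ms if t >= timeout_ms)
    return timed_out / len(build_times_ms)

build_times_ms = [400, 650, 900, 1200, 5000, 700, 3100, 800, 950, 600]
timeout_ms = 1500      # TIMEOUT_MS from the last BUILDTIMEOUT_SET COMPUTED event
cutoff_quantile = 0.8  # CUTOFF_QUANTILE from the same event

rate = empirical_timeout_rate(build_times_ms, timeout_ms)
# With a well-calibrated CBT, rate should be close to 1 - cutoff_quantile.
# Here 2 of 10 circuits (5000 ms and 3100 ms) exceed the timeout, so rate == 0.2.
```

Aggregating this per consensus CUTOFF_QUANTILE value would show how well the configured quantile tracks reality.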
We will also want to be able to filter the results to include/exclude certain guard identities and sets of guard nodes. But we'll want to do this kind of filtering generally.
I think that not performing measurements between BUILDTIMEOUT_SET RESET and BUILDTIMEOUT_SET COMPUTED will be difficult. But we can exclude those measurements in the visualization step. We'll likely want to add a command-line switch for that. It might be easier to first do #34231 (moved) and #33260 (moved) and then work on this ticket.
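A command-line switch for the visualization step could be as simple as the following sketch; the flag name `--exclude-unset-cbt` and the minimal measurement records are invented for illustration:

```python
# Sketch: a hypothetical --exclude-unset-cbt switch that drops measurements
# recorded while no build timeout was set (i.e. the circuit object lacks
# build_timeout/build_quantile).
import argparse

parser = argparse.ArgumentParser(prog="onionperf-visualize-sketch")
parser.add_argument("--exclude-unset-cbt", action="store_true",
                    help="drop measurements without build_timeout/build_quantile")
args = parser.parse_args(["--exclude-unset-cbt"])

measurements = [
    {"id": 1, "build_timeout": 1500},
    {"id": 2},  # no timeout was set when this circuit launched
]
if args.exclude_unset_cbt:
    measurements = [m for m in measurements if "build_timeout" in m]
# measurements now contains only the record with id 1
```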
I'm moving mikeperry's DROPTIMEOUTS feature branch from #33399 (moved) here, because this ticket is about circuit build timeouts whereas #33399 (moved) is about static guards.
It's possible that we'll want to do both (use static guards and drop build timeouts when dropping guards), but the development effort can happen in separate tickets.
LMK if this looks good and I'll open a sub-ticket for network-team to merge.
mikeperry added a bit later:
I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.
This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.
I'll also add that we'll need a Stem patch in order to send the DROPTIMEOUTS command. We can add that later after adding the Tor patch and after being sure that it does what we need.
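Until Stem grows a dedicated helper, the raw control command could be sent through Stem's generic message interface. The sketch below uses a stub standing in for `stem.control.Controller` so it is self-contained; the stub and its reply are invented, while `msg()` and `is_ok()` mirror Stem's actual interface:

```python
# Sketch: send the DROPTIMEOUTS control command without a dedicated Stem
# helper. In real use, ctrl would be a stem.control.Controller; here a stub
# stands in so the sketch runs on its own.

def drop_timeouts(ctrl):
    """Send DROPTIMEOUTS and raise if tor does not acknowledge it."""
    response = ctrl.msg("DROPTIMEOUTS")
    if not response.is_ok():
        raise RuntimeError("DROPTIMEOUTS failed: %s" % response)
    return response

class StubController:
    """Stands in for stem.control.Controller in this sketch."""
    def msg(self, command):
        class Reply:
            def is_ok(self):
                return command == "DROPTIMEOUTS"
            def __str__(self):
                return "250 OK"
        return Reply()

reply = drop_timeouts(StubController())
# str(reply) == "250 OK"
```

A proper Stem patch would wrap this in a `Controller.drop_timeouts()` method, analogous to the existing `drop_guards()`.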
I briefly ran mikeperry's patch and want to share my experience here.
I had to patch the branch by calling the relevant functions as part of dropping guards, the only reason being that Stem (obviously) doesn't have a drop_timeouts() function yet. On top of mikeperry's branch, my patch makes the following changes:
- use entry guards (which isn't technically required for this branch, but I did this when working on #33399 (moved)),
- increase the NEWNYM interval from 5 minutes to 1 hour and drop guards and timeouts right after sending the NEWNYM command (even though this could have happened separately, with NEWNYM being sent every 5 minutes and the two drop commands happening once per hour), and
- distinguish BUILDTIMEOUT_SET COMPUTED events from other BUILDTIMEOUT_SET events.

Here's the diff:
```diff
diff --git a/onionperf/analysis.py b/onionperf/analysis.py
index 20ca354..7913adc 100644
--- a/onionperf/analysis.py
+++ b/onionperf/analysis.py
@@ -12,7 +12,7 @@
 from socket import gethostname
 from abc import ABCMeta, abstractmethod
 
 # stem imports
-from stem import CircEvent, CircStatus, CircPurpose, StreamStatus
+from stem import CircEvent, CircStatus, CircPurpose, StreamStatus, TimeoutSetType
 from stem.response.events import CircuitEvent, CircMinorEvent, StreamEvent, BandwidthEvent, BuildTimeoutSetEvent
 from stem.response import ControlMessage, convert
@@ -630,8 +630,12 @@ class TorCtlParser(Parser):
         self.bandwidth_summary['bytes_written'][int(arrival_dt)] = event.written
 
     def __handle_buildtimeout(self, event, arrival_dt):
-        self.build_timeout_last = event.timeout
-        self.build_quantile_last = event.quantile
+        if event.set_type == TimeoutSetType.COMPUTED:
+            self.build_timeout_last = event.timeout
+            self.build_quantile_last = event.quantile
+        else:
+            self.build_timeout_last = None
+            self.build_quantile_last = None
 
     def __handle_event(self, event, arrival_dt):
         if isinstance(event, (CircuitEvent, CircMinorEvent)):
diff --git a/onionperf/measurement.py b/onionperf/measurement.py
index 4a58bc4..c8fc9f8 100644
--- a/onionperf/measurement.py
+++ b/onionperf/measurement.py
@@ -388,8 +388,8 @@ WarnUnsafeSocks 0\nSafeLogging 0\nMaxCircuitDirtiness 60 seconds\nDataDirectory
             tor_config = tor_config + f.read()
         if name == "client" and self.additional_client_conf:
             tor_config += self.additional_client_conf
-        if not 'UseEntryGuards' in tor_config and not 'UseBridges' in tor_config:
-            tor_config += "UseEntryGuards 0\n"
+        #if not 'UseEntryGuards' in tor_config and not 'UseBridges' in tor_config:
+        #    tor_config += "UseEntryGuards 0\n"
         if name == "server" and self.single_onion:
             tor_config += "HiddenServiceSingleHopMode 1\nHiddenServiceNonAnonymousMode 1\n"
         return tor_config
@@ -466,7 +466,7 @@ WarnUnsafeSocks 0\nSafeLogging 0\nMaxCircuitDirtiness 60 seconds\nDataDirectory
         time.sleep(3)
         torctl_events = [e for e in monitor.get_supported_torctl_events() if e not in ['DEBUG', 'INFO', 'NOTICE', 'WARN', 'ERR']]
-        newnym_interval_seconds = 300
+        newnym_interval_seconds = 3600
         torctl_args = (control_port, torctl_writable, torctl_events, newnym_interval_seconds, self.done_event)
         torctl_helper = threading.Thread(target=monitor.tor_monitor_run, name="torctl_{0}_helper".format(name), args=torctl_args)
         torctl_helper.start()
diff --git a/onionperf/monitor.py b/onionperf/monitor.py
index 5387bff..074876e 100644
--- a/onionperf/monitor.py
+++ b/onionperf/monitor.py
@@ -64,6 +64,7 @@ class TorMonitor(object):
                 if newnym_interval_seconds is not None and interval_count >= newnym_interval_seconds:
                     interval_count = 0
                     torctl.signal(Signal.NEWNYM)
+                    torctl.drop_guards()
             except KeyboardInterrupt:
                 pass # the user hit ctrl+c
```
I'm attaching the OnionPerf analysis file from running this branch for a couple of hours. When you grep for "build", you'll find alternating batches of circuits with "build_quantile" and "build_timeout" fields followed by batches of circuits without these two fields. In the latter case, there was no build timeout set.
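The grep check can also be done mechanically; this sketch counts circuits with and without the two fields (the minimal circuit records below are a stand-in for a real analysis .json structure):

```python
# Sketch: count circuits in an OnionPerf-style analysis structure that do and
# do not carry build_timeout/build_quantile. The records below are a minimal
# stand-in for a real analysis .json file.

def count_build_fields(circuits):
    """Return (with_fields, without_fields) counts over circuit records."""
    with_fields = sum(1 for c in circuits
                      if "build_timeout" in c and "build_quantile" in c)
    return with_fields, len(circuits) - with_fields

circuits = [
    {"circuit_id": 1, "build_timeout": 1500, "build_quantile": 0.8},
    {"circuit_id": 2, "build_timeout": 1500, "build_quantile": 0.8},
    {"circuit_id": 3},  # launched before the first BUILDTIMEOUT_SET COMPUTED
    {"circuit_id": 4},  # or after a BUILDTIMEOUT_SET RESET
]
have, missing = count_build_fields(circuits)
# have == 2, missing == 2
```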
Reassigning to metrics-team, so that others can look into mikeperry's patch and also think about remaining steps here. This ticket is far from being done.
Trac: Owner: karsten to metrics-team; Status: needs_information to assigned
I have not dug deep into this, but as a quick note: recall that it takes at least 1000 seconds after you get a BUILDTIMEOUT_SET RESET event before you'll get a BUILDTIMEOUT_SET COMPUTED event. This means there will be no build_quantile or build_timeout value during this time.
So without filtering those measurements, you will have plenty of measurements without a build_quantile and a build_timeout; this is normal. If you were already filtering these measurements and still see them, then there is an issue with that filter in the OnionPerf codebase. I hope that is not the case!