The specifications for measuring this are as follows:
- OP should measure one guard, or one set of guards, at a time. It could use NumEntryGuards in the torrc file to support more than one guard.
- A new guard or set of guards must be chosen after a day of measurement.
- All CBT data must be erased when choosing a new set of guards, after each day of measurements (at UTC midnight). This could mean erasing or replacing the CBT state file.
Currently, guards are disabled in OP by setting UseEntryGuards=0 in the client torrc file. To enable them, set UseEntryGuards to 1, and additionally set NumEntryGuards to 1 (or a number greater than 1 to test multiple guards). I have left an OP test instance running with this set to 3 to gather some data.
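For reference, a guard-enabled client torrc could look like the following (the DataDirectory path is illustrative, not OnionPerf's actual layout):

```
# Enable entry guards and pin the number of guards under test
UseEntryGuards 1
NumEntryGuards 3
DataDirectory /path/to/tor_client
```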
Purging the state:
To achieve this, the file called 'state' in the tor_client directory must be removed after log rotation. The guards previously measured could be extracted from it and added to the analysis output. The Tor process must then be restarted or reloaded once the logs have rotated. All of this would happen only when the measurement mode is 'guard-enabled'.
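As a sketch of that purge step (the function name, the 'guard-enabled' mode string, and the directory layout are assumptions taken from this ticket, not an existing OnionPerf API):

```python
import os
import signal

def purge_guard_state(tor_datadir, mode, tor_pid=None):
    """Remove Tor's state file (which holds both guard and CBT data)
    after log rotation, then reload the tor process so it picks fresh
    guards. Only applies in the hypothetical 'guard-enabled' mode."""
    if mode != "guard-enabled":
        return False
    state_file = os.path.join(tor_datadir, "state")
    if os.path.exists(state_file):
        os.remove(state_file)
    if tor_pid is not None:
        # SIGHUP makes tor reload; a full restart would also work.
        os.kill(tor_pid, signal.SIGHUP)
    return True
```

Extracting the previously measured guards from the state file before deleting it would slot in just before the `os.remove()` call.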
Adding a new measurement mode:
A new mode should be made available to the cli, perhaps allowing the admin to specify how many guards to measure at once.
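A sketch of what that CLI option could look like, using argparse (the flag names are made up for illustration; OnionPerf's actual CLI may differ):

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="onionperf")
    # Hypothetical flags for a guard-enabled measurement mode.
    parser.add_argument("--guard-mode", action="store_true",
                        help="measure with static entry guards enabled")
    parser.add_argument("--num-guards", type=int, default=1,
                        help="how many guards to measure at once "
                             "(maps to NumEntryGuards in the torrc)")
    return parser
```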
IIRC, we're using UseEntryGuards=0 for the tor process on both client and server side. If we start using guards for a limited time now, we should do so on both sides.
We should experiment with the time we want to keep guards static. That time could range from (a) five minutes for a single measurement, (b) an hour, (c) a day, or even (d) several days.
A possible downside of changing guards at UTC midnight is that we might have a harder time identifying trends over time, because the choice of guards might overlay any other changes in the network.
If we pick a time that is too short, our results might be blurred by the stabilizing phase after choosing new guards.
Maybe we need to experiment with something like changing guards every hour and analyze how different the first few measurements in that hour are from those towards the end of the hour.
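One way to run that comparison on recorded measurements (a sketch; the ten-minute cutoff and the (timestamp, value) shape of a measurement are assumptions):

```python
from statistics import mean

def early_vs_late(measurements, early_minutes=10):
    """Split measurements into those taken in the first `early_minutes`
    of their hour (right after an hourly guard change) and the rest,
    and compare mean values. Each measurement is a
    (unix_timestamp, seconds_to_complete) pair."""
    early, late = [], []
    for ts, value in measurements:
        minute_of_hour = (ts % 3600) // 60
        (early if minute_of_hour < early_minutes else late).append(value)
    return (mean(early) if early else None,
            mean(late) if late else None)
```

If the early mean is consistently worse than the late mean, that would suggest the post-change stabilizing phase is blurring results.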
Rather than removing the state file, we might try out the DROPGUARDS controller command, which is supposed to achieve the same thing. What it might not do is remove circuit build timeout state. Then again, maybe Tor is smart enough to consider dropping all guards a drastic enough network change to reset the timeout back to the default and send a BUILDTIMEOUT_SET RESET event; I haven't checked. Note that even after going back to defaults, the first measurement or two will likely differ from those afterwards, because Tor has to learn what a good timeout is with the new guard(s). Maybe it doesn't matter if we let Tor learn by itself that something has changed. This is related to the previous thought on how often to change guards.
Leaving this ticket assigned to metrics-team. If somebody wants to grab it, please do!
The function to reset buildtimeout is circuit_build_times_reset(). It is called when there are too many timeouts. It is not called via DROPGUARDS.
We could make a DROPTIMEOUTS or similar command just like DROPGUARDS, that calls circuit_build_times_reset(), if that is simpler than removing the state file. I don't think DROPGUARDS should necessarily automatically reset CBT.
It takes 100 circuits to learn a circuit build timeout. During this phase, circuits are launched roughly every 10 seconds. So it takes about 1000 seconds to learn a timeout, at which point the BUILDTIMEOUT_SET COMPUTED event will be delivered again.
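The arithmetic above, spelled out (the constant names here are just labels for the numbers in this comment, not Tor's internal identifiers):

```python
# Circuits Tor observes before computing a timeout, per the comment above.
CIRCS_TO_LEARN_TIMEOUT = 100
# Approximate seconds between learning circuits during this phase.
SECONDS_PER_LEARNING_CIRC = 10

learning_time = CIRCS_TO_LEARN_TIMEOUT * SECONDS_PER_LEARNING_CIRC
print(learning_time)  # about 1000 seconds, i.e. roughly 17 minutes
```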
During this learning phase, the fix-guards onionperf should not record perf measurements, i.e., between the RESET event and the next SET event (as per #33420 (moved)).
It makes sense that BUILDTIMEOUT_SET events other than COMPUTED are rare in onionperf production instances, because CBT only resets after many timeouts, and only SUSPENDs if TLS activity stops.
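A sketch of how such a measurement filter could work on an interleaved stream of BUILDTIMEOUT_SET events and measurements (the stream representation is simplified for illustration; this is not OnionPerf's actual event handling):

```python
def filter_measurements(stream):
    """Drop measurements that fall between a BUILDTIMEOUT_SET RESET
    and the next BUILDTIMEOUT_SET COMPUTED, i.e. while Tor is still
    relearning its circuit build timeout. `stream` yields
    ('event', 'RESET'|'COMPUTED'|...) or ('measurement', value) pairs."""
    learning = False
    kept = []
    for kind, payload in stream:
        if kind == "event":
            if payload == "RESET":
                learning = True
            elif payload == "COMPUTED":
                learning = False
        elif kind == "measurement" and not learning:
            kept.append(payload)
    return kept
```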
The idea of using a controller command for dropping timeouts rather than removing the state file came from robgjansen who was thinking about running similar experiments in Shadow. I'd say we should at least give it a try and see how complicated it is to implement such a command. Maybe we'll get help from friendly network team people.
Still leaving this ticket assigned to metrics-team to be picked up. It's certainly not a tiny amount of work, but that's already reflected in the 4.0 points estimated for this ticket. If somebody picks it up, please remember to release early and often by sharing intermediate results on this ticket. Thanks!
One additional wrinkle: circuit_build_times_reset() does not emit a BUILDTIMEOUT_SET RESET event by itself. For sanity, I am guessing the DROPTIMEOUTS command should cause this RESET event to get emitted.
This DROPTIMEOUTS command should be a relatively simple patch. If you need it, I can probably hack that up in an hour or two.
I just noticed that DROPGUARDS has a call to or_state_mark_dirty() buried deep in its callpath. I did not do this for DROPTIMEOUTS, but it is easy enough to throw a call in there.
This should only matter if there is a risk of restarting or SIGHUPing the tor process right after DROPTIMEOUTS. The CBT code will mark the state file dirty again as soon as it records 10 circuit build times.
I just moved the discussion of DROPTIMEOUTS to #33420 (moved). Let's focus on static guards in this ticket and leave everything related to circuit build timeouts for #33420 (moved). It might be that we'll want to use both features together once they exist, but development can happen in parallel in these two tickets.