Document and maybe improve how we're mapping TGen transfers to Tor streams/circuits
OnionPerf uses TGen to make transfers using a local Tor client. OnionPerf also uses Stem to connect to the Tor client's control port and register for control events.
This ticket is about documenting how we can map TGen transfers to Tor streams and circuits. OnionPerf did this to produce the .tpf output format (which we just killed in legacy/trac#34141 (moved)). But we'll also need this functionality to implement legacy/trac#34218 (moved) or legacy/trac#33260 (moved).
Here's what we're doing in metrics-lib right now to map transfers and streams:
- Index Tor circuits by their circuit ID.
- Index Tor streams by their source port; if there are two or more streams with the same source port, remember them all.
- Go through TGen transfers one by one. For each, extract the local source port.
- Go through Tor streams with the same source port and check whether the transfer end and the stream end happened within 150 seconds of each other.
- If there's a match, look up the corresponding circuit by circuit ID.
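To make the steps above concrete, here is a minimal Python sketch of the matching logic. It is not the metrics-lib code: the `Circuit`, `Stream`, and `Transfer` classes, their field names, and the `map_transfers` function are hypothetical and exist only for illustration. It also assumes that timestamps are Unix seconds, that "within 150 seconds" is inclusive, and that the first matching stream wins, none of which is stated above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Heuristic threshold from the steps above; the inclusive comparison is an assumption.
MAX_END_GAP_SECONDS = 150.0


@dataclass
class Circuit:  # hypothetical record type
    circuit_id: str
    path: List[str]


@dataclass
class Stream:  # hypothetical record type
    stream_id: str
    circuit_id: str
    source_port: int
    end_time: float  # assumed Unix timestamp of the stream end


@dataclass
class Transfer:  # hypothetical record type
    transfer_id: str
    source_port: int
    end_time: float  # assumed Unix timestamp of the transfer end


def map_transfers(
    transfers: List[Transfer],
    streams: List[Stream],
    circuits: List[Circuit],
) -> Dict[str, Tuple[Stream, Optional[Circuit]]]:
    # Index circuits by circuit ID.
    circuits_by_id = {c.circuit_id: c for c in circuits}

    # Index streams by source port, remembering all streams per port.
    streams_by_port: Dict[int, List[Stream]] = defaultdict(list)
    for stream in streams:
        streams_by_port[stream.source_port].append(stream)

    mapping: Dict[str, Tuple[Stream, Optional[Circuit]]] = {}
    for transfer in transfers:
        # Only streams with the transfer's local source port are candidates.
        for stream in streams_by_port.get(transfer.source_port, []):
            # Accept a candidate only if both ends are close in time.
            if abs(transfer.end_time - stream.end_time) <= MAX_END_GAP_SECONDS:
                circuit = circuits_by_id.get(stream.circuit_id)
                mapping[transfer.transfer_id] = (stream, circuit)
                break  # assumption: first match wins
    return mapping
```

Transfers without a matching stream are simply absent from the result, which corresponds to the "missing mappings" case rather than a wrong mapping.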
Note that OnionPerf took a simpler approach for producing .tpf files by remembering just one stream per source port and not applying the 150-second heuristic. The result was that some mappings were wrong. The approach taken by metrics-lib leads to a few missing mappings (probably as many as OnionPerf had), and apparently no wrong mappings.
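The following toy Python example illustrates how keeping only one stream per source port can produce a wrong mapping when a port is reused. The data is made up, and the sketch assumes that the last stream seen for a port overwrites earlier ones; which stream OnionPerf actually kept is not stated here.

```python
# Made-up data: two streams reuse source port 40000 at different times.
# Tuples are (source_port, stream_id, end_time_in_seconds).
streams = [(40000, "s1", 1000.0), (40000, "s2", 5000.0)]
# Tuples are (transfer_id, source_port, end_time_in_seconds).
transfers = [("t1", 40000, 1010.0), ("t2", 40000, 5020.0)]

# Simple approach: remember a single stream per source port.
# Assumption: a later stream overwrites an earlier one.
last_stream_by_port = {}
for port, stream_id, _end in streams:
    last_stream_by_port[port] = stream_id

simple = {tid: last_stream_by_port.get(port) for tid, port, _end in transfers}
print(simple)  # {'t1': 's2', 't2': 's2'}
```

Here `t1` ended around the same time as `s1` but is mapped to `s2`, because only the source port is compared and the earlier stream was forgotten.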
Is there a way to have an exact mapping that doesn't require a heuristic? And is there a way to do it without having to wait for transfer and stream to end?