Relevant to legacy/trac#13792 (moved), we (nickm and I) discussed the best way to collect and gather statistics and measurements in a private network for performance analysis and debugging.
This would be completely deactivated at compile time, meaning the tracepoints would be NOPs; you would have to explicitly enable the feature during the configure process.
We think the best way to go, in terms of code, is a header with something that could look like this:
The really good things about this approach are that 1) everything is centralized in one place, 2) the call becomes a NOP if not configured, so there is no performance issue, and 3) we can support lots of different tracers as well as Shadow. It provides a simple way to hook any other tool into the trace event facility of the code.
I've talked to David a bit about this, and I do think it's a good way to run performance testing on private networks. It could also IMO make Shadow output easier to analyze, if the two facilities can be made to cooperate.
Preliminaries:
- We should do a little research to make sure that there isn't some other important userspace instrumentation API that we should keep in mind for maintaining compatibility with. So far, it doesn't look like there is.
- This should be, as noted above, controlled by a compile-time and a runtime option; both should be off by default.
- We should make sure to document that this is for researchers and developers, not something that you should be running on a live relay.
As of now, I have 17 very useful tracepoints, mostly in the hidden service subsystem. I don't think they are ready to be merged right now, since we have yet to confirm that all of them are useful for our performance measurements and that the data they record is actually what we need (or whether anything is missing).
I suggest we move this to 0.2.7, because I would like to see it upstream soonish, but with more eyes on the usefulness of the tracepoints. At the next Tor dev meeting, we should take a couple of minutes to discuss this in person, with a real use case I can demonstrate.
Trac: Summary changed from "Add LTTng instrumentation to tor" to "Add instrumentation to tor". Description changed to drop the LTTng-specific wording: the old text proposed using LTTng (https://lttng.org) to gather events from the different subsystems of tor (in userspace, of course) and said we could support other tools like DTrace or SystemTap and possibly Windows tools; the new text instead says we can support lots of different tracers as well as Shadow. Cc: rob.g.jansen@nrl.navy.mil added as robgjansen.
The crypto-instrumentation part of this is in for Nov if we can do it.
ACK. I have a branch that needs cleanup which adds the basics (build system integration and templates) for any kind of low-level instrumentation, onto which we can then hook a tracer, Shadow, or anything else we'd like to collect the data with. I'll make sure to have it available soon so the crypto work can start.
For the crypto instrumentation, I'll need Yawning or nickm. Basically, if you can give me the call sites you want to instrument, that is, the precise locations in the code and the payload to record, I can start adding them to the branch so you can see the mechanism and design involved.
Let me know if you would like to proceed differently. FYI, I plan to work on this with Rob in mid-November to instrument the connection buffers for his experiment, and to get that upstream as well.
This would be a step that would be impossible to undo, and would be potentially very negative.
I see LTTng has pretty documentation, though. Why don't we use their documentation formatter?
> This would be a step that would be impossible to undo, and would be potentially very negative.
Low-level instrumentation is indeed something that, once added, has to be considered long-lived and thus quite stable. I don't see that as negative, because it's useful, and as long as we think carefully about which events we add, it should be fine. The pros outweigh the cons, IMO. It will become an external "ABI/API", like our control port.
> I see LTTng has pretty documentation, though. Why don't we use their documentation formatter?
Not sure I understand here. LTTng does indeed have pretty documentation on their website, but it's first and foremost a tracer that can be hooked into the instrumentation events we'll add, not something baked into tor.
I'm working with Rob (through legacy/trac#17598 (moved)) on adding a bunch of instrumentation points and a Shadow component to hook onto them, so Rob can conduct the KIST experiment (with and without Shadow).
The goal is to submit the instrumentation and a tracing framework with some meat on the bone, so we can start using it directly once merged, and so other developers have a concrete example of how to add more in the future.
To look at the work or follow it, see branch ticket13802_028_01, which is still in heavy development (needs cleanup) but can at least give an idea of the direction and structure of things.
Depending on Rob's time to test and review, an optimistic ETA is the end of the month.
It is branched off of a slightly outdated release-0.2.8.
Differences from dgoulet's work
- remove all/most of the included shadow traces
- add a shadow trace on inbuf
- add a shadow trace when a cell is written to outbuf
- add a shadow trace when the outbuf is partially flushed
The shadow traces have been extensively tested as I'm using them. dgoulet's generalized code works wonderfully, and I don't think I really changed any of it.
IMO the code can be merged (after review) as my testing shows it's working. It's possible, though unlikely, that I'll need to change the shadow-specific traces.
OK, so I've cleaned up the branch a bit: squashed all the commits into one and rebased it onto master. I've also improved the comments and made minor fixes.
I think this is ready for merge; it is working fine. Note that this is just the "tracing infrastructure": it is not used anywhere in the code base, and you need to specifically enable the "tracepoint to log_debug()" framework at compile time (--enable-tracing-debug).
The point of having this skeleton is that once it's in, we can start adding tracepoints to the subsystems we want to trace, using a specific framework (here we only have the log_debug() framework, and even that is very basic). For instance, the KIST work at NRL uses this infrastructure to route their tracepoints to log_debug(), but with some extra code around it. See doc/HACKING/Tracing.md for more information.
@nickm, if you prefer having it used in the codebase before merging, I'm fine with that; it means we defer this until we actually have tracepoints upstream. I think our friends at NRL can survive without it upstream for now, because most of their stuff is a "new framework" plugged into this infrastructure.
Please rename 'tracing' to 'event tracing' everywhere. Otherwise, somebody will see the word "tracing" and freak out.
(I'm sure somebody will see the word "tracing" and freak out anyway, but let's at least give them a better chance to realize that it's silly to complain about debugging code.)