On legacy/trac#7727 (moved) we had some profile results that we used to identify bottleneck functions in Tor to optimize. We should get some fresh profiles on 0.2.5.4-alpha once it's out (assuming we merge legacy/trac#9841 (moved)), and then see which of those bottlenecks need the most attention.
One thought I had about that last profile, though: SHA1 was at the top of the list. OpenSSL's PRNG uses a bunch of SHA1, and is ridiculously slow for a userspace PRNG. If SHA1 turns up at the top of the list again, we should investigate whether it's our protocols' uses of SHA1, TLS's uses of SHA1, or OpenSSL's PRNG's uses of SHA1 that are most to blame.
What I most need is information about 0.2.5.4-alpha or later. 0.2.5.4-alpha has some performance improvements that should (I hope) affect the numbers a lot.
[But information about 0.2.5.2-alpha is still useful, because (a) it will let us know what stuff looked like before 0.2.5.4-alpha, and (b) it will help us figure out what the instructions for running perf should say.]
Was this version of Tor built without debugging symbols or something?
It was built with these settings:
./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64 --disable-silent-rules --disable-dependency-tracking --disable-buf-freelists --enable-asciidoc --docdir=/usr/share/doc/tor-0.2.5.4_alpha-r1 --enable-instrument-downloads --disable-bufferevents --enable-curve25519 --disable-nat-pmp --enable-gcc-hardening --enable-linker-hardening --disable-transparent --enable-threads --disable-upnp --disable-tor2web-mode --disable-unittests --disable-coverage
Nickm, thank you for pointing out the correct way to use configure. For Ubuntu users finding this ticket later, these are the magic spells needed on an Ubuntu machine:
Attached gprof results for run of latest master as a middle relay, relaying about 1.18G using 659 seconds of CPU time. Yeah, siphash is kinda expensive.
Nice; I hadn't thought to also try gprof. I wonder if siphash has also started showing up under perf; it didn't show up too high under any of the other perf profiles, but those were before the changes in master that started using siphash for the circid/channel to circuit mappings (legacy/trac#11750 (moved)).
Also, those results won't cover time spent in openssl. Can you get perf results too under current master?
Not without a fair amount of pain-in-the-assery. I think I need different kernel options for it to work.
I'd guess the easiest way to get OpenSSL results is to rebuild it with -pg and statically link to it.
Okay; I think we have enough info for 0.2.5.4-alpha. In 0.2.5.5-alpha and later, we should confirm that there are no new bottlenecks, and evaluate whether our fixes in legacy/trac#12170 (moved) and legacy/trac#12169 (moved) did any good.
Trac: Summary changed from "Get a fresh set of relay/exit profiles on 0.2.5.4-alpha or later; optimize accordingly." to "Get a fresh set of relay/exit profiles on 0.2.5.5-alpha or later; optimize bottlenecks if found"
Attaching a fresh gprof output for a build linked against a profiled OpenSSL. This relayed 6.5G in 85 hours of wall clock time using 2075 seconds of CPU time.
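For a rough sense of scale, the figures quoted with the two gprof runs in this ticket can be turned into bytes relayed per CPU-second. This is only a back-of-the-envelope sketch: it assumes "G" means 10^9 bytes (the ticket does not say GB or GiB), the helper names are made up for illustration, and the two runs used different builds (the second linked against a profiled OpenSSL), so the numbers are not a controlled comparison.

```python
# Back-of-the-envelope arithmetic on the figures quoted in this ticket.
# Assumption: "G" is 10**9 bytes; the ticket does not specify GB vs GiB.
GIGA = 10**9


def bytes_per_cpu_second(relayed_g, cpu_seconds):
    """Bytes relayed per second of CPU time (hypothetical helper)."""
    return relayed_g * GIGA / cpu_seconds


def cpu_utilization(cpu_seconds, wall_hours):
    """Fraction of wall-clock time spent on CPU (hypothetical helper)."""
    return cpu_seconds / (wall_hours * 3600)


# First gprof run: about 1.18G relayed using 659 seconds of CPU time.
first = bytes_per_cpu_second(1.18, 659)
# Second gprof run: 6.5G relayed using 2075 seconds of CPU time over 85 hours.
second = bytes_per_cpu_second(6.5, 2075)

print(f"first run:  {first / 1e6:.2f} MB per CPU-second")
print(f"second run: {second / 1e6:.2f} MB per CPU-second")
print(f"second run CPU utilization: {cpu_utilization(2075, 85):.2%}")
```

The utilization figure mostly shows that this relay was far from CPU-bound during the second run, so per-function percentages in the profile matter more than absolute CPU time.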
Neat. I don't see anything in there that seems like a terrible issue. There are a few one-percent functions in Tor that could probably turn into 0-percent functions, but nothing that looks like it's a bug.
Unless somebody else sees an issue in this profile, I say we call this fixed and open a new ticket to look at profiles in 0.2.6. :)