Profile, identify code bottlenecks, and optimize
I've been avoiding premature optimization in my code so far, but there are probably places where we can get a lot faster. We should identify them via profiling and fix them.
Some situations to experiment with are:
- Bootstrapping a directory
- Building a large number of circuits
- Running while offline (see also #311, #329 (closed))
- Bootstrapping failure conditions (see #329 (closed)):
  - Primary guard unreachable
  - Primary guards going down after bootstrap
- Data transfer
- Data transfer with a large number of circuits
- Data transfer with a large number of streams
- A huge number of SOCKS connections/connection attempts at once
We won't know what follow-up work to do here until we've got some initial profiling information. We should make sure that the tests above are repeatable, so that we can re-profile from time to time. The `arti-bench` crate would be a good place for them.
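As a sketch of what a repeatable timing harness could look like (the names here are illustrative, not `arti-bench`'s actual API), a std-only helper that reports the best of N runs keeps results comparable across re-profiling sessions:

```rust
use std::time::{Duration, Instant};

/// Hypothetical micro-harness: run a closure `iters` times and report
/// the best wall-clock time, so noise from other processes is damped
/// and runs recorded at different times stay comparable.
fn bench<F: FnMut()>(name: &str, iters: u32, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..iters {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    println!("{name}: best of {iters} runs = {best:?}");
    best
}

fn main() {
    // Stand-in workload; a real test would bootstrap, build circuits,
    // or push data through a local Tor network instead.
    bench("sum-1M", 5, || {
        let s: u64 = (0..1_000_000u64).sum();
        std::hint::black_box(s);
    });
}
```

`black_box` keeps the optimizer from deleting the workload; taking the best rather than the mean makes the numbers less sensitive to scheduling noise.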
We should make sure that we measure both CPU and RAM usage; both are critical for mobile users. The `tokio-metrics` crate would probably help too.
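For the RAM side, one Linux-only way to snapshot resident memory between phases of a test (a sketch, not what `tokio-metrics` or `arti-bench` actually does) is to read `VmRSS` from `/proc`:

```rust
use std::fs;

/// Linux-only sketch: read resident set size (VmRSS, in KiB) from
/// /proc/self/status, so a repeatable test can record memory use
/// before and after each phase (bootstrap, circuit build, transfer).
fn rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kib| kib.parse().ok())
}

fn main() {
    let before = rss_kib();
    // Stand-in allocation; a real run would do directory parsing or
    // circuit building here.
    let buf = vec![0u8; 16 * 1024 * 1024];
    std::hint::black_box(&buf);
    let after = rss_kib();
    println!("RSS before: {before:?} KiB, after: {after:?} KiB");
}
```

On non-Linux platforms `rss_kib` returns `None`; a portable harness would need a platform-specific backend for each OS we care about.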
Tools to use include:
- `valgrind --tool=massif` and `massif-visualizer`
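A typical massif run might look like the following (the binary path, subcommand, and config file are illustrative; adjust for whatever is under test):

```shell
# Record a heap profile of a release build (example arguments only).
valgrind --tool=massif --massif-out-file=massif.out \
    ./target/release/arti proxy -c ./arti.toml

# Inspect the profile: textual summary, or the graphical viewer.
ms_print massif.out | head -n 40
massif-visualizer massif.out
```

Profiling a release build matters here: debug builds allocate and inline so differently that their heap profiles can point at the wrong hotspots.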
Follow-up issues identified so far:
- #377 (closed): Hex decoding shows up a lot in profiles, and allocates heavily
- #383 (closed): `Error::Internal` allocates a backtrace and is constructed unconditionally in `tor-circmgr`
- #384 (closed): Intern relay families to save memory
- #385 (closed): Intern protover entries to save memory
- #386 (closed): Needless slack space in hashmaps
- #387 (closed): Make `GenericRouterstatus` smaller
- #388 (closed): Call `shrink_to_fit` on the missing-microdescs hashmap
- #389: Use less intermediate RAM to load microdescriptors from sqlite
- #390: Stream directory responses to save memory and latency
- `arti-bench` should allocate less for receive buffers
- #392 (closed): Investigate whether we can find faster AES-CTR and/or SHA-1 implementations
- #393: Don't re-verify so much cryptography on startup (maybe)
- #441 (closed): Use sha1/asm?
- #442 (closed): Use OpenSSL SHA-1 and AES?
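Two of the issues above (#386, #388) concern hashmap slack: a map that grew large and was then mostly drained keeps its old capacity until told otherwise. The effect is easy to reproduce with the standard library alone:

```rust
use std::collections::HashMap;

fn main() {
    // Fill a map with 100k entries, then remove almost all of them,
    // mimicking a "missing microdescs" set that shrinks after bootstrap.
    let mut missing: HashMap<u32, [u8; 32]> = HashMap::new();
    for i in 0..100_000u32 {
        missing.insert(i, [0u8; 32]);
    }
    for i in 0..99_900u32 {
        missing.remove(&i);
    }

    // Capacity (and thus memory) stays sized for the peak...
    let before = missing.capacity();
    // ...until we explicitly release the slack.
    missing.shrink_to_fit();
    let after = missing.capacity();

    println!(
        "capacity for {} entries: {before} before shrink, {after} after",
        missing.len()
    );
    assert!(after < before);
}
```

Whether `shrink_to_fit` is worth calling depends on whether the map will grow again soon; shrinking and regrowing repeatedly trades memory for rehashing cost.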