Simplify some costly Tor functions (by profile)

mo attached a few profiles to #7572 (moved), from a fast host with aesni. The top functions overall are:

44222784  8.0379  libcrypto.so.1.0.0       libcrypto.so.1.0.0       sha1_block_data_order_avx
39059344  7.0994  nf_conntrack             nf_conntrack             /nf_conntrack
35552271  6.4620  libcrypto.so.1.0.0       libcrypto.so.1.0.0       bn_sqr4x_mont
31025085  5.6391  libcrypto.so.1.0.0       libcrypto.so.1.0.0       aesni_cbc_sha1_enc_avx
17425081  3.1672  tor                      tor                      circuit_get_by_rend_token_and_purpose.constprop.11
17185351  3.1236  libc-2.15.so             libc-2.15.so             /lib/x86_64-linux-gnu/libc-2.15.so
15106522  2.7458  tor                      tor                      circuit_unlink_all_from_channel
13422467  2.4397  libevent-2.0.so.5.1.4    libevent-2.0.so.5.1.4    /usr/lib/libevent-2.0.so.5.1.4
9045536   1.6441  libcrypto.so.1.0.0       libcrypto.so.1.0.0       bn_mul4x_mont_gather5
8295787   1.5078  libcrypto.so.1.0.0       libcrypto.so.1.0.0       aesni_ctr32_encrypt_blocks
7454822   1.3550  e1000e                   e1000e                   /e1000e
6667075   1.2118  tor                      tor                      circuitmux_find_map_entry

And the top functions, considering Tor only, are:

17545411 13.9182  circuitlist.c:1116          tor                      circuit_get_by_rend_token_and_purpose.constprop.11
15232931 12.0838  circuitlist.c:1028          tor                      circuit_unlink_all_from_channel
6729424   5.3382  circuitmux.c:698            tor                      circuitmux_find_map_entry
3802661   3.0165  buffers.c:2468              tor                      assert_buf_ok
3344356   2.6530  circuitlist.c:980           tor                      circuit_get_by_circid_channel
3217962   2.5527  relay.c:2094                tor                      channel_flush_from_first_active_circuit
2927776   2.3225  buffers.c:520               tor                      buf_datalen
2367670   1.8782  connection.c:2512           tor                      connection_bucket_refill
2210529   1.7535  connection.c:2824           tor                      connection_handle_read
2200952   1.7459  relay.c:168                 tor                      circuit_receive_relay_cell
2095244   1.6621  container.c:167             tor                      smartlist_isin
1618112   1.2836  crypto.c:1364               tor                      crypto_cipher_crypt_inplace

The crypto's about what we'd expect, but it could be helpful to see if we can optimize some of the remaining Tor things. In particular:

  • circuit_get_by_rend_token_and_purpose should have a map backing it; it appears that this might be one of those costly linear searches.
  • I wonder if circuit_unlink_all_from_channel could be taught to walk a list of circuits on the channel rather than the list of all circuits.
  • I don't see a good way to make circuitmux_find_map_entry faster without tweaking data structures.
  • We can call assert_buf_ok() less often.
  • circuit_get_by_circid_channel would also need data structure improvements.
  • buf_datalen() should just be made into an inline function.
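
To make the first item concrete, here is a rough sketch of what a map-backed lookup by rend token and purpose could look like. This is not Tor's code: sketch_circ_t, rend_token_map_t, TOKEN_LEN, N_BUCKETS and the helper functions are all hypothetical names made up for illustration, and the 20-byte token size is an assumption. The real change would presumably use one of Tor's existing container types (digestmap_t, for example) rather than a hand-rolled table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_LEN 20   /* assumed token size for this sketch */
#define N_BUCKETS 256  /* power of two, so we can mask instead of mod */

/* Hypothetical stand-in for a circuit; not Tor's real circuit type. */
typedef struct sketch_circ_t {
  uint8_t rend_token[TOKEN_LEN];
  int purpose;
  struct sketch_circ_t *next_in_bucket; /* chain for hash collisions */
} sketch_circ_t;

/* Hypothetical chained hash table keyed by rend token. */
typedef struct rend_token_map_t {
  sketch_circ_t *buckets[N_BUCKETS];
} rend_token_map_t;

/* FNV-1a over the token bytes; any reasonable hash would do here. */
static uint32_t
token_hash(const uint8_t *token)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < TOKEN_LEN; ++i) {
    h ^= token[i];
    h *= 16777619u;
  }
  return h;
}

/* Add circ to the map: push it onto the front of its bucket's chain. */
static void
rend_token_map_add(rend_token_map_t *map, sketch_circ_t *circ)
{
  assert(map && circ);
  uint32_t idx = token_hash(circ->rend_token) & (N_BUCKETS - 1);
  circ->next_in_bucket = map->buckets[idx];
  map->buckets[idx] = circ;
}

/* Look up by token and purpose. Only one bucket's chain is walked,
 * instead of the list of all circuits. Returns NULL if not found. */
static sketch_circ_t *
rend_token_map_get(const rend_token_map_t *map, const uint8_t *token,
                   int purpose)
{
  uint32_t idx = token_hash(token) & (N_BUCKETS - 1);
  for (sketch_circ_t *c = map->buckets[idx]; c; c = c->next_in_bucket) {
    if (c->purpose == purpose &&
        memcmp(c->rend_token, token, TOKEN_LEN) == 0)
      return c;
  }
  return NULL;
}

/* Unlink circ from its bucket; must run before the circuit is freed
 * or before its token changes. Does nothing if circ is not present. */
static void
rend_token_map_remove(rend_token_map_t *map, sketch_circ_t *circ)
{
  uint32_t idx = token_hash(circ->rend_token) & (N_BUCKETS - 1);
  for (sketch_circ_t **p = &map->buckets[idx]; *p;
       p = &(*p)->next_in_bucket) {
    if (*p == circ) {
      *p = circ->next_in_bucket;
      circ->next_in_bucket = NULL;
      return;
    }
  }
}
```

The cost of this approach is bookkeeping: every place that sets, clears, or changes a circuit's rend token (or frees the circuit) would have to keep the map in sync, which is the usual trade for replacing a linear scan with an index.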