Simplify some costly Tor functions (by profile)
mo attached a few profiles to legacy/trac#7572, from a fast host with aesni. The top functions overall are:

```
44222784 8.0379 libcrypto.so.1.0.0 libcrypto.so.1.0.0 sha1_block_data_order_avx
39059344 7.0994 nf_conntrack nf_conntrack /nf_conntrack
35552271 6.4620 libcrypto.so.1.0.0 libcrypto.so.1.0.0 bn_sqr4x_mont
31025085 5.6391 libcrypto.so.1.0.0 libcrypto.so.1.0.0 aesni_cbc_sha1_enc_avx
17425081 3.1672 tor tor circuit_get_by_rend_token_and_purpose.constprop.11
17185351 3.1236 libc-2.15.so libc-2.15.so /lib/x86_64-linux-gnu/libc-2.15.so
15106522 2.7458 tor tor circuit_unlink_all_from_channel
13422467 2.4397 libevent-2.0.so.5.1.4 libevent-2.0.so.5.1.4 /usr/lib/libevent-2.0.so.5.1.4
9045536 1.6441 libcrypto.so.1.0.0 libcrypto.so.1.0.0 bn_mul4x_mont_gather5
8295787 1.5078 libcrypto.so.1.0.0 libcrypto.so.1.0.0 aesni_ctr32_encrypt_blocks
7454822 1.3550 e1000e e1000e /e1000e
6667075 1.2118 tor tor circuitmux_find_map_entry
```

And the top functions, considering Tor only, are:

```
17545411 13.9182 circuitlist.c:1116 tor circuit_get_by_rend_token_and_purpose.constprop.11
15232931 12.0838 circuitlist.c:1028 tor circuit_unlink_all_from_channel
6729424 5.3382 circuitmux.c:698 tor circuitmux_find_map_entry
3802661 3.0165 buffers.c:2468 tor assert_buf_ok
3344356 2.6530 circuitlist.c:980 tor circuit_get_by_circid_channel
3217962 2.5527 relay.c:2094 tor channel_flush_from_first_active_circuit
2927776 2.3225 buffers.c:520 tor buf_datalen
2367670 1.8782 connection.c:2512 tor connection_bucket_refill
2210529 1.7535 connection.c:2824 tor connection_handle_read
2200952 1.7459 relay.c:168 tor circuit_receive_relay_cell
2095244 1.6621 container.c:167 tor smartlist_isin
1618112 1.2836 crypto.c:1364 tor crypto_cipher_crypt_inplace
```

The crypto's about what we'd expect, but it could be helpful to see if we can optimize some of the remaining Tor things.

In particular:

* circuit_get_by_rend_token_and_purpose should have a map backing it; it appears to be one of those costly linear searches.
* I wonder if circuit_unlink_all_from_channel could be taught to walk a list of the circuits on the channel rather than the list of all circuits.
* I don't see a good way to make circuitmux_find_map_entry faster without tweaking data structures.
* We can call assert_buf_ok() less often.
* circuit_get_by_circid_channel would also need data structure improvements.
* buf_datalen() should just be made into an inline function.
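
For the first item in the list above, here is a rough sketch of what a map-backed lookup could look like in place of a linear scan over all circuits. This is not Tor's code: `demo_circuit_t`, `rend_map_t`, the `rend_map_*` functions, the 20-byte token length, and the bucket count are all hypothetical names and values picked for illustration, and the hash (FNV-1a) is just a simple stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOKEN_LEN 20   /* assumed token size; not taken from the Tor source */
#define N_BUCKETS 1024 /* power of two, so the hash can be masked */

/* Hypothetical stand-in for a circuit; only the fields the lookup needs. */
typedef struct demo_circuit_t {
  uint8_t purpose;
  uint8_t rend_token[TOKEN_LEN];
  struct demo_circuit_t *next_in_bucket; /* chain of entries in one bucket */
} demo_circuit_t;

/* Hypothetical map from (purpose, token) to circuit. Zero-initialize it. */
typedef struct rend_map_t {
  demo_circuit_t *buckets[N_BUCKETS];
} rend_map_t;

/* FNV-1a over the purpose byte followed by the token bytes. */
static uint32_t
rend_hash(uint8_t purpose, const uint8_t *token)
{
  uint32_t h = 2166136261u;
  h = (h ^ purpose) * 16777619u;
  for (size_t i = 0; i < TOKEN_LEN; ++i)
    h = (h ^ token[i]) * 16777619u;
  return h & (N_BUCKETS - 1);
}

/* Insert at the head of the bucket chain: O(1). */
static void
rend_map_add(rend_map_t *map, demo_circuit_t *circ)
{
  assert(map && circ);
  uint32_t b = rend_hash(circ->purpose, circ->rend_token);
  circ->next_in_bucket = map->buckets[b];
  map->buckets[b] = circ;
}

/* Look up by purpose and token; only one bucket chain is walked. */
static demo_circuit_t *
rend_map_get(const rend_map_t *map, uint8_t purpose, const uint8_t *token)
{
  uint32_t b = rend_hash(purpose, token);
  for (demo_circuit_t *c = map->buckets[b]; c; c = c->next_in_bucket) {
    if (c->purpose == purpose && !memcmp(c->rend_token, token, TOKEN_LEN))
      return c;
  }
  return NULL;
}

/* Unlink a circuit from its bucket chain, e.g. when it is closed. */
static void
rend_map_remove(rend_map_t *map, demo_circuit_t *circ)
{
  uint32_t b = rend_hash(circ->purpose, circ->rend_token);
  for (demo_circuit_t **p = &map->buckets[b]; *p; p = &(*p)->next_in_bucket) {
    if (*p == circ) {
      *p = circ->next_in_bucket;
      circ->next_in_bucket = NULL;
      return;
    }
  }
}
```

The catch with this kind of change is bookkeeping: every place that sets or clears a circuit's token or purpose would have to call the add/remove functions, or the map goes stale. The same trade-off would apply to keeping a per-channel circuit list for circuit_unlink_all_from_channel.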