arti-bench: support multiple streams per circuit, multiple circuits per sample.
This branch supports circuit parallelism as well as stream parallelism, to help us benchmark all kinds of load for #87.
There's some refactoring too, so that whole streams can run in parallel (rather than just the data transfer). It might be a good idea to review the commits separately for that.
Closes #380.
Please do not merge this just now; it seems okay, but I haven't tested it much yet.
Assigning to @eta for review since she groks arti-bench.
Edited by Nick Mathewson