@@ -12,7 +12,7 @@ So, we have 3 main kinds of crypto that servers do a lot.
Parallelizing cell crypto seems like the easiest to do as a start. Here's my current thoughts on that.
-== Parallelizing cell crypto ==
+## Parallelizing cell crypto
Right now, we do relay cell crypto in relay_crypt_one_payload, which gets called in a few places, all of which are fortunately in relay.c. In circuit_package_relay_cell, it gets called 1..N times for a cell that we're packaging from an edge connection and about to stick on a cell queue. In relay_crypt, we just got a relay cell on a circuit, and we are about to decrypt it, see whether it should get handled by us, and maybe pass it on or maybe process it. If we handle it, we are going to pass it to connection_edge_process_relay_cell. If we pass it on, we are going to append it to a cell_queue_t.
...
...
@@ -20,7 +20,7 @@ For pure relays, the relay_crypt() case dominates. For exit nodes, the circuit_
I think that the right solution here involves adding another pair of cell queue structures (not necessarily cell_queue_t) to each circuit, containing cells going in each direction that have not yet been encrypted. We can then have a few worker threads whose job it is to do the relay crypto for such circuits.
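A minimal sketch of what such a pair of pending-cell queues might look like. All names here are hypothetical stand-ins, not Tor's actual types or fields; the point is just one queue per direction, holding cells that still need crypto.

```c
#include <stddef.h>

/* Hypothetical sketch: a singly-linked queue of cells awaiting crypto,
 * two per circuit (one per direction). Not Tor's real structures. */
typedef struct pending_cell_t {
  struct pending_cell_t *next;
  unsigned char payload[509];    /* CELL_PAYLOAD_SIZE in Tor */
} pending_cell_t;

typedef struct pending_cell_queue_t {
  pending_cell_t *head, *tail;
  int n;
} pending_cell_queue_t;

typedef struct circuit_crypt_queues_t {
  pending_cell_queue_t to_crypt_out; /* cells headed toward the next hop */
  pending_cell_queue_t to_crypt_in;  /* cells headed toward the prev hop */
} circuit_crypt_queues_t;

static void pending_queue_append(pending_cell_queue_t *q, pending_cell_t *c)
{
  c->next = NULL;
  if (q->tail) q->tail->next = c; else q->head = c;
  q->tail = c;
  ++q->n;
}

static pending_cell_t *pending_queue_pop(pending_cell_queue_t *q)
{
  pending_cell_t *c = q->head;
  if (!c) return NULL;
  q->head = c->next;
  if (!q->head) q->tail = NULL;
  --q->n;
  return c;
}
```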
-=== Cell crypto complications ===
+### Cell crypto complications
We will want some kind of a work-queue implementation for passing info to the worker threads. To implement this, we'll probably need a condition-variable implementation. On pthreads this is trivial: just use pthread_cond_t. For pre-Vista Windows, it's less so. Fortunately, I needed to write a windows condition variable implementation for Libevent, so we can just use that one.
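On pthreads the shim really is trivial; a sketch of what the wrapper could look like (illustrative names, not Tor's actual API; the pre-Vista Windows backend is the hard part and is omitted here):

```c
#include <pthread.h>

/* Hypothetical thin wrapper over pthread_cond_t. A Windows backend
 * would implement these same four operations differently. */
typedef struct tor_cond_t {
  pthread_cond_t cond;
} tor_cond_t;

static int tor_cond_init(tor_cond_t *c)
{
  return pthread_cond_init(&c->cond, NULL);
}
static int tor_cond_signal(tor_cond_t *c)
{
  return pthread_cond_signal(&c->cond);
}
/* The caller must hold m; it is atomically released while waiting. */
static int tor_cond_wait(tor_cond_t *c, pthread_mutex_t *m)
{
  return pthread_cond_wait(&c->cond, m);
}
static int tor_cond_destroy(tor_cond_t *c)
{
  return pthread_cond_destroy(&c->cond);
}
```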
...
...
@@ -52,7 +52,7 @@ Once we have these, we can mark the callbacks used to implement bufferevent_open
Of course, there is at least a little bit of design-and-implementation work to get through here.
-== Parallelizing SSL: Design 2. ==
+## Parallelizing SSL: Design 2.
This version only requires Libevent 2.0, and a bit of chutzpah and hacking.
...
...
@@ -66,11 +66,11 @@ We'd need to avoid race conditions when inspecting things about bufferevents, ad
-= Coding plan =
+# Coding plan
-== Phase 1: Cell crypto ==
+## Phase 1: Cell crypto
-=== Cell crypto: The design ===
+### Cell crypto: The design
We'll need a set of worker threads. We'll pass data to worker threads on a basic unbounded work queue. For starters, all threads can share a single queue, though we should keep the implementation abstract so that we can support other arrangements in the future. For communicating back to the main thread, we'll also use an unbounded queue. This means that we will need two work queues.
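A sketch of the unbounded queue described above, under the usual assumptions: a mutex-protected linked list plus a condition variable so workers can sleep until work arrives. Names are illustrative, not Tor's; one instance would carry work to the workers and a second would carry replies back.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct work_entry_t {
  struct work_entry_t *next;
  void *job;
} work_entry_t;

typedef struct work_queue_t {
  pthread_mutex_t lock;
  pthread_cond_t nonempty;
  work_entry_t *head, *tail;
} work_queue_t;

static void work_queue_init(work_queue_t *q)
{
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->nonempty, NULL);
  q->head = q->tail = NULL;
}

static void work_queue_push(work_queue_t *q, void *job)
{
  work_entry_t *e = malloc(sizeof(*e)); /* error handling elided */
  e->job = job;
  e->next = NULL;
  pthread_mutex_lock(&q->lock);
  if (q->tail) q->tail->next = e; else q->head = e;
  q->tail = e;
  pthread_cond_signal(&q->nonempty);
  pthread_mutex_unlock(&q->lock);
}

/* Blocks until a job is available; "unbounded" means push never blocks. */
static void *work_queue_pop(work_queue_t *q)
{
  pthread_mutex_lock(&q->lock);
  while (!q->head)
    pthread_cond_wait(&q->nonempty, &q->lock);
  work_entry_t *e = q->head;
  q->head = e->next;
  if (!q->head) q->tail = NULL;
  pthread_mutex_unlock(&q->lock);
  void *job = e->job;
  free(e);
  return job;
}
```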
...
...
@@ -107,7 +107,7 @@ Whenever you add the first cell to a circuit for crypto, or for handling, you ne
Closing a circuit gets a little complicated in this case. If the circuit is marked, you never add it to a work queue. If a worker sees a marked circuit, it should simply remove it from the work queue and do no processing on it. (Right?) The only trick is that you can't actually close a marked circuit while it is still on a work queue or getting handled by a worker thread.
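One way to pin down the invariants above is a toy, single-threaded model (all names made up for illustration): a worker that pulls a marked circuit off the queue does no processing, and the main thread may only free a circuit once it is marked and no worker still holds it.

```c
/* Toy model of the marked-circuit rule; not Tor's actual types. */
typedef struct mini_circuit_t {
  int marked_for_close;   /* set by the main thread */
  int in_worker;          /* nonzero while a worker holds this circuit */
  int cells_processed;
} mini_circuit_t;

/* Returns the number of cells processed: 0 if the circuit was marked. */
static int worker_handle_circuit(mini_circuit_t *circ, int pending_cells)
{
  circ->in_worker = 1;
  int done = 0;
  if (!circ->marked_for_close)
    done = pending_cells;        /* stand-in for the real crypto work */
  circ->cells_processed += done;
  circ->in_worker = 0;
  return done;
}

/* Main-thread check: safe to free only when marked AND not held. */
static int circuit_safe_to_free(const mini_circuit_t *circ)
{
  return circ->marked_for_close && !circ->in_worker;
}
```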
-=== Cell crypto: The plan ===
+### Cell crypto: The plan
1. Crypto abstractions
a. Add needed fields to packed_cell_t
...
...
@@ -138,7 +138,7 @@ Closing a circuit gets a little complicated in this case. If the circuit is mar
a. Probe automatically for the number of CPUs when NumCPUs is 0.
    a. Maybe make the queue-EWMA code consider unencrypted/unhandled cell counts.
-=== Athena's Notes ===
+### Athena's Notes
On cell_t vs. packed_cell_t:
- Why this talk of adding things to packed_cell_t in https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto ?