You'll notice that once the program cuts off the downloads at around 30 seconds, Tor continues to report hefty incoming bandwidth.
The attached dump from Wireshark shows that long after the client has sent a RELAY_END cell for stream 11629, it continues to receive data for that stream from the exit. A rough calculation suggests about 30K of data is received after the RELAY_END.
I think this is the exit flushing its cell queue, but given that the client officially stopped caring after the RELAY_END, that's quite a bit of wasted bandwidth.
(To read the attached dump, search for 'RELAY_END' and keep searching until you find the RELAY_END for stream 11629. Everything after that is the exit flushing unwanted cells at the client.)
If these cells are getting sent after the RELAY_END arrives at an exit node, they are probably coming because [case 1] they're already in the cell queue or [case 2] in the connection_or's outbuf. Unless [case 3] data in the connection_edge's inbuf is somehow getting packaged even after the RELAY_END cell arrives.
It's also possible [case 4] that some of the cells are already in transit (queued somewhere between the exit node's SSL and the client's Tor) when the client sends the RELAY_END cell. We can't really do anything about those.
So, breaking down the cases:
case 1: cells already on the cell_queue) We don't currently have any good way to remove cells from a circuit's cell queue. By the time they get there, they're encrypted and ready for delivery; removing them would throw off the circuit's stream cipher (see the sketch after this breakdown). Furthermore, the cells are already packed at that point, so figuring out which streams they belong to would be nontrivial.
We could mitigate this by lowering how full we're willing to let a cell queue get with cells from streams before we stop reading from those streams. This might help stream fairness; I don't know what it would do to performance.
In Tor 0.2.3.x, we're going to have to change the point at which we encrypt cells on queues in order to implement #1479. When we do this, we can revisit the possibility of removing already-queued cells for a dead stream.
case 2: cells already in the connection_or's outbuf) This is way too late to remove the cells.
We could mitigate this by lowering how much we're willing to fill a connection_or's outbuf.
case 3: packaging cells from an exit connection even after getting a relay_end) That would be a bug. If it's happening, we should fix it.
case 4: the cells are in transit on the net) There's nothing we can do about that with our current protocol.
So IMO the right thing to do is make sure that case 3 isn't happening, and think about what tuning we could do to mitigate 1 and 2, and whether it's a net win or not.
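To make the case 1 problem concrete, here is a toy standalone sketch (not Tor's real relay crypto, which is AES-CTR; the keystream below is a deliberately trivial stand-in): because the circuit's stream cipher is stateful and cells are encrypted in order, the client has to decrypt exactly the sequence the exit produced, so removing one already-queued cell turns every later cell into garbage.

```c
/* Toy model of why dropping an already-encrypted queued cell breaks the
 * circuit's stream cipher.  Tor uses AES-CTR; this uses a trivial
 * stand-in keystream purely for illustration. */
#include <stdio.h>
#include <string.h>

#define CELL_LEN 8  /* real relay cells are 512 bytes */

/* Deterministic toy keystream: NOT cryptography, just stateful bytes. */
static unsigned char next_keystream_byte(unsigned int *state) {
  *state = *state * 1103515245u + 12345u;
  return (unsigned char)(*state >> 16);
}

static void xor_cell(unsigned char *cell, unsigned int *state) {
  for (int i = 0; i < CELL_LEN; i++)
    cell[i] ^= next_keystream_byte(state);
}

static void print_cell(const char *label, const unsigned char *cell) {
  printf("%s: ", label);
  for (int i = 0; i < CELL_LEN; i++)
    printf("%02x", cell[i]);
  printf("\n");
}

int main(void) {
  unsigned char cells[3][CELL_LEN];
  unsigned int enc_state = 42, dec_state = 42;

  /* The exit encrypts three cells, in order, with one evolving state. */
  for (int c = 0; c < 3; c++) {
    memset(cells[c], 'A' + c, CELL_LEN);
    xor_cell(cells[c], &enc_state);
  }

  /* Suppose cell 1 were removed from the queue: the client decrypts
   * cell 0 fine, but then decrypts cell 2 using cell 1's keystream. */
  xor_cell(cells[0], &dec_state);  /* recovers the 0x41 'A' bytes */
  xor_cell(cells[2], &dec_state);  /* wrong cipher state: garbage */

  print_cell("cell 0", cells[0]);  /* 4141414141414141 */
  print_cell("cell 2", cells[2]);  /* gibberish */
  return 0;
}
```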
The total number of RELAY_DATA cells received for the three streams combined, after a RELAY_END for each had been sent, is exactly 1000.
I believe that's the size of the cell queue for a circuit; the spec calls it a 'window', and I assume that's what it's referring to.
I think this tells us that the exit used up the entire cell_queue and only stopped sending when it received no further SEND_MEs for the individual streams.
However, even though the client had sent RELAY_ENDs for each of the streams, as it continued to receive cells on the circuit it kept sending circuit-level SENDMEs (with a stream ID of zero) for every hundred of the unwanted RELAY_DATA cells it received, even though it had no active streams on the circuit.
If my understanding of the spec is correct, had it not sent those SENDMEs, the exit might have stopped pumping the unwanted RELAY_DATAs sooner. The client could have observed that it had no active streams on the circuit and sent no SENDMEs at all.
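As a rough model of that interaction (my reading of tor-spec's defaults, not Tor's actual code): the exit starts with a circuit-level package window of 1000, decrements it for every RELAY_DATA cell it puts on the wire, and stops at zero unless a circuit-level SENDME adds 100 back. So withholding SENDMEs bounds the flush at whatever window remains, but can't stop the cells already committed.

```c
/* Toy accounting of the circuit-level package window, per my reading of
 * tor-spec: initial window 1000, -1 per RELAY_DATA sent, +100 per
 * circuit-level SENDME received.  Not Tor's actual implementation. */
#include <stdio.h>

int main(void) {
  int package_window = 1000;   /* CIRCWINDOW_START */
  int cells_sent = 0;
  int sendmes_from_client = 0; /* client withholds SENDMEs after RELAY_END */

  /* The exit keeps flushing queued cells while the window allows it. */
  while (package_window > 0) {
    package_window--;          /* one RELAY_DATA cell onto the wire */
    cells_sent++;
    if (sendmes_from_client > 0) {
      package_window += 100;   /* would happen per SENDME; none arrive */
      sendmes_from_client--;
    }
  }

  /* With no SENDMEs, the exit sends at most the remaining window: */
  printf("cells flushed after last SENDME: %d\n", cells_sent); /* 1000 */
  return 0;
}
```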
Interesting. My hope is that the exit relay packages and sends 1000 cells, then it receives the stream end cells, then it receives the extra sendmes you're talking about. If so, it wouldn't send any more cells for the streams that it had already ended.
The general problem you describe is known to happen when you're on a slow link, you click on a web page, and then you click stop soon after. You've already set in motion the receiving of that page, and your Tor will waste bandwidth receiving the cells and discarding them since they're for an unrecognized stream.
We could reduce the impact if we reduce the default circuit package window, which we can control from the consensus: see circuit_initial_package_window() in circuitlist.c. But doing so may hurt overall throughput, just like capping TCP's window size hurts throughput.
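For reference, the knob Roger mentions works by consulting a consensus parameter and falling back to the built-in default when it's absent. A minimal standalone sketch of that pattern (the parameter lookup here is a toy; Tor's actual circuit_initial_package_window() reads "circwindow" through its networkstatus code):

```c
/* Sketch of a consensus-tunable initial circuit window, modeled loosely
 * on circuit_initial_package_window() in circuitlist.c.  The parameter
 * lookup is a toy; Tor reads "circwindow" from the consensus. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CIRCWINDOW_START 1000   /* built-in default */

/* Toy stand-in for fetching a named integer parameter from the
 * consensus, with a default when the parameter is absent. */
static int get_consensus_param(const char *params, const char *name,
                               int default_val) {
  const char *p = strstr(params, name);
  if (!p)
    return default_val;
  return atoi(p + strlen(name) + 1);   /* skip past "name=" */
}

int main(void) {
  const char *consensus_params = "circwindow=500 other=3";
  int window = get_consensus_param(consensus_params, "circwindow",
                                   CIRCWINDOW_START);
  printf("initial package window: %d\n", window);  /* 500 here */
  return 0;
}
```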
As Nick says, it would be good to confirm that we're not sending any further stream cells after we get the stream end cell.
I'd expect the cells to be long gone (case 4) in general. I guess for a really slow exit relay (or a really slow middle hop), that might still be in queues in the exit relay.
> Interesting. My hope is that the exit relay packages and sends 1000 cells, then it receives the stream end cells, then it receives the extra sendmes you're talking about. If so, it wouldn't send any more cells for the streams that it had already ended.
On reflection, that's exactly what it appears to be doing. It receives the RELAY_END cells and sends exactly 1000 more cells along the circuit - it keeps pumping until the window is used up. Withholding SENDMEs at that stage of the game has no effect in the testing I've done, which I think proves the exit is just flushing what it had already written to the queue.
This problem doesn't seem to happen on the one or two fast circuits I've watched, which reinforces the likelihood that case 4 is not happening.
After a little more cell-counting I can see that the total number of RELAY_DATA cells sent for each stream is roughly 50 cells per stream-level SENDME received, plus 500 (the initial size of the stream-level package window).
So that seems sane. However, what still puzzles me is why the total number of RELAY_DATA cells sent on the circuit after each stream has sent a RELAY_END is always 1000 cells exactly - i.e. the size of the circuit-level package window. Maybe that will make sense tomorrow. I'm obviously not clear on how the two windows interact.
> So that seems sane. However, what still puzzles me is why the total number of RELAY_DATA cells sent on the circuit after each stream has sent a RELAY_END is always 1000 cells exactly - i.e. the size of the circuit-level package window. Maybe that will make sense tomorrow. I'm obviously not clear on how the two windows interact.
Duh - the circuit package window is the upper limit for the circuit as a whole, so at most 1000 cells remain to flush to the client after a RELAY_END, rather than the combined value of the three stream package windows (1500).
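Spelling out the arithmetic (assuming the spec's defaults of 1000 and 500):

```c
/* Why at most 1000 trailing cells: the circuit-level window caps the
 * sum of the per-stream windows.  Default values per tor-spec. */
#include <stdio.h>

int main(void) {
  int circ_window = 1000;      /* circuit-level package window */
  int stream_window = 500;     /* stream-level package window  */
  int n_streams = 3;

  int stream_total = n_streams * stream_window;        /* 1500 */
  int max_trailing = stream_total < circ_window
                       ? stream_total : circ_window;   /* 1000 */

  printf("max cells flushed after RELAY_END: %d\n", max_trailing);
  return 0;
}
```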
To close this out I will follow Nick's suggestion and perform this test with my Tor acting as both client and exit and ensure the cell queue isn't being filled after the RELAY_END. I really don't expect this to be happening.
> To close this out I will follow Nick's suggestion and perform this test with my Tor acting as both client and exit and ensure the cell queue isn't being filled after the RELAY_END. I really don't expect this to be happening.
I tested it with an exit node and client running as separate Tor instances on the same machine. By the time the exit receives the RELAY_ENDs, its cell queue for the circuit is completely full, hence the 1000 trailing cells the client receives after sending its last RELAY_END. There's nothing the exit can do to save bandwidth here, except tear down the or_conn once it determines that all the streams on all the circuits multiplexed over that or_conn have received a RELAY_END. I tried this in my testing too, and it works well enough, but there is no pretty way of destroying an or_conn without flushing all the bytes on it first, so I'm not sure whether such an approach is acceptable, even if it's worthwhile.
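For the record, the heuristic I tried looks roughly like the sketch below. All the types and field names here are invented for illustration; Tor's real data structures and teardown path differ. The idea is simply: walk every circuit multiplexed on the or_conn, and only consider tearing it down if no circuit still has a stream that hasn't seen a RELAY_END.

```c
/* Hypothetical sketch of the "tear down the or_conn once every stream on
 * every multiplexed circuit has seen a RELAY_END" heuristic.  Types and
 * field names are invented for illustration only. */
#include <stdio.h>
#include <stddef.h>

struct stream {
  int received_relay_end;     /* 1 once a RELAY_END has been seen */
  struct stream *next;
};

struct circuit {
  struct stream *streams;     /* streams attached to this circuit */
  struct circuit *next;       /* next circuit on the same or_conn */
};

struct or_conn {
  struct circuit *circuits;
};

/* Return 1 only if no circuit on the or_conn has any stream that
 * hasn't received a RELAY_END. */
static int or_conn_all_streams_ended(const struct or_conn *conn) {
  for (const struct circuit *c = conn->circuits; c; c = c->next)
    for (const struct stream *s = c->streams; s; s = s->next)
      if (!s->received_relay_end)
        return 0;
  return 1;
}

int main(void) {
  struct stream s1 = { 1, NULL }, s2 = { 1, &s1 };
  struct circuit circ = { &s2, NULL };
  struct or_conn conn = { &circ };

  if (or_conn_all_streams_ended(&conn))
    printf("all streams ended: or_conn could be torn down\n");
  return 0;
}
```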
I'm going to work with Karsten to implement a metric for counting the number of bytes/cells exits flush to circuits with no active streams.
The best way to read the log is to search for 'removing stream', find the last instance of it, and work back up through the cell packaging messages. I added some extra info to the log messages so you can track the package_window for individual streams.
One thing that struck me was the number of cases where the client sends a stream-level SENDME while the exit's stream-level package window is greater than zero but its circuit-level package_window is zero, so no bytes can be packaged.
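That matches the packaging rule as I read the spec: data from a stream can only be packaged while both windows are positive, so a stream-level SENDME buys nothing while the circuit-level window sits at zero. A toy sketch of that gate (not Tor's code):

```c
/* Toy illustration of how the two package windows gate packaging: a
 * stream with window > 0 still can't send while the circuit window is
 * 0.  Per my reading of tor-spec; not Tor's implementation. */
#include <stdio.h>

int main(void) {
  int circ_window = 0;      /* circuit-level window exhausted */
  int stream_window = 450;  /* stream-level window still open */
  int packaged = 0;

  /* Both windows must be positive before a cell can be packaged. */
  while (circ_window > 0 && stream_window > 0) {
    circ_window--;
    stream_window--;
    packaged++;
  }
  printf("cells packaged: %d\n", packaged);  /* 0: circuit window gates */

  /* A circuit-level SENDME reopens the circuit window by 100... */
  circ_window += 100;
  while (circ_window > 0 && stream_window > 0) {
    circ_window--;
    stream_window--;
    packaged++;
  }
  printf("after circuit SENDME: %d\n", packaged);  /* 100 */
  return 0;
}
```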
So, is there a bug here that should be fixed for 0.2.2.x, or should I take it out of the milestone?
There's no bug - you can take it out of the milestone. There's some potential for saving bandwidth, so I'll leave it open to see what adding a metric tells us.
I don't know what further diagnosis this one needs. If you send a stream begin, then data cells start coming at you. If you then send a stream end, the ones that are coming at you continue and then arrive.
I don't think anything in this ticket has shown that the exit relay queues more data cells for the client after it has received the end cell?
I think this one might be a works-for-me, since Tor is doing what we designed it to do. Maybe somebody should produce a better design, but that's a different ticket. :)