IIUC, the idea is that if you force a relay to send a large number of destroy cells, you might be able to clog its connection_or_flushed_some() path, because destroy cells get sent directly to the outbuf instead of going through a cell queue.
< oftc_must_be_destroyed> ok. new attack is about privileged cell types that go to the outbuf directly, bypassing cell queues
< asn_> are cell queues there for rate limiting etc.?
< oftc_must_be_destroyed> no, it's about flushing of cells
< oftc_must_be_destroyed> connection_or_flushed_some can't flush any cells from queues if connection_get_outbuf_len >= 16KB
< oftc_must_be_destroyed> if we can trigger filling the buf with special cells then the queues never flush
< oftc_must_be_destroyed> cell queues were implemented to prevent a huge growing outbuf and fragmenting memory and linux malloc
< oftc_must_be_destroyed> and to fix this keyboard
< oftc_must_be_destroyed> exploiting it is easy with a create cell and proper handling of truncated circs
< oftc_must_be_destroyed> you choose two relays where you need to stop the circuit from a to b
< oftc_must_be_destroyed> connect to b, create, extend with some defect
< oftc_must_be_destroyed> a answers with destroy, b converts it to truncated
< oftc_must_be_destroyed> client handles it and repeats
< oftc_must_be_destroyed> a dozen such circs and the buffer is filled with destroy cells only
< oftc_must_be_destroyed> what do you think?
< asn_> why does it flush only if datalen < OR_CONN_LOWWATER?
< asn_> why doesn't it always flush some data?
< oftc_must_be_destroyed> so no fragmented memory
< oftc_must_be_destroyed> no need too low, no need too high
< oftc_must_be_destroyed> the problem is cell types that are never queued
< oftc_must_be_destroyed> not such a func
< oftc_must_be_destroyed> at least the destroy cell type
< oftc_must_be_destroyed> all the others can't be triggered remotely in enormous numbers
< asn_> which function queues cells?
< oftc_must_be_destroyed> and you can't append it to the queue so easily
< asn_> i would like to see why the destroy cell doesn't get queued
< oftc_must_be_destroyed> append_cell_to_circuit_queue
< oftc_must_be_destroyed> because it's never queued
< oftc_must_be_destroyed> connection_or_send_destroy
< oftc_must_be_destroyed> a destroy cell must be sent even if the circ is already freed
< asn_> is this the case even for 0.2.4.x where the channel code is implemented?
< oftc_must_be_destroyed> yes
< oftc_must_be_destroyed> why not?
< oftc_must_be_destroyed> it's about out-of-band logic for the destroy cell
< oftc_must_be_destroyed> nothing else is like the destroy cell
< oftc_must_be_destroyed> it can be delivered before any data
< oftc_must_be_destroyed> and that data must be cleared
< asn_> so you think you can fill 16KB of the outbuf of a relay with destroy cells?
< asn_> if each destroy cell is 512 bytes
< asn_> you will need 32 cells or something
< asn_> you think you can create that many cells without any of them getting flushed first?
< oftc_must_be_destroyed> by repeating the create req (extend)
< oftc_must_be_destroyed> a dozen circs
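To make the mechanism concrete, here is a minimal standalone sketch, not Tor's actual code: conn_t, the constant name, and the fixed 512-byte cell size are simplified stand-ins. It only illustrates the rule described in the chat: queued cells are moved to the outbuf only once the outbuf drains below the low-water mark, so anything written directly to the outbuf (like DESTROY cells) can keep it above that mark and starve the per-circuit queues.

#include <stddef.h>
#include <stdio.h>

#define CELL_LEN 512
#define LOWWATER_BYTES (16 * 1024)   /* the 16 KB threshold mentioned in the chat */

typedef struct conn_t {
  size_t outbuf_len;     /* bytes already sitting in the outbuf */
  size_t queued_cells;   /* cells waiting in per-circuit queues */
} conn_t;

/* Called after some bytes have been flushed to the network. */
static void conn_flushed_some(conn_t *conn)
{
  /* Outbuf still at or above the low-water mark: leave the queues alone. */
  if (conn->outbuf_len >= LOWWATER_BYTES)
    return;
  /* Otherwise refill the outbuf from the circuit queues. */
  while (conn->queued_cells > 0 && conn->outbuf_len < LOWWATER_BYTES) {
    conn->queued_cells--;
    conn->outbuf_len += CELL_LEN;
  }
}

int main(void)
{
  /* 33 DESTROY cells (~16.5 KB) written straight into the outbuf. */
  conn_t conn = { .outbuf_len = 33 * CELL_LEN, .queued_cells = 100 };
  conn_flushed_some(&conn);
  printf("cells still queued: %zu\n", conn.queued_cells);  /* prints 100 */
  return 0;
}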
< asn_> I admit I don't understand the issue well enough to propose a good fix.
< oftc_must_be_destroyed> no good fix, the only kludge is to append to the queue
< oftc_must_be_destroyed> but then we need to change the logic of circ free
< oftc_must_be_destroyed> circuit_close_all_marked
< oftc_must_be_destroyed> so it doesn't free the circ while there are any cells on the queue
< oftc_must_be_destroyed> or refactor the code so there are no non-queued cells at all
< oftc_must_be_destroyed> but then padding will wait in the queue
< oftc_must_be_destroyed> then it will lead to non-padding
< oftc_must_be_destroyed> why have padding that arrives after several hours
< oftc_must_be_destroyed> and about destroy
< oftc_must_be_destroyed> why does it need to clear the cell queue if a destroy arrived
< oftc_must_be_destroyed> cell queue design failure
Attaching another patch from oftc_must_be_destroyed which seems to be setting up a cell queue specifically for DESTROY cells.
< oftc_must_be_destroyed> another idea: a common queue for all destroy cells. some destroys will arrive later, but that will happen anyway. http://paste.debian.net/plain/223527
< oftc_must_be_destroyed> if there is any active circ, yeah, the idea needs a little fix.
< oftc_must_be_destroyed> or it needs a queue of queues.
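As a rough illustration of the "common destroy queue" idea from the chat above (this is only a sketch, not the patch at the paste link; all names are hypothetical): remember just the circuit IDs of pending DESTROYs in a per-connection FIFO, and emit one DESTROY cell each time the connection can take more data, instead of writing them straight into the outbuf.

#include <stdint.h>
#include <stdlib.h>

typedef struct destroy_entry_t {
  uint16_t circ_id;
  struct destroy_entry_t *next;
} destroy_entry_t;

typedef struct destroy_queue_t {
  destroy_entry_t *head, *tail;
} destroy_queue_t;

/* Remember that a DESTROY for circ_id still needs to go out on this conn. */
static void destroy_queue_append(destroy_queue_t *q, uint16_t circ_id)
{
  destroy_entry_t *e = calloc(1, sizeof(*e));
  e->circ_id = circ_id;
  if (q->tail)
    q->tail->next = e;
  else
    q->head = e;
  q->tail = e;
}

/* Returns 1 and sets *circ_id_out if a DESTROY is pending, else 0.
 * The caller would build and queue the actual DESTROY cell. */
static int destroy_queue_pop(destroy_queue_t *q, uint16_t *circ_id_out)
{
  destroy_entry_t *e = q->head;
  if (!e)
    return 0;
  *circ_id_out = e->circ_id;
  q->head = e->next;
  if (!q->head)
    q->tail = NULL;
  free(e);
  return 1;
}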
Need to review those more closely; it's an interesting idea.
What happens if we try to CREATE a circuit which we think we have already DESTROYED, but for which the DESTROY cell is still pending? It seems we send CREATE cells immediately, and when we make a create cell, we could pick any circuit ID which isn't used ... including ones where we're going to send a DESTROY cell soon.
Looks like this should target 0.2.4.9 unless 0.2.4.8 gets delayed; I'll want to test any fix here before shipping it.
For a fixed-up version of the fix -- I guess we could change the circuit ID selection logic so that it doesn't pick any circID currently in use, and it doesn't pick any circuitID that currently has a queued destroy cell. That would work.
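A sketch of that selection rule, with the two conditions tracked in simple stand-in bitmaps; in the real code the "in use" check would come from the circuit map and the "destroy pending" check from whatever structure the fix adds, and a real implementation would start from a random ID rather than scanning in order.

#include <stdint.h>

/* Stand-in state: one bit per possible 16-bit circuit ID. */
static uint8_t circ_id_in_use[65536 / 8];
static uint8_t destroy_pending[65536 / 8];

static int bit_get(const uint8_t *map, uint16_t id)
{
  return (map[id >> 3] >> (id & 7)) & 1;
}

/* Returns a usable circuit ID, or 0 (reserved) if none is free. */
static uint16_t pick_circ_id(void)
{
  for (uint32_t id = 1; id <= 0xFFFF; ++id) {
    if (!bit_get(circ_id_in_use, (uint16_t)id) &&
        !bit_get(destroy_pending, (uint16_t)id))
      return (uint16_t)id;
  }
  return 0;  /* no usable ID: the caller must refuse to build the circuit */
}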
I agree with the proposition that this isn't the best possible fix. Not sure what the best possible fix here would be; let's get something working in place and then go from there.
What about 0.2.3.x? An unfixed 0.2.3.x leaves the attack working for almost a year or more, until 0.2.4.x becomes stable and the majority of relays have upgraded.
It would need to scan the whole queue to find a queued destroy cell with a given circuit ID; if the queue is huge enough, that scan itself becomes a DoS. It's possible to keep a bitfield of the circuit IDs present in the destroy queue, but that requires 4 KB per connection.
The best fix, in theory, is to detach the cell queues into independent objects and use each one as a pipe: one end is always attached to a connection, and the other end is attached to a circuit when needed. It must be detachable from the circuit, and it should only be freed once it is attached to no circuit and holds no cells. The queue, rather than the circuit, would be what gets marked active or inactive (right now that flag lives on the circuit). And so on.
Once such a design is implemented, we'd need to discuss what to do with the cells already on a queue when a destroy cell is appended to it.
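Very roughly, the "independent queue" design being described might be shaped like this (types and fields are hypothetical, not Tor's):

#include <stddef.h>

struct cell_t;
struct connection_t;
struct circuit_t;

typedef struct detached_cell_queue_t {
  struct connection_t *conn;   /* always set: where cells get flushed */
  struct circuit_t *circ;      /* NULL once the circuit has been freed */
  struct cell_t *head, *tail;
  size_t n_cells;
  unsigned active : 1;         /* scheduling flag lives here, not on the circuit */
} detached_cell_queue_t;

/* The queue may only be freed when it is detached from any circuit and empty. */
static int detached_cell_queue_can_free(const detached_cell_queue_t *q)
{
  return q->circ == NULL && q->n_cells == 0;
}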
IMO this should be a backport candidate for 0.2.3.x. We need to trade-off two factors there: the risk of not fixing it immediately, versus the risk of backporting a fix that isn't correct or tested too early. Marking as "023-backport" -- let's do a fix in 0.2.4 and make sure nothing breaks before we backport.
It would need to scan the whole queue to find a queued destroy cell with a given circuit ID; if the queue is huge enough, that scan itself becomes a DoS. It's possible to keep a bitfield of the circuit IDs present in the destroy queue, but that requires 4 KB per connection.
Hm. A linear search over 32K of cells does seem pretty excessive at first glance. I ran a quick test, to see how slow a linear search over 65535 cells would be (yes, I made sure the cells were fragmented in memory). On my laptop, it took 4.6 msec per worst-case search. Compare that to 2.4 msec per old-style onion handshake, and we're feeling some pain there.
There's an easier way to hold this stuff, so long as all we need to remember is which circuit IDs have a pending destroy: just use an array of uint16_t. When I tried that, I got decent performance. If we fake it with a smartlist holding uint16_t values in its void*s, we still only get 40 usec per worst-case search and 25 usec per worst-case memmove when removing the first entry. The maximum size is higher than a bitfield's, but less than would be needed for a queue of full cells.
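For illustration, a self-contained version of that trick, with a plain growable array standing in for Tor's smartlist_t; the point is that only two-byte circuit IDs get stored (packed into the pointer slots), not 512-byte cells.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct idlist_t {
  void **elts;
  int len, capacity;
} idlist_t;

/* Record a circuit ID by stuffing it into a pointer-sized slot. */
static void idlist_add(idlist_t *l, uint16_t circ_id)
{
  if (l->len == l->capacity) {
    l->capacity = l->capacity ? l->capacity * 2 : 16;
    l->elts = realloc(l->elts, l->capacity * sizeof(void *));
  }
  l->elts[l->len++] = (void *)(uintptr_t)circ_id;
}

/* Linear search; at most 65535 entries, far smaller than a queue of cells. */
static int idlist_contains(const idlist_t *l, uint16_t circ_id)
{
  for (int i = 0; i < l->len; ++i)
    if ((uint16_t)(uintptr_t)l->elts[i] == circ_id)
      return 1;
  return 0;
}

/* Remove the oldest entry (index 0) with a memmove, as in the timing test. */
static void idlist_pop_front(idlist_t *l)
{
  if (l->len <= 0)
    return;
  memmove(l->elts, l->elts + 1, (l->len - 1) * sizeof(void *));
  l->len--;
}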
The best fix, in theory, is to detach the cell queues into independent objects and use each one as a pipe: one end is always attached to a connection, and the other end is attached to a circuit when needed.
I agree that's a good design; let's save it for 0.2.5.
Hm. A linear search over 32K of cells does seem pretty excessive at first glance. I ran a quick test, to see how slow a linear search over 65535 cells would be (yes, I made sure the cells were fragmented in memory). On my laptop, it took 4.6 msec per worst-case search. Compare that to 2.4 msec per old-style onion handshake, and we're feeling some pain there.
The number of destroy cells in the queue or output buffer isn't limited by the number of available circuit IDs. It depends on read bandwidth, write bandwidth, the number of clients, the number of circuits, and the method used to fill space with destroy cells. A determined enough attacker could cause an OOM even in such an extreme edge case. I could be wrong about that, however.
Yep, I'm wrong, given your circuit ID selection logic. But then it leads to another, even faster DoS: eating up all the circuit IDs.
The only fix for that one is wider circuit IDs, I believe. Time to dust off proposal 214? (Roger, your thoughts? ISTR we have code for that sitting around.)
Started coding up the approach discussed above, then andrea came up with an idea for an even better approach. We both agree that the 'real' approach for post-0.2.4 is to make cell queues survive the death of their circuits.
(I hope I can code up the idea andrea came up with some time Wednesday, for testing and review.)
Actually, make that "Wednesday" into "next week" -- I should spend the time between now and the small-features deadline on Sunday merging and implementing small features.
Destroy cells should go in a queue rather than always skipping to the head of the line. Probably the easiest option here is to pretend we have a special virtual circuit for all 'control' cells like DESTROY on a channel, and schedule it with circuitmux like the other circuits. Another possibility would be to schedule the cells for the circuit they affect; not sure if forcing a DESTROY to wait behind any data we have for that circuit would change semantics in a problematic way.
There's most of an implementation for 0.2.4 in branch "bug7912" in my public repository. The missing part is putting the new destroy cell queue into the circuitmux. Andrea said she'd have a look at it with me when she's done with what she's working on now.
Backporting to 0.2.3 will be some work, but not an insane amount.
I think the best solution to this is to treat DESTROY and any other similar entities we want to schedule distinctly from any particular circuit as a special virtual circuit for scheduling purposes, but this will be somewhat tricky to do because right now the code in circuitmux.c uses circuit_t * in a lot of places: keeping a linked list of active circuits, passing a circuit_t * into function pointers, and so on. In the function pointer case, it seems acceptable to modify it to pass NULL to mean 'no particular circuit' along with a separate circuitmux-wide instance of circuitmux_policy_circ_data_t, but the usage internal to the circuitmux_t will be trickier.
Probably, we should switch to some other data structure that wraps a circuit_t * or has a flag to say 'this is the magic control circuit', but that's an extensive enough code modification that I think it's definitely an 0.2.5 feature. The suggestion of special-casing the destroy queue to be high-priority without starvation seems like a good right-now option.
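The wrapper being suggested might look something like this (hypothetical names; the real circuitmux internals would of course need more than this), so that a "no particular circuit" entry for DESTROYs can sit in the same scheduling structures as ordinary circuits:

struct circuit_t;

typedef struct muxed_entry_t {
  struct circuit_t *circ;        /* NULL for the control/DESTROY entry */
  unsigned is_control_entry : 1; /* set for the one magic entry per mux */
  struct muxed_entry_t *next_active;
  struct muxed_entry_t *prev_active;
} muxed_entry_t;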
I'm also okay with special-casing the destroy queue so that it gets a high priority, so long as it doesn't starve all the other circuits.
It's also not necessary to ensure that all the other cells on a circuit are flushed before the destroy goes through; it's more like a RST than a FIN.
Okay; what about keeping a per-circuit pending destroy flag? For scheduling purposes we'd treat a circuit with n cells and the pending destroy flag set as having n+1 cells, but when we actually pick cells from the circuit we see the pending destroy and send a destroy instead of the queue contents?
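A small sketch of that scheduling rule, with simplified stand-in types: the pending destroy adds one to the circuit's cell count for scheduling, and when the circuit is picked, the DESTROY is sent and the queued data dropped, per the RST-like semantics above.

#include <stddef.h>

typedef struct sched_circ_t {
  size_t n_queued_cells;
  unsigned destroy_pending : 1;
} sched_circ_t;

/* A circuit with a pending DESTROY counts as having one extra cell. */
static size_t cells_for_scheduling(const sched_circ_t *c)
{
  return c->n_queued_cells + (c->destroy_pending ? 1 : 0);
}

/* Called when the scheduler has picked this circuit to emit one cell.
 * Returns 1 if the cell to send is a DESTROY, 0 if it is queued data. */
static int pick_next_cell(sched_circ_t *c)
{
  if (c->destroy_pending) {
    c->destroy_pending = 0;   /* the DESTROY goes ahead of queued data... */
    c->n_queued_cells = 0;    /* ...which is cleared, as discussed above */
    return 1;
  }
  if (c->n_queued_cells > 0)
    c->n_queued_cells--;
  return 0;
}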
From conversation with andrea:
I'll need to modify the channel->cmux interface at the point where channel calls get_first_active_circuit so that it can maybe get a destroy instead.
I'll need to make the destroy queue get freed when the cmux gets freed
I'll need to actually put the cells onto the destroy queue.
I'll need to mark all this for refactoring
I may need to mess with the functions that compute the number of pending cells and active circuits on a cmux, or with their callers.
Okay, there's an implementation for 0.2.4 now in branch "bug7912". It's not very well tested, but could use review.
Need to think about testing strategy here. What I'd be afraid of is that we'd run out of circuit IDs over time, or that we'd never actually send destroy cells for some reason.
In 17d64c8ce75c564a3e12cda29c79b9dbf280af67, you have circuit_set_p_circid_chan() clearing circ->p_delete_pending and marking the circuit ID unusable on the old channel, but circuit_set_n_circid_chan() sets circ->n_delete_pending in the analogous case (which is a no-op since the test for the if already implies that). That seems unlikely to be correct.
Also, if you've marked the circuit ID unusable in the old channel, what's going to mark it usable again when the delete is finished?
Hmm, I think f5a46e3f1ac125cf23488d86dfb6d862c24f6be1 addresses the latter concern I raised with respect to 17d64c8ce75c564a3e12cda29c79b9dbf280af67, and looks okay in itself.
I think ab8f14702f968ff8b82382daba2da5542fd42d78 is okay, save that we might want to consider a more flexible policy than alternation. How thoroughly has this been tested, given the number of different bits of code it touches?
you have circuit_set_p_circid_chan() clearing circ->p_delete_pending and marking the circuit ID unusable on the old channel, but circuit_set_n_circid_chan() sets circ->n_delete_pending in the analogous case (which is a no-op since the test for the if already implies that). That seems unlikely to be correct.
Okay; will investigate and fix.
we might want to consider a more flexible policy than alternation.
Agreed.
How thoroughly has this been tested, given the number of different bits of code it touches?
Hardly at all. I think I ran it on a chutney network and it didn't explode, but that doesn't mean much.
you have circuit_set_p_circid_chan() clearing circ->p_delete_pending and marking the circuit ID unusable on the old channel, but circuit_set_n_circid_chan() sets circ->n_delete_pending in the analogous case (which is a no-op since the test for the if already implies that). That seems unlikely to be correct.
Okay; will investigate and fix.
Added a fixup commit; thanks!
we might want to consider a more flexible policy than alternation.
Agreed.
(We should open a ticket for this; it feels like an 0.2.5 thing)
How thoroughly has this been tested, given the number of different bits of code it touches?
Hardly at all. I think I ran it on a chutney network and it didn't explode, but that doesn't mean much.
I can try it on another chutney network. How else should we check? I'd be especially worried about bugs that prevented deletes from getting sent entirely. Any ideas about how we can audit for those?
you have circuit_set_p_circid_chan() clearing circ->p_delete_pending and marking the circuit ID unusable on the old channel, but circuit_set_n_circid_chan() sets circ->n_delete_pending in the analogous case (which is a no-op since the test for the if already implies that). That seems unlikely to be correct.
Okay; will investigate and fix.
Added a fixup commit; thanks!
Fixup looks fine.
we might want to consider a more flexible policy than alternation.
Agreed.
(We should open a ticket for this; it feels like an 0.2.5 thing)
How thoroughly has this been tested, given the number of different bits of code it touches?
Hardly at all. I think I ran it on a chutney network and it didn't explode, but that doesn't mean much.
I can try it on another chutney network. How else should we check? I'd be especially worried about bugs that prevented deletes from getting sent entirely. Any ideas about how we can audit for those?
Hmmmm - do you mean bugs where no deletes or significantly fewer deletes get sent, or are you worried about more subtle cases where once in a great while a delete gets stuck or lost?
In the former case, it might be easiest to approach it statistically: add a log message on how many deletes have been sent out of how many cells in total, and compare against master. The latter probably needs trickier debug output for deletes entering and leaving the queue or something like that.
Maybe an easy way to spot if there's an issue meriting further investigation would be to count deletes when they enter the queue (i.e., at the earliest point where this patch series changes any of the logic) and then when they're sent, and see if differences between the counters stay in a bounded range (i.e., the size of the queue under equilibrium conditions) or increase as the node stays running longer?
Maybe an easy way to spot if there's an issue meriting further investigation would be to count deletes when they enter the queue (i.e., at the earliest point where this patch series changes any of the logic) and then when they're sent, and see if differences between the counters stay in a bounded range (i.e., the size of the queue under equilibrium conditions) or increase as the node stays running longer?
Seems likely to work out. You'd want a per-connection counter, I think, since if a connection dies, we won't actually consume any destroys. I'll implement something like that, then merge.
Seems likely to work out. You'd want a per-connection counter, I think, since if a connection dies, we won't actually consume any destroys. I'll implement something like that, then merge.
I still think the best strategy here is a (signed) delete counter per channel, incremented when a delete is queued and decremented when one is sent. Possibly also a global delete counter, decremented by the number of pending deletes on a channel if that channel dies, to address the issue you discussed while still accumulating long-term statistics over the life of the whole Tor process. Have you made progress on the implementation, so we can test and merge this?
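A minimal sketch of that bookkeeping, with hypothetical names: a signed per-channel counter plus a global one, incremented on queueing a delete, decremented on send, and the channel's remaining balance subtracted from the global counter when a channel dies.

#include <stdint.h>

typedef struct chan_destroy_stats_t {
  int64_t pending;   /* deletes queued minus deletes sent on this channel */
} chan_destroy_stats_t;

static int64_t global_pending_destroys = 0;

static void note_destroy_queued(chan_destroy_stats_t *st)
{
  st->pending++;
  global_pending_destroys++;
}

static void note_destroy_sent(chan_destroy_stats_t *st)
{
  st->pending--;
  global_pending_destroys--;
}

/* When a channel closes, its unsent deletes will never be transmitted;
 * drop them from the global count so it reflects only live channels. */
static void note_channel_closed(chan_destroy_stats_t *st)
{
  global_pending_destroys -= st->pending;
  st->pending = 0;
}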
Update on testing: after 6.5 hours, a middle relay configured with 80 kbyte/sec average, 120 kbyte/sec burst has the following heartbeat log entry:
Jun 13 05:20:35.000 [notice] Heartbeat: Tor's uptime is 6:30 hours, with 7 circuits open. I've sent 435.56 MB and received 431.80 MB.
The destroy statistics are:
andrea@tor-dev:~/tor-test/bug7912 [39]$ grep -Hne 'notify_xmit_destroy' debug.log | wc -l
681
andrea@tor-dev:~/tor-test/bug7912 [40]$ grep -Hne 'notify_xmit_destroy' debug.log | tail
debug.log:8219257:Jun 13 05:19:04.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0xee50e0 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8219663:Jun 13 05:19:10.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x1816c50 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8219896:Jun 13 05:19:20.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x18313c0 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8220856:Jun 13 05:20:09.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0xfe3880 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8221071:Jun 13 05:20:25.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x198b590 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8221100:Jun 13 05:20:26.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x197d450 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8221874:Jun 13 05:21:45.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x1892230 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8222328:Jun 13 05:22:06.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x1890750 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8222848:Jun 13 05:22:24.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x1a05ef0 sent a destroy, cmux counter is now 0, global counter is now 0
debug.log:8223105:Jun 13 05:22:28.000 [debug] circuitmux_notify_xmit_destroy(): Cmux at 0x1a10f50 sent a destroy, cmux counter is now 0, global counter is now 0
14:07 < athena> nickm: #7912 looks solid enough for 0.2.4 to me if you're okay with it
14:08 < nickm> I'm feeling wary TBH.
14:08 < nickm> This feels like one of those things where we merge it, and something isn't quite right, and we tinker and tinker and delay the release.
14:08 < nickm> I mean, I'm not sure that will happen. But I'd give it a 20-30% chance.
14:08 < nickm> So if you're okay trying it in 0.2.5 with a possibility for an 0.2.4 backport, I think that's how i'd lean.
14:08 < athena> okay
I have done a rebase -i --autosquash in branch "bug7912_squashed", and added a changes file.
Given the complexity of the patches here, I'm inclined to stick with Nick's "I'm wary" statement above -- especially since we can't delay the release with bugs here, since the release is already out, meaning we'd just be screwing stable users with the tinkering.