This is for the upload-to-server variant of the sniper attack. For each exit stream at the exit relay, track a LastReadTimestamp: the last time the destination read from that stream. Then, while the OOM killer is checking all circuits for the one with the oldest cell, have it also consider the exit streams' LastReadTimestamps and kill the oldest circuit/stream accordingly.
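A rough sketch of that bookkeeping (the names and types here are illustrative, not Tor's actual code):

```c
#include <time.h>

/* Hypothetical per-stream state: remember the last time the
 * destination drained bytes from this exit stream. */
typedef struct exit_stream_t {
  time_t last_read_timestamp;
  struct exit_stream_t *next;
} exit_stream_t;

/* Called whenever the destination reads from the stream. */
static void
stream_note_read(exit_stream_t *stream)
{
  stream->last_read_timestamp = time(NULL);
}

/* While scanning circuits for the oldest queued cell, also consider
 * the streams: return the oldest last-read time among them. */
static time_t
oldest_stream_read_time(const exit_stream_t *streams)
{
  time_t oldest = time(NULL);
  for (; streams; streams = streams->next) {
    if (streams->last_read_timestamp < oldest)
      oldest = streams->last_read_timestamp;
  }
  return oldest;
}
```

The OOM killer could then compare oldest_stream_read_time() against the age of the oldest queued cell when choosing a victim.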
"How long has the oldest cell on the circuit been there" (what we did for legacy/trac#9093 (moved)) is not quite the same thing as "when did this stream last successfully write" (what I think you're proposing above. The latter check is trivially defeated by a slow but steady drain, right?
Yeah, you're right, I wasn't very clear. I was trying to suggest something similar to a timestamp per cell, like we have with circuit queues. How about we keep a timestamp queue on the streams: we append a timestamp for every N bytes written from the circuit to the edge buffer, and pop the head of the timestamp queue after we flush N bytes. Will this let us track how long every N-byte chunk has been waiting in the buffer? A rough sketch of the bookkeeping follows.
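A minimal sketch of that timestamp queue, assuming a fixed chunk size N (CHUNK_BYTES below) and hypothetical names throughout:

```c
#include <stdlib.h>
#include <time.h>

#define CHUNK_BYTES 4096  /* illustrative value of "N" */

typedef struct ts_node_t {
  time_t when;
  struct ts_node_t *next;
} ts_node_t;

typedef struct ts_queue_t {
  ts_node_t *head, *tail;
  size_t unstamped_bytes;  /* bytes written since the last timestamp */
  size_t unpopped_bytes;   /* bytes flushed since the last pop */
} ts_queue_t;

/* Call when <b>n</b> bytes move from the circuit to the edge buffer:
 * append one timestamp per CHUNK_BYTES written. */
static void
ts_queue_note_written(ts_queue_t *q, size_t n)
{
  q->unstamped_bytes += n;
  while (q->unstamped_bytes >= CHUNK_BYTES) {
    ts_node_t *node = calloc(1, sizeof(*node));
    if (!node)
      return;  /* out of memory: skip the bookkeeping */
    node->when = time(NULL);
    if (q->tail)
      q->tail->next = node;
    else
      q->head = node;
    q->tail = node;
    q->unstamped_bytes -= CHUNK_BYTES;
  }
}

/* Call when <b>n</b> bytes are flushed from the edge buffer: pop one
 * timestamp per CHUNK_BYTES flushed. */
static void
ts_queue_note_flushed(ts_queue_t *q, size_t n)
{
  q->unpopped_bytes += n;
  while (q->unpopped_bytes >= CHUNK_BYTES && q->head) {
    ts_node_t *old = q->head;
    q->head = old->next;
    if (!q->head)
      q->tail = NULL;
    free(old);
    q->unpopped_bytes -= CHUNK_BYTES;
  }
}

/* How long (in seconds) has the oldest full chunk been waiting? */
static time_t
ts_queue_oldest_age(const ts_queue_t *q)
{
  return q->head ? time(NULL) - q->head->when : 0;
}
```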
I've forward-ported to bug10169_024 and bug10169_025; I'm writing unit tests for the latter, since the unit testing framework in 0.2.5 is what I need here. I'm doing a branch that combines unit tests and fixes as bug10169_025_tmp; I'll split them up into separate commits, cherry-picking the bugfixes to bug10169_023 and the tests to bug10169_025.
total_bytes_allocated_in_chunks still has a DOCDOC in buffers.c
I wonder if END_CIRC_REASON_RESOURCELIMIT can be used to perform a modified sniper-based oracle attack if there's another way to fill the remaining 10% when the 90% memory usage is reached.
It might be useful to the relay operator if Tor says how many circuits remained alive after circuits_handle_oom() runs, e.g. by modifying the log notice at the end of that function, as in the sketch below.
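For illustration only, the expanded notice might look something like this; the helper and its names are made up for the example, and the real change would just extend the existing log_notice() call:

```c
#include <stdio.h>

/* Hypothetical shape of the expanded OOM summary notice. */
static void
log_oom_summary(unsigned long long bytes_removed, int n_killed,
                int n_alive)
{
  fprintf(stderr, "Removed %llu bytes by killing %d circuits; "
          "%d circuits remain alive.\n",
          bytes_removed, n_killed, n_alive);
}
```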
Maybe the comments for circuit_get_streams_max_data_age and marked_circuit_streams_free_bytes should note that they are helpers for circuit_max_queued_data_age and marked_circuit_free_stream_bytes, respectively?
Is it possible that when the stream buffers are "aggressively freed" using chunk_free_unchecked(), they may not actually be freed but instead prepended to a freelist, so that less memory is actually freed in circuits_handle_oom() than expected? (See the sketch below.)
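A minimal sketch of the concern, with hypothetical names (this is not Tor's actual buffers.c code):

```c
#include <stdlib.h>

typedef struct chunk_t {
  struct chunk_t *next;
  char data[4096];
} chunk_t;

static chunk_t *freelist = NULL;

/* "Freeing" here just recycles the chunk onto a freelist: the
 * process's memory footprint is unchanged, so an OOM handler that
 * counts these bytes as reclaimed would overestimate how much
 * memory it actually got back. */
static void
chunk_free_unchecked_sketch(chunk_t *chunk)
{
  chunk->next = freelist;
  freelist = chunk;
}
```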
> total_bytes_allocated_in_chunks still has a DOCDOC in buffers.c
Will fix later in 0.2.5.
> I wonder if END_CIRC_REASON_RESOURCELIMIT can be used to perform a modified sniper-based oracle attack if there's another way to fill the remaining 10% when the 90% memory usage is reached.
The idea being: cause a node to go nearly OOM, and then see which streams (as a client!) got END_STREAM_REASON_RESOURCELIMIT, so you know that you're nearly at the OOM point, and then somehow make the node consume another 0.1 * MaxMem?
If that's what you meant, it would work, but I think it only means that MaxMem needs to be set conservatively, and we need to be on the lookout for other ways to pump up a node's memory consumption. Even if we didn't send END_STREAM_REASON_RESOURCELIMIT, an attacker could still snipe a node if they know a way to make it run out of memory without its buffers and cell queues exceeding 0.9 * MaxMem.
I don't like the name buf_get_oldest_chunk_timestamp() for a function that returns an age rather than a timestamp.
Okay, let's change that in the 0.2.5 version.
Can anything horrible happen with all this if the clock gets reset?
We could kill the wrong circuits if the clock goes backwards and then doesn't catch up with itself before we hit an OOM.
Perhaps it would be wise to use clock_gettime(CLOCK_MONOTONIC, ...) where available if we aren't doing so already.
Can that be an 0.2.5-only thing? Doing a portable monotonic timer is a bit tricksy. On Linux, you want clock_gettime(CLOCK_MONOTONIC_COARSE). On OSX, you want mach_absolute_time(). On other Unix, you want clock_gettime(CLOCK_MONOTONIC) if possible. On Windows, there's a complicated mishmash of things using QueryPerformanceCounter(), GetTickCount64(), and GetTickCount(). As a fallback, you can use gettimeofday() and check the result to make sure it doesn't go backwards.
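A rough sketch of that dispatch, simplified: error handling and careful unit conversion are elided, the Windows branch is reduced to GetTickCount64() alone, and the function name is made up:

```c
#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>
static uint64_t
monotime_msec(void)
{
  return (uint64_t)GetTickCount64();
}
#elif defined(__APPLE__)
#include <mach/mach_time.h>
static uint64_t
monotime_msec(void)
{
  static mach_timebase_info_data_t tb;
  if (!tb.denom)
    mach_timebase_info(&tb);
  /* ticks -> nanoseconds -> milliseconds (overflow ignored here) */
  return mach_absolute_time() * tb.numer / tb.denom / 1000000;
}
#elif defined(__linux__) || defined(__unix__)
#include <time.h>
static uint64_t
monotime_msec(void)
{
#ifdef CLOCK_MONOTONIC_COARSE
  const clockid_t id = CLOCK_MONOTONIC_COARSE;  /* cheaper on Linux */
#else
  const clockid_t id = CLOCK_MONOTONIC;
#endif
  struct timespec ts;
  clock_gettime(id, &ts);
  return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}
#else
#include <stddef.h>
#include <sys/time.h>
static uint64_t
monotime_msec(void)
{
  /* Fallback: gettimeofday(), latched so it can't go backwards. */
  static uint64_t last = 0;
  struct timeval tv;
  uint64_t now;
  gettimeofday(&tv, NULL);
  now = (uint64_t)tv.tv_sec * 1000 + tv.tv_usec / 1000;
  if (now < last)
    now = last;
  else
    last = now;
  return now;
}
#endif
```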
I guess we could do just the fallback check-and-latch in 0.2.3/0.2.4, and aim for the more complex ones in 0.2.5 or later?
I've added 833d0277 to bug10169_023; if you like it, I'll merge it forwards. It implements a trivial latch to make sure that our time can't go backwards there.
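For reference, a minimal check-and-latch looks something like this (a sketch in the spirit of that commit, not its actual code):

```c
#include <time.h>

/* Return the current time, but never a value earlier than one we
 * have already returned, even if the system clock jumps backwards. */
static time_t
latched_time_now(void)
{
  static time_t latest = 0;
  time_t now = time(NULL);
  if (now < latest)
    now = latest;   /* clock went backwards: hold the latch */
  else
    latest = now;
  return now;
}
```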
Andrea said this looked good to merge into 0.2.5. I've updated the branches "bug10169_024" and "bug10169_025_v2", and merged the latter.
FYI, I have not forgotten about testing this defense. I've been trying to test bug10169_025_v2 in Shadow for the last week. I've been running into bugs that appeared after updating the version of Tor I'm using in Shadow. Stay tuned.
After merge hell, I finally got this working. I set MaxMemInQueues to 50MB, which automatically gets pushed up to the minimum of 256MB. I've attached a graph showing that circuits_handle_oom() did not appear to be triggered.
Also, what are you testing that gets you "merge hell"? I'd suggest that you just test master, or 0.2.5.3-alpha.
This was mostly due to the fact that I implemented some necessary client pieces for the attack on 0.2.3.25, and made the mistake of optimistically assuming everything would work OOTB instead of starting my testing on a minimal network. So yes, it's my own fault.
I tried this out on my small 10-node test network in Shadow, where all relays have ample 10 MiB/s connections. I merged both my sniper attack code and nickm's bug10169_025_v2 with tor-0.2.5.2-alpha. Then I tested the sniper attack using 1 team of 10 circuits (1 client instance to use a ping circuit to measure rtt, 1 client instance to launch 9 sniper circuits). I tested the attack without nickm's defense, and with nickm's defense using MaxMemInQueues 50 MB (which automatically gets adjusted up to 256MB). Then I ran a second test with 2 teams of 10 circuits.
The results are in the attack graph. Both the graph and the log file indicate that the sniper's circuits were successfully killed after memory exceeded the 256MB limit.
I'm not exactly sure why the defense was not being triggered before, but looking back at my config, I may have been using a MaxMemInQueues of 500 MB (which would have been too large to trigger the OOM killer).