In legacy/trac#15463 (moved), we're seeing an effective denial of service against a HS with a flood of introductions. The service falls apart trying to build rendezvous circuits, resulting in 100% CPU usage, many failed circuits, and impact on the guard.
We should consider dropping INTRODUCE2 cells when the HS is under too much load to build rendezvous circuits successfully. It's much better if the HS response in this situation is predictable, instead of hammering at the guard until something falls down.
One option is to add a HSMaxConnectionRate(?) option defining the number of INTRODUCE2 we would accept per 10(?) minutes, maybe with some bursting behavior. It's unclear what a useful default value would be.
We could try to use a heuristic based on when rend circuits start failing, but it's not obvious to me how that would work.
We should probably queue INTRODUCE2 cells and act on them as best we can. If the queue grows too big (i.e., we are under DoS), we should drop enough cells that we (and our guard) can handle the load.
This seems like queuing theory stuff, and specifically active queue management. Yawning suggested looking into algorithms like Stochastic Fair Blue and CoDel.
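To make the active queue management idea concrete, here is a rough, hypothetical C sketch of a CoDel-style drop decision for the INTRODUCE2 queue. The struct, field names, and the 5 ms / 100 ms constants are illustrative only (they are CoDel's usual defaults, not values tuned for this use case), and real CoDel also adjusts how aggressively it drops once it enters the dropping state, which is omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: CoDel's usual target delay and interval. */
#define INTRO2_CODEL_TARGET_MSEC   5    /* acceptable queueing delay */
#define INTRO2_CODEL_INTERVAL_MSEC 100  /* how long delay must persist */

/* Hypothetical per-service queue state; not an existing tor structure. */
typedef struct intro2_queue_state_t {
  uint64_t first_above_time_msec; /* 0 = delay is currently below target */
} intro2_queue_state_t;

/* Decide whether to drop the INTRODUCE2 cell at the head of the queue.
 * `enqueued_at_msec` is when that cell was queued; `now_msec` is the
 * current monotonic time. */
static bool
intro2_queue_should_drop(intro2_queue_state_t *q,
                         uint64_t enqueued_at_msec, uint64_t now_msec)
{
  uint64_t sojourn_msec = now_msec - enqueued_at_msec;

  if (sojourn_msec < INTRO2_CODEL_TARGET_MSEC) {
    /* Queueing delay is acceptable again: leave the dropping state. */
    q->first_above_time_msec = 0;
    return false;
  }
  if (q->first_above_time_msec == 0) {
    /* Delay just crossed the target: start the grace interval. */
    q->first_above_time_msec = now_msec;
    return false;
  }
  /* Drop only once the delay has stayed above target for a full interval. */
  return now_msec - q->first_above_time_msec >= INTRO2_CODEL_INTERVAL_MSEC;
}
```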
I'd actually like to see some exploration of initial throttling, dropping, or queueing at the intro point as well. That was originally meant to be the first line of defense here.
(In a related design, the hs might consider which intro points the intro2 cells are arriving from, and if they're all arriving from one intro point, take that into account.)
It is impossible that we will fix all 226 currently open 0.2.8 tickets before 0.2.8 releases. Time to move some out. This is my second pass through the "new" tickets, looking for things to move to 0.2.9.
Trac: Milestone: Tor: 0.2.8.x-final to Tor: 0.2.9.x-final
Remove the SponsorU status from these items, which we already decided to defer from 0.2.9. Add the SponsorU-deferred tag instead in case we ever want to remember which ones these were.
> I'd actually like to see some exploration of initial throttling, dropping, or queueing at the intro point as well. That was originally meant to be the first line of defense here.
Here's my concrete proposal on this one: the intro point should see if the package window for the intro circuit is empty, and if so, it should nack the intro1 cell. That way there are at most 1000 intro2 cells in flight at once from that intro point.
This design is reasonable because it takes a long while for an onion service to process 1000 intro2 cells, so if we queue later ones and send them 'eventually', they're going to arrive much later, and the client will likely have timed out and moved on from that rendezvous point. So we're not harming legitimate clients who end up in this situation, because the current behavior is already harming them plenty.
The benefits are that (a) the onion service doesn't receive the excess intro2 cells that it wasn't going to be able to rendezvous with anyway, (b) clients get a much faster feedback that things aren't going to work so they can move to another intro point, and (c) when a DoS stops, the pain stops soon after: there isn't a huge queue of waiting intro2 cells that have to slowly drain, for no value.
We could imagine an extension on this idea, where the intro point silently drops the excess intro1 cells, rather than explicitly nacking them. This variant will force the client to time out rather than immediately try the next intro point, thus slowing down attacks by clients that follow the protocol. (Modified clients could still use a smaller timeout, or not even care whether they get a response.)
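A minimal sketch of that decision at the intro point, assuming it can look at the service-side circuit's package window (the circwindow starts at 1000 cells); the struct, enum, and function below are illustrative stand-ins rather than existing tor code.

```c
#include <stdbool.h>

/* Minimal stand-in for the intro point's view of the circuit to the
 * service; tor's real circuit state is much richer. */
typedef struct service_circ_t {
  int package_window; /* cells we may still send toward the service */
} service_circ_t;

typedef enum {
  INTRO1_FORWARD,     /* relay it to the service as an INTRODUCE2 */
  INTRO1_NACK,        /* answer the client with a failure ack right away */
  INTRO1_SILENT_DROP, /* say nothing and let the client time out */
} intro1_action_t;

/* Decide what to do with an incoming INTRODUCE1.  `prefer_silent_drop`
 * selects the variant that withholds the nack to slow down
 * protocol-following clients. */
static intro1_action_t
intro1_decide(const service_circ_t *circ, bool prefer_silent_drop)
{
  if (circ->package_window <= 0) {
    /* A full window of INTRODUCE2 cells is already in flight toward the
     * service; anything more would only sit in a queue. */
    return prefer_silent_drop ? INTRO1_SILENT_DROP : INTRO1_NACK;
  }
  return INTRO1_FORWARD;
}
```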
Another idea I was considering here, but ultimately abandoned as more complex than we need, was to somehow timestamp the intro1 cell when it gets received at the intro point, which would allow the onion service to examine how many seconds have passed and discard it if it was received more than n seconds ago. That would essentially mean that we have n seconds of valid intro2 cells in flight, rather than at-most-n circwindows of intro2 cells in flight. This approach would handle congestion that happens inside the network (between the intro point and the service), in that if it takes a long time for the intro2 cell to make it from the intro point to the onion service, it's less likely that the client is still around and waiting for the connect-back.
But how exactly to do the timestamp, and how and whether we would need to synchronize clocks, made this too clunky an idea.
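For what it's worth, the discard check itself would have been small; the threshold and the stamped-time argument below are hypothetical, and the clock-synchronization problem that made the idea clunky is not addressed.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTRO2_MAX_AGE_SEC 30 /* hypothetical "n seconds" threshold */

/* Should the service discard this cell?  `stamped_at_sec` is the
 * (hypothetical) time the intro point says it received the intro1 cell. */
static bool
intro2_is_too_old(uint64_t stamped_at_sec, uint64_t now_sec)
{
  return now_sec > stamped_at_sec &&
         now_sec - stamped_at_sec > INTRO2_MAX_AGE_SEC;
}
```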
Because legacy/trac#30440 (moved) won't be a mature thing in the network for many years to come, we can only use the "package_window" proposal once it is.
So until then, we'll use a token bucket system, add knobs in the consensus (like the dos.c subsystem), and go on from there. Not sure yet how we will come up with the values, but they need to be large enough that they don't affect a legitimately busy HS.
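Roughly, the per-service check could look like the sketch below. The rate and burst constants are placeholders for whatever consensus parameters get chosen, and a real patch would presumably reuse tor's existing token-bucket helpers rather than hand-rolling one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder defaults; in practice these would come from consensus
 * parameters so they can be tuned network-wide, like the dos.c knobs. */
#define INTRO2_RATE_PER_SEC 25   /* sustained INTRODUCE2 cells per second */
#define INTRO2_BURST        200  /* short-term burst allowance */

typedef struct intro2_bucket_t {
  uint32_t tokens;          /* tokens currently available */
  uint64_t last_refill_sec; /* when we last added tokens */
} intro2_bucket_t;

/* Refill at the configured rate (capped at the burst), then spend one
 * token per INTRODUCE2.  Returns false when the cell should be dropped. */
static bool
intro2_bucket_allow(intro2_bucket_t *b, uint64_t now_sec)
{
  if (now_sec > b->last_refill_sec) {
    uint64_t refill = (now_sec - b->last_refill_sec) * INTRO2_RATE_PER_SEC;
    uint64_t total = b->tokens + refill;
    b->tokens = (uint32_t)(total > INTRO2_BURST ? INTRO2_BURST : total);
    b->last_refill_sec = now_sec;
  }
  if (b->tokens == 0)
    return false;
  b->tokens--;
  return true;
}
```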
I did a review on the code without considering the higher-level design here. I will think more about the numbers and such and reply to the tor-dev thread today or tomorrow.
Thanks for the updates David! Only a single nit remains on the GH (and maybe also open the tokenbucket ticket so that we don't forget?).
As a further thing: I lost track of the experimental results of this ticket when I went to AllHands. I now don't remember exactly how this ticket affects (a) the health of the network and (b) the availability of the service. Any chance you could update us on these two things on the tor-dev mailing list? I think it would be great to have this documented so that we know exactly what we are doing by merging this patch.
Marking as needs_revision for these last bits of action.
> As a further thing: I lost track of the experimental results of this ticket when I went to AllHands. I now don't remember exactly how this ticket affects (a) the health of the network and (b) the availability of the service. Any chance you could update us on these two things on the tor-dev mailing list? I think it would be great to have this documented so that we know exactly what we are doing by merging this patch.
Yes I can do this!
With legacy/trac#30924 (moved), we'll have a more complete feature, and at that point we should probably write a blog post about this entire new defense and how to leverage it.