
Closed (moved)
Opened Dec 22, 2017 by Roger Dingledine (@arma)

Try cranking up cbttestfreq consensus param, to see if it helps the current overload

In Tor 0.3.1.1-alpha, commit d5a151a, we switched:

-#define CBT_DEFAULT_TEST_FREQUENCY 60
+#define CBT_DEFAULT_TEST_FREQUENCY 10

And on May 20, 2017, the dir auths set the cbttestfreq consensus param to 10 as well.

Right now the network is overloaded with create cells, from the millions of new clients that showed up in the past weeks.

Hypothesis 1: most of these clients are in learning mode much of the time, so 5 million clients / 10 seconds between test circuits = 500k new create requests per second launched at the network, which contributes to the overload.

Hypothesis 2: some of these clients have learned quite low timeouts, causing them to generate many circuits which they then almost immediately cancel, but not enough of their circuits fail that they back away from their learned value.

Hypothesis 3: the clients are stuck in a sad loop where they learn a low cbt value, generate circuits for a while that mostly time out, eventually give up on their cbt value, then generate a circuit every 10s until they re-learn a low cbt value, and then repeat the cycle.

The experiment here (set cbttestfreq to 600 seconds temporarily) should help us test these hypotheses. For 1, we will immediately reduce the load of new circuits. For 2, this will help more slowly, because we'll have to wait for each client to hit a situation where 90%+ of its circuit attempts are being timed out, but in theory clients will slowly shift from having a too-aggressive cbt, back into learning mode. And for 3, we'll push most clients to the "learning, but very slowly" phase of their sad loop.
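To make the Hypothesis 1 arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the worst case, where every one of the 5 million clients is in learning mode the whole time and launches exactly one test circuit per cbttestfreq interval. The function name is made up for illustration and is not anything in the Tor codebase.

```python
def learning_mode_circuit_rate(num_clients: int, test_freq_seconds: float) -> float:
    """Aggregate test circuits launched per second, assuming every client
    is in learning mode and launches one circuit per test interval.
    This is an upper bound, not a measurement."""
    if test_freq_seconds <= 0:
        raise ValueError("test_freq_seconds must be positive")
    return num_clients / test_freq_seconds


# Client count taken from Hypothesis 1; the two intervals are the
# current cbttestfreq (10) and the proposed experimental value (600).
CLIENTS = 5_000_000

for freq in (10, 600):
    rate = learning_mode_circuit_rate(CLIENTS, freq)
    print(f"cbttestfreq={freq:>3}s -> about {rate:,.0f} circuits/second")
```

Under these assumptions, moving from 10 to 600 cuts the learning-mode load by a factor of 60. The real reduction depends on what fraction of clients are actually in learning mode at any moment.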

We can use the notice-level heartbeat messages in relay logs to discover whether the total number of create cells goes down dramatically. If it does, win, we confirmed one or more of these hypotheses, and we can make a plan from there. If it doesn't, also win, we know we need to look elsewhere.
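One way a relay operator might pull those numbers out of a log is sketched below. It assumes the heartbeat line has the form "Circuit handshake stats since last time: A/B TAP, C/D NTor." with each pair read as assigned/requested; check that against the Tor version you run. The sample log lines and their numbers are invented purely to exercise the parser and are not real measurements.

```python
import re
from typing import Iterable, List

# Assumed format of the notice-level heartbeat line (verify for your Tor version).
HANDSHAKE_RE = re.compile(
    r"Circuit handshake stats since last time: "
    r"(\d+)/(\d+) TAP, (\d+)/(\d+) NTor"
)


def requested_handshakes(log_lines: Iterable[str]) -> List[int]:
    """Return, per heartbeat line, the total handshakes requested (TAP + NTor).
    Lines that do not match the assumed format are skipped."""
    totals = []
    for line in log_lines:
        match = HANDSHAKE_RE.search(line)
        if match:
            _tap_done, tap_req, _ntor_done, ntor_req = map(int, match.groups())
            totals.append(tap_req + ntor_req)
    return totals


# Invented sample lines, for illustration only.
sample_lines = [
    "Dec 22 06:00:00.000 [notice] Circuit handshake stats since last time: 100/400 TAP, 9000/20000 NTor.",
    "Dec 22 06:00:00.000 [notice] Heartbeat: some unrelated line.",
    "Dec 22 12:00:00.000 [notice] Circuit handshake stats since last time: 90/100 TAP, 4000/4100 NTor.",
]

print(requested_handshakes(sample_lines))
```

Comparing these per-heartbeat totals before and after the consensus param change, across several relays, would show whether the request load actually dropped.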

Milestone: Tor: 0.3.3.x-final
Reference: legacy/trac#24716