Skip to content

GitLab

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
Trac
Trac
  • Project overview
    • Project overview
    • Details
    • Activity
  • Issues 246
    • Issues 246
    • List
    • Boards
    • Labels
    • Service Desk
    • Milestones
  • Operations
    • Operations
    • Metrics
    • Incidents
  • Analytics
    • Analytics
    • Value Stream
  • Wiki
    • Wiki
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Create a new issue
  • Issue Boards

GitLab is used only for code review, issue tracking and project management. Canonical locations for source code are still https://gitweb.torproject.org/ https://git.torproject.org/ and git-rw.torproject.org.

  • Legacy
  • TracTrac
  • Issues
  • #32108

Closed (moved)
Open
Opened Oct 16, 2019 by Roger Dingledine@arma

tor can overrun its accountingmax if it enters soft hibernation first

I'll put the punchline first: second_elapsed_callback(), which is where we check if it's time to go dormant for hibernation, is no longer called when we've entered soft hibernation.

I assume this is because of the new "periodic event flag" feature, where we try to avoid calling callbacks when we're in a state that won't need them. See the "online and active" note here:

  /* This is a legacy catch-all callback that runs once per second if
   * we are online and active. */
  CALLBACK(second_elapsed, NET_PARTICIPANT,
           FL(NEED_NET)|FL(RUN_ON_DISABLE)),

The impact is limited, since we stop accepting new connections and new circuits when we enter soft hibernation, but it can still be bad: existing connections and circuits can last for a long time and use a lot of bandwidth.

A secondary impact is that accounting_run_housekeeping() never gets called, which means that the state file never gets updated after we've entered soft hibernation, which means these bandwidth overspends are never recorded to disk.

I think the bug went in during commit 4bf79fa4f which is part of Tor 0.4.0.1-alpha.

The PERIODIC_EVENT_FLAG_NEED_NET flag (what FL(NEED_NET) expands into) checks net_is_disabled(), but there is another function right after net_is_disabled in netstatus.c called net_is_completely_disabled(), and the only difference is that it checks we_are_fully_hibernating() vs we_are_hibernating().

I confirmed the overall bug happens in practice: there's a relay operator in #tor who hit soft hibernation, and then saw his tor proceed to use more bytes than expected. I had him do a 'gdb attach' to his tor and break on 'second_elapsed_callback' and the function never got called.

It seems like the immediate fix, and best backport plan, would be to resume calling second_elapsed_callback even when net_is_disabled(). The longer term plan can be to audit our calls to net_is_disabled() and net_is_completely_disabled() and we_are_hibernating(), with an eye towards "should we be doing this behavior while soft hibernating", and see what other bugs we find.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Tor: 0.4.0.x-final
Milestone
Tor: 0.4.0.x-final
Assign milestone
Time tracking
None
Due date
None
Reference: legacy/trac#32108