Closed Confidential - TROVE-2022-001: Congestion control RTT injected delay

  • Closed Issue created by David Goulet

    Public ticket that started this: #40624

    This only affects tor >= 0.4.7.2-alpha. I have reserved TROVE-2022-001 for now and set it to HIGH, considering the remote nature of the bug and its consequences on the network.

    What

    It appears that congestion control can enter a state in which a circuit never exits CC slow start. In concrete terms, tor can never get past its "initial congestion window" (set at 2 cells right now), so circuits become extremely slow. For a client, we are talking about a couple of KB/sec.

    In theory, this can be triggered in two ways, one of which can be done remotely:

    1. A clock jump.

    2. A tor withholding a SENDME for a couple of minutes, which also triggers this condition.

    (2) is the very worrying one because anyone can trigger it. A malicious client could do that to an onion service, effectively turning "off" congestion control for that service and sending a pretty huge signal to a Guard/Middle relay.

    But it is likely also possible that a mobile tor client could go dormant just before needing to send a SENDME, come back online much later, and send it then, thus triggering this condition on the endpoint (onion service, relay).

    Another possibility, which is the situation my non-Exit relay ended up in, is for a directory request to stall long enough to cause the problem. Directory authorities are often overwhelmed or heavily throttled or subject to DPI (Faravahar).

    Or a malicious client could upload a descriptor to an HSDir (any relay) and withhold that SENDME, again triggering this problem and rendering the relay almost unusable.

    In a nutshell, the network can come to a grinding halt if we don't fix this; otherwise, we need to disable CC asap.

    How

    The problem lies in time_delta_stalled_or_jumped(), which checks whether the circuit's new RTT is very much out of range compared to the previous one. In that case, it sets is_monotime_clock_broken = true, which is global to tor and therefore affects all circuits. It then returns true, so the circuit RTT is not updated because we believe the clock is no bueno.

    But from that point on, every call to time_delta_stalled_or_jumped() will return true because of the guard if (old_delta == 0), where old_delta is circuit->cc->ewma_rtt_usec. That value starts at 0 for a new circuit and, because is_monotime_clock_broken = true, it will stay 0, so the flag is never able to come back to false.

    This means the circuit never gets to measure its RTT and thus never exits slow start.
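
The feedback loop described above can be illustrated with a small standalone sketch. This is not tor's actual source: the `sketch_` names, the `sketch_cc_t` struct, and the 5000x out-of-range ratio are assumptions made for illustration, and the EWMA smoothing is left out. Only `is_monotime_clock_broken`, `ewma_rtt_usec`, and the `old_delta == 0` guard come from the description in this ticket.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold for "very much out of range"; illustrative only. */
#define SKETCH_RATIO_MAX 5000

/* Global to the process: one flag shared by every circuit. */
static bool is_monotime_clock_broken = false;

/* Hypothetical per-circuit state; ewma_rtt_usec is 0 on a new circuit. */
typedef struct {
  uint64_t ewma_rtt_usec;
} sketch_cc_t;

/* Return true if the RTT sample should be discarded. */
static bool
sketch_time_delta_stalled_or_jumped(uint64_t old_delta, uint64_t new_delta)
{
  /* No previous RTT on this circuit: defer to the global flag. */
  if (old_delta == 0) {
    return is_monotime_clock_broken;
  }
  /* Sample wildly out of range: blame the clock, for all circuits. */
  if (new_delta > old_delta * SKETCH_RATIO_MAX ||
      old_delta > new_delta * SKETCH_RATIO_MAX) {
    is_monotime_clock_broken = true;
    return true;
  }
  /* Sane sample: this is the only path that clears the flag. */
  is_monotime_clock_broken = false;
  return false;
}

/* Return true if the RTT was recorded (EWMA smoothing omitted). */
static bool
sketch_update_rtt(sketch_cc_t *cc, uint64_t new_rtt_usec)
{
  if (sketch_time_delta_stalled_or_jumped(cc->ewma_rtt_usec, new_rtt_usec)) {
    return false; /* RTT left untouched, so a new circuit stays at 0. */
  }
  cc->ewma_rtt_usec = new_rtt_usec;
  return true;
}
```

In this sketch, a circuit with a 20 ms RTT that receives a SENDME delayed by three minutes sets the global flag. Any circuit created afterwards hits the `old_delta == 0` guard on every sample, so its `ewma_rtt_usec` stays at 0 and it cannot reach the branch that would clear the flag.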

    Solution

    Proposed patch by @mikeperry : mikeperry/tor@4bdcfdf6
