Skip to content
GitLab
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • Tor Tor
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
  • Issues 328
    • Issues 328
    • List
    • Boards
    • Service Desk
    • Milestones
  • Merge requests 31
    • Merge requests 31
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Monitor
    • Monitor
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Repository
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • The Tor Project
  • Core
  • TorTor
  • Issues
  • #29698
Closed
Open
Issue created Mar 08, 2019 by pastly@pastly

Edge case that causes improper circuit prioritization for one scheduling run

= The Problem, Shortly

A circuit that goes from very busy for a long time, to 100% idle for a long time, and then needs to send traffic again will be incorrectly deprioritized the first time it gets scheduled.

= The Problem, Illustrated

Consider a circuit that is very very busy for a significant length of time (minutes). There's constant traffic flowing in one (or both, but let's just say one) direction on this circuit, leading to it earning for itself a high cell_count EWMA value (thus a low priority for scheduling). Assume it is the only circuit on its channel.

Now assume it suddenly stops sending traffic but stays open. It stays this way for significant length of time (many 10s of seconds) such that its cell_count EWMA value should be essentially zero, but it hasn't actually been updated yet since this value isn't updated until a cell has been transmitted (see circuitmux_notify_xmit_cells).

At this point in time the relay is still servicing some number of low-traffic circuits on other channels. Maybe it has always been handling these circuits. Whatever. It doesn't matter. What matters is at this point in time there's lots of low-traffic circuits needing scheduling. Because they are low-traffic, these circuits have cell_count EWMA values that are relatively low (thus a high priority for scheduling).

Now what happens when that original high-traffic circuit stops being totally idle? What happens when it wants to send another 1000, 100, or even just 1 cell?

It gets put into KIST channels_pending smart list like any other circuit. In fact there are a bunch of low-bandwidth circuits in there with it. Observe what happens when KIST starts scheduling its collection of pending channels:

KIST loops over and over until its list of pending channels is empty. Each time it gets the channel with the current best-priority circuit, schedules one cell, updates the appropriate cell_count, and puts the channel back in the pending list if necessary.

All those low-traffic circuits will be serviced first because they have low cell_count values (high priority) as compared to the outdated cell_count value for the original high-traffic circuit.

When the circuit finally gets to send its first cell after its long period of inactivity, its cell_count EWMA value is corrected to be near zero. That's fine. But it should have been updated before scheduling decisions were made so that it was the first one to be scheduled.

= A solution

Add a touch function in the circuitmux channel interface that tells the circuitmux and whatever its policy is to update its circuit priorities if desired.

Before entering the main scheduling loop, call this touch function on all the pending channels. In the case of the EWMA policy, the touch function would ultimately drill down to something like

ewma_touch(circuitmux_policy_data_t *pol_data)
{
  ewma_policy_data_t *pol = NULL;
  unsigned int tick;
  double fractional_tick;
  tor_assert(pol_data);
  pol = TO_EWMA_POL_DATA(pol_data);

  /* Rescale the EWMAs if needed */
  tick = cell_ewma_get_current_tick_and_fraction(&fractional_tick);

  if (tick != pol->active_circuit_pqueue_last_recalibrated) {
    scale_active_circuits(pol, tick);
  }
}

(Which you might observe is essentially the first part of ewma_notify_xmit_cells(...)).

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking