Edge case that causes improper circuit prioritization for one scheduling run
= The Problem, Shortly A circuit that goes from very busy for a long time, to 100% idle for a long time, and then needs to send traffic again will be incorrectly deprioritized the first time it gets scheduled. = The Problem, Illustrated Consider a circuit that is very very busy for a significant length of time (minutes). There's constant traffic flowing in one (or both, but let's just say one) direction on this circuit, leading to it earning for itself a high `cell_count` EWMA value (thus a low priority for scheduling). _Assume it is the only circuit on its channel_. Now assume it suddenly stops sending traffic but stays open. It stays this way for significant length of time (many 10s of seconds) such that _its `cell_count` EWMA value should be essentially zero, but it hasn't actually been updated yet_ since this value isn't updated until a cell has been transmitted (see `circuitmux_notify_xmit_cells`). At this point in time the relay is still servicing some number of low-traffic circuits _on other channels_. Maybe it has always been handling these circuits. Whatever. It doesn't matter. What matters is at this point in time there's lots of low-traffic circuits needing scheduling. Because they are low-traffic, these circuits have `cell_count` EWMA values that are relatively low (thus a high priority for scheduling). Now what happens when that original high-traffic circuit stops being totally idle? What happens when it wants to send another 1000, 100, or even just 1 cell? It gets put into KIST `channels_pending` smart list like any other circuit. In fact there are a bunch of low-bandwidth circuits in there with it. Observe what happens when KIST starts scheduling its collection of pending channels: KIST loops over and over until its list of pending channels is empty. Each time it gets the channel with the current best-priority circuit, schedules one cell, _updates the appropriate `cell_count`_, and puts the channel back in the pending list if necessary. **All those low-traffic circuits will be serviced first because they have low `cell_count` values (high priority) as compared to the outdated `cell_count` value for the original high-traffic circuit.** When the circuit finally gets to send its first cell after its long period of inactivity, its `cell_count` EWMA value is corrected to be near zero. That's fine. _But it should have been updated before scheduling decisions were made so that it was the first one to be scheduled_. = A solution Add a `touch` function in the circuitmux channel interface that tells the circuitmux and whatever its policy is to update its circuit priorities if desired. Before entering the main scheduling loop, call this `touch` function on all the pending channels. In the case of the EWMA policy, the `touch` function would ultimately drill down to something like ``` ewma_touch(circuitmux_policy_data_t *pol_data) { ewma_policy_data_t *pol = NULL; unsigned int tick; double fractional_tick; tor_assert(pol_data); pol = TO_EWMA_POL_DATA(pol_data); /* Rescale the EWMAs if needed */ tick = cell_ewma_get_current_tick_and_fraction(&fractional_tick); if (tick != pol->active_circuit_pqueue_last_recalibrated) { scale_active_circuits(pol, tick); } } ``` (Which you might observe is essentially the first part of `ewma_notify_xmit_cells(...)`).
issue