tor-proto: Fix possible bug when no hops are ready

Merged opara requested to merge opara/arti:none-pending into main
2 unresolved threads

What I think can happen is that if `Circuit::ready_streams_iterator` returns an empty `Stream`, `ConfluxSet::circuit_action` will `select_biased!` on this empty stream, and `next()` will cause us to immediately return `None`. I think it would be better to instead wait on the other arms of the `select_biased!` in this case. By immediately returning `None`, my worry is that the tunnel reactor can end up in a spin loop where `ConfluxSet::circuit_action` immediately returns `None`, causing `Reactor::run_once` to finish immediately and `Reactor::run` to spin.

@gabi-250 While I do think that it's better not to return `None`, I'm not sure if there's some other edge case I'm not considering where returning a `std::future::pending()` might be bad.
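For illustration, here is a minimal sketch (not Arti code) of the hazard described above, assuming the `futures` crate, with `ready_streams` and `input` replaced by toy stand-ins: when the stream arm of a `select_biased!` is already exhausted, that arm resolves immediately with `None`, even if another arm is the one we actually want to wait on.

```rust
use futures::{executor::block_on, select_biased, FutureExt, StreamExt};

fn main() {
    block_on(async {
        // An empty stream stands in for "no hop currently has a ready stream".
        let mut ready_streams = futures::stream::iter(Vec::<u32>::new()).fuse();
        // Stand-in for `input.next()`: never ready in this example.
        let mut input = futures::future::pending::<u32>().fuse();

        select_biased! {
            item = ready_streams.next() => {
                // The exhausted stream arm wins immediately with `None`,
                // which is what could drive the reactor into a spin loop.
                assert_eq!(item, None);
            },
            cell = input => {
                // We would rather keep waiting here instead.
                let _ = cell;
            },
        }
    });
}
```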

Activity

```rust
leg.chan_sender.prepare_send_from(async move {
    // A future to wait for the next ready stream.
    let next_ready_stream = async {
        match ready_streams.next().await {
            Some(x) => x,
            None => {
                // There are no ready streams (for example, they may all be
                // blocked due to congestion control), so there is nothing
                // to do.
                // We await an infinitely pending future so that we don't
                // immediately return a `None` in the `select_biased!` below.
                // We'd rather wait on `input.next()` than immediately return with
                // no `CircuitAction`, which could put the reactor into a spin loop.
                let () = std::future::pending().await;
                unreachable!();
            }
```
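For reference, a self-contained sketch of the pattern in the diff above (again assuming the `futures` crate, with `ready_streams` and `input` replaced by toy stand-ins, not the actual reactor types): parking the exhausted-stream arm on `std::future::pending()` lets `select_biased!` fall through to the other arm instead of resolving with nothing to do.

```rust
use futures::{executor::block_on, select_biased, FutureExt, StreamExt};

fn main() {
    block_on(async {
        // An empty stream stands in for "no hop currently has a ready stream".
        let mut ready_streams = futures::stream::iter(Vec::<u32>::new()).fuse();
        // Stand-in for `input.next()`; immediately ready so the example terminates.
        let mut input = futures::future::ready(7u32);

        // Mirror of the `None =>` arm in the diff: if the stream is exhausted,
        // park this arm forever instead of resolving with nothing to do.
        let next_ready_stream = async {
            match ready_streams.next().await {
                Some(x) => x,
                None => {
                    let () = std::future::pending().await;
                    unreachable!();
                }
            }
        };

        let picked = select_biased! {
            x = next_ready_stream.fuse() => x,
            cell = input => cell,
        };

        // The empty-stream arm never resolves, so we fall through to `input`.
        assert_eq!(picked, 7);
    });
}
```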
  • Hmm, initially I was going to suggest moving this to `ready_streams_iterator` (replacing the `filter_map` with a `map`, and returning `pending()` in the `None` case), but that seems strictly worse: I think it'll require boxing the futures to make the expression type-check, and the resulting `Stream` can still be empty (if the list of hops is empty).

  • Afaiu, this blocks the reactor until `self.input.next()` returns a cell.

    And so, let's imagine this reactor has 10 streams: if all of them are blocked, we get `None` and block until `input`, which basically means nothing will happen until we get an inbound cell, which can be arbitrarily long?

    Is the theory here that if we get `None`, it is because of congestion control and that we should get a SENDME at some point that will unblock this whole thing? Aren't there other things that can block all the streams here for which we might not get an inbound cell to unblock them, iiuc?

  • > Afaiu, this blocks the reactor until `self.input.next()` returns a cell.

    By "this", do you mean the MR, or my suggestion to move the `pending()` logic into `ready_streams_iterator`?

    Assuming you're talking about the former:

    > And so, let's imagine this reactor has 10 streams: if all of them are blocked, we get `None` and block until `input`, which basically means nothing will happen until we get an inbound cell, which can be arbitrarily long?

    The reactor won't block completely: it will continue polling the command and control channels (via the `select_biased!` from `run_once()`), and, as you say, it will also continue trying to read from `input` (by polling the future returned by `circuit_action()`).

    > Is the theory here that if we get `None`, it is because of congestion control and that we should get a SENDME at some point that will unblock this whole thing? Aren't there other things that can block all the streams here for which we might not get an inbound cell to unblock them, iiuc?

    It can be either because of congestion control, or because the circuit has no hops. There is nothing else that can cause the `ready_streams` stream to be exhausted (the only time we return `None` in the `filter_map()` from `ready_streams_iterator()` is if `hop.ccontrol.can_send() == false`).

    Thinking more about this, the `std::future::pending().await` looks a bit scary. A potentially nicer way of doing this would be to change `ready_streams_iterator` to map each hop to a future that blocks until the hop is unblocked (receives a SENDME). But that would be more complex to implement.

  • Yeah, for the record, I don't have a good solution here except returning from the reactor main loop onto the next iteration and letting it busy-loop, I guess, which is "meh".

    Down the line, each stream will be bound by flow control (XON/XOFF) with a rate limit, so I expect these situations to happen organically, which is why I'm a bit on the fence about making them "blocked" until control/command/input wakes up...

    I like the Future idea wrapped around the stream for its flow control condition.

  • Author Maintainer

    > Thinking more about this, the `std::future::pending().await` looks a bit scary. A potentially nicer way of doing this would be to change `ready_streams_iterator` to map each hop to a future that blocks until the hop is unblocked (receives a SENDME). But that would be more complex to implement.

    What can cause a hop to become unblocked? My understanding is that currently it's just SENDMEs, but those will arrive on `input`. So even if we made a future for each hop, the futures would never be ready, since the cell will arrive on `input` and the hops would become unblocked outside of this `select!` block.

    > Yeah, for the record, I don't have a good solution here except returning from the reactor main loop onto the next iteration and letting it busy-loop, I guess, which is "meh".

    I don't think we should ever let it busy-loop: the task will never hand control back to the runtime, since it never needs to block, and it will prevent other tasks from running on this thread while it continues spinning.

    But if there's some condition that will unblock a hop, I think we should have a future for it that we can await rather than busy-loop. It's just unclear to me exactly what conditions unblock a hop. Currently there's the SENDME, but what other conditions do we need for congestion control?

    > Down the line, each stream will be bound by flow control (XON/XOFF) with a rate limit, so I expect these situations to happen organically, which is why I'm a bit on the fence about making them "blocked" until control/command/input wakes up...

    I'm not clear on how the congestion control rate limiting works. Is it a "unit per time" rate limit, where we need timers to update some token bucket to unblock a stream? Prop324 mentions a `kbps_ewma`, but never actually explains what it is.

    Edited by opara
  • > What can cause a hop to become unblocked? My understanding is that currently it's just SENDMEs,

    That is my understanding too.

    > but those will arrive on `input`. So even if we made a future for each hop, the futures would never be ready, since the cell will arrive on `input` and the hops would become unblocked outside of this `select!` block.

    (Note: the "future" I mentioned in my comment is the future we already create for each hop using `poll_fn()`.)

    Yeah, good point. My hand-wavy "But that would be more complex to implement" comment was mostly about somehow writing the code to get a notification when this happens, e.g. via a channel (see the rough sketch after this thread), but as you say it's not really doable with the current implementation. So I think the current approach is fine.

  • Author Maintainer

    Okay, I see. Yeah, I'm open to changing this, I just don't see how we can do that with the current structure. So I'll merge it as is for now, and we can change it later when we need to.
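As an aside, the "notify when a hop becomes unblocked" idea discussed above (e.g. via a channel) might look roughly like the following hypothetical sketch. This is not how Arti is structured (the thread notes it isn't doable with the current implementation); the `Hop`, `unblocked_rx`, and `wait_until_sendable` names are invented for illustration, using a `futures` mpsc channel as the notification mechanism.

```rust
use futures::{channel::mpsc, executor::block_on, StreamExt};

// Hypothetical per-hop state; not a type from tor-proto.
struct Hop {
    can_send: bool,
    // The SENDME-processing path (which runs elsewhere, driven by `input`)
    // would push a `()` here when the hop becomes unblocked.
    unblocked_rx: mpsc::UnboundedReceiver<()>,
}

impl Hop {
    // A future that resolves only once this hop is allowed to send again.
    async fn wait_until_sendable(&mut self) {
        if self.can_send {
            return;
        }
        // Park until the unblock notification arrives.
        let _ = self.unblocked_rx.next().await;
        self.can_send = true;
    }
}

fn main() {
    block_on(async {
        let (tx, rx) = mpsc::unbounded();
        let mut hop = Hop { can_send: false, unblocked_rx: rx };

        // Simulate the SENDME arriving on the input-handling path.
        tx.unbounded_send(()).unwrap();

        hop.wait_until_sendable().await;
        assert!(hop.can_send);
    });
}
```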

  • gabi-250 left review comments

  • David Goulet left review comments

  • opara added 49 commits

    • 56d790da...0f13b7a0 - 47 commits from branch tpo/core:main
    • 888a4024 - tor-proto: fix possible bug when there are no ready streams
    • 152b8714 - tor-proto: remove unnecessary `Option` from `circuit_action`

  • opara enabled an automatic merge when all merge checks for 152b8714 pass

  • opara mentioned in commit 5acdeea6

  • merged
