KIST: Bug: scheduler_release_channel: Non-fatal assertion !(smartlist_pos(channels_pending, chan) == -1)
Because of commit cb5654f300312a8f
and adaf3e9b89f62d68
merged last night, we now see a new bug (benign as far as I can tell).
Here is the SCHED_BUG()
on one of my test relay:
Feb 02 12:56:21.930 [warn] tor_bug_occurred_(): Bug: src/or/scheduler.c:648: scheduler_release_channel: Non-fatal assertion !(smartlist_pos(channels_pending, chan) == -1) failed. (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.930 [warn] Bug: Non-fatal assertion !(smartlist_pos(channels_pending, chan) == -1) failed in scheduler_release_channel at src/or/scheduler.c:648. Stack trace: (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.930 [warn] Bug: /root/git/tor/src/or/tor(log_backtrace+0x42) [0x5579b56bb582] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.930 [warn] Bug: /root/git/tor/src/or/tor(tor_bug_occurred_+0xb9) [0x5579b56d6919] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(scheduler_release_channel+0x1aa) [0x5579b55edb0a] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(+0x10bd94) [0x5579b563fd94] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(connection_handle_write+0x2e) [0x5579b564067e] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(+0xbb229) [0x5579b55ef229] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(+0xb8c8f) [0x5579b55ecc8f] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x819) [0x7fcbd0fe94c9] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(do_main_loop+0x25c) [0x5579b5587f0c] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(tor_run_main+0x275) [0x5579b5589475] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(tor_main+0x3a) [0x5579b5582b3a] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(main+0x19) [0x5579b55828a9] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fcbcff01830] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug: /root/git/tor/src/or/tor(_start+0x29) [0x5579b55828f9] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
It appears that we can release a channel from within connection_handle_write()
. At that point, the channel was popped from the pending list and within the scheduler loop, we are writing the channel connection's outbuf onto the wire so when it is release, this bug is triggered in scheduler_release_channel()
:
if (chan->scheduler_state == SCHED_CHAN_PENDING) {
if (SCHED_BUG(smartlist_pos(channels_pending, chan) == -1, chan)) {
log_warn(LD_SCHED, "Scheduler asked to release channel %" PRIu64 " "
"but it wasn't in channels_pending",
chan->global_identifier);
This doesn't affect tor in any ways and the current codeflow of channels suggest that the release should accept that the channel is not in the pending list even if the scheduler state is in PENDING.
The reason is that while in the scheduler loop, once the channel is popped from the pending list, we have to keep it in PENDING state (see legacy/trac#24700 (moved)) until the scheduler is done with it and decides to put it back in the list or change its state. This is because a channel can be rescheduled in while it is in the scheduler loop which is not ideal for KIST but this is the way it is for now.
We have to do a quick fix for 032 and 033 but this should be resolved in 034 with legacy/trac#24554 (moved).