Skip to content

GitLab

  • Menu
Projects Groups Snippets
    • Loading...
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in
  • Trac Trac
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Issues 246
    • Issues 246
    • List
    • Boards
    • Service Desk
    • Milestones
  • Monitor
    • Monitor
    • Metrics
    • Incidents
  • Analytics
    • Analytics
    • Value stream
  • Wiki
    • Wiki
  • Activity
  • Create a new issue
  • Issue Boards
Collapse sidebar
  • Legacy
  • TracTrac
  • Issues
  • #25125

Closed (moved)
(moved)
Open
Created Feb 02, 2018 by David Goulet@dgoulet🆘

KIST: Bug: scheduler_release_channel: Non-fatal assertion !(smartlist_pos(channels_pending, chan) == -1)

Because of commit cb5654f300312a8f and adaf3e9b89f62d68 merged last night, we now see a new bug (benign as far as I can tell).

Here is the SCHED_BUG() on one of my test relay:

Feb 02 12:56:21.930 [warn] tor_bug_occurred_(): Bug: src/or/scheduler.c:648: scheduler_release_channel: Non-fatal assertion !(smartlist_pos(channels_pending, chan) == -1) failed. (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.930 [warn] Bug: Non-fatal assertion !(smartlist_pos(channels_pending, chan) == -1) failed in scheduler_release_channel at src/or/scheduler.c:648. Stack trace: (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.930 [warn] Bug:     /root/git/tor/src/or/tor(log_backtrace+0x42) [0x5579b56bb582] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.930 [warn] Bug:     /root/git/tor/src/or/tor(tor_bug_occurred_+0xb9) [0x5579b56d6919] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(scheduler_release_channel+0x1aa) [0x5579b55edb0a] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(+0x10bd94) [0x5579b563fd94] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(connection_handle_write+0x2e) [0x5579b564067e] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(+0xbb229) [0x5579b55ef229] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(+0xb8c8f) [0x5579b55ecc8f] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x819) [0x7fcbd0fe94c9] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(do_main_loop+0x25c) [0x5579b5587f0c] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(tor_run_main+0x275) [0x5579b5589475] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(tor_main+0x3a) [0x5579b5582b3a] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(main+0x19) [0x5579b55828a9] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fcbcff01830] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)
Feb 02 12:56:21.931 [warn] Bug:     /root/git/tor/src/or/tor(_start+0x29) [0x5579b55828f9] (on Tor 0.3.3.1-alpha-dev 0fd91772b1ece9e0)

It appears that we can release a channel from within connection_handle_write(). At that point, the channel was popped from the pending list and within the scheduler loop, we are writing the channel connection's outbuf onto the wire so when it is release, this bug is triggered in scheduler_release_channel():

  if (chan->scheduler_state == SCHED_CHAN_PENDING) {
    if (SCHED_BUG(smartlist_pos(channels_pending, chan) == -1, chan)) {
      log_warn(LD_SCHED, "Scheduler asked to release channel %" PRIu64 " "
                         "but it wasn't in channels_pending",
               chan->global_identifier);

This doesn't affect tor in any ways and the current codeflow of channels suggest that the release should accept that the channel is not in the pending list even if the scheduler state is in PENDING.

The reason is that while in the scheduler loop, once the channel is popped from the pending list, we have to keep it in PENDING state (see #24700 (moved)) until the scheduler is done with it and decides to put it back in the list or change its state. This is because a channel can be rescheduled in while it is in the scheduler loop which is not ideal for KIST but this is the way it is for now.

We have to do a quick fix for 032 and 033 but this should be resolved in 034 with #24554 (moved).

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Assignee
Assign to
Time tracking