
Fix continually_expire_channels timing issues

Jim Newsome requested to merge jnewsome/arti:changmgr-timing into main

Fixes two timing bugs in continually_expire_channels.

The first is moderately impactful: in some cases it caused busy-looping for up to 1s, which can noticeably affect CPU usage, battery life, etc.
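This MR does not spell out what caused the busy loop. The "up to 1s" bound is consistent with one common pattern, sketched below purely as a hypothetical illustration (not arti's actual code): the time until the next channel expiry is rounded down to whole seconds, so any remaining interval under 1s becomes a zero-length sleep, and the loop wakes immediately and spins until the deadline passes. The function names here are invented for the example.

```rust
use std::time::Duration;

/// Hypothetical buggy version: compute how long to sleep before the next
/// expiry check with whole-second granularity. If the next expiry is less
/// than 1s away, `as_secs()` truncates to 0, so the caller sleeps for zero
/// time and immediately re-checks, busy-looping until the deadline.
fn next_wakeup_truncating(remaining: Duration) -> Duration {
    Duration::from_secs(remaining.as_secs())
}

/// Hypothetical fixed version: keep full precision, so a single sleep
/// covers the whole remaining interval.
fn next_wakeup_precise(remaining: Duration) -> Duration {
    remaining
}

fn main() {
    // Next channel expires in 750ms.
    let remaining = Duration::from_millis(750);

    // The truncating version asks for a zero-length sleep: a busy loop.
    assert_eq!(next_wakeup_truncating(remaining), Duration::ZERO);

    // The precise version sleeps for the full 750ms.
    assert_eq!(next_wakeup_precise(remaining), Duration::from_millis(750));

    // Once 1s or more remains, truncation loses only the sub-second part.
    assert_eq!(
        next_wakeup_truncating(Duration::from_millis(1_500)),
        Duration::from_secs(1)
    );
}
```

With truncation, the worst case is just under 1s of spinning, which matches the bound described above; keeping the full `Duration` avoids it.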

The second is fairly benign. In typical environments it would have been hit only very rarely, and even then it would only have caused a single extra iteration over the channel list.

Both bugs caused a deadlock under Shadow's default timing model, in which simulated time only advances when a thread blocks on network I/O or on an explicit sleep or timeout. Fixing them lets us stop using Shadow's model-unblocked-syscall-latency workaround, which has some disadvantages, including masking this sort of bug. https://shadow.github.io/docs/guide/limitations.html#busy-loops
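Why a busy loop becomes a deadlock under that timing model can be shown with a toy simulated clock. The sketch below is a heavily simplified, hypothetical model (the `SimClock` type and helper functions are invented for illustration and are not Shadow's or arti's API): time moves forward only when the program sleeps for a non-zero duration, so a loop that keeps polling the clock without ever blocking for real never sees its deadline arrive.

```rust
use std::time::Duration;

/// Hypothetical, simplified model of a simulator timing model: simulated
/// time advances only when the program blocks. Computation and clock
/// reads are free and do not move time forward.
struct SimClock {
    now: Duration,
}

impl SimClock {
    fn new() -> Self {
        SimClock { now: Duration::ZERO }
    }

    fn now(&self) -> Duration {
        self.now
    }

    /// Blocking sleep: the only operation that moves time forward.
    /// A zero-length sleep leaves the clock unchanged.
    fn sleep(&mut self, d: Duration) {
        self.now += d;
    }
}

/// Busy-waits for `deadline` by polling the clock and sleeping for zero
/// time, as a timing bug might. Gives up after `max_iters` polls so the
/// demo terminates; returns whether the deadline was ever reached.
fn busy_wait(clock: &mut SimClock, deadline: Duration, max_iters: u32) -> bool {
    for _ in 0..max_iters {
        if clock.now() >= deadline {
            return true;
        }
        clock.sleep(Duration::ZERO); // never advances simulated time
    }
    false
}

/// Sleeps for the full remaining interval, so time reaches the deadline.
fn sleep_until(clock: &mut SimClock, deadline: Duration) -> bool {
    let now = clock.now();
    if deadline > now {
        clock.sleep(deadline - now);
    }
    clock.now() >= deadline
}

fn main() {
    let deadline = Duration::from_millis(750);

    // The busy loop makes no progress no matter how long it spins:
    // under this model it is effectively a deadlock.
    let mut clock = SimClock::new();
    assert!(!busy_wait(&mut clock, deadline, 1_000_000));
    assert_eq!(clock.now(), Duration::ZERO);

    // Blocking for the whole interval lets simulated time move forward.
    let mut clock = SimClock::new();
    assert!(sleep_until(&mut clock, deadline));
    assert_eq!(clock.now(), deadline);
}
```

On a real machine the same busy loop would eventually exit because wall-clock time keeps moving, which is why such bugs can show up only as wasted CPU there, but as a hang in a simulator with this kind of model.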

Disabling that option also appears to make the problem in #1170 go away (as do other timing and scheduling perturbations, which makes that bug difficult to debug; see the issue). That in turn lets us remove our workaround for #1170, which was to run a dummy SOCKS client.
