Review handling of panics in spawned tasks and threads
We should review our approach to panic handling in each spawned async task and spawned thread. When a spawned task or thread panics and the panic isn't caught, the runtime prints a backtrace and the `JoinHandle` would give `Err`. But often we discard the `JoinHandle`s, so the task's job just doesn't get done.
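To illustrate the failure mode with plain `std::thread` (an analogue only; async tasks go through the runtime instead), the panic reaches the spawner solely through `JoinHandle::join`, which returns `Err(payload)`. This is a minimal sketch; `payload_message` and `spawn_and_join` are hypothetical helpers, not existing APIs in our code.

```rust
use std::any::Any;
use std::thread;

/// Hypothetical helper: turn a panic payload into a printable message.
/// `panic!("literal")` yields a `&'static str` payload and
/// `panic!("{}", x)` yields a `String`; anything else is treated as opaque.
fn payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Hypothetical helper: spawn a thread running `f` and wait for it.
/// A panic in `f` is only observable here because we keep the
/// `JoinHandle`: `join` returns `Err(payload)` instead of `Ok(value)`.
fn spawn_and_join<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(f)
        .join()
        .map_err(|payload| payload_message(&*payload))
}
```

If the `JoinHandle` is dropped instead, the thread is detached: the panic message is still printed by the panic hook, but no caller ever receives an error value.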
As a rule of thumb, we should catch panics when all of the following hold:
- We spawn a task (or thread)
- The task does work that relies on a lot of other code (especially other crates' code), so it might be made to panic by something very far away. (So this doesn't apply to most of the local update tasks, e.g. the bridge descriptor manager timeout task.)
- The way the task produces its output means that whoever consumes that output won't get an error indication. (E.g. this means we probably don't care about the channel reactor: if the channel reactor panics, the various interfaces to the channel all give EOF or send errors.)
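For a thread that meets these criteria, one possible shape is to catch the panic inside the spawned closure and deliver it through the task's own output path, so the consumer sees an explicit error. This is a sketch under assumptions: `spawn_reporting` and `TaskError` are hypothetical names, and the `mpsc` channel stands in for whatever output mechanism the real task uses (for output written to shared state, a consumer would otherwise see nothing at all).

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

/// Hypothetical error type handed to the consumer instead of silence.
#[derive(Debug, PartialEq)]
pub enum TaskError {
    Panicked(String),
}

/// Hypothetical helper: extract a message from a panic payload.
fn payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Hypothetical helper: run `work` on a detached thread and send its
/// outcome over a channel. A panic in `work` is caught and converted into
/// `Err(TaskError::Panicked(..))` rather than being lost with the handle.
pub fn spawn_reporting<T, F>(work: F) -> mpsc::Receiver<Result<T, TaskError>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    // The JoinHandle is deliberately dropped: the channel is the only output.
    thread::spawn(move || {
        // AssertUnwindSafe: after a panic we only forward its message and
        // never look at state that `work` may have left half-updated.
        let outcome = panic::catch_unwind(AssertUnwindSafe(work))
            .map_err(|payload| TaskError::Panicked(payload_message(&*payload)));
        // If the receiver is gone there is nobody left to tell; ignore.
        let _ = tx.send(outcome);
    });
    rx
}
```

Wrapping in `AssertUnwindSafe` is the pragmatic choice here because arbitrary task closures are rarely `UnwindSafe`; it is sound as long as nothing touched by the panicking code is reused afterwards.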
In practice, we try quite hard to make our code not panic, so missing catches are not a huge problem. But it would be better if panics were detected and reported, propagated, or otherwise handled: our system would be more reliable overall and have better error handling.
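For async tasks, detection has to happen around each `poll`. Below is a std-only sketch of a future wrapper that catches a panic raised while polling the wrapped future and yields `Err(message)`, which the task can then report or propagate. `CatchPanic`, `payload_message` and the busy-polling `block_on` (test scaffolding only) are hypothetical names; the `futures` crate's `FutureExt::catch_unwind` adapter offers a comparable wrapper, and a real task would be spawned on the async runtime rather than driven by `block_on`.

```rust
use std::any::Any;
use std::future::Future;
use std::panic::{self, AssertUnwindSafe};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical helper: extract a message from a panic payload.
fn payload_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "<non-string panic payload>".to_string()
    }
}

/// Hypothetical wrapper: a panic during any `poll` of the inner future
/// becomes `Ready(Err(message))` instead of unwinding into the executor.
pub struct CatchPanic<F> {
    inner: Pin<Box<F>>,
}

impl<F: Future> CatchPanic<F> {
    pub fn new(fut: F) -> Self {
        CatchPanic { inner: Box::pin(fut) }
    }
}

impl<F: Future> Future for CatchPanic<F> {
    type Output = Result<F::Output, String>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let inner = self.inner.as_mut();
        match panic::catch_unwind(AssertUnwindSafe(|| inner.poll(cx))) {
            Ok(Poll::Pending) => Poll::Pending,
            Ok(Poll::Ready(value)) => Poll::Ready(Ok(value)),
            Err(payload) => Poll::Ready(Err(payload_message(&*payload))),
        }
    }
}

/// Test scaffolding: a waker that does nothing when woken.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Test scaffolding: busy-poll a future to completion. Only suitable for
/// futures that make progress without external wakeups.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```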
(prompted by discussion !831 (comment 2851925))