segfault when combining lttng tracing and sandbox
Summary
I noticed this while trying to upgrade our CI from the end-of-life debian buster to the current stable debian bullseye. I haven't checked which external change specifically introduced the incompatibility between tracing and sandbox. I can reproduce this both in CI (with the updated distro) and on debian unstable in my development environment.
Steps to reproduce:
- Install dependencies (apt install liblttng-ust-dev)
- Configure tor with lttng (--enable-tracing-instrumentation-lttng)
- Run unit tests, either with a full make test or, more to the point: make -j8 src/test/test && src/test/test sandbox/is_active
What is the current bug behavior?
Segfault in unit tests. Looking closer, it's an abort(), maybe with some secondary failures because the signal handler changes are blocked. The main problem is that the urcu_bp lib always treats membarrier() failures as totally fatal once it has determined at init time that membarrier should work.
The strace below shows the problem pretty succinctly, and going through the library source confirms this explanation.
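To make the failure mode concrete, here is a minimal sketch of the pattern described above. It is not the urcu_bp source; the names sys_membarrier, has_membarrier, barrier_init and barrier_strict are made up for illustration. The point is that the decision is made once at init time, before the sandbox exists, so a later EPERM has no safe fallback.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no membarrier() wrapper, so go through syscall(2). */
static int sys_membarrier(int cmd, unsigned int flags)
{
  return (int) syscall(SYS_membarrier, cmd, flags, 0);
}

/* Hypothetical flag: set once at init and never revisited. */
static int has_membarrier = 0;

/* Init-time probe. In the failing test this runs before the sandbox
 * is installed, so the query and the registration both succeed. */
static void barrier_init(void)
{
  int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
  if (mask >= 0 && (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) {
    if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0)
      has_membarrier = 1;
  }
}

/* Later barrier. Once has_membarrier is set, the other threads are
 * assumed to rely on the syscall, so a failure here (for example
 * EPERM from seccomp) cannot be papered over and is treated as fatal. */
static void barrier_strict(void)
{
  if (has_membarrier) {
    if (sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) != 0)
      abort();
  } else {
    /* Fallback path, only valid if it was chosen at init time. */
    __sync_synchronize();
  }
}
```

Calling barrier_init() and then barrier_strict() outside a sandbox completes normally. Calling barrier_init(), then installing a filter that denies membarrier, then calling barrier_strict() ends in abort(), which matches the sequence in the strace.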
What is the expected behavior?
I would appreciate guidance on this, but my read of the situation is that it would be appropriate to just allow membarrier inside the sandbox. If we don't want to do this, we need some way to fail better: either prevent lttng from being initialized until we're inside the sandbox, or block membarrier() early on.
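As a rough illustration of the "block membarrier() early" idea, the sketch below installs a tiny seccomp filter in a forked child and then runs the same kind of init-time query a library would. This is only a demonstration under assumptions: tor's real sandbox is built with libseccomp rather than a raw prctl() filter, the helper names deny_membarrier and probe_after_early_block are hypothetical, and the filter skips the architecture check a production filter should have.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/membarrier.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Install a filter that fails membarrier() with EPERM and allows
 * every other syscall. No arch check: illustration only. */
static int deny_membarrier(void)
{
  struct sock_filter insns[] = {
    /* Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* membarrier: fall through to the ERRNO return; otherwise skip it. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_membarrier, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
    .len = (unsigned short) (sizeof(insns) / sizeof(insns[0])),
    .filter = insns,
  };
  /* Required to install a filter without CAP_SYS_ADMIN. */
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
    return -1;
  return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 0 if the init-time query fails once membarrier is blocked,
 * 1 if the query still succeeds, 2 if the filter could not be set up. */
static int probe_after_early_block(void)
{
  pid_t pid = fork();
  if (pid < 0)
    return 2;
  if (pid == 0) {
    /* Child: block first, then probe, as an early block would. */
    if (deny_membarrier() != 0)
      _exit(2);
    long r = syscall(SYS_membarrier, MEMBARRIER_CMD_QUERY, 0, 0);
    _exit(r < 0 ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
    return 2;
  return WEXITSTATUS(status);
}
```

If the query already fails at init time, a library that probes this way would presumably pick its non-membarrier path from the start instead of aborting later. Whether that trade-off is acceptable compared to simply allowing membarrier in the sandbox is the open question here.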
Environment
- Verified on the 0.4.7 maintenance branch and on main, and on both debian unstable and bullseye. Does not occur on debian buster, which is why we don't see this in CI yet.
Relevant logs and/or screenshots
beth@potato ~/git/tor (git)-[ci-deb-bullseye-047-mr] % strace -e membarrier -f src/test/test sandbox/is_active
membarrier(MEMBARRIER_CMD_QUERY, 0) = 0x1ff (MEMBARRIER_CMD_GLOBAL|MEMBARRIER_CMD_GLOBAL_EXPEDITED|MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED|MEMBARRIER_CMD_PRIVATE_EXPEDITED|MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED|MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE|MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE|MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ|MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ)
membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) = 0
strace: Process 3591653 attached
strace: Process 3591654 attached
sandbox/is_active: [forking] strace: Process 3591710 attached
[pid 3591710] membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = -1 EPERM (Operation not permitted)
[pid 3591710] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
[pid 3591710] +++ killed by SIGSEGV (core dumped) +++
[pid 3591652] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=3591710, si_uid=1000, si_status=SIGSEGV, si_utime=55 /* 0.55 s */, si_stime=6 /* 0.06 s */} ---
[did not exit cleanly.]
[is_active FAILED]
1/1 TESTS FAILED. (0 skipped)
[pid 3591652] membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = 0
[pid 3591652] membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = 0
[pid 3591652] membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = 0
[pid 3591652] membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) = 0
==3591652==LeakSanitizer has encountered a fatal error.
==3591652==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==3591652==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)
[pid 3591654] +++ exited with 1 +++
[pid 3591653] +++ exited with 1 +++
+++ exited with 1 +++
Possible fixes
- Allow membarrier in sandbox
- Disallow tracing + sandbox?
- Block membarrier early
- Init tracing later