android: fdsan SIGABRT tor_main_configuration_free
Summary
When testing tor-0.4.7.13 on Android 13, I experienced a SIGABRT
in tor_main_configuration_free
caused by fdsan. The reason why fdsan causes an abort is that the owning control socket is closed twice. I noticed this crash as part of testing a release candidate of OONI Probe Android where we embed libtor.a
.
The corresponding OONI Probe issue is: https://github.com/ooni/probe/issues/2405.
Steps to reproduce
I do not have a very simple procedure to reproduce the issue that does not involve OONI Probe and its build system.
However, the underlying issue is independent of Android. The only reason why Android matters is that the fdsan notices the double close of the same file descriptor and hence triggers a crash (for Android API level >= 30).
Because of this, here are instructions to reproduce the underlying issue using GNU/Linux (I used Ubuntu 22.04):
-
clone tor
-
git checkout tor-0.4.7.13
-
git apply tor.diff
wheretor.diff
is the following patch:
diff --git a/src/core/mainloop/connection.c b/src/core/mainloop/connection.c
index cf25213cb1..d690de3892 100644
--- a/src/core/mainloop/connection.c
+++ b/src/core/mainloop/connection.c
@@ -149,6 +149,8 @@
#include "core/or/congestion_control_flow.h"
+#include <stdio.h>
+
/**
* On Windows and Linux we cannot reliably bind() a socket to an
* address and port if: 1) There's already a socket bound to wildcard
@@ -949,6 +951,7 @@ connection_free_minimal(connection_t *conn)
if (SOCKET_OK(conn->s)) {
log_debug(LD_NET,"closing fd %d.",(int)conn->s);
+ fprintf(stderr, "SBSDEBUG: connection_free_minimal %lld\n", (long long)conn->s);
tor_close_socket(conn->s);
conn->s = TOR_INVALID_SOCKET;
}
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..fb49d92ad7 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -116,6 +116,11 @@ tor_main_configuration_setup_control_socket(tor_main_configuration_t *cfg)
cfg_add_owned_arg(cfg, "__OwningControllerFD");
cfg_add_owned_arg(cfg, buf);
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_setup_control_socket %lld %lld\n",
+ (long long)fds[0], (long long)fds[1]
+ );
+
cfg->owning_controller_socket = fds[1];
return fds[0];
}
@@ -132,6 +137,10 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
raw_free(cfg->argv_owned);
}
if (SOCKET_OK(cfg->owning_controller_socket)) {
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_free %lld\n",
+ (long long)cfg->owning_controller_socket
+ );
raw_closesocket(cfg->owning_controller_socket);
}
raw_free(cfg);
-
./autogen.sh
-
./configure --disable-asciidoc
-
make
-
mkdir tmp
-
vi tmp/main.c
making sure it contains the following content:
#include "../src/feature/api/tor_api.h"
#include <stdlib.h>
#include <unistd.h>
int main() {
tor_main_configuration_t *cfg = tor_main_configuration_new();
if (cfg == NULL) {
exit(1);
}
tor_control_socket_t sock = tor_main_configuration_setup_control_socket(cfg);
if (sock == INVALID_TOR_CONTROL_SOCKET) {
exit(2);
}
(void)close(sock); // close immediately (it's async on Android but it should not matter AFAICT)
(void)tor_run_main(cfg);
tor_main_configuration_free(cfg);
}
-
gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm
-
./a.out
which should produce this output:
SBSDEBUG: tor_main_configuration_setup_control_socket 4 5
Feb 02 17:18:07.330 [notice] Tor 0.4.7.13 (git-7c1601fb6edd780f) running on Linux with Libevent 2.1.12-stable, OpenSSL 3.0.2, Zlib 1.2.11, Liblzma N/A, Libzstd N/A and Glibc 2.35 as libc.
Feb 02 17:18:07.330 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://support.torproject.org/faq/staying-anonymous/
Feb 02 17:18:07.330 [notice] Configuration file "/usr/local/etc/tor/torrc" not present, using reasonable defaults.
Feb 02 17:18:07.331 [notice] Opening Socks listener on 127.0.0.1:9050
Feb 02 17:18:07.331 [notice] Opened Socks listener connection (ready) on 127.0.0.1:9050
Feb 02 17:18:07.000 [notice] Bootstrapped 0% (starting): Starting
Feb 02 17:18:07.000 [notice] Starting with guard context "default"
Feb 02 17:18:07.000 [notice] Owning controller connection has closed -- exiting now.
SBSDEBUG: connection_free_minimal 5
Feb 02 17:18:07.000 [notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 8
SBSDEBUG: tor_main_configuration_free 5
If you analyze the above output, you would see that the file descriptor 5
is closed twice. This output is almost identical to the output that I have seen in the Android logcat (more on that below). Also, I think the way in which I am using the embedding API above (which mirrors our more complex implementation written in Go) is fine; if not, please educate me.
(If you want to reproduce the same problem I experience on Android, I can either explain how to compile and test OONI Probe for Android, or I can try to work on creating a simple Android PoC like the one above.)
What is the current bug behavior?
We're in an embedding scenario where we eventually call tor_run_main
, as mentioned in the previous section.
This is the sequence of APIs we call along with my best understanding of what happens inside tor
:
We create a configuration using tor_main_configuration_new
.
Calling tor_main_configuration_setup_control_socket
creates a pair of sockets, returns fds[0]
to us, and retains fds[1]
inside tor_main_configuration_t::owning_controller_socket
.
Calling tor_run_main
calls (in a way that is not 100% clear to me) options_act
that passes the owning_controller_socket
to control_connection_add_local_fd
. In turn, this function registers the file descriptor fds[1]
as the control connection.
Eventually we close
the fds[0]
that was returned to us, which causes tor
to stop its libevent loop.
When tor_run_main
terminates, it calls tor_cleanup
, which calls tor_free_all
, which calls connection_free_all
, which calls connection_free_minimal
for each connection, including the owning file descriptor fds[1]
.
After tor_run_main
, we call tor_main_configuration_free
. In turn, this function calls raw_closesocket
on the owning_controller_socket
, which is hence closed for the second time.
On Android with API level >= 30, the fdsan sanitizer notices the second close and sometimes (roughly 50%) this fact causes the app to abort.
What is the expected behavior?
I think tor should duplicate the file descriptor before registering it into the core event loop such that there is a single owner of each of the two duplicates. The tor_main_configuration_free
function owns one of them and the core event loop owns the other one. This semantics shouldn't cause any issue with the fdsan sanitizer because it's designed to enforce it.
Alternatively, it should probably be documented to use API level < 30 (where the fdsan only warns). Or it should be documented that one should use the proper fdsan API to disable crashing on double close. (I do not remember seeing these warnings when I read how to use the embedding API and a quick git grep fdsan
or git grep "API level"
did not return anything, but it's still possible that I overlooked some documentation about this issue.)
Environment
- Which version of Tor are you using?
tor-0.4.7.13
- Which operating system are you using?
Android 13 (but I also provided a minimal example on GNU/Linux)
- Which installation method did you use?
We cross compile tor for Android using our cross compilation scripts.
We obtain a static set of libraries and a tor_api.h
that we link as part of building an AAR with go mobile.
We use the obtained AAR as a dependency for OONI Probe Android.
The code that specifically invokes tor is written in Go. The sequence of events in terms of the Tor embedding API is the one I described above in the "what is the current bug behavior?" section.
However, I have also provided a minimal example for GNU/Linux that shows the double-close issue.
Relevant logs and/or screenshots
The following is an excerpt from the tombstone generated by the crashing app:
[notice] Catching signal TERM, exiting cleanly.
fdsan: attempted to close file descriptor 104, expected to be unowned, \
actually owned by unique_fd 0x70b7e5f19c
[...]
ABI: 'arm64'
Timestamp: 2023-02-02 11:39:31.181519664+0100
Cmdline: org.openobservatory.ooniprobe.experimental
pid: 16472, tid: 16593, name: AsyncTask #1 >>> org.openobservatory.ooniprobe.experimental <<<
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
[...]
backtrace:
#00 pc 0000000000055c48 .../lib64/bionic/libc.so (fdsan_error(char const*, ...)+556) (...)
#01 pc 0000000000055954 .../lib64/bionic/libc.so (android_fdsan_close_with_tag+732) (...)
#02 pc 00000000000560a8 .../lib64/bionic/libc.so (close+16) (...)
#03 pc 00000000012ad08c [...]/lib/arm64/libgojni.so (tor_main_configuration_free+128)
If I apply the patch that above I called tor.diff
and run OONI Probe on Android, I see this in the logcat:
SBSDEBUG: tor_main_configuration_setup_control_socket 94 98 // <- fds[0] and fds[1]
[...]
[notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 141
SBSDEBUG: connection_free_minimal 98 // <- first close of fds[1]
SBSDEBUG: connection_free_minimal 116
SBSDEBUG: connection_free_minimal 152
SBSDEBUG: tor_main_configuration_free 98 // <- second close of fds[1]
Possible fixes
The following patch makes the app work as intended (i.e., no crashes for several runs):
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..2773949264 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -131,9 +131,13 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
}
raw_free(cfg->argv_owned);
}
+ /* See https://github.com/ooni/probe/issues/2405 to understand
+ why we're not closing the controller socker here. */
+ /*
if (SOCKET_OK(cfg->owning_controller_socket)) {
raw_closesocket(cfg->owning_controller_socket);
}
+ */
raw_free(cfg);
}
That said, I think this patch is wrong because it leaks the file descriptor when tor_run_main
returns prematurely (e.g., when the command line flags are wrong). Because of this, I think the more robust fix would be to duplicate the file descriptor before registering it into the libevent loop, as I explained above.