Tor issues
https://gitlab.torproject.org/tpo/core/tor/-/issues
Last updated: 2024-03-21

Issue 40820: connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13) (computer_freak, 2024-03-21)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40820

```
...
07:55:37.000 [notice] Heartbeat: Tor's uptime is 7 days 18:00 hours, with 71411 circuits open. I've sent 10022.56 GB and received 9786.33 GB. I've received 3866839 connections on IPv4 and 21166 on IPv6. I've made 181900 connections with IPv4 and 56445 with IPv6.
07:55:37.000 [notice] While bootstrapping, fetched this many bytes: 8107 (microdescriptor fetch)
07:55:37.000 [notice] While not bootstrapping, fetched this many bytes: 148937098 (server descriptor fetch); 5760 (server descriptor upload); 12096007 (consensus network-status fetch); 3883848 (microdescriptor fetch)
07:55:37.000 [notice] Circuit handshake stats since last time: 517/517 TAP, 3956544/3973033 NTor.
07:55:37.000 [notice] Since startup we initiated 0 and received 0 v1 connections; initiated 0 and received 0 v2 connections; initiated 0 and received 6320 v3 connections; initiated 3 and received 190964 v4 connections; initiated 71807 and received 3613775 v5 connections.
07:55:37.000 [notice] Heartbeat: DoS mitigation since startup: 1037 circuits killed with too many cells, 32767104 circuits rejected, 401 marked addresses, 1 marked addresses for max queue, 172207 same address concurrent connections rejected, 0 connections rejected, 0 single hop clients refused, 2185090 INTRODUCE2 rejected.
13:37:58.000 [notice] We're low on memory (cell queues total alloc: 10805520 buffer total alloc: 12886016, tor compress total alloc: 667208970 (zlib: 259584, zstd: 666941450, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:37:58.000 [notice] Removed 85978221 bytes by killing 62 circuits; 49010 circuits remain alive. Also killed 0 non-linked directory connections. Killed 3 edge connections
13:37:58.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 )
13:37:58.000 [warn] tor_bug_occurred_(): Bug: ../src/core/or/connection_edge.c:1065: connection_edge_about_to_close: This line should not have been reached. (Future instances of this warning will be silenced.) (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: Tor 0.4.7.13: Line unexpectedly reached at connection_edge_about_to_close at ../src/core/or/connection_edge.c:1065. Stack trace: (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(log_backtrace_impl+0x57) [0x55ba820720e7] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(tor_bug_occurred_+0x16b) [0x55ba8207d30b] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(connection_exit_about_to_close+0x1d) [0x55ba8211e4fd] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(+0x6b5ed) [0x55ba81ff65ed] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(+0x6bcfb) [0x55ba81ff6cfb] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7(+0x23b4f) [0x7fc9eef8db4f] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7(event_base_loop+0x52f) [0x7fc9eef8e28f] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(do_main_loop+0x101) [0x55ba81ff86f1] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(tor_run_main+0x1e5) [0x55ba81ff3fc5] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(tor_main+0x49) [0x55ba81ff02d9] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(main+0x19) [0x55ba81fefeb9] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7fc9ee828d0a] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(_start+0x2a) [0x55ba81feff0a] (on Tor 0.4.7.13 )
13:37:58.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 )
13:37:58.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 )
13:37:58.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:37:58.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:10.000 [notice] We're low on memory (cell queues total alloc: 15153072 buffer total alloc: 11710464, tor compress total alloc: 664530008 (zlib: 302848, zstd: 664219240, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:10.000 [notice] Removed 83687644 bytes by killing 17 circuits; 49083 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:38:11.000 [notice] We're low on memory (cell queues total alloc: 14386416 buffer total alloc: 12208128, tor compress total alloc: 665847849 (zlib: 259584, zstd: 665580345, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:11.000 [notice] Removed 88057871 bytes by killing 17 circuits; 49151 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [notice] We're low on memory (cell queues total alloc: 13510992 buffer total alloc: 11796480, tor compress total alloc: 665718009 (zlib: 129792, zstd: 665580345, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:12.000 [notice] Removed 82329467 bytes by killing 14 circuits; 49226 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:13.000 [notice] We're low on memory (cell queues total alloc: 14437632 buffer total alloc: 12062720, tor compress total alloc: 664356888 (zlib: 129792, zstd: 664219240, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:13.000 [notice] Removed 81757690 bytes by killing 15 circuits; 49281 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:54:30.000 [notice] Sudden decrease in circuit RTT (11 vs 119683), likely due to clock jump.
13:55:37.000 [notice] Heartbeat: Tor's uptime is 8 days 0:00 hours, with 74885 circuits open. I've sent 10349.14 GB and received 10099.50 GB. I've received 3979142 connections on IPv4 and 21891 on IPv6. I've made 187464 connections with IPv4 and 57946 with IPv6.
13:55:37.000 [notice] While bootstrapping, fetched this many bytes: 8107 (microdescriptor fetch)
13:55:37.000 [notice] While not bootstrapping, fetched this many bytes: 154088181 (server descriptor fetch); 5760 (server descriptor upload); 12495772 (consensus network-status fetch); 4020088 (microdescriptor fetch)
13:55:37.000 [notice] Circuit handshake stats since last time: 581/581 TAP, 2997064/3009741 NTor.
13:55:37.000 [notice] Since startup we initiated 0 and received 0 v1 connections; initiated 0 and received 0 v2 connections; initiated 0 and received 6720 v3 connections; initiated 3 and received 199738 v4 connections; initiated 73618 and received 3714169 v5 connections.
13:55:37.000 [notice] Heartbeat: DoS mitigation since startup: 1051 circuits killed with too many cells, 33500595 circuits rejected, 404 marked addresses, 1 marked addresses for max queue, 183981 same address concurrent connections rejected, 0 connections rejected, 0 single hop clients refused, 2523887 INTRODUCE2 rejected.
...
```

Issue 40806: running tor guard/bridge/exit behind layers of NAT (redbear, 2023-06-13)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40806

The issue is that I can start a node and its port is reachable from the outside, but outside IPs have to go through an intermediate IP to reach the node. Is there a workaround? Is it possible to handle this with some torrc configuration?

Issue 40807: Look for the lib64 directory when using a custom OpenSSL directory (Pier Angelo Vendrame, 2023-06-12)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40807

For Tor Browser, we build tor in an old Debian container that has a very old OpenSSL.
So, we also compile OpenSSL and use a custom prefix with `--with-openssl-dir=$openssldir`.
From what I understood (by compiling with `make V=1`), it seems to me that tor's build system tries to use `$openssldir/lib` as a library directory (and IIRC, it falls back to `$openssldir` when it doesn't exist).
However, with OpenSSL 3, the default directory has become `$openssldir/lib64` (at least on Linux amd64).
Linking `$openssldir/lib` to `$openssldir/lib64` seems to solve all the various problems.

Issue 40779: Investigate address detection usage in Tor (Alexander Færøy, ahf@torproject.org, 2023-05-15)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40779

Tor currently has a number of ways of detecting its own address when being used as a relay. These include:
- netinfo cell
- dirport connections to other relays
- configuration specification
It would be useful for the Arti WG to learn which of these methods are actively being used before we start implementing relay support in Arti.
A useful thing to do here is to enumerate the methods we have and extend MetricsPort to record where the relay learned its address, so we can make a sensible decision.

Milestone: Tor: 0.4.8.x-freeze

Issue 40767: Investigate high circuit build error rates in simulation (gabi-250, 2023-04-12)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40767

We ran some shadow simulations to debug/repro the issue from #40570, and @jnewsome noticed the onion service clients have consistently high [circuit build failure rates](https://gitlab.torproject.org/tpo/core/tor/-/issues/40570#note_2883257).
We should figure out what causes these circuit build failures.

Issue 40768: Decide whether to disable circuit cannibalization entirely (gabi-250, 2023-04-12)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40768

The discussions around #40570 reignited a discussion about circuit cannibalization, and whether the performance improvements it provides (if any) justify the maintenance costs of keeping it around. This ticket is about deciding whether we should disable cannibalization in c-tor.

Issue 40761: DDoS mitigation: analysis to understand relay-to-relay connections from non-relay IPs (bnm, 2023-04-12)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40761
We are working on a tor proposal that should help with protecting non-guard relays from a large fraction of the DDoS load.

In initial tests we have seen a 55% CPU usage decrease when deploying our proposed mitigations, but we want to make sure that we are not introducing an over-blocking problem. We know about a few configurations where a relay will use a source IP that is not in the consensus to connect to other relays (OutboundBindAddress, OutboundBindAddressOR), but we would like to have some actual data about it.

To measure, understand, and solve that potential problem, and to back up the proposal with some actual data, we would like to measure the following on our tor relays:

Log whenever our non-guard tor relays get an authenticated relay-to-relay connection to our ORPort from a source IP that is not in the consensus and not in the exit lists:
```
timestamp relay-fingerprint source-IP
```
If the "and not in the exit lists" part is too hard,
we can take care of that in post-processing of the logs
to filter them out.
We do not care about client to relay connections and do not want to log them.
Would it be possible to provide a patch or branch
that implements that logging on top of main?
It does not have to be in a release and we will run it
only temporarily.
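For illustration, a minimal sketch of the kind of check we have in mind, placed wherever an inbound OR connection has just authenticated as a relay. This is not a tested patch: the two lookup helpers are tor functions as far as we know, but their exact signatures and the right hook point would need checking against main.

```c
/* Hypothetical sketch (untested, assumes tor's internal headers): log
 * authenticated relay-to-relay connections whose source address is not
 * listed in the consensus. The timestamp comes from the log line itself. */
static void
log_unlisted_relay_source(const char *peer_id_digest, const tor_addr_t *addr)
{
  /* Peer proved a known relay identity, but connected from an address
   * that no relay in our consensus is listed under. */
  if (connection_or_digest_is_known_relay(peer_id_digest) &&
      !nodelist_probably_contains_address(addr)) {
    char fp[HEX_DIGEST_LEN+1];
    base16_encode(fp, sizeof(fp), peer_id_digest, DIGEST_LEN);
    log_notice(LD_OR, "relay-conn-from-unlisted-addr %s %s",
               fp, fmt_addr(addr));
  }
}
```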
Thank you!

Issue 40747: android: fdsan SIGABRT tor_main_configuration_free (sbs, 2023-04-12)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40747

### Summary
When testing tor-0.4.7.13 on Android 13, I experienced a `SIGABRT` in `tor_main_configuration_free` caused by [fdsan](https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md). The reason why fdsan causes an abort is that the owning control socket is closed twice. I noticed this crash as part of testing a release candidate of [OONI Probe Android](https://github.com/ooni/probe-android/) where we embed `libtor.a`.
The corresponding OONI Probe issue is: https://github.com/ooni/probe/issues/2405.
### Steps to reproduce
I do not have a very simple procedure to reproduce the issue that does not involve OONI Probe and its build system.
However, the underlying issue is independent of Android. The only reason Android matters is that fdsan notices the double close of the same file descriptor and hence triggers a crash (for Android API level >= 30).
Because of this, here are instructions to reproduce the underlying issue using GNU/Linux (I used Ubuntu 22.04):
1. clone tor
2. `git checkout tor-0.4.7.13`
3. `git apply tor.diff` where `tor.diff` is the following patch:
```diff
diff --git a/src/core/mainloop/connection.c b/src/core/mainloop/connection.c
index cf25213cb1..d690de3892 100644
--- a/src/core/mainloop/connection.c
+++ b/src/core/mainloop/connection.c
@@ -149,6 +149,8 @@
#include "core/or/congestion_control_flow.h"
+#include <stdio.h>
+
/**
* On Windows and Linux we cannot reliably bind() a socket to an
* address and port if: 1) There's already a socket bound to wildcard
@@ -949,6 +951,7 @@ connection_free_minimal(connection_t *conn)
if (SOCKET_OK(conn->s)) {
log_debug(LD_NET,"closing fd %d.",(int)conn->s);
+ fprintf(stderr, "SBSDEBUG: connection_free_minimal %lld\n", (long long)conn->s);
tor_close_socket(conn->s);
conn->s = TOR_INVALID_SOCKET;
}
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..fb49d92ad7 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -116,6 +116,11 @@ tor_main_configuration_setup_control_socket(tor_main_configuration_t *cfg)
cfg_add_owned_arg(cfg, "__OwningControllerFD");
cfg_add_owned_arg(cfg, buf);
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_setup_control_socket %lld %lld\n",
+ (long long)fds[0], (long long)fds[1]
+ );
+
cfg->owning_controller_socket = fds[1];
return fds[0];
}
@@ -132,6 +137,10 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
raw_free(cfg->argv_owned);
}
if (SOCKET_OK(cfg->owning_controller_socket)) {
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_free %lld\n",
+ (long long)cfg->owning_controller_socket
+ );
raw_closesocket(cfg->owning_controller_socket);
}
raw_free(cfg);
```
4. `./autogen.sh`
5. `./configure --disable-asciidoc`
6. `make`
7. `mkdir tmp`
8. `vi tmp/main.c` making sure it contains the following content:
```C
#include "../src/feature/api/tor_api.h"
#include <stdlib.h>
#include <unistd.h>
int main() {
tor_main_configuration_t *cfg = tor_main_configuration_new();
if (cfg == NULL) {
exit(1);
}
tor_control_socket_t sock = tor_main_configuration_setup_control_socket(cfg);
if (sock == INVALID_TOR_CONTROL_SOCKET) {
exit(2);
}
(void)close(sock); // close immediately (it's async on Android but it should not matter AFAICT)
(void)tor_run_main(cfg);
tor_main_configuration_free(cfg);
}
```
9. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
10. `./a.out` which should produce this output:
```
SBSDEBUG: tor_main_configuration_setup_control_socket 4 5
Feb 02 17:18:07.330 [notice] Tor 0.4.7.13 (git-7c1601fb6edd780f) running on Linux with Libevent 2.1.12-stable, OpenSSL 3.0.2, Zlib 1.2.11, Liblzma N/A, Libzstd N/A and Glibc 2.35 as libc.
Feb 02 17:18:07.330 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://support.torproject.org/faq/staying-anonymous/
Feb 02 17:18:07.330 [notice] Configuration file "/usr/local/etc/tor/torrc" not present, using reasonable defaults.
Feb 02 17:18:07.331 [notice] Opening Socks listener on 127.0.0.1:9050
Feb 02 17:18:07.331 [notice] Opened Socks listener connection (ready) on 127.0.0.1:9050
Feb 02 17:18:07.000 [notice] Bootstrapped 0% (starting): Starting
Feb 02 17:18:07.000 [notice] Starting with guard context "default"
Feb 02 17:18:07.000 [notice] Owning controller connection has closed -- exiting now.
SBSDEBUG: connection_free_minimal 5
Feb 02 17:18:07.000 [notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 8
SBSDEBUG: tor_main_configuration_free 5
```
If you analyze the above output, you would see that the file descriptor `5` is closed twice. This output is almost identical to the output that I have seen in the Android logcat (more on that below). Also, I _think_ the way in which I am using the embedding API above (which mirrors our more complex implementation written in Go) is fine; if not, please educate me.
(If you want to reproduce the same problem I experience on Android, I can either explain how to compile and test OONI Probe for Android, or I can try to work on creating a simple Android PoC like the one above.)
### What is the current bug behavior?
We're in an embedding scenario where we eventually call `tor_run_main`, as mentioned in the previous section.
This is the sequence of APIs we call along with my best understanding of what happens inside `tor`:
We create a configuration using `tor_main_configuration_new`.
Calling `tor_main_configuration_setup_control_socket` creates a pair of sockets, returns `fds[0]` to us, and retains `fds[1]` inside `tor_main_configuration_t::owning_controller_socket`.
Calling `tor_run_main` calls (in a way that is not 100% clear to me) `options_act` that passes the `owning_controller_socket` to `control_connection_add_local_fd`. In turn, this function registers the file descriptor `fds[1]` as the control connection.
Eventually we `close` the `fds[0]` that was returned to us, which causes `tor` to stop its libevent loop.
When `tor_run_main` terminates, it calls `tor_cleanup`, which calls `tor_free_all`, which calls `connection_free_all`, which calls `connection_free_minimal` for each connection, including the owning file descriptor `fds[1]`.
After `tor_run_main`, we call `tor_main_configuration_free`. In turn, this function calls `raw_closesocket` on the `owning_controller_socket`, which is hence closed for the second time.
On Android with API level >= 30, the [fdsan](https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md) sanitizer notices the second close and _sometimes_ (roughly 50%) this fact causes the app to abort.
### What is the expected behavior?
I think tor should duplicate the file descriptor before registering it into the core event loop such that there is a single owner of each of the two duplicates. The `tor_main_configuration_free` function owns one of them and the core event loop owns the other one. This semantics shouldn't cause any issue with the fdsan sanitizer because it's designed to enforce it.
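A minimal sketch of that idea (my suggestion, not an official patch; POSIX only, and the exact integration point in the call chain described above would need checking):

```c
/* Hypothetical sketch (untested): give the event loop its own duplicate of
 * the owning controller fd, so connection_free_minimal() closes the
 * duplicate while tor_main_configuration_free() keeps sole ownership of
 * the original. control_connection_add_local_fd()'s signature is from
 * memory and should be verified against main. */
int owned_fd = cfg->owning_controller_socket;
int loop_fd = dup(owned_fd);            /* the event loop's private copy */
if (loop_fd < 0)
  return -1;                            /* report the error to the caller */
if (control_connection_add_local_fd(loop_fd, 1 /* takes ownership */) < 0) {
  close(loop_fd);
  return -1;
}
/* cfg->owning_controller_socket is still owned, and later closed, solely
 * by tor_main_configuration_free(). */
```

With this split, each descriptor has exactly one closing owner, which is the invariant fdsan is designed to enforce.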
Alternatively, it should probably be documented to use API level < 30 (where the fdsan only warns). Or it should be documented that one should use the proper fdsan API to disable crashing on double close. (I do not remember seeing these warnings when I read how to use the embedding API and a quick `git grep fdsan` or `git grep "API level"` did not return anything, but it's still possible that I overlooked _some_ documentation about this issue.)
### Environment
- Which version of Tor are you using?
tor-0.4.7.13
- Which operating system are you using?
Android 13 (but I also provided a minimal example on GNU/Linux)
- Which installation method did you use?
We cross compile tor for Android using [our cross compilation scripts](https://github.com/ooni/probe-cli/tree/v3.17.0-alpha.1/internal/cmd/buildtool).
We obtain a static set of libraries and a `tor_api.h` that we link as part of building an AAR with [go mobile](https://github.com/golang/mobile).
We use the obtained AAR as a dependency for [OONI Probe Android](https://github.com/ooni/probe-android/).
The code that specifically invokes tor [is written in Go](https://github.com/ooni/probe-android/). The sequence of events in terms of the Tor embedding API is the one I described above in the "what is the current bug behavior?" section.
However, I have also provided a minimal example for GNU/Linux that shows the double-close issue.
### Relevant logs and/or screenshots
The following is an excerpt from the tombstone generated by the crashing app:
```
[notice] Catching signal TERM, exiting cleanly.
fdsan: attempted to close file descriptor 104, expected to be unowned, \
actually owned by unique_fd 0x70b7e5f19c
[...]
ABI: 'arm64'
Timestamp: 2023-02-02 11:39:31.181519664+0100
Cmdline: org.openobservatory.ooniprobe.experimental
pid: 16472, tid: 16593, name: AsyncTask #1 >>> org.openobservatory.ooniprobe.experimental <<<
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
[...]
backtrace:
#00 pc 0000000000055c48 .../lib64/bionic/libc.so (fdsan_error(char const*, ...)+556) (...)
#01 pc 0000000000055954 .../lib64/bionic/libc.so (android_fdsan_close_with_tag+732) (...)
#02 pc 00000000000560a8 .../lib64/bionic/libc.so (close+16) (...)
#03 pc 00000000012ad08c [...]/lib/arm64/libgojni.so (tor_main_configuration_free+128)
```
If I apply the patch I called `tor.diff` above and run OONI Probe on Android, I see this in the logcat:
```
SBSDEBUG: tor_main_configuration_setup_control_socket 94 98 // <- fds[0] and fds[1]
[...]
[notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 141
SBSDEBUG: connection_free_minimal 98 // <- first close of fds[1]
SBSDEBUG: connection_free_minimal 116
SBSDEBUG: connection_free_minimal 152
SBSDEBUG: tor_main_configuration_free 98 // <- second close of fds[1]
```
### Possible fixes
The following patch makes the app work as intended (i.e., no crashes for several runs):
```diff
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..2773949264 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -131,9 +131,13 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
}
raw_free(cfg->argv_owned);
}
+ /* See https://github.com/ooni/probe/issues/2405 to understand
+ why we're not closing the controller socket here. */
+ /*
if (SOCKET_OK(cfg->owning_controller_socket)) {
raw_closesocket(cfg->owning_controller_socket);
}
+ */
raw_free(cfg);
}
```
That said, I think this patch is wrong because it leaks the file descriptor when `tor_run_main` returns prematurely (e.g., when the command line flags are wrong). Because of this, I think the more robust fix would be to duplicate the file descriptor before registering it into the libevent loop, as I explained above.

Assignee: Alexander Færøy (ahf@torproject.org)

Issue 40702: Single Onion Service Rends become 7-hop after retry (Mike Perry, 2023-02-09)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40702

In `retry_service_rendezvous_point()`, if a rend connect fails for a non-anonymous rend, we promote it to a 7-hop slow rend for some reason.
This will impact non-anonymous onions who want performance, especially during the DoS.
David notes that this decision to fall back to full anonymous mode in the event of timeout or failure was explicitly written just in case a non-anonymous onion service was also behind a restrictive firewall, and that firewall was the thing that happened to cause a timeout. There is also a comment that explains this, believe it or not. Back then, decision making in C-Tor was a bit more...special.
I bet if we get funders who actually care about single onion performance, they would prefer that their single onions not randomly double in latency on a timeout or failure, just to support the case where some single onion out there might be behind a firewall that they don't know about. Such a funder might suggest that we provide some other option for people behind firewalls to use, instead of this madness.
But I look forward to more research.

Issue 29084: Ensure circuit padding RTT estimate handles var cells/wide creates (George Kadianakis, 2023-06-15)
https://gitlab.torproject.org/tpo/core/tor/-/issues/29084

The use_rtt_estimate field in the circuit padding machines lets machines offset the inter-packet delays by a middle-node estimated RTT value of packets that go from the middle to the exit/website.
We abort this measurement if we get two or more cells back-to-back in either direction, as this indicates that the half-duplex request/response circuit setup and RELAY_BEGIN sequence has finished.
However, if we switch to a multi-cell circuit handshake, then we will need to take that into account when measuring RTT.
If RELAY_EARLY is used only for the first cell of a multi-cell EXTEND2 payload, then we can just count time between RELAY_EARLIES. But the proposal currently says MAY, not MUST, for this behavior.

Assignee: Mike Perry

Issue 40277: Circuit Padding attack scenario through circpad_global_max_padding_pct (Ghost User, 2023-06-09)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40277

Hello *,
I am from [RadicallyOpenSecurity](https://www.radicallyopensecurity.com/) and currently performing a short review of the [Padding Machines for Tor](https://nlnet.nl/project/TorPaddingMachines/) NGI Zero PET project.
During discussion with the padding machine author Tobias Pulls (@pulls), I noticed a general attack scenario that could be of relevance for the padding framework implementation.
The general idea of the attack is that a malicious Tor client can, under some circumstances, repeatedly force the circuit padding logic of middle relays to reply with more padding bytes than non-padding bytes. After some time of sustained attacks, the target middle relay will develop a statistical circuit padding overhead percentage which is over the network-wide `circpad_global_max_padding_pct` limit (see [section 3.5](https://gitweb.torproject.org/tor.git/tree/doc/HACKING/CircuitPaddingDevelopment.md?id=33380f6b2770a3c4d9e2a9cc8a4b18c71a40571b#n614) of the documentation), at which point it will stop sending padding on any of its circuits.
Once the attack stops, the ratio between sent padding and sent non-padding of the relay goes back to normal over time while the relay serves non-padded cell data to legitimate clients, so it reverts automatically to a safe state again where it has working padding machines.
However, there are several notable details:
* To our knowledge, the currently rolled out padding machines can not be used to trigger this problem. This is likely only a problem for future machines with high volume that defend against website fingerprinting attacks.
* Due to the event-based nature of client padding machines, the suppression of padding generation at the middle relay means that the `CIRCPAD_EVENT_PADDING_RECV` transition on the client is never taken. This results in the client not sending any padding, or performing only a very limited subset of the regular padding behavior (depending on the machine definition). This amplifies the practical impact of the missing padding from the relay.
* Since the relay behavior change affects all of its circuits at once and may happen several times over a short time-span, the resulting traffic anomalies may be a fingerprinting opportunity for adversaries that are performing traffic analysis on the network.
* According to @pulls, the consequences of this attack on the relay server would only be logged at `info` level and may go unnoticed by relay operators.
* There is a threshold mechanism at the individual circuit machine level through the `relay->max_padding_percent` setting. At first glance, this appears to mitigate the described attack scenario. However, note that this limit is not applied to the first `relay->allowed_padding_count` number of cells, so an attacker can circumvent this limitation by repeatedly triggering new padding machines. Since it appears to be beneficial for padding machines to have this initial level of freedom over padding for their effectiveness against fingerprinting, the attack is unlikely to be mitigated with restrictive machine limits without also impacting the usefulness of the machines themselves.
* The attack can also be performed in the other direction by a malicious middle relay against clients. In this case, it would disable/degrade padding on all of the client circuits. However, we regard this variant as less impactful than client->relay attacks, in part because this attack is harder to scale.
Some initial recommendations:
* Higher severity logging when exceeding `circpad_global_max_padding_pct`.
* The consensus-provided global limits could be enforced *per circuit* instead of truly globally (see the sketch after this list). The root of the problem is shared state across circuits. This would still make the global padding limits useful, since in the future there might be multiple machines active throughout the lifetime of a circuit.
* Based on the information from @pulls , the `CircuitPaddingDisabled` (see issue [28693](https://gitlab.torproject.org/legacy/trac/-/issues/28693)) switch might be sufficient as an emergency fail-safe in case of network-wide padding problems.
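To illustrate the per-circuit variant, a deliberately simplified sketch. The struct and function names here are hypothetical and do not match tor's circpad internals, which track these counters globally; the point is only that the accounting state lives on the circuit:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-circuit padding accounting (illustration only). */
typedef struct circ_padding_counts_t {
  uint64_t padding_sent;
  uint64_t nonpadding_sent;
} circ_padding_counts_t;

/* Returns true if this circuit may send one more padding cell while staying
 * under max_pct percent padding, after a free allowance of allowed_cells.
 * Because the counters are per-circuit, one abusive client can only burn
 * its own circuit's allowance, not suppress padding network-wide. */
static bool
circ_padding_allowed(const circ_padding_counts_t *c,
                     uint64_t allowed_cells, uint64_t max_pct)
{
  uint64_t total = c->padding_sent + c->nonpadding_sent;
  if (total < allowed_cells)
    return true;  /* initial allowance, granted per circuit */
  return 100 * (c->padding_sent + 1) <= max_pct * (total + 1);
}
```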
I would like to thank @pulls for walking me through the various aspects of related network behavior and providing other helpful suggestions.

Assignee: Mike Perry

Issue 31653: Padding cells sent with 0ms delay cause circuit failures (pulls, 2023-06-09)
https://gitlab.torproject.org/tpo/core/tor/-/issues/31653

Below appears to be a bug that results in a closed circuit due to a relay sending a padding cell (RELAY_COMMAND_DROP) to the client.
At the links below you will find code for two circuit padding machines:
1. circpad_machine_client_close_circuit_minimal
2. circpad_machine_relay_close_circuit_minimal
Machine 1 runs on a client on CIRCUIT_PURPOSE_C_GENERAL circuits with 3 hops as soon as CIRCPAD_CIRC_OPENED: the only thing it does is set a timer 1000s in the future and wait for the timer to expire. The purpose of the machine is to negotiate padding with the relay to activate Machine 2 on the middle relay.
Machine 2 runs at the middle relay and repeatedly sends a padding cell to the client 1 usec after the event CIRCPAD_EVENT_NONPADDING_SENT. In other words, for every non-padding cell we directly add a padding cell. At the client, this causes `relay_decrypt_cell(): Incoming cell at client not recognized. Closing.` for all circuits triggering Machine 2 at the relay. Closing a circuit by injecting padding cells seems unintended.
Here is part of a log from a client on info level showing how the machine registers, negotiates with the relay (starting Machine 2 at the relay), eventually fails to decrypt, and the circuit closes.
```
Sep 05 10:32:19.000 [info] circpad_setup_machine_on_circ(): Registering machine client_close_circuit_minimal to origin circ 3 (5)
Sep 05 10:32:19.000 [info] circpad_node_supports_padding(): Checking padding: supported
Sep 05 10:32:19.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 3 (5), command 2
Sep 05 10:32:19.000 [info] link_apconn_to_circ(): Looks like completed circuit to [scrubbed] does allow optimistic data for connection to [scrubbed]
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Sending relay cell 0 on circ 3296464733 to begin stream 20575.
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Address/port sent, ap socket 13, n_circ_id 3296464733
Sep 05 10:32:19.000 [info] rep_hist_note_used_port(): New port prediction added. Will continue predictive circ building for 2355 more seconds.
Sep 05 10:32:19.000 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Sep 05 10:32:19.000 [info] exit circ (length 3): $[manually-scrubbed](open) $[manually-scrubbed](open) $[manually-scrubbed](open)
Sep 05 10:32:19.000 [info] pathbias_count_use_attempt(): Used circuit 3 is already in path state use attempted. Circuit is a General-purpose client currently open.
Sep 05 10:32:19.000 [info] link_apconn_to_circ(): Looks like completed circuit to [scrubbed] does allow optimistic data for connection to [scrubbed]
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Sending relay cell 0 on circ 3296464733 to begin stream 20576.
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Address/port sent, ap socket 14, n_circ_id 3296464733
Sep 05 10:32:19.000 [info] circpad_deliver_recognized_relay_cell_events(): Got padding cell on origin circuit 3.
Sep 05 10:32:20.000 [info] relay_decrypt_cell(): Incoming cell at client not recognized. Closing.
Sep 05 10:32:20.000 [info] circuit_receive_relay_cell(): relay crypt failed. Dropping connection.
Sep 05 10:32:20.000 [info] command_process_relay_cell(): circuit_receive_relay_cell (backward) failed. Closing.
Sep 05 10:32:20.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 3 (23)
Sep 05 10:32:20.000 [info] circpad_node_supports_padding(): Checking padding: supported
Sep 05 10:32:20.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 3 (23), command 1
Sep 05 10:32:20.000 [info] pathbias_send_usable_probe(): Sending pathbias testing cell to 0.233.9.53:25 on stream 20577 for circ 3.
Sep 05 10:32:20.000 [info] relay_decrypt_cell(): Incoming cell at client not recognized. Closing.
Sep 05 10:32:20.000 [info] circuit_receive_relay_cell(): relay crypt failed. Dropping connection.
Sep 05 10:32:20.000 [info] command_process_relay_cell(): circuit_receive_relay_cell (backward) failed. Closing.
Sep 05 10:32:20.000 [info] pathbias_send_usable_probe(): Got pathbias probe request for circuit 3 with outstanding probe
Sep 05 10:32:20.000 [info] pathbias_check_close(): Circuit 3 closed without successful use for reason 2. Circuit purpose 23 currently 1,open. Len 3.
Sep 05 10:32:20.000 [info] circuit_mark_for_close_(): Circuit 3296464733 (id: 3) marked for close at src/core/or/command.c:582 (orig reason: 2, new reason: 0)
Sep 05 10:32:20.000 [info] connection_edge_destroy(): CircID 0: At an edge. Marking connection for close.
Sep 05 10:32:20.000 [info] connection_edge_destroy(): CircID 0: At an edge. Marking connection for close.
Sep 05 10:32:20.000 [info] circuit_free_(): Circuit 0 (id: 3) has been freed.
```
If we delay sending the padding from the relay, I cannot trigger the bug (see the commented-out fix in the Machine 2 function). With the code below, the bug triggers on every circuit that has the machine negotiated.
Code: https://github.com/pylls/tor/tree/circuit-padding-relay-padding-bug (https://github.com/pylls/tor/blob/circuit-padding-relay-padding-bug/src/core/or/circuitpadding_machines.c#L460, as well as circuitpadding_machines.h and registration in circpad_machines_init() of circuitpadding.c).

Assignee: Mike Perry

Issue 40422: [CircuitPadding] circpad_add_matching_machines() should be called when a circuit has opened (Jaym, 2023-06-09)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40422

### Summary
The circuit padding framework supports negotiating padding upon various events. Among them, CIRCPAD_CIRC_OPENED states that a given padding machine should be applied to a circuit when a circuit has opened.
However, no code seems to trigger this mechanism. When a circuit has built, the function circpad_machine_event_circ_built() is called and checks whether some machine may be removed/added to the circuit. However, at this stage of the circuit building process, the circuit has built but is not marked as open yet.
### Bug
If some machine uses `client_machine->conditions.apply_state_mask = CIRCPAD_CIRC_OPENED;` the machine would only be applied when another event than a circ building/opening triggers the function circpad_add_matching_machines() (e.g., ap conn links a stream, or the circ purpose changes from general to something else).
### What is the expected behavior?
When circuituse.c calls circuit_has_opened(), it should also call the circpad module; e.g., a new function circpad_machine_event_circ_opened() that checks for adding machine to the circuit.
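A minimal sketch of that suggestion, modeled on the existing circpad_machine_event_circ_built(); the helper signatures are from memory and would need checking against the tree:

```c
/* Sketch only (untested): mirror circpad_machine_event_circ_built() for the
 * "opened" event, and call this from circuit_has_opened() in circuituse.c. */
void
circpad_machine_event_circ_opened(origin_circuit_t *on_circ)
{
  /* Re-evaluate machine conditions now that the circuit is marked open:
   * tear down machines whose conditions no longer apply, then add any
   * machine whose apply_state_mask includes CIRCPAD_CIRC_OPENED. */
  circpad_shutdown_old_machines(on_circ);
  circpad_add_matching_machines(on_circ, origin_padding_machines);
}
```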
### Environment
Running a version forked from 0.4.5.7
### Relevant logs and/or screenshots
Contains some logs showing a call to circpad_machine_event_circ_built() while the circuit is still marked as building. Also contains custom logs:
```
Jun 30 11:23:50.000 [info] circuit_finish_handshake(): Finished building circuit hop:
Jun 30 11:23:50.000 [info] internal (high-uptime) circ (length 3, last hop test000a): $22BA781A60C0CBB7FFAEA8858128427F67F60038(open) $7684DE04DCBB44538554E2CD1D14CDF836D5AF4D(open) $C7ADB1DBCE99F0B2ED2812B1953E4986EE9846DB(open)
Jun 30 11:23:50.000 [debug] dispatch_send_msg_unchecked(): Queued: ocirc_cevent (<gid=7 evtype=2 reason=0 onehop=0>) from or, on ocirc.
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering: ocirc_cevent (<gid=7 evtype=2 reason=0 onehop=0>) from or, on ocirc:
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering to btrack.
Jun 30 11:23:50.000 [debug] btc_cevent_rcvr(): CIRC gid=7 evtype=2 reason=0 onehop=0
Jun 30 11:23:50.000 [debug] circuit_build_times_add_time(): Adding circuit build time 43
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking condition state mask 21 vs condition: 2
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_event_circ_built(): Circpad module event circ built -- circ state: 0
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking condition state mask 21 vs condition: 2
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] invoke_plugin_operation_or_default(): Plugin found for caller calling a plugin in the circpad module when a circuit has built
Jun 30 11:23:50.000 [info] circpad_dropmark_activate_when_built(): Looks like the client_dropmark_def machine does not exist over this circuit
Jun 30 11:23:50.000 [debug] plugin_run(): Plugin execution returned -2147483648
Jun 30 11:23:50.000 [debug] plugin_run(): vm error message: (null)
Jun 30 11:23:50.000 [info] entry_guards_note_guard_success(): Recorded success for primary confirmed guard test002r ($22BA781A60C0CBB7FFAEA8858128427F67F60038)
Jun 30 11:23:50.000 [debug] dispatch_send_msg_unchecked(): Queued: ocirc_state (<gid=7 state=4 onehop=0>) from or, on ocirc.
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering: ocirc_state (<gid=7 state=4 onehop=0>) from or, on ocirc:
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering to btrack.
Jun 30 11:23:50.000 [debug] btc_state_rcvr(): CIRC gid=7 state=4 onehop=0
Jun 30 11:23:50.000 [info] circuit_build_no_more_hops(): circuit built!
Jun 30 11:23:50.000 [info] pathbias_count_build_success(): Got success count 3.000000/3.000000 for guard test002r ($22BA781A60C0CBB7FFAEA8858128427F67F60038)
Jun 30 11:23:50.000 [debug] circuit_has_opened(): calling circuit_has_opened()
```
### Possible fixes
Add a new function circpad_machine_event_circ_opened() called from circuituse.c when the circuit has opened.

Milestone: Tor: 0.4.8.x-freeze
Assignee: Mike Perry

Issue 40711: Tor with new IPv6 only works after a restart (Gus, 2022-12-16)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40711

A relay operator reported that their relay changes its IPv4 and IPv6 addresses every day. But while Tor picks up the new IPv4 address automatically, for the IPv6 connection they need to restart the service.
```
Auto-discovered IPv6 address [1111:::ffff]:9001 has not been found reachable. However, IPv4 address is reachable. Publishing server descriptor without IPv6 address. [2 similar message(s) suppressed in last 2400 seconds])
```
#### Tor version
0.4.7.10.
#### torrc
```
SocksPort 0
Log notice file /var/log/tor/notices.log
DataDirectory /var/lib/tor
ControlPort xxxxx
HashedControlPassword xxxxx
ORPort 9001
Nickname justme
RelayBandwidthRate 250 KB # Throttle traffic to 100KB/s (800Kbps)
RelayBandwidthBurst 750 KB # But allow bursts up to 200KB/s (1600Kbps)
ContactInfo me@there.net
DirPort 9005
ExitRelay 0
ExitPolicy reject *:* # no exits allowed
```
#### Forum link
https://forum.torproject.net/t/ipv6-with-dynamic-prefix-behind-nat/5296

Issue 40715: MetricsPort: inbound ORPort connections: relays vs. non-relay connections (cypherpunks, 2023-09-22)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40715

This got previously submitted on 2022-10-24 (https://gitlab.torproject.org/tpo/core/tor/-/issues/40194#note_2849481), but that issue got closed with a request for a new specific ticket for each new metric.
From last week's relay meetup we know that tor knows whether an incoming OR connection is from a client or from a relay without looking at the source IP address.
https://pad.riseup.net/p/tor-relay-op-meetup-o22-keep
From the metrics added in !625 (merged) we know that the increased CPU load correlates with an increase in the rate of new inbound OR connections. This rate increases when CPU load increases on exits:
```
rate(tor_relay_connections{type="OR",state="created",direction="received"}[$__rate_interval])
```
Could you please add a label for OR connections coming from clients vs. OR connections coming from other relays?
This would allow us to confirm that exits get more new inbound connections from clients when CPU load increases.
That new label could be `src`:
```
tor_relay_connections_total{type="OR",state="created",direction="received",src="relay"}
tor_relay_connections_total{type="OR",state="created",direction="received",src="non-relay"}
tor_relay_connections{type="OR",state="opened",direction="received",src="relay"}
tor_relay_connections{type="OR",state="opened",direction="received",src="non-relay"}
```

Issue 40716: Implement conflux for onion services (Mike Perry, 2022-11-28)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40716

Conflux is traffic splitting, and will result in increased throughput and reduced latency for onion services after a connection has been established, by routing traffic over multiple paths, or via the lowest-latency path to a service.
This ticket is for the onion service pieces of conflux (https://gitlab.torproject.org/tpo/core/tor/-/issues/40593).
We will not be implementing the onion services pieces of conflux as part of that ticket. It can be done later, if any onion service sponsors care about latency or throughput.
The pieces for onion services are:
- **Negotiation**
- [ ] Protover Advertisement for Onions (24h)
- [ ] Rend circuit linking (40h)
This is specified in https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/329-traffic-splitting.txt, but we probably want to allow onion services to configure their scheduler by manually choosing either BLEST or LowRTT, since different kinds of onion services may want to optimize for either throughput or latency.
There may be some additional work wrt making sure linked edge conns work properly, if they are handled differently for the onion service case.
Also, some shadow validation and performance testing will be needed. Maybe 40h or so of dev time (though much longer wall-clock time).

Issue 40717: Additional MetricsPort stats for various stages of the onion service handshake (Mike Perry, 2023-12-07)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40717

If we export additional onion service metrics, such as time measurements on the HSDIR, INTRO, and REND stages of circuit setup for both client and service side, and the number of timeouts/failures there, it would help to uncover the root cause of issues like https://gitlab.torproject.org/tpo/core/tor/-/issues/40570 and related reliability and connectivity issues with onion services.
We can also export congestion control info from https://gitlab.torproject.org/tpo/core/tor/-/issues/40708 to the onionservice metrics set, too, which can help us with tuning congestion control for onion services.
We can then hook up the onionperf onion service instances to our grafana dashboard, and gather more detailed stats that way, as a supplement to the metrics that get graphed on the metrics website.

Issue 40735: [WARN] Tried connecting to router ... identity keys were not as expected (cypherpunks, 2023-11-14)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40735
Background: Tor Browser 12.0, Tor 4.7.12, Windows 7, vanilla bridges.
Repeatedly getting the following log line.
```
[WARN] Tried connecting to router at *address* ID=<none> RSA_ID=*FP1*, but RSA + ed25519 identity keys were not as expected: wanted *FP1* + no ed25519 key but got *FP2* + *edFP*.
```
Ideas of what happened:
* MITM
* Bridge operator reinstalled it in-between me getting the bridge and now.
What is wrong:
* The bridge should be marked as unreachable: either it is already not being used and connections to it are doomed to waste resources for nothing, or it should not be used because something is clearly wrong with it
* There should be a way to distinguish the first idea from the second - my best guess is building a tunneled directory connection to the bridge authority and asking "Is there a bridge *FP2* and does it listen on *address*?"

Issue 40739: [warn] Possible compression bomb; abandoning stream. (computer_freak, 2023-11-18)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40739

Relay with `Tor 0.4.7.13`:
```
[warn] Possible compression bomb; abandoning stream.
[warn] Unable to decompress HTTP body (tried Zstandard compressed, on Directory connection (client reading) with 86.59.21.38:80).
```
obfs4 Bridge with `Tor 0.4.7.13`:
```
[warn] Possible compression bomb; abandoning stream.
[warn] Error while uncompressing data: bad input?
[warn] Unable to decompress HTTP body (tried Zstandard compressed, on Directory connection (client reading) with 95.214.53.221:443).
```
Relay with `Tor 0.4.8.0-alpha-dev`:
```
[warn] Possible compression bomb; abandoning stream.
[warn] Unable to decompress HTTP body (tried Zstandard compressed, on Directory connection (client reading) with 199.58.81.140:80).
```
There are no more logs on `notice` log level.
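For context, the warning comes from tor's anti-compression-bomb heuristic in the compress code; the sketch below is paraphrased from memory of src/lib/compress/compress.c, so the exact names and thresholds may differ:

```c
/* Paraphrased from memory (treat names and constants as approximate):
 * tor flags a directory stream as a possible compression bomb when the
 * decompressed output grows disproportionately to the compressed input. */
#define MAX_UNCOMPRESSION_FACTOR 25

static int
is_compression_bomb(size_t size_in, size_t size_out)
{
  /* Small outputs are always allowed, so tiny documents can expand freely. */
  if (size_in == 0 || size_out < 1024*2)
    return 0;
  return (size_out / size_in > MAX_UNCOMPRESSION_FACTOR);
}
```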
A user at the [forum](https://forum.torproject.net/t/compression-bomb-in-tor-logs/6226) has the same problem.

Assignee: Alexander Færøy (ahf@torproject.org)

Issue 40746: Conflicting logic about whether bridges need descriptors for fetching dir info from them (Roger Dingledine, 2023-04-12)
https://gitlab.torproject.org/tpo/core/tor/-/issues/40746

If you start your Tor with a pile of configured bridges but nothing cached, your Tor will sample the configured bridges to pick its ordered list of primary entry guards, and launch descriptor fetches to each of them.
But if the descriptor hasn't arrived yet, while trying to bootstrap dir info you get these confusing messages in your logs:
```
Jan 31 18:56:44.928 [notice] Ignoring directory request, since no bridge nodes are available yet.
```
Things do bootstrap eventually, but it takes longer than it should, and the pile of scary log messages is scary.
What's going on here?
The way the log message comes about is that directory_get_from_dirserver() calls
```
const node_t *node = guards_choose_dirguard(dir_purpose, &guard_state);
if (node && node->ri) {
[...]
} else {
[...]
log_notice(LD_DIR, "Ignoring directory request, since no bridge "
"nodes are available yet.");
}
```
i.e. guards_choose_dirguard had better return a bridge for which we have the descriptor, or we're going to log a complaint and abort the directory fetch attempt.
But in select_primary_guard_for_circuit(), we do
```
const int need_descriptor = (usage == GUARD_USAGE_TRAFFIC);
[...]
SMARTLIST_FOREACH_BEGIN(gs->primary_entry_guards, entry_guard_t *, guard) {
[...]
if (guard->is_reachable != GUARD_REACHABLE_NO) {
if (need_descriptor && !guard_has_descriptor(guard)) {
log_info(LD_GUARD, "Guard %s does not have a descriptor",
entry_guard_describe(guard));
continue;
}
```
That is, in select_primary_guard_for_circuit() we require that the bridge have a descriptor only for the GUARD_USAGE_TRAFFIC case, but then in directory_get_from_dirserver() we expect that the bridge will always have a descriptor, even in the GUARD_USAGE_DIRGUARD case.
In normal operation this bug isn't a big deal, because it is a race to finish fetching the descriptor before we happen to pick it for asking directory info. But with the #40578 fix, where we defer fetching the descriptor if we won't use the bridge for the GUARD_USAGE_TRAFFIC case, the bug becomes more obvious.
I believe the fix is simply to always set need_descriptor in select_primary_guard_for_circuit() -- meaning that when we're going to launch a directory fetch, we always choose among the primary guards that already have descriptors.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40774
libtor.a: pubsub_install tor_raw_abort (2024-03-20, sbs)

### Summary

We see OONI Probe Android crashes where `pubsub_install` calls `tor_raw_abort` for tor 0.4.7.13 using libtor.a embedded into a dynamic library loaded by an Android app. As of 2023-02-09 (around when we started investigating), this issue occurred 526 times in the last 28 days and was one of the main sources of crashes for the OONI Probe Android app.
A typical stack trace obtained from the Google Play console looks like this:
```
backtrace:
#00 pc 0x0000000000089b0c .../lib64/bionic/libc.so (abort+164)
#01 pc 0x00000000013778a4 .../split_config.arm64_v8a.apk (tor_raw_abort_+12)
#02 pc 0x0000000001382150 .../split_config.arm64_v8a.apk (tor_abort_+12)
#03 pc 0x00000000012470a0 .../split_config.arm64_v8a.apk (pubsub_install+120)
#04 pc 0x0000000001247170 .../split_config.arm64_v8a.apk (tor_run_main+136)
```
We investigated this issue and managed to reproduce it initially on OONI Probe Android, then on Linux using our Go code for managing libtor.a, and finally with a pure C test case running under Linux. During this investigation we never saw the first bootstrap fail. Rather, in some cases it took more than 30 repeated bootstraps to observe the abort; in other cases, it occurred within the first 3-10 bootstraps.
I searched the issue tracker for "pubsub", "pubsub_install", "SIGABRT", and "abort". AFAICT, there is no other open issue discussing this problem; however, I think https://gitlab.torproject.org/tpo/core/tor/-/issues/32729 may be related and roughly similar.
### Steps to reproduce:
The following steps allowed me to reproduce the problem on Ubuntu 22.04.2:
1. `git clone https://gitlab.torproject.org/tpo/core/tor`
2. `cd tor`
3. `git checkout tor-0.4.7.13`
4. `git apply 004.diff` where `004.diff` is
```diff
diff --git a/src/lib/pubsub/pubsub_check.c b/src/lib/pubsub/pubsub_check.c
index 99e604d715..a5cc4b7658 100644
--- a/src/lib/pubsub/pubsub_check.c
+++ b/src/lib/pubsub/pubsub_check.c
@@ -25,6 +25,7 @@
 #include "lib/malloc/malloc.h"
 #include "lib/string/compat_string.h"
 
+#include <stdio.h>
 #include <string.h>
 
 static void pubsub_adjmap_add(pubsub_adjmap_t *map,
@@ -343,21 +344,27 @@ lint_message(const pubsub_adjmap_t *map, message_id_t msg)
     log_warn(LD_MESG|LD_BUG,
              "Message \"%s\" has subscribers, but no publishers.",
              get_message_id_name(msg));
+    fprintf(stderr, "SBSDEBUG: n_pub == 0 for %s\n", get_message_id_name(msg));
     ok = false;
   } else if (n_sub == 0) {
     log_warn(LD_MESG|LD_BUG,
              "Message \"%s\" has publishers, but no subscribers.",
              get_message_id_name(msg));
+    fprintf(stderr, "SBSDEBUG: n_sub == 0 for %s\n", get_message_id_name(msg));
     ok = false;
   }
 
   /* Check the message graph topology. */
-  if (lint_message_graph(map, msg, pub, sub) < 0)
+  if (lint_message_graph(map, msg, pub, sub) < 0) {
+    fprintf(stderr, "SBSDEBUG: lint_message_graph failed for %s\n", get_message_id_name(msg));
     ok = false;
+  }
 
   /* Check whether the messages have the same fields set on them. */
-  if (lint_message_consistency(msg, pub, sub) < 0)
+  if (lint_message_consistency(msg, pub, sub) < 0) {
+    fprintf(stderr, "SBSDEBUG: lint_message_consistency failed for %s\n", get_message_id_name(msg));
     ok = false;
+  }
 
   if (!ok) {
     /* There was a problem -- let's log all the publishers and subscribers on
@@ -385,6 +392,7 @@ pubsub_adjmap_check(const pubsub_adjmap_t *map)
   bool all_ok = true;
   for (unsigned i = 0; i < map->n_msgs; ++i) {
     if (lint_message(map, i) < 0) {
+      fprintf(stderr, "SBSDEBUG: lint_message failed for %u %s\n", i, get_message_id_name((message_id_t)i));
       all_ok = false;
     }
   }
@@ -401,11 +409,15 @@ pubsub_builder_check(pubsub_builder_t *builder)
   pubsub_adjmap_t *map = pubsub_build_adjacency_map(builder->items);
   int rv = -1;
 
-  if (!map)
+  if (!map) {
+    fprintf(stderr, "SBSDEBUG: pubsub_build_adjacency_map failed\n");
     goto err; // should be impossible
+  }
 
-  if (pubsub_adjmap_check(map) < 0)
+  if (pubsub_adjmap_check(map) < 0) {
+    fprintf(stderr, "SBSDEBUG: pubsub_adjmap_check failed\n");
     goto err;
+  }
 
   rv = 0;
 err:
```
5. `./autogen.sh`
6. `./configure --disable-asciidoc`
7. `make`
8. `mkdir tmp`
9. `vi tmp/main.c` where `main.c` contains
```C
#include "../src/feature/api/tor_api.h"
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
static void *threadMain(void *ptr) {
int *fdp = (int*)ptr;
(void)sleep(45 /* seconds */);
(void)close(*fdp);
free(fdp);
return NULL;
}
int main() {
for (;;) {
tor_main_configuration_t *config = tor_main_configuration_new();
if (config == NULL) {
exit(1);
}
char *argv[] = {
"tor",
"Log",
"notice stderr",
"DataDirectory",
"./x",
NULL,
};
int argc = 5;
if (tor_main_configuration_set_command_line(config, argc, argv) != 0) {
exit(2);
}
int filedesc = tor_main_configuration_setup_control_socket(config);
if (filedesc < 0) {
exit(3);
}
int *fdp = malloc(sizeof(*fdp));
if (fdp == NULL) {
exit(4);
}
*fdp = filedesc;
pthread_t thread;
if (pthread_create(&thread, NULL, threadMain, /* move */ fdp) != 0) {
exit(5);
}
(void)tor_run_main(config);
if (pthread_join(thread, NULL) != 0) {
exit(6);
}
fprintf(stderr, "********** doing another round\n");
}
}
```
10. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
11. `./a.out 2>&1|tee LOG.txt`
The `tmp/main.c` program is a reasonable approximation of what our Go code for running tor does. The main difference is that we start tor with `DisableNetwork` set and re-enable networking later. This difference does not seem to matter, since we saw aborts in both cases.
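For reference, approximating our production setup more closely would only change the argv block in the program above (a hypothetical variant; `DisableNetwork` is a real tor option, which we later clear over the control port with `SETCONF DisableNetwork=0`):

```C
/* Variant of the argv above: start tor with networking disabled, as our
 * Go code does, and re-enable it later via the control port. */
char *argv[] = {
  "tor",
  "Log", "notice stderr",
  "DataDirectory", "./x",
  "DisableNetwork", "1",
  NULL,
};
int argc = 7;
```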
We run repeated bootstraps because the OONI Probe Android app loads tor and the Go code as a shared library and calls `tor_run_main` each time we run an OONI experiment that requires tor (typically `vanilla_tor` and `torsf`).
### What is the current bug behavior?
We can cluster the kinds of crashes we observed into two groups.
#### pubsub_adjmap_check failed
This crash has been the most frequent one we observed. With the above patch applied, it generally looks like this:
```
[... omitting logs from several bootstraps ...]
Mar 22 14:07:21.000 [notice] Owning controller connection has closed -- exiting now.
Mar 22 14:07:21.000 [notice] Catching signal TERM, exiting cleanly.
********** doing another round
SBSDEBUG: n_sub == 0 for orconn_state
SBSDEBUG: lint_message failed for 5 orconn_state
SBSDEBUG: n_pub == 0 for orconn_state
SBSDEBUG: lint_message failed for 34 orconn_state
SBSDEBUG: pubsub_adjmap_check failed
[1] 300227 IOT instruction (core dumped) ./a.out 2>&1 |
300228 done tee LOG.txt
```
When running this via Go code, we see a different message before the abort. I think this happens because Go installs its own handler for SIGABRT, while the C code does not install any handler. My understanding is also that "IOT instruction" refers to `SIGIOT`, which is an alias for `SIGABRT`, judging from include/linux/signal.h and glibc's bits/signum-generic.h.
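A quick way to confirm the alias on a glibc system:

```C
#include <signal.h>
#include <stdio.h>

/* On Linux/glibc, SIGIOT is defined as an alias for SIGABRT, which is
 * why the shell reports an abort as "IOT instruction". */
int main(void) {
  printf("SIGABRT=%d SIGIOT=%d\n", SIGABRT, SIGIOT);
  return 0;
}
```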
My understanding of the above logs is that, somehow, a message is registered twice: once without publishers, and once without subscribers.
It's also important to point out that the message causing the failure has not always been `orconn_state`. Based on all the aborts we have examined, `orconn_status` can also cause failures. For brevity, I am not copying all the logs we collected here, but you can read them, along with my thought process while analyzing the bug, at https://github.com/ooni/probe/issues/2406.
#### INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
This specific error occurred very rarely (2-3 times). It is not clear whether it is the same issue, but I think it makes sense to mention it here, because it occurred while using the above code to investigate pubsub_install aborts.
```
2023/03/21 17:59:13 info tunnel: tor: exec: <internal/libtor> x/tunnel/torsf/tor [...]
BUG: subsystem btrack (at 55) could not connect to publish/subscribe system.
============================================================ T= 1679421553
INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
A subsystem couldn't be connected.
./testtorsf(dump_stack_symbols_to_error_fds+0x58)[0xe6df08]
./testtorsf(tor_raw_assertion_failed_msg_+0x97)[0xe6e8d7]
./testtorsf(subsystems_add_pubsub_upto+0x128)[0xe47df8]
./testtorsf(pubsub_install+0x29)[0xdf9c99]
./testtorsf(tor_run_main+0x8a)[0xdf9e2a]
./testtorsf(_cgo_2d785783cadf_Cfunc_tor_run_main+0x1b)[0xdf665b]
./testtorsf[0x500e04]
SIGABRT: abort
PC=0x7fa00f89aa7c m=14 sigcode=18446744073709551610
signal arrived during cgo execution
```
(Because this specific error occurred when using Go code, you can also see the output of Go's `SIGABRT` handler here.)
The specific assertion that fails in this case is the following:
```C
int
subsystems_add_pubsub_upto(pubsub_builder_t *builder,
                           int target_level)
{
  for (unsigned i = 0; i < n_tor_subsystems; ++i) {
    const subsys_fns_t *sys = tor_subsystems[i];
    if (!sys->supported)
      continue;
    if (sys->level > target_level)
      break;
    if (! sys_status[i].initialized)
      continue;
    int r = 0;
    if (sys->add_pubsub) {
      subsys_id_t sysid = get_subsys_id(sys->name);
      raw_assert(sysid != ERROR_ID);
      pubsub_connector_t *connector;
      connector = pubsub_connector_for_subsystem(builder, sysid);
      r = sys->add_pubsub(connector);
      pubsub_connector_free(connector);
    }
    if (r < 0) {
      fprintf(stderr, "BUG: subsystem %s (at %u) could not connect to "
              "publish/subscribe system.", sys->name, sys->level);
      raw_assert_unreached_msg("A subsystem couldn't be connected."); // <- HERE
    }
  }
  return 0;
}
```
### What is the expected behavior?
On a very broad level, I think tor should not abort. Because I do not understand very well what is happening, it is difficult to provide a more specific recommendation about what the code should actually do.
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
Always 0.4.7.13
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
Android (several versions and devices according to the Google Play console); Android 13 on Pixel 4a arm64 (my phone); Ubuntu 22.04.2 on amd64
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
Tor compiled along with all its dependencies using our build scripts; and, when reproducing the issue with the steps above, tor compiled from source against the Ubuntu 22.04.2 system dependencies.
### Relevant logs and/or screenshots
I think I already provided representative logs above. The https://github.com/ooni/probe/issues/2406 issue contains all the logs we produced while investigating this issue on our end. It also describes how we progressively narrowed down the problem from an abort in the Android app to an abort using Go code on Linux to the minimal instructions for reproducing the issue that I mentioned above.
On this note, I initially suspected a data race on our end. There was indeed one, but the abort continued to occur after I fixed it inside the Go code. In any case, the possible presence of data races on our end prompted me to bypass our Go code and write C code that could reproduce the issue. In one of my final attempts at understanding the issue using just C code, I [patched tor to avoid aborting in case pubsub_install failed](https://github.com/ooni/probe/issues/2406#issuecomment-1479884981), recompiled, and ran with TSan enabled, [seeing just two pubsub_install failures over 490 runs and no sign of data races](https://github.com/ooni/probe/issues/2406#issuecomment-1480826748).
### Possible fixes
I don't know. Since the data-race theory is unlikely and not supported by the data, perhaps state left over from previous runs causes issues with the pubsub subsystem on repeated bootstraps? I'll be happy to collaborate and try other debugging strategies.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40836
Update recommended/required protocol lists? (2024-01-16, Nick Mathewson)

We haven't updated the recommended/required protocol lists since 2021, possibly longer. If we mark more protocols as required or recommended, we can more correctly reason about the network.
The protocols that are unconditionally supported by 0.4.7.7 (our oldest supported stable) are:
* `Cons=1-2 Desc=1-2 DirCache=2 FlowCtrl=1-2 HSDir=2 HSIntro=4-5 HSRend=1-2 Link=1-5 LinkAuth=3 Microdesc=1-2 Padding=2 Relay=1-4`
The protocols that are unconditionally supported by the most recent Arti are not currently listed anywhere or enforced in Arti. :disappointed: So maybe we should take care of that first?
The protocols that the consensus currently recommends are:
* `recommended-client-protocols Cons=2 Desc=2 DirCache=2 HSDir=2 HSIntro=4 HSRend=2 Link=4-5 Microdesc=2 Relay=2`
* `recommended-relay-protocols Cons=2 Desc=2 DirCache=2 HSDir=2 HSIntro=4 HSRend=2 Link=4-5 LinkAuth=3 Microdesc=2 Relay=2`
The protocols that the consensus currently requires are:
* `required-client-protocols Cons=2 Desc=2 Link=4 Microdesc=2 Relay=2`
* `required-relay-protocols Cons=2 Desc=2 DirCache=2 HSDir=2 HSIntro=4 HSRend=2 Link=4-5 LinkAuth=3 Microdesc=2 Relay=2`
cc @dgoulet @mikeperry
Assignee: Nick Mathewson

https://gitlab.torproject.org/tpo/core/tor/-/issues/40802
Dir auths say "Failed to find node for hop #2 of our path. Discarding this circuit." every second after boot until new consensus (2024-01-31, Roger Dingledine)

Starting somewhere in Tor 0.4.7, every directory authority now prints thousands of lines of
```
Jun 01 14:51:33.790 [notice] Failed to find node for hop #2 of our path. Discarding this circuit.
```
on startup. It continues until the top of the hour when
```
Jun 01 14:59:59.942 [notice] Failed to find node for hop #2 of our path. Discarding this circuit.
Jun 01 15:00:00.017 [notice] Time to publish the consensus and discard old votes
Jun 01 15:00:00.162 [notice] Published ns consensus
Jun 01 15:00:00.315 [notice] Published microdesc consensus
```

https://gitlab.torproject.org/tpo/core/tor/-/issues/40803
Cannot write to ClientOnionAuthDir when Sandbox is enabled (2023-06-05, anonym)

### Summary
When `tor` has the sandbox option enabled it cannot write to the `ClientOnionAuthDir` directory to store onion auth keys, e.g. when checking the "Remember this key" checkbox in Tor Browser when providing the key.
### Steps to reproduce:
1. Configure `tor` with `Sandbox 1`
2. Configure `tor` with `ClientOnionAuthDir /some/writable/directory`
3. Use Tor Browser to access an onion service with onion authentication
4. Check the "Remember this key" checkbox when providing the key
### What is the current bug behavior?
The onion auth prompt in Tor Browser reports "Unable to store creds for ...", and no key is written to the `ClientOnionAuthDir` directory.
### What is the expected behavior?
No errors, and the key should be written to the `ClientOnionAuthDir` directory.
### Environment
- Tor version 0.4.7.13
- Tested both on Debian Sid and inside Tails with `tor` installed via `apt`
### Relevant logs and/or screenshots
```
Jun 02 13:04:02.000 [warn] sandbox_intern_string(): Bug: No interned sandbox parameter found for /var/lib/tor/keys/n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd.auth_private.tmp (on Tor 0.4.7.13 )
Jun 02 13:00:25.000 [warn] Couldn't open "/var/lib/tor/keys/n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd.auth_private.tmp" (/var/lib/tor/keys/n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd.auth_private) for writing: Operation not permitted
Jun 02 13:00:25.000 [warn] Failed to write client auth creds file for n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd!
```
### Possible fixes
Update the sandbox rules.

Milestone: Tor: 0.4.9.x-freeze
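To illustrate the sandbox API the suggested fix would touch: the helper below is hypothetical (sandbox_cfg_allow_open_filename() and sandbox_cfg_allow_rename() exist in tor's sandbox code, but the helper and its integration point are invented, and the real difficulty is that the onion address, and hence the filename, is not known when the seccomp filter is installed at startup):

```C
/* Hypothetical helper (not the actual patch): whitelist the temporary and
 * final client-auth key paths for one known onion address. tor writes
 * "<addr>.auth_private.tmp" and then renames it into place, so both the
 * open() target and the rename() pair must be allowed. */
static void
sb_allow_onion_auth_file(sandbox_cfg_t **cfg,
                         const char *auth_dir, const char *addr)
{
  char *tmp_path = NULL, *final_path = NULL;
  tor_asprintf(&tmp_path, "%s/%s.auth_private.tmp", auth_dir, addr);
  tor_asprintf(&final_path, "%s/%s.auth_private", auth_dir, addr);
  sandbox_cfg_allow_open_filename(cfg, tor_strdup(tmp_path));
  sandbox_cfg_allow_rename(cfg, tmp_path, final_path);
}
```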
https://gitlab.torproject.org/tpo/core/tor/-/issues/40809
Bring padding machines into sync with Tobias's latest changes (2023-06-15, Mike Perry)

Tobias has added some changes to the padding machines in his latest research: in particular, the padding machine can respond to queue-length/congestion signals. I believe there are now also probabilistic transitions (i.e. https://gitlab.torproject.org/tpo/core/tor/-/issues/31636 or https://gitlab.torproject.org/tpo/core/tor/-/issues/31787).

I need to read his latest paper, sync with him, and discuss these things. This will generate new tickets.
Assignee: Mike Perry

https://gitlab.torproject.org/tpo/core/tor/-/issues/40822
[warn] Possible replay detected! An INTRODUCE2 cell with the same ENCRYPTED section was seen 0 seconds ago. Dropping cell. (2023-08-28, TimeIsGold)

I am running a hidden service using Tor 0.4.7.13 on Windows Server, and I am using obfs4 bridges for my nodes.
I get these messages every day and I don't know what the problem is:
`[warn] Possible replay detected! An INTRODUCE2 cell with the same ENCRYPTED section was seen 0 seconds ago. Dropping cell.`
`[notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 1000 buildtimes.`

https://gitlab.torproject.org/tpo/core/tor/-/issues/40831
null pointer dereference if threadpool initialization fails (2023-10-08, Alex Xu)

```
In function 'threadpool_register_reply_event',
inlined from 'cpuworker_init' at src/core/mainloop/cpuworker.c:140:13,
inlined from 'run_tor_main_loop' at src/app/main/main.c:1230:3,
inlined from 'tor_run_main' at src/app/main/main.c:1359:14,
inlined from 'tor_main' at src/feature/api/tor_api.c:166:12,
inlined from 'main' at src/app/main/tor_main.c:32:7:
src/lib/evloop/workqueue.c:631:9: warning: potential null pointer dereference [-Wnull-dereference]
631 | if (tp->reply_event) {
| ^
```
if `threadpool_new` fails, then `tp` will be null. `spawn_func` should not normally fail, and furthermore the result will most likely be a non-exploitable segmentation fault, but it is still technically undefined behavior and should be fixed.

Milestone: Tor: 0.4.9.x-freeze
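A minimal sketch of the missing guard (the call shapes in cpuworker_init() are assumptions inferred from the warning above, not the exact source):

```C
/* In cpuworker_init(): refuse to continue with a NULL pool instead of
 * letting threadpool_register_reply_event() dereference it. */
threadpool = threadpool_new(get_num_cpus(options), replyqueue,
                            worker_state_new, worker_state_free_void,
                            NULL);
if (!threadpool) {
  log_err(LD_GENERAL, "Unable to create the worker thread pool.");
  return; /* or propagate the failure so startup aborts cleanly */
}
threadpool_register_reply_event(threadpool, NULL);
```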
https://gitlab.torproject.org/tpo/core/tor/-/issues/40840
Prevent outbound cell command flipping (2024-02-13, Mike Perry)

As per https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/344-protocol-info-leaks.txt#L197, the RELAY_EARLY fix did not address the outbound direction.
We can fix this by checking at relays that the cell command field does not switch back and forth between RELAY and RELAY_EARLY. Then, so long as the middle relay is honest, this vector cannot be used as a covert channel between the Guard and the Exit.
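A rough sketch of such a relay-side check (the `seen_relay_cell` flag and its placement are invented for illustration; the cell commands and close reason are real):

```C
/* Once a circuit has carried an ordinary RELAY cell, any later
 * RELAY_EARLY in the same outbound direction is a RELAY->RELAY_EARLY
 * flip, so treat it as a protocol violation and close the circuit. */
if (cell->command == CELL_RELAY) {
  circ->seen_relay_cell = 1;
} else if (cell->command == CELL_RELAY_EARLY && circ->seen_relay_cell) {
  circuit_mark_for_close(circ, END_CIRC_REASON_TORPROTOCOL);
  return;
}
```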
This fix should be relatively simple and can be backported, though we should of course test it in shadow.
Assignee: Mike Perry

https://gitlab.torproject.org/tpo/core/tor/-/issues/40847
tor 0.4.8.4: compilation error on SunOS / OpenIndiana (2023-09-06, svschmel)

Compiling tor 0.4.8.4 throws the following error.
(Similar to https://gitlab.torproject.org/tpo/core/tor/-/issues/40843 => compilation error on NetBSD)
```
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c: In function 'hashx_vm_alloc_huge':
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c:113:5: error: 'MAP_HUGETLB' undeclared (first use in this function)
113 | | MAP_HUGETLB | MAP_POPULATE, -1, 0);
| ^~~~~~~~~~~
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c:113:5: note: each undeclared identifier is reported only once for each function it appears in
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c:113:19: error: 'MAP_POPULATE' undeclared (first use in this function); did you mean 'MAP_PRIVATE'?
113 | | MAP_HUGETLB | MAP_POPULATE, -1, 0);
| ^~~~~~~~~~~~
| MAP_PRIVATE
make[2]: *** [Makefile:16523: src/ext/equix/hashx/src/libhashx_a-virtual_memory.o] Error 1
make[2]: Leaving directory '/export/home/svschmel/oi-userland/components/network/tor/build/amd64'
make[1]: *** [Makefile:7648: all] Error 2
```
These defines do not exist on SunOS / OpenIndiana (see https://www.illumos.org/man/2/mmap).
I suggest handling SunOS like OpenBSD in "virtual_memory.c", because the default (the `#else` branch) assumes that the underlying OS is a Linux system.
```
...
#elif defined(__OpenBSD__)
mem = MAP_FAILED; // OpenBSD does not support huge pages
#else
mem = mmap(NULL, bytes, PAGE_READWRITE, MAP_PRIVATE | MAP_ANONYMOUS
| MAP_HUGETLB | MAP_POPULATE, -1, 0);
...
```
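The suggested change might look like this (using `__sun`, which illumos-based systems define, as the platform macro is my assumption):

```C
/* Sketch: treat SunOS/illumos like OpenBSD, since MAP_HUGETLB and
 * MAP_POPULATE do not exist there. */
#elif defined(__OpenBSD__) || defined(__sun)
	mem = MAP_FAILED; // no huge-page mmap flags on these platforms
#else
	mem = mmap(NULL, bytes, PAGE_READWRITE, MAP_PRIVATE | MAP_ANONYMOUS
	           | MAP_HUGETLB | MAP_POPULATE, -1, 0);
#endif
```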
With this modification, the code compiles successfully on SunOS / OpenIndiana.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40901
Document for the Relay Operator community how to debug relays that are slower than what the operator expects (2023-12-19, Alexander Færøy <ahf@torproject.org>)

This idea originated in a conversation between @beth, @gk, and me on #tor-dev today.
We often release new features in C Tor that cause discussions among relay operators about whether Tor has gotten faster or slower, uses more or less memory, crashes more or less often, and so on. Many of these reports are hard to answer with a definitive "yes, the cause of this is X", and it is very time-consuming for the Network Team to debug each one individually with the operator.
It would be very useful to have a document that tells relay operators about the different situations that can impact performance, and how they can collect performance measurements to compare against, so they can see whether our performance has truly regressed. This could also help push MetricsPort to more operators.
We can expand the document over time as we discover new ways to do this analysis and as feedback arrives from the relay operator community.
This is related to:
- https://lists.torproject.org/pipermail/tor-relays/2023-December/021409.html
- https://lists.torproject.org/pipermail/tor-relays/2023-December/021407.html
This may be relevant to Arti Relay too.
CC @mikeperry for awareness.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40907
Starting tor with --DormantOnFirstStartup doesn't set GETINFO dormant to 1 (2024-01-24, CrazyChaoz)

### Summary
Starting tor with --DormantOnFirstStartup 1 on a new DataDirectory doesn't set GETINFO dormant to 1
### Steps to reproduce:
1. start tor with ```tor --DormantOnFirstStartup 1 --DataDirectory some-empty-dir```
2. do a ```GETINFO dormant```
### What is the current bug behavior?
GETINFO dormant returns 0, as if it weren't in dormant mode
### What is the expected behavior?
GETINFO dormant returns 1, as it is in dormant mode
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
- 0.4.6.10 and 0.4.5.10
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
- Pop!_OS 22.04 and Ubuntu 18.04.6
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
- apt and ?

Milestone: Tor: 0.4.9.x-freeze