Tor issues - https://gitlab.torproject.org/tpo/core/tor/-/issues

## Amend greeting message with pointer to weather.torproject.org
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40777 · Georg Koppen · 2023-04-12

In `log_new_relay_greeting` we have a welcome message pointing to our lifecycle document:
```
tor_log(LOG_NOTICE, LD_GENERAL, "You are running a new relay. "
"Thanks for helping the Tor network! If you wish to know "
"what will happen in the upcoming weeks regarding its usage, "
"have a look at https://blog.torproject.org/lifecycle-of-a"
"-new-relay");
```
We should amend that message to point to our new Tor Weather service at weather.torproject.org as well, offering operators an easy way to get notified, e.g. in case their relay goes down.

## Unable to find IPv4 address for ORPort
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40776 · mfilipe · 2023-04-12

### Summary
Using a Tor Relay behind a NAT with dynamic IP address, the relay isn't able to get the IP.
The network only accepts IPv4.
### Steps to reproduce:
1. Configure router to NAT 9001 into Tor Relay;
2. Define `ORPort 9001 IPv4Only` in `torrc`;
3. Run the relay and wait until `journald` receives `Unable to find IPv4 address for ORPort`.
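The relevant torrc for these steps, together with the `Address` workaround mentioned under "Possible fixes" (the hostname is hypothetical):

```
# torrc (sketch; only the relevant lines shown)
ORPort 9001 IPv4Only
# Possible workaround: set an explicit address via a DNS name that tracks
# the dynamic IP (hostname is hypothetical). Note the concern below that
# tor may not pick up later IP changes once Address is set.
#Address relay.example.com
```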
### What is the current bug behavior?
The relay isn't able to identify which public IPv4 address it should use, so it isn't joining the consensus or receiving flags; it is being ignored by the Tor network.
### What is the expected behavior?
The relay should identify its public IPv4 address and join the Tor network, contributing bandwidth.
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
0.4.7.13
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
Ubuntu 22.04.2 LTS @ Raspberry Pi 3 Model B Plus.
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
Apt packages from https://deb.torproject.org
### Relevant logs and/or screenshots
[torrc](/uploads/e8fb1069bf46a8bba85f320e69123915/torrc)
debug.log
```
Mar 25 17:10:42.000 [info] update_consensus_networkstatus_downloads(): Launching microdesc standard networkstatus consensus download.
Mar 25 17:10:42.000 [debug] compute_weighted_bandwidths(): Generated weighted bandwidths for rule weight as directory based on weights Wg=0.412700 Wm=1.000000 We=0.000000 Wd=0.000000 with total bw 35527529420.000000
Mar 25 17:10:42.000 [debug] directory_initiate_request(): anonymized 0, use_begindir 1.
Mar 25 17:10:42.000 [debug] directory_initiate_request(): Initiating consensus network-status fetch
Mar 25 17:10:42.000 [info] connection_ap_make_link(): Making internal direct tunnel to [scrubbed]:9001 ...
Mar 25 17:10:42.000 [debug] connection_add_impl(): new conn type Socks, socket -1, address (Tor_internal), n_conns 143.
Mar 25 17:10:42.000 [info] connection_ap_make_link(): ... application connection created and linked.
Mar 25 17:10:42.000 [debug] connection_add_impl(): new conn type Directory, socket -1, address 185.21.216.197, n_conns 144.
Mar 25 17:10:42.000 [info] directory_send_command(): Downloading consensus from 185.21.216.197:9001 using /tor/status-vote/current/consensus-microdesc/0232AF+14C131+23D15D+27102B+49015F+E8A9C4+ED03BB+F533C8.z
Mar 25 17:10:42.000 [debug] directory_send_command(): Sent request to directory server 185.21.216.197:9001 (purpose: 14, request size: 350, payload size: 0)
Mar 25 17:10:42.000 [info] update_consensus_router_descriptor_downloads(): 0 router descriptors downloadable. 0 delayed; 6527 present (0 of those were in old_routers); 0 would_reject; 0 wouldnt_use; 0 in progress.
Mar 25 17:10:42.000 [debug] tor_rename(): Renaming /var/lib/tor/state.tmp to /var/lib/tor/state
Mar 25 17:10:42.000 [info] or_state_save(): Saved state to "/var/lib/tor/state"
Mar 25 17:10:42.000 [debug] prune_old_routers_callback(): Pruning routerlist...
Mar 25 17:10:42.000 [info] routerlist_remove_old_routers(): We have 6527 live routers and 13007 old router descriptors.
Mar 25 17:10:42.000 [debug] consdiffmgr_cleanup(): Looking for consdiffmgr entries to remove
Mar 25 17:10:42.000 [info] channel_check_for_duplicates(): Performed connection pruning. Found 3 connections to 33 relays. Found 3 current canonical connections, in 3 of which we were a non-canonical peer. 0 relays had more than 1 connection, 0 had more than 2, and 0 had more than 4 connections.
Mar 25 17:10:42.000 [info] router_rebuild_descriptor(): Rebuilding relay descriptor
Mar 25 17:10:42.000 [debug] get_address_from_config(): Attempting to get address from configuration
Mar 25 17:10:42.000 [info] get_address_from_config(): No Address option found in configuration.
Mar 25 17:10:42.000 [debug] get_address_from_orport(): Attempting to get address from ORPort
Mar 25 17:10:42.000 [info] address_can_be_used(): Address '0.0.0.0' is a private IP address. Tor relays that use the default DirAuthorities must have public IP addresses.
Mar 25 17:10:42.000 [debug] get_address_from_interface(): Attempting to get address from network interface
Mar 25 17:10:42.000 [debug] get_interface_address6(): Found internal interface address '192.168.13.3'
Mar 25 17:10:42.000 [info] address_can_be_used(): Address '192.168.13.3' is a private IP address. Tor relays that use the default DirAuthorities must have public IP addresses.
Mar 25 17:10:42.000 [debug] get_address_from_hostname(): Attempting to get address from local hostname
Mar 25 17:10:42.000 [info] address_can_be_used(): Address '127.0.1.1' is a private IP address. Tor relays that use the default DirAuthorities must have public IP addresses.
Mar 25 17:10:42.000 [info] find_my_address(): Unable to find our IP address.
Mar 25 17:10:42.000 [notice] Unable to find IPv4 address for ORPort 9001. You might want to specify IPv6Only to it or set an explicit address or set Address.
Mar 25 17:10:42.000 [info] router_build_fresh_unsigned_routerinfo(): Don't know my address while generating descriptor. Launching circuit to authority to learn it.
Mar 25 17:10:42.000 [warn] tor_bug_occurred_(): Bug: ../src/feature/relay/relay_find_addr.c:225: relay_addr_learn_from_dirauth: Non-fatal assertion !(!ei) failed. (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: Tor 0.4.7.13: Non-fatal assertion !(!ei) failed in relay_addr_learn_from_dirauth at ../src/feature/relay/relay_find_addr.c:225. Stack trace: (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(log_backtrace_impl+0x6c) [0xaaaad427e72c] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(tor_bug_occurred_+0x164) [0xaaaad4296bb4] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(relay_addr_learn_from_dirauth+0x1f0) [0xaaaad43d6964] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(+0x27d78c) [0xaaaad440d78c] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(router_build_fresh_descriptor+0x44) [0xaaaad423a0f4] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(router_rebuild_descriptor+0x94) [0xaaaad423a604] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(consider_publishable_server+0x44) [0xaaaad423aa84] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(+0x245b6c) [0xaaaad43d5b6c] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(+0x8832c) [0xaaaad421832c] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /lib/aarch64-linux-gnu/libevent-2.1.so.7(+0x1fd04) [0xffff95dbfd04] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /lib/aarch64-linux-gnu/libevent-2.1.so.7(event_base_loop+0x444) [0xffff95dc1868] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(do_main_loop+0xfc) [0xaaaad41fb51c] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(tor_run_main+0x1e0) [0xaaaad41ff4a0] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(tor_main+0x54) [0xaaaad41ff8e4] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(main+0x20) [0xaaaad41f2320] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /lib/aarch64-linux-gnu/libc.so.6(+0x273fc) [0xffff955573fc] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0xffff955574cc] (on Tor 0.4.7.13 )
Mar 25 17:10:42.000 [warn] Bug: /usr/bin/tor(_start+0x30) [0xaaaad41f23b0] (on Tor 0.4.7.13 )
```
### Possible fixes
Set `Address` to a hostname that points to my dynamic IP address; but as far as I know, when the relay is configured with `Address` and the public IP changes, the relay isn't able to pick up the change.

## libtor.a: pubsub_install tor_raw_abort
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40774 · sbs · 2024-03-20

### Summary
We see OONI Probe Android crashes where `pubsub_install` calls `tor_raw_abort` for tor 0.4.7.13 using libtor.a embedded into a dynamic library loaded by an Android app. As of 2023-02-09 (around when we started investigating), this issue occurred 526 times in the last 28 days and was one of the main sources of crashes for the OONI Probe Android app.
A typical stack trace obtained from the Google Play console looks like this:
```
backtrace:
#00 pc 0x0000000000089b0c .../lib64/bionic/libc.so (abort+164)
#01 pc 0x00000000013778a4 .../split_config.arm64_v8a.apk (tor_raw_abort_+12)
#02 pc 0x0000000001382150 .../split_config.arm64_v8a.apk (tor_abort_+12)
#03 pc 0x00000000012470a0 .../split_config.arm64_v8a.apk (pubsub_install+120)
#04 pc 0x0000000001247170 .../split_config.arm64_v8a.apk (tor_run_main+136)
```
We investigated this issue and managed to reproduce it, initially on OONI Probe Android, then on Linux using our Go code for managing libtor.a, and finally with a pure C test case running under Linux. During this investigation we never saw the first bootstrap failing. Rather, in some cases it took > 30 repeated bootstraps to observe the abort; in other cases, it occurred within the first 3-10 bootstraps.
I searched the issue tracker for "pubsub", "pubsub_install", "SIGABRT", and "abort". AFAICT, there is no other open issue discussing this problem; however, I think https://gitlab.torproject.org/tpo/core/tor/-/issues/32729 may be related and roughly similar.
### Steps to reproduce:
The following steps allowed me to reproduce the problem on Ubuntu 22.04.2:
1. `git clone https://gitlab.torproject.org/tpo/core/tor`
2. `cd tor`
3. `git checkout tor-0.4.7.13`
4. `git apply 004.diff` where `004.diff` is
```diff
diff --git a/src/lib/pubsub/pubsub_check.c b/src/lib/pubsub/pubsub_check.c
index 99e604d715..a5cc4b7658 100644
--- a/src/lib/pubsub/pubsub_check.c
+++ b/src/lib/pubsub/pubsub_check.c
@@ -25,6 +25,7 @@
#include "lib/malloc/malloc.h"
#include "lib/string/compat_string.h"
+#include <stdio.h>
#include <string.h>
static void pubsub_adjmap_add(pubsub_adjmap_t *map,
@@ -343,21 +344,27 @@ lint_message(const pubsub_adjmap_t *map, message_id_t msg)
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has subscribers, but no publishers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_pub == 0 for %s\n", get_message_id_name(msg));
ok = false;
} else if (n_sub == 0) {
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has publishers, but no subscribers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_sub == 0 for %s\n", get_message_id_name(msg));
ok = false;
}
/* Check the message graph topology. */
- if (lint_message_graph(map, msg, pub, sub) < 0)
+ if (lint_message_graph(map, msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_graph failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
/* Check whether the messages have the same fields set on them. */
- if (lint_message_consistency(msg, pub, sub) < 0)
+ if (lint_message_consistency(msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_consistency failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
if (!ok) {
/* There was a problem -- let's log all the publishers and subscribers on
@@ -385,6 +392,7 @@ pubsub_adjmap_check(const pubsub_adjmap_t *map)
bool all_ok = true;
for (unsigned i = 0; i < map->n_msgs; ++i) {
if (lint_message(map, i) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message failed for %u %s\n", i, get_message_id_name((message_id_t)i));
all_ok = false;
}
}
@@ -401,11 +409,15 @@ pubsub_builder_check(pubsub_builder_t *builder)
pubsub_adjmap_t *map = pubsub_build_adjacency_map(builder->items);
int rv = -1;
- if (!map)
+ if (!map) {
+ fprintf(stderr, "SBSDEBUG: pubsub_build_adjacency_map failed\n");
goto err; // should be impossible
+ }
- if (pubsub_adjmap_check(map) < 0)
+ if (pubsub_adjmap_check(map) < 0) {
+ fprintf(stderr, "SBSDEBUG: pubsub_adjmap_check failed\n");
goto err;
+ }
rv = 0;
err:
```
5. `./autogen.sh`
6. `./configure --disable-asciidoc`
7. `make`
8. `mkdir tmp`
9. `vi tmp/main.c` where `main.c` contains
```C
#include "../src/feature/api/tor_api.h"
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
static void *threadMain(void *ptr) {
int *fdp = (int*)ptr;
(void)sleep(45 /* seconds */);
(void)close(*fdp);
free(fdp);
return NULL;
}
int main() {
for (;;) {
tor_main_configuration_t *config = tor_main_configuration_new();
if (config == NULL) {
exit(1);
}
char *argv[] = {
"tor",
"Log",
"notice stderr",
"DataDirectory",
"./x",
NULL,
};
int argc = 5;
if (tor_main_configuration_set_command_line(config, argc, argv) != 0) {
exit(2);
}
int filedesc = tor_main_configuration_setup_control_socket(config);
if (filedesc < 0) {
exit(3);
}
int *fdp = malloc(sizeof(*fdp));
if (fdp == NULL) {
exit(4);
}
*fdp = filedesc;
pthread_t thread;
if (pthread_create(&thread, NULL, threadMain, /* move */ fdp) != 0) {
exit(5);
}
(void)tor_run_main(config);
if (pthread_join(thread, NULL) != 0) {
exit(6);
}
fprintf(stderr, "********** doing another round\n");
}
}
```
10. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
11. `./a.out 2>&1|tee LOG.txt`
The `tmp/main.c` program is a reasonable approximation of what our Go code for running tor does. The main difference is that we start tor with `DisableNetwork` set and re-enable the network later. This difference does not seem to have any impact, since we saw aborts in both cases.
We run repeated bootstraps because the OONI Probe Android app loads tor and the Go code as a shared library and calls `tor_run_main` each time we run an OONI experiment that requires tor (typically, `vanilla_tor` and `torsf`).
### What is the current bug behavior?
We can cluster the kind of crashes we observed into two groups.
#### pubsub_adjmap_check failed
This crash has been the most frequent one we observed. With the above patch applied, it generally looks like this:
```
[... omitting logs from several bootstraps ...]
Mar 22 14:07:21.000 [notice] Owning controller connection has closed -- exiting now.
Mar 22 14:07:21.000 [notice] Catching signal TERM, exiting cleanly.
********** doing another round
SBSDEBUG: n_sub == 0 for orconn_state
SBSDEBUG: lint_message failed for 5 orconn_state
SBSDEBUG: n_pub == 0 for orconn_state
SBSDEBUG: lint_message failed for 34 orconn_state
SBSDEBUG: pubsub_adjmap_check failed
[1] 300227 IOT instruction (core dumped) ./a.out 2>&1 |
300228 done tee LOG.txt
```
When running this via Go code, we see a different message before the abort. I think this happens because Go installs its own handler for SIGABRT, while the C code does not install any handler. My understanding is also that "IOT instruction" refers to `SIGIOT`, which is an alias for `SIGABRT` judging from include/linux/signal.h and glibc's bits/signum-generic.h.
My understanding of the above logs is that, somehow, a message is registered twice: once without publishers, and once without subscribers.
It's also important to point out that the message causing the failure has not always been `orconn_state`. Based on all the aborts we examined, it seems that `orconn_status` can also cause failures. For the sake of brevity, I am not going to copy all the logs we collected here, but you can read them, along with my thought process when analyzing the bug, at https://github.com/ooni/probe/issues/2406.
#### INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
This specific error occurred very rarely (2-3 times). It is not clear whether it is the same issue, but I think it makes sense to mention it here because it occurred while using the above code to investigate the pubsub_install aborts.
```
2023/03/21 17:59:13 info tunnel: tor: exec: <internal/libtor> x/tunnel/torsf/tor [...]
BUG: subsystem btrack (at 55) could not connect to publish/subscribe system.
============================================================ T= 1679421553
INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
A subsystem couldn't be connected.
./testtorsf(dump_stack_symbols_to_error_fds+0x58)[0xe6df08]
./testtorsf(tor_raw_assertion_failed_msg_+0x97)[0xe6e8d7]
./testtorsf(subsystems_add_pubsub_upto+0x128)[0xe47df8]
./testtorsf(pubsub_install+0x29)[0xdf9c99]
./testtorsf(tor_run_main+0x8a)[0xdf9e2a]
./testtorsf(_cgo_2d785783cadf_Cfunc_tor_run_main+0x1b)[0xdf665b]
./testtorsf[0x500e04]
SIGABRT: abort
PC=0x7fa00f89aa7c m=14 sigcode=18446744073709551610
signal arrived during cgo execution
```
(Because this specific error occurred when using Go code, here you see also the output of Go `SIGABRT` handler.)
The specific assertion that fails in this case is the following:
```C
int
subsystems_add_pubsub_upto(pubsub_builder_t *builder,
int target_level)
{
for (unsigned i = 0; i < n_tor_subsystems; ++i) {
const subsys_fns_t *sys = tor_subsystems[i];
if (!sys->supported)
continue;
if (sys->level > target_level)
break;
if (! sys_status[i].initialized)
continue;
int r = 0;
if (sys->add_pubsub) {
subsys_id_t sysid = get_subsys_id(sys->name);
raw_assert(sysid != ERROR_ID);
pubsub_connector_t *connector;
connector = pubsub_connector_for_subsystem(builder, sysid);
r = sys->add_pubsub(connector);
pubsub_connector_free(connector);
}
if (r < 0) {
fprintf(stderr, "BUG: subsystem %s (at %u) could not connect to "
"publish/subscribe system.", sys->name, sys->level);
raw_assert_unreached_msg("A subsystem couldn't be connected."); // <- HERE
}
}
return 0;
}
```
### What is the expected behavior?
On a very broad level, I think tor should not abort. Because I do not understand very well what is happening, it is difficult to provide a more specific recommendation about what the code should actually do.
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
Always 0.4.7.13
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
Android (several versions and devices according to the Google Play console); Android 13 on Pixel 4a arm64 (my phone); Ubuntu 22.04.2 on amd64
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
Tor compiled along with all its dependencies using our build scripts as well as tor compiled from sources with Ubuntu 22.04.2 installation dependencies when reproducing the issue using the above mentioned steps.
### Relevant logs and/or screenshots
I think I already provided representative logs above. The https://github.com/ooni/probe/issues/2406 issue contains all the logs we produced while investigating this issue on our end. It also describes how we progressively narrowed down the problem from an abort in the Android app to an abort using Go code on Linux to the minimal instructions for reproducing the issue that I mentioned above.
On this note, I initially suspected that there was a data race on our end. That suspicion was true, but the abort continued to occur after I fixed the data race inside the Go code. In any case, the possible presence of data races on our end prompted me to bypass our Go code and write C code that could reproduce the issue. In one of my final attempts at understanding the issue using just C code, I [patched tor to avoid aborting in case pubsub_install failed](https://github.com/ooni/probe/issues/2406#issuecomment-1479884981), recompiled, and ran with TSan enabled, [seeing just two pubsub_install failures over 490 runs and no sign of data races](https://github.com/ooni/probe/issues/2406#issuecomment-1480826748).
### Possible fixes
I don't know. Since the data-race theory is not supported by the data and seems unlikely, perhaps state left over from previous runs causes issues with the pubsub subsystem on repeated bootstraps? I'll be happy to collaborate and try other debugging strategies.

## Support range in RejectPlaintextPorts
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40772 · cypherpunks · 2023-04-12

For example, `RejectPlaintextPorts 1-442,444-65535` would force tor to accept only HTTPS and reject everything else.

## Decide whether to disable circuit cannibalization entirely
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40768 · gabi-250 · 2023-04-12

The discussions around #40570 reignited a discussion about circuit cannibalization, and whether the performance improvements it provides (if any) justify the maintenance costs of keeping it around.
This ticket is about deciding whether we should disable cannibalization in c-tor.

## Investigate high circuit build error rates in simulation
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40767 · gabi-250 · 2023-04-12

We ran some shadow simulations to debug/repro the issue from #40570, and @jnewsome noticed the onion service clients have consistently high [circuit build failure rates](https://gitlab.torproject.org/tpo/core/tor/-/issues/40570#note_2883257).
We should figure out what causes these circuit build failures.

## Introduce additional HS client timeouts
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40766 · gabi-250 · 2023-04-12

Today tor terminates any circuits that take too long to build (`circuit_build_times_handle_completed_hop`, `circuit_expire_building`). In addition to this circuit build timeout, we might want to introduce timeouts for circuits that were built successfully but are stuck waiting for:
* `INTRODUCE_ACK` (for intro circuits)
* `RENDEZVOUS_ESTABLISHED` (for rend circuits)
cc @dgoulet who suggested this potential improvement for c-tor/arti

## Re-think connection management and vote relays
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40765 · redbear · 2023-04-12

Hello people, how are you? :smile:
I'm here testing a few things, observing network usage with both tor 0.4.7.x and the dev version with some modifications, and I would like to share some observations and request some features that could reduce those DDoS attacks and the spam requests hitting guard connections.
In my first test phase I ran a normal Tor without modifications, observed Advertised Bandwidth usage, and found a few things to share:
1. A guard with the default features froze more often, swinging between full and low ISP capacity and taking hours to get back to normal
2. Usually these freezes are caused by malicious DDoS attacks and excessive requests
3. These "open bar" configurations overload the whole Tor network and make our contribution hard
4. They increase my energy bill and freeze my internet most of the time
The current implementation also gives me warnings like these:
[warn] Decrypting superencrypted desc failed.
[warn] Service descriptor decryption failed.
[warn] HSDesc parsing failed!
IP: 148.251.46.115 Port: 1
IP: 148.251.46.115 Port: 0
IP: 95.217.200.54 Port: 1
IP: 95.217.200.54 Port: 0
IP: 85.214.42.55 Port: 1
IP: 85.214.42.55 Port: 0
IP: 185.220.101.34 Port: 1
IP: 185.220.101.34 Port: 0
IP: 37.75.166.2 Port: 1
IP: 37.75.166.2 Port: 0
Here is another one I captured while running tor:
[notice] Application asked to connect to port 0. Refusing.
[warn] Rejecting SOCKS request for anonymous connection to private address [scrubbed].
To summarize my second phase: I took a dev tor and let people use what my guard/middle relay and my internet connection permit.
1. Limit the ports a guard can offer, to prevent flood requests and give already-connected clients good, stable connections
2. Accept basic ports such as 53, 80, 443, 5005, 8333 and reject everything between 2 and 8999
3. Observe how the ISP behaves during this phase and whether the connection still freezes
4. Detected some 1.1.1.1:443 attempts to use my guard for spam
Using this approach I got a better result than running the "open bar" setup. This suggests to me that most spam abuses the lower ports, overloading relays that offer low Advertised Bandwidth, which consequently get a very low Consensus Weight. I don't like to put my hands on the source code, because sometimes I don't feel comfortable coding on other people's work, so I would like to make some requests to prevent these things from happening to relays on low-quality ISPs and to give people more options without hurting relays' internet capacity.
Suggestions about how servers are configured and built:
1. Let ORPort take a port range
2. The server could restart after some time and use the ORPort range as the parameter to pick a new port; the current AUTO picks from 0-65535, and ISPs block some ranges or low ports
3. Keep DirPort static; no need to change anything
4. Add guard port limitations to prevent overload and spam requests
5. Add blacklist capability like SocksPolicy for IPs (I don't know if we already have this for guards too)
6. Build an open database of IPs known for scams/DDoS or other bad activity
7. Add options to the directory vote to make sure no misunderstandings arise because of these limitations
8. AUTO-detected ports could restart as many times as needed to discover which ports the ISP lets you use
I would like to keep advocating, running relays, and making more suggestions if possible, but that depends on the community being friendly and, in the near future, helping people like me build more relays or exits. For now that is all I can say and suggest. Thank you.

## Implement standard Prometheus metrics
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40763 · friendly73 · 2023-04-12
Most client libraries and applications export a few standard metrics that would be useful for tor to implement. Below are some examples from Prometheus' own `/metrics` page.
Build Info - We don't need as many labels as Go provides here, but the short version tag that appears in the relay consensus would be good.
```
# HELP prometheus_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which prometheus was built, and the goos and goarch for the build.
# TYPE prometheus_build_info gauge
prometheus_build_info{branch="HEAD",goarch="amd64",goos="linux",goversion="go1.19.5",revision="225c61122d88b01d1f0eaaee0e05b6f3e0567ac0",version="2.42.0"} 1
```
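A hypothetical tor equivalent might look like this (the metric name, label set, and value are assumptions, not an existing tor metric):

```
# HELP tor_build_info A metric with a constant '1' value labeled by the tor version.
# TYPE tor_build_info gauge
tor_build_info{version="0.4.7.13"} 1
```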
Process start time and last config reload - Useful for dashboard annotations and alerts.
```
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.67604147258e+09
# HELP prometheus_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE prometheus_config_last_reload_success_timestamp_seconds gauge
prometheus_config_last_reload_success_timestamp_seconds 1.6761274276477513e+09
```
CPU / memory of the current process - These might be a bit of a stretch, as they could be hard to implement in a cross-platform way and would require supporting float values for counters.
```
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 6232.69
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.924509696e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
```

## add MetricsPort metrics, labels, values documentation to man page
Issue https://gitlab.torproject.org/tpo/core/tor/-/issues/40762 · nat · 2023-04-15
As mentioned during the last relay operator meetup, please document the MetricsPort Prometheus metrics, their labels, and their values. In many cases a pointer to other documentation is enough; for example, in the case of tor relay flags you can point to the relevant tor specification section.
When new metrics/labels/values get introduced, they should get a manpage entry in the same merge request (like requiring a changes file) to keep this situation - metrics without documentation - from mounting up again in the future.
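For illustration, a manpage entry for one of these metrics might look like this (a sketch; the exact formatting and wording are assumptions):

```
tor_relay_traffic_bytes::
    Counter of bytes handled by the relay. The **direction** label takes
    the values "read" and "written".
```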
```
## HELP tor_relay_dos_total Denial of Service defenses related counters
## TYPE tor_relay_dos_total counter
tor_relay_dos_total
labels:
* type
* "circuit_rejected"
* "circuit_killed_max_cell"
* "circuit_killed_max_cell_outq"
* "marked_address"
* "marked_address_maxq"
* "conn_rejected"
* "concurrent_conn_rejected"
* "single_hop_refused"
* "introduce2_rejected"
## HELP tor_relay_traffic_bytes Traffic related counters
## TYPE tor_relay_traffic_bytes counter
tor_relay_traffic_bytes
labels:
* direction
* "read"
* "written"
## HELP tor_relay_load_oom_bytes_total Total number of bytes the OOM has freed by subsystem
## TYPE tor_relay_load_oom_bytes_total counter
tor_relay_load_oom_bytes_total
labels:
* subsys
* "cell"
* "dns"
* "geoip"
* "hsdir"
## HELP tor_relay_load_socket_total Total number of sockets
## TYPE tor_relay_load_socket_total gauge
tor_relay_load_socket_total
labels:
* state
* "opened"
## HELP tor_relay_streams_total Total number of streams
## TYPE tor_relay_streams_total counter
tor_relay_streams_total
labels:
* type
* "BEGIN"
* "BEGIN_DIR"
* "RESOLVE"
## HELP tor_relay_circuits_total Total number of circuits
## TYPE tor_relay_circuits_total gauge
tor_relay_circuits_total
labels:
* state
* "opened"
## HELP tor_relay_load_onionskins_total Total number of onionskins handled
## TYPE tor_relay_load_onionskins_total counter
tor_relay_load_onionskins_total
labels:
* type
* "tap"
* "fast"
* "ntor"
* "ntor_v3"
* action
* "processed"
* "dropped"
## HELP tor_relay_load_global_rate_limit_reached_total Total number of global connection bucket limit reached
## TYPE tor_relay_load_global_rate_limit_reached_total counter
tor_relay_load_global_rate_limit_reached_total
labels:
* side
* "read"
* "write"
## HELP tor_relay_load_tcp_exhaustion_total Total number of times we ran out of TCP ports
## TYPE tor_relay_load_tcp_exhaustion_total counter
tor_relay_load_tcp_exhaustion_total
## HELP tor_relay_congestion_control_total Congestion control related counters
## TYPE tor_relay_congestion_control_total counter
tor_relay_congestion_control_total
labels:
* state
* "starvation"
* "clock_stalls"
* "flow_control"
* "cc_limits"
* "cc_circuits"
* action
* "rtt_reset"
* "rtt_skipped"
* "xoff_num_sent"
* "xon_num_sent"
* "above_delta"
* "above_ss_cwnd_max"
* "below_ss_inc_floor"
* "circs_created"
* "circs_exited_ss"
## HELP tor_relay_exit_dns_query_total Total number of DNS queries done by this relay
## TYPE tor_relay_exit_dns_query_total counter
tor_relay_exit_dns_query_total
## HELP tor_relay_flag Relay flags from consensus
## TYPE tor_relay_flag gauge
tor_relay_flag
labels:
* type
* "Fast"
* "Exit"
* "Authority"
* "Stable"
* "HSDir"
* "Running"
* "V2Dir"
* "Sybil"
* "Guard"
## HELP tor_relay_connections Total number of opened connections
## TYPE tor_relay_connections gauge
tor_relay_connections
labels:
* type
* "OR listener"
* "OR"
* "Exit"
* "Socks listener"
* "Socks"
* "Directory listener"
* "Directory"
* "Control listener"
* "Control"
* "Transparent pf/netfilter listener"
* "Transparent natd listener"
* "DNS listener"
* "Extended OR"
* "Extended OR listener"
* "HTTP tunnel listener"
* "Metrics listener"
* "Metrics"
* direction
* "initiated"
* "received"
* state
* "opened"
* family
* "ipv4"
* "ipv6"
## HELP tor_relay_congestion_control Congestion control related gauges
## TYPE tor_relay_congestion_control gauge
tor_relay_congestion_control
labels:
* state
* "slow_start_exit"
* "on_circ_close"
* "buffers"
* "cc_backoff"
* "cc_cwnd_update"
* "cc_estimates"
* action
* "cwnd"
* "bdp"
* "inc"
* "ss_cwnd"
* "xon_outbuf"
* "xoff_outbuf"
* "chan_blocked_pct"
* "gamma_drop"
* "delta_drop"
* "ss_chan_blocked_pct"
* "alpha_pct"
* "beta_pct"
* "delta_pct"
* "ss_queue"
* "queue"
* "bdp"
## HELP tor_relay_connections_total Total number of created/rejected connections
## TYPE tor_relay_connections_total counter
tor_relay_connections_total
same labels/values as described for the tor_relay_connections metric above
## HELP tor_relay_exit_dns_error_total Total number of DNS errors encountered by this relay
## TYPE tor_relay_exit_dns_error_total counter
tor_relay_exit_dns_error_total
labels:
* reason
* "success"
* "format"
* "serverfailed"
* "notexist"
* "notimpl"
* "refused"
* "truncated"
* "unknown"
* "tor_timeout"
* "shutdown"
* "cancel"
* "nodata"
```https://gitlab.torproject.org/tpo/core/tor/-/issues/40761DDoS mitigation: analysis to understand relay-to-relay connections from non-relay IPs2023-04-12T14:46:14Zbnm
We are working on a tor proposal that should help
with protecting non-guard relays from a large fraction
of the DDoS load.
In first tests we have seen a 55% CPU usage decrease
when deploying our proposed mitigations, but we
want to make sure that we are not introducing an
overblocking problem. We know about a few
configurations where a relay will use a source IP that is not in
the consensus to connect to other relays (OutboundBindAddress, OutboundBindAddressOR),
but we would like to have some actual data about it.
To measure, understand and solve that potential problem and to
back up the proposal with some actual data we would
like to measure the following on our tor relays:
Log when our non-guard tor relays get
an authenticated relay-to-relay connection to our ORPort
from a source IP that is not in the consensus and not in
the exit lists:
```
timestamp relay-fingerprint source-IP
```
If the "and not in the exit lists" part is too hard,
we can take care of that in post-processing of the logs
to filter them out.
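That post-processing step could look like the following minimal sketch (the log format follows the `timestamp relay-fingerprint source-IP` layout above; the fingerprints and IPs are made up for illustration):

```python
# Drop log entries whose source IP appears in an exit list.
exit_ips = {"198.51.100.7", "203.0.113.9"}  # e.g. loaded from an exit list file

log_lines = [
    "2023-02-01T10:00:00 ABCD1234 192.0.2.1",
    "2023-02-01T10:00:05 EF567890 198.51.100.7",
]
kept = [line for line in log_lines if line.rsplit(" ", 1)[-1] not in exit_ips]
print(kept)
```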
We do not care about client-to-relay connections and do not want to log them.
Would it be possible to provide a patch or branch
that implements that logging on top of main?
It does not have to be in a release and we will run it
only temporarily.
thank you!https://gitlab.torproject.org/tpo/core/tor/-/issues/40754Configuration logic error when using Bridge and One-hop hidden service2023-04-12T14:45:38ZValdikSS### Summary
When Bridge and HiddenServiceSingleHopMode are configured simultaneously, Tor should refuse to run, but instead it runs and the hidden service is never reachable.
This is because using Bridge force-enables UseEntryGuards, while HiddenServiceSingleHopMode force-disables UseEntryGuards.
### Steps to reproduce:
1. Configure `UseBridges 1`, several `Bridge` lines, and a hidden service with `HiddenServiceSingleHopMode 1`, `HiddenServiceNonAnonymousMode 1`
2. Try to run Tor
### What is the current bug behavior?
Tor runs with both Bridge and HiddenServiceSingleHopMode set, however the hidden service is not reachable from the outside in such configuration.
If `UseEntryGuards 0` is explicitly set, Tor refuses to run, as expected.
### What is the expected behavior?
Tor refuses to run in such a configuration (or, better, it runs properly and the hidden service is reachable in one-hop mode with the bridge in use).
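A toy sketch of the consistency check the reporter expects: the two options imply opposite values of UseEntryGuards, so the combination should be rejected at config-parse time (the option names mirror torrc; the logic here is illustrative, not tor's actual validation code):

```python
# UseBridges force-enables entry guards; HiddenServiceSingleHopMode
# force-disables them, so both together cannot be satisfied.
def validate(use_bridges, single_hop_mode):
    if use_bridges and single_hop_mode:
        raise ValueError(
            "UseBridges requires UseEntryGuards, but "
            "HiddenServiceSingleHopMode requires UseEntryGuards 0")

validate(use_bridges=False, single_hop_mode=True)   # fine on its own
try:
    validate(use_bridges=True, single_hop_mode=True)
except ValueError as e:
    print("rejected:", e)
```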
### Environment
Tor version 0.4.7.13https://gitlab.torproject.org/tpo/core/tor/-/issues/40749tor process cannot be stopped sometimes2024-02-25T09:54:31Ztoralf### Summary
Sometimes the process does not accept `kill -15 <pid>`
### Steps to reproduce:
```
kill $(cat /run/tor/tor.pid)
```
has often no effect, whereas
```
kill -9 $(cat /run/tor/tor.pid)
```
works every time.
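As background for why the two signals behave differently: SIGTERM is only acted on if the process's handler gets a chance to run (tor catches it to shut down cleanly, so a wedged main loop can leave it unacted upon), while SIGKILL is enforced by the kernel and can never be caught. A self-contained illustration of the catchable case:

```python
# SIGTERM can be caught by a handler the process installs; SIGKILL cannot.
import os
import signal

got_term = False

def on_term(signum, frame):
    global got_term
    got_term = True

signal.signal(signal.SIGTERM, on_term)
os.kill(os.getpid(), signal.SIGTERM)  # equivalent to `kill -15 <pid>`
print("handled SIGTERM:", got_term)
```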
### Environment
```
# tor --version
Tor version 0.4.8.0-alpha-dev (git-a9c7cd6b2c08eed9).
Tor is running on Linux with Libevent 2.1.12-stable, OpenSSL 1.1.1s, Zlib 1.2.13, Liblzma 5.4.1, Libzstd 1.5.2 and Glibc 2.36 as libc.
Tor compiled with GCC version 12.2.1
mr-fox ~ #
```
This is a hardened stable Gentoo linux:
```
uname -a
Linux mr-fox 6.1.10 #13 SMP PREEMPT_DYNAMIC Mon Feb 6 19:03:09 UTC 2023 x86_64 AMD Ryzen 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
```
FWIW I do have 3 Tor processes here running in parallel, often only 1 of them is affected.
The only patch on top of git is
```
mr-fox ~ # cat /etc/portage/patches/net-vpn/tor/tor-tasks-per-cpu.patch
diff --git a/src/core/mainloop/cpuworker.c b/src/core/mainloop/cpuworker.c
index 9ad8939e4d..eaf1f9dee1 100644
--- a/src/core/mainloop/cpuworker.c
+++ b/src/core/mainloop/cpuworker.c
@@ -85,7 +85,7 @@ get_max_pending_tasks_per_cpu(const networkstatus_t *ns)
{
/* Total voodoo. Can we make this more sensible? Maybe, that is why we made it
* a consensus parameter so our future self can figure out this magic. */
-#define MAX_PENDING_TASKS_PER_CPU_DEFAULT 64
+#define MAX_PENDING_TASKS_PER_CPU_DEFAULT 1024
#define MAX_PENDING_TASKS_PER_CPU_MIN 1
#define MAX_PENDING_TASKS_PER_CPU_MAX INT32_MAX
```
config:
```
mr-fox ~ # cat /etc/tor/torrc3
# torrc
#
PIDFile /var/run/tor/tor3.pid
DataDirectory /var/lib/tor/data3
OfflineMasterKey 0
Nickname toralf3
Address 65.21.94.13
OutboundBindAddress 65.21.94.13
OutboundBindAddress [2a01:4f9:3b:468e::13]
ORPort 65.21.94.13:8443
ORPort [2a01:4f9:3b:468e::13]:8443
SocksPort 0
ControlPort 127.0.0.1:39051
ControlPort [::1]:39051
MetricsPort 127.0.0.1:39052
MetricsPortPolicy accept 127.0.0.1
#Log info file /tmp/info2.log
Log notice file /var/log/tor/notice3.log
Log warn file /var/log/tor/warn3.log
Log err file /var/log/tor/err3.log
%include /etc/tor/conf.d/
mr-fox ~ # cat /etc/tor/conf.d/00_common
User tor
SandBox 1
#OfflineMasterKey 1
MyFamily 63BF46A63F9C21FD315CD061B3EAA3EB05283A0A,509EAB4C5D10C9A9A24B4EA0CE402C047A2D64E6,4FC26DC244109105AE131628BDB0C84F2D710941
# https://torcontactinfogenerator.netlify.app/
ContactInfo email:toralf.foerster[]gmx.de url:https://zwiebeltoralf.de/ proof:uri-rsa abuse:abuse[]zwiebeltoralf.de gpg:1A376F994A9D026F13E24DCFC4EACDDE0076E94E ciissversion:2
CookieAuthentication 1
# https://stem.torproject.org/tutorials/mirror_mirror_on_the_wall.html
FetchDirInfoEarly 1
FetchDirInfoExtraEarly 1
FetchUselessDescriptors 1
DownloadExtraInfo 1
MetricsPortPolicy accept 127.0.0.1
AvoidDiskWrites 1
ConnDirectionStatistics 1
mr-fox ~ # cat /etc/tor/conf.d/90_reject
ExitPolicy reject *:*
```https://gitlab.torproject.org/tpo/core/tor/-/issues/40747android: fdsan SIGABRT tor_main_configuration_free2023-04-12T14:45:21Zsbs### Summary
When testing tor-0.4.7.13 on Android 13, I experienced a `SIGABRT` in `tor_main_configuration_free` caused by [fdsan](https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md). The reason why fdsan causes an abort is that the owning control socket is closed twice. I noticed this crash as part of testing a release candidate of [OONI Probe Android](https://github.com/ooni/probe-android/) where we embed `libtor.a`.
The corresponding OONI Probe issue is: https://github.com/ooni/probe/issues/2405.
### Steps to reproduce
I do not have a very simple procedure to reproduce the issue that does not involve OONI Probe and its build system.
However, the underlying issue is independent of Android. The only reason why Android matters is that the fdsan notices the double close of the same file descriptor and hence triggers a crash (for Android API level >= 30).
Because of this, here are instructions to reproduce the underlying issue using GNU/Linux (I used Ubuntu 22.04):
1. clone tor
2. `git checkout tor-0.4.7.13`
3. `git apply tor.diff` where `tor.diff` is the following patch:
```diff
diff --git a/src/core/mainloop/connection.c b/src/core/mainloop/connection.c
index cf25213cb1..d690de3892 100644
--- a/src/core/mainloop/connection.c
+++ b/src/core/mainloop/connection.c
@@ -149,6 +149,8 @@
#include "core/or/congestion_control_flow.h"
+#include <stdio.h>
+
/**
* On Windows and Linux we cannot reliably bind() a socket to an
* address and port if: 1) There's already a socket bound to wildcard
@@ -949,6 +951,7 @@ connection_free_minimal(connection_t *conn)
if (SOCKET_OK(conn->s)) {
log_debug(LD_NET,"closing fd %d.",(int)conn->s);
+ fprintf(stderr, "SBSDEBUG: connection_free_minimal %lld\n", (long long)conn->s);
tor_close_socket(conn->s);
conn->s = TOR_INVALID_SOCKET;
}
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..fb49d92ad7 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -116,6 +116,11 @@ tor_main_configuration_setup_control_socket(tor_main_configuration_t *cfg)
cfg_add_owned_arg(cfg, "__OwningControllerFD");
cfg_add_owned_arg(cfg, buf);
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_setup_control_socket %lld %lld\n",
+ (long long)fds[0], (long long)fds[1]
+ );
+
cfg->owning_controller_socket = fds[1];
return fds[0];
}
@@ -132,6 +137,10 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
raw_free(cfg->argv_owned);
}
if (SOCKET_OK(cfg->owning_controller_socket)) {
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_free %lld\n",
+ (long long)cfg->owning_controller_socket
+ );
raw_closesocket(cfg->owning_controller_socket);
}
raw_free(cfg);
```
4. `./autogen.sh`
5. `./configure --disable-asciidoc`
6. `make`
7. `mkdir tmp`
8. `vi tmp/main.c` making sure it contains the following content:
```C
#include "../src/feature/api/tor_api.h"
#include <stdlib.h>
#include <unistd.h>
int main() {
tor_main_configuration_t *cfg = tor_main_configuration_new();
if (cfg == NULL) {
exit(1);
}
tor_control_socket_t sock = tor_main_configuration_setup_control_socket(cfg);
if (sock == INVALID_TOR_CONTROL_SOCKET) {
exit(2);
}
(void)close(sock); // close immediately (it's async on Android but it should not matter AFAICT)
(void)tor_run_main(cfg);
tor_main_configuration_free(cfg);
}
```
9. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
10. `./a.out` which should produce this output:
```
SBSDEBUG: tor_main_configuration_setup_control_socket 4 5
Feb 02 17:18:07.330 [notice] Tor 0.4.7.13 (git-7c1601fb6edd780f) running on Linux with Libevent 2.1.12-stable, OpenSSL 3.0.2, Zlib 1.2.11, Liblzma N/A, Libzstd N/A and Glibc 2.35 as libc.
Feb 02 17:18:07.330 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://support.torproject.org/faq/staying-anonymous/
Feb 02 17:18:07.330 [notice] Configuration file "/usr/local/etc/tor/torrc" not present, using reasonable defaults.
Feb 02 17:18:07.331 [notice] Opening Socks listener on 127.0.0.1:9050
Feb 02 17:18:07.331 [notice] Opened Socks listener connection (ready) on 127.0.0.1:9050
Feb 02 17:18:07.000 [notice] Bootstrapped 0% (starting): Starting
Feb 02 17:18:07.000 [notice] Starting with guard context "default"
Feb 02 17:18:07.000 [notice] Owning controller connection has closed -- exiting now.
SBSDEBUG: connection_free_minimal 5
Feb 02 17:18:07.000 [notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 8
SBSDEBUG: tor_main_configuration_free 5
```
If you analyze the above output, you can see that the file descriptor `5` is closed twice. This output is almost identical to the output that I have seen in the Android logcat (more on that below). Also, I _think_ the way in which I am using the embedding API above (which mirrors our more complex implementation written in Go) is fine; if not, please educate me.
(If you want to reproduce the same problem I experience on Android, I can either explain how to compile and test OONI Probe for Android, or I can try to work on creating a simple Android PoC like the one above.)
### What is the current bug behavior?
We're in an embedding scenario where we eventually call `tor_run_main`, as mentioned in the previous section.
This is the sequence of APIs we call along with my best understanding of what happens inside `tor`:
We create a configuration using `tor_main_configuration_new`.
Calling `tor_main_configuration_setup_control_socket` creates a pair of sockets, returns `fds[0]` to us, and retains `fds[1]` inside `tor_main_configuration_t::owning_controller_socket`.
Calling `tor_run_main` calls (in a way that is not 100% clear to me) `options_act` that passes the `owning_controller_socket` to `control_connection_add_local_fd`. In turn, this function registers the file descriptor `fds[1]` as the control connection.
Eventually we `close` the `fds[0]` that was returned to us, which causes `tor` to stop its libevent loop.
When `tor_run_main` terminates, it calls `tor_cleanup`, which calls `tor_free_all`, which calls `connection_free_all`, which calls `connection_free_minimal` for each connection, including the owning file descriptor `fds[1]`.
After `tor_run_main`, we call `tor_main_configuration_free`. In turn, this function calls `raw_closesocket` on the `owning_controller_socket`, which is hence closed for the second time.
On Android with API level >= 30, the [fdsan](https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md) sanitizer notices the second close and _sometimes_ (roughly 50%) this fact causes the app to abort.
### What is the expected behavior?
I think tor should duplicate the file descriptor before registering it into the core event loop such that there is a single owner of each of the two duplicates. The `tor_main_configuration_free` function owns one of them and the core event loop owns the other one. This semantics shouldn't cause any issue with the fdsan sanitizer because it's designed to enforce it.
Alternatively, it should probably be documented to use API level < 30 (where the fdsan only warns). Or it should be documented that one should use the proper fdsan API to disable crashing on double close. (I do not remember seeing these warnings when I read how to use the embedding API and a quick `git grep fdsan` or `git grep "API level"` did not return anything, but it's still possible that I overlooked _some_ documentation about this issue.)
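A minimal sketch of the duplication idea (using a Python socketpair as a stand-in for the `fds` pair, not tor's actual API): each owner holds its own descriptor, so each close is unique and the semantics fdsan enforces are satisfied.

```python
# The event loop and the configuration owner each close a distinct fd,
# avoiding the double close that fdsan flags. socketpair() stands in for
# the pair created by tor_main_configuration_setup_control_socket.
import os
import socket

parent_sock, child_sock = socket.socketpair()
owned_fd = child_sock.detach()   # like cfg->owning_controller_socket
loop_fd = os.dup(owned_fd)       # duplicate registered with the event loop

os.close(loop_fd)   # event-loop teardown closes its own copy...
os.close(owned_fd)  # ...and the configuration owner closes the original
parent_sock.close()
print("each descriptor closed exactly once")
```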
### Environment
- Which version of Tor are you using?
tor-0.4.7.13
- Which operating system are you using?
Android 13 (but I also provided a minimal example on GNU/Linux)
- Which installation method did you use?
We cross compile tor for Android using [our cross compilation scripts](https://github.com/ooni/probe-cli/tree/v3.17.0-alpha.1/internal/cmd/buildtool).
We obtain a static set of libraries and a `tor_api.h` that we link as part of building an AAR with [go mobile](https://github.com/golang/mobile).
We use the obtained AAR as a dependency for [OONI Probe Android](https://github.com/ooni/probe-android/).
The code that specifically invokes tor [is written in Go](https://github.com/ooni/probe-android/). The sequence of events in terms of the Tor embedding API is the one I described above in the "what is the current bug behavior?" section.
However, I have also provided a minimal example for GNU/Linux that shows the double-close issue.
### Relevant logs and/or screenshots
The following is an excerpt from the tombstone generated by the crashing app:
```
[notice] Catching signal TERM, exiting cleanly.
fdsan: attempted to close file descriptor 104, expected to be unowned, \
actually owned by unique_fd 0x70b7e5f19c
[...]
ABI: 'arm64'
Timestamp: 2023-02-02 11:39:31.181519664+0100
Cmdline: org.openobservatory.ooniprobe.experimental
pid: 16472, tid: 16593, name: AsyncTask #1 >>> org.openobservatory.ooniprobe.experimental <<<
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
[...]
backtrace:
#00 pc 0000000000055c48 .../lib64/bionic/libc.so (fdsan_error(char const*, ...)+556) (...)
#01 pc 0000000000055954 .../lib64/bionic/libc.so (android_fdsan_close_with_tag+732) (...)
#02 pc 00000000000560a8 .../lib64/bionic/libc.so (close+16) (...)
#03 pc 00000000012ad08c [...]/lib/arm64/libgojni.so (tor_main_configuration_free+128)
```
If I apply the patch that I called `tor.diff` above and run OONI Probe on Android, I see this in the logcat:
```
SBSDEBUG: tor_main_configuration_setup_control_socket 94 98 // <- fds[0] and fds[1]
[...]
[notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 141
SBSDEBUG: connection_free_minimal 98 // <- first close of fds[1]
SBSDEBUG: connection_free_minimal 116
SBSDEBUG: connection_free_minimal 152
SBSDEBUG: tor_main_configuration_free 98 // <- second close of fds[1]
```
### Possible fixes
The following patch makes the app work as intended (i.e., no crashes for several runs):
```diff
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..2773949264 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -131,9 +131,13 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
}
raw_free(cfg->argv_owned);
}
+ /* See https://github.com/ooni/probe/issues/2405 to understand
why we're not closing the controller socket here. */
+ /*
if (SOCKET_OK(cfg->owning_controller_socket)) {
raw_closesocket(cfg->owning_controller_socket);
}
+ */
raw_free(cfg);
}
```
That said, I think this patch is wrong because it leaks the file descriptor when `tor_run_main` returns prematurely (e.g., when the command line flags are wrong). Because of this, I think the more robust fix would be to duplicate the file descriptor before registering it into the libevent loop, as I explained above.Alexander Færøyahf@torproject.orghttps://gitlab.torproject.org/tpo/core/tor/-/issues/40746Conflicting logic about whether bridges need descriptors for fetching dir info from them2023-04-12T14:42:46ZRoger DingledineIf you start your Tor with a pile of configured bridges but nothing cached, your Tor will sample the configured bridges to pick its ordered list of primary entry guards, and launch descriptor fetches to each of them.
But if the descriptor hasn't arrived yet, while trying to bootstrap dir info you get these confusing messages in your logs:
```
Jan 31 18:56:44.928 [notice] Ignoring directory request, since no bridge nodes are available yet.
```
Things do bootstrap eventually, but it takes longer than it should, and the pile of scary log messages is scary.
What's going on here?
The way the log message comes about is that directory_get_from_dirserver() calls
```
const node_t *node = guards_choose_dirguard(dir_purpose, &guard_state);
if (node && node->ri) {
[...]
} else {
[...]
log_notice(LD_DIR, "Ignoring directory request, since no bridge "
"nodes are available yet.");
}
```
i.e. guards_choose_dirguard had better return a bridge for which we have the descriptor, or we're going to log a complaint and abort the directory fetch attempt.
But in select_primary_guard_for_circuit(), we do
```
const int need_descriptor = (usage == GUARD_USAGE_TRAFFIC);
[...]
SMARTLIST_FOREACH_BEGIN(gs->primary_entry_guards, entry_guard_t *, guard) {
[...]
if (guard->is_reachable != GUARD_REACHABLE_NO) {
if (need_descriptor && !guard_has_descriptor(guard)) {
log_info(LD_GUARD, "Guard %s does not have a descriptor",
entry_guard_describe(guard));
continue;
}
```
That is, in select_primary_guard_for_circuit() we require that the bridge have a descriptor only for the GUARD_USAGE_TRAFFIC case, but then in directory_get_from_dirserver() we expect that the bridge will always have a descriptor, even in the GUARD_USAGE_DIRGUARD case.
In normal operation this bug isn't a big deal, because it is a race to finish fetching the descriptor before we happen to pick it for asking directory info. But with the #40578 fix, where we defer fetching the descriptor if we won't use the bridge for the GUARD_USAGE_TRAFFIC case, the bug becomes more obvious.
I believe the fix is simply to always need_descriptor in select_primary_guard_for_circuit() -- meaning when we're going to launch a directory fetch we always choose among our primary guards who have descriptors already.https://gitlab.torproject.org/tpo/core/tor/-/issues/40739[warn] Possible compression bomb; abandoning stream.2023-11-18T19:42:40Zcomputer_freakRelay with `Tor 0.4.7.13`:
```
[warn] Possible compression bomb; abandoning stream.
[warn] Unable to decompress HTTP body (tried Zstandard compressed, on Directory connection (client reading) with 86.59.21.38:80).
```
obfs4 Bridge with `Tor 0.4.7.13`:
```
[warn] Possible compression bomb; abandoning stream.
[warn] Error while uncompressing data: bad input?
[warn] Unable to decompress HTTP body (tried Zstandard compressed, on Directory connection (client reading) with 95.214.53.221:443).
```
Relay with `Tor 0.4.8.0-alpha-dev`:
```
[warn] Possible compression bomb; abandoning stream.
[warn] Unable to decompress HTTP body (tried Zstandard compressed, on Directory connection (client reading) with 199.58.81.140:80).
```
There are no more logs on `notice` log level.
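For context, this class of warning comes from a guard that caps how much decompressed output a stream may produce. A rough sketch of the idea using zlib (the limit and message here are made up; tor's actual checks, thresholds, and Zstandard handling differ):

```python
# Decompression-bomb guard: stop once output exceeds a limit instead of
# inflating the whole stream.
import zlib

def safe_decompress(data, limit=64_000):
    d = zlib.decompressobj()
    out = d.decompress(data, limit)  # inflate at most `limit` bytes
    if d.unconsumed_tail:            # more output pending -> give up
        raise ValueError("Possible compression bomb; abandoning stream.")
    return out

payload = zlib.compress(b"x" * 1_000_000)  # tiny input, huge output
try:
    safe_decompress(payload)
except ValueError as e:
    print(e)
```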
A user at the [forum](https://forum.torproject.net/t/compression-bomb-in-tor-logs/6226) has the same problem.Alexander Færøyahf@torproject.orghttps://gitlab.torproject.org/tpo/core/tor/-/issues/40735[WARN] Tried connecting to router ... identity keys were not as expected2023-11-14T16:59:05ZcypherpunksBackground: Tor Browser 12.0, Tor 4.7.12, Windows 7, vanilla bridges.
Repeatedly getting the following log line.
```
[WARN] Tried connecting to router at *address* ID=<none> RSA_ID=*FP1*, but RSA + ed25519 identity keys were not as expected: wanted *FP1* + no ed25519 key but got *FP2* + *edFP*.
```
Ideas of what happened:
* MITM
* Bridge operator reinstalled it in-between me getting the bridge and now.
What is wrong:
* The bridge should be marked as unreachable: either it is already not used and connections are doomed to spend resources for nothing, or it should not be used because something is clearly wrong with it
* There should be a way to distinguish the first idea from the second - my best guess is building a tunneled directory connection to the bridge authority and asking "Is there a bridge *FP2* and does it listen on *address*?"https://gitlab.torproject.org/tpo/core/tor/-/issues/40717Additional metricsport stats for various stages of onionservice handshake2023-12-07T14:41:35ZMike PerryIf we export additional onion service metrics such as time measurements on the HSDIR, INTRO, and REND stages of circuit setup for both client and service side, and the number of timeouts/failures there, it would help to uncover the root cause of issues like https://gitlab.torproject.org/tpo/core/tor/-/issues/40570 and related reliability and connectivity issues with onion services.
We can also export congestion control info from https://gitlab.torproject.org/tpo/core/tor/-/issues/40708 to the onionservice metrics set, too, which can help us with tuning congestion control for onion services.
We can then hook up the onionperf onion service instances to our grafana dashboard, and gather more detailed stats that way, as a supplement to the metrics that get graphed on the metrics website.https://gitlab.torproject.org/tpo/core/tor/-/issues/40716Implement conflux for onion services2022-11-28T14:01:05ZMike PerryConflux is traffic splitting, and will result in increased throughput and reduced latency for onion services after a connection has been established, by routing traffic over multiple paths, or via the lowest latency path to a service.
This ticket is for the onion service pieces of conflux (https://gitlab.torproject.org/tpo/core/tor/-/issues/40593).
We will not be implementing the onion services pieces of conflux as part of that ticket. It can be done later, if any onion service sponsors care about latency or throughput.
The pieces for onion services are:
- **Negotiation**
- [ ] Protover Advertisement for Onions (24h)
- [ ] Rend circuit linking (40h)
This is specified in https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/329-traffic-splitting.txt, but we probably want to allow onion services to configure their scheduler by manually choosing either BLEST, or LowRTT, since different kinds of onion services may want to optimize for either throughput or latency.
There may be some additional work wrt making sure linked edge conns work properly, if they are handled differently for the onion service case.
Also, some shadow validation and performance testing will be needed. Maybe 40h or so of dev time (though much longer wall-clock time).https://gitlab.torproject.org/tpo/core/tor/-/issues/40715MetricsPort: inbound ORPort connections: relays vs. non-relay connections2023-09-22T23:50:13Zcypherpunksthis got previously submitted on 2022-10-24 https://gitlab.torproject.org/tpo/core/tor/-/issues/40194#note_2849481
but that issue got closed and asked for new specific tickets for each new metric:
From last week's relay meetup we know that tor knows whether an incoming OR connection is from a client or from a relay without looking at the source IP address.
https://pad.riseup.net/p/tor-relay-op-meetup-o22-keep
From the metrics added in !625 (merged) we know that the increased CPU load correlates with an increase in the rate of new inbound OR connections. This rate increases when CPU load increases on exits:
```
rate(tor_relay_connections{type="OR",state="created",direction="received"}[$__rate_interval])
```
Could you please add a label for OR connections coming from clients vs. OR connections coming from other relays?
This would allow us to confirm that exits get more new inbound connections from clients when CPU load increases.
that new label could be `src`:
```
tor_relay_connections_total{type="OR",state="created",direction="received",src="relay"}
tor_relay_connections_total{type="OR",state="created",direction="received",src="non-relay"}
tor_relay_connections{type="OR",state="opened",direction="received",src="relay"}
tor_relay_connections{type="OR",state="opened",direction="received",src="non-relay"}
```