Tor issues — https://gitlab.torproject.org/tpo/core/tor/-/issues — 2023-05-15T16:39:11Z
https://gitlab.torproject.org/tpo/core/tor/-/issues/40779 — Investigate address detection usage in Tor — 2023-05-15T16:39:11Z — Alexander Færøy <ahf@torproject.org>
Tor currently has a number of ways of detecting its own address when being used as a relay. This includes:
- netinfo cell
- dirport connections to other relays
- configuration specification
It would be useful for the Arti WG to learn which of these methods are actively being used before we start implementing relay support in Arti.
A useful thing to do here is to enumerate the methods we have and extend MetricsPort to store information on where the relay learned its address, so we can make a sensible decision.
Milestone: Tor: 0.4.8.x-freeze
https://gitlab.torproject.org/tpo/core/tor/-/issues/40422 — [CircuitPadding] circpad_add_matching_machines() should be called when a circuit has opened. — 2023-06-09T13:26:45Z — Jaym
### Summary
The circuit padding framework supports negotiating padding upon various events. Among them, CIRCPAD_CIRC_OPENED states that a given padding machine should be applied to a circuit when a circuit has opened.
However, no code seems to trigger this mechanism. When a circuit has been built, the function circpad_machine_event_circ_built() is called and checks whether machines may be removed from or added to the circuit. However, at this stage of the circuit-building process, the circuit has been built but is not yet marked as open.
### Bug
If a machine uses `client_machine->conditions.apply_state_mask = CIRCPAD_CIRC_OPENED;`, the machine is only applied when an event other than circuit building/opening triggers circpad_add_matching_machines() (e.g., an AP connection links a stream, or the circuit purpose changes from general to something else).
### What is the expected behavior?
When circuituse.c calls circuit_has_opened(), it should also call into the circpad module, e.g. via a new function circpad_machine_event_circ_opened() that checks whether machines should be added to the circuit.
### Environment
Running a version forked from 0.4.5.7
### Relevant logs and/or screenshots
Contains some logs showing a call to circpad_machine_event_circ_built() while the circuit is still marked as building. Also contains custom logs:
```
Jun 30 11:23:50.000 [info] circuit_finish_handshake(): Finished building circuit hop:
Jun 30 11:23:50.000 [info] internal (high-uptime) circ (length 3, last hop test000a): $22BA781A60C0CBB7FFAEA8858128427F67F60038(open) $7684DE04DCBB44538554E2CD1D14CDF836D5AF4D(open) $C7ADB1DBCE99F0B2ED2812B1953E4986EE9846DB(open)
Jun 30 11:23:50.000 [debug] dispatch_send_msg_unchecked(): Queued: ocirc_cevent (<gid=7 evtype=2 reason=0 onehop=0>) from or, on ocirc.
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering: ocirc_cevent (<gid=7 evtype=2 reason=0 onehop=0>) from or, on ocirc:
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering to btrack.
Jun 30 11:23:50.000 [debug] btc_cevent_rcvr(): CIRC gid=7 evtype=2 reason=0 onehop=0
Jun 30 11:23:50.000 [debug] circuit_build_times_add_time(): Adding circuit build time 43
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking condition state mask 21 vs condition: 2
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_event_circ_built(): Circpad module event circ built -- circ state: 0
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking condition state mask 21 vs condition: 2
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] invoke_plugin_operation_or_default(): Plugin found for caller calling a plugin in the circpad module when a circuit has built
Jun 30 11:23:50.000 [info] circpad_dropmark_activate_when_built(): Looks like the client_dropmark_def machine does not exist over this circuit
Jun 30 11:23:50.000 [debug] plugin_run(): Plugin execution returned -2147483648
Jun 30 11:23:50.000 [debug] plugin_run(): vm error message: (null)
Jun 30 11:23:50.000 [info] entry_guards_note_guard_success(): Recorded success for primary confirmed guard test002r ($22BA781A60C0CBB7FFAEA8858128427F67F60038)
Jun 30 11:23:50.000 [debug] dispatch_send_msg_unchecked(): Queued: ocirc_state (<gid=7 state=4 onehop=0>) from or, on ocirc.
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering: ocirc_state (<gid=7 state=4 onehop=0>) from or, on ocirc:
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering to btrack.
Jun 30 11:23:50.000 [debug] btc_state_rcvr(): CIRC gid=7 state=4 onehop=0
Jun 30 11:23:50.000 [info] circuit_build_no_more_hops(): circuit built!
Jun 30 11:23:50.000 [info] pathbias_count_build_success(): Got success count 3.000000/3.000000 for guard test002r ($22BA781A60C0CBB7FFAEA8858128427F67F60038)
Jun 30 11:23:50.000 [debug] circuit_has_opened(): calling circuit_has_opened()
```
### Possible fixes
Add a new function circpad_machine_event_circ_opened(), called from circuituse.c when the circuit has opened.
Milestone: Tor: 0.4.8.x-freeze — Assignee: Mike Perry
https://gitlab.torproject.org/tpo/core/tor/-/issues/40907 — Starting tor with --DormantOnFirstStartup doesn't set GETINFO dormant to 1 — 2024-01-24T19:08:12Z — CrazyChaoz
### Summary
Starting tor with --DormantOnFirstStartup 1 on a new DataDirectory doesn't set GETINFO dormant to 1
### Steps to reproduce:
1. start tor with ```tor --DormantOnFirstStartup 1 --DataDirectory some-empty-dir```
2. do a ```GETINFO dormant```
### What is the current bug behavior?
GETINFO dormant returns 0, as if it weren't in dormant mode
### What is the expected behavior?
GETINFO dormant returns 1, as it is in dormant mode
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
- 0.4.6.10 and 0.4.5.10
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
- Pop!_OS 22.04 and Ubuntu 18.04.6
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
- apt and ?
Milestone: Tor: 0.4.9.x-freeze
https://gitlab.torproject.org/tpo/core/tor/-/issues/40831 — null pointer dereference if threadpool initialization fails — 2023-10-08T15:25:20Z — Alex Xu
```
In function 'threadpool_register_reply_event',
inlined from 'cpuworker_init' at src/core/mainloop/cpuworker.c:140:13,
inlined from 'run_tor_main_loop' at src/app/main/main.c:1230:3,
inlined from 'tor_run_main' at src/app/main/main.c:1359:14,
inlined from 'tor_main' at src/feature/api/tor_api.c:166:12,
inlined from 'main' at src/app/main/tor_main.c:32:7:
src/lib/evloop/workqueue.c:631:9: warning: potential null pointer dereference [-Wnull-dereference]
631 | if (tp->reply_event) {
| ^
```
If `threadpool_new` fails, then `tp` will be NULL. `spawn_func` should not normally fail, and the result would most likely be a non-exploitable segmentation fault, but it is still technically undefined behavior and should be fixed.
Milestone: Tor: 0.4.9.x-freeze
https://gitlab.torproject.org/tpo/core/tor/-/issues/40803 — Cannot write to ClientOnionAuthDir when Sandbox is enabled — 2023-06-05T16:46:55Z — anonym
### Summary
When `tor` has the sandbox option enabled it cannot write to the `ClientOnionAuthDir` directory to store onion auth keys, e.g. when checking the "Remember this key" checkbox in Tor Browser when providing the key.
### Steps to reproduce:
1. Configure `tor` with `Sandbox 1`
2. Configure `tor` with `ClientOnionAuthDir /some/writable/directory`
3. Use Tor Browser to access an onion service with onion authentication
4. Check the "Remember this key" checkbox when providing the key
### What is the current bug behavior?
The onion auth prompt in Tor Browser reports "Unable to store creds for ...", and no key is written to the `ClientOnionAuthDir` directory.
### What is the expected behavior?
No errors, and the key should be written to the `ClientOnionAuthDir` directory.
### Environment
- Tor version 0.4.7.13
- Tested both on Debian Sid and inside Tails with `tor` installed via `apt`
### Relevant logs and/or screenshots
```
Jun 02 13:04:02.000 [warn] sandbox_intern_string(): Bug: No interned sandbox parameter found for /var/lib/tor/keys/n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd.auth_private.tmp (on Tor 0.4.7.13 )
Jun 02 13:00:25.000 [warn] Couldn't open "/var/lib/tor/keys/n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd.auth_private.tmp" (/var/lib/tor/keys/n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd.auth_private) for writing: Operation not permitted
Jun 02 13:00:25.000 [warn] Failed to write client auth creds file for n7wwn7f4jirk2yaukobahoane722lnvi7d65emwj4toas7uf5oaomdyd!
```
### Possible fixes
Update the sandbox rules.
Milestone: Tor: 0.4.9.x-freeze
https://gitlab.torproject.org/tpo/core/tor/-/issues/40901 — Document for the Relay Operator community how to debug relays that are slower than what the operator expects — 2023-12-19T07:53:56Z — Alexander Færøy <ahf@torproject.org>
This idea originates from a conversation between @beth, @gk, and me on #tor-dev today.
We often ship new C Tor features to relay operators that cause discussions about whether Tor has gotten faster/slower, uses (more|less) memory, crashes (more|less) often, etc. Many of these questions are hard to answer with a definitive "yes, the cause of this is X", and it is very time-consuming for the Network Team to debug each item individually with the operator.
It would be very useful to have a document that informs relay operators about the different situations that may impact performance, and about how they can collect performance measurements to compare against, so they can see whether performance has truly regressed. This can also be used to push MetricsPort to more operators.
We can expand upon the document over time as we discover new ways to do this analysis and/or from feedback from the relay operator community.
This is related to:
- https://lists.torproject.org/pipermail/tor-relays/2023-December/021409.html
- https://lists.torproject.org/pipermail/tor-relays/2023-December/021407.html
This may be relevant to Arti Relay too.
CC @mikeperry for awareness.
https://gitlab.torproject.org/tpo/core/tor/-/issues/40847 — tor 0.4.8.4: compilation error on SunOS / OpenIndiana — 2023-09-06T00:57:05Z — svschmel
Compiling tor 0.4.8.4 throws the following error.
(Similar to https://gitlab.torproject.org/tpo/core/tor/-/issues/40843 => compilation error on NetBSD)
```
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c: In function 'hashx_vm_alloc_huge':
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c:113:5: error: 'MAP_HUGETLB' undeclared (first use in this function)
113 | | MAP_HUGETLB | MAP_POPULATE, -1, 0);
| ^~~~~~~~~~~
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c:113:5: note: each undeclared identifier is reported only once for each function it appears in
/export/home/svschmel/oi-userland/components/network/tor/tor-0.4.8.4/src/ext/equix/hashx/src/virtual_memory.c:113:19: error: 'MAP_POPULATE' undeclared (first use in this function); did you mean 'MAP_PRIVATE'?
113 | | MAP_HUGETLB | MAP_POPULATE, -1, 0);
| ^~~~~~~~~~~~
| MAP_PRIVATE
make[2]: *** [Makefile:16523: src/ext/equix/hashx/src/libhashx_a-virtual_memory.o] Error 1
make[2]: Leaving directory '/export/home/svschmel/oi-userland/components/network/tor/build/amd64'
make[1]: *** [Makefile:7648: all] Error 2
```
These defines do not exist on SunOS / OpenIndiana
=> https://www.illumos.org/man/2/mmap
I suggest handling SunOS like OpenBSD in virtual_memory.c, because the default `#else` branch assumes that the underlying OS is Linux:
```
...
#elif defined(__OpenBSD__)
mem = MAP_FAILED; // OpenBSD does not support huge pages
#else
mem = mmap(NULL, bytes, PAGE_READWRITE, MAP_PRIVATE | MAP_ANONYMOUS
| MAP_HUGETLB | MAP_POPULATE, -1, 0);
...
```
With this modification, the code compiles successfully on SunOS / OpenIndiana.
https://gitlab.torproject.org/tpo/core/tor/-/issues/40840 — Prevent outbound cell command flipping — 2024-02-13T17:00:47Z — Mike Perry
As per https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/344-protocol-info-leaks.txt#L197, the RELAY_EARLY fix did not address the outbound direction.
We can fix this by checking at relays that the cell command field does not switch back and forth between RELAY and RELAY_EARLY. Then, so long as the middle relay is honest, this vector cannot be used as a covert channel between the Guard and the Exit.
This fix should be relatively simple and can be backported, though we should of course test it in Shadow.
Assignee: Mike Perry
https://gitlab.torproject.org/tpo/core/tor/-/issues/40836 — Update recommended/required protocol lists? — 2024-01-16T15:38:37Z — Nick Mathewson
We haven't updated the recommended/required protocol lists since 2021, possibly longer. If we mark more protocols as required or recommended, we can more correctly reason about the network.
The protocols that are unconditionally supported by 0.4.7.7 (our oldest supported stable) are:
* `Cons=1-2 Desc=1-2 DirCache=2 FlowCtrl=1-2 HSDir=2 HSIntro=4-5 HSRend=1-2 Link=1-5 LinkAuth=3 Microdesc=1-2 Padding=2 Relay=1-4`
The protocols that are unconditionally supported by the most recent Arti are not currently listed anywhere or enforced in Arti. :disappointed: So maybe we should take care of that first?
The protocols that the consensus currently recommends are:
* `recommended-client-protocols Cons=2 Desc=2 DirCache=2 HSDir=2 HSIntro=4 HSRend=2 Link=4-5 Microdesc=2 Relay=2`
* `recommended-relay-protocols Cons=2 Desc=2 DirCache=2 HSDir=2 HSIntro=4 HSRend=2 Link=4-5 LinkAuth=3 Microdesc=2 Relay=2`
The protocols that the consensus currently requires are:
* `required-client-protocols Cons=2 Desc=2 Link=4 Microdesc=2 Relay=2`
* `required-relay-protocols Cons=2 Desc=2 DirCache=2 HSDir=2 HSIntro=4 HSRend=2 Link=4-5 LinkAuth=3 Microdesc=2 Relay=2`
cc @dgoulet @mikeperry
Assignee: Nick Mathewson
https://gitlab.torproject.org/tpo/core/tor/-/issues/40822 — [warn] Possible replay detected! An INTRODUCE2 cell with the same ENCRYPTED section was seen 0 seconds ago. Dropping cell. — 2023-08-28T12:28:35Z — TimeIsGold
I am running a hidden service using Tor 0.4.7.13 on Windows Server, and I am using obfs4 bridges for my nodes.
I get these messages every day and I don't know what the problem is:
`[warn] Possible replay detected! An INTRODUCE2 cell with the same ENCRYPTED section was seen 0 seconds ago. Dropping cell.`
`[notice] Your network connection speed appears to have changed. Resetting timeout to 60000ms after 18 timeouts and 1000 buildtimes.`
https://gitlab.torproject.org/tpo/core/tor/-/issues/40820 — connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 ) — 2024-03-21T18:27:41Z — computer_freak
```
...
07:55:37.000 [notice] Heartbeat: Tor's uptime is 7 days 18:00 hours, with 71411 circuits open. I've sent 10022.56 GB and received 9786.33 GB. I've received 3866839 connections on IPv4 and 21166 on IPv6. I've made 181900 connections with IPv4 and 56445 with IPv6.
07:55:37.000 [notice] While bootstrapping, fetched this many bytes: 8107 (microdescriptor fetch)
07:55:37.000 [notice] While not bootstrapping, fetched this many bytes: 148937098 (server descriptor fetch); 5760 (server descriptor upload); 12096007 (consensus network-status fetch); 3883848 (microdescriptor fetch)
07:55:37.000 [notice] Circuit handshake stats since last time: 517/517 TAP, 3956544/3973033 NTor.
07:55:37.000 [notice] Since startup we initiated 0 and received 0 v1 connections; initiated 0 and received 0 v2 connections; initiated 0 and received 6320 v3 connections; initiated 3 and received 190964 v4 connections; initiated 71807 and received 3613775 v5 connections.
07:55:37.000 [notice] Heartbeat: DoS mitigation since startup: 1037 circuits killed with too many cells, 32767104 circuits rejected, 401 marked addresses, 1 marked addresses for max queue, 172207 same address concurrent connections rejected, 0 connections rejected, 0 single hop clients refused, 2185090 INTRODUCE2 rejected.
13:37:58.000 [notice] We're low on memory (cell queues total alloc: 10805520 buffer total alloc: 12886016, tor compress total alloc: 667208970 (zlib: 259584, zstd: 666941450, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:37:58.000 [notice] Removed 85978221 bytes by killing 62 circuits; 49010 circuits remain alive. Also killed 0 non-linked directory connections. Killed 3 edge connections
13:37:58.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 )
13:37:58.000 [warn] tor_bug_occurred_(): Bug: ../src/core/or/connection_edge.c:1065: connection_edge_about_to_close: This line should not have been reached. (Future instances of this warning will be silenced.) (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: Tor 0.4.7.13: Line unexpectedly reached at connection_edge_about_to_close at ../src/core/or/connection_edge.c:1065. Stack trace: (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(log_backtrace_impl+0x57) [0x55ba820720e7] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(tor_bug_occurred_+0x16b) [0x55ba8207d30b] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(connection_exit_about_to_close+0x1d) [0x55ba8211e4fd] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(+0x6b5ed) [0x55ba81ff65ed] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(+0x6bcfb) [0x55ba81ff6cfb] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7(+0x23b4f) [0x7fc9eef8db4f] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/lib/x86_64-linux-gnu/libevent-2.1.so.7(event_base_loop+0x52f) [0x7fc9eef8e28f] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(do_main_loop+0x101) [0x55ba81ff86f1] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(tor_run_main+0x1e5) [0x55ba81ff3fc5] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(tor_main+0x49) [0x55ba81ff02d9] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(main+0x19) [0x55ba81fefeb9] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7fc9ee828d0a] (on Tor 0.4.7.13 )
13:37:58.000 [warn] Bug: /usr/bin/tor(_start+0x2a) [0x55ba81feff0a] (on Tor 0.4.7.13 )
13:37:58.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 )
13:37:58.000 [warn] connection_edge_about_to_close(): Bug: (Harmless.) Edge connection (marked at ../src/core/or/circuitlist.c:2713) hasn't sent end yet? (on Tor 0.4.7.13 )
13:37:58.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:37:58.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:10.000 [notice] We're low on memory (cell queues total alloc: 15153072 buffer total alloc: 11710464, tor compress total alloc: 664530008 (zlib: 302848, zstd: 664219240, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:10.000 [notice] Removed 83687644 bytes by killing 17 circuits; 49083 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:38:11.000 [notice] We're low on memory (cell queues total alloc: 14386416 buffer total alloc: 12208128, tor compress total alloc: 665847849 (zlib: 259584, zstd: 665580345, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:11.000 [notice] Removed 88057871 bytes by killing 17 circuits; 49151 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:11.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [notice] We're low on memory (cell queues total alloc: 13510992 buffer total alloc: 11796480, tor compress total alloc: 665718009 (zlib: 129792, zstd: 665580345, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:12.000 [notice] Removed 82329467 bytes by killing 14 circuits; 49226 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:12.000 [warn] channel_flush_from_first_active_circuit(): Bug: Found a supposedly active circuit with no cells to send. Trying to recover. (on Tor 0.4.7.13 )
13:38:13.000 [notice] We're low on memory (cell queues total alloc: 14437632 buffer total alloc: 12062720, tor compress total alloc: 664356888 (zlib: 129792, zstd: 664219240, lzma: 0), rendezvous cache total alloc: 109213998). Killing circuits withover-long queues. (This behavior is controlled by MaxMemInQueues.)
13:38:13.000 [notice] Removed 81757690 bytes by killing 15 circuits; 49281 circuits remain alive. Also killed 0 non-linked directory connections. Killed 0 edge connections
13:54:30.000 [notice] Sudden decrease in circuit RTT (11 vs 119683), likely due to clock jump.
13:55:37.000 [notice] Heartbeat: Tor's uptime is 8 days 0:00 hours, with 74885 circuits open. I've sent 10349.14 GB and received 10099.50 GB. I've received 3979142 connections on IPv4 and 21891 on IPv6. I've made 187464 connections with IPv4 and 57946 with IPv6.
13:55:37.000 [notice] While bootstrapping, fetched this many bytes: 8107 (microdescriptor fetch)
13:55:37.000 [notice] While not bootstrapping, fetched this many bytes: 154088181 (server descriptor fetch); 5760 (server descriptor upload); 12495772 (consensus network-status fetch); 4020088 (microdescriptor fetch)
13:55:37.000 [notice] Circuit handshake stats since last time: 581/581 TAP, 2997064/3009741 NTor.
13:55:37.000 [notice] Since startup we initiated 0 and received 0 v1 connections; initiated 0 and received 0 v2 connections; initiated 0 and received 6720 v3 connections; initiated 3 and received 199738 v4 connections; initiated 73618 and received 3714169 v5 connections.
13:55:37.000 [notice] Heartbeat: DoS mitigation since startup: 1051 circuits killed with too many cells, 33500595 circuits rejected, 404 marked addresses, 1 marked addresses for max queue, 183981 same address concurrent connections rejected, 0 connections rejected, 0 single hop clients refused, 2523887 INTRODUCE2 rejected.
...
```
https://gitlab.torproject.org/tpo/core/tor/-/issues/40809 — Bring padding machines into sync with Tobias's latest changes — 2023-06-15T10:31:28Z — Mike Perry
Tobias has added some changes to the padding machines in his latest research: in particular, the padding machine can respond to queue length/congestion signals. I believe there are now also probabilistic transitions (i.e. https://gitlab.torproject.org/tpo/core/tor/-/issues/31636 or https://gitlab.torproject.org/tpo/core/tor/-/issues/31787).
I need to read his latest paper, sync with him, and discuss these things. This will generate new tickets.
Assignee: Mike Perry
https://gitlab.torproject.org/tpo/core/tor/-/issues/40807 — Look for the lib64 directory when using a custom OpenSSL directory — 2023-06-12T16:28:25Z — Pier Angelo Vendrame
For Tor Browser, we build tor in an old Debian container that has a very old OpenSSL.
So, we also compile OpenSSL and use a custom prefix with `--with-openssl-dir=$openssldir`.
From what I understood (by compiling with `make V=1`), it seems to me that tor's build system tries to use `$openssldir/lib` as a library directory (and IIRC, it falls back to `$openssldir` when it doesn't exist).
However, with OpenSSL 3, the default directory has become `$openssldir/lib64` (at least on Linux amd64).
Symlinking `$openssldir/lib` to `$openssldir/lib64` seems to solve all the various problems.
https://gitlab.torproject.org/tpo/core/tor/-/issues/40806 — running tor guard/bridge/exit behind layers of nat — 2023-06-13T20:47:38Z — redbear
The issue is: I can start a node and the port is reachable from outside, but outside IPs need to go through some intermediate IP to access the node. Is there some workaround? Is it possible to handle this with some torrc configuration?
https://gitlab.torproject.org/tpo/core/tor/-/issues/40802 — Dir auths say "Failed to find node for hop #2 of our path. Discarding this circuit." every second after boot until new consensus — 2024-01-31T23:49:46Z — Roger Dingledine
Starting somewhere in Tor 0.4.7, every directory authority now prints thousands of lines of
```
Jun 01 14:51:33.790 [notice] Failed to find node for hop #2 of our path. Discarding this circuit.
```
on startup. It continues until the top of the hour when
```
Jun 01 14:59:59.942 [notice] Failed to find node for hop #2 of our path. Discarding this circuit.
Jun 01 15:00:00.017 [notice] Time to publish the consensus and discard old votes
Jun 01 15:00:00.162 [notice] Published ns consensus
Jun 01 15:00:00.315 [notice] Published microdesc consensus
```https://gitlab.torproject.org/tpo/core/tor/-/issues/40774libtor.a: pubsub_install tor_raw_abort2024-03-20T17:17:22Zsbslibtor.a: pubsub_install tor_raw_abort### Summary
We see OONI Probe Android crashes where `pubsub_install` calls `tor_raw_abort` for tor 0.4.7.13 using libtor.a embedded into a dynamic library loaded by an Android app. As of 2023-02-09 (around when we started investigating)...### Summary
We see OONI Probe Android crashes where `pubsub_install` calls `tor_raw_abort` for tor 0.4.7.13 using libtor.a embedded into a dynamic library loaded by an Android app. As of 2023-02-09 (around when we started investigating), this issue occurred 526 times in the last 28 days and was one of the main sources of crashes for the OONI Probe Android app.
A typical stack trace obtained from the Google Play console looks like this:
```
backtrace:
#00 pc 0x0000000000089b0c .../lib64/bionic/libc.so (abort+164)
#01 pc 0x00000000013778a4 .../split_config.arm64_v8a.apk (tor_raw_abort_+12)
#02 pc 0x0000000001382150 .../split_config.arm64_v8a.apk (tor_abort_+12)
#03 pc 0x00000000012470a0 .../split_config.arm64_v8a.apk (pubsub_install+120)
#04 pc 0x0000000001247170 .../split_config.arm64_v8a.apk (tor_run_main+136)
```
We investigated this issue and managed to reproduce it, initially on OONI Probe Android, then on Linux using our Go code for managing libtor.a, and finally with a pure C test case running under Linux. During this investigation we never saw the first bootstrap failing. Rather, in some cases it took > 30 repeated bootstraps to observe the abort; in other cases, it occurred within the first 3-10 bootstraps.
I searched the issue tracker for "pubsub", "pubsub_install", "SIGABRT", and "abort". AFAICT, there is no other open issue discussing this problem; however, I think https://gitlab.torproject.org/tpo/core/tor/-/issues/32729 may be related and ~similar.
### Steps to reproduce:
The following steps allowed me to reproduce the problem on Ubuntu 22.04.2:
1. `git clone https://gitlab.torproject.org/tpo/core/tor`
2. `cd tor`
3. `git checkout tor-0.4.7.13`
4. `git apply 004.diff` where `004.diff` is
```diff
diff --git a/src/lib/pubsub/pubsub_check.c b/src/lib/pubsub/pubsub_check.c
index 99e604d715..a5cc4b7658 100644
--- a/src/lib/pubsub/pubsub_check.c
+++ b/src/lib/pubsub/pubsub_check.c
@@ -25,6 +25,7 @@
#include "lib/malloc/malloc.h"
#include "lib/string/compat_string.h"
+#include <stdio.h>
#include <string.h>
static void pubsub_adjmap_add(pubsub_adjmap_t *map,
@@ -343,21 +344,27 @@ lint_message(const pubsub_adjmap_t *map, message_id_t msg)
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has subscribers, but no publishers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_pub == 0 for %s\n", get_message_id_name(msg));
ok = false;
} else if (n_sub == 0) {
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has publishers, but no subscribers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_sub == 0 for %s\n", get_message_id_name(msg));
ok = false;
}
/* Check the message graph topology. */
- if (lint_message_graph(map, msg, pub, sub) < 0)
+ if (lint_message_graph(map, msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_graph failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
/* Check whether the messages have the same fields set on them. */
- if (lint_message_consistency(msg, pub, sub) < 0)
+ if (lint_message_consistency(msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_consistency failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
if (!ok) {
/* There was a problem -- let's log all the publishers and subscribers on
@@ -385,6 +392,7 @@ pubsub_adjmap_check(const pubsub_adjmap_t *map)
bool all_ok = true;
for (unsigned i = 0; i < map->n_msgs; ++i) {
if (lint_message(map, i) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message failed for %u %s\n", i, get_message_id_name((message_id_t)i));
all_ok = false;
}
}
@@ -401,11 +409,15 @@ pubsub_builder_check(pubsub_builder_t *builder)
pubsub_adjmap_t *map = pubsub_build_adjacency_map(builder->items);
int rv = -1;
- if (!map)
+ if (!map) {
+ fprintf(stderr, "SBSDEBUG: pubsub_build_adjacency_map failed\n");
goto err; // should be impossible
+ }
- if (pubsub_adjmap_check(map) < 0)
+ if (pubsub_adjmap_check(map) < 0) {
+ fprintf(stderr, "SBSDEBUG: pubsub_adjmap_check failed\n");
goto err;
+ }
rv = 0;
err:
```
5. `./autogen.sh`
6. `./configure --disable-asciidoc`
7. `make`
8. `mkdir tmp`
9. `vi tmp/main.c` where `main.c` contains
```C
#include "../src/feature/api/tor_api.h"
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
static void *threadMain(void *ptr) {
int *fdp = (int*)ptr;
(void)sleep(45 /* seconds */);
(void)close(*fdp);
free(fdp);
return NULL;
}
int main() {
for (;;) {
tor_main_configuration_t *config = tor_main_configuration_new();
if (config == NULL) {
exit(1);
}
char *argv[] = {
"tor",
"Log",
"notice stderr",
"DataDirectory",
"./x",
NULL,
};
int argc = 5;
if (tor_main_configuration_set_command_line(config, argc, argv) != 0) {
exit(2);
}
int filedesc = tor_main_configuration_setup_control_socket(config);
if (filedesc < 0) {
exit(3);
}
int *fdp = malloc(sizeof(*fdp));
if (fdp == NULL) {
exit(4);
}
*fdp = filedesc;
pthread_t thread;
if (pthread_create(&thread, NULL, threadMain, /* move */ fdp) != 0) {
exit(5);
}
(void)tor_run_main(config);
if (pthread_join(thread, NULL) != 0) {
exit(6);
}
fprintf(stderr, "********** doing another round\n");
}
}
```
10. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
11. `./a.out 2>&1|tee LOG.txt`
The `tmp/main.c` program is a reasonable approximation of what our Go code for running tor does. The main difference is that we start tor with `DisableNetwork` set and re-enable the network later. This difference does not seem to matter, since we saw aborts in both cases.
We run repeated bootstraps because the OONI Probe Android app loads tor and the Go code as a shared library and calls `tor_run_main` each time we run an OONI experiment that requires tor (typically, `vanilla_tor` and `torsf`).
### What is the current bug behavior?
We can cluster the kind of crashes we observed into two groups.
#### pubsub_adjmap_check failed
This crash has been the most frequent one we observed. With the above patch applied, it generally looks like this:
```
[... omitting logs from several bootstraps ...]
Mar 22 14:07:21.000 [notice] Owning controller connection has closed -- exiting now.
Mar 22 14:07:21.000 [notice] Catching signal TERM, exiting cleanly.
********** doing another round
SBSDEBUG: n_sub == 0 for orconn_state
SBSDEBUG: lint_message failed for 5 orconn_state
SBSDEBUG: n_pub == 0 for orconn_state
SBSDEBUG: lint_message failed for 34 orconn_state
SBSDEBUG: pubsub_adjmap_check failed
[1] 300227 IOT instruction (core dumped) ./a.out 2>&1 |
300228 done tee LOG.txt
```
When running this via Go code, we see a different message before the abort. I think this happens because Go installs its own handler for SIGABRT, while the C code does not install any handler. My understanding is also that "IOT instruction" is related to `SIGIOT`, which seems to be an alias for `SIGABRT` judging from include/linux/signal.h and glibc's bits/signum-generic.h.
My understanding of the above logs is that, somehow, a message is registered twice: once without publishers, and once without subscribers.
It's also important to point out that the message causing the failure has not always been `orconn_state`. Based on all the aborts we have examined, it seems that `orconn_status` can also cause failures. For the sake of brevity, I am not going to copy all the logs we collected here, but you can read them, along with my thought process while analyzing the bug, at https://github.com/ooni/probe/issues/2406.
#### INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
This specific error occurred very rarely (2-3 times). It is not clear whether this is the same issue, but I think it makes sense to mention it here, because it occurred while using the above code to investigate `pubsub_install` aborts.
```
2023/03/21 17:59:13 info tunnel: tor: exec: <internal/libtor> x/tunnel/torsf/tor [...]
BUG: subsystem btrack (at 55) could not connect to publish/subscribe system.
============================================================ T= 1679421553
INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
A subsystem couldn't be connected.
./testtorsf(dump_stack_symbols_to_error_fds+0x58)[0xe6df08]
./testtorsf(tor_raw_assertion_failed_msg_+0x97)[0xe6e8d7]
./testtorsf(subsystems_add_pubsub_upto+0x128)[0xe47df8]
./testtorsf(pubsub_install+0x29)[0xdf9c99]
./testtorsf(tor_run_main+0x8a)[0xdf9e2a]
./testtorsf(_cgo_2d785783cadf_Cfunc_tor_run_main+0x1b)[0xdf665b]
./testtorsf[0x500e04]
SIGABRT: abort
PC=0x7fa00f89aa7c m=14 sigcode=18446744073709551610
signal arrived during cgo execution
```
(Because this specific error occurred when using Go code, here you see also the output of Go `SIGABRT` handler.)
The specific assertion that fails in this case is the following:
```C
int
subsystems_add_pubsub_upto(pubsub_builder_t *builder,
int target_level)
{
for (unsigned i = 0; i < n_tor_subsystems; ++i) {
const subsys_fns_t *sys = tor_subsystems[i];
if (!sys->supported)
continue;
if (sys->level > target_level)
break;
if (! sys_status[i].initialized)
continue;
int r = 0;
if (sys->add_pubsub) {
subsys_id_t sysid = get_subsys_id(sys->name);
raw_assert(sysid != ERROR_ID);
pubsub_connector_t *connector;
connector = pubsub_connector_for_subsystem(builder, sysid);
r = sys->add_pubsub(connector);
pubsub_connector_free(connector);
}
if (r < 0) {
fprintf(stderr, "BUG: subsystem %s (at %u) could not connect to "
"publish/subscribe system.", sys->name, sys->level);
raw_assert_unreached_msg("A subsystem couldn't be connected."); // <- HERE
}
}
return 0;
}
```
### What is the expected behavior?
On a very broad level, I think tor should not abort. Because I do not understand very well what is happening, it is difficult to provide a more specific recommendation about what the code should actually do.
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
Always 0.4.7.13
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
Android (several versions and devices according to the Google Play console); Android 13 on Pixel 4a arm64 (my phone); Ubuntu 22.04.2 on amd64
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
Tor compiled along with all its dependencies using our build scripts, as well as tor compiled from source against Ubuntu 22.04.2 system dependencies when reproducing the issue via the above-mentioned steps.
### Relevant logs and/or screenshots
I think I already provided representative logs above. The https://github.com/ooni/probe/issues/2406 issue contains all the logs we produced while investigating this issue on our end. It also describes how we progressively narrowed down the problem from an abort in the Android app to an abort using Go code on Linux to the minimal instructions for reproducing the issue that I mentioned above.
On this note, I initially suspected that there was a data race on our end. That suspicion was correct, but the abort continued to occur after I fixed the data race in the Go code. In any case, the possible presence of data races on our end prompted me to bypass our Go code and write C code that could reproduce the issue. In one of my final attempts at understanding the issue using just C code, I [patched tor to avoid aborting in case pubsub_install failed](https://github.com/ooni/probe/issues/2406#issuecomment-1479884981), recompiled, and ran with TSan enabled, [seeing just two pubsub_install failures over 490 runs and no sign of data races](https://github.com/ooni/probe/issues/2406#issuecomment-1480826748).
### Possible fixes
I don't know. Since the data-race theory is unsupported by the data and unlikely, perhaps state left over from previous runs causes problems in the pubsub subsystem that only appear on repeated bootstraps? I'll be happy to collaborate and try other debugging strategies.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40768: Decide whether to disable circuit cannibalization entirely (2023-04-12, gabi-250)

The discussions around #40570 reignited a discussion about circuit cannibalization, and whether the performance improvements it provides (if any) justify the maintenance costs of keeping it around. This ticket is about deciding whether we should disable cannibalization in c-tor.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40767: Investigate high circuit build error rates in simulation (2023-04-12, gabi-250)

We ran some shadow simulations to debug/repro the issue from #40570, and @jnewsome noticed the onion service clients have consistently high [circuit build failure rates](https://gitlab.torproject.org/tpo/core/tor/-/issues/40570#note_2883257).
We should figure out what causes these circuit build failures.

https://gitlab.torproject.org/tpo/core/tor/-/issues/40761: DDoS mitigation: analysis to understand relay-to-relay connections from non-relay IPs (2023-04-12, bnm)
We are working on a tor proposal that should help
with protecting non-guard relays from a large fraction
of the DDoS load.
In first tests we have seen a 55% CPU usage decrease
when deploying our proposed mitigations, but we
want to make sure that we are not introducing an
over-blocking problem. We know about a few
configurations where a relay will use a source IP that is not in
the consensus to connect to other relays (OutboundBindAddress, OutboundBindAddressOR),
but we would like to have some actual data about it.
To measure, understand and solve that potential problem and to
back up the proposal with some actual data we would
like to measure the following on our tor relays:
Log when our non-guard tor relays get
an authenticated relay to relay connection to our ORPort
from a source IP that is not in consensus and not in
the exit lists:
```
timestamp relay-fingerprint source-IP
```
If the "and not in the exit lists" part is too hard,
we can take care of that in post-processing of the logs
to filter them out.
We do not care about client to relay connections and do not want to log them.
Would it be possible to provide a patch or branch
that implements that logging on top of main?
It does not have to be in a release and we will run it
only temporarily.
Thank you!

https://gitlab.torproject.org/tpo/core/tor/-/issues/40747: android: fdsan SIGABRT tor_main_configuration_free (2023-04-12, sbs)

### Summary
When testing tor-0.4.7.13 on Android 13, I experienced a `SIGABRT` in `tor_main_configuration_free` caused by [fdsan](https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md). The reason why fdsan causes an abort is that the owning control socket is closed twice. I noticed this crash as part of testing a release candidate of [OONI Probe Android](https://github.com/ooni/probe-android/) where we embed `libtor.a`.
The corresponding OONI Probe issue is: https://github.com/ooni/probe/issues/2405.
### Steps to reproduce
I do not have a very simple procedure to reproduce the issue that does not involve OONI Probe and its build system.
However, the underlying issue is independent of Android. Android only matters because fdsan notices the double close of the same file descriptor and hence triggers a crash (for Android API level >= 30).
Because of this, here are instructions to reproduce the underlying issue using GNU/Linux (I used Ubuntu 22.04):
1. clone tor
2. `git checkout tor-0.4.7.13`
3. `git apply tor.diff` where `tor.diff` is the following patch:
```diff
diff --git a/src/core/mainloop/connection.c b/src/core/mainloop/connection.c
index cf25213cb1..d690de3892 100644
--- a/src/core/mainloop/connection.c
+++ b/src/core/mainloop/connection.c
@@ -149,6 +149,8 @@
#include "core/or/congestion_control_flow.h"
+#include <stdio.h>
+
/**
* On Windows and Linux we cannot reliably bind() a socket to an
* address and port if: 1) There's already a socket bound to wildcard
@@ -949,6 +951,7 @@ connection_free_minimal(connection_t *conn)
if (SOCKET_OK(conn->s)) {
log_debug(LD_NET,"closing fd %d.",(int)conn->s);
+ fprintf(stderr, "SBSDEBUG: connection_free_minimal %lld\n", (long long)conn->s);
tor_close_socket(conn->s);
conn->s = TOR_INVALID_SOCKET;
}
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..fb49d92ad7 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -116,6 +116,11 @@ tor_main_configuration_setup_control_socket(tor_main_configuration_t *cfg)
cfg_add_owned_arg(cfg, "__OwningControllerFD");
cfg_add_owned_arg(cfg, buf);
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_setup_control_socket %lld %lld\n",
+ (long long)fds[0], (long long)fds[1]
+ );
+
cfg->owning_controller_socket = fds[1];
return fds[0];
}
@@ -132,6 +137,10 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
raw_free(cfg->argv_owned);
}
if (SOCKET_OK(cfg->owning_controller_socket)) {
+ fprintf(
+ stderr, "SBSDEBUG: tor_main_configuration_free %lld\n",
+ (long long)cfg->owning_controller_socket
+ );
raw_closesocket(cfg->owning_controller_socket);
}
raw_free(cfg);
```
4. `./autogen.sh`
5. `./configure --disable-asciidoc`
6. `make`
7. `mkdir tmp`
8. `vi tmp/main.c` making sure it contains the following content:
```C
#include "../src/feature/api/tor_api.h"
#include <stdlib.h>
#include <unistd.h>
int main() {
tor_main_configuration_t *cfg = tor_main_configuration_new();
if (cfg == NULL) {
exit(1);
}
tor_control_socket_t sock = tor_main_configuration_setup_control_socket(cfg);
if (sock == INVALID_TOR_CONTROL_SOCKET) {
exit(2);
}
(void)close(sock); // close immediately (it's async on Android but it should not matter AFAICT)
(void)tor_run_main(cfg);
tor_main_configuration_free(cfg);
}
```
9. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
10. `./a.out` which should produce this output:
```
SBSDEBUG: tor_main_configuration_setup_control_socket 4 5
Feb 02 17:18:07.330 [notice] Tor 0.4.7.13 (git-7c1601fb6edd780f) running on Linux with Libevent 2.1.12-stable, OpenSSL 3.0.2, Zlib 1.2.11, Liblzma N/A, Libzstd N/A and Glibc 2.35 as libc.
Feb 02 17:18:07.330 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://support.torproject.org/faq/staying-anonymous/
Feb 02 17:18:07.330 [notice] Configuration file "/usr/local/etc/tor/torrc" not present, using reasonable defaults.
Feb 02 17:18:07.331 [notice] Opening Socks listener on 127.0.0.1:9050
Feb 02 17:18:07.331 [notice] Opened Socks listener connection (ready) on 127.0.0.1:9050
Feb 02 17:18:07.000 [notice] Bootstrapped 0% (starting): Starting
Feb 02 17:18:07.000 [notice] Starting with guard context "default"
Feb 02 17:18:07.000 [notice] Owning controller connection has closed -- exiting now.
SBSDEBUG: connection_free_minimal 5
Feb 02 17:18:07.000 [notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 8
SBSDEBUG: tor_main_configuration_free 5
```
If you analyze the above output, you can see that file descriptor `5` is closed twice. This output is almost identical to the output I have seen in the Android logcat (more on that below). Also, I _think_ the way I am using the embedding API above (which mirrors our more complex implementation written in Go) is fine; if not, please educate me.
(If you want to reproduce the same problem I experience on Android, I can either explain how to compile and test OONI Probe for Android, or I can try to work on creating a simple Android PoC like the one above.)
### What is the current bug behavior?
We're in an embedding scenario where we eventually call `tor_run_main`, as mentioned in the previous section.
This is the sequence of APIs we call along with my best understanding of what happens inside `tor`:
We create a configuration using `tor_main_configuration_new`.
Calling `tor_main_configuration_setup_control_socket` creates a pair of sockets, returns `fds[0]` to us, and retains `fds[1]` inside `tor_main_configuration_t::owning_controller_socket`.
Calling `tor_run_main` calls (in a way that is not 100% clear to me) `options_act` that passes the `owning_controller_socket` to `control_connection_add_local_fd`. In turn, this function registers the file descriptor `fds[1]` as the control connection.
Eventually we `close` the `fds[0]` that was returned to us, which causes `tor` to stop its libevent loop.
When `tor_run_main` terminates, it calls `tor_cleanup`, which calls `tor_free_all`, which calls `connection_free_all`, which calls `connection_free_minimal` for each connection, including the owning file descriptor `fds[1]`.
After `tor_run_main`, we call `tor_main_configuration_free`. In turn, this function calls `raw_closesocket` on the `owning_controller_socket`, which is hence closed for the second time.
On Android with API level >= 30, the [fdsan](https://android.googlesource.com/platform/bionic/+/master/docs/fdsan.md) sanitizer notices the second close and _sometimes_ (roughly 50%) this fact causes the app to abort.
### What is the expected behavior?
I think tor should duplicate the file descriptor before registering it into the core event loop such that there is a single owner of each of the two duplicates. The `tor_main_configuration_free` function owns one of them and the core event loop owns the other one. This semantics shouldn't cause any issue with the fdsan sanitizer because it's designed to enforce it.
Alternatively, the limitation should probably be documented: either use API level < 30 (where fdsan only warns), or call the appropriate fdsan API to disable crashing on double close. (I do not remember seeing such warnings when I read how to use the embedding API, and a quick `git grep fdsan` or `git grep "API level"` did not return anything, but it's still possible that I overlooked _some_ documentation about this issue.)
### Environment
- Which version of Tor are you using?
tor-0.4.7.13
- Which operating system are you using?
Android 13 (but I also provided a minimal example on GNU/Linux)
- Which installation method did you use?
We cross compile tor for Android using [our cross compilation scripts](https://github.com/ooni/probe-cli/tree/v3.17.0-alpha.1/internal/cmd/buildtool).
We obtain a static set of libraries and a `tor_api.h` that we link as part of building an AAR with [go mobile](https://github.com/golang/mobile).
We use the obtained AAR as a dependency for [OONI Probe Android](https://github.com/ooni/probe-android/).
The code that specifically invokes tor [is written in Go](https://github.com/ooni/probe-android/). The sequence of events in terms of the Tor embedding API is the one I described above in the "what is the current bug behavior?" section.
However, I have also provided a minimal example for GNU/Linux that shows the double-close issue.
### Relevant logs and/or screenshots
The following is an excerpt from the tombstone generated by the crashing app:
```
[notice] Catching signal TERM, exiting cleanly.
fdsan: attempted to close file descriptor 104, expected to be unowned, \
actually owned by unique_fd 0x70b7e5f19c
[...]
ABI: 'arm64'
Timestamp: 2023-02-02 11:39:31.181519664+0100
Cmdline: org.openobservatory.ooniprobe.experimental
pid: 16472, tid: 16593, name: AsyncTask #1 >>> org.openobservatory.ooniprobe.experimental <<<
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
[...]
backtrace:
#00 pc 0000000000055c48 .../lib64/bionic/libc.so (fdsan_error(char const*, ...)+556) (...)
#01 pc 0000000000055954 .../lib64/bionic/libc.so (android_fdsan_close_with_tag+732) (...)
#02 pc 00000000000560a8 .../lib64/bionic/libc.so (close+16) (...)
#03 pc 00000000012ad08c [...]/lib/arm64/libgojni.so (tor_main_configuration_free+128)
```
If I apply the patch that above I called `tor.diff` and run OONI Probe on Android, I see this in the logcat:
```
SBSDEBUG: tor_main_configuration_setup_control_socket 94 98 // <- fds[0] and fds[1]
[...]
[notice] Catching signal TERM, exiting cleanly.
SBSDEBUG: connection_free_minimal 141
SBSDEBUG: connection_free_minimal 98 // <- first close of fds[1]
SBSDEBUG: connection_free_minimal 116
SBSDEBUG: connection_free_minimal 152
SBSDEBUG: tor_main_configuration_free 98 // <- second close of fds[1]
```
### Possible fixes
The following patch makes the app work as intended (i.e., no crashes for several runs):
```diff
diff --git a/src/feature/api/tor_api.c b/src/feature/api/tor_api.c
index 88e91ebfd5..2773949264 100644
--- a/src/feature/api/tor_api.c
+++ b/src/feature/api/tor_api.c
@@ -131,9 +131,13 @@ tor_main_configuration_free(tor_main_configuration_t *cfg)
}
raw_free(cfg->argv_owned);
}
+ /* See https://github.com/ooni/probe/issues/2405 to understand
+ why we're not closing the controller socker here. */
+ /*
if (SOCKET_OK(cfg->owning_controller_socket)) {
raw_closesocket(cfg->owning_controller_socket);
}
+ */
raw_free(cfg);
}
```
That said, I think this patch is wrong because it leaks the file descriptor when `tor_run_main` returns prematurely (e.g., when the command-line flags are wrong). Because of this, I think the more robust fix would be to duplicate the file descriptor before registering it into the libevent loop, as I explained above.

Assignee: Alexander Færøy (ahf@torproject.org)