The Tor Project issueshttps://gitlab.torproject.org/groups/tpo/-/issues2024-01-16T13:49:10Zhttps://gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/-/issues/29Naming conventions between tor specs and metrics-library2024-01-16T13:49:10ZHiroNaming conventions between tor specs and metrics-libraryThere are some differences in how parsed fields are named in the tor specs documents and in metrics-library. @gk pointed out that users of our DB would have to somehow remember that what they find in the specs might be called differently in our database. Ideally we should use the same field names as defined in the specs. See: https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40016#note_2901568HiroHirohttps://gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/-/issues/26Parse snowflake stats2024-01-16T13:49:10ZHiroParse snowflake statsIn https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40016#note_2900641 @gk mentioned how we should parse snowflake stats from the document we archive in collector.
At the same time, in Costa Rica we discussed with @cohosh and @meskio how the anti-censorship team is experimenting with different kinds of metrics in prometheus. For example, you can check the number of available snowflake proxies at https://prometheus2.torproject.org/classic/graph?g0.range_input=1h&g0.expr=snowflake_available_proxies&g0.tab=1
I think we should find out which metrics we would like to send to the DB and which we would like to have in victoria metrics.
The difference between the two is that victoria metrics is intended for time series, while the DB should store data that can be treated as a document, possibly one that can be extracted to produce an artifact like the ones we currently serve and archive in collector.HiroHirohttps://gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/-/issues/24How to process fields that have changed its meaning over time.2024-01-16T13:49:10ZHiroHow to process fields that have changed its meaning over time.In https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40016#note_2900634 @gk mentioned that the bridge network status r line has a publication field whose meaning has changed over time. In the [specs](https://gitlab.torproject.org/tpo/core/torspec/-/blob/142dda7257318e6924ecda26d1a0e37561c2f225/dir-spec.txt#L2311) it is mentioned that:
```
"Publication" was once the publication time of the router's most
recent descriptor, in the form YYYY-MM-DD HH:MM:SS, in UTC. Now
it is only used in votes, and may be set to a fixed value in
consensus documents.
```
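As a point of reference for the format quoted above, here is a minimal sketch of a strict check for a `YYYY-MM-DD HH:MM:SS` value. The `pub_time` struct and `parse_publication` function are hypothetical names introduced for illustration, not part of metrics-library or tor, and the range checks (for example allowing second 60) are assumptions. Note that a syntactically valid value says nothing about whether it is a real publication time or a fixed placeholder; that still depends on the document type and its age.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed fields; not a type from any Tor library. */
struct pub_time {
    int year, month, day, hour, minute, second;
};

/* Parse "YYYY-MM-DD HH:MM:SS" (UTC implied by the spec text).
 * Returns 0 on success and -1 on malformed input. */
int parse_publication(const char *s, struct pub_time *out)
{
    static const char shape[] = "dddd-dd-dd dd:dd:dd"; /* 'd' marks a digit */
    const size_t len = sizeof(shape) - 1;

    if (s == NULL || out == NULL || strlen(s) != len)
        return -1;

    /* Check every position against the expected shape before converting. */
    for (size_t i = 0; i < len; i++) {
        if (shape[i] == 'd') {
            if (!isdigit((unsigned char)s[i]))
                return -1;
        } else if (s[i] != shape[i]) {
            return -1;
        }
    }

    struct pub_time t;
    if (sscanf(s, "%4d-%2d-%2d %2d:%2d:%2d", &t.year, &t.month, &t.day,
               &t.hour, &t.minute, &t.second) != 6)
        return -1;

    /* Coarse range checks only; days per month are not validated here. */
    if (t.month < 1 || t.month > 12 || t.day < 1 || t.day > 31 ||
        t.hour > 23 || t.minute > 59 || t.second > 60)
        return -1;

    *out = t;
    return 0;
}
```

A caller would pass the field text and a `struct pub_time` to fill, and treat a `-1` return as a document that needs case-by-case handling.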
We are currently parsing the field as it appears in the documents, but there might be fields in other documents that we are not parsing correctly and that need case-by-case handling, especially once we parse old documents from our archives.HiroHirohttps://gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/-/issues/22Process bridgedb metrics2024-01-16T13:49:09ZHiroProcess bridgedb metricsIn https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40016#note_2899995 @gk mentioned how we should map bridgedb metrics in the new database.
Talking with @meskio and @cohosh during the tor meeting we mentioned there was the possibility to export bridgedb metrics directly into the DB instead of generating a document that has to be stored by collector.
This makes sense especially if we plan to map all our past data into the DB and have a service that can generate documents based on specific queries instead of an archive with tarballs.
The current bridgedb tables are defined in https://gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/-/blob/main/src/main/sql/bridgedb_metrics_tables.sql
As @gk pointed out though, we are missing some fields that we are not sure how to map into columns. At the same time, maybe there are other metrics we are not collecting or we could process in a way that makes more sense with what we want to track.
Besides collecting bridgedb metrics in sql, we could also consider which fields we could send to victoria metrics instead.HiroHirohttps://gitlab.torproject.org/tpo/core/onionmasq/-/issues/46maybe compile out logging with Cargo features?2023-05-15T16:43:11Zetamaybe compile out logging with Cargo features?The following discussion from !75 should be addressed:
- [ ] @trinity-1686a started a [discussion](https://gitlab.torproject.org/tpo/core/onionmasq/-/merge_requests/75#note_2900111): (+1 comment)
> maybe we could leverage `tracing` [feature flags](https://docs.rs/tracing/latest/tracing/level_filters/index.html#compile-time-filters) to totally remove any dangerous and possibly costly log level from the resulting binary on `--release` builds?https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/41758ESR115: android: don't allow PDFs to be opened by 3rd party apps2023-08-26T05:12:29ZThorinESR115: android: don't allow PDFs to be opened by 3rd party appshttps://bugzilla.mozilla.org/show_bug.cgi?id=1829372 - landed in FF114https://bugzilla.mozilla.org/show_bug.cgi?id=1829372 - landed in FF114https://gitlab.torproject.org/tpo/core/onionmasq/-/issues/45Publish onionmasq for maven2023-07-03T13:45:10Zmicahmicah@torproject.orgPublish onionmasq for maven@cyberta mentioned that having onionmasq published in MavenCentral would be good, this would let us avoid putting the .aar in the tpo/applications/vpn repository, and make it available for other people to use.
Additionally, we can also publish maven bits into the gitlab package repository here (see #50) for that.
Maybe we should do both? Maven Central for the world, and the gitlab package repository for our builds, for self-hosting reasons? I'm not too familiar with maven in general, so any thoughts or suggestions here would be greatly appreciated.https://gitlab.torproject.org/tpo/community/team/-/issues/90Go over Limerick core contributor process notes and salvage what we still fin...2023-08-10T13:48:55ZGeorg KoppenGo over Limerick core contributor process notes and salvage what we still find usefulWe had a [session at our Limerick meeting](https://gitlab.torproject.org/tpo/team/-/wikis/202209MeetingCoreContributors) talking about how to improve our core contributor process. We should go over those notes and salvage whatever we still find useful and potentially integrate that in our [notes from the Costa Rica session](https://gitlab.torproject.org/tpo/team/-/wikis/202304ImproveCoreContributorProcess).
/cc @armaGusGushttps://gitlab.torproject.org/tpo/team/-/issues/157Create a proposal and process to deal with emergencies2023-07-14T17:21:35ZGeorg KoppenCreate a proposal and process to deal with emergenciesFrom time to time we hit emergencies which lead those of us affected by them to drop their regular work and somehow try to cope with the current situation. It would be useful to have a proactive approach with a set of guidelines and rules to deal with that. That could be specified in a policy which would also include some thoughts about what counts as an emergency, who is responsible for dealing with them (or a process for determining who manages the emergency), etc.GusGushttps://gitlab.torproject.org/tpo/applications/mullvad-browser/-/issues/173Hide "Open previous windows and tabs" in PBM2023-06-13T14:32:26ZruihildtHide "Open previous windows and tabs" in PBMIn PBM, in the Settings, there's a startup section with `Open previous windows and tabs` greyed out.
Can we hide this when PBM is enabled? Maybe even the whole section when on an OS that doesn't support setting MB as default?
![startup](/uploads/fd365dae022af7a1a8760987db868476/startup.png)https://gitlab.torproject.org/tpo/applications/mullvad-browser/-/issues/170Does Mullvad Browser actually need a custom $HOME on Linux?2024-03-05T17:02:13ZPier Angelo VendrameDoes Mullvad Browser actually need a custom $HOME on Linux?Our `start-$name-browser` script customizes the home directory path.
I think it might not be very good from a UX point of view for MB users (maybe we could stop that for TB, too).
Also, it could be something to fix for the system-wide install.https://gitlab.torproject.org/tpo/anti-censorship/connectivity-measurement/probeobserver/-/issues/2test meek-azure connectivity2024-02-27T19:08:20Zmeskiomeskio@torproject.orgtest meek-azure connectivityshelikhooshelikhoohttps://gitlab.torproject.org/tpo/core/arti/-/issues/811Rename/rethink AttemptId in DirMgr2023-10-10T16:14:31ZNick MathewsonRename/rethink AttemptId in DirMgrFor #803 I want to log `AttemptId` values to help trace which directory success/failure is which.
But maybe we need a better name. Currently, `AttemptId` is an identifier for a long series of attempts to do different things, starting from zero. From the docs:
```
/// Identifier for an attempt to bootstrap a directory.
///
/// Every time that we decide to download a new directory, _despite already
/// having one_, counts as a new attempt.
///
/// These are used to track the progress of each attempt independently.
```
So maybe this needs a different name than "attempt".https://gitlab.torproject.org/tpo/community/l10n/-/issues/40109Get RT articles for translators to easily translate2023-09-27T17:59:49ZGabagaba@torproject.orgGet RT articles for translators to easily translateWe have a bunch of articles in RT for user support that we need to translate into farsi and other languages: https://rt.torproject.org/Articles/Article/Search.html?Class=10&Parent=0&HideOptions=1
- [ ] Get all user support articles from RT into weblate so we can easily translate them.
- [ ] Find a way to easily update RT with the translations.
@gus Could we say that all the support articles that need translation come to around 1800 words? I saw that there are around 55 articles, and with an average of 30 words per article, that would be how much we could translate.emmapeelemmapeelhttps://gitlab.torproject.org/tpo/applications/mullvad-browser/-/issues/160Disable the cookie exceptions button in Private Browsing Mode2023-08-22T20:05:15ZruihildtDisable the cookie exceptions button in Private Browsing ModeCurrently, there's a warning in a grey box, which is very easy to miss (and is indeed missed by a lot of people).
Couldn't we just hide controls that are not relevant in PBM?
![image](/uploads/4fd1e374de35886be3d3e07d4ff05a48/image.png)
There's also a [related issue](https://github.com/mullvad/mullvad-browser/issues/29) in Mullvad's Github.https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/41716Figure out how to display conflux circuits in Tor Browser's UI2024-01-29T19:26:13Zmicahmicah@torproject.orgFigure out how to display conflux circuits in Tor Browser's UINow that circuit display was literally just re-implemented...
When 0.4.8 becomes stabilized (this will take a few months still), conflux will come to the network. Conflux will open a new world with tor circuits: circuits are no longer necessarily static, now they can be dynamic, the TCP circuits can change paths, and can have multiple paths, and doing so can bring some nice improvements for people.
We need to start thinking about how to communicate to the user in TB that the circuit(s) are conflux circuits. Knowing that they are using a conflux circuit gives users valuable information (and gives us their feedback as well).
Because the circuits can now change paths, this could have a lot of UI visualization implications, such as reflecting that there are paths that are now ready, or that there is a new circuit leg available. That would be the nice (but complicated) signal to users, but starting with something simple by just indicating in the UI that this is a conflux circuit (e.g. literally just writing the word 'conflux' in green or something) would be a good first step. So there is a UX question here.
For the dev side of things, right now it's possible (in 0.4.8) to learn from the control port that a circuit is a conflux circuit, but the control port doesn't expose any advanced information that would be useful for multi-leg displays. If we want to show those types of things, we will need to talk about what would need to be added to tor to extract that information.
Perhaps we can have an ad-hoc session at our in-person meeting to brief folks on what conflux means and its benefits.https://gitlab.torproject.org/tpo/web/support/-/issues/326Explain why/how metrics is reporting that a bridge was blocked in a country2023-11-13T10:21:05ZGusExplain why/how metrics is reporting that a bridge was blocked in a countrySome operators were confused by the "blocked in" field in the Metrics portal about their bridges (see this thread for example - https://forum.torproject.net/t/bridge-blocklist-ru/2989/3?u=gus). It would be nice to have a support entry explaining what it means.GusGushttps://gitlab.torproject.org/tpo/applications/mullvad-browser/-/issues/147On macOS, the launcher uses the stable icon even when alpha2023-10-16T21:19:18ZruihildtOn macOS, the launcher uses the stable icon even when alphaWhen installing the browser on macOS, the icon in the Applications folder and in the browser are correctly using the green alpha version.
However, the launcher icon in the bar is using the stable yellow version.https://gitlab.torproject.org/tpo/core/arti/-/issues/796Want various convenience methods on netdoc argument encoder etc.2023-10-10T16:14:31ZIan Jacksoniwj@torproject.orgWant various convenience methods on netdoc argument encoder etc.I think we want at least
```
impl ItemEncoder {
pub fn args(self, impl Iterator<Item=&dyn ItemArgument>) -> Self;
pub fn arg_base64(self, t: impl Writeable) -> Result<Self, tor_bytes::Error>;
}
// and maybe
impl Extend<impl ItemArgument> for ItemEncoder {..}
```
I also think we want something like
```
/// Helper that encodes a list of `T` as a count (of type `N`) followed by the encodings of the items, concatenated
pub struct tor_bytes::CountedList<N, T>(T, PhantomData<N>);
impl Writeable for CountedList<N, T> where T: ExactSizeIterator<impl Writeable> {..}
```
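To make the proposed wire layout concrete, here is a language-agnostic illustration written in C rather than Rust. It is only a sketch under stated assumptions: the count type `N` is taken to be one byte, the items are 16-bit integers, and everything is written big-endian. `encode_counted_u16_list` is a hypothetical name, not an arti or tor_bytes API.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a list as a one-byte count followed by the items, concatenated.
 * Each item is written as a big-endian 16-bit integer (an assumption made
 * for this sketch). Returns the number of bytes written, or 0 if the list
 * does not fit in the count type or in the output buffer. */
size_t encode_counted_u16_list(const uint16_t *items, size_t n_items,
                               uint8_t *out, size_t out_len)
{
    if (n_items > UINT8_MAX)
        return 0; /* the count would overflow the one-byte prefix */

    const size_t needed = 1 + 2 * n_items;
    if (out_len < needed)
        return 0;

    out[0] = (uint8_t)n_items;
    for (size_t i = 0; i < n_items; i++) {
        out[1 + 2 * i] = (uint8_t)(items[i] >> 8);   /* high byte first */
        out[2 + 2 * i] = (uint8_t)(items[i] & 0xff); /* then low byte */
    }
    return needed;
}
```

The overflow check is the part a generic helper would most plausibly want to centralize: the caller cannot accidentally emit a count that disagrees with the number of encoded items.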
Prompted by reading arti!1070Ian Jacksoniwj@torproject.orgIan Jacksoniwj@torproject.orghttps://gitlab.torproject.org/tpo/core/tor/-/issues/40774libtor.a: pubsub_install tor_raw_abort2024-03-20T17:17:22Zsbslibtor.a: pubsub_install tor_raw_abort### Summary
We see OONI Probe Android crashes where `pubsub_install` calls `tor_raw_abort` for tor 0.4.7.13 using libtor.a embedded into a dynamic library loaded by an Android app. As of 2023-02-09 (around when we started investigating), this issue occurred 526 times in the last 28 days and was one of the main sources of crashes for the OONI Probe Android app.
A typical stack trace obtained from the Google Play console looks like this:
```
backtrace:
#00 pc 0x0000000000089b0c .../lib64/bionic/libc.so (abort+164)
#01 pc 0x00000000013778a4 .../split_config.arm64_v8a.apk (tor_raw_abort_+12)
#02 pc 0x0000000001382150 .../split_config.arm64_v8a.apk (tor_abort_+12)
#03 pc 0x00000000012470a0 .../split_config.arm64_v8a.apk (pubsub_install+120)
#04 pc 0x0000000001247170 .../split_config.arm64_v8a.apk (tor_run_main+136)
```
We investigated this issue and managed to reproduce it, initially on OONI Probe Android, then on Linux using our Go code for managing libtor.a, and finally with a pure C test case on Linux. During this investigation we never saw the first bootstrap fail. Rather, in some cases it took more than 30 repeated bootstraps to observe the abort; in other cases, it occurred within the first 3-10 bootstraps.
I searched in the issue tracker for "pubsub", "pubsub_install", "SIGABRT", and "abort". AFAICT, there is no other open issue discussing this problem; however, I think https://gitlab.torproject.org/tpo/core/tor/-/issues/32729 may be related and ~similar.
### Steps to reproduce:
The following steps allowed me to reproduce the problem on Ubuntu 22.04.2:
1. `git clone https://gitlab.torproject.org/tpo/core/tor`
2. `cd tor`
3. `git checkout tor-0.4.7.13`
4. `git apply 004.diff` where `004.diff` is
```diff
diff --git a/src/lib/pubsub/pubsub_check.c b/src/lib/pubsub/pubsub_check.c
index 99e604d715..a5cc4b7658 100644
--- a/src/lib/pubsub/pubsub_check.c
+++ b/src/lib/pubsub/pubsub_check.c
@@ -25,6 +25,7 @@
#include "lib/malloc/malloc.h"
#include "lib/string/compat_string.h"
+#include <stdio.h>
#include <string.h>
static void pubsub_adjmap_add(pubsub_adjmap_t *map,
@@ -343,21 +344,27 @@ lint_message(const pubsub_adjmap_t *map, message_id_t msg)
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has subscribers, but no publishers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_pub == 0 for %s\n", get_message_id_name(msg));
ok = false;
} else if (n_sub == 0) {
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has publishers, but no subscribers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_sub == 0 for %s\n", get_message_id_name(msg));
ok = false;
}
/* Check the message graph topology. */
- if (lint_message_graph(map, msg, pub, sub) < 0)
+ if (lint_message_graph(map, msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_graph failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
/* Check whether the messages have the same fields set on them. */
- if (lint_message_consistency(msg, pub, sub) < 0)
+ if (lint_message_consistency(msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_consistency failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
if (!ok) {
/* There was a problem -- let's log all the publishers and subscribers on
@@ -385,6 +392,7 @@ pubsub_adjmap_check(const pubsub_adjmap_t *map)
bool all_ok = true;
for (unsigned i = 0; i < map->n_msgs; ++i) {
if (lint_message(map, i) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message failed for %u %s\n", i, get_message_id_name((message_id_t)i));
all_ok = false;
}
}
@@ -401,11 +409,15 @@ pubsub_builder_check(pubsub_builder_t *builder)
pubsub_adjmap_t *map = pubsub_build_adjacency_map(builder->items);
int rv = -1;
- if (!map)
+ if (!map) {
+ fprintf(stderr, "SBSDEBUG: pubsub_build_adjacency_map failed\n");
goto err; // should be impossible
+ }
- if (pubsub_adjmap_check(map) < 0)
+ if (pubsub_adjmap_check(map) < 0) {
+ fprintf(stderr, "SBSDEBUG: pubsub_adjmap_check failed\n");
goto err;
+ }
rv = 0;
err:
```
5. `./autogen.sh`
6. `./configure --disable-asciidoc`
7. `make`
8. `mkdir tmp`
9. `vi tmp/main.c` where `main.c` contains
```C
#include "../src/feature/api/tor_api.h"

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

/* Closes the owning-controller socket after 45 seconds so that tor exits. */
static void *threadMain(void *ptr) {
  int *fdp = (int*)ptr;
  (void)sleep(45 /* seconds */);
  (void)close(*fdp);
  free(fdp);
  return NULL;
}

int main() {
  /* Bootstrap tor over and over until one of the runs aborts. */
  for (;;) {
    tor_main_configuration_t *config = tor_main_configuration_new();
    if (config == NULL) {
      exit(1);
    }

    char *argv[] = {
      "tor",
      "Log",
      "notice stderr",
      "DataDirectory",
      "./x",
      NULL,
    };
    int argc = 5;
    if (tor_main_configuration_set_command_line(config, argc, argv) != 0) {
      exit(2);
    }

    int filedesc = tor_main_configuration_setup_control_socket(config);
    if (filedesc < 0) {
      exit(3);
    }

    /* Hand the socket to a background thread that will close it later. */
    int *fdp = malloc(sizeof(*fdp));
    if (fdp == NULL) {
      exit(4);
    }
    *fdp = filedesc;
    pthread_t thread;
    if (pthread_create(&thread, NULL, threadMain, /* move */ fdp) != 0) {
      exit(5);
    }

    /* Blocks until tor exits after the control socket is closed. */
    (void)tor_run_main(config);

    if (pthread_join(thread, NULL) != 0) {
      exit(6);
    }
    fprintf(stderr, "********** doing another round\n");
  }
}
```
10. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
11. `./a.out 2>&1|tee LOG.txt`
The `tmp/main.c` program is a reasonable approximation of what our Go code for running tor does. The main difference is that we start tor with `DisableNetwork` set and re-enable the network later. This difference does not seem to have any impact, since we saw aborts in both cases.
We run repeated bootstraps because the OONI Probe Android app loads tor and the Go code as a shared library and calls `tor_run_main` each time we run an OONI experiment that requires tor (typically, `vanilla_tor` and `torsf`).
### What is the current bug behavior?
We can cluster the kind of crashes we observed into two groups.
#### pubsub_adjmap_check failed
This crash has been the most frequent one we observed. With the above patch applied, it generally looks like this:
```
[... omitting logs from several bootstraps ...]
Mar 22 14:07:21.000 [notice] Owning controller connection has closed -- exiting now.
Mar 22 14:07:21.000 [notice] Catching signal TERM, exiting cleanly.
********** doing another round
SBSDEBUG: n_sub == 0 for orconn_state
SBSDEBUG: lint_message failed for 5 orconn_state
SBSDEBUG: n_pub == 0 for orconn_state
SBSDEBUG: lint_message failed for 34 orconn_state
SBSDEBUG: pubsub_adjmap_check failed
[1] 300227 IOT instruction (core dumped) ./a.out 2>&1 |
300228 done tee LOG.txt
```
When running this via Go code, we see a different message before the abort. I think this happens because Go installs its own handler for SIGABRT, while the C code does not install any handler. My understanding is also that "IOT instruction" is related to `SIGIOT`, which seems to be an alias for `SIGABRT` judging from include/linux/signal.h and glibc's bits/signum-generic.h.
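The alias can be checked directly on a given platform with a few lines of C. This is a small sketch, not part of the reproduction steps; `sigiot_matches_sigabrt` is a hypothetical helper name, and the `#ifdef` covers platforms whose `<signal.h>` does not define `SIGIOT` at all.

```c
#include <signal.h>

/* Returns 1 if SIGIOT is defined and has the same number as SIGABRT,
 * 0 if it is defined with a different number, and -1 if this platform's
 * <signal.h> does not define SIGIOT. */
int sigiot_matches_sigabrt(void)
{
#ifdef SIGIOT
    return (SIGIOT == SIGABRT) ? 1 : 0;
#else
    return -1;
#endif
}
```

If this returns 1, a shell that reports "IOT instruction" is describing the same signal that `abort()` raises.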
My understanding of the above logs is that, somehow, a message is registered twice: once without publishers, and once without subscribers.
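To illustrate that reading of the logs, here is a simplified model of the two count checks instrumented in the patch above. It is a sketch, not tor's code: `msg_counts`, `lint_counts`, and `count_lint_failures` are hypothetical names, and treating a message with neither publishers nor subscribers as harmless is an assumption of this model.

```c
#include <stddef.h>

/* Hypothetical stand-in for one message id's entry in the adjacency map. */
struct msg_counts {
    const char *name; /* message name, e.g. "orconn_state" */
    unsigned n_pub;   /* number of registered publishers */
    unsigned n_sub;   /* number of registered subscribers */
};

/* Models only the n_pub/n_sub checks: returns -1 when a message has
 * subscribers but no publishers, or publishers but no subscribers. */
int lint_counts(const struct msg_counts *m)
{
    if (m->n_pub == 0 && m->n_sub == 0)
        return 0; /* assumed harmless: nobody uses this message */
    if (m->n_pub == 0 || m->n_sub == 0)
        return -1;
    return 0;
}

/* Counts how many entries fail the check above. */
size_t count_lint_failures(const struct msg_counts *entries, size_t n)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (lint_counts(&entries[i]) < 0)
            failures++;
    }
    return failures;
}
```

In this model a single entry `{"orconn_state", 1, 1}` passes, while the same registrations split across two ids, `{"orconn_state", 1, 0}` and `{"orconn_state", 0, 1}`, produce two failures. That matches the shape of the log, where ids 5 and 34 both report `orconn_state`, one with `n_sub == 0` and the other with `n_pub == 0`.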
It's also important to point out that the message causing the failure has not always been `orconn_state`. Based on all the aborts we have examined, `orconn_status` can also cause failures. For the sake of brevity, I am not going to copy all the logs we collected here, but you can read them, along with my thought process when analyzing the bug, at https://github.com/ooni/probe/issues/2406.
#### INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
This specific error occurred very rarely (2-3 times). It is not clear whether this is the same issue or not, however I think it makes sense to mention it in the same issue, because it occurred when using the above code to investigate pubsub_install aborts.
```
2023/03/21 17:59:13 info tunnel: tor: exec: <internal/libtor> x/tunnel/torsf/tor [...]
BUG: subsystem btrack (at 55) could not connect to publish/subscribe system.
============================================================ T= 1679421553
INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
A subsystem couldn't be connected.
./testtorsf(dump_stack_symbols_to_error_fds+0x58)[0xe6df08]
./testtorsf(tor_raw_assertion_failed_msg_+0x97)[0xe6e8d7]
./testtorsf(subsystems_add_pubsub_upto+0x128)[0xe47df8]
./testtorsf(pubsub_install+0x29)[0xdf9c99]
./testtorsf(tor_run_main+0x8a)[0xdf9e2a]
./testtorsf(_cgo_2d785783cadf_Cfunc_tor_run_main+0x1b)[0xdf665b]
./testtorsf[0x500e04]
SIGABRT: abort
PC=0x7fa00f89aa7c m=14 sigcode=18446744073709551610
signal arrived during cgo execution
```
(Because this specific error occurred when using Go code, here you also see the output of Go's `SIGABRT` handler.)
The specific assertion that fails in this case is the following:
```C
int
subsystems_add_pubsub_upto(pubsub_builder_t *builder,
                           int target_level)
{
  for (unsigned i = 0; i < n_tor_subsystems; ++i) {
    const subsys_fns_t *sys = tor_subsystems[i];
    if (!sys->supported)
      continue;
    if (sys->level > target_level)
      break;
    if (! sys_status[i].initialized)
      continue;
    int r = 0;
    if (sys->add_pubsub) {
      subsys_id_t sysid = get_subsys_id(sys->name);
      raw_assert(sysid != ERROR_ID);
      pubsub_connector_t *connector;
      connector = pubsub_connector_for_subsystem(builder, sysid);
      r = sys->add_pubsub(connector);
      pubsub_connector_free(connector);
    }
    if (r < 0) {
      fprintf(stderr, "BUG: subsystem %s (at %u) could not connect to "
              "publish/subscribe system.", sys->name, sys->level);
      raw_assert_unreached_msg("A subsystem couldn't be connected."); // <- HERE
    }
  }
  return 0;
}
```
### What is the expected behavior?
On a very broad level, I think tor should not abort. Because I do not understand very well what is happening, it is difficult to provide a more specific recommendation about what the code should actually do.
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
Always 0.4.7.13
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
Android (several versions and devices according to the Google Play console); Android 13 on Pixel 4a arm64 (my phone); Ubuntu 22.04.2 on amd64
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
Tor compiled along with all its dependencies using our build scripts as well as tor compiled from sources with Ubuntu 22.04.2 installation dependencies when reproducing the issue using the above mentioned steps.
### Relevant logs and/or screenshots
I think I already provided representative logs above. The https://github.com/ooni/probe/issues/2406 issue contains all the logs we produced while investigating this issue on our end. It also describes how we progressively narrowed down the problem from an abort in the Android app to an abort using Go code on Linux to the minimal instructions for reproducing the issue that I mentioned above.
On this note, I initially suspected that there was a data race on our end. That assumption was true, but the abort continued to occur after I fixed the data race in our Go code. In any case, the possible presence of data races on our end prompted me to bypass our Go code and write C code that could allow reproducing the issue. In one of my final attempts at understanding the issue using just C code, I [patched tor to avoid aborting in case pubsub_install failed](https://github.com/ooni/probe/issues/2406#issuecomment-1479884981), recompiled, and ran with tsan enabled, [seeing just two pubsub_install failures over 490 runs and no sign of data races](https://github.com/ooni/probe/issues/2406#issuecomment-1480826748).
### Possible fixes
I don't know. Since the data-race theory is not supported by data and unlikely, perhaps it could be that state from previous runs causes issues with the pubsub subsystem that appear for repeated bootstraps? I'll be happy to collaborate and try other debugging strategies.