The Tor Project issues
https://gitlab.torproject.org/groups/tpo/-/issues
2024-03-21T07:58:03Z

https://gitlab.torproject.org/tpo/applications/tor-browser-build/-/issues/40553
Move to different entitlements files for parent and child processes
2024-03-21T07:58:03Z
Georg Koppen

Mozilla started to provide/use different entitlements files for parent and child processes to be able to provide a finer-grained ruleset for the hardening depending on process type:
https://bugzilla.mozilla.org/show_bug.cgi?id=1593071
https://bugzilla.mozilla.org/show_bug.cgi?id=1593072
We should do the same for Tor Browser.

https://gitlab.torproject.org/tpo/network-health/onbasca/-/issues/128
Change initial upload data to be large and cut it off after X bytes, in a similar way as download size is calculated when downloading.
2024-03-21T07:36:19Z
juga

Equivalent to sbws#40142.

onbasca: 1.1

https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/42469
newwin: bookmarks toolbar doesn't elegantly handle all app languages
2024-03-20T23:39:15Z
Thorin

so for those without letterboxing for example, the sizes can be off if not en-US. On my devicePixelRatio 1 desktop, in a linux VM, when starting in `fa` (as an example) the inner window is 10 pixels too tall
---
this was all on Linux
digging a little deeper
- when toolbar is never shown everything is peachy
- both first `newwin` and `menu>new window`
- when toolbar is set to always show, some app languages are approx 10px off (may vary per machine)
- I found `ar`, `fa`, `ga-IE`, `ko`, `th`, `zh-CN` and `zh-TW` to be affected
- I wasn't specifically testing for this and required letterboxing to be on so I may have missed some or ones that are even 1px out
- both first `newwin` and `menu>new window`
when the toolbar is set to show only on new tabs (default) things get messy (also see #42192)
- `newwin` on startup is fine
- `menu> new window` shows the same 10px effect, but coupled with #42192
- the height dropped from 900px to 840px, but I'm 99% sure this was because my VM window was just the wrong size and triggered a smaller 800px attempt (due to the extra height), so 840px is 800px plus 30px for the toolbar and our mysterious 10px
So again, we should be measuring after any newwin creation and then resizing a second time if needed (should be feasible, see other tickets)
cc: @ma1

https://gitlab.torproject.org/tpo/applications/tor-browser-spec/-/issues/25021
Update Tor Browser spec
2024-03-20T23:28:07Z
Georg Koppen

Tor Browser 11.0 is coming out soon. We should update our design document to cover all the new issues that are showing up in it. Highlights are
1) Switch to rbm/tor-browser-build
2) The security slider copy update
...
The update should cover the current goals and state of the browser, and fold in all the 8.0, 8.5, 9.0, 9.5, 10.0, and 10.5 changes.

Tor Browser: 11.0 Issues with previous release
richard

https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/42450
Solve newwin fingerprinting + issues
2024-03-20T22:51:57Z
Thorin

fix newwin (& its inner measurement entropy) with and w/out LBing
- **not** letterboxing
- **not** betterboxing
- **not** UI/UX or toggling UI (e.g. #41564, #42020)
- **not** zooming/resizing/maximizing/full-screening after newwin (e.g #16456, #40858, #20129, #20941, #41585, #41723)
- **ignoring** #27083 as I think #30945 is the answer
- i.e **not** inner window _changes_
cc: @ma1 @pierov @richard

https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/38
PayPal message-bar is missing
2024-03-20T19:05:37Z
donuts

In the designs we have a message-bar that is displayed when PayPal is selected: [Figma / Doante-dot](https://www.figma.com/file/nIpahk0b9VMaeEnubiO33g/Marble?type=design&node-id=472%3A483&mode=design&t=fDvF3BZ4AeO9vpbq-1)

stephen

https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/37
Text styles in custom amount-field don't match Figma
2024-03-20T18:29:25Z
donuts

Since this is a fun custom field, the text styles of the currency symbol, value and currency code are all subtly different in Figma. Notably:
- The currency symbol ($) is small.
- The currency code (USD) is de-emphasized.
- There's a 2px gap between the currency symbol and value.
Please see the attached spec for reference.

stephen

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40909
TPA-RFC-38 wiki replacement
2024-03-20T18:27:42Z
Kez

This is the discussion ticket for [TPA-RFC-38: Setting Up a Wiki Service](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-38-new-wiki-service). This ticket serves as a place where people can suggest changes to the RFC, as well as suggest goals and must-have features for the new wiki service

anarcat

https://gitlab.torproject.org/tpo/core/arti/-/issues/1341
Sync GeoIP Databases in an as automated way as possible
2024-03-20T18:07:33Z
Alexander Færøy (ahf@torproject.org)

With C Tor, @dgoulet made this nice toolchain where the CI generates the text files for us for GeoIP usage. It looks like we need something like this for Arti too as the GeoIP DB's are stale right now (not updated since d5632eacb2).

Alexander Færøy (ahf@torproject.org)

https://gitlab.torproject.org/tpo/core/tor/-/issues/40774
libtor.a: pubsub_install tor_raw_abort
2024-03-20T17:17:22Z
sbs

### Summary
We see OONI Probe Android crashes where `pubsub_install` calls `tor_raw_abort` for tor 0.4.7.13 using libtor.a embedded into a dynamic library loaded by an Android app. As of 2023-02-09 (around when we started investigating), this issue occurred 526 times in the last 28 days and was one of the main sources of crashes for the OONI Probe Android app.
A typical stack trace obtained from the Google Play console looks like this:
```
backtrace:
#00 pc 0x0000000000089b0c .../lib64/bionic/libc.so (abort+164)
#01 pc 0x00000000013778a4 .../split_config.arm64_v8a.apk (tor_raw_abort_+12)
#02 pc 0x0000000001382150 .../split_config.arm64_v8a.apk (tor_abort_+12)
#03 pc 0x00000000012470a0 .../split_config.arm64_v8a.apk (pubsub_install+120)
#04 pc 0x0000000001247170 .../split_config.arm64_v8a.apk (tor_run_main+136)
```
We investigated this issue and managed to reproduce it initially on OONI Probe Android, then in Linux using our Go code for managing libtor.a, and finally with a pure C test case working under Linux. During this investigation we have never seen the first bootstrap fail. Rather, in some cases it took > 30 repeated bootstraps to observe the abort; in other cases, it occurred within the first 3-10 bootstraps.
I searched in the issue tracker for "pubsub", "pubsub_install", "SIGABRT", and "abort". AFAICT, there is no other open issue discussing this problem; however, I think https://gitlab.torproject.org/tpo/core/tor/-/issues/32729 may be related and ~similar.
### Steps to reproduce:
The following steps allowed me to reproduce the problem on Ubuntu 22.04.2:
1. `git clone https://gitlab.torproject.org/tpo/core/tor`
2. `cd tor`
3. `git checkout tor-0.4.7.13`
4. `git apply 004.diff` where `004.diff` is
```diff
diff --git a/src/lib/pubsub/pubsub_check.c b/src/lib/pubsub/pubsub_check.c
index 99e604d715..a5cc4b7658 100644
--- a/src/lib/pubsub/pubsub_check.c
+++ b/src/lib/pubsub/pubsub_check.c
@@ -25,6 +25,7 @@
#include "lib/malloc/malloc.h"
#include "lib/string/compat_string.h"
+#include <stdio.h>
#include <string.h>
static void pubsub_adjmap_add(pubsub_adjmap_t *map,
@@ -343,21 +344,27 @@ lint_message(const pubsub_adjmap_t *map, message_id_t msg)
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has subscribers, but no publishers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_pub == 0 for %s\n", get_message_id_name(msg));
ok = false;
} else if (n_sub == 0) {
log_warn(LD_MESG|LD_BUG,
"Message \"%s\" has publishers, but no subscribers.",
get_message_id_name(msg));
+ fprintf(stderr, "SBSDEBUG: n_sub == 0 for %s\n", get_message_id_name(msg));
ok = false;
}
/* Check the message graph topology. */
- if (lint_message_graph(map, msg, pub, sub) < 0)
+ if (lint_message_graph(map, msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_graph failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
/* Check whether the messages have the same fields set on them. */
- if (lint_message_consistency(msg, pub, sub) < 0)
+ if (lint_message_consistency(msg, pub, sub) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message_consistency failed for %s\n", get_message_id_name(msg));
ok = false;
+ }
if (!ok) {
/* There was a problem -- let's log all the publishers and subscribers on
@@ -385,6 +392,7 @@ pubsub_adjmap_check(const pubsub_adjmap_t *map)
bool all_ok = true;
for (unsigned i = 0; i < map->n_msgs; ++i) {
if (lint_message(map, i) < 0) {
+ fprintf(stderr, "SBSDEBUG: lint_message failed for %u %s\n", i, get_message_id_name((message_id_t)i));
all_ok = false;
}
}
@@ -401,11 +409,15 @@ pubsub_builder_check(pubsub_builder_t *builder)
pubsub_adjmap_t *map = pubsub_build_adjacency_map(builder->items);
int rv = -1;
- if (!map)
+ if (!map) {
+ fprintf(stderr, "SBSDEBUG: pubsub_build_adjacency_map failed\n");
goto err; // should be impossible
+ }
- if (pubsub_adjmap_check(map) < 0)
+ if (pubsub_adjmap_check(map) < 0) {
+ fprintf(stderr, "SBSDEBUG: pubsub_adjmap_check failed\n");
goto err;
+ }
rv = 0;
err:
```
5. `./autogen.sh`
6. `./configure --disable-asciidoc`
7. `make`
8. `mkdir tmp`
9. `vi tmp/main.c` where `main.c` contains
```C
#include "../src/feature/api/tor_api.h"

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

static void *threadMain(void *ptr) {
  int *fdp = (int*)ptr;
  (void)sleep(45 /* seconds */);
  (void)close(*fdp);
  free(fdp);
  return NULL;
}

int main() {
  for (;;) {
    tor_main_configuration_t *config = tor_main_configuration_new();
    if (config == NULL) {
      exit(1);
    }
    char *argv[] = {
      "tor",
      "Log",
      "notice stderr",
      "DataDirectory",
      "./x",
      NULL,
    };
    int argc = 5;
    if (tor_main_configuration_set_command_line(config, argc, argv) != 0) {
      exit(2);
    }
    int filedesc = tor_main_configuration_setup_control_socket(config);
    if (filedesc < 0) {
      exit(3);
    }
    int *fdp = malloc(sizeof(*fdp));
    if (fdp == NULL) {
      exit(4);
    }
    *fdp = filedesc;
    pthread_t thread;
    if (pthread_create(&thread, NULL, threadMain, /* move */ fdp) != 0) {
      exit(5);
    }
    (void)tor_run_main(config);
    if (pthread_join(thread, NULL) != 0) {
      exit(6);
    }
    fprintf(stderr, "********** doing another round\n");
  }
}
```
10. `gcc -Wall tmp/main.c -L. -ltor -levent -lcrypto -lssl -lz -lm`
11. `./a.out 2>&1|tee LOG.txt`
The `tmp/main.c` program is a reasonable approximation of what our Go code for running tor does. The main difference is that we start tor with `DisableNetwork` set and re-enable the network later. This difference does not seem to have any impact, since we saw aborts in both cases.
We run repeated bootstraps because the OONI Probe Android app loads tor and the Go code as a shared library and calls `tor_run_main` each time we run an OONI experiment that requires tor (typically, `vanilla_tor` and `torsf`).
### What is the current bug behavior?
We can cluster the kind of crashes we observed into two groups.
#### pubsub_adjmap_check failed
This crash has been the most frequent one we observed. With the above patch applied, it generally looks like this:
```
[... omitting logs from several bootstraps ...]
Mar 22 14:07:21.000 [notice] Owning controller connection has closed -- exiting now.
Mar 22 14:07:21.000 [notice] Catching signal TERM, exiting cleanly.
********** doing another round
SBSDEBUG: n_sub == 0 for orconn_state
SBSDEBUG: lint_message failed for 5 orconn_state
SBSDEBUG: n_pub == 0 for orconn_state
SBSDEBUG: lint_message failed for 34 orconn_state
SBSDEBUG: pubsub_adjmap_check failed
[1] 300227 IOT instruction (core dumped) ./a.out 2>&1 |
300228 done tee LOG.txt
```
When running this via Go code, we see a different message before the abort. I think this happens because Go installs its own handler for SIGABRT, while the C code does not install any handler. My understanding is also that "IOT instruction" is related to `SIGIOT`, which seems to be an alias for `SIGABRT` judging from include/linux/signal.h and glibc's bits/signum-generic.h.
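The alias can be checked directly on a given system. The following is a minimal sketch, assuming only a standard `<signal.h>`; the helper name `sigiot_is_sigabrt` is made up for this illustration and is not part of tor or the OONI code.

```c
#include <signal.h>

/* Hypothetical helper (not from tor): reports how SIGIOT relates to SIGABRT.
 * Returns 1 if SIGIOT is defined and has the same number as SIGABRT,
 * 0 if SIGIOT is defined with a different number,
 * -1 if this platform does not define SIGIOT at all. */
int
sigiot_is_sigabrt(void)
{
#ifdef SIGIOT
  return SIGIOT == SIGABRT ? 1 : 0;
#else
  return -1;
#endif
}
```

On a Linux system with glibc this is expected to return 1, which would be consistent with the shell reporting the abort under the `SIGIOT` name.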
My understanding of the above logs is that, somehow, a message is registered twice: once without publishers, and once without subscribers.
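To illustrate that reading of the logs, here is a toy model of a per-message publisher/subscriber check. It is not tor's `lint_message` code, and every `toy_*` and `TOY_*` name is hypothetical. It assumes that the same message name ended up under two different ids (5 and 34 in the logs above), with the publisher attached to one id and the subscriber to the other.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one registration: a message id and the role registered. */
typedef struct {
  unsigned msg_id;
  int is_publisher; /* 1 = publisher, 0 = subscriber */
} toy_reg_t;

enum { TOY_OK = 0, TOY_NO_PUBLISHERS = -1, TOY_NO_SUBSCRIBERS = -2 };

/* Count publishers and subscribers registered for msg_id and report a
 * one-sided message (subscribers without publishers, or the reverse). */
int
toy_lint_message(const toy_reg_t *regs, size_t n, unsigned msg_id)
{
  size_t n_pub = 0, n_sub = 0;

  assert(regs != NULL || n == 0);
  for (size_t i = 0; i < n; ++i) {
    if (regs[i].msg_id != msg_id)
      continue;
    if (regs[i].is_publisher)
      ++n_pub;
    else
      ++n_sub;
  }
  if (n_pub == 0 && n_sub > 0)
    return TOY_NO_PUBLISHERS;
  if (n_sub == 0 && n_pub > 0)
    return TOY_NO_SUBSCRIBERS;
  return TOY_OK;
}
```

With the registrations `{5, publisher}` and `{34, subscriber}`, `toy_lint_message` reports `TOY_NO_SUBSCRIBERS` for id 5 and `TOY_NO_PUBLISHERS` for id 34, mirroring the two `SBSDEBUG` lines. Whether tor actually ends up in this state is not established here.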
It's also important to point out that the message causing failure has not always been `orconn_state`. Based on all the aborts we have examined, it seems that also `orconn_status` could cause failures. For the sake of brevity, I am not going to copy here all the logs we collected, but you can read them along with my thought process when analyzing the bug at https://github.com/ooni/probe/issues/2406.
#### INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
This specific error occurred very rarely (2-3 times). It is not clear whether this is the same issue or not, however I think it makes sense to mention it in the same issue, because it occurred when using the above code to investigate pubsub_install aborts.
```
2023/03/21 17:59:13 info tunnel: tor: exec: <internal/libtor> x/tunnel/torsf/tor [...]
BUG: subsystem btrack (at 55) could not connect to publish/subscribe system.
============================================================ T= 1679421553
INTERNAL ERROR: Raw assertion failed in Tor 0.4.7.13 at src/app/main/subsysmgr.c:183: 0
A subsystem couldn't be connected.
./testtorsf(dump_stack_symbols_to_error_fds+0x58)[0xe6df08]
./testtorsf(tor_raw_assertion_failed_msg_+0x97)[0xe6e8d7]
./testtorsf(subsystems_add_pubsub_upto+0x128)[0xe47df8]
./testtorsf(pubsub_install+0x29)[0xdf9c99]
./testtorsf(tor_run_main+0x8a)[0xdf9e2a]
./testtorsf(_cgo_2d785783cadf_Cfunc_tor_run_main+0x1b)[0xdf665b]
./testtorsf[0x500e04]
SIGABRT: abort
PC=0x7fa00f89aa7c m=14 sigcode=18446744073709551610
signal arrived during cgo execution
```
(Because this specific error occurred when using Go code, here you also see the output of Go's `SIGABRT` handler.)
The specific assertion that fails in this case is the following:
```C
int
subsystems_add_pubsub_upto(pubsub_builder_t *builder,
                           int target_level)
{
  for (unsigned i = 0; i < n_tor_subsystems; ++i) {
    const subsys_fns_t *sys = tor_subsystems[i];
    if (!sys->supported)
      continue;
    if (sys->level > target_level)
      break;
    if (! sys_status[i].initialized)
      continue;
    int r = 0;
    if (sys->add_pubsub) {
      subsys_id_t sysid = get_subsys_id(sys->name);
      raw_assert(sysid != ERROR_ID);
      pubsub_connector_t *connector;
      connector = pubsub_connector_for_subsystem(builder, sysid);
      r = sys->add_pubsub(connector);
      pubsub_connector_free(connector);
    }
    if (r < 0) {
      fprintf(stderr, "BUG: subsystem %s (at %u) could not connect to "
              "publish/subscribe system.", sys->name, sys->level);
      raw_assert_unreached_msg("A subsystem couldn't be connected."); // <- HERE
    }
  }
  return 0;
}
```
### What is the expected behavior?
On a very broad level, I think tor should not abort. Because I do not understand very well what is happening, it is difficult to provide a more specific recommendation about what the code should actually do.
### Environment
- Which version of Tor are you using? Run `tor --version` to get the version if you are unsure.
Always 0.4.7.13
- Which operating system are you using? For example: Debian GNU/Linux 10.1, Windows 10, Ubuntu Xenial, FreeBSD 12.2, etc.
Android (several versions and devices according to the Google Play console); Android 13 on Pixel 4a arm64 (my phone); Ubuntu 22.04.2 on amd64
- Which installation method did you use? Distribution package (apt, pkg, homebrew), from source tarball, from Git, etc.
Tor compiled along with all its dependencies using our build scripts as well as tor compiled from sources with Ubuntu 22.04.2 installation dependencies when reproducing the issue using the above mentioned steps.
### Relevant logs and/or screenshots
I think I already provided representative logs above. The https://github.com/ooni/probe/issues/2406 issue contains all the logs we produced while investigating this issue on our end. It also describes how we progressively narrowed down the problem from an abort in the Android app to an abort using Go code on Linux to the minimal instructions for reproducing the issue that I mentioned above.
On this note, I initially suspected that there was a data race on our end. That assumption was true, but the abort continued to occur after I fixed the data race inside the Go code. In any case, the possible presence of data races on our end prompted me to bypass our Go code and write C code that could allow reproducing the issue. In one of my final attempts at understanding the issue using just C code, I [patched tor to avoid aborting in case pubsub_install failed](https://github.com/ooni/probe/issues/2406#issuecomment-1479884981), recompiled, and ran with tsan enabled, [seeing just two pubsub_install failures over 490 runs and no sign of data races](https://github.com/ooni/probe/issues/2406#issuecomment-1480826748).
### Possible fixes
I don't know. Since the data-race theory is not supported by data and unlikely, perhaps it could be that state from previous runs causes issues with the pubsub subsystem that appear for repeated bootstraps? I'll be happy to collaborate and try other debugging strategies.

https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40110
Obsolete versions of bridges are not triggering the upgrade alert on relay-search
2024-03-20T13:40:39Z
Georg Koppen

We have
```
This relay is running a version of Tor that is
too old and may be missing important security fixes. If this is your relay, you
should update it as soon as possible.
```
which is shown for relays when they run obsolete Tor versions. However, even though Tor versions are marked for bridges as obsolete, too, like
```
{"nickname":"Yuccahimsa","hashed_fingerprint":"252DDE4EF4464904CB1CB6C45BB35ECB5AD2E1B0","or_addresses":["10.250.126.28:49191","[fd9f:2e19:3bcf::3e:1d27]:49191"],"last_seen":"2024-03-20 12:00:33","first_seen":"2022-01-13 00:00:00","running":true,"flags":["Running","V2Dir","Valid"],"last_restarted":"2023-07-25 00:02:03","advertised_bandwidth":8359955,"contact":"yuccahimsa@protonmail.com","platform":"Tor 0.4.7.7 on Linux","version":"0.4.7.7","version_status":"obsolete","recommended_version":false,"transports":["obfs4"],"bridgedb_distributor":"moat"},
```
(as 0.4.7.x is EOL right now) when looking at the details page of bridges no red banner with the text above shows up.
/cc @gus

https://gitlab.torproject.org/tpo/core/torspec/-/issues/251
Identify pieces of our protocols to remove or deprecate
2024-03-20T13:18:18Z
Nick Mathewson

As part of our forthcoming work on Arti relays, we should look over our current protocols and see what we can remove. For example, it would be great if we never have to implement TAP again.
This may require some design work, if our protocols currently require support for a feature.

Nick Mathewson

https://gitlab.torproject.org/tpo/core/arti/-/issues/1339
Be consistent with our STUB/STUB+ terminology throughout circmgr/guardmgr
2024-03-20T12:31:52Z
gabi-250

The following discussion from !2046 should be addressed:
- [ ] @nickm started a [discussion](https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/2046#note_3010014): (+3 comments)
> IMO we should do these things in the docs:
> - Explain what a stub circuit is
> - Explain when we would choose STUB and when we would choose STUB+
> - Get our terminology uniform on "STUB+" vs "Extended". (But let's not do a big rename until we've decided.)

Arti: Guard discovery research
gabi-250

https://gitlab.torproject.org/tpo/team/-/issues/187
Code Audit for Sponsor 112
2024-03-20T11:59:38Z
Gaba (gaba@torproject.org)

- [ ] Create RFPT
- [ ] Send to DRL for approval
- [ ] Send to auditors
- [ ] Choose an auditor to start work

Gaba (gaba@torproject.org)
2024-06-01

https://gitlab.torproject.org/tpo/tpa/anon_ticket/-/issues/44
Add links to anon-ticket system from gitlab
2024-03-20T08:37:21Z
Nick Mathewson

It would be great if our gitlab site would link unauthenticated users to "anonticket.onionize.space" in a way that would make it convenient to comment on an issue or create a ticket if you're already looking at the right place in gitlab.

juga

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40755
TPA-RFC-33: monitoring system upgrade or replacement
2024-03-19T20:17:23Z
anarcat

in #29864, we've gone pretty deep in comparisons between prometheus and icinga and how the first could replace the latter.
but now we're stuck at "i like this one better than the other" because we don't have a clear set of requirements.
the task here is to write a set of requirements for the new alerting system and, ultimately, make a proposal for the replacement of the deprecated Icinga 1 deployment we have now.
* [ ] establish requirements
* [ ] approve requirements
* if replacing icinga:
* [ ] review #29864 for ideas and tasks
* [ ] decide whether we keep the prometheus1/2 distinction
* [ ] deploy alert manager on prometheus1
* [ ] reimplement the Nagios alerting commands (optional?)
* [ ] send Nagios alerts through the alertmanager (optional?)
* [ ] rewrite (non-NRPE) commands (9) as Prometheus alerts
* [ ] scrape the NRPE metrics from Prometheus (optional)
* [ ] create a dashboard and/or alerts for the NRPE metrics (optional)
* [ ] review the NRPE commands (300+) to see which one to rewrite as Prometheus alerts
* [ ] turn off the Icinga server
* [ ] remove all traces of NRPE on all nodes
* if keeping icinga
* [ ] review work from @weasel done on DSA's Puppet/Icinga integration
  * [ ] deploy that module or another icinga module inside puppet
* [ ] rewrite all the checks from the `nagios-master.cfg` file into puppet (300+)
* [ ] rebuild a new Icinga 2 server
  * [ ] retire the old Icinga 1 server

old service retirement 2023
anarcat

https://gitlab.torproject.org/tpo/team/-/issues/269
s144 report
2024-03-19T19:40:45Z
Gaba (gaba@torproject.org)

2024-03-25

https://gitlab.torproject.org/tpo/onion-services/onionspray/-/issues/48
Try to opportunistically set nginx_proxy_ssl_trusted_certificate
2024-03-19T18:36:54Z
Silvio Rhatto

# Tasks
* [ ] Auto-detect common certificate chain locations and try to opportunistically
auto-set `nginx_proxy_ssl_trusted_certificate` (maybe at config compilation
time, from template to config file).
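
One possible shape for the auto-detection, sketched in C purely for illustration: Onionspray itself may implement this differently, and both function names are hypothetical. The candidate paths are an assumed list of CA bundle locations used by common distributions, not something taken from Onionspray.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed list of well-known CA bundle locations (not from Onionspray). */
static const char *const default_ca_bundles[] = {
  "/etc/ssl/certs/ca-certificates.crt", /* Debian, Ubuntu */
  "/etc/pki/tls/certs/ca-bundle.crt",   /* Fedora, RHEL */
  "/etc/ssl/cert.pem",                  /* Alpine, BSDs */
  NULL,
};

/* Return the first path in a NULL-terminated list that the current user
 * can read, or NULL if none of them is readable. */
const char *
first_readable_path(const char *const *candidates)
{
  for (size_t i = 0; candidates[i] != NULL; ++i) {
    if (access(candidates[i], R_OK) == 0)
      return candidates[i];
  }
  return NULL;
}

/* Probe the assumed default locations; NULL means "leave the setting unset". */
const char *
detect_default_ca_bundle(void)
{
  return first_readable_path(default_ca_bundles);
}
```

A non-NULL result could then be written into `nginx_proxy_ssl_trusted_certificate` when the config file is generated from the template, and a NULL result would leave the setting untouched, which keeps the behavior opportunistic.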
# Time estimation
* Complexity: small (1 day)
* Uncertainty: low (x1.1)
* [Reference](https://jacobian.org/2021/may/25/my-estimation-technique/) (adapted)

Onionspray 1.7.0
Silvio Rhatto
2024-06-27

https://gitlab.torproject.org/tpo/team/-/issues/186
Code Audit for Sponsor 101
2024-03-19T18:04:53Z
Gaba (gaba@torproject.org)

- [x] Create RFPT
- [ ] Send to DRL for approval
- [ ] Send to auditors
- [ ] Choose an auditor to start work

Gaba (gaba@torproject.org)
2024-03-20

https://gitlab.torproject.org/tpo/team/-/issues/263
Wrapping up sponsor 96
2024-03-19T17:50:14Z
Gaba (gaba@torproject.org)

- [ ] Final review of deliverables
- [ ] Review indicators
- [ ] Write report for last quarter. Due end of April.
- [ ] Schedule retrospective
- [ ] Write final report. Due on July 29th

Gaba (gaba@torproject.org)
2024-07-15