The Tor Project issues
https://gitlab.torproject.org/groups/tpo/-/issues (updated 2022-11-12T09:00:36Z)

---

https://gitlab.torproject.org/tpo/core/tor/-/issues/40704
relay: Onionskin wait cutoff and MaxOnionQueueDelay in queue should be consensus parameters (2022-11-12T09:00:36Z, David Goulet <dgoulet@torproject.org>)
Our onion queue code, using the CPU thread pool, has this value hardcoded:
```
/** 5 seconds on the onion queue til we just send back a destroy */
#define ONIONQUEUE_WAIT_CUTOFF 5
```
This value is used when we add an onionskin to the queue: we drop any request that has been waiting in the queue longer than that cutoff by sending back a `DESTROY`.
There is also a torrc option named `MaxOnionQueueDelay` that behaves a bit differently: if tor estimates that processing an onionskin would take longer than that value, the request is rejected.
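As an illustration only (this is not tor's actual code; the function names are hypothetical and the `MaxOnionQueueDelay` value below is a placeholder), the two policies can be sketched like this:

```python
ONIONQUEUE_WAIT_CUTOFF = 5.0   # seconds; the hardcoded cutoff quoted above
MAX_ONION_QUEUE_DELAY = 1.75   # seconds; placeholder for the MaxOnionQueueDelay option

def should_send_destroy(enqueue_time: float, now: float) -> bool:
    """Drop (send a DESTROY for) a queued onionskin that has waited past the cutoff."""
    return (now - enqueue_time) > ONIONQUEUE_WAIT_CUTOFF

def should_reject_new(queue_len: int, est_seconds_per_onionskin: float) -> bool:
    """Reject a new onionskin if the backlog already implies too much delay."""
    return queue_len * est_seconds_per_onionskin > MAX_ONION_QUEUE_DELAY
```

Making both values consensus parameters would let the whole network move them together instead of relying on per-relay constants like these.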
Both of these should be consensus parameters because they strongly affect how our relays behave in times of load or attack, like right now. For instance, under the current DoS conditions on the network, it is possible (unproven) that extending `MaxOnionQueueDelay` to a longer wait time could result in less overload on our relays. But this is a torrc option, meaning that if not all ~6000 relays update at once, we get a partitioning problem.
If one group of relays sets that value higher, leaving more room to handle onionskins, those relays will have a higher CBT value, which would shift circuit creation away from them towards more overloaded relays.
If the whole network changes that value at once, CBT should in theory remain uniform for all paths; it's just that, all of a sudden, circuit creation fails less but takes more time.

Tor: 0.4.8.x-freeze

---

https://gitlab.torproject.org/tpo/core/tor/-/issues/40676
ExitPolicy should apply to already established outbound connections (with a config option, off by default) (2023-08-25T16:55:41Z, cypherpunks)
To reduce the impact of tor running out of free TCP source ports (see pending comment in #26646) we added a reject entry for the destination causing most outbound TCP connections to the ExitPolicy and restarted tor.
```
ExitPolicy reject 1.2.3.4:* <<<< added to avoid outbound connections to this target
ExitPolicy accept *:80
ExitPolicy accept *:443
ExitPolicy reject *:*
```
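For context, tor evaluates ExitPolicy lines top to bottom with first-match-wins semantics, so the new reject line should shadow the accept lines for that address on new connections. A simplified Python sketch of that semantics (illustrative only, not tor's code; no CIDR masks or port ranges):

```python
def exit_policy_allows(policy, addr, port):
    """Evaluate (action, addr, port) rules in order; the first matching
    rule wins. '*' is a wildcard. Returns True if the connection is accepted."""
    for action, rule_addr, rule_port in policy:
        if rule_addr in ("*", addr) and rule_port in ("*", port):
            return action == "accept"
    return False  # simplified: treat "no match" as rejected

# The policy from the report above:
POLICY = [
    ("reject", "1.2.3.4", "*"),
    ("accept", "*", 80),
    ("accept", "*", 443),
    ("reject", "*", "*"),
]
```

The bug is that this check is only applied when *opening* connections, not retroactively to connections that were already established before the reload.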
Expected: Tor should not create any connections to 1.2.3.4
Even after changing the torrc and restarting tor we see established TCP connections to 1.2.3.4,
this is unexpected.

Tor: 0.4.8.x-freeze

---

https://gitlab.torproject.org/tpo/core/tor/-/issues/40629
Allow ignoring of SIGINT (2022-06-23T21:13:51Z, tla)
### Summary
Add an option (e.g. `--IgnoreSigint 1`) which allows ignoring `SIGINT`.
iOS has a feature which enables apps to keep running in the background for a certain amount of time:
https://developer.apple.com/documentation/uikit/uiapplication/1623031-beginbackgroundtask
However, even when we're making use of that, iOS sends `SIGINT` to the app process as soon as the user swipes the app away (sends it into the background).
Tor is currently hardcoded to stop working when it receives that `SIGINT`:
https://gitlab.torproject.org/tpo/core/tor/-/blob/main/src/app/main/main.c#L223-228
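The desired behavior (the process survives a `SIGINT`) can be demonstrated with a small Python sketch; tor itself would do the equivalent in its C signal handlers, and the option name here is hypothetical:

```python
import os
import signal

def survive_sigint() -> str:
    """Ignore SIGINT (as a hypothetical --IgnoreSigint 1 would), then deliver
    one to our own process; without SIG_IGN this would interrupt the process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    os.kill(os.getpid(), signal.SIGINT)
    return "still running"
```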
### What is the expected behavior?
When the mentioned configuration option is set, Tor just ignores the `SIGINT` and continues running, to enable processing in the background.

Tor: 0.4.8.x-freeze · Alexander Færøy <ahf@torproject.org>

---

https://gitlab.torproject.org/tpo/core/tor/-/issues/40422
[CircuitPadding] circpad_add_matching_machines() should be called when a circuit has opened. (2023-06-09T13:26:45Z, Jaym)
### Summary
The circuit padding framework supports negotiating padding upon various events. Among them, CIRCPAD_CIRC_OPENED states that a given padding machine should be applied to a circuit when a circuit has opened.
However, no code seems to trigger this mechanism. When a circuit has been built, the function circpad_machine_event_circ_built() is called and checks whether machines may be removed from or added to the circuit. However, at this stage of the circuit building process, the circuit has been built but is not yet marked as open.
### Bug
If a machine uses `client_machine->conditions.apply_state_mask = CIRCPAD_CIRC_OPENED;`, the machine is only applied when an event other than circuit building/opening triggers circpad_add_matching_machines() (e.g., an AP connection links a stream, or the circuit purpose changes from general to something else).
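The condition check is essentially a bitmask test. A Python sketch for illustration (the constant values here are assumptions, not copied from the circpad source; they only show why a mask lacking the "opened" bit, like the `state mask 21 vs condition: 2` lines in the logs below, never matches):

```python
# Hypothetical bit values standing in for circpad's circuit state flags.
CIRCPAD_CIRC_BUILDING = 1
CIRCPAD_CIRC_OPENED = 2
CIRCPAD_CIRC_NO_STREAMS = 4
CIRCPAD_CIRC_STREAMS = 8
CIRCPAD_CIRC_HAS_RELAY_EARLY = 16

def machine_applies(apply_state_mask: int, circuit_state: int) -> bool:
    """A padding machine applies only when the circuit's current state bit
    is set in the machine's apply_state_mask."""
    return bool(apply_state_mask & circuit_state)
```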
### What is the expected behavior?
When circuituse.c calls circuit_has_opened(), it should also call into the circpad module; e.g., a new function circpad_machine_event_circ_opened() that checks whether machines should be added to the circuit.
### Environment
Running a version forked from 0.4.5.7
### Relevant logs and/or screenshots
Contains some logs showing a call to circpad_machine_event_circ_built() while the circuit is still marked as building. Also contains custom logs:
```
Jun 30 11:23:50.000 [info] circuit_finish_handshake(): Finished building circuit hop:
Jun 30 11:23:50.000 [info] internal (high-uptime) circ (length 3, last hop test000a): $22BA781A60C0CBB7FFAEA8858128427F67F60038(open) $7684DE04DCBB44538554E2CD1D14CDF836D5AF4D(open) $C7ADB1DBCE99F0B2ED2812B1953E4986EE9846DB(open)
Jun 30 11:23:50.000 [debug] dispatch_send_msg_unchecked(): Queued: ocirc_cevent (<gid=7 evtype=2 reason=0 onehop=0>) from or, on ocirc.
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering: ocirc_cevent (<gid=7 evtype=2 reason=0 onehop=0>) from or, on ocirc:
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering to btrack.
Jun 30 11:23:50.000 [debug] btc_cevent_rcvr(): CIRC gid=7 evtype=2 reason=0 onehop=0
Jun 30 11:23:50.000 [debug] circuit_build_times_add_time(): Adding circuit build time 43
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking condition state mask 21 vs condition: 2
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_event_circ_built(): Circpad module event circ built -- circ state: 0
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking condition state mask 21 vs condition: 2
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] circpad_machine_conditions_apply(): Checking circuit purpose, 5
Jun 30 11:23:50.000 [debug] invoke_plugin_operation_or_default(): Plugin found for caller calling a plugin in the circpad module when a circuit has built
Jun 30 11:23:50.000 [info] circpad_dropmark_activate_when_built(): Looks like the client_dropmark_def machine does not exist over this circuit
Jun 30 11:23:50.000 [debug] plugin_run(): Plugin execution returned -2147483648
Jun 30 11:23:50.000 [debug] plugin_run(): vm error message: (null)
Jun 30 11:23:50.000 [info] entry_guards_note_guard_success(): Recorded success for primary confirmed guard test002r ($22BA781A60C0CBB7FFAEA8858128427F67F60038)
Jun 30 11:23:50.000 [debug] dispatch_send_msg_unchecked(): Queued: ocirc_state (<gid=7 state=4 onehop=0>) from or, on ocirc.
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering: ocirc_state (<gid=7 state=4 onehop=0>) from or, on ocirc:
Jun 30 11:23:50.000 [debug] dispatcher_run_msg_cbs(): Delivering to btrack.
Jun 30 11:23:50.000 [debug] btc_state_rcvr(): CIRC gid=7 state=4 onehop=0
Jun 30 11:23:50.000 [info] circuit_build_no_more_hops(): circuit built!
Jun 30 11:23:50.000 [info] pathbias_count_build_success(): Got success count 3.000000/3.000000 for guard test002r ($22BA781A60C0CBB7FFAEA8858128427F67F60038)
Jun 30 11:23:50.000 [debug] circuit_has_opened(): calling circuit_has_opened()
```
### Possible fixes
Add a new function circpad_machine_event_circ_opened(), called from circuituse.c when the circuit has opened.

Tor: 0.4.8.x-freeze · Mike Perry

---

https://gitlab.torproject.org/tpo/community/training/-/issues/67
Update Community Materials (2023-03-30T11:29:33Z, Nah)
This is a major ticket so we can keep track of what changes the @tpo/ux team needs to make to the Community Materials, as well as what needs to be created, for the next quarters.
**Tor Training Package: everything you need in one pack**
- Tor Training Material: slide deck, instructions PDF (which includes materials needed, like internet, pens, etc.)
- Feedback Material: [feedback form](https://gitlab.torproject.org/tpo/ux/research/-/blob/master/scripts%20and%20activities/2022/training%20feedback/participants-feedback.md), [report template](https://gitlab.torproject.org/tpo/ux/research/-/blob/master/scripts%20and%20activities/2022/training%20feedback/training-report.md), [instructions](https://gitlab.torproject.org/tpo/ux/research/uploads/e07032aa6ae4085e9828ca04b769e6b4/collect-user-feedback-during-a-training.pdf).
**Participants Material:** [Tor Guides](https://community.torproject.org/outreach/kit/).
**Needs Design - update:**
- Training material slide deck
- Tor Connection Training Graphic
- Tor Guides
- Feedback Material

Sponsor 9 - Phase 6 - Usability and Community Intervention on Support for Democracy and Human Rights

---

https://gitlab.torproject.org/tpo/community/l10n/-/issues/40078
Onboarding trainings for translators (2024-01-10T14:05:08Z, Gaba <gaba@torproject.org>)

Include onboarding trainings for translators during the Localization Hangout we already host monthly.

Sponsor 9 - Phase 6 - Usability and Community Intervention on Support for Democracy and Human Rights · emmapeel

---

https://gitlab.torproject.org/tpo/ux/research/-/issues/87
Collect and analyze feedback from trainings in EA (2022-08-12T18:18:59Z, Nah)

Organizations that will be running Tor and digital security trainings are being instructed to collect feedback during their trainings, either using a survey hosted by Tor (for remote trainings) or a printed form (for in person trainings).
After the trainings are done, we will need to gather all user and trainer feedback, compile it, and report on it. This ticket is to track this activity.

Sponsor 9 - Phase 6 - Usability and Community Intervention on Support for Democracy and Human Rights · Nah

---

https://gitlab.torproject.org/tpo/web/tpo/-/issues/180
Users do not recognize the download page is for Tor Browser (2023-01-30T19:15:14Z, josernito <shola@josernitos.com>)
**Problem:**
Potential users doubt if the "[Download](https://www.torproject.org/download/)" Page is indeed where they can download the Tor Browser.
**Why is it important:**
After potential users have decided to click on one of the results of their search engine, _they were **still confused** about whether they were on the download page of Tor Browser_.
The problem stems from the lack of any reference to this being a browser's download page. We are only shown "Defend yourself." We should expand on what this page is about.
**Recommendations:**
- Add the word "Browser" in the page title.
- I think it is worthwhile to think of other ways to display this page, and not just add a word.
- I'd recommend checking how other browsers do this: [Firefox](https://www.mozilla.org/en-US/firefox/new/), [Chrome](https://www.google.com/chrome/), [Brave](https://brave.com/download/), [Vivaldi](https://vivaldi.com/download/).
**Research associated:**
- [Onboarding - Costa Rica - UR](https://gitlab.torproject.org/tpo/ux/research/-/blob/master/reports/2021/UR-Tor-CostaRica.md)

Sponsor 9 - Phase 6 - Usability and Community Intervention on Support for Democracy and Human Rights

---

https://gitlab.torproject.org/tpo/community/l10n/-/issues/40017
Migrate to an alternative to Transifex, ideally self hosted (2022-09-30T19:32:40Z, Ghost User)
I believe Transifex has problems: privacy issues, proprietary software and resilience issues.
Transifex's website is full of (mandatory) JavaScript with hooks to DoubleClick, Facebook, GitHub, Google, Hotjar, LinkedIn and other third-party tracking and data collection JavaScript. Transifex's [privacy policy][1] discloses the data it collects and shares, and I see it is too loose for users' privacy to be protected. Users' identifying information, contact details and their writings (translations) are collected.
[1]: https://www.transifex.com/about/privacy/
Transifex is a proprietary SaaS (software as a service), which I would guess doesn't align with Tor Project's values. Transifex was originally an open-source project, but in 2013 the open-source version was discontinued ([source][2]). If Transifex closes down, transfers ownership or censors Tor Project, the ability to translate Tor will be compromised. Ideally, Tor Project should self-host the l10n project. For example, Tails uses its own instance of [Weblate][3] for website and documentation translation, and so does [SecureDrop][4].
[2]: https://en.wikipedia.org/wiki/Transifex
[3]: https://translate.tails.boum.org/
[4]: https://weblate.securedrop.org/
- Is a migration away from Transifex, for example self hosting, possible?
- How can people interested in l10n contribute without touching Transifex?

Sponsor 9 - Phase 6 - Usability and Community Intervention on Support for Democracy and Human Rights · emmapeel

---

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40781
build boxes fail to update ubuntu chroots since bullseye upgrade (2023-05-08T18:56:07Z, anarcat)
ever since we upgraded the buildbox servers, we've been getting regular errors like this by email:
```
E: specified keyring file (/usr/share/keyrings/ubuntu-archive-keyring.gpg) not found
```
That's because the [ubuntu-keyring package](https://tracker.debian.org/pkg/ubuntu-keyring) was [removed from bullseye in 2019](https://tracker.debian.org/news/1047972/ubuntu-keyring-removed-from-testing/) because of a [release critical bug](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929165) that is still not fixed.
It looks like fixing that specific bug requires some hacks with debhelper; it would be nice of us to provide a patch to the package implementing those fixes. Alternatively, we could use the package from buster, but that is bound to eventually bitrot as well.
@weasel also suggested turning off those chroots, but i'm not exactly sure how to do that myself, so I assigned him this ticket.
The full log is:
```
From: root@ci-runner-arm64-02.torproject.org (Cron Daemon)
Subject: Cron <root@ci-runner-arm64-02> sleep $((RANDOM % 36000)) && chronic setup-all-dchroots
To: root@ci-runner-arm64-02.torproject.org
Date: Wed, 01 Jun 2022 05:54:11 +0000
E: specified keyring file (/usr/share/keyrings/ubuntu-archive-keyring.gpg) not found
E: specified keyring file (/usr/share/keyrings/ubuntu-archive-keyring.gpg) not found
E: specified keyring file (/usr/share/keyrings/ubuntu-archive-keyring.gpg) not found
E: specified keyring file (/usr/share/keyrings/ubuntu-archive-keyring.gpg) not found
+ debootstrap --keyring /usr/share/keyrings/ubuntu-archive-keyring.gpg --include=apt,gnupg,ca-certificates,apt-transport-https --variant=buildd --arch=arm64 xenial /srv/chroot/schroot-unpack/create-xenial-MqyCRi http://ports.ubuntu.com/ubuntu-ports /usr/share/debootstrap/scripts/xenial
+ do_cleanup
+ local cnt
+ cnt=2
++ seq 2 -1 0
+ for i in $(seq ${cnt} -1 0)
+ umount /srv/chroot/schroot-unpack/create-xenial-MqyCRi/sys
umount: /srv/chroot/schroot-unpack/create-xenial-MqyCRi/sys: no mount point specified.
+ true
+ for i in $(seq ${cnt} -1 0)
+ rm -r /srv/chroot/schroot-unpack/create-xenial-MqyCRi
+ for i in $(seq ${cnt} -1 0)
+ :
Error: setting up xenial:arm64 dchroot failed.
WARNING: tempfile is deprecated; consider using mktemp instead.
+ debootstrap --keyring /usr/share/keyrings/ubuntu-archive-keyring.gpg --include=apt,gnupg,ca-certificates --variant=buildd --arch=arm64 bionic /srv/chroot/schroot-unpack/create-bionic-C1udMW http://ports.ubuntu.com/ubuntu-ports /tmp/fileqy2Sje
+ do_cleanup
+ local cnt
+ cnt=3
++ seq 3 -1 0
+ for i in $(seq ${cnt} -1 0)
+ rm /tmp/fileqy2Sje
+ for i in $(seq ${cnt} -1 0)
+ umount /srv/chroot/schroot-unpack/create-bionic-C1udMW/sys
umount: /srv/chroot/schroot-unpack/create-bionic-C1udMW/sys: no mount point specified.
+ true
+ for i in $(seq ${cnt} -1 0)
+ rm -r /srv/chroot/schroot-unpack/create-bionic-C1udMW
+ for i in $(seq ${cnt} -1 0)
+ :
Error: setting up bionic:arm64 dchroot failed.
WARNING: tempfile is deprecated; consider using mktemp instead.
+ debootstrap --keyring /usr/share/keyrings/ubuntu-archive-keyring.gpg --include=apt,gnupg,ca-certificates --variant=buildd --arch=arm64 focal /srv/chroot/schroot-unpack/create-focal-O9eY7i http://ports.ubuntu.com/ubuntu-ports /tmp/file9YBgBL
+ do_cleanup
+ local cnt
+ cnt=3
++ seq 3 -1 0
+ for i in $(seq ${cnt} -1 0)
+ rm /tmp/file9YBgBL
+ for i in $(seq ${cnt} -1 0)
+ umount /srv/chroot/schroot-unpack/create-focal-O9eY7i/sys
umount: /srv/chroot/schroot-unpack/create-focal-O9eY7i/sys: no mount point specified.
+ true
+ for i in $(seq ${cnt} -1 0)
+ rm -r /srv/chroot/schroot-unpack/create-focal-O9eY7i
+ for i in $(seq ${cnt} -1 0)
+ :
Error: setting up focal:arm64 dchroot failed.
WARNING: tempfile is deprecated; consider using mktemp instead.
+ debootstrap --keyring /usr/share/keyrings/ubuntu-archive-keyring.gpg --include=apt,gnupg,ca-certificates --variant=buildd --arch=arm64 hirsute /srv/chroot/schroot-unpack/create-hirsute-PXI7Uh http://ports.ubuntu.com/ubuntu-ports /tmp/fileMWqaJg
+ do_cleanup
+ local cnt
+ cnt=3
++ seq 3 -1 0
+ for i in $(seq ${cnt} -1 0)
+ rm /tmp/fileMWqaJg
+ for i in $(seq ${cnt} -1 0)
+ umount /srv/chroot/schroot-unpack/create-hirsute-PXI7Uh/sys
umount: /srv/chroot/schroot-unpack/create-hirsute-PXI7Uh/sys: no mount point specified.
+ true
+ for i in $(seq ${cnt} -1 0)
+ rm -r /srv/chroot/schroot-unpack/create-hirsute-PXI7Uh
+ for i in $(seq ${cnt} -1 0)
+ :
Error: setting up hirsute:arm64 dchroot failed.
```Debian 11 bullseye upgradeweasel (Peter Palfrader)weasel (Peter Palfrader)https://gitlab.torproject.org/tpo/tpa/team/-/issues/40417inconsistent systemd-journald storage policies2022-11-17T02:29:32Zanarcatinconsistent systemd-journald storage policiesIt seems we don't have a consistent journald storage policy. By default, before bullseye, the default is `auto`, which means that we *have* journald persistent storage (in `/var/log/journal`) if the directory exists. but in bullseye and ...It seems we don't have a consistent journald storage policy. By default, before bullseye, the default is `auto`, which means that we *have* journald persistent storage (in `/var/log/journal`) if the directory exists. but in bullseye and later, this is enabled by default.
on the other hand, new machines created in bullseye *do* have the persistent journal enabled, and I have *just* enabled persistent journaling on polyanthum for tpo/tpa/team#40414.
we need to decide what we do about this as part of the buster upgrade.
right now, the situation is as follows:
```
(4) chi-node-[05,08].torproject.org,polyanthum.torproject.org,tb-pkgstage-01.torproject.org
----- OUTPUT of 'file /var/log/journal' -----
/var/log/journal: setgid, directory
```
that is, 4 servers have persistent journals. all other servers do not have /var/log/journal, so, in theory, should not have persistent journals either.
do note that polyanthum *explicitly* had that disabled in `/etc/systemd/journald.conf.d/volatile.conf`, but this wasn't in puppet, so I couldn't trace *why* it was done. this needs to be revised, along with the journald retention policies.
i'm also worried about systemd's general lack of attention to PII retention. in [journal ip anonymization #2447](https://github.com/systemd/systemd/issues/2447), for example, maintainers have shown they do not want to implement log mangling to remove PII.
and on the other hand, persistent journals are *required* for some operations. for example, [user journals need it](https://github.com/systemd/systemd/issues/2744). there is a patch to [support runtime (volatile) user journals #12263 ](https://github.com/systemd/systemd/pull/12263), but it's been stalled for years.
one way to resolve this would be to enable persistent journaling, but keep it in a tmpfs.
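As a sketch, such a drop-in could look like this (the key names are from `journald.conf(5)`; the budget value is a placeholder):

```ini
# /etc/systemd/journald.conf.d/storage.conf
[Journal]
# "volatile" keeps the journal in a tmpfs under /run/log/journal,
# so user journals work but nothing persists across reboots;
# "persistent" would store it under /var/log/journal instead.
Storage=volatile
# cap how much of the tmpfs the runtime journal may use
RuntimeMaxUse=200M
```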
we should also consider the duplication between journald and our regular syslog, and whether we want to completely ditch one or the other.

Debian 11 bullseye upgrade

---

https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864
TPA-RFC-33: consider replacing nagios with prometheus (2023-05-17T18:06:54Z, anarcat)

As a followup to the Prometheus/Grafana setup started in #29681, I am wondering if we should also consider replacing the Nagios/Icinga server with Prometheus. I have done a little research on the subject and figured it might be good to at least document the current state of affairs.
This would remove a complex piece of architecture we have at TPO that was designed before Puppet was properly deployed. Prometheus has an interesting federated design that allows it to scale to multiple machines easily, along with a high-availability component for the alertmanager that allows it to be more reliable than a traditional Nagios configuration. It would also simplify our architecture, as the Nagios server automation is a complex mix of Debian packages and git hooks that is serving us well, but is hard to comprehend and debug for new administrators. (I managed to wipe the entire Nagios config myself in my first week on the job by messing up a configuration file.) Having the monitoring server fully deployed by Puppet would be a huge improvement, even if it were done with Nagios instead of Prometheus, of course.
Right now the Nagios server is actually running Icinga 1.13, a Nagios fork, on a Hetzner machine (`hetzner-hel1-01`). It's doing its job generally well, although it feels a *little* noisy, but that's to be expected from Nagios servers. Reducing the number of alerts seems to be an objective, explicitly documented in #29410, for example.
Both Grafana and Prometheus can do alerting, with various mechanisms and plugins. I haven't investigated those deeply, but in general that's not a problem in alerting: you fire some script or API and the rest happens. I suspect we could port the current Nagios alerting scripts to Prometheus fairly easily, although I haven't investigated our scripts in detail.
The problem is reproducing the check scripts and their associated alert thresholds. In the Nagios world, when a check is installed, it *comes* with its own health ("OK", "WARNING", "CRITICAL") thresholds, and TPO has developed a wide variety of such checks. According to the current Nagios dashboard, it monitors 4612 services on 88 hosts (which is interesting considering LDAP thinks there are 78). That looks terrifying, but it's actually a set of 9 commands running on the Nagios server, including the complex `check_nrpe` system, which is basically a client-side nagios that has its own set of checks. And that's where the "cardinality explosion" happens: on a typical host, there are 315 such checks implemented.
That's the hard part: convert those 324 checks into Prometheus alerts, one at a time. Unfortunately, there are no "built-in" or even "third-party" "prometheus alert sets" that I could find in my [original research](https://anarc.at/blog/2018-01-17-monitoring-prometheus/), although that might have changed in the last year.
Each check in Prometheus is basically a YAML file describing a Prometheus query that, when it evaluates to "true" (e.g. disk_space > 90%), sends an alert. It's not impossible to do that conversion, it's just a lot of work.
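For instance, a disk-space alert of that kind could look like the following rule file (a sketch; the metric names assume the standard node_exporter, and the thresholds are placeholders):

```yaml
groups:
  - name: disk
    rules:
      - alert: DiskAlmostFull
        # fires when any mounted filesystem has less than 10% space left
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}:{{ $labels.mountpoint }} is over 90% full"
```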
To do this progressively while allowing us to make new alerts on Prometheus instead of Nagios, I suggest proceeding the same way Cloudflare did, which is to establish a "Nagios to Prometheus" bridge, by which Nagios doesn't send the alerts on its own and instead forwards them to the Prometheus server, using a plugin they called [Promsaint](https://github.com/cloudflare/promsaint).
With the bridge in place, Nagios checks can be migrated into Prometheus alerts progressively without disruption. Note that Cloudflare documented their experience with Prometheus in [this 2017 promcon talk](https://promcon.io/2017-munich/talks/monitoring-cloudflares-planet-scale-edge-network-with-prometheus/). Cloudflare also made an alert dashboard called [unsee](https://github.com/cloudflare/unsee) (see also the fork called [karma](https://github.com/prymitive/karma)) and [elasticsearch integration](https://github.com/cloudflare/alertmanager2es) which might be good to investigate further.
Another useful piece is this [NRPE to Prometheus exporter](https://www.robustperception.io/nagios-nrpe-prometheus-exporter), which allows Prometheus to directly scrape NRPE targets. It doesn't include Prometheus alerts and instead relies on a Grafana dashboard to show possible problems so, as such, I don't think it's that useful an alternative. There's a [similar approach using check_mk](https://github.com/m-lab/prometheus-nagios-exporter) instead.
Another possible approach is to send alerts from Nagios based on Prometheus checks, using the [Prometheus nagios plugins](https://github.com/prometheus/nagios_plugins). This might allow us to get rid of NRPE everywhere but it would probably be useful only if we do want to keep Nagios in the long term and remove NRPE in favor of the existing Prometheus exporters.
So, battle plan is basically this:
1. `apt install prometheus-alertmanager`
2. reimplement the Nagios alerting commands
3. send Nagios alerts through the alertmanager
4. rewrite (non-NRPE) commands (9) as Prometheus alerts
5. optionally, scrape the NRPE metrics from Prometheus
6. optionally, create a dashboard and/or alerts for the NRPE metrics
7. rewrite NRPE commands (300+) as Prometheus alerts
8. turn off the Nagios server
9. remove all traces of NRPE on all nodes
Update: this, obviously, will require more discussion than just implementing the above battle plan, as there isn't a consensus in the team towards Prometheus as a replacement for Icinga. I have assigned TPA-RFC-33 to this and started drafting requirements and personas in #40755.

Debian 11 bullseye upgrade · anarcat · 2022-09-01

---

https://gitlab.torproject.org/tpo/core/onionmasq/-/issues/60
geoip lookup API (2023-07-17T20:22:04Z, cyberta)

In order to show the assumed country of a tor relay based on its IPs, onionmasq has to provide an API for GeoIP lookups. This task is a follow-up of https://gitlab.torproject.org/tpo/core/onionmasq/-/issues/38.

VPN pre-alpha 02 · eta

---

https://gitlab.torproject.org/tpo/applications/vpn/-/issues/89
add glow to connection progressbar (2023-07-11T11:55:38Z, cyberta)
The glow below the upper connectivity progress bar is still missing.
We can probably implement that by increasing the height of the progress bar and adding a gradient filter at the bottom of the view.

VPN pre-alpha 02 · ankitgusai19

---

https://gitlab.torproject.org/tpo/core/onionmasq/-/issues/39
Throughput statistics reporting (2023-06-28T13:02:03Z, eta)

We'd like to be able to show some pretty graphs about how much traffic the app is using (I guess maybe both pre- and post-VPN, i.e. log both user traffic and Arti traffic)? @trinity-1686a pointed out the latter can be done easily by just wrapping the TCP sockets.
As for exporting this data; probably just over the JNI, but @ahf also suggested maybe exporting it as prometheus(-style) metrics over HTTP by having the VPN listen on its gateway IP. That might be neat to explore, if only for development and testing purposes.
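The socket-wrapping idea could look roughly like this (a language-agnostic sketch in Python for illustration; onionmasq itself would wrap its own stream types):

```python
class CountingSocket:
    """Wrap a socket-like object and count bytes in each direction, the way
    the TCP streams could be wrapped to collect throughput statistics."""

    def __init__(self, inner):
        self.inner = inner
        self.bytes_sent = 0
        self.bytes_received = 0

    def send(self, data: bytes) -> int:
        n = self.inner.send(data)
        self.bytes_sent += n
        return n

    def recv(self, bufsize: int) -> bytes:
        data = self.inner.recv(bufsize)
        self.bytes_received += len(data)
        return data


class FakeSocket:
    """Stand-in for a real TCP socket, for demonstration only."""

    def send(self, data: bytes) -> int:
        return len(data)

    def recv(self, bufsize: int) -> bytes:
        return b"x" * bufsize
```

The counters could then be exposed over the JNI, or scraped as metrics, as suggested above.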
This should also ideally be per-app, too!

VPN pre-alpha 02

---

https://gitlab.torproject.org/tpo/applications/vpn/-/issues/44
Data usage chart (2023-07-10T15:05:48Z, micah <micah@torproject.org>)

When a user is connected to and routing their device's apps over Tor, they should be able to see a chart of their up/down data over time, so they can monitor data usage and feel reassured their traffic is being successfully routed through Tor.

VPN pre-alpha 02 · cyberta

---

https://gitlab.torproject.org/tpo/applications/vpn/-/issues/36
New Identity / New Circuit (2023-07-07T17:34:33Z, micah <micah@torproject.org>)
- browsing a website that may not respect their privacy, want to get a new circuit so subsequent browsing activity isn't linked to what was being done before
- a service has blocked the IP address of the user's exit node, so they want to find an exit node that is not blocked
- the connection is slow, and the user wants to find a faster circuit to improve the speed
Note that most users in the study did not differentiate New Circuit from New Identity, and used it when the connection was slow, similar to the "restart" button on their VPN.
With the latest Figma designs, a circuit refresh can be triggered for:
- [x] all circuits and
- [x] for each app individually.

VPN pre-alpha 02 · cyberta

---

https://gitlab.torproject.org/tpo/applications/vpn/-/issues/109
Leak Canary reports ConnectionFragment.binding as having a distinct leak
2023-09-07T21:25:43Z · micah (micah@torproject.org)

Using the 425e2e1d version of the vpn on a Google Pixel 4a (5G) running CalyxOS, I often get a LeakCanary report of a problem.
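This leak shape (`ConnectionFragment.binding` → `ViewDataBinding.mRoot` → a destroyed `CoordinatorLayout`) is the classic case of a Fragment holding its view binding past `onDestroyView()`, and the usual mitigation is to null the binding when the view is destroyed. Below is a minimal plain-Java model of that pattern, not the app's real code: the class names mirror the leak report, but the lifecycle methods are stubs standing in for the Android Fragment API.

```java
// Illustration only: plain-Java stand-ins for the Android classes
// named in the leak report.
class FragmentConnectionsettingsBinding {
    final Object root = new Object(); // stands in for ViewDataBinding.mRoot
}

class ConnectionFragment {
    private FragmentConnectionsettingsBinding binding;

    void onCreateView() {
        binding = new FragmentConnectionsettingsBinding();
    }

    // The usual mitigation: clear the view reference when the view is
    // destroyed, so the longer-lived Fragment no longer retains the
    // destroyed view tree.
    void onDestroyView() {
        binding = null;
    }

    FragmentConnectionsettingsBinding binding() {
        return binding;
    }
}

public class LeakFixSketch {
    public static void main(String[] args) {
        ConnectionFragment fragment = new ConnectionFragment();
        fragment.onCreateView();
        System.out.println("binding after create: " + (fragment.binding() != null));
        fragment.onDestroyView();
        System.out.println("binding cleared after destroy: " + (fragment.binding() == null));
    }
}
```

In the actual Kotlin Fragment this would correspond to the common nullable-backing-property pattern (setting `_binding = null` in `onDestroyView()`); whether the app already uses that pattern is an assumption here, so treat this as a direction to investigate rather than a confirmed fix.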
This is the leak trace that I printed to Logcat. I am not sure which pieces are useful to share, but I can provide more:
```
08-11 11:14:04.988 22504 22535 D LeakCanary: LeakCanary is running and ready to detect memory leaks.
08-11 11:20:59.863 22504 22504 D LeakCanary:
08-11 11:20:59.863 22504 22504 D LeakCanary: ┬───
08-11 11:20:59.863 22504 22504 D LeakCanary: │ GC Root: System class
08-11 11:20:59.863 22504 22504 D LeakCanary: │
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ android.app.ActivityThread class
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (MainActivity↓ is not leaking and a class is never leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ static ActivityThread.sCurrentActivityThread
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ android.app.ActivityThread instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (MainActivity↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ mInitialApplication instance of org.torproject.vpn.TorApplication
08-11 11:20:59.863 22504 22504 D LeakCanary: │ mSystemContext instance of android.app.ContextImpl
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ ActivityThread.mActivities
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ android.util.ArrayMap instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (MainActivity↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ ArrayMap.mArray
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ java.lang.Object[] array
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (MainActivity↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ Object[1]
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ android.app.ActivityThread$ActivityClientRecord instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (MainActivity↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ activity instance of org.torproject.vpn.MainActivity with mDestroyed = false
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ ActivityThread$ActivityClientRecord.activity
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ org.torproject.vpn.MainActivity instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (ConnectionFragment↓ is not leaking and Activity#mDestroyed is false)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ mApplication instance of org.torproject.vpn.TorApplication
08-11 11:20:59.863 22504 22504 D LeakCanary: │ mBase instance of androidx.appcompat.view.ContextThemeWrapper
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ ComponentActivity.mOnConfigurationChangedListeners
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ java.util.concurrent.CopyOnWriteArrayList instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (ConnectionFragment↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ CopyOnWriteArrayList[5]
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ androidx.fragment.app.FragmentManager$$ExternalSyntheticLambda0 instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (ConnectionFragment↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ FragmentManager$$ExternalSyntheticLambda0.f$0
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ androidx.fragment.app.FragmentManagerImpl instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (ConnectionFragment↓ is not leaking)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ FragmentManager.mParent
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ org.torproject.vpn.ui.connectionsettings.ConnectionFragment instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: NO (Fragment#mFragmentManager is not null)
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Fragment.mTag=a2ad2df2-7ab9-468b-87c5-c4f7355dccb2
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ ConnectionFragment.binding
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ~~~~~~~
08-11 11:20:59.863 22504 22504 D LeakCanary: ├─ org.torproject.vpn.databinding.FragmentConnectionsettingsBindingImpl instance
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Leaking: UNKNOWN
08-11 11:20:59.863 22504 22504 D LeakCanary: │ Retaining 464.4 kB in 4226 objects
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ↓ ViewDataBinding.mRoot
08-11 11:20:59.863 22504 22504 D LeakCanary: │ ~~~~~
08-11 11:20:59.863 22504 22504 D LeakCanary: ╰→ androidx.coordinatorlayout.widget.CoordinatorLayout instance
08-11 11:20:59.863 22504 22504 D LeakCanary: Leaking: YES (ObjectWatcher was watching this because org.torproject.vpn.ui.connectionsettings.ConnectionFragment
08-11 11:20:59.863 22504 22504 D LeakCanary: received Fragment#onDestroyView() callback (references to its views should be cleared to prevent leaks))
08-11 11:20:59.863 22504 22504 D LeakCanary: Retaining 2.7 kB in 71 objects
08-11 11:20:59.863 22504 22504 D LeakCanary: key = 67f37547-40c5-44fc-a4de-cac3c33512f6
08-11 11:20:59.863 22504 22504 D LeakCanary: watchDurationMillis = 31085
08-11 11:20:59.863 22504 22504 D LeakCanary: retainedDurationMillis = 26083
08-11 11:20:59.863 22504 22504 D LeakCanary: View not part of a window view hierarchy
08-11 11:20:59.863 22504 22504 D LeakCanary: View.mAttachInfo is null (view detached)
08-11 11:20:59.863 22504 22504 D LeakCanary: View.mWindowAttachCount = 1
08-11 11:20:59.863 22504 22504 D LeakCanary: mContext instance of org.torproject.vpn.MainActivity with mDestroyed = false
08-11 11:20:59.863 22504 22504 D LeakCanary:
08-11 11:20:59.863 22504 22504 D LeakCanary: METADATA
08-11 11:20:59.863 22504 22504 D LeakCanary:
08-11 11:20:59.863 22504 22504 D LeakCanary: Build.VERSION.SDK_INT: 33
08-11 11:20:59.863 22504 22504 D LeakCanary: Build.MANUFACTURER: Google
08-11 11:20:59.863 22504 22504 D LeakCanary: LeakCanary version: 2.9.1
08-11 11:20:59.863 22504 22504 D LeakCanary: App process name: org.torproject.vpn
08-11 11:20:59.863 22504 22504 D LeakCanary: Class count: 25281
08-11 11:20:59.863 22504 22504 D LeakCanary: Instance count: 191621
08-11 11:20:59.863 22504 22504 D LeakCanary: Primitive array count: 129463
08-11 11:20:59.863 22504 22504 D LeakCanary: Object array count: 24642
08-11 11:20:59.863 22504 22504 D LeakCanary: Thread count: 29
08-11 11:20:59.863 22504 22504 D LeakCanary: Heap total bytes: 26478243
08-11 11:20:59.863 22504 22504 D LeakCanary: Bitmap count: 1
08-11 11:20:59.863 22504 22504 D LeakCanary: Bitmap total bytes: 222481
08-11 11:20:59.863 22504 22504 D LeakCanary: Large bitmap count: 0
08-11 11:20:59.863 22504 22504 D LeakCanary: Large bitmap total bytes: 0
08-11 11:20:59.863 22504 22504 D LeakCanary: Stats: LruCache[maxSize=3000,hits=115115,misses=192140,hitRate=37%]
08-11 11:20:59.863 22504 22504 D LeakCanary: RandomAccess[bytes=9849438,reads=192140,travel=93180964532,range=31716041,size=39433354]
08-11 11:20:59.863 22504 22504 D LeakCanary: Analysis duration: 9444 ms
```

VPN pre-alpha 03 · cyberta

---

https://gitlab.torproject.org/tpo/applications/vpn/-/issues/107
Manually create screenshots for translations
2023-10-06T21:25:29Z · cyberta

To help with the translation efforts we should provide some screenshots of the different views.
According to the conversation with @emmapeel (https://gitlab.torproject.org/tpo/applications/vpn/-/issues/60#note_2922351), visual context should be given, especially for small strings containing variables, as they are hard to understand.

VPN pre-alpha 03 · kwadronaut

---

https://gitlab.torproject.org/tpo/applications/vpn/-/issues/104
Action Buttons not correctly centered
2023-09-11T16:38:39Z · cyberta

After switching the bottom tabs, the action buttons are not centered anymore.
| before switching bottom tabs | after switching bottom tabs (tap on `Configure` and back to `Connect`) |
| ------ | ------ |
| ![buttons_correct](/uploads/b9acc8067c216ade6d9e8666445b6a63/buttons_correct.png){width=30%} | ![buttons](/uploads/3e6cea84843aae167fe172c058dfbdea/buttons.png){width=30%} |

VPN pre-alpha 03 · ankitgusai19