TPA issues
https://gitlab.torproject.org/groups/tpo/tpa/-/issues
Updated: 2022-11-12T20:48:24Z

# evaluate CPU/power usage of unattended upgrades

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40934
Author: anarcat
Updated: 2022-11-12

after running unattended-upgrades for a while everywhere now, one has to wonder how well it works! in general, we've had some issues with packages updating and breaking (i'm looking at you grub), and restarts breaking things (i'm looking at you needrestart and open vswitch!), but otherwise it seems to work okay...
... but does it!?
on my laptop, i run debian bookworm, and i have found that it is *much* slower at resolving dependencies on large upgrades (say a couple hundred packages, which can definitely happen when you follow testing/unstable). and by slower I mean "it's taking a long time just resolving the dep tree or decision tree or whatever that thing is doing, while `apt upgrade` does the equivalent in an instant".
so i wonder if we're not wasting a microton of CPU usage (and therefore power, and therefore destroying civilization) because of this possible flaw.
because we've instrumented per-service metrics elsewhere (i'm looking at you postgresql), i figured we might be able to do something similar, which could help the entire community and, ultimately, possibly save us some resources as well.
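as a sketch of where such a per-service metric could come from: systemd already does per-unit CPU accounting, so something like the following could be scraped periodically (a sketch, assuming `CPUAccounting` is enabled for the unit; `apt-daily-upgrade.service` is the stock Debian unit name, verify on a real host):

```python
import subprocess

def parse_cpu_nsec(line: str) -> float:
    """Parse a 'CPUUsageNSec=...' property line into seconds of CPU time."""
    _, _, value = line.strip().partition("=")
    if value in ("", "[not set]"):
        return 0.0
    return int(value) / 1e9

def service_cpu_seconds(unit: str = "apt-daily-upgrade.service") -> float:
    # systemd tracks cumulative CPU time per unit when CPUAccounting is on
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=CPUUsageNSec"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_cpu_nsec(out)
```

exposing that number through the node exporter's textfile collector would then give us the comparison data across the fleet.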
the end goal here is that maybe we'd switch away from u-u, either when running testing or altogether. alternatives range from "just write your own damn systemd service" to "oh, is cronapt still around? that was cute"

Assignee: anarcat

# check dead links during CI builds

Issue: https://gitlab.torproject.org/tpo/tpa/ci-templates/-/issues/14
Author: Kez
Updated: 2024-03-01

in tpo/web/donate-static#93 mattlav reported a broken link on one of our pages. we can test for broken links pretty easily using something like python's html.parser
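a minimal sketch of the link-extraction half of such a checker, built on the standard library's `html.parser` (class and function names here are illustrative):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect all href targets of <a> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    return parser.links
```

the checker would then issue an HTTP request per collected link and fail the job on 4xx/5xx responses.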
one of the cons is that we'd have to have CI make HTTP requests for every single link, which could be a decent amount of traffic with how often we run CI builds

Assignee: anarcat

# TPA-RFC-38 wiki replacement

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40909
Author: Kez
Updated: 2024-03-20

This is the discussion ticket for [TPA-RFC-38: Setting Up a Wiki Service](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-38-new-wiki-service). This ticket serves as a place where people can suggest changes to the RFC, as well as suggest goals and must-have features for the new wiki service.

Assignee: anarcat

# Unifying part of the work flow for engineering teams

Issue: https://gitlab.torproject.org/tpo/tpa/triage-ops/-/issues/8
Author: Alexander Færøy
Updated: 2023-09-15

We have gotten to the point now where Triage Ops is used by Network Team, Anti-Censorship Team, and Applications Team. The Sysadmins use it too, but I believe they have slightly different workflows.
In this ticket I am trying to write down what would be needed to unify some of our workflows.
## Assumptions
- Each team consists of a number of members.
- Each team has a team lead, who can take on slightly more of the annoying/boring tasks.
- Each Gitlab project is usually maintained either by the team as a whole (the Applications Team does this, it seems) or by a subset of the team (the Network Team does this: we have C Tor and Arti maintainers as two different subsets with some overlap).
## Issue Workflow
### New Ticket
When a new ticket is opened, we do the following:
1. If the issue gets marked "new contribution", we inform the submitter (with an @-highlight on Gitlab) about the workflow we are doing, a welcome to Tor, some info on our community, etc. (talk with the community team about this).
2. The issue gets marked for "~Needs Triage" with a label.
3. The issue gets put into the ~Backlog.
4. The issue is assigned to the team lead of the team (or another owner of the specific project).
## Merge Request workflow
### New Merge Request
1. If the MR gets marked "new contribution", we inform the submitter (with an @-highlight on Gitlab) about the workflow we are doing, a welcome to Tor, some info on our community, etc. (talk with the community team about this).
2. If this MR is a draft, write a message on the MR that the current MR is marked as a draft and the submitter has to remove the draft status before it will get assigned a reviewer. Be careful here to track state.
3. If the MR has the "Needs Review" label, etc. (see #2 for more info), don't auto-assign any reviewer.
4. Assign the submitter as assignee of the MR.
5. If it needs a reviewer, we will assign a reviewer to the MR.
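The new-MR steps above could be sketched as bot logic roughly like this (the dict shapes, label names, and helpers are assumptions for illustration, not the actual triage-ops code):

```python
import random

def triage_new_mr(mr: dict, maintainers: list[str]) -> dict:
    """Decide which actions the bot takes on a newly opened MR."""
    actions = {"comments": [], "assignee": None, "reviewer": None}
    if "new contribution" in mr.get("labels", []):
        actions["comments"].append("welcome-message")        # step 1
    if mr.get("draft"):
        actions["comments"].append("draft-notice")           # step 2
        return actions  # no reviewer until draft status is removed
    actions["assignee"] = mr["author"]                       # step 4
    if "Needs Review" not in mr.get("labels", []):           # step 3
        actions["reviewer"] = random.choice(maintainers)     # step 5
    return actions
```

The state-tracking caveat from step 2 (don't re-post the draft notice on every run) is deliberately left out here; the real bot would need to remember which MRs it already commented on.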
## Convenience Features
### Random Assignee
It should be possible to mark a ticket with the label ~"Random Assignee" and have the bot remove the ~"Random Assignee" label and put in an assignee from the set of project maintainers there. This should make it easier for the team lead to bulk-triage things where nobody is more qualified to solve them than others.

Assignee: 🤖 Triage Bot 🤖

# automatic restart of some failed CI jobs

Issue: https://gitlab.torproject.org/tpo/tpa/triage-ops/-/issues/7
Author: trinity-1686a
Updated: 2022-10-03

Some CI errors are spurious, I'm thinking of things like [docker daemon errors](https://gitlab.torproject.org/Diziet/arti/-/jobs/172704#L1912) and "unable to resolve host gitlab.torproject.org", but there might be other recurring errors.
It would be nice if the bot could automatically restart the corresponding pipelines.
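A sketch of what such a retry policy could look like; `runner_system_failure` and `script_failure` are real GitLab `failure_reason` values, but the transient-log pattern list is just a guess at what counts as spurious:

```python
# failure_reason values that are always worth one automatic retry
TRANSIENT_REASONS = {"runner_system_failure", "stuck_or_timeout_failure"}

# log fragments that mark an otherwise-ordinary failure as transient
TRANSIENT_LOG_PATTERNS = (
    "unable to resolve host gitlab.torproject.org",
)

def should_retry(failure_reason: str, log_tail: str = "") -> bool:
    if failure_reason in TRANSIENT_REASONS:
        return True
    # script_failure can still be spurious (e.g. DNS errors while cloning),
    # but detecting that requires scanning the job log
    if failure_reason == "script_failure":
        return any(p in log_tail for p in TRANSIENT_LOG_PATTERNS)
    return False
```

the bot would additionally want a retry cap per job so a genuinely broken runner doesn't loop forever.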
Docker issues fall under the failure_reason `runner_system_failure`, so it shouldn't be hard to detect them; "unable to resolve host" errors when cloning are considered `script_failure`, so catching them would require parsing job logs.

Assignee: Alexander Færøy

# gitlab logs hold too much information

Issue: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/131
Author: anarcat
Updated: 2023-06-28

in https://gitlab.torproject.org/tpo/tpa/team/-/issues/40873 i found where gitlab keeps its logs and it turns out it keeps a lot of information, infringing on our normal policy to not even log the time of a request. we log IP addresses, user agents, all sorts of garbage, and for a full month.
figure out how to tune that down at least a little.

# deploy dynamic environments on the Puppet server

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40861
Author: anarcat
Updated: 2024-03-26

We should be able to push to a new branch and have that become a specific environment that can be run only on a subset of machines.
I've done something like this in my home lab: code in /etc/puppet/code/production is the default, but i can make new ones (currently by hand) in /etc/puppet/code/BRANCHNAME. It's pretty useful to avoid "YOLO" commits that plague our history, but can also be used for more sensitive deployments.
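A sketch of the branch-to-environment mapping such a deploy hook could use; the directory layout follows the description above, while the branch names and the sanitization rule are assumptions:

```python
import re

CODE_DIR = "/etc/puppet/code"

def environment_path(branch: str) -> str:
    """Map a pushed git branch to a Puppet environment directory."""
    if branch in ("master", "main", "production"):
        return f"{CODE_DIR}/production"
    # Puppet is picky about environment names (lowercase alphanumerics
    # and underscores), so squash everything else to underscores
    name = re.sub(r"[^a-z0-9_]", "_", branch.lower())
    return f"{CODE_DIR}/{name}"
```

a git `post-receive` hook would call this per pushed ref, check out the branch into the resulting directory, and remove the directory when the branch is deleted.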
This probably depends on first creating a role account (#29663) and goes along with validation checks (#31226). It could probably be done without any of those, though...

Label: Puppet CI
Assignee: Jérôme Charaoui

# restore IPv6 service on btcpay

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40859
Author: anarcat
Updated: 2023-05-23

in #40836, we found that HTTPS cert renewals were failing because let's encrypt couldn't reach the server over IPv6. that, in turn, was caused by a [regression in Docker](https://github.com/moby/libnetwork/issues/2607) which was [filed as a bug in Debian](https://bugs.debian.org/1017477) as well.
So for now, just keep this in the bookworm backlog.
Checklist:
* [ ] make sure that the docker package is fixed, see https://bugs.debian.org/1017477 for the reproducer
* [ ] re-add the AAAA record (`2604:8800:5000:82:466:38ff:fe07:b154`) in LDAP
* [ ] uncomment the record in the reverse zone file
* [ ] monitor (force?) the certificate renewal to make sure that still works

Milestone: Debian 12 bookworm upgrade

# Links to "Commented on issue" on GitLab user's activity use https on the onion service

Issue: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/126
Author: Pier Angelo Vendrame
Updated: 2022-08-29

If you go to GitLab user's activity from the onion service, the "Commented on issue #..." entries have an HTTPS URL instead of an HTTP one.
![Screenshot_from_2022-06-14_15-05-20](/uploads/40bc753ac596de727444b4c1030c5e96/Screenshot_from_2022-06-14_15-05-20.png)
All the other links seem to work.

Assignee: Jérôme Charaoui

# Features in NC Polls not working

Issue: https://gitlab.torproject.org/tpo/tpa/nextcloud/-/issues/8
Author: Erin Wyatt
Updated: 2022-06-06

Hello,

A couple of features are not working on NC Polls:
1) When inviting internal people to a poll, no one was notified. I manually added each person to the poll in NC itself and not a single person received a notification in NC or an email notification.
2) The download/export feature doesn't work. I tried in 3 different browsers with all privacy-enhancing features turned off, and could not get any of the download options to work.
Thank you!

# TPA-RFC-33: monitoring system upgrade or replacement

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40755
Author: anarcat
Updated: 2024-03-19

in #29864, we've gone pretty deep in comparisons between prometheus and icinga and how the first could replace the latter.
but now we're stuck at "i like this one better than the other" because we don't have a clear set of requirements.
the task here is to write a set of requirements for the new alerting system and, ultimately, make a proposal for the replacement of the deprecated Icinga 1 deployment we have now.
* [ ] establish requirements
* [ ] approve requirements
* if replacing icinga:
* [ ] review #29864 for ideas and tasks
* [ ] decide whether we keep the prometheus1/2 distinction
* [ ] deploy alert manager on prometheus1
* [ ] reimplement the Nagios alerting commands (optional?)
* [ ] send Nagios alerts through the alertmanager (optional?)
* [ ] rewrite (non-NRPE) commands (9) as Prometheus alerts
* [ ] scrape the NRPE metrics from Prometheus (optional)
* [ ] create a dashboard and/or alerts for the NRPE metrics (optional)
  * [ ] review the NRPE commands (300+) to see which ones to rewrite as Prometheus alerts
* [ ] turn off the Icinga server
* [ ] remove all traces of NRPE on all nodes
* if keeping icinga:
* [ ] review work from @weasel done on DSA's Puppet/Icinga integration
  * [ ] deploy that module or another icinga module inside puppet
* [ ] rewrite all the checks from the `nagios-master.cfg` file into puppet (300+)
* [ ] rebuild a new Icinga 2 server
  * [ ] retire the old Icinga 1 server

Milestone: old service retirement 2023
Assignee: anarcat

# Optionally, don't assign tickets where CI fails

Issue: https://gitlab.torproject.org/tpo/tpa/triage-ops/-/issues/6
Author: Nick Mathewson
Updated: 2022-09-01

For Arti, we'd like the option to _not_ automatically assign review on tickets where CI is failing. I'm not sure if we'd want to turn it on or not; sometimes CI is broken on `main` for a day here and there.
Conceivably this tool should write a message on the ticket when it decides not to assign it, so that things don't get confusing. But that might lead to a more general "helpful message bot"; not that that would be a bad thing.
It might also be a good idea to have this rule apply to some users and not others: I'm unsure about declining to assign reviewers to less experienced volunteers, since sometimes they need communication to understand what CI is telling them.
@trinity-1686a asked to get an @ on this ticket.

Assignee: Alexander Færøy

# remove pre-bookworm legacy code in Puppet

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40747
Author: anarcat
Updated: 2023-09-26

We have a bunch of code in our puppet codebase that checks for pre-bullseye releases. Remove that code when we finish upgrading to bullseye.
an example:
```diff
modified modules/profile/manifests/prometheus/bind_exporter.pp
@@ -4,6 +4,14 @@ class profile::prometheus::bind_exporter(
) {
$package_ensure = $ensure ? { 'present' => 'installed', 'absent' => 'absent' }
$service_ensure = $ensure ? { 'present' => 'running', 'absent' => 'stopped' }
+
+ # this should be dropped when bind exporter is upgraded to 0.3.0
+ # or later (BULLSEYE) everywhere
+ if versioncmp($facts['os']['release']['major'], '11') >= 0 {
+ $extra_options = ''
+ } else {
+ $extra_options = "-bind.pid-file /var/run/named/named.pid -bind.stats-groups 'server,view,tasks'"
+ }
class { 'prometheus::bind_exporter':
package_ensure => $package_ensure,
service_ensure => $service_ensure,
@@ -17,8 +25,10 @@ class profile::prometheus::bind_exporter(
package_name => 'prometheus-bind-exporter',
service_name => 'prometheus-bind-exporter',
# purge_config_dir => true,
- # should be dropped when bind exporter is upgraded to 0.3.0 or later (BULLSEYE)
- extra_options => "-bind.pid-file /var/run/named/named.pid -bind.stats-groups 'server,view,tasks'",
+
+ # this should be dropped when bind exporter is upgraded to 0.3.0
+ # or later (BULLSEYE) everywhere
+ extra_options => $extra_options,
}
# realize the allow rules defined on the prometheus server(s)
```
in the above code, we'd remove both the first `if` block and the `extra_options =>` class parameter passed to `prometheus::bind_exporter` completely.
but in general, grep for:
* [ ] `buster` (case-insensitive)
* [ ] `bullseye` (case-insensitive)
* [ ] `bookworm` (case-insensitive)
* [ ] `versioncmp`
* [ ] `lsbmajdistrelease`
* [ ] `lsbdistcodename`

Milestone: Debian 12 bookworm upgrade

# Making GitLab more searchable for Tor Log entries

Issue: https://gitlab.torproject.org/tpo/tpa/gitlab/-/issues/124
Author: cypherpunks
Updated: 2022-05-05

Comment on GitLab Layout:
Gitlab issues would be easier to search if the List overview contained a "symptom" column or a search- & sort-able subtitle that matched the "symptom"s appearing in Tor Log, since this is what people would copy/paste from Tor Log and search for.

# Serve .onion links on our static websites accessed via .onion

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40743
Author: Jérôme Charaoui
Updated: 2023-04-11

When visiting a TPO website via its .onion address, many of the links on those pages point to Internet addresses instead of their corresponding .onion address. For example, visiting the .onion for `blog.torproject.org`, http://pzhdfe7jraknpj2qgu5cz2u3i4deuyfwmonvzu5i3nyw4t4bmg7o5pad.onion, displays navigation links on the top of the page that point to the canonical domain instead of the .onion addresses. When a Tor Browser user clicks a link pointing to one of these sites, it will hit an exit node and, if automatic .onion redirection is enabled, the `Onion-Location` header will cause the page to be loaded a second time. This leads to unnecessary delays when navigating and hopping between different TPO websites.
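One possible fix, rewriting links to their .onion equivalents when serving pages over the onion service, could be sketched like this (the domain-to-onion mapping is illustrative and incomplete):

```python
# maps canonical domains to their onion service hostnames; only one
# example entry here, a real deployment would list every TPO site
ONION_MAP = {
    "blog.torproject.org":
        "pzhdfe7jraknpj2qgu5cz2u3i4deuyfwmonvzu5i3nyw4t4bmg7o5pad.onion",
}

def onionize(html: str) -> str:
    # naive string replacement; a real implementation would want a proper
    # HTML rewriter, or substitution at the web server / reverse-proxy layer
    for domain, onion in ONION_MAP.items():
        html = html.replace(f"https://{domain}", f"http://{onion}")
    return html
```

the "onionized DocumentRoot" approach would instead run a transformation like this once at build time rather than on every request.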
In tpo/tpa/team#40667 we discussed two approaches to fixing this problem:
* modify the pages dynamically when serving them through the .onion address
* build an "onionized" version of all our static websites and host those on a different `DocumentRoot`

# port puppet-managed configs to Debian bullseye

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40723
Author: anarcat
Updated: 2023-09-26

as part of the %"Debian 11 bullseye upgrade", we have a bunch of (sometimes really old) configuration that needs to be ported to the new stuff that's in bullseye. I have identified, so far:
* [ ] `/etc/apt/apt.conf.d/50unattended-upgrades`: to be investigated. probably mostly whitespace changes, but also possibly missing features. complicated by the fact that this is a third party Puppet module and would require significant work to catchup with the Debian package
* [ ] `/etc/unbound/unbound.conf`: switch to `include-toplevel` after the fleet is upgraded (does not work in buster)
* [x] `/etc/sudoers`: use `@include` instead of `#include`; the former is only available in bullseye and later. should be split out in a `sudoers.d` file to avoid future conflicts and, generally, split in snippets per service instead of this monolithic file
* [ ] `/etc/syslog-ng/syslog-ng.conf`: silly version number logic in the template, needs to be ported to newer config or replaced with rsyslog or journald
* [x] ~~`/etc/ferm/ferm.conf`: `web-cymru-01` had diffs pending from the previous upgrade (presumably?), might be worth catching up to *buster*, that is, unless we just ditch ferm completely (#40554)~~ the latter
* [ ] `/etc/lvm/lvm.conf`: same as above
* [x] ~~`/etc/bind/named.conf.options`: TBD, on fallax~~ fallax retired
if a file is added in the above list, do not forget to add it to the [conflicts resolution list](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/upgrades/bullseye#conflicts-resolution) in the upgrade procedure.
more such issues could come up, but for now that's what I got. the diff for those has been minimized as much as possible and the proposed version from the Debian package should generally be ignored.

Milestone: Debian 11 bullseye upgrade

# Set up LDAP authn for nc.tpn

Issue: https://gitlab.torproject.org/tpo/tpa/nextcloud/-/issues/3
Author: Linus Nordberg
Updated: 2022-04-06

All LDAP users should have a NC account.
Can this be done using the "LDAP User and group backend" application?

# nextcloud collaborative "pad" synchronization breaks down with multiple users

Issue: https://gitlab.torproject.org/tpo/tpa/nextcloud/-/issues/2
Author: anarcat
Updated: 2023-07-24

Today, in both the metrics and vegas meetings, we had problems using the "nextcloud pad", that collaborative, WYSIWYG, "markdown" text editor. We're trying to use this as a replacement for Storm's Etherpad installation and it worked the last meeting.
But now we were over 12 people in the pad and it didn't work so well. The main problem is synchronization, e.g. one user would "strikethrough" some text but others wouldn't see the style change. This is particularly critical in Vegas meetings as we use that style information to carry information about how the meeting should proceed.
There are also issues scrolling around the content using cursor keys (e.g. pgdown jumps to the end of the document instead of one page down), but that's probably unrelated.

# upgrade or rebuild hetzner-hel1-01 (nagios/icinga)

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40695
Author: anarcat
Updated: 2023-11-21

Nagios is going to be a particularly tricky bullseye upgrade, so it's not part of the large bullseye upgrade batches (#40690 or #40692).
We need to decide whether we keep icinga around at all or replace it with Prometheus (https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864). if we do keep icinga, we need to decide whether we keep the current "push to git to rebuild the config" model or "puppetize the setup" (https://gitlab.torproject.org/tpo/tpa/team/-/issues/32901).

Milestone: Debian 11 bullseye upgrade

# upgrade eugeni to bullseye

Issue: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40694
Author: anarcat
Updated: 2023-11-15

eugeni is going to be a tricky bullseye upgrade, so it's not part of the large bullseye upgrade batches (#40690 or #40692).
we might want to decide what to do with mailman (https://gitlab.torproject.org/tpo/tpa/team/-/issues/40471) and schleuder (https://gitlab.torproject.org/tpo/tpa/team/-/issues/40564) *before* we do the upgrade. mailman 2, in particular, is EOL so we *will* need to upgrade or replace it.
we might also want to consider the impact of the %"improve mail services" roadmap here. it's possible we might want to completely rebuild eugeni in different components instead of upgrading it.

Milestone: Debian 11 bullseye upgrade