TPA issues: https://gitlab.torproject.org/groups/tpo/tpa/-/issues

# establish policy for email services
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40404 (anarcat, updated 2022-04-06)

we've had a few examples recently where I questioned support requests about email aliases, for example #40395, #40391, and especially #40378 vs #40348.
in general, the broad question here is, for email services when should we use one of those services:
* mailman
* forwards (ie. `tor-puppet.git/modules/postfix/files/virtual`)
* schleuder
* RT
* Discourse
* CiviCRM
* GitLab
In other words, there's a *lot* of stuff that can receive and forward email. When should we use which?

(milestone: improve mail services)

# consider disabling read/write work queues on SSD devices
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40405 (anarcat, updated 2024-02-08)

seems like we could do a significant (twofold, [according to cloudflare](https://blog.cloudflare.com/speeding-up-linux-disk-encryption/)) performance improvement on SSD drives if we disable "work queues" in dm-crypt, by specifying `no-read-workqueue` and `no-write-workqueue` in `/etc/crypttab`. this is available with kernels starting with Linux 5.9, so maybe this needs to wait until the bullseye upgrade, however.
The [arch wiki](https://wiki.archlinux.org/) has [good documentation on how to enable this][docs].
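Concretely, that means appending the two flags to the options field of the relevant `/etc/crypttab` entry. A hypothetical sketch (the target name, UUID, and the other options are illustrative):

```
# <target>  <source>                                   <key file>  <options>
crypt0      UUID=0123abcd-0000-0000-0000-000000000000  none        luks,discard,no-read-workqueue,no-write-workqueue
```

The change takes effect after the mapping is re-opened (in practice, after a reboot or `cryptsetup refresh`).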
[docs]: https://wiki.archlinux.org/title/Dm-crypt/Specialties#Disable_workqueue_for_increased_solid_state_drive_(SSD)_performance

(milestone: Debian 12 bookworm upgrade)

# long server names crash the backup server
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40416 (anarcat, updated 2022-04-07)

in tpo/tpa/team#40364 I went a little overboard and created a server named:
static-gitlab-shim-source.torproject.org
I even thought of adding a -01 in there. That short name (`static-gitlab-shim-source`) is 25 characters long which leads to a label on the backup server that crashes Bacula:
Sep 24 17:14:45 bacula-director-01 bacula-dir[1467]: Config error: name torproject-static-gitlab-shim-source.torproject.org-full.${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}_${Hour:p/2/0/r}:${Minute:p/2/0/r} length 130 too long, max is 127
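The arithmetic is easy to reproduce, assuming Bacula builds the name exactly as shown in the log line: a `torproject-` prefix, the FQDN, `-full.`, and the unexpanded timestamp pattern:

```python
# Reproduce the over-long label from the log line above; Bacula rejects
# config names longer than 127 characters.
PREFIX = "torproject-"
TIMESTAMP = "${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}_${Hour:p/2/0/r}:${Minute:p/2/0/r}"

def label_length(short_name: str, domain: str = "torproject.org") -> int:
    return len(f"{PREFIX}{short_name}.{domain}-full.{TIMESTAMP}")

print(label_length("static-gitlab-shim-source"))  # 130: over the 127 limit
print(label_length("static-source-01"))           # 121: fits
```

With the prefix, domain, and timestamp pattern taking 90 characters, any short name longer than about 22 characters overflows.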
Now: maybe I should have used a shorter server name (and I have since retired the box). But it seems to me that a single server with a bad configuration shouldn't hang the entire backup server.

# enhance incident response procedures
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40421 (anarcat, updated 2024-02-13)

today we had an ... interesting situation with the puppet infrastructure. while we have actually recovered pretty well, all things considered, it would be important to enhance our response to such situations so that they are less stressful and, why not, even more "fun", if i can be so daring.
some background reading:
* [Got game? Secrets of great incident management](https://bitfieldconsulting.com/blog/got-game-secrets-of-great-incident-management)
* [pager duty incident response documentation](https://response.pagerduty.com/)
some ideas:
* have an issue template for incidents (so, in git, which requires a git repository here, but maybe it's finally time to merge the wiki repo here anyways), available offline
* run simulations/games
* have post-mortem templates, here's the [pager duty template](https://response.pagerduty.com/after/post_mortem_template/)
* gitlab has some [incident management primitives](https://docs.gitlab.com/ee/operations/incident_management/) including aforementioned "[incidents](https://docs.gitlab.com/ee/operations/incident_management/incidents.html)" (which are really just issues)...
* ... but also [integrations](https://docs.gitlab.com/ee/operations/incident_management/integrations.html) which is especially interesting considering they have *native* Prometheus integration, which might require switching from nagios to prometheus (#29864)
anyways, the core idea here is:
1. have incident roles (note-taker, driver, comms, etc)
2. incident and post-mortem templates
3. run games

# Rejected notes are shown as pending on fulltext page
https://gitlab.torproject.org/tpo/tpa/anon_ticket/-/issues/50 (cypherpunks, updated 2021-10-29)

If a note is rejected by the moderator, it is listed as such on the user's landing page. There is also a "see full note text" link on its row. Clicking on it, I get a page with the text of that note, but with the words "Pending Note" in the header.

# upgrade mailman to mailman 3
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40471 (anarcat, updated 2024-02-20)

Mailman 2 was removed from Debian bullseye; we need to either upgrade to Mailman 3 or get rid of it. This is part of the 2022-Q1/Q2 OKRs and the %"Debian 11 bullseye upgrade" milestone.
upgrade procedure: https://docs.mailman3.org/en/latest/migration.html

(milestone: Debian 11 bullseye upgrade)

# scale out GitLab to 2k users
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40479 (anarcat, updated 2024-03-07)

It seems like we have outgrown the initial [reference architecture](https://docs.gitlab.com/ee/administration/reference_architectures/) we were previously on, which was ([up to 1,000 users](https://docs.gitlab.com/ee/administration/reference_architectures/1k_users.html)) 8 vCPU and 7.2 GB memory on a single server. We've already bumped memory to 16GB and are still swapping.
Let's look at how to scale out this thing. The next reference architecture is "[up to 2k users](https://docs.gitlab.com/ee/administration/reference_architectures/2k_users.html)", and it involves the following:
* load balancer: 2 vCPU, 1.8 GB memory (they suggest haproxy)
* postgresql: 2 vCPU, 7.5 GB memory
* redis: 1 vCPU, 3.75 GB memory
* gitaly: 4 vCPU, 15 GB memory
* gitlab rails: 2 x 8 vCPU, 7.2 GB memory
* object storage: unspecified
This therefore involves at least 5 machines, 25 cores, and 42GB of memory. We currently use 8 cores and 16GB of memory, so this would almost triple the hardware usage, and add significant complexity.
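The totals can be sanity-checked from the list above (the rails line counts both nodes):

```python
# vCPU and memory (GB) per node type, copied from the 2k-user list above.
nodes = {
    "load balancer": (2, 1.8),
    "postgresql": (2, 7.5),
    "redis": (1, 3.75),
    "gitaly": (4, 15.0),
    "gitlab rails": (16, 14.4),  # 2 x (8 vCPU, 7.2 GB)
}
cpus = sum(c for c, _ in nodes.values())
mem = sum(m for _, m in nodes.values())
print(cpus, round(mem, 2))  # 25 vCPU, 42.45 GB in total
```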
On the upside, however, we would have a saner GitLab deployment in the sense that we could reuse our existing components (e.g. the postgresql backups, tpo/tpa/gitlab#20). And from there on, it's much easier to reliably scale the box, restart components, and work on high availability. Each one of those components can be made redundant fairly trivially (apart from maybe postgresql, but their [3k user architecture](https://docs.gitlab.com/ee/administration/reference_architectures/3k_users.html) solves this with [pgbouncer](https://www.pgbouncer.org/), although it also needs to introduce more complexity with things like [consul](https://www.consul.io/) (a service mesh) and [sentinel](https://redis.io/topics/sentinel) (redis HA)).
it should be noted, however, that the instructions still use the GitLab Omnibus package, so we do not get away from that level of complexity, unfortunately.

(milestone: (next) cluster scaling)

# consider bcrypt or yescrypt for password hashing after bullseye upgrade
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40492 (anarcat, updated 2024-02-08)

in #30608 we were forced to downgrade to SHA for hashing our (mail) passwords. that's really too bad, and it's basically only because `crypt(3)` doesn't support bcrypt or better (yescrypt!) in Debian buster.
once we're upgraded (basically everywhere, but we could do it only on the submission server for starters), implement the logic to build bcrypt-specific (or yescrypt?) hashes in userdir-ldap-cgi. the caller is in `update.cgi` (grep for `Salt`) and the definition is in `Util.pm`. we should probably create a new function for more complex salts like bcrypt and yescrypt, because the actual "settings" (what comes after `$y$`) are not exactly the same as for md5/sha (e.g. salts are separated from the hashed password with `$` in SHA, not so in bcrypt, from what i understand).
in any case, this needs experimentation. this is the code i had for bcrypt:
```
use Digest;
use Data::Entropy::Algorithms qw(rand_bits);

my $bcrypt = Digest->new('Bcrypt', cost => 12, salt => rand_bits(16 * 8));
my $hashed_password = crypt($password, $bcrypt->settings());
```
note that I don't actually *trust* `rand_bits` anymore, after reading the [Data::Entropy::Algorithms](https://metacpan.org/pod/Data::Entropy::Algorithms) documentation. turns out it relies on [Data::Entropy](https://metacpan.org/pod/Data::Entropy) and *that* says:
> If nothing is done to set a source then it defaults to the use of Rijndael (AES) in counter mode (see Data::Entropy::RawSource::CryptCounter and Crypt::Rijndael), keyed using Perl's built-in rand function. This gives a data stream that looks like concentrated entropy, but really only has at most the entropy of the rand seed. Within a single run it is cryptographically difficult to detect the correlation between parts of the pseudo-entropy stream. If more true entropy is required then it is necessary to configure a different entropy source.
And *then* [rand()](https://perldoc.perl.org/functions/rand) says:
> rand is not cryptographically secure. You should not rely on it in security-sensitive situations. As of this writing, a number of third-party CPAN modules offer random number generators intended by their authors to be cryptographically secure, including: Data::Entropy, Crypt::Random, Math::Random::Secure, and Math::TrulyRandom.
and now we have inception. brilliant.

(milestone: Debian 12 bookworm upgrade)

# email deliverability monitoring
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40494 (anarcat, updated 2023-12-20)

As part of the DKIM/SPF/etc plan (tpo/tpa/team#40363) and the %"improve mail services" OKR, it would be critical to have metrics that show whether or not mail is actually getting delivered to major providers, which is a key problem we're having right now with email delivery (e.g. #40484, https://gitlab.torproject.org/tpo/tpa/team/-/issues/34134, https://gitlab.torproject.org/tpo/tpa/team/-/issues/40149, https://gitlab.torproject.org/tpo/tpa/team/-/issues/40170).
There are a few parts to this:
* end-to-end deliverability tests
* feedback loops
* blocklist checks
# Deliverability tests
A simple monitoring system we might want to implement is an end-to-end deliverability test which would send email from point X (say lists.tpo, eugeni, the submission server, or CiviCRM) and check a mailbox on provider Y and Z (say hotmail, gmail, etc) to see if the email arrives.
To implement this, create accounts on:
* hotmail/live.com/microsoft
* yahoo.com (also covers Verizon now, bizarrely)
* gmail.com
Nowadays, this might require an actual phone number, so we could get a VoIP provider like voip.ms. There, the [prices currently are](https://voip.ms/en/phone-numbers/canada):
* toll-free: 1.25\$/mth and 0.027\$/min.
* canada: 0.85\$/mth and 0.009\$/min
I mention toll-free numbers because those could eventually be useful if we want to provide support over the phone. This is something that @irl suggested because they use it to help people with censorship circumvention (or at least reporting): when the internet is down, no one can send email to tell you, but they *can* send text messages or voicemail sometimes... I also looked at toll-free numbers in europe and africa (germany and egypt, in particular), and both are somewhat expensive (15-25$/mth) so maybe not worth it for now.
In any case, at this stage the phone service would be strictly to register the accounts.
Once we have an account, we need to setup monitoring. This can take a few forms:
* Nagios/Icinga: [check_email_delivery](http://buhacoff.net/software/check_email_delivery/) (in Debian [nagios-plugins-contrib](https://tracker.debian.org/pkg/nagios-plugins-contrib)), [check_email_loop](https://github.com/setec/check-email-loop)
* Prometheus: [a/i service-prober](https://git.autistici.org/ai3/tools/service-prober)
* Manual: https://code.mayfirst.org/mfmt/filter-check
Might be more.
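If we ended up rolling our own, the core loop is small. A sketch (hosts, ports, and credentials are placeholders; real providers will likely need app passwords or OAuth):

```python
# End-to-end deliverability probe sketch: send a uniquely-tagged message
# through our relay, then poll the provider mailbox over IMAP until the
# tag shows up (or we give up and count it as a delivery failure).
import imaplib
import smtplib
import time
import uuid
from email.message import EmailMessage

def make_probe(sender: str, rcpt: str) -> EmailMessage:
    """Build a uniquely-tagged probe message we can search for later."""
    msg = EmailMessage()
    msg["From"], msg["To"] = sender, rcpt
    msg["Subject"] = f"delivery-probe-{uuid.uuid4().hex}"
    msg.set_content("end-to-end deliverability probe, safe to delete")
    return msg

def probe_delivered(msg, smtp_host, imap_host, user, password, timeout=600):
    """Send msg through smtp_host, then poll imap_host for its subject."""
    with smtplib.SMTP(smtp_host, 587) as smtp:
        smtp.starttls()
        smtp.send_message(msg)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with imaplib.IMAP4_SSL(imap_host) as imap:
            imap.login(user, password)
            imap.select("INBOX")
            _, data = imap.search(None, "SUBJECT", f'"{msg["Subject"]}"')
            if data[0].split():
                return True  # probe arrived within the timeout
        time.sleep(30)
    return False  # never arrived: dropped, deferred, or spam-foldered
```

The boolean result (or the measured latency) is what we would export as a metric per provider.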
# Feedback loops
The point of this is to have a place where we collect failure reports from various providers. Those can take many forms:
* DMARC reports
* Hetzner email feedback loops
* TLS failure reports
This is tricky. I have enabled DMARC on my personal domain and regularly receive DMARC delivery reports. They are not human-readable and, even with a parser like [dmarc-cat](https://github.com/keltia/dmarc-cat), it's hard to figure out what is a legitimate misconfiguration on our end and what's an active spoofing attack from the outside. I suspect the reports coming out of torproject.org would be monstrous, so they would necessarily need to be somehow aggregated. Here are some aggregation tools:
* [dmarcts-report-parser](https://github.com/techsneeze/dmarcts-report-parser): can parse an IMAP mailbox, stores results in a database, has a [web UI](https://www.techsneeze.com/dmarc-report/), Perl, in Debian but out of date, no upstream release
* [lafayette](https://github.com/LinkedInAttic/lafayette): deployment unclear, documentation rather minimal
* [parsedmarc](https://github.com/domainaware/parsedmarc): python, IMAP inbox input, JSON/CSV output, integrates with ElasticSearch, Splunk, Kafka, [Grafana](https://grafana.com/grafana/dashboards/11227) (but not prometheus)
* [dmarc-metrics-exporter](https://pypi.org/project/dmarc-metrics-exporter/): Prometheus exporter, scrapes IMAP inbox, good-looking metrics, some Grafana dashboard included
* [reports-collector](https://git.autistici.org/ai3/tools/reports-collector): supports DMARC, but also (SMTP) TLS-RPT, HTTP (CSP, etc), digests reports by HTTP or SMTP, turns reports into JSON, they use Kibana to process it
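To give a sense of what those tools chew through, here is a minimal sketch of pulling per-source verdicts out of a DMARC aggregate (rua) report; the XML structure follows RFC 7489's schema, and the sample values are made up:

```python
# Minimal DMARC aggregate report digest: per-source message counts and
# the evaluated SPF/DKIM verdicts.
import xml.etree.ElementTree as ET

def summarize(report_xml: str) -> list[dict]:
    rows = []
    for record in ET.fromstring(report_xml).iter("record"):
        row = record.find("row")
        policy = row.find("policy_evaluated")
        rows.append({
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count")),
            "spf": policy.findtext("spf"),
            "dkim": policy.findtext("dkim"),
        })
    return rows

sample = """<feedback><record><row>
  <source_ip>198.51.100.7</source_ip><count>3</count>
  <policy_evaluated><disposition>none</disposition>
  <dkim>fail</dkim><spf>pass</spf></policy_evaluated>
</row></record></feedback>"""
print(summarize(sample))  # one row: 3 messages, SPF pass, DKIM fail
```

The hard part the tools above solve is not this parsing but the aggregation: fetching reports from a mailbox, deduplicating, and storing time series.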
We already process the hetzner feedback loops with a [handle-abuse.py script](https://gitweb.torproject.org/admin/tsa-misc.git/tree/handle-abuse.py) which we run by hand. It doesn't cover mailing list complaints yet, but it can deal with CiviCRM messages that are filed as spam instead of bounced.
# Block list checks
Finally, we need to make sure we're not listed on major block lists. In nagios, there's [check_rbl](https://github.com/matteocorti/check_rbl), part of the [nagios-plugins-contrib Debian package](https://tracker.debian.org/pkg/nagios-plugins-contrib).

(milestone: improve mail services)

# GitLab CI best practice & builds directory
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40499 (Jérôme Charaoui, updated 2021-11-11)

As several of the lingering docker volumes appear to contain stuff from `/builds`, we should probably take note of the following recommendation from GitLab:
> GitLab Runner does not stop you from storing things inside of the Builds Directory. For example, you can store tools inside of /builds/tools that can be used during CI execution. We HIGHLY discourage this, you should never store anything inside of the Builds Directory. GitLab Runner should have total control over it and does not provide stability in such cases. If you have dependencies that are required for your CI, we recommend installing them in some other place.
Seen at https://docs.gitlab.com/runner/best_practice/#build-directory
I couldn't find more information about the reason behind this recommendation.

# rethink gitlab backup strategy
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40518 (anarcat, updated 2024-02-20)

we are currently backing up everything in GitLab twice: once through Bacula, and another time through the `gitlab-backup` script. in tpo/tpa/team#40517 we at least pulled artifacts out of this, but we should think real hard about whether or not we need the `gitlab-backup` script at all, because it duplicates things and wastes CPU cycles.
my preference would be to have rotating ZFS snapshots (because LVM snapshots would be too costly in performance) on this server: one snapshot every 10 minutes, kept for the last 10 minutes, and another every hour, kept for the last 24h; Bacula then backs up the latest available snapshot. that way bacula backups are consistent. we could even implement some flushing of the postgresql database to ensure it's consistent as well.
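The retention policy in that scheme can be expressed as a small keep/expire predicate; a sketch of the policy only, since the actual snapshotting would be done by cron or a tool like sanoid:

```python
# Retention sketch: every 10-minute snapshot is kept for 10 minutes,
# hourly snapshots (minute == 0) are kept for 24 hours.
from datetime import datetime, timedelta

def keep(snap: datetime, now: datetime) -> bool:
    age = now - snap
    if age <= timedelta(minutes=10):
        return True
    return snap.minute == 0 and age <= timedelta(hours=24)

now = datetime(2024, 1, 2, 12, 0)
print(keep(datetime(2024, 1, 2, 11, 55), now))  # True: within 10 minutes
print(keep(datetime(2024, 1, 2, 3, 0), now))    # True: hourly, within 24h
print(keep(datetime(2024, 1, 1, 3, 30), now))   # False: expired
```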
this would completely remove the need for the `gitlab-backup` script, and would also mitigate tpo/tpa/gitlab#20 to a certain extent.
it's a significant re-engineering effort, however: it might be simpler to just implement tpo/tpa/gitlab#20 and use regular postgresql backups combined with bacula, and hope for the best in terms of consistency. we use GitLab so much though that I would really like to be able to easily go back in time in smaller chunks than what bacula offers.
the backups to review are:
- [x] PostgreSQL databases: moved to our normal backup system (#41426)
- [ ] Git repositories: covered by bacula, risk of "corrupt" git repositories on disaster recovery (e.g. partial writes like "a ref was uploaded but not its blob" or "a part of a blob was uploaded"), see https://gitlab.com/gitlab-org/gitlab/-/issues/432743 for a discussion
- [ ] Blobs: currently on disk, assumed to be safe to backup by bacula, but also covered by the rake task, could be moved to object storage and rely on that for backups, those are:
- [ ] uploads
- [ ] builds
- [ ] artifacts
- [ ] pages
- [ ] lfs
- [ ] terraform states (!?)
- [ ] packages
- [ ] ci secure files
- [ ] Container registry: same, currently in object storage without backups
- [x] Configuration files: backed up by bacula, assumed safe
- [ ] Other data: mainly redis for the job queue and elastic search for the advanced search; we don't use the latter, and the former we could probably live without

# ganeti is losing ram?
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40523 (anarcat, updated 2022-04-06)

the new shadow runner was created with 300G of memory (was that GiB or GB?), yet inside it says 295GiB are available:
```
root@ci-runner-x86-05:~# top -n 1 | grep MiB\ Mem; free -h | grep ^Mem
MiB Mem : 302104.9 total, 300003.9 free, 290.4 used, 1810.6 buff/cache
Mem: 295Gi 290Mi 292Gi 0.0Ki 1.8Gi 292Gi
```
Those are clearly marked 295 Gibyte, which is 295 * 2^30 bytes. Yet on the parent host, we see this:
```
root@chi-node-04:~# cat /proc/139683/stat | cut -d ' ' -f 23
329177686016
```
... that is field 23 of the `stat` file in procfs, which according to Table 1-4 in [this documentation](https://www.kernel.org/doc/html/latest/_sources/filesystems/proc.rst.txt) is the `vsize` field, which *seems* to be in bytes. it would amount to 306GiB or 329GB.
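The conversion is easy to double-check, assuming vsize is indeed in bytes:

```python
# field 23 of /proc/<pid>/stat, in bytes (per the proc documentation)
vsize = 329_177_686_016

print(vsize / 2**30)  # ~306.6 GiB
print(vsize // 10**9)  # ~329 GB
print(300 * 2**30)    # the 300GiB the VM was created with: 322122547200 bytes
```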
I have so many questions...
1. why is the vsize 306GiB and not 300GiB?
2. why doesn't it match the available memory inside the VM?
I never noticed until now because the differences were insignificant, but 5GiB (or is it 10GiB?) seems like a lot of memory to just lose on your way to school...

# files in dangerzone/processing silently block sanitization
https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/issues/18 (anarcat, updated 2021-12-07)

when we have (say) file FOOBAR in dangerzone/processing/FOOBAR, it blocks processing of file FOOBAR, which just stays stuck there, which leads people to believe dangerzone-bot is stuck (e.g. #17). this is confusing, and while it can happen during a crash (e.g. #14), it can also happen during reboots (!) which is probably what happened in #17.
so we need to handle this situation better: at the very least we need to have better feedback for the operator, because i couldn't figure out what was going on just by looking at the logs:
```
Dec 03 17:23:26 dangerzone-01 systemd[1]: Starting Dangerzone WebDAV processor...
Dec 03 17:23:26 dangerzone-01 dangerzone-webdav-processor[2491]: authenticated with webdav https://nc.torproject.net/remote.php/dav/files/dangerzone-bot/
Dec 03 17:23:31 dangerzone-01 dangerzone-webdav-processor[2491]: sanitizing CVs (4)/ Candidate 33/
Dec 03 17:23:32 dangerzone-01 dangerzone-webdav-processor[2491]: sanitizing CVs (4)/ Candidate 35/
Dec 03 17:23:37 dangerzone-01 systemd[1]: dangerzone-webdav-processor.service: Succeeded.
Dec 03 17:23:37 dangerzone-01 systemd[1]: Started Dangerzone WebDAV processor.
```
notice those "sanitizing" messages there? those are confusing too, and unrelated. i have been staring at those three lines of code for a while now:
```
logging.info("sanitizing %s %s", folder, path)
# non-empty folder or regular file
if len(listing) > 1 or not path.endswith("/"):
```
i cannot fathom what the logic is in there. we'd need to look at those actual folders to see what's going on, but it could be just a distraction from this issue.

# check SPF/DKIM/DMARC records on incoming mail
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40539 (anarcat, updated 2023-08-08)

as part of the %"improve mail services" roadmap, I have realized that we should not only publish SPF/DKIM records (and sign outgoing mail), we should also *check* incoming mail. This is becoming critically important because Hetzner are becoming unhappy with us backscatter-spamming people through Mailman (see message `[AbuseID:998963:1A]`).
That was a Spamcop complaint about a user that was receiving backscatter bounces through Mailman. Specifically, a message that was marked "too big" and "held for moderation". That specific instance would have been solved by an SPF check, because that specific victim's email server publishes a fairly strict record:
```
account.co.za. 9476 IN TXT "v=spf1 +a +mx +ip4:136.243.12.222 +ip4:5.9.29.165 +ip4:5.9.29.168 -all"
```
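To illustrate why a check would have caught this: that record only authorizes a handful of addresses and ends in `-all` (hard fail). A toy sketch matching only the `ip4:` mechanisms (a real check must use a proper SPF library, which also handles `a`, `mx`, `include`, etc.):

```python
# Toy SPF ip4-mechanism matcher for the record quoted above.
import ipaddress

RECORD = "v=spf1 +a +mx +ip4:136.243.12.222 +ip4:5.9.29.165 +ip4:5.9.29.168 -all"

def ip4_match(record: str, client_ip: str) -> bool:
    ip = ipaddress.ip_address(client_ip)
    for term in record.split():
        mech = term.lstrip("+")
        if mech.startswith("ip4:"):
            if ip in ipaddress.ip_network(mech.split(":", 1)[1], strict=False):
                return True
    return False

print(ip4_match(RECORD, "136.243.12.222"))  # True: an authorized sender
print(ip4_match(RECORD, "192.0.2.1"))       # False: -all means hard fail
```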
So it seems like part of the roadmap should also include checking incoming email, if only to limit the spam we relay (which then hurts our reputation).
First step would be to check SPF, but we should also probably check DMARC since it may influence SPF. DKIM would be second.
* [ ] SPF checks
* [ ] DMARC checks? if necessary for SPF, definitely needed for DKIM...
* [ ] DKIM checks

(milestone: improve mail services)

# Docker containers using Google DNS?
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40542 (Jérôme Charaoui, updated 2022-04-06)

I've noticed these recurring log entries showing up on our Docker runners:
```
level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
```
I'm wondering if we should instead ensure the containers are using the same DNS as the hosts?

# deploy fail2ban on all services with password authentication
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40550 (anarcat, updated 2023-07-24)

any service using passwords for authentication should ban users attempting to bruteforce it.
consider that we have onion services and figure out how to thwart users there as well.
possible list:
* [x] db.tpo
* [ ] crm
* [ ] nextcloud
* [ ] forum
* [ ] prometheus/grafana?
* [ ] ... ? review the "Auth" column in the [service list](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/)
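For services fronted by standard daemons, a jail is mostly configuration. A hypothetical sketch (jail name and limits are illustrative; `dovecot` is a stock fail2ban filter):

```
# /etc/fail2ban/jail.d/dovecot.local (sketch)
[dovecot]
enabled  = true
port     = submission,imaps
filter   = dovecot
maxretry = 5
bantime  = 1h
```

Note that fail2ban bans by source IP, which does nothing for connections arriving over onion services; those need a different throttling mechanism, per the open question above.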
/cc @lavamind

# replace ferm with nftables
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40554 (Jérôme Charaoui, updated 2024-02-08)

# Context
Since [version 2.5](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929416#17) released with Debian 11, ferm, our firewall manager, loads its firewall rules via `iptables-legacy` instead of `iptables` which is aliased to `iptables-nft` since Debian 10.
The result is that our bullseye machines run *two* firewalls, both nftables and iptables. This is explicitly [discouraged](https://wiki.debian.org/nftables#Should_I_mix_nftables_and_iptables.2Febtables.2Farptables_rulesets.3F) in Debian.
# Workaround
Since we mainly deploy simple firewall rules and never had any issues with ferm using `iptables` on Debian 10 (hence `iptables-nft`), we should consider implementing the workaround suggested here to force ferm to interact with `iptables`/`iptables-nft` on bullseye and later: https://github.com/MaxKellermann/ferm/issues/47#issuecomment-845940826
# Fix
We should just stop using ferm altogether and directly generate rules with nftables.
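For a sense of scale, the simple rulesets we deploy translate to a few lines of native nftables. A sketch only; the ports and rules are illustrative, not our actual policy:

```
# /etc/nftables.conf (sketch)
table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    tcp dport { 22, 80, 443 } accept
  }
}
```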
@weasel apparently has a good nftables module for puppet:
https://github.com/weaselp/puppet-nry_nft
... and naturally voxpupuli has one too:
https://forge.puppet.com/modules/puppet/nftables / https://github.com/voxpupuli/puppet-nftables

(milestone: Debian 12 bookworm upgrade)

# Better monitoring for webserver response times
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40568 (Jérôme Charaoui, updated 2022-07-26)

In the wake of tpo/tpa/team#40566, it was shown that our monitoring infrastructure isn't sufficiently sensitive with respect to web server response times. We had an ongoing DoS on the static mirror hosts for days and we only noticed when the response times consistently surpassed 10 seconds.
We should probably modify the existing checks or add new ones that will monitor whether the static mirror host (or even any web host) is serving pages within an acceptable delay, say 1 second.

(milestone: Debian 11 bullseye upgrade)

# consider retiring build boxes
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40580 (anarcat, updated 2022-04-07)

in the jenkins retirement (#40218) we have decided to keep a few build boxes with sbuild on them (three machines with `debian_build_box`, one of which is also a CI runner). even though we have retired Jenkins, which was their primary consumer, users like @weasel and @kez may still require those boxes for two use cases:
* @kez doesn't run Debian and might need a place to build random Debian packages not currently in GitLab (which could be fixed by moving those package builds inside GitLab)
* @weasel has a similar use case although he obviously runs debian; he also needs access to the ARM builder. it's unclear whether he still requires access to the build box in the long term or why (sorry, my memory fails me here)

# provide instructions that work without IMAP or provide an IMAP server
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40586 (anarcat, updated 2022-04-20)

multiple clients (at least Outlook/Office 365 but also Apple). I looked at Apple Mail specifically with @sebastian and we couldn't find a way to just add an SMTP server. We tried two things:
* adding a new account: this tries to configure a POP3 or IMAP service, which fails because we don't provide that
* adding a new SMTP server: this works, but then you need to configure one of the existing IMAP accounts to use *that*, which means that unrelated email addresses (ie. not `From: *@torproject.org`) will be using the submission server, which is also not what we want.
I believe that @irl also reported a similar problem with Outlook (or Office 365) where you couldn't configure an extra SMTP server without a related IMAP server.
I'm not sure what the solution here is. Maybe @sebastian, @irl, and I couldn't find the right way to do this, in which case the documentation needs to be clarified.
But I suspect that there's something fundamentally wrong with just hosting the SMTP server and that you actually need to provide some sort of IMAP service for this to work reliably in all clients. In that case, the solution could be for TPA to just host an empty IMAP server. That would actually be fairly easy because we already have a Dovecot server configured on submit-01 (through puppet, even), so we could easily replicate that on another server. We wouldn't actually deliver mail there, it would just be to support configuring those mail clients.
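If we go that route, the "empty IMAP server" is mostly a matter of enabling only the imap protocol on a Dovecot instance like the one on submit-01. A hypothetical sketch (directive names are stock Dovecot; the values are illustrative, not our actual configuration):

```
# dovecot sketch: accept IMAP logins, deliver nothing
protocols = imap
mail_location = maildir:~/Maildir
passdb {
  driver = ldap
  args = /etc/dovecot/dovecot-ldap.conf.ext
}
```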
Makes sense?
In any case, this is a blocker for the SPF deployment (#40363).

(milestone: improve mail services)