TPA issues
https://gitlab.torproject.org/groups/tpo/tpa/-/issues
2024-03-11T21:39:34Z

Issue #40960: Document our privacy-preserving webserver log setup for the world (Roger Dingledine, 2024-03-11)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40960

We use a novel log format for our webservers, which makes sure we don't collect the IP addresses of our visitors, and doesn't record the precise timestamp of the visits, yet still produces a format compatible with various log parsing tools.
Everybody in the world should be doing this.
We should document what we do and how and why, and tell the world so everybody else can do it too.
Apparently Debian uses the same approach we do, so we have some adoption already, but much more remains!
See http://seclists.org/nmap-announce/2004/16 for some of our original motivation.
And see http://lists.spi-inc.org/pipermail/spi-general/2016-December/003645.html for a summary of what we do currently.
We should also invite/encourage people to find bugs in our set-up. It can always get better!
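For illustration, roughly what such a privacy-preserving format can look like in Apache (a sketch, not necessarily our exact configuration; the links above describe what we actually run): the client address is replaced by a constant, the timestamp is truncated to the day, and referrer/user-agent are dropped, while the line still parses as a common log format.

```apache
# sketch: constant client address, timestamp truncated to the day,
# no referrer or user-agent
LogFormat "0.0.0.0 - - [%{%d/%b/%Y:00:00:00 %z}t] \"%r\" %>s %O" privacy
CustomLog ${APACHE_LOG_DIR}/access.log privacy
```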
And lastly, a blog post like this will be really useful to point to when we start doing analysis and graphs and metrics and stuff.

Issue #40934: evaluate CPU/power usage of unattended upgrades (anarcat, 2022-11-12)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40934

after running unattended-upgrades for a while everywhere now, one has to wonder how well it works! in general, we've had some issues with packages updating and breaking (i'm looking at you grub), and restarts breaking things (i'm looking at you needrestart and open vswitch!), but otherwise it seems to work okay...
... but does it!?
on my laptop, i run debian bookworm, and i have found that it is *much* slower at resolving dependencies on large upgrades (say a couple hundred packages, which can definitely happen when you follow testing/unstable). and by slower I mean "it's taking a long time just resolving the dep tree or decision tree or whatever that thing is doing, while `apt upgrade` does the equivalent in an instant".
so i wonder if we're not wasting a microton of CPU usage (and therefore power, and therefore destroying civilization) because of this possible flaw.
because we've instrumented per-service metrics elsewhere (i'm looking at you postgresql), i figured we might be able to do something similar, which could help the entire community and, ultimately, possibly save us some resources as well.
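one way this "something similar" could look: systemd already accounts CPU time per unit, so we could scrape those counters for the apt/unattended-upgrades units and export them. a sketch (the unit name and the export path are assumptions, not something we've deployed):

```python
import subprocess

def parse_cpu_usage_nsec(line: str) -> float:
    """Parse systemd's "CPUUsageNSec=<n>" property output into seconds."""
    value = line.strip().split("=", 1)[1]
    if value == "[not set]":
        raise ValueError("CPUAccounting is not enabled for this unit")
    return int(value) / 1e9

def unit_cpu_seconds(unit: str) -> float:
    """Cumulative CPU time consumed by a systemd unit, in seconds."""
    out = subprocess.check_output(
        ["systemctl", "show", unit, "--property=CPUUsageNSec"], text=True
    )
    return parse_cpu_usage_nsec(out)

# e.g. unit_cpu_seconds("apt-daily-upgrade.service"), written out
# periodically for the node_exporter textfile collector to pick up
```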
the end goal here is that maybe we'd switch away from u-u, either when running testing or altogether. alternatives range from "just write your own damn systemd service" to "oh, is cron-apt still around? that was cute"

Assignee: anarcat

Issue #40861: deploy dynamic environments on the Puppet server (anarcat, 2024-03-26)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40861

We should be able to push to a new branch and have that be a specific environment that can be run only on a subset of machines.
I've done something like this in my home lab: code in /etc/puppet/code/production is the default, but i can make new ones (currently by hand) in /etc/puppet/code/BRANCHNAME. It's pretty useful to avoid "YOLO" commits that plague our history, but can also be used for more sensitive deployments.
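for reference, the stock Puppet feature for this is "directory environments"; a sketch of the usual layout (paths vary between Debian and upstream packaging, so treat these as examples):

```ini
; /etc/puppet/puppet.conf (or /etc/puppetlabs/puppet/puppet.conf upstream)
[server]
environmentpath = /etc/puppet/code/environments

; then a branch pushed to environments/BRANCHNAME can be tested on a
; single host with: puppet agent --test --environment BRANCHNAME
```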
This probably depends on first creating a role account (#29663) and goes along with validation checks (#31226). It could probably be done without any of those, though...

Milestone: Puppet CI. Assignee: Jérôme Charaoui <lavamind@torproject.org>.

Issue #40859: restore IPv6 service on btcpay (anarcat, 2023-05-23)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40859

in #40836, we found that HTTPS cert renewals were failing because let's encrypt couldn't reach the server over IPv6. that, in turn, was caused by a [regression in Docker](https://github.com/moby/libnetwork/issues/2607) which was [filed as a bug in Debian](https://bugs.debian.org/1017477) as well.
the task here is to restore the IPv6 record in LDAP (`2604:8800:5000:82:466:38ff:fe07:b154`, already configured on the host) once proper behavior is restored. this could be done as part of the %"Debian 12 bookworm upgrade" or if the `docker.io` package gets an update to fix the above bug report in bullseye (which I think is rather unlikely).
So for now, just keep this in the bookworm backlog.
Checklist:
* [ ] make sure that the docker package is fixed, see https://bugs.debian.org/1017477 for the reproducer
* [ ] re-add the AAAA record (`2604:8800:5000:82:466:38ff:fe07:b154`) in LDAP
* [ ] uncomment the record in the reverse zone file
* [ ] monitor (force?) the certificate renewal to make sure that still works

Milestone: Debian 12 bookworm upgrade

Issue #40799: review efficiency of the Ganeti cluster, particularly gnt-fsn (anarcat, 2023-11-22)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40799

Let's review the provisioning ratio on the ganeti cluster: how much do we over- or under-provision, and how much does this thing cost per VM anyway?
The idea is that we started down the project of hosting everything with Ganeti mostly as an experiment, with the belief it would give us better reliability. That probably is the case: we can reboot nodes without causing outages on instances (when only the nodes need an upgrade, which is often not the case, that said). We could probably survive a total server loss as well, for example. But we haven't thought about how many resources go into making sure that availability is around.
So here's the task list:
- [x] evaluate the cost of hosting a single VM in the gnt-fsn cluster (or maybe per disk/memory/cpu unit? not sure how to evaluate this?)
- [ ] evaluate how much waste we have, for example
- [ ] how many CPUs are actually fully in use (say over a given 24h period or week?)
- [ ] how much memory is fully in use (same)
- [ ] how much disk is in use (probably just current snapshot)
I wonder if we should also evaluate performance overhead:
- [ ] how much is Qemu/KVM costing us in terms of raw processing power? maybe run a CPU-intensive benchmark in and out of a VM on an otherwise idle node
- [ ] same with disk: do we pay a big price for virtualized I/O?
- [ ] DRBD overhead: benchmark plain disk vs DRBD
- [ ] network overhead: benchmark local disk vs network disk (might be tough without also testing DRBD? we're mostly concerned about vswitch vs local switch performance, could we compare performance with gnt-chi here for example?)
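for the CPU part, even something trivial run on the bare node and then inside a guest on the same (otherwise idle) node would give a first-order overhead estimate. a sketch, not a rigorous benchmark:

```python
import time

def cpu_benchmark(iterations: int = 5_000_000) -> float:
    """Time a fixed integer workload; run once on the host, once in a
    guest on the same idle node, and compare the two results."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start

# overhead estimate: (t_vm - t_host) / t_host
```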
The point here is that we're paying Hetzner a lot of money for a lot of rented metal (8 machines), instead of hosting everything in their cloud. That imposes a significant management cost, while at the same time giving us certain guarantees in terms of privacy and control. We need to seriously consider whether it's still worth hosting our own metal, and efficiency is certainly a big part of this.
Also related to #40163 (evaluate power usage).

Milestone: (next) cluster scaling. Assignee: anarcat.

Issue #40797: Upload deb.torproject.org-keyring to Debian (micah, 2022-06-27)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40797

The `deb.torproject.org-keyring` package is very useful to keep the tor archive keyring up-to-date. However, it requires that one go through the manual process of establishing a chain of trust via the method described on the [Debian installation page](https://support.torproject.org/apt/tor-deb-repo/).
If this package was uploaded into Debian proper, then Debian users could install it without having to jump through these hoops, bootstrapping from the already established Debian trust path.

Assignee: weasel (Peter Palfrader)

Issue #40747: remove pre-bookworm legacy code in Puppet (anarcat, 2023-09-26)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40747

We have a bunch of code in our puppet codebase that checks for pre-bullseye releases. Remove that code when we finish upgrading to bullseye.
an example:
```
modified modules/profile/manifests/prometheus/bind_exporter.pp
@@ -4,6 +4,14 @@ class profile::prometheus::bind_exporter(
) {
$package_ensure = $ensure ? { 'present' => 'installed', 'absent' => 'absent' }
$service_ensure = $ensure ? { 'present' => 'running', 'absent' => 'stopped' }
+
+ # this should be dropped when bind exporter is upgraded to 0.3.0
+ # or later (BULLSEYE) everywhere
+ if versioncmp($facts['os']['release']['major'], '11') >= 0 {
+ $extra_options = ''
+ } else {
+ $extra_options = "-bind.pid-file /var/run/named/named.pid -bind.stats-groups 'server,view,tasks'"
+ }
class { 'prometheus::bind_exporter':
package_ensure => $package_ensure,
service_ensure => $service_ensure,
@@ -17,8 +25,10 @@ class profile::prometheus::bind_exporter(
package_name => 'prometheus-bind-exporter',
service_name => 'prometheus-bind-exporter',
# purge_config_dir => true,
- # should be dropped when bind exporter is upgraded to 0.3.0 or later (BULLSEYE)
- extra_options => "-bind.pid-file /var/run/named/named.pid -bind.stats-groups 'server,view,tasks'",
+
+ # this should be dropped when bind exporter is upgraded to 0.3.0
+ # or later (BULLSEYE) everywhere
+ extra_options => $extra_options,
}
# realize the allow rules defined on the prometheus server(s)
```
in the above code, we remove the first `if` block, and drop the `extra_options =>` class parameter passed to `prometheus::bind_exporter` completely.
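once the whole fleet runs bullseye or later, the example above would collapse to something like this (a sketch of the cleaned-up class, not tested):

```puppet
class profile::prometheus::bind_exporter(
) {
  $package_ensure = $ensure ? { 'present' => 'installed', 'absent' => 'absent' }
  $service_ensure = $ensure ? { 'present' => 'running', 'absent' => 'stopped' }
  class { 'prometheus::bind_exporter':
    package_ensure => $package_ensure,
    service_ensure => $service_ensure,
    package_name   => 'prometheus-bind-exporter',
    service_name   => 'prometheus-bind-exporter',
    # extra_options dropped entirely: the bind exporter >= 0.3.0
    # defaults are fine everywhere
  }
}
```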
but in general, grep for:
* [ ] `buster` (case-insensitive)
* [ ] `bullseye` (case-insensitive)
* [ ] `bookworm` (case-insensitive)
* [ ] `versioncmp`
* [ ] `lsbmajdistrelease`
* [ ] `lsbdistcodename`

Milestone: Debian 12 bookworm upgrade

Issue #40743: Serve .onion links on our static websites accessed via .onion (Jérôme Charaoui, 2023-04-11)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40743

When visiting a TPO website via its .onion address, many of the links on those pages point to Internet addresses instead of their corresponding .onion address. For example, visiting the .onion for `blog.torproject.org`, http://pzhdfe7jraknpj2qgu5cz2u3i4deuyfwmonvzu5i3nyw4t4bmg7o5pad.onion, displays navigation links at the top of the page that point to the canonical domain instead of the .onion addresses. When a Tor Browser user clicks a link pointing to one of these sites, it will hit an exit node, and if automatic .onion redirection is enabled, the `Onion-Location` header will cause the page to be loaded a second time. This leads to unnecessary delays when navigating and hopping between different TPO websites.
In tpo/tpa/team#40667 we discussed two approaches to fixing this problem:
* modify the pages dynamically when serving them through the .onion address
* build an "onionized" version of all our static websites and host those on a different `DocumentRoot`

Issue #40723: port puppet-managed configs to Debian bullseye (anarcat, 2023-09-26)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40723

as part of the %"Debian 11 bullseye upgrade", we have a bunch of (sometimes really old) configuration that needs to be ported to the new stuff that's in bullseye. I have identified, so far:
* [ ] `/etc/apt/apt.conf.d/50unattended-upgrades`: to be investigated. probably mostly whitespace changes, but also possibly missing features. complicated by the fact that this is a third party Puppet module and would require significant work to catchup with the Debian package
* [ ] `/etc/unbound/unbound.conf`: switch to `include-toplevel` after the fleet is upgraded (does not work in buster)
* [x] `/etc/sudoers`: use `@include` instead of `#include`, former added only in bullseye and later. should be split out in a `sudoers.d` file to avoid future conflicts and, generally, split in snippets per service instead of this monolithic file
* [ ] `/etc/syslog-ng/syslog-ng.conf`: silly version number logic in the template, needs to be ported to newer config or replaced with rsyslog or journald
* [x] ~~`/etc/ferm/ferm.conf`: `web-cymru-01` had diffs pending from the previous upgrade (presumably?), might be worth catching up to *buster*, that is, unless we just ditch ferm completely (#40554)~~ the latter
* [ ] `/etc/lvm/lvm.conf`: same as above
* [x] ~~`/etc/bind/named.conf.options`: TBD, on fallax~~ fallax retired
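for the record, the unbound item above would boil down to something like this once every host runs bullseye or later (a sketch; `include-toplevel` is not available in buster's unbound):

```
# /etc/unbound/unbound.conf
# include-toplevel includes files as complete config files, so each
# snippet must carry its own "server:" (or other) clause
include-toplevel: "/etc/unbound/unbound.conf.d/*.conf"
```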
if a file is added in the above list, do not forget to add it to the [conflicts resolution list](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/upgrades/bullseye#conflicts-resolution) in the upgrade procedure.
more such issues could come up, but that's what I've got for now. the diff for those has been minimized as much as possible, and the proposed version from the Debian package should generally be ignored.

Milestone: Debian 11 bullseye upgrade

Issue #40695: upgrade or rebuild hetzner-hel1-01 (nagios/icinga) (anarcat, 2023-11-21)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40695

Nagios is going to be a particularly tricky bullseye upgrade, so it's not part of the large bullseye upgrade batches (#40690 or #40692).
We need to decide whether we keep icinga around at all or replace it with Prometheus (https://gitlab.torproject.org/tpo/tpa/team/-/issues/29864). if we do keep icinga, we need to decide whether we keep the current "push to git to rebuild the config" model or "puppetize the setup" (https://gitlab.torproject.org/tpo/tpa/team/-/issues/32901).

Milestone: Debian 11 bullseye upgrade

Issue #40694: upgrade eugeni to bullseye (anarcat, 2023-11-15)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40694

eugeni is going to be a tricky bullseye upgrade, so it's not part of the large bullseye upgrade batches (#40690 or #40692).
we might want to decide what to do with mailman (https://gitlab.torproject.org/tpo/tpa/team/-/issues/40471) and schleuder (https://gitlab.torproject.org/tpo/tpa/team/-/issues/40564) *before* we do the upgrade. mailman 2, in particular, is EOL so we *will* need to upgrade or replace it.
we might also want to consider the impact of the %"improve mail services" roadmap here. it's possible we might want to completely rebuild eugeni in different components instead of upgrading it.

Milestone: Debian 11 bullseye upgrade

Issue #40677: monitor certificate transparency logs (anarcat, 2024-01-16)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40677

now that [certificate transparency](https://certificate.transparency.dev/) (CT) is a thing, we should probably start monitoring those. specifically, because major web browsers now *require* CAs to publish their certs in CT, there's a good chance that a hostile actor who manages to generate a cert for us would show up on that radar.
(it wouldn't keep a rogue CA from generating a fake cert though, as they could, in theory, disobey those policies at some point.)
As a side note, Chrome and Safari currently [enforce CT for HTTPS sites](https://en.wikipedia.org/wiki/Certificate_Transparency#Mandatory_certificate_transparency), but Firefox does not.
The goal here is to have an alert in monitoring (typically nagios/icinga right now) when a rogue certificate gets issued so we can act quickly on it. What to do next is out of scope, but should also be considered eventually.
This is actually a surprisingly hard problem, because the CT logs are huge and constantly changing, so we can't just "run a log and watch it". The tooling around this isn't well settled either. Finally, we need to consider that we *do* issue certificates regularly and those shouldn't trigger an alert.
# Known solutions
* [cert spotter](https://sslmate.com/certspotter/) is both a commercial service from sslmate.com and a [free software log monitor](https://github.com/SSLMate/certspotter) written in Golang, outputs matching certs to stdout, doesn't seem actively maintained
* [fetchallcerts.py](https://git.sunet.se/catlfish.git/tree/tools/fetchallcerts.py) is a homegrown script from @linus that is a proof of concept that writes a JSON representation of the merkle tree, zips up matching certs and warns about inconsistencies in the log
* Let's Encrypt runs [ct-woodpecker](https://github.com/letsencrypt/ct-woodpecker) which involves running a full log and is more useful for actual CAs that want to monitor their own things
* [DSA](https://dsa.debian.org/) uses [this nagios plugin](https://salsa.debian.org/dsa-team/mirror/dsa-nagios/-/blob/master/dsa-nagios-checks/checks/dsa-check-ct-logs) to monitor certspotter, and it *does* check for the existing cert bundle, see [this YAML config](https://salsa.debian.org/dsa-team/mirror/dsa-nagios/-/blob/5c73a99322881ec2ea6b4fe7ae7ae188aebfa19a/config/nagios-master.cfg#L2110) for how it's called (` remotecheck: "/usr/lib/nagios/plugins/dsa-check-ct-logs --domain debian.org --dir /srv/letsencrypt.debian.org/var/result/ --cert-bundle /etc/ssl/ca-global/ca-certificates.crt --subdomains --ignore-re '.*\\.acc\\.umu\\.se' --ignore-fp 90b1c027ff49c22e1dfbde6dcd4e3ef99d795ffe02e61e5ef3850896a33a430b"`)
* [ct-sans](https://git.cs.kau.se/rasmoste/ct-sans) and (a less-deps rewrite) [ct](https://gitlab.torproject.org/rgdd/ct)
* [certstream-python](https://github.com/CaliDog/certstream-python), client for [certstream.calldog.io](https://certstream.calidog.io/)
* [scrape-ct-log](https://github.com/mpalmer/scrape-ct-log) claimed to be the "fastest" in [this post](https://www.hezmatt.org/~mpalmer/blog/2024/01/16/pwned-certificates-on-the-fediverse.html)
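whatever monitor we pick, the "don't alert on our own certs" step could be as simple as comparing fingerprints against an inventory of certs we issued ourselves (a sketch; the inventory itself is hypothetical and would have to come from our letsencrypt results):

```python
import hashlib

def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()

def unexpected_certs(ct_seen: set, issued: set) -> list:
    """Fingerprints found in CT logs that match none of our own certs;
    anything returned here should raise an alert."""
    return sorted(ct_seen - issued)
```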
In a duplicate of this issue (#33602), I tested cert spotter:
> it's a debian package since buster. i ran a test locally, and it's basically:
>
> ```
> sed 's/ /\n/g;/^#/d;/^ *$/d' letsencryt-domains/domains | sort | certspotter -watchlist -
> ```
>
> the key trick, however, is to *not* warn when a new cert is renewed. therefore we would need to be somewhat clever and recognize our own certificates in there and filter those out.

Issue #40675: switch prometheus target discovery from file_sd and exporter resources to puppetdb (anarcat, 2024-02-08)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40675

we currently use Puppet exported resources to configure the Prometheus server to scrape our various targets. this works somewhat okay, but requires a lot of exported resources, takes a while to propagate, and is error prone.
there's a much easier way which doesn't require exported resources at all, the [puppetdb_sd_config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#puppetdb_sd_config) parameter, which basically pulls a list of targets from PuppetDB dynamically.
That [code was added in Prometheus 2.31](https://github.com/prometheus/prometheus/commit/8920024323ad8fef353ec2fc495894f8748f0687), so this won't work until Debian bookworm is released (or backports), but it is nevertheless a very promising tool.
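a sketch of what the scrape configuration could look like (the PuppetDB URL and the PQL query are made-up examples; the real query has to match how we declare exporters):

```yaml
# prometheus.yml fragment (sketch)
scrape_configs:
  - job_name: node
    puppetdb_sd_configs:
      - url: https://puppetdb.example.org:8081
        query: 'resources { type = "Package" and title = "prometheus-node-exporter" }'
        port: 9100
        refresh_interval: 5m
```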
See also this [sample configuration](https://github.com/prometheus/prometheus/blob/release-2.34/documentation/examples/prometheus-puppetdb.yml) for more ideas on what this can do.

Milestone: Debian 12 bookworm upgrade

Issue #40651: Disable SSLSessionTickets in Apache2 (Jérôme Charaoui, 2022-04-06)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40651

Mozilla's server-side TLS [guidelines](https://ssl-config.mozilla.org/#server=apache&version=2.4.41&config=intermediate&openssl=1.1.1k&guideline=5.6) suggest setting `SSLSessionTickets off` (default is `on`) in Apache2 because session key rotation isn't [handled properly](https://github.com/mozilla/server-side-tls/issues/135) and [weakens](https://timtaubert.de/blog/2017/02/the-future-of-session-resumption/) the security properties of TLS connections.
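the change itself is a one-liner in the mod_ssl configuration (sketch):

```apache
# disable TLS session tickets; session-ID resumption via
# SSLSessionCache keeps working
SSLSessionTickets off
```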
There's a small performance cost, but we'd only pay it for TLS <= 1.2 connections, since TLS 1.3 did away with TLS session tickets altogether.

Assignee: Jérôme Charaoui <lavamind@torproject.org>

Issue #40632: trouble delivering email role forwards to gmail (anarcat, 2022-11-10)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40632

in #40585, we found that we specifically have trouble delivering "role" aliases (job-XX@tpo) to gmail.com addresses. the same user, sending the same mail, to individual users at gmail would work, but not to the role address.
i speculated that the problem was with the user identity on gmail's side:
> https://support.google.com/mail/answer/22370
>
> ie. that gmail is unhappy about receiving mail addressed to that job-direng@ alias because it doesn't know about it, but it's happy to receive (say) [erin@torproject.org](mailto:erin@torproject.org) because that's the alias you have setup there?
>
> could it just be that you need to setup the alias as documented above?
so one thing to check is to see if @isabela or @ewyatt could follow the above guide to see if it helps with the problem.
otherwise I'm hoping solutions considered as part of the larger "plan for email" project (tpo/tpa/team#40363 and TPA-RFC-15) would help with this (specifically SRS, but also possibly ARC?).

Milestone: improve mail services

Issue #40628: TPA-RFC-17: establish a global disaster recovery plan (anarcat, 2024-01-19)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40628

in our [service template](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/template), we have a "Disaster recovery" section, but it's not very detailed. furthermore, it's per service, and doesn't cover stuff like "Hetzner goes down for a long time", or "we get ransomware'd", or "john's laptop got hacked, now what".
so this ticket is about setting a global disaster recovery policy that takes all of those cases into account. it should also make sure we have disaster recovery scenarios for *all* services, which will include coordinating with other teams. and, of course, it will require coordinating with everyone to make sure the plan makes sense.
finally, it will probably require budgeting some work in the future so that we not only have an idea of a plan, but have measures in place to mitigate disasters (e.g. another backup server, everyone gets a yubikey, etc).
i started brain-dumping ideas in this RFC: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-17-disaster-recovery
this, in turn, could inform the security policy as well, see tpo/team#41

Issue #40626: cleanup the postfix code in puppet (anarcat, 2022-04-06)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40626

our postfix configuration in Puppet is problematic in a few ways:
* main.cf hardcodes a lot of things (like `smtp_dns_support_level`, certificates, digest fingerprints, mynetworks, and more)
* master.cf and main.cf have host-specific configurations (e.g. eugeni, polyanthum) or role-specific ones (email::submission), instead of having this be part of hiera
* transport maps are hardcoded in the module instead of (say) in a profile
* access control is difficult: it's unclear how to block a given email address (e.g. on RT/rude)
* the code is specific to our project and not reusable, and is therefore basically unmaintained (we should use an existing module instead)
so we should look at refactoring this to make it easier to expand and tweak.
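for example, the host-specific bits could move into hiera with keys along these lines (key names are hypothetical; whatever module we end up with will dictate them):

```yaml
# hieradata/nodes/eugeni.torproject.org.yaml (sketch)
postfix::mynetworks:
  - 127.0.0.0/8
  - '[::1]/128'
postfix::transport_maps:
  'lists.torproject.org': 'local:'
```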
The following projects are similar and we might want to collaborate with those in the future.
* [voxpupuli/postfix](https://github.com/voxpupuli/puppet-postfix) - multiple issues, see below
* [shared-puppet-modules-group/postfix](https://gitlab.com/shared-puppet-modules-group/postfix) - marked "LEGACY"
* [cirrax/postfix](https://github.com/cirrax/puppet-postfix) - used by tails/puscii
There are multiple issues with the camptocamp (now voxpupuli) module that seemed to be deal-breakers during a quick evaluation:

* [Changes to postfix::files causes a restart and reload](https://github.com/camptocamp/puppet-postfix/issues/134) - this is a performance concern: could be trouble for large servers
* ~~[Do not manage /etc/mailname](https://github.com/camptocamp/puppet-postfix/issues/186) - we *remove* this file in our configuration, so that's in direct conflict~~ fixed!
* [init.pp: use the postfix default for mydestination](https://github.com/camptocamp/puppet-postfix/pull/256) - this is a default that could be worked around (and we could just use a template for `main.cf`)
* [main.cf empty](https://github.com/camptocamp/puppet-postfix/blob/master/files/main.cf) - main.cf is completely empty by default, which is a major change from Debian (for example)

In general, it's unclear that the camptocamp module brings enough benefit to justify switching to it at this stage. But since it's the most popular module and it's actively maintained, it might be worth biting the bullet and adapting it to our needs instead of reinventing the wheel.
We should definitely look at the cirrax module, in any case, as we heard good things about it from fellow sysadmins.

Milestone: cleanup and publish the sysadmin codebase

Issue #40604: Reduce risk that critical (communication) services are down at the same time (Georg Koppen, 2022-04-06)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40604

Yesterday we had short outages of chives, which a bunch of us use for IRC. I said "Okay, fine, let's do email in the mean time" until I realized that our shiny new submission service was affected by the same outage. We should think about critical (communication) services, trying to make sure they are on different networks, so we reduce the risk that all/most of them are down at the same time.

Milestone: improve mail services

Issue #40596: figure out some ways to handle phishing / scam attempts (anarcat, 2023-08-08)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40596

we are getting complaints from our users about phishing and scam attempts. this ticket will try to document those and possible workarounds.
ideas:
* [ ] report issues upstream when SPF/DKIM checks out
* [ ] implement *incoming* DNS checks (DKIM, SPF, etc: #40539)
* [ ] implement spamassassin filtering?
* [ ] implement body checks to bounce some content, e.g. on From or specific Subject headers...

Milestone: improve mail services

Issue #40591: monitor gitlab issues counts in prometheus (anarcat, 2023-12-21)
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40591

right now i manually write down the number of issues that are open/closed/~Icebox/~Backlog/~Next/~Doing in each monthly report. that's kind of ridiculous because it takes time every month, but also limited because we don't get historical data. in [september 2021](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/meeting/2021-09-07#ticket-analysis), i did a somewhat detailed analysis of the month-per-month metrics, which gives a good idea of the data we could render.
basically, i think we want to define a set of (gitlab) labels and then put in prometheus the number of tickets that match one of those labels (or none, or closed), and per project. so this report from september:
* open: 0
* icebox: 119
* backlog: 17
* next: 6
* doing: 5
* needs information: 3
* needs review: 0
* (closed: 2387)
would be something like this in Prometheus:
```
gitlab_issues{status="open",label="icebox",project="tpo/tpa/team"} 119
gitlab_issues{status="open",label="backlog",project="tpo/tpa/team"} 17
gitlab_issues{status="open",label="next",project="tpo/tpa/team"} 6
gitlab_issues{status="open",label="doing",project="tpo/tpa/team"} 5
gitlab_issues{status="open",label="needs information",project="tpo/tpa/team"} 3
gitlab_issues{status="open",label="needs review",project="tpo/tpa/team"} 0
gitlab_issues{status="open",project="tpo/tpa/team"} 0
gitlab_issues{status="closed",project="tpo/tpa/team"} 2387
```
alternatively, if we don't want to pre-define a list of labels, we could just pull all the labels in one metric, and open/close in another
```
gitlab_open_issues_per_label{label="icebox",project="tpo/tpa/team"} 119
gitlab_open_issues_per_label{label="backlog",project="tpo/tpa/team"} 17
gitlab_open_issues_per_label{label="next",project="tpo/tpa/team"} 6
gitlab_open_issues_per_label{label="doing",project="tpo/tpa/team"} 5
gitlab_open_issues_per_label{label="needs information",project="tpo/tpa/team"} 3
gitlab_open_issues_per_label{label="needs review",project="tpo/tpa/team"} 0
gitlab_open_issues_per_label{project="tpo/tpa/team"} 0
gitlab_issues{status="closed",project="tpo/tpa/team"} 2387
gitlab_issues{status="open",project="tpo/tpa/team"} 150
```
there's this [one python project](https://github.com/owentl/gitlab-prometheus/) that might do it, but it will probably need to be patched. i filed a [ticket upstream with gitlab](https://gitlab.com/gitlab-org/gitlab/-/issues/350602) to see if this was possible there as well.
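either way, the rendering side is trivial; a sketch of the exposition format (metric names as proposed above; fetching the counts from the GitLab API is left out):

```python
def format_metrics(project: str, open_by_label: dict,
                   open_total: int, closed: int) -> str:
    """Render issue counts in the Prometheus exposition format, e.g. to
    write out for the node_exporter textfile collector."""
    lines = []
    for label, count in sorted(open_by_label.items()):
        lines.append(
            f'gitlab_open_issues_per_label{{label="{label}",project="{project}"}} {count}'
        )
    lines.append(f'gitlab_issues{{status="open",project="{project}"}} {open_total}')
    lines.append(f'gitlab_issues{{status="closed",project="{project}"}} {closed}')
    return "\n".join(lines) + "\n"
```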