From feae5dd08cdd3bfd914339ec61ee775a6ffc9a98 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Tue, 20 Feb 2024 12:49:38 -0500
Subject: [PATCH] document more password rotation mechanisms
 (tpo/tpa/team#41530)

---
 service/password-manager.md | 74 +++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 3 deletions(-)

diff --git a/service/password-manager.md b/service/password-manager.md
index 81341d3b..52584301 100644
--- a/service/password-manager.md
+++ b/service/password-manager.md
@@ -208,9 +208,9 @@ automate this than to manually perform each reset using the above.
 
 ### LUKS
 
-Next, full disk encryption keys.
-
-TODO: fabric task?
+Next, full disk encryption keys. Those are currently handled manually
+(with `pass update`) as well, but we are hoping to automate this too;
+see [issue 41537](https://gitlab.torproject.org/tpo/tpa/team/-/issues/41537) for details.
 
 ### lists
 
@@ -223,6 +223,74 @@ Mailman 3 is deployed, all those will go away anyway.
 Those can probably be left alone; it's unclear if they have any
 relevance left and should probably be removed.
 
+### Trocla
+
+Some passwords are stored in Trocla, on the Puppet server (currently
+`pauli.torproject.org`). If we worry about lateral movement of a
+hostile attacker or a major compromise, it might be worth resetting
+some or all of Trocla's passwords.
+
+This is currently not automated. In theory, deleting the entire Trocla
+database (its path is configured in `/etc/troclarc.yaml`) and running
+Puppet everywhere *should* reset all passwords, but this hides a *lot*
+of complexity, namely:
+
+ 1. IPSec tunnels will collapse until Puppet is run on both ends,
+    which could break lots of things (e.g. CiviCRM, Ganeti)
+
+ 2. application passwords are sometimes manually set, for example the
+    CiviCRM IMAP and MySQL passwords are *not* managed by Puppet and
+    would need to be reset by hand
+
+Here's a non-exhaustive list of passwords that need manual resets:
+
+ * CiviCRM IMAP and MySQL
+ * Dangerzone WebDAV
+ * Grafana user accounts
+ * KGB bot password (used in GitLab)
+ * Prometheus CI password (used in GitLab's prometheus-alerts CI)
+ * metrics DB, Tagtor, victoria metrics, weather
+ * network health relay
+ * probetelemetry/v2ray
+ * rdsys frontend/backend
+
+Run `git grep trocla` in `tor-puppet.git` for the list. Note that it
+will match secrets that *are* correctly managed by Puppet.
+
+Automation could be built to incrementally perform those rotations,
+interactively. Alternatively, some password expiry mechanism could be
+used, especially for secrets that are managed in one Puppet run
+(e.g. the Dovecot mail passwords in GitLab).
+
+### GitLab secrets
+
+In case of a full compromise, an attacker could have sucked the
+secrets out of GitLab projects. The `gitlab-tokens-audit.py` script in
+[gitlab-tools](https://gitlab.torproject.org/tpo/tpa/gitlab-tools/) provides a view of all the group and project access
+tokens and CI/CD variables in a set of groups or projects.
+
+Those tokens are currently rotated manually, but there could be more
+automation here as well: the above Python script could be improved to
+allow rotating tokens and resetting the associated CI/CD variable. A
+lot of CI/CD secret variables are SSH deploy keys; those would need
+coordination with the Puppet repository, maybe simply modifying the
+YAML files at first, but eventually those could be generated by Trocla
+and (why not) automatically populated in GitLab as well.
+
+### S3
+
+Object storage uses secrets extensively to provide access to
+buckets. In case of a compromise, some or all of those tokens need to
+be reset. The [authentication section of the object storage
+documentation](service/object-storage#authentication) has some more information.
+
+Basically, all access keys need to be rotated, which means expiring
+the existing one and creating a new one, then copying the
+configuration over to the right place, typically Puppet, but GitLab
+runners need manual configuration.
+
+The bearer token also needs to be reset for Prometheus monitoring.
+
 ## Pager playbook
 
 This service is likely not going to alert or require emergency
-- 
GitLab