deploy webPassword authentication on prometheus1
Quote from TPA-RFC-33:
Authentication
To unify the clusters as we intend to, we need to fix authentication on the Prometheus and Grafana servers.
Current situation
Authentication is currently handled as follows:
- Icinga: static `htpasswd` file, not managed by Puppet, modified manually when onboarding/off-boarding
- Prometheus 1: static `htpasswd` file with dummy password managed by Puppet
- Grafana 1: same, with an extra admin password kept in Trocla, using the auth proxy configuration
- Prometheus 2: static htpasswd file with real admin password deployed, extra password generated for [prometheus-alerts][] continuous integration (CI) validation, all deployed through Puppet
- Grafana 2: static htpasswd file with real admin password for "admin" and "metrics", both of which are shared with an unclear number of people
Originally, both Prometheus servers had the same authentication system but that was split in 2019 to protect the external server.
Proposed changes
The plan was originally to just delegate authentication to Grafana, but we're concerned this would introduce yet another authentication source, which we want to avoid. Instead, we should re-enable the `webPassword` field in LDAP, which was mysteriously dropped in `userdir-ldap-cgi`'s `7cba921` (drop many fields from update form, 2016-03-20), a trivial patch. This would allow any tor-internal person to access the dashboards. Access levels would be managed inside the Grafana database.
Prometheus servers would reuse the same password file, allowing tor-internal users to issue "raw" queries, browse and manage alerts.
Note that this change will negatively impact the `prometheus-alerts` CI, which will require another way to validate its rulesets.

We have briefly considered making Grafana dashboards publicly available, but ultimately rejected this idea, as it would mean having two entirely different time series datasets, which would be too hard to separate reliably. That would also impose a cardinal explosion of servers if we want to provide high availability.
TL;DR: deploy the new `webPassword` file from LDAP (probably by tweaking the host entry in the LDAP DB) and hook the webserver up to it. Notify users.
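As a sanity check once the webserver is hooked up to the LDAP-exported password file, something along these lines could confirm the wiring; the hostname, username and query below are placeholders rather than values taken from this ticket:

```sh
# Placeholders throughout: adjust host, vhost and credentials to the actual deployment.
sudo apache2ctl configtest        # the vhost referencing the new password file should still parse
curl -s -o /dev/null -w '%{http_code}\n' \
  https://prometheus1.torproject.org/                          # expect 401 without credentials
curl -s -o /dev/null -w '%{http_code}\n' \
  -u "alice:$WEBPASSWORD" \
  'https://prometheus1.torproject.org/api/v1/query?query=up'   # expect 200 with a valid webPassword
```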
- prom1
- merge https://gitlab.torproject.org/tpo/tpa/puppet-control/-/merge_requests/44 and deploy on both prom1 and prom2
- set a timeline for the retirement of `tor-guest`
  - april 17th
- announce the change to TPA
- fix prometheus-alerts/tor-puppet so nothing relies on the shared user
  - only found the minio container that hardcodes the credentials
- add fallback credentials for prom1 so that we have a fallback admin account even if ud-ldap fails (probably the same password as `services/grafana.torproject.org` in `tor-password.git`); see the htpasswd sketch after this list
- send reminder to TPA before the cutoff date
- fix permissions for the individual users
  - set up the TPA team with access to the main org and the folders; users as admins of the main org
- wait for the tor-guest retirement, see if anything breaks
- prom2
- set a timeline for retiring the shared passwords from /etc/apache2/prom_htpasswd (essentially the `metrics` account)
  - april 17th
- announce retirement of shared user to the teams that use it
- fix permissions for users who set up their new accounts
  - set up a grafana team that has access to the main org and the folder that contains the team's dashboards; add users to the team; if they need to be able to modify dashboards, set them as `Admin` in the team (see the Grafana sketch after this list)
- send reminder before the cutoff date
- at the planned cutoff date, remove the `metrics` account entirely
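A minimal sketch of the htpasswd operations referenced in the checklist above; the fallback account name is a placeholder, and the file path is the one named in the prom2 items:

```sh
# Add a fallback admin entry (placeholder name), prompting for its password; -B stores it with bcrypt.
sudo htpasswd -B /etc/apache2/prom_htpasswd fallback-admin
# At the planned cutoff date, drop the shared "metrics" account from the same file.
sudo htpasswd -D /etc/apache2/prom_htpasswd metrics
```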
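For the per-team Grafana permissions, one possible route is Grafana's HTTP API (the same can be done through the UI); the host, admin credentials, team name, team id and folder UID below are all placeholders:

```sh
# Create a team (placeholder name), assuming basic-auth access as a Grafana admin.
curl -s -u "admin:$GRAFANA_ADMIN_PASSWORD" -H 'Content-Type: application/json' \
  -X POST 'https://grafana.example.org/api/teams' \
  -d '{"name": "metrics-team"}'
# Grant that team Editor access to the folder holding its dashboards.
# permission levels: 1 = Viewer, 2 = Editor, 4 = Admin
curl -s -u "admin:$GRAFANA_ADMIN_PASSWORD" -H 'Content-Type: application/json' \
  -X POST 'https://grafana.example.org/api/folders/FOLDER_UID/permissions' \
  -d '{"items": [{"teamId": 1, "permission": 2}]}'
```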