From 172d9cc02f3c696176751616fbc465d30921136f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Charaoui?= <jerome@riseup.net> Date: Thu, 2 May 2024 21:14:53 -0400 Subject: [PATCH] update service/donate-review (tpo/web/donate-neo#45) --- service.md | 2 +- service/donate-review.md | 170 ++++++++++++++++----------------------- 2 files changed, 69 insertions(+), 103 deletions(-) diff --git a/service.md b/service.md index c4825d62..c147ccab 100644 --- a/service.md +++ b/service.md @@ -21,7 +21,7 @@ Those are services managed by TPA directly. | [dangerzone][] | Sanitize untrusted documents | N/A | TPA | 100% | LDAP Nextcloud | | [dns][] | domain name service | N/A | TPA | 10% | N/A | | [documentation][] | documentation (this wiki) | <https://help.torproject.org/> | TPA | 10% | see GitLab | -| [donate-review][] | donate-neo review apps | `*.donate-review.torproject.org` | TPA | 25% | N/A | +| [donate-review][] | donate-neo review apps | `*.donate-review.torproject.net` | TPA lavamind | 90% | N/A | | [drbd][] | disk redundancy | N/A | TPA | 10% | N/A | | [email][] | @torproject.org emails services | N/A | TPA | 0% | LDAP Puppet | | [forum][] | Tor Project community forums | <https://forum.torproject.net> | TPA hiro gus duncan | 50% | yes | diff --git a/service/donate-review.md b/service/donate-review.md index 0e3dfb7c..fc3dda50 100644 --- a/service/donate-review.md +++ b/service/donate-review.md @@ -1,4 +1,5 @@ -Donate-review is a gitlab runner and review-app deployment target for [donate-neo](gitlab.torproject.org/tpo/web/donate-neo). +Donate-review is a GitLab runner and Review App deployment target for +[donate-neo](gitlab.torproject.org/tpo/web/donate-neo). [[_TOC_]] @@ -6,40 +7,22 @@ Donate-review is a gitlab runner and review-app deployment target for [donate-ne ## Starting a review app -1. Push some changes to a branch (other than main) on tpo/web/donate-neo -1. Create an MR -1. Wait for the `lint`, `mypy`, and `test` jobs to finish -1. Run the `deploy-review` job, and wait for it to finish -1. Click the "View app" button -1. When done reviewing, click "Stop environment" or run the `stop-review` job -1. Merge changes +Pushing a commit on a non-main branch in the project repository will trigger a +CI pipeline that includes `deploy-review` job. This job will deploy a review +app hosted at `<branchname>.donate-review.torproject.net`. -# How-to - -<!-- more in-depth procedure that may require interpretation --> - -## Pager playbook - -### deploy-review job fails with "The user who started this CI job does not have permission to deploy" - -The donate-review runner is a shell runner, it doesn't use any kind of sandboxing or containerization. To try to mitigate potential security issues with created by letting any contributor run a shell script on the donate-review machine, a very simple "access control" was implemented in the gitlab CI script. - -```bash -access_level="$(curl --header "PRIVATE-TOKEN: $CI_PROJECT_ACCESS_TOKEN" "$CI_API_V4_URL/projects/$CI_PROJECT_ID/members/$GITLAB_USER_ID" | jq .access_level)" -if [ "$access_level" -ne 50 ]; then - echo The user who started this CI job does not have permission to deploy - echo "Required level: 50. Actual level: $access_level" - exit 1 -fi -``` +Commits to the `main` branch will be deployed to a review app by the +`deploy-staging` job. The deployment process is similar except the app will be +hosted at `staging.donate-review.torproject.net`. -This script checks if the user is an owner of the repository (access level 50). Anyone who is *not an owner* won't be allowed to run the deploy job. If a user's jobs are failing with the above error message, make sure they're an owner, not a maintainer, developer, reporter, or guest. +All review apps are automatically stopped and cleaned up once the associated +branch is deleted. -### Static assets don't load on deployed review apps +# How-to -An issue we've seen several times is that static assets sometimes don't load. The issue was tracked back to donate-neo handling its own static assets. To fix this, the gitlab CI script was changed to run `manage.py collectstatic` and configure donate-review apache vhosts to serve the static files directly. This issue shouldn't rear its head again, but in the event that it does, start by ensuring that django's collectstatic is being run, that the static files are deployed to the correct location, and that the review app vhost is configured to serve the static files from the correct directory, with `Require all granted`. +## Pager playbook -## Disaster recovery +# Disaster recovery In the event that the box is compromised, it should be rebuilt from scratch. See [Installation](#installation) below. @@ -49,68 +32,64 @@ See [Installation](#installation) below. ## Installation 1. bootstrap a new vm, and set up puppet -1. install gitlab runner on the machine and configure a [shell executor][] -1. add the `role: donate_review` parameter to the new machine in hiera +1. add the `role: donate_review` parameter to the new machine in hiera-enc 1. run puppet -1. add the runner to a gitlab project - -[shell executor]: <https://docs.gitlab.com/runner/executors/shell.html> ## Upgrades -Upgrades are performed automatically through Debian packages. Gitlab runner is +Upgrades are performed automatically through Debian packages. `gitlab-runner` is excluded from unattended-upgrades and must be upgraded manually. -Packages used by the review-app instances (gunicorn, django, etc.) are -installed into a virtualenv as part of the CI process. These packages will be -updated as part of donate-neo maintenance. - -## SLA +The review apps are not upgraded unless new commits appear in their branch to +trigger a rebuild. -<!-- this describes an acceptable level of service for this service --> +TODO: The `renovate-cron` project should be enabled here to ensure timely +upgrades to the staging and production deployments. ## Design and architecture -There are two major pieces to donate-review. The donate-neo [gitlab-ci file][], -and a `setup-apache-config.sh` script that lives in puppet. Here's what happens -when the deploy-review job runs: +There are three major pieces to donate-review: + +- the donate-neo [.gitlab-ci.yml file][] +- the `review-app.conf` apache2 configuration file +- the `ci-reviewapp-generate-vhosts` script + +When a new feature branch is pushed to the project repository, the CI pipeline +will build a new container and store it in the project's container registry. + +If tests are successful, the pipeline will then run a job on the shell executor +to create (or update) a rootless podman container in the `gitlab-runner` user +context. This container is set up to expose its internal port 8000 to a random +outside port on the host. -1. check the access-level of the user who ran the job -1. install dependencies, gunicorn, create and run migrations -1. start gunicorn on port 0 (kernel chooses), and write the gunicorn pid to a file -1. use the gunicorn pid to find what port the kernel chose -1. run the setup-apache-config script as root (with a `NOPASSWD` sudoers rule) -1. the script writes an apache site config file for this review-app instance -1. reload apache +Finally, the `ci-reviewapp-generate-vhosts` script is executed via `sudo`. It +will inspect all the running review app containers and create a configuration +file where each line will instantiate a virtual host macro. These virtual hosts +will proxy incoming connections to the appropriate port where the container is +listening. [gitlab-ci file]: <https://gitlab.torproject.org/tpo/web/donate-neo/-/blob/main/.gitlab-ci.yml> ### Subdomains and TLS Certs -the apache configuration uses subdomain-based vhosts. the review-apps use -domains like `<ci commit ref slug>.donate-review-01.torproject.org`. -Letsencrypt won't give us a wildcard cert for -`*.donate-review-01.torproject.org` because it's too many nested subdomains. -Instead, we use apache's [mod\_md][modmd] to dynamically request Letsencrypt -certificates as-needed for new review-apps - -[modmd]: <https://httpd.apache.org/docs/trunk/mod/mod_md.html> +A wildcard certificate for `*.donate-review.torproject.net` is used for all +review apps virtual host configurations. ## Services - gitlab-runner - apache -- gunicorn (one instance per review app) - -## Storage - -Each donate-neo instance has its own sqlite database. The database resides -on-disk (as opposed to in-memory). +- podman containers (one per review app) ## Authentication The donate-review *runner* uses token authentication to pick up jobs from -Gitlab. Each donate-neo instance handles its own authentication. +GitLab. To access the review apps, HTTP basic authentication is required to +prevent passers-by from stumbling onto the review apps and to keep indexing +bots at bay. The username is `tor-www` and the password is blank. + +The Django-based review apps don't handle authentication, as there are no +management users created by the app. ## Issues @@ -126,20 +105,30 @@ There is no issue tracker specifically for this the donate-review runner. ## Users -Anyone contributing to tpo/web/donate-neo +Anyone contributing to [tpo/web/donate-neo][]. + +[tpo/web/donate-neo]: https://gitlab.torproject.org/tpo/web/donate-neo ## Logs -The only relevant logs are from apache, logrotate rotates them every 14 days. +The review app logs may be accessed using the `podman logs <container>` command +as the `gitlab-runner` user. ## Backups -donate-review has no special backup needs. All the donate-neo instances are +This service has no special backup needs. All the donate-neo instances are ephemeral, and a new system can be bootstrapped solely from puppet. ## Overview -donate-review was created as part of tpo/web/donate-neo#6 and tpo/tpa/team#41108. Donate-review's purpose is to provide a review app deploy target for tpo/web/donate-neo. Most of the other tpo/web sites are static lektor sites, and can be easily deployed to a review app target as simple static sites fronted by Apache. But because donate-neo is a django application, it needs a specially-created deploy target for review apps. donate-review clones the donate-neo codebase, checks out the right commit, sets up the applications virtuel environment, dependencies, database and static files, and then deploys an Apache vhost for the new review app. +donate-review was created as part of tpo/web/donate-neo#6, +tpo/tpa/team#41108 and refactored as part of tpo/web/donate-neo#21. + +Donate-review's purpose is to provide a review app deploy target for +tpo/web/donate-neo. Most of the other tpo/web sites are static lektor sites, +and can be easily deployed to a review app target as simple static sites +fronted by Apache. But because donate-neo is a django application, it needs a +specially-created deploy target for review apps. ## Security and risk assessment @@ -148,39 +137,16 @@ real sandboxing or containerization. There was an attempt to set up the runner using systemd-nspawn, but it was taking too long and we [eventually decided against it][]. -The current security measures include the basic things we get for free from -debian (file permissions, apparmor), as well as some access control in the -gitlab-ci file. The deploy-review job has to be run manually, and the job's -script starts by [checking if the user who ran the job is a repo maintainer][]. - -A proper security and risk assessment should be prioritized to find and -mitigate potential security problems. +Currently, project members with `Developper` permission or above in the +donate-neo project may edit the CI configuration to execute arbitrary commands +as the `gitlab-runner` user on the machine. Since these users are all trusted +contributors, this should pose no problem. However, care should be taken to +ensure no untrusted party is allowed to gain this privilege. [eventually decided against it]: <https://gitlab.torproject.org/tpo/tpa/team/-/issues/41108#note_2913481> -[checking if the user who ran the job is a repo maintainer]: <https://gitlab.torproject.org/tpo/web/donate-neo/-/blob/2344c59a/.gitlab-ci.yml#L46> - ## Technical debt and next steps -donate-neo instances (files, databases, and gunicorn instances) are not cleaned -up automatically. Cleanup only happens when someone runs the `stop-review` job. -If this job isn't run, These instances will not be cleaned up. Currently, there -is no solution for this. There's an [open Gitlab ticket][] to implement running -an `after_script` when a job is cancelled, and as of January 2024 is "targeting near term resolution" for the ticket. A potential solution is to write a cron job that just cleans up -old gunicorn processes, and removes old builds. - -[open Gitlab ticket]: <https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4843> - -## Known bugs - -### stop-review sometimes doesn't remove old deployments - -The stop-review job is supposed to read the `gunicorn.pid` file, kill the gunicorn process, tear down the apache config, and remove the deployed code. Sometimes gunicorn dies early though, and the stop-review job fails early, leaving old configs and code lying around. - -### Outdated databases - -Donate Review has a known issue with updating existing MRs. The updated code -will get pulled to the review app target, but necessary setup won't be re-run. -This can lead to situations where a new Django app is installed, but the app's -migrations won't be run, leading to an outdated database. This bug was -discovered in tpo/tpa/team#41491. +The next step here is to make the donate-review service fully generic to allow +other web projects with special runtime requirements to deploy review apps in +the same manner. -- GitLab