Verified Commit 172d9cc0 authored by Jérôme Charaoui's avatar Jérôme Charaoui 🕯️
Browse files

update service/donate-review (tpo/web/donate-neo#45)

parent 61707337
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -21,7 +21,7 @@ Those are services managed by TPA directly.
| [dangerzone][]       | Sanitize untrusted documents      | N/A                                 | TPA                 | 100%       | LDAP Nextcloud                                                      |
| [dns][]              | domain name service               | N/A                                 | TPA                 | 10%        | N/A                                                                 |
| [documentation][]    | documentation (this wiki)         | <https://help.torproject.org/>      | TPA                 | 10%        | see GitLab                                                          |
| [donate-review][]    | donate-neo review apps            | `*.donate-review.torproject.org`    | TPA                 | 25%        | N/A                                                                 |
| [donate-review][]    | donate-neo review apps            | `*.donate-review.torproject.net`    | TPA lavamind        | 90%        | N/A                                                                 |
| [drbd][]             | disk redundancy                   | N/A                                 | TPA                 | 10%        | N/A                                                                 |
| [email][]            | @torproject.org emails services   | N/A                                 | TPA                 | 0%         | LDAP Puppet                                                         |
| [forum][]            | Tor Project community forums      | <https://forum.torproject.net>      | TPA hiro gus duncan | 50%        | yes                                                                 |
+68 −102
Original line number Diff line number Diff line
Donate-review is a gitlab runner and review-app deployment target for [donate-neo](gitlab.torproject.org/tpo/web/donate-neo).
Donate-review is a GitLab runner and Review App deployment target for
[donate-neo](gitlab.torproject.org/tpo/web/donate-neo).

[[_TOC_]]

@@ -6,40 +7,22 @@ Donate-review is a gitlab runner and review-app deployment target for [donate-ne

## Starting a review app

1. Push some changes to a branch (other than main) on tpo/web/donate-neo
1. Create an MR
1. Wait for the `lint`, `mypy`, and `test` jobs to finish
1. Run the `deploy-review` job, and wait for it to finish
1. Click the "View app" button
1. When done reviewing, click "Stop environment" or run the `stop-review` job
1. Merge changes
Pushing a commit on a non-main branch in the project repository will trigger a
CI pipeline that includes `deploy-review` job. This job will deploy a review
app hosted at `<branchname>.donate-review.torproject.net`.

# How-to

<!-- more in-depth procedure that may require interpretation -->

## Pager playbook

### deploy-review job fails with "The user who started this CI job does not have permission to deploy"

The donate-review runner is a shell runner, it doesn't use any kind of sandboxing or containerization. To try to mitigate potential security issues with created by letting any contributor run a shell script on the donate-review machine, a very simple "access control" was implemented in the gitlab CI script.

```bash
access_level="$(curl --header "PRIVATE-TOKEN: $CI_PROJECT_ACCESS_TOKEN" "$CI_API_V4_URL/projects/$CI_PROJECT_ID/members/$GITLAB_USER_ID" | jq .access_level)"
if [ "$access_level" -ne 50 ]; then
  echo The user who started this CI job does not have permission to deploy
  echo "Required level: 50. Actual level: $access_level"
  exit 1
fi
```
Commits to the `main` branch will be deployed to a review app by the
`deploy-staging` job. The deployment process is similar except the app will be
hosted at `staging.donate-review.torproject.net`.

This script checks if the user is an owner of the repository (access level 50). Anyone who is *not an owner* won't be allowed to run the deploy job. If a user's jobs are failing with the above error message, make sure they're an owner, not a maintainer, developer, reporter, or guest.
All review apps are automatically stopped and cleaned up once the associated
branch is deleted.

### Static assets don't load on deployed review apps
# How-to

An issue we've seen several times is that static assets sometimes don't load. The issue was tracked back to donate-neo handling its own static assets. To fix this, the gitlab CI script was changed to run `manage.py collectstatic` and configure donate-review apache vhosts to serve the static files directly. This issue shouldn't rear its head again, but in the event that it does, start by ensuring that django's collectstatic is being run, that the static files are deployed to the correct location, and that the review app vhost is configured to serve the static files from the correct directory, with `Require all granted`.
## Pager playbook

## Disaster recovery
# Disaster recovery

In the event that the box is compromised, it should be rebuilt from scratch.
See [Installation](#installation) below.
@@ -49,68 +32,64 @@ See [Installation](#installation) below.
## Installation

1. bootstrap a new vm, and set up puppet
1. install gitlab runner on the machine and configure a [shell executor][]
1. add the `role: donate_review` parameter to the new machine in hiera
1. add the `role: donate_review` parameter to the new machine in hiera-enc
1. run puppet
1. add the runner to a gitlab project

[shell executor]: <https://docs.gitlab.com/runner/executors/shell.html>

## Upgrades

Upgrades are performed automatically through Debian packages. Gitlab runner is
Upgrades are performed automatically through Debian packages. `gitlab-runner` is
excluded from unattended-upgrades and must be upgraded manually.

Packages used by the review-app instances (gunicorn, django, etc.) are
installed into a virtualenv as part of the CI process. These packages will be
updated as part of donate-neo maintenance.

## SLA
The review apps are not upgraded unless new commits appear in their branch to
trigger a rebuild.

<!-- this describes an acceptable level of service for this service -->
TODO: The `renovate-cron` project should be enabled here to ensure timely
upgrades to the staging and production deployments.

## Design and architecture

There are two major pieces to donate-review. The donate-neo [gitlab-ci file][],
and a `setup-apache-config.sh` script that lives in puppet. Here's what happens
when the deploy-review job runs:
There are three major pieces to donate-review:

- the donate-neo [.gitlab-ci.yml file][]
- the `review-app.conf` apache2 configuration file
- the `ci-reviewapp-generate-vhosts` script

When a new feature branch is pushed to the project repository, the CI pipeline
will build a new container and store it in the project's container registry.

If tests are successful, the pipeline will then run a job on the shell executor
to create (or update) a rootless podman container in the `gitlab-runner` user
context. This container is set up to expose its internal port 8000 to a random
outside port on the host.

1. check the access-level of the user who ran the job
1. install dependencies, gunicorn, create and run migrations
1. start gunicorn on port 0 (kernel chooses), and write the gunicorn pid to a file
1. use the gunicorn pid to find what port the kernel chose
1. run the setup-apache-config script as root (with a `NOPASSWD` sudoers rule)
1. the script writes an apache site config file for this review-app instance
1. reload apache
Finally, the `ci-reviewapp-generate-vhosts` script is executed via `sudo`. It
will inspect all the running review app containers and create a configuration
file where each line will instantiate a virtual host macro. These virtual hosts
will proxy incoming connections to the appropriate port where the container is
listening.

[gitlab-ci file]: <https://gitlab.torproject.org/tpo/web/donate-neo/-/blob/main/.gitlab-ci.yml>

### Subdomains and TLS Certs

the apache configuration uses subdomain-based vhosts. the review-apps use
domains like `<ci commit ref slug>.donate-review-01.torproject.org`.
Letsencrypt won't give us a wildcard cert for
`*.donate-review-01.torproject.org` because it's too many nested subdomains.
Instead, we use apache's [mod\_md][modmd] to dynamically request Letsencrypt
certificates as-needed for new review-apps

[modmd]: <https://httpd.apache.org/docs/trunk/mod/mod_md.html>
A wildcard certificate for `*.donate-review.torproject.net` is used for all
review apps virtual host configurations.

## Services

- gitlab-runner
- apache
- gunicorn (one instance per review app)

## Storage

Each donate-neo instance has its own sqlite database. The database resides
on-disk (as opposed to in-memory).
- podman containers (one per review app)

## Authentication

The donate-review *runner* uses token authentication to pick up jobs from
Gitlab. Each donate-neo instance handles its own authentication.
GitLab. To access the review apps, HTTP basic authentication is required to
prevent passers-by from stumbling onto the review apps and to keep indexing
bots at bay. The username is `tor-www` and the password is blank.

The Django-based review apps don't handle authentication, as there are no
management users created by the app.

## Issues

@@ -126,20 +105,30 @@ There is no issue tracker specifically for this the donate-review runner.

## Users

Anyone contributing to tpo/web/donate-neo
Anyone contributing to [tpo/web/donate-neo][].

[tpo/web/donate-neo]: https://gitlab.torproject.org/tpo/web/donate-neo

## Logs

The only relevant logs are from apache, logrotate rotates them every 14 days.
The review app logs may be accessed using the `podman logs <container>` command
as the `gitlab-runner` user.

## Backups

donate-review has no special backup needs. All the donate-neo instances are
This service has no special backup needs. All the donate-neo instances are
ephemeral, and a new system can be bootstrapped solely from puppet.

## Overview

donate-review was created as part of tpo/web/donate-neo#6 and tpo/tpa/team#41108. Donate-review's purpose is to provide a review app deploy target for tpo/web/donate-neo. Most of the other tpo/web sites are static lektor sites, and can be easily deployed to a review app target as simple static sites fronted by Apache. But because donate-neo is a django application, it needs a specially-created deploy target for review apps. donate-review clones the donate-neo codebase, checks out the right commit, sets up the applications virtuel environment, dependencies, database and static files, and then deploys an Apache vhost for the new review app.
donate-review was created as part of tpo/web/donate-neo#6,
tpo/tpa/team#41108 and refactored as part of tpo/web/donate-neo#21.

Donate-review's purpose is to provide a review app deploy target for
tpo/web/donate-neo. Most of the other tpo/web sites are static lektor sites,
and can be easily deployed to a review app target as simple static sites
fronted by Apache. But because donate-neo is a django application, it needs a
specially-created deploy target for review apps.

## Security and risk assessment

@@ -148,39 +137,16 @@ real sandboxing or containerization. There was an attempt to set up the runner
using systemd-nspawn, but it was taking too long and we
[eventually decided against it][].

The current security measures include the basic things we get for free from
debian (file permissions, apparmor), as well as some access control in the
gitlab-ci file. The deploy-review job has to be run manually, and the job's
script starts by [checking if the user who ran the job is a repo maintainer][].

A proper security and risk assessment should be prioritized to find and
mitigate potential security problems.
Currently, project members with `Developper` permission or above in the
donate-neo project may edit the CI configuration to execute arbitrary commands
as the `gitlab-runner` user on the machine. Since these users are all trusted
contributors, this should pose no problem. However, care should be taken to
ensure no untrusted party is allowed to gain this privilege.

[eventually decided against it]: <https://gitlab.torproject.org/tpo/tpa/team/-/issues/41108#note_2913481>

[checking if the user who ran the job is a repo maintainer]: <https://gitlab.torproject.org/tpo/web/donate-neo/-/blob/2344c59a/.gitlab-ci.yml#L46>

## Technical debt and next steps

donate-neo instances (files, databases, and gunicorn instances) are not cleaned
up automatically. Cleanup only happens when someone runs the `stop-review` job.
If this job isn't run, These instances will not be cleaned up. Currently, there
is no solution for this. There's an [open Gitlab ticket][] to implement running
an `after_script` when a job is cancelled, and as of January 2024 is "targeting near term resolution" for the ticket. A potential solution is to write a cron job that just cleans up
old gunicorn processes, and removes old builds.

[open Gitlab ticket]: <https://gitlab.com/gitlab-org/gitlab-runner/-/issues/4843>

## Known bugs

### stop-review sometimes doesn't remove old deployments

The stop-review job is supposed to read the `gunicorn.pid` file, kill the gunicorn process, tear down the apache config, and remove the deployed code. Sometimes gunicorn dies early though, and the stop-review job fails early, leaving old configs and code lying around.

### Outdated databases

Donate Review has a known issue with updating existing MRs. The updated code
will get pulled to the review app target, but necessary setup won't be re-run.
This can lead to situations where a new Django app is installed, but the app's
migrations won't be run, leading to an outdated database. This bug was
discovered in tpo/tpa/team#41491.
The next step here is to make the donate-review service fully generic to allow
other web projects with special runtime requirements to deploy review apps in
the same manner.