The "dangerzone" service is a documentation sanitization system based
on the [Dangerzone][] project, using [Nextcloud][] as a frontend.
[Dangerzone]: https://dangerzone.rocks/
[Nextcloud]: service/nextcloud
[[_TOC_]]
# Tutorial
## Sanitizing untrusted files in Nextcloud
Say you receive resumes or other untrusted content and you actually
*need* to open those files because that's part of your job. What do
you do?
1. make a folder in [Nextcloud][]
2. upload the untrusted file in the folder
3. share the folder with the `dangerzone-bot` user
4. after a short delay, the file *disappears* (*gasp*! do not worry,
   it is actually moved to the `dangerzone-processing/` folder!)
5. then, after another delay, if [Dangerzone][] succeeded, sanitized
   files appear in a `safe/` folder and the original files are moved
   into a `dangerzone-processed/` folder
6. if that didn't work, files end up in `dangerzone-rejected/` and no
   new files appear in the `safe/` folder
A few important guidelines:
* files are processed every minute
* do **NOT** upload files **directly** in the `safe/` folder unless
you are *ABSOLUTELY* certain the files really are safe.
* some files cannot be processed by Dangerzone: `.txt` files, in
  particular, are known to be rejected. Those are possibly safe to
  upload directly in `safe/`
* the bot recreates the directory structure you use in your shared
folder, so, for example, you could put your `resume.pdf` file in
`Candidate 42/resume.pdf` and the bot will put it in
`safe/Candidate 42/resume.pdf` when done
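For example, assuming you upload `Candidate 42/resume.pdf` as above, a
successful run leaves a layout roughly like this (a sketch based on
the steps above, not actual output):

    Candidate 42/resume.pdf                        # what you uploaded (disappears)
    dangerzone-processing/                         # briefly holds files being sanitized
    dangerzone-processed/Candidate 42/resume.pdf   # the original, kept after success
    dangerzone-rejected/                           # files Dangerzone could not convert
    safe/Candidate 42/resume.pdf                   # the sanitized copy, safe to open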
# How-to
This section is mostly aimed at the administrators maintaining the
service. It will be of little help to regular users.
## Pager playbook
### Stray files in `processing`
The service is known to be slightly buggy and can crash midway,
leaving files in the `dangerzone/processing` directory (see [issue
14](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/issues/14)). Those files should normally be skipped, but the `processing`
directory can be flushed if no bot is currently running (see below
for how to inspect the status).
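A minimal sketch of that cleanup, assuming the bot's app-specific
token is at hand; the Nextcloud URL, token, and file name below are
placeholders, not actual configuration:

    # abort if the bot is mid-run: do not touch processing/ while it works
    systemctl is-active --quiet dangerzone-webdav-processor && exit 1
    # delete a stray file over WebDAV (URL, token and path are placeholders)
    curl -u "dangerzone-bot:$APP_TOKEN" -X DELETE \
        "https://nextcloud.example.org/remote.php/dav/files/dangerzone-bot/dangerzone/processing/stray.pdf"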
### Inspecting service status and logs
The service is installed as `dangerzone-webdav-processor.service`;
to look at its status, use `systemctl`:
    systemctl status dangerzone-webdav-processor
To see when the bot will run next:
    systemctl status dangerzone-webdav-processor.timer
To see the logs:
    journalctl -u dangerzone-webdav-processor
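To follow a run live or narrow the logs to recent activity, the usual
`journalctl` flags apply, for example:

    journalctl -u dangerzone-webdav-processor -f --since today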
## Disaster recovery
The service has little to no non-ephemeral data and should be
rebuildable from scratch by following the installation procedure.
It depends on the availability of the WebDAV service ([Nextcloud][]).
# Reference
This section goes in depth into how the service is set up.
## Installation
The service was installed manually on the `dangerzone-01` virtual
server, based on the [installation instructions from the
README](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/blob/cfb4ad754f975383a6cd4648194427b39df08899/README.md#installation). The instructions were followed mostly to the letter,
which means we use an app-specific token and the `dangerzone-bot` user
in Nextcloud.
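Assuming the unit files are in place, the timer is then presumably
enabled with something like:

    systemctl enable --now dangerzone-webdav-processor.timer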
## SLA
There are no service level guarantees for this service, but during
hiring it is expected to process files before hiring committees meet,
so HR people may well pressure us to make the service work at those
times.
## Design
This is built with [dangerzone-webdav-processor](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/), a Python script
which does the following:
1. periodically check a Nextcloud (WebDAV) folder (called
`dangerzone`) for new files
2. when a file is found, move it to a `dangerzone/processing` folder
as an ad-hoc locking mechanism
3. download the file locally
4. process the file with the [`dangerzone-converter`][] Docker container
5. on failure, delete the failed file locally, and move it to a
`dangerzone/rejected` folder remotely
6. on success, upload the sanitized file to a `safe/` folder and move
   the original to a `dangerzone/processed` folder remotely
The above is copied verbatim from [the processor README file](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/blob/cfb4ad754f975383a6cd4648194427b39df08899/README.md#new-mechanism).
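As a rough illustration, a single pass boils down to a handful of
WebDAV operations around a container run. Here is a hedged sketch in
shell, where the Nextcloud URL, token, converter invocation and
output path are all assumptions, not the script's actual code:

    # hedged sketch of one pass; the real implementation is in Python
    NC="https://nextcloud.example.org/remote.php/dav/files/dangerzone-bot"
    f="resume.pdf"
    # step 2: MOVE the file to processing/ as an ad-hoc lock
    curl -u "dangerzone-bot:$APP_TOKEN" -X MOVE "$NC/dangerzone/$f" \
        -H "Destination: $NC/dangerzone/processing/$f"
    # step 3: download it locally
    curl -u "dangerzone-bot:$APP_TOKEN" -o "/tmp/$f" "$NC/dangerzone/processing/$f"
    # step 4: sanitize it in the container (invocation and output path assumed)
    if docker run --rm -v /tmp:/work dangerzone-converter "/work/$f"; then
        # step 6: upload the sanitized copy, archive the original
        curl -u "dangerzone-bot:$APP_TOKEN" -T "/tmp/safe/$f" "$NC/safe/$f"
        curl -u "dangerzone-bot:$APP_TOKEN" -X MOVE "$NC/dangerzone/processing/$f" \
            -H "Destination: $NC/dangerzone/processed/$f"
    else
        # step 5: move the failed file to rejected/
        curl -u "dangerzone-bot:$APP_TOKEN" -X MOVE "$NC/dangerzone/processing/$f" \
            -H "Destination: $NC/dangerzone/rejected/$f"
    fi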
The processor is written in Python 3 and has minimal dependencies
outside of the standard library and the [webdavclient][] Python
library (`python3-webdavclient` in Debian). It obviously depends on
the [`dangerzone-converter`][] Docker image, but could probably be
adapted to use another container runtime or sandbox.
The processor was written by anarcat, and is maintained by kez and
anarcat. Upstream is maintained by Micah Lee.
## Monitoring and testing
There is no monitoring of this service. Unit tests are
[planned](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/issues/12). There is a procedure to set up a local development
environment in the [README file](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/blob/cfb4ad754f975383a6cd4648194427b39df08899/README.md#installation).
## Logs and metrics
Logs of the service are stored in systemd's journal, and may contain
personally identifiable information (PII) in the form of file names,
which, in the case of hires, often include candidates' names.
There are no metrics for this service, other than the server-level
monitoring systems.
## Backups
No special provision is made for backing up this server, since it does
not keep "authoritative" data and can easily be rebuilt from scratch.
## Security and risks
Two short security audits were performed after launch (see [issue
5](https://gitlab.torproject.org/tpo/tpa/dangerzone-webdav-processor/-/issues/5)) and minor issues were found, some of them fixed. It is
currently assumed that files are somewhat checked by operators for
fishy things like weird filenames.
A major flaw with the project is that operators still receive raw,
untrusted files instead of having the service receive those files
themselves. An improvement over this process would be to offer a web
form that would accept uploads directly.
Unit tests and CI should probably be deployed for this project so it
does not become another piece of legacy infrastructure. Merging with
upstream would also help: they have been working on improving their
command-line interface and are considering rolling out their own [web
service](https://github.com/firstlookmedia/dangerzone/issues/110),
which might make the WebDAV processor idea moot.
## History
I was involved in the hiring of two new sysadmins at the Tor Project
in spring 2021. To avoid untrusted inputs (i.e. random PDF files from
the internet) being opened by the hiring committee, we had a tradition
of having someone sanitize those in a somewhat secure environment,
which was typically some Qubes user doing ... whatever it is Qubes
users do.
Then when a new hiring process started, people asked me to do it
again. At that stage, I had expected this to happen, so I partially
automated this as a [pull request against the dangerzone project](https://github.com/firstlookmedia/dangerzone-converter/pull/7),
which grew totally out of hand. The automation wasn't quite complete
though: I still had to upload the files to the sanitizing server, run
the script, copy the files back, and upload them into Nextcloud.
But by then people started to think I had magically and fully
automated the document sanitization routine (hint: not quite!), so I
figured it was important to realize that dream and complete the work
so that I didn't have to sit there manually copying files around.
## Goals
Those were established after the fact.
### Must have
* process files in an isolated environment somehow (previously this
  was done in Qubes)
* automation: TPA should not have to be involved in every hiring round
### Nice to have
* web interface
### Non-Goals
* perfect security: there's no way to ensure that
## Approvals required
Approved by gaba and vetted (by silence) by the current hiring
committees.
## Proposed Solution
See [issue 40256][issue 40256] and the design section above.
## Cost
Staff time, one virtual server.
## Alternatives considered
### Manual Qubes process
Before anarcat got involved, documents were sanitized by other staff
using [Qubes](https://www.qubes-os.org/) isolation. It's not exactly clear what that process
was, but it was basically one person being added to the hiring email
alias and processing the files by hand in Qubes.
### Manual Dangerzone process
The partially automated process used by anarcat before this service
existed was:
1. get emails in my regular tor inbox with attachments
2. wait a bit to have some accumulate
3. save them to my local hard drive, in a `dangerzone` folder
4. rsync that to a remote virtual machine
5. run a modified version of the [`dangerzone-converter`][] to save
files in a "`safe`" folder (see [batch-convert](https://github.com/anarcat/dangerzone-converter/blob/6a37f48dec67412c44f8814f122ea7977a658334/batch-convert.py) in [PR 7](https://github.com/firstlookmedia/dangerzone-converter/pull/7))
6. rsync the files back to my local computer
7. upload the files into some Nextcloud folder
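In shell terms, the middle steps looked roughly like this (the host,
paths and `batch-convert.py` invocation are illustrative, not the
actual setup):

    rsync -a ~/dangerzone/ sanitizer.example.org:dangerzone/
    ssh sanitizer.example.org ./batch-convert.py dangerzone/ safe/
    rsync -a sanitizer.example.org:safe/ ~/safe/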
### Email-based process
An alternative, email-based process was also suggested:
1. candidates submit their resumes by email
2. the program gets a copy by email
3. the program sanitizes the attachment
4. the program assigns a unique ID and name for that user
(e.g. Candidate 10 Alice Doe)
5. the program uploads the sanitized attachment in a Nextcloud folder
named after the unique ID
My concern with that approach was that it exposes the sanitization
routines to the world, which opens the door to denial-of-service
attacks, at the very least. Someone could flood the disk by sending a
massive number of resumes, for example. I could also think of ZIP
bombs that could have "fun" consequences.
By putting a user between the world and the script, we have some
ad-hoc moderation that alleviates those issues, and also ensures a
human-readable, meaningful identity can be attached to each
submission (say: "this is Candidate 7 for job posting foo").
The above would also not work with resumes submitted through other
platforms (e.g. Indeed.com), unless an operator re-injects the resume,
which might make the unique ID creation harder (because the `From`
will be the operator, not the candidate).