automate document sanitization routines

Since people seem to think I have magically and fully automated the document sanitization routine (hint: not quite!), it seems important to realize that dream and finish the work, so that I don't have to sit there manually copying files around.

Current process

Right now, the process is only partially automated and, more crucially, almost undocumented.

The process, for me, is basically:

1. get emails with attachments in my regular tor inbox
2. wait a bit for some to accumulate
3. save them to my local hard drive, in a dangerzone folder
4. rsync that to a remote virtual machine (dangerzone-01.torproject.org)
5. run a modified version of the dangerzone-converter to save files in a "safe" folder (https://github.com/firstlookmedia/dangerzone-converter/pull/7)
6. rsync the files back to my local computer
7. upload the files into some Nextcloud folder
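
Since this is almost undocumented, here is a rough sketch of the rsync round trip (steps 4 to 6) as a list of commands. The folder layout (`dangerzone/` and `safe/` under a common directory on both sides) and the `build_steps` helper are assumptions made for illustration, and the converter's actual command line is not spelled out here, so it is passed in as a parameter rather than guessed.

```python
import shlex
from typing import List


def build_steps(local_dir: str, host: str, remote_dir: str,
                convert_cmd: List[str]) -> List[List[str]]:
    """Return the argv lists for the manual round trip, in order.

    Assumed layout (hypothetical): <dir>/dangerzone/ holds untrusted files
    and <dir>/safe/ holds sanitized output, both locally and remotely.
    """
    return [
        # step 4: push the untrusted attachments to the sanitization VM
        ["rsync", "-a", f"{local_dir}/dangerzone/",
         f"{host}:{remote_dir}/dangerzone/"],
        # step 5: run the converter remotely; convert_cmd is supplied by the
        # caller because its real command line is not documented here
        ["ssh", host, shlex.join(convert_cmd)],
        # step 6: pull the sanitized results back
        ["rsync", "-a", f"{host}:{remote_dir}/safe/", f"{local_dir}/safe/"],
    ]
```

Each entry could then be handed to `subprocess.run(argv, check=True)` so the sequence stops at the first failing step.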

Proposed mechanism

I think the right mechanism might be better centered around Nextcloud folders:

1. periodically check a Nextcloud (WebDAV?) folder (called dangerzone) for new files
2. when a file is found, move it to a dangerzone/processing folder as an ad-hoc locking mechanism
3. download the file locally
4. process the file with the dangerzone-converter
5. on failure, delete the failed file locally, and move it to a dangerzone/rejected folder remotely
6. on success, upload the sanitized file to a safe/ folder, move the original to dangerzone/processed
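
A minimal sketch of one pass of that loop, to check that the folder moves hang together. Everything named here is hypothetical: `Store` is an invented interface standing in for the WebDAV client, `MemoryStore` is an in-memory fake for dry runs, and `convert` stands in for the dangerzone-converter call. The sketch also assumes the sanitized file keeps the original name, and the "local copy" is just bytes in memory, so there is nothing to delete on failure.

```python
from typing import Callable, Dict, List, Protocol


class Store(Protocol):
    """Remote-folder operations the loop needs (hypothetical interface)."""
    def listdir(self, folder: str) -> List[str]: ...
    def move(self, src: str, dst: str) -> None: ...
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...


class MemoryStore:
    """In-memory stand-in for the Nextcloud folders, for dry runs."""
    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = dict(files)

    def listdir(self, folder: str) -> List[str]:
        prefix = folder.rstrip("/") + "/"
        # direct children only, so dangerzone/processing/x is not listed
        return sorted(p[len(prefix):] for p in self.files
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def move(self, src: str, dst: str) -> None:
        if dst in self.files:
            raise FileExistsError(dst)  # refuse to overwrite: this is the lock
        self.files[dst] = self.files.pop(src)

    def read(self, path: str) -> bytes:
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


def process_once(store: Store, convert: Callable[[bytes], bytes]) -> None:
    """Run one polling pass over the dangerzone folder."""
    for name in store.listdir("dangerzone"):
        locked = f"dangerzone/processing/{name}"
        try:
            store.move(f"dangerzone/{name}", locked)  # ad-hoc lock
        except (FileExistsError, KeyError):
            continue  # someone else already took this file
        try:
            safe = convert(store.read(locked))
        except Exception:
            # conversion failed: park the original in rejected
            store.move(locked, f"dangerzone/rejected/{name}")
            continue
        store.write(f"safe/{name}", safe)
        store.move(locked, f"dangerzone/processed/{name}")
```

The lock only works if the move refuses to overwrite an existing destination, which is why `MemoryStore.move` raises in that case; a real WebDAV backend would need the same behaviour.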

I think the only glue missing here is the WebDAV bits; everything else already exists. We will probably need to create a role account in Nextcloud so that this is detached from any specific user.
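
For the WebDAV bits, the "move as a lock" step maps onto a single WebDAV `MOVE` request. Below is a sketch using only the Python standard library that builds such a request without sending it. The `build_move` helper, the example host and the credentials are placeholders, and the `remote.php/dav/files/<user>/` prefix is the usual Nextcloud WebDAV path, assumed here to apply to our instance as well.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_move(base_url: str, user: str, password: str,
               src: str, dst: str) -> urllib.request.Request:
    """Build (but do not send) a WebDAV MOVE request from src to dst."""
    # assumed Nextcloud WebDAV root for the given user
    root = f"{base_url.rstrip('/')}/remote.php/dav/files/{quote(user)}/"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        root + quote(src),
        method="MOVE",
        headers={
            "Destination": root + quote(dst),
            # "F" asks the server to refuse the move if dst already exists
            "Overwrite": "F",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    req = build_move("https://cloud.example.org", "dangerzone-bot", "secret",
                     "dangerzone/report.pdf",
                     "dangerzone/processing/report.pdf")
    print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` on any error status; with `Overwrite: F`, the WebDAV specification has the server answer 412 Precondition Failed when the destination already exists, which is the case the lock relies on.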

User story

From a user's perspective, things would happen like this:

1. I want to process untrusted files
2. I upload the untrusted files, when I receive them, to a folder specially made for this
3. I share the folder with the dangerzone-bot user
4. after some delay, if "dangerzone" succeeded, files magically appear in a safe/ folder and the original files are moved into a dangerzone-processed folder, telling me they have been correctly processed
5. if that didn't work, they end up in dangerzone-rejected and no new file appears in the safe/ folder

I'm unsure about step 3. It could be simpler to implement if it's actually:

I create a folder, share it with the "dangerzone" user and let TPA know about it.

But then we have to keep state in the client, and that might introduce yet more friction. The upside is that it might be simpler to implement.

/cc @erin @gaba

Does that sound good to people here?

Edited Jun 03, 2021 by anarcat