automate document sanitization routines
Since it seems people think I have magically and fully automated the document sanitization routine (hint: not quite!), it seems important to realize that dream and complete the work so that I don't have to sit there manually copying files around.
Current process
Right now, the process is only partially automated and, more crucially, almost undocumented.
The process, for me, is basically:
- get emails in my regular tor inbox with attachments
- wait a bit to have some accumulate
- save them to my local hard drive, in a dangerzone folder
- rsync that to a remote virtual machine (dangerzone-01.torproject.org)
- run a modified version of the dangerzone-converter to save files in a "safe" folder (https://github.com/firstlookmedia/dangerzone-converter/pull/7)
- rsync the files back to my local computer
- upload the files into some Nextcloud folder
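As a starting point for documenting the steps above, here is a minimal sketch of the two rsync legs. Only the host name comes from the list; the local and remote paths and the `-av` flags are assumptions, and the converter step is left as a comment because the modified converter's command line is not described here.

```python
import shlex

# The host name comes from the list above; every path below is a hypothetical placeholder.
HOST = "dangerzone-01.torproject.org"
LOCAL_IN = "dangerzone/"
LOCAL_OUT = "safe/"
REMOTE_IN = "dangerzone/"
REMOTE_OUT = "safe/"


def round_trip_commands():
    """Return the rsync commands for the push and pull legs, as argument lists."""
    push = ["rsync", "-av", LOCAL_IN, f"{HOST}:{REMOTE_IN}"]
    # The modified dangerzone-converter would run on the remote host between
    # these two commands; its invocation is not shown because it is not documented here.
    pull = ["rsync", "-av", f"{HOST}:{REMOTE_OUT}", LOCAL_OUT]
    return [push, pull]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in round_trip_commands():
        print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to run anywhere; passing each list to `subprocess.run(command, check=True)` would execute it.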
Proposed mechanism
I think the right mechanism might be better centered around Nextcloud folders:
- periodically check a Nextcloud (WebDAV?) folder (called dangerzone) for new files
- when a file is found, move it to a dangerzone/processing folder as an ad-hoc locking mechanism
- download the file locally
- process the file with the dangerzone-converter
- on failure, delete the failed file locally, and move it to a dangerzone/rejected folder remotely
- on success, upload the sanitized file to a safe/ folder, move the original to dangerzone/processed
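The steps above could be sketched roughly as follows. This is only an illustration of the control flow: the folder names come from the list, but `MemoryStore`, `process_inbox` and the `convert` callable are hypothetical names, and the in-memory store stands in for whatever remote storage client ends up being used.

```python
from typing import Callable, Dict, List

# Folder names come from the list above; everything else here is hypothetical.
INBOX = "dangerzone"
PROCESSING = "dangerzone/processing"
REJECTED = "dangerzone/rejected"
PROCESSED = "dangerzone/processed"
SAFE = "safe"


class MemoryStore:
    """In-memory stand-in for the remote folder tree (path -> content)."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = dict(files)

    def list_files(self, folder: str) -> List[str]:
        # Return direct children only, so files in subfolders are not picked up again.
        prefix = folder + "/"
        names = [p[len(prefix):] for p in self.files if p.startswith(prefix)]
        return sorted(n for n in names if "/" not in n)

    def move(self, src: str, dst: str) -> None:
        self.files[dst] = self.files.pop(src)

    def download(self, path: str) -> bytes:
        return self.files[path]

    def upload(self, path: str, data: bytes) -> None:
        self.files[path] = data


def process_inbox(store, convert: Callable[[bytes], bytes]) -> Dict[str, str]:
    """Run one polling pass and return {filename: "processed" or "rejected"}."""
    results = {}
    for name in store.list_files(INBOX):
        locked = f"{PROCESSING}/{name}"
        store.move(f"{INBOX}/{name}", locked)  # ad-hoc lock: claim the file
        try:
            sanitized = convert(store.download(locked))
        except Exception:
            # Nothing is kept locally in this sketch, so only the remote move remains.
            store.move(locked, f"{REJECTED}/{name}")
            results[name] = "rejected"
            continue
        store.upload(f"{SAFE}/{name}", sanitized)
        store.move(locked, f"{PROCESSED}/{name}")
        results[name] = "processed"
    return results
```

Keeping the storage behind four small methods means the same loop can be exercised against the in-memory store and later pointed at a real remote client.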
I think the only glue missing here is the WebDAV bits; everything else already exists. We will probably need to create a role account in Nextcloud so that this is detached from a specific user.
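For the WebDAV side, moving a file between folders maps onto the WebDAV `MOVE` method with a `Destination` header. Below is a minimal sketch that only builds such a request with the Python standard library. The server host is a made-up placeholder, the account name is assumed to be the role account, and the `remote.php/dav/files/<user>` prefix is the usual Nextcloud WebDAV path, to be checked against the actual instance.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical values: the real server URL and role account name are not settled here.
BASE_URL = "https://nextcloud.example.org/remote.php/dav/files/dangerzone-bot"


def dav_url(path: str) -> str:
    """Return the WebDAV URL for a path relative to the account's root folder."""
    return f"{BASE_URL}/{quote(path.strip('/'))}"


def move_request(src: str, dst: str) -> Request:
    """Build (but do not send) a WebDAV MOVE request from src to dst."""
    return Request(
        dav_url(src),
        method="MOVE",
        # "Overwrite: F" asks the server to refuse the move if dst already exists.
        headers={"Destination": dav_url(dst), "Overwrite": "F"},
    )


if __name__ == "__main__":
    req = move_request("dangerzone/report.pdf", "dangerzone/processing/report.pdf")
    print(req.get_method(), req.full_url)
    print("Destination:", req.get_header("Destination"))
```

Sending the request would also need credentials for the role account (for example an `Authorization` header) and a call to `urllib.request.urlopen`; listing a folder would be a `PROPFIND` request with a `Depth: 1` header. Neither is shown here.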
User story
From a user's perspective, things would happen like this:
- I want to process untrusted files
- I upload the untrusted files in a folder specially made for this, when I receive them
- I share the folder with the dangerzone-bot user
- after some delay, if "dangerzone" succeeded, files magically appear in a safe/ folder and the original files are moved into a dangerzone-processed folder, telling me they have been correctly processed
- if that didn't work, they end up in dangerzone-rejected and no new file appears in the safe/ folder
I'm unsure about step 3 (sharing the folder with the dangerzone-bot user): it could be simpler to implement if it's actually:
- I create a folder, share it with the "dangerzone" user and let TPA know about it.
But then we have to keep state in the client, and that might introduce yet more friction. The upside is that it might be simpler to implement.
/cc @erin @gaba
does that sound good for people here?