The Tor Project / TPA / TPA team · Issues · #40256 · Closed
Created May 17, 2021 by anarcat (Owner)

automate document sanitization routines

People seem to think I have magically and fully automated the document sanitization routine (hint: not quite!), so it is important to realize that dream and complete the work, so that I don't have to sit there manually copying files around.

Current process

Right now, the process is only partially automated and, more crucially, almost undocumented.

The process, for me, is basically:

  1. get emails in my regular tor inbox with attachments
  2. wait a bit to have some accumulate
  3. save them to my local hard drive, in a dangerzone folder
  4. rsync that to a remote virtual machine (dangerzone-01.torproject.org)
  5. run a modified version of the dangerzone-converter to save files in a "safe" folder (https://github.com/firstlookmedia/dangerzone-converter/pull/7)
  6. rsync the files back to my local computer
  7. upload the files into some Nextcloud folder
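The copy steps above (4 and 6) could be sketched as a small script. This is a hypothetical sketch, not the actual tooling: the local folder name and remote paths are assumptions, only the dangerzone-01.torproject.org host name comes from the steps above.

```python
import subprocess
from pathlib import Path

# Assumed local staging folder and remote paths; only the host name
# (dangerzone-01.torproject.org) is taken from the process described above.
LOCAL_DIR = Path.home() / "dangerzone"
REMOTE = "dangerzone-01.torproject.org"
REMOTE_DIR = "dangerzone/"


def rsync_cmd(src: str, dest: str) -> list[str]:
    """Build an rsync invocation (archive mode, verbose)."""
    return ["rsync", "-av", src, dest]


def push_untrusted() -> None:
    # step 4: copy the saved attachments to the conversion VM
    subprocess.run(rsync_cmd(f"{LOCAL_DIR}/", f"{REMOTE}:{REMOTE_DIR}"), check=True)


def pull_safe() -> None:
    # step 6: bring the sanitized "safe" folder back to the local machine
    subprocess.run(rsync_cmd(f"{REMOTE}:safe/", f"{LOCAL_DIR}/safe/"), check=True)
```

Even that small amount of scripting would remove the most error-prone part (typing the rsync commands by hand), but it still leaves the email-saving and Nextcloud-upload steps manual.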

Proposed mechanism

I think the right mechanism might be better centered around Nextcloud folders:

  1. periodically check a Nextcloud (WebDAV?) folder (called dangerzone) for new files
  2. when a file is found, move it to a dangerzone/processing folder as an ad-hoc locking mechanism
  3. download the file locally
  4. process the file with the dangerzone-converter
  5. on failure, delete the failed file locally, and move it to a dangerzone/rejected folder remotely
  6. on success, upload the sanitized file to a safe/ folder, move the original to dangerzone/processed

I think the only glue missing here is the WebDAV bits; everything else already exists. We will probably need to create a role account in Nextcloud so this is detached from a specific user.

User story

From a user's perspective, things would happen like this:

  1. I want to process untrusted files
  2. I upload the untrusted files in a folder specially made for this, when I receive them
  3. I share the folder with the dangerzone-bot user
  4. after some delay, if "dangerzone" succeeded, files magically appear in a safe/ folder and the original files are moved into a dangerzone-processed folder, telling me they have been correctly processed.
  5. if that didn't work, they end up in dangerzone-rejected and no new files appear in the safe/ folder

I'm unsure about step 3: it could be simpler to implement if it's actually:

  1. I create a folder, share it with the "dangerzone" user, and let TPA know about it.

But then we have to keep state in the client, and that might introduce yet more friction. The upside is that it might be simpler to implement.

/cc @erin @gaba

Does that sound good to people here?

Edited Jun 03, 2021 by anarcat