Use PDF.js to sanitize saved PDFs
PDF files may contain malicious content that can be used to compromise the security of the system. Rendering PDFs with PDF.js is often slow and sometimes broken, which pushes users to open the files in native readers instead. Unfortunately, there are no good sanitizers: most are written in scripting languages (such as Python and Ruby) and require their runtimes. It would be very useful to have a tool, implemented in JS and running right in the browser, that removes malicious content from downloaded PDFs. Fortunately, Firefox already has a PDF parsing library inside its PDF.js engine.
- Use PDF.js to parse PDF into internal representation, but do not render it.
- Decompress it and decode its streams.
- Remove all potentially malicious content (this should be configurable in a dialog similar to "Clear Recent History"): JavaScript, fonts, Flash (and other objects that invoke plugins), 3D content, forms, signatures, remote content, and anything else not directly needed for rendering.
- Recreate the PDF file from the internal representation, recomputing every field that can be recomputed (such as stream lengths and cross-reference offsets) to defeat memory corruption exploits that depend on crafted values.
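The last two steps could look roughly like the following TypeScript sketch. It assumes parsing and stream decoding have already happened and that the result fits a simple in-memory model. `PdfValue`, `PdfDict`, `CATEGORY_KEYS`, `strip`, `reachable`, `serialize`, and `writePdf` are all hypothetical names, not PDF.js APIs (as far as I know, PDF.js does not expose its parser objects as part of its public API). The key lists per category are illustrative and far from exhaustive. Rather than patching the input, the writer keeps only objects still reachable from the catalog, renumbers them, and recomputes every stream `/Length`, every cross-reference offset, and `startxref`.

```typescript
// Hypothetical model of a parsed, decoded PDF; NOT PDF.js classes (its parser is internal).
type PdfValue =
  | null | boolean | number
  | { kind: "str"; value: string } // literal string, assumed ASCII
  | { kind: "name"; value: string } // without "/", assumed to need no #-escaping
  | { kind: "ref"; num: number } // indirect reference, generation 0 assumed
  | PdfValue[] | PdfDict;
interface PdfDict {
  kind: "dict";
  entries: Map<string, PdfValue>;
  stream?: Uint8Array; // decoded stream data, present only for stream objects
}
// Illustrative (not exhaustive) dictionary keys per user-selectable category.
const CATEGORY_KEYS: Record<string, string[]> = {
  javascript: ["JS", "JavaScript", "OpenAction", "AA"], forms: ["AcroForm"],
  fonts: ["FontFile", "FontFile2", "FontFile3"], embedded: ["EmbeddedFiles", "RichMediaContent", "3DD"],
};

// Recursively delete every dictionary entry whose key is in `drop` (mutates input).
function strip(v: PdfValue, drop: Set<string>): void {
  if (Array.isArray(v)) return v.forEach((x) => strip(x, drop));
  if (v === null || typeof v !== "object" || v.kind !== "dict") return;
  for (const key of Array.from(v.entries.keys())) {
    if (drop.has(key)) v.entries.delete(key);
    else strip(v.entries.get(key)!, drop);
  }
}

// Object numbers reachable from the catalog; objects orphaned by strip() are dropped.
function reachable(objects: Map<number, PdfValue>, root: number): number[] {
  const seen = new Set<number>();
  const stack = [root];
  const visit = (v: PdfValue): void => {
    if (Array.isArray(v)) v.forEach(visit);
    else if (v !== null && typeof v === "object" && v.kind === "ref") stack.push(v.num);
    else if (v !== null && typeof v === "object" && v.kind === "dict") v.entries.forEach(visit);
  };
  while (stack.length > 0) {
    const num = stack.pop()!;
    if (seen.has(num) || !objects.has(num)) continue;
    seen.add(num);
    visit(objects.get(num)!);
  }
  return Array.from(seen).sort((a, b) => a - b);
}

// Serialize one value: references are renumbered, stream /Length is recomputed.
function serialize(v: PdfValue, renum: Map<number, number>): string {
  if (v === null) return "null";
  if (typeof v === "boolean") return v ? "true" : "false";
  if (typeof v === "number") return Number.isInteger(v) ? String(v) : v.toFixed(5);
  if (Array.isArray(v)) return "[" + v.map((x) => serialize(x, renum)).join(" ") + "]";
  if (v.kind === "str") return "(" + v.value.replace(/[\\()]/g, "\\$&") + ")";
  if (v.kind === "name") return "/" + v.value;
  if (v.kind === "ref") return renum.has(v.num) ? `${renum.get(v.num)} 0 R` : "null";
  const entries = new Map(v.entries);
  if (v.stream) {
    entries.delete("Filter"); entries.delete("DecodeParms"); // data is written decoded
    entries.set("Length", v.stream.length); // never trust the input's /Length
  }
  const body = Array.from(entries).map(([k, x]) => `/${k} ${serialize(x, renum)}`);
  return "<<" + body.join(" ") + ">>";
}

// Rebuild a classic-xref PDF from scratch, recomputing every byte offset.
function writePdf(objects: Map<number, PdfValue>, root: number, drop: Set<string>): Uint8Array {
  objects.forEach((v) => strip(v, drop));
  const keep = reachable(objects, root);
  const renum = new Map(keep.map((old, i): [number, number] => [old, i + 1]));
  const enc = new TextEncoder();
  const chunks: Uint8Array[] = [];
  let pos = 0;
  const emit = (part: string | Uint8Array): void => {
    const bytes = typeof part === "string" ? enc.encode(part) : part;
    chunks.push(bytes);
    pos += bytes.length;
  };
  emit("%PDF-1.7\n");
  const offsets: number[] = [];
  for (const old of keep) {
    const v = objects.get(old)!;
    offsets.push(pos);
    emit(`${renum.get(old)} 0 obj\n${serialize(v, renum)}\n`);
    if (v !== null && typeof v === "object" && !Array.isArray(v) && v.kind === "dict" && v.stream) {
      emit("stream\n");
      emit(v.stream);
      emit("\nendstream\n");
    }
    emit("endobj\n");
  }
  const xrefAt = pos;
  emit(`xref\n0 ${keep.length + 1}\n0000000000 65535 f \n`);
  offsets.forEach((off) => emit(`${String(off).padStart(10, "0")} 00000 n \n`));
  emit(`trailer\n<< /Size ${keep.length + 1} /Root ${renum.get(root)} 0 R >>\nstartxref\n${xrefAt}\n%%EOF\n`);
  const out = new Uint8Array(pos);
  let at = 0;
  for (const c of chunks) { out.set(c, at); at += c.length; }
  return out;
}
```

Calling `writePdf(objects, rootNum, new Set(CATEGORY_KEYS["javascript"]))` with the catalog's object number returns the rewritten bytes. Note that `strip` mutates the input model, strings are assumed to be ASCII, and dropping the `fonts` keys removes embedded font programs, so viewers will fall back to substitute fonts. Writing streams decoded makes the output larger, but it means no attacker-controlled compressed data survives.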
I first asked about this in the PDF.js bug tracker, but the developers declined because it is outside the goals of that project.