Rethink how we handle issues while sanitizing bridge descriptors
The bridge descriptor sanitizer parses tarballs containing non-sanitized bridge descriptors, modifies their content by removing bridge IP addresses and other sensitive parts, and writes sanitized versions of those bridge descriptors to disk.
The sanitizer needs to recognize the lines contained in bridge descriptors to distinguish between lines that must be changed and others that can be kept unchanged, and it needs to be able to understand the exact format of certain lines in order to sanitize their contents.
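The line-by-line recognition described above could look roughly like the following Java sketch. This is not CollecTor's actual code. The class, method, and exception names are hypothetical, and only a few keywords are handled. The assumed `router` line format is a simplification, and `sanitizeAddress` is a placeholder that does not implement real secret-based IP address sanitizing. Known lines are either kept unchanged or rewritten. An unrecognized line, or a known line with an unexpected number of arguments, raises an exception that the caller then has to handle in one of the ways discussed below.

```java
// Hypothetical sketch: keep, rewrite, or reject a single descriptor line.
// The keywords, the assumed "router" line format, and sanitizeAddress()
// are simplified assumptions, not CollecTor's actual sanitizing rules.
class LineSanitizerSketch {

  /** Thrown when the sanitizer does not recognize a line's keyword. */
  static class UnknownLineException extends RuntimeException {
    UnknownLineException(String line) {
      super("Unrecognized line: " + line);
    }
  }

  /** Thrown when a known line does not have the expected format. */
  static class MalformedLineException extends RuntimeException {
    MalformedLineException(String line) {
      super("Malformed line: " + line);
    }
  }

  // Hypothetical placeholder: a real sanitizer would derive the replacement
  // from the original address and a secret instead of returning a constant.
  static String sanitizeAddress(String address) {
    return "10.0.0.1";
  }

  static String sanitizeLine(String line) {
    String[] parts = line.split(" ");
    switch (parts[0]) {
      case "router":
        // Assumed format: router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>
        if (parts.length != 6) {
          throw new MalformedLineException(line);
        }
        parts[2] = sanitizeAddress(parts[2]);
        return String.join(" ", parts);
      case "platform":
      case "published":
        // Lines without sensitive content are kept unchanged.
        return line;
      default:
        // Unknown keyword: let the caller decide how to handle it.
        throw new UnknownLineException(line);
    }
  }

  public static void main(String[] args) {
    System.out.println(sanitizeLine("router Unnamed 192.0.2.7 443 0 0"));
  }
}
```

Because both exception types are thrown rather than handled here, the choice between skipping a line, a descriptor, or a whole tarball stays with the caller.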
This process can go wrong in various ways, and we need to decide how to handle those situations. Possible situations are:
1. A tarball is malformed or cannot otherwise be opened.
2. A tarball contains one or more files that cannot be opened.
3. A file in a tarball contains an unknown descriptor type.
4. An internal problem prevents sanitizing parts of a descriptor (e.g., a missing secret for sanitizing IP addresses).
5. A descriptor is missing parts that are required for properly sanitizing its contents.
6. A descriptor contains an unrecognized line.
7. A descriptor line doesn't follow the expected format, e.g., it contains fewer or more arguments than expected.
Possible ways of handling such situations are:
A. Skip a line we don't understand and keep the rest of the descriptor.
B. Skip the descriptor.
C. Skip the file contained in the tarball and continue with the next file.
D. Abort processing the tarball.
E. Skip the entire tarball, including discarding any descriptors processed before running into the problem, and attempt to process the tarball again in the next execution.
F. Abstain from processing a given descriptor type until the problem has been resolved.
G. Discard any descriptors processed in the tarball before running into the problem, abort the current execution, and refuse to start the next execution until the problem has been resolved.
H. (In addition to A-G.) Inform the operator by logging the problem.
I. (In addition to A-G.) Warn the operator and ask them to resolve the problem.
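Options B, E, and G differ mainly in how far up a problem is allowed to propagate. Below is a minimal Java sketch of that idea. It assumes hypothetical `DescriptorProblem` and `TarballProblem` exception types and a pluggable `Sanitizer` interface, none of which are CollecTor's actual API. A plain `List<String>` stands in for logging (H). Sanitized descriptors are buffered per tarball and only handed back if no tarball-level problem occurs. A descriptor-level problem skips just that descriptor (B). A tarball-level problem discards everything processed so far and signals a retry (E). Any other exception propagates and aborts the execution, as G would require.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of options B (+H) and E (+H); all names are illustrative.
class TarballProcessingSketch {

  /** Descriptor-level problem: skip only this descriptor (option B). */
  static class DescriptorProblem extends RuntimeException {
    DescriptorProblem(String message) {
      super(message);
    }
  }

  /** Tarball-level problem: discard the whole tarball and retry later (option E). */
  static class TarballProblem extends RuntimeException {
    TarballProblem(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for sanitizing one descriptor. */
  interface Sanitizer {
    String sanitize(String descriptor);
  }

  /**
   * Sanitizes all descriptors of one tarball. Results are buffered and only
   * returned if no tarball-level problem occurs; an empty Optional means the
   * tarball should be retried in the next execution. Any other exception,
   * e.g. for an internal problem (option G), propagates to the caller.
   */
  static Optional<List<String>> processTarball(
      List<String> descriptors, Sanitizer sanitizer, List<String> log) {
    List<String> buffered = new ArrayList<>();
    try {
      for (String descriptor : descriptors) {
        try {
          buffered.add(sanitizer.sanitize(descriptor));
        } catch (DescriptorProblem e) {
          log.add("Skipping descriptor: " + e.getMessage());
        }
      }
    } catch (TarballProblem e) {
      log.add("Discarding tarball, will retry: " + e.getMessage());
      return Optional.empty();
    }
    return Optional.of(buffered);
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    Sanitizer sanitizer = descriptor -> {
      if (descriptor.isEmpty()) {
        throw new DescriptorProblem("descriptor is empty");
      }
      return descriptor.toUpperCase();
    };
    System.out.println(processTarball(List.of("a", "", "b"), sanitizer, log));
    System.out.println(log);
  }
}
```

In this sketch, buffering until the end of a tarball is what makes E possible: descriptors that had already been written to disk could not simply be discarded afterwards.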
Looking at these options, my preferred ways of handling problems would be something like:
- B+H in situations 5, 6, and 7;
- E+I in situations 1, 2, and 3; and
- G+I in situation 4.
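These preferences could be kept in a single policy method, so that an operator with different preferences would only need to change one place. This is a minimal sketch: the enum and method names are hypothetical, and the constants mirror the situations and handling options listed above, in the same order.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical policy table; names are illustrative, not CollecTor's API.
class HandlingPolicySketch {

  enum Situation {
    MALFORMED_TARBALL,        // 1
    UNREADABLE_FILE,          // 2
    UNKNOWN_DESCRIPTOR_TYPE,  // 3
    INTERNAL_PROBLEM,         // 4
    MISSING_REQUIRED_PARTS,   // 5
    UNRECOGNIZED_LINE,        // 6
    MALFORMED_LINE            // 7
  }

  enum Action {
    SKIP_LINE,                // A
    SKIP_DESCRIPTOR,          // B
    SKIP_FILE,                // C
    ABORT_TARBALL,            // D
    RETRY_TARBALL_LATER,      // E
    SUSPEND_DESCRIPTOR_TYPE,  // F
    HALT_UNTIL_RESOLVED,      // G
    LOG,                      // H
    WARN_OPERATOR             // I
  }

  /** Returns the preferred combination of actions for a situation. */
  static Set<Action> preferredActions(Situation situation) {
    switch (situation) {
      case MALFORMED_TARBALL:
      case UNREADABLE_FILE:
      case UNKNOWN_DESCRIPTOR_TYPE:
        return EnumSet.of(Action.RETRY_TARBALL_LATER, Action.WARN_OPERATOR);
      case INTERNAL_PROBLEM:
        return EnumSet.of(Action.HALT_UNTIL_RESOLVED, Action.WARN_OPERATOR);
      case MISSING_REQUIRED_PARTS:
      case UNRECOGNIZED_LINE:
      case MALFORMED_LINE:
        return EnumSet.of(Action.SKIP_DESCRIPTOR, Action.LOG);
      default:
        throw new IllegalArgumentException("Unknown situation: " + situation);
    }
  }

  public static void main(String[] args) {
    for (Situation situation : Situation.values()) {
      System.out.println(situation + " -> " + preferredActions(situation));
    }
  }
}
```

Whether such a table should be configurable per CollecTor instance is exactly the open question here.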
That's not exactly what we're currently doing. And I'm not even sure if somebody else operating a CollecTor instance with the bridgedescs module would have the same preferences.
Let's discuss!