Write a specification for Tor web server logs

This document should answer the following questions:

  • What will the raw input data look like?
      • compressed logs
      • varying dates in log lines despite the file being tagged with a single date
      • should only GET log lines with 200 responses be expected?
      • size could be huge (in the future)
      • exact input format (if possible to define)
      • meta-data is provided in paths and filenames
      • ...
  • What will sanitized logs stored on disk look like?
      • cleaned log lines: define the exact format and give examples (as this might deviate from the current Python sanitation)
      • meta-data is provided in paths and filenames
      • should files be reassembled, i.e., only log lines of a given date in a descriptor for that log date?
      • should storage (on disk) be in compressed files (as opposed to storing other descriptors uncompressed)?
      • should such logs be stored (on disk) in reasonably sized chunks (e.g., split once a file reaches a GB)?
      • ...
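
To make the reassembly question concrete, here is a minimal Python sketch that groups log lines by the date found inside each line rather than by the date the file is tagged with. It assumes lines follow the Apache Common Log Format with a bracketed timestamp; the actual input format is still to be defined by this specification, and the function name `group_lines_by_date` is made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: the timestamp appears in square brackets as in the Apache
# Common Log Format, e.g. [22/Jan/2018:00:00:00 +0000]. The real input
# format is one of the open questions of this ticket.
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def group_lines_by_date(lines):
    """Group log lines by the date contained in each line (hypothetical helper)."""
    by_date = defaultdict(list)
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            # Unparseable line: skipped here, but the specification
            # should state how such lines are to be handled.
            continue
        # Note: %b relies on English month names (default C locale).
        day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
        by_date[day].append(line)
    return dict(by_date)
```

Each key of the returned dictionary would then correspond to one output file (or descriptor) per log date, if the specification settles on reassembling files that way.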

Please add more.