Specify a TorBEL archive data format
Sebastian says that TorBEL only gives out current scan results in its CSV file (and its other output files). Once a relay disappears from the consensus, TorBEL removes the scan results for that relay, so that its IP address can get unblocked by Wikipedia et al. as soon as possible. The CSV file is going to be updated every five minutes.
That makes sense for TorBEL's main use case. But we want to archive TorBEL's output files and use them as input for VisiTor, ExoneraTor, and similar tools. We'll want to know whether a relay was found to exit via a given IP address at a given time. At the same time, we want to avoid archiving every output file that TorBEL publishes, which would be highly redundant.
How about we define an archive data format that extends TorBEL's CSV format in Section 2.1 of data-spec.txt? The change would be that we never remove an entry for a given ExitAddress and RouterID. There could be many such entries, each with a distinct LastTestedTimestamp. The new uniqueness criterion would be (ExitAddress, RouterID, LastTestedTimestamp).
We could define this archive data format in the same spec or only on formats.html. We could implement the new format by making the TorBEL host create a copy of the CSV file whenever it changes and having metrics-db rsync and merge these files into the archive data format.
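To make the merge step concrete, here is a rough Python sketch of how metrics-db might fold copied CSV snapshots into the archive. It is only an illustration: the column names are taken from the field names used in this ticket and may not match the actual header in data-spec.txt, and merge_snapshot, the in-memory dict, and the sample rows are all hypothetical.

```python
import csv
import io

# Assumed column names, taken from the field names used in this ticket;
# the real CSV header in data-spec.txt Section 2.1 may differ.
KEY_FIELDS = ("ExitAddress", "RouterID", "LastTestedTimestamp")


def merge_snapshot(archive, snapshot_file):
    """Add rows from one CSV snapshot to the archive dict.

    Entries are never removed. A row is only added if its
    (ExitAddress, RouterID, LastTestedTimestamp) key is new.
    Returns the number of newly archived entries.
    """
    added = 0
    for row in csv.DictReader(snapshot_file):
        key = tuple(row[field] for field in KEY_FIELDS)
        if key not in archive:
            archive[key] = row
            added += 1
    return added


# Two hypothetical snapshots taken five minutes apart.
snapshot_1 = io.StringIO(
    "ExitAddress,RouterID,LastTestedTimestamp\n"
    "192.0.2.1,AAAA,1000\n"
    "192.0.2.2,BBBB,1000\n"
)
snapshot_2 = io.StringIO(
    "ExitAddress,RouterID,LastTestedTimestamp\n"
    "192.0.2.2,BBBB,1300\n"  # re-tested, so this becomes a new archive entry
)

archive = {}
added_1 = merge_snapshot(archive, snapshot_1)
added_2 = merge_snapshot(archive, snapshot_2)
```

In this example the relay AAAA is missing from the second snapshot, as it would be after dropping out of the consensus, but its entry stays in the archive. Snapshots that repeat an unchanged row add nothing, which is what keeps the archive from being as redundant as storing every published file.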
Once we have this archive data format, we'll have to update VisiTor and ExoneraTor to parse it.