Split up legacy module into more maintainable parts
Our legacy module is a mess. That code dates back to a time when we tried to use a single database for all our statistics and for a service called relay search, which was not the same service as today's relay search. While I'm not ruling out that we can make a single-database approach work for everything we want to do with our data, it's not going to be this database.
It's time to move away from this legacy database and take an approach similar to the one we're taking for the other modules, where we only store the relevant parts that we need for our graphs.
As of now, the legacy module provides data for the following graphs:
- In the Servers category:
  - Relays and bridges
  - Relays by relay flag
  - Relays by tor version
  - Relays by platform
- In the Traffic category:
  - Total relay bandwidth
  - Advertised and consumed bandwidth by relay flag
  - Consumed bandwidth by Exit/Guard flag combination
  - Bandwidth spent on answering directory requests
Viewed from a different perspective, these 8 graphs (numbered 1 to 8 in the order listed above) show 3 different metrics:
- Relay or bridge counts in graphs 1 to 4
- Advertised bandwidths in graphs 5 and 6
- Bandwidth histories in graphs 5 to 8
I could imagine that we make the following changes to split up the legacy module into more maintainable parts:
- Use existing data from the ipv6servers module for graph 1 and for the advertised bandwidth portions in graphs 5 and 6. This data already exists with only trivial differences affecting how we're treating missing data. We could just switch.
- Extend the ipv6servers module to also provide data for graphs 2 to 4. This extension would require us to reimport the entire archive, so it's more of a rewrite. But the ipv6servers module code is much cleaner and easier to extend than the legacy module code. And when we extend that module, we can relatively easily add bridge statistics and other relay metrics like consensus weight or path selection probabilities that we can use in new graphs later on. All in all not a trivial amount of work, but probably worth it.
- Keep the remaining parts of the legacy module for the bandwidth history parts in graphs 5 to 8. Bandwidth histories are going to be replaced by PrivCount data in the medium term anyway. We could keep the legacy module around for another year or two without planning to change much during that time. And when we shut it down, we can keep a copy of the aggregate data around, just like we're going to keep a static summary of the Tor Messenger statistics (legacy/trac#26047).
I'll start working on the second suggested change above. The two other changes depend on whether that second change can be made successfully.
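For reference, here's a rough sketch of how the three suggested changes would map graphs to modules. It's only a planning aid: the graph numbers, metrics, and module names come from the lists above, but the `GRAPHS` and `PLAN` structures and both helper functions are hypothetical and don't exist anywhere in our code.

```python
# Hypothetical summary of the proposed split; structure and names are
# illustrative only and not taken from the actual code base.
GRAPHS = {
    1: "Relays and bridges",
    2: "Relays by relay flag",
    3: "Relays by tor version",
    4: "Relays by platform",
    5: "Total relay bandwidth",
    6: "Advertised and consumed bandwidth by relay flag",
    7: "Consumed bandwidth by Exit/Guard flag combination",
    8: "Bandwidth spent on answering directory requests",
}

# Each entry: (suggested change number, metric, graph numbers,
# module that would provide the data after the split).
PLAN = [
    (1, "relay or bridge counts", (1,), "ipv6servers"),
    (1, "advertised bandwidths", (5, 6), "ipv6servers"),
    (2, "relay or bridge counts", (2, 3, 4), "ipv6servers"),
    (3, "bandwidth histories", (5, 6, 7, 8), "legacy"),
]


def modules_for_graph(number):
    """Return the sorted module names a graph would depend on after the split."""
    return sorted({module for _, _, graphs, module in PLAN if number in graphs})


def graphs_needing_legacy():
    """Return the graph numbers that would still read from the legacy module."""
    return sorted(n for n in GRAPHS if "legacy" in modules_for_graph(n))


if __name__ == "__main__":
    for number, title in GRAPHS.items():
        print(number, title, "->", ", ".join(modules_for_graph(number)))
```

Seen this way, graphs 5 and 6 are the only ones that would draw on both modules, because they combine advertised bandwidths with bandwidth histories.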