Provide a dir-spec implementation that serves sanitized descriptors
The Metrics Team currently sanitizes bridge descriptors before publishing them on CollecTor (and subsequently feeding them into other software).
The published descriptors are detailed in:
https://metrics.torproject.org/collector.html#bridge-descriptors
The sanitizing steps are detailed here:
https://metrics.torproject.org/bridge-descriptors.html
The descriptors are transferred to the CollecTor host unsanitized, as a tarball copied with rsync. This violates one of the Tor Metrics principles: the transfer uses a private interface, and we are handling sensitive data. Although the data is then sanitized and published, others cannot operate their own CollecTor instance that fetches data directly from the BridgeDB instance. The arrangement also increases code complexity in CollecTor, because we must treat the fetching of relay descriptors and bridge descriptors differently.
Ideally the sanitizing steps would be performed by BridgeDB, and we could then reuse (at least large chunks of) the CollecTor code that currently fetches relay descriptors.
This is a project that would need co-ordination with the Metrics Team on the best way forward.
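To make the proposal more concrete, here is a rough sketch of what the serving side might look like. It is only an illustration under assumptions, not a design: it assumes the descriptors have already been sanitized, that they are exposed under the same resource paths dir-spec uses for relay descriptors (/tor/server/all and /tor/extra/all), and that plain uncompressed HTTP is acceptable. The names STORE, resolve, make_handler and serve are hypothetical, and the descriptor bodies are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store of already-sanitized documents, keyed by the
# dir-spec resource path they would be served under. Bodies are placeholders.
STORE = {
    "/tor/server/all": b"placeholder: sanitized bridge server descriptors\n",
    "/tor/extra/all": b"placeholder: sanitized bridge extra-info descriptors\n",
}


def resolve(path, store):
    """Map a request path to (HTTP status, body); unknown paths give 404."""
    body = store.get(path)
    if body is None:
        return 404, b""
    return 200, body


def make_handler(store):
    """Build a request handler class bound to the given descriptor store."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = resolve(self.path, store)
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(port=8080, store=STORE):
    """Serve the store on localhost until interrupted (port is an assumption)."""
    HTTPServer(("127.0.0.1", port), make_handler(store)).serve_forever()
```

Calling serve() would start the listener on localhost. With something along these lines, a CollecTor instance could in principle be pointed at BridgeDB the same way it is pointed at a directory authority for relay descriptors. Compressed variants and the remaining dir-spec resources are left out here.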