Bridge descriptors in CollecTor's recent/ directory contain many duplicates
The recent/ directory should contain only new descriptors, and ideally no duplicates. I just found that the latter is not the case:
$ grep -c "@type" recent/bridge-descriptors/server-descriptors/2014-07-22-07-04-02-server-descriptors
18175
$ grep -c "@type" recent/bridge-descriptors/extra-infos/2014-07-22-07-04-02-extra-infos
9723
Compare this to relay descriptors:
$ grep -c "@type" recent/relay-descriptors/server-descriptors/2014-07-22-07-05-52-server-descriptors
931
$ grep -c "@type" recent/relay-descriptors/extra-infos/2014-07-22-07-05-52-extra-infos
930
$ grep -c "@type" recent/relay-descriptors/microdescs/micro/2014-07-22-07-05-52-micro
30
The reason is that only new relay descriptors are downloaded and stored to disk, whereas the bridge descriptor tarballs we parse are full snapshots of Tonga's cached descriptor files. We need to check whether we already have a given sanitized bridge descriptor and store it only if we don't.
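A minimal sketch of such a check, in Python for illustration only (not CollecTor's actual code). It assumes each sanitized descriptor is available as a byte string and is written to its own file named after its SHA-1 digest. The function name store_if_new and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def store_if_new(descriptor: bytes, out_dir: Path) -> bool:
    """Store a sanitized descriptor unless an identical one is already on disk.

    Returns True if the descriptor was written, False if it was a duplicate.
    """
    # Identical descriptor contents always yield the same digest.
    digest = hashlib.sha1(descriptor).hexdigest()
    # Hypothetical layout: out_dir/<1st hex char>/<2nd hex char>/<digest>
    target = out_dir / digest[0] / digest[1] / digest
    if target.exists():
        return False  # already stored, so skip it
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(descriptor)
    return True
```

With a check like this, only descriptors for which the function returns True would also be appended to the file under recent/.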
Priority is minor, because the only effect is some additional load on clients that parse the same descriptors more than once. Other than that it's mostly harmless.