Bridge descriptors in CollecTor's recent/ directory contain many duplicates
The recent/ directory should only contain new descriptors, and ideally no duplicates. I just found that the latter is not the case:
$ grep -c "@type" recent/bridge-descriptors/server-descriptors/2014-07-22-07-04-02-server-descriptors
18175
$ grep -c "@type" recent/bridge-descriptors/extra-infos/2014-07-22-07-04-02-extra-infos
9723
Compare this to relay descriptors:
$ grep -c "@type" recent/relay-descriptors/server-descriptors/2014-07-22-07-05-52-server-descriptors
931
$ grep -c "@type" recent/relay-descriptors/extra-infos/2014-07-22-07-05-52-extra-infos
930
$ grep -c "@type" recent/relay-descriptors/microdescs/micro/2014-07-22-07-05-52-micro
30
The reason is that, for relays, only novel descriptors are downloaded and stored to disk, whereas the bridge descriptor tarballs we parse are full snapshots of Tonga's cached descriptor files. We need to add a check for whether we already have a given sanitized bridge descriptor, and store it only if we don't.
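A minimal sketch of such a check, in Python for illustration only. It is not CollecTor's actual code: the function names `split_descriptors` and `filter_new` are hypothetical, and it assumes that every descriptor in a file starts with an "@type" annotation line and that a digest over the sanitized descriptor text is an acceptable identity.

```python
import hashlib


def split_descriptors(text):
    """Split a concatenated descriptor file into individual descriptors.

    Assumption: every descriptor begins with an "@type" annotation line,
    so each such line marks the start of a new descriptor.
    """
    descriptors = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.startswith("@type") and current:
            descriptors.append("".join(current))
            current = []
        current.append(line)
    if current:
        descriptors.append("".join(current))
    return descriptors


def filter_new(descriptors, seen_digests):
    """Return only the descriptors whose digest is not in seen_digests.

    seen_digests is a set of hex digests and is updated in place, so
    calling this again with the same descriptors returns nothing new.
    """
    new = []
    for descriptor in descriptors:
        digest = hashlib.sha256(descriptor.encode("utf-8")).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            new.append(descriptor)
    return new
```

Because each tarball is a full snapshot, the set of seen digests would have to be kept across runs (for example, written to disk) for the check to have any effect. Only the descriptors returned by `filter_new` would then be written to recent/.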
Priority is minor, because this only adds some load on clients that end up parsing the same descriptors more than once. Other than that, it's mostly harmless.