BridgeDB re-assigns unallocated bridges from/to file buckets unnecessarily

It looks like BridgeDB re-assigns unallocated bridges from/to file buckets unnecessarily. That is, a bridge that keeps running from one network status to the next might be removed from a file bucket and replaced with another bridge. When file buckets are in use, this leads to quick enumeration of all bridges in the unallocated pool, since each new dump can expose bridges that earlier dumps did not.
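For contrast, the assignment I would expect can be sketched in shell. The function refill_bucket below is hypothetical and is not BridgeDB's code. It assumes three plain files with one IP:port per line (the previous bucket contents, the bridges that are currently running, and the unallocated pool), keeps every previous member that is still running, and draws from the pool only to fill vacancies.

```shell
# Hypothetical sketch (not BridgeDB code): refill a file bucket while keeping
# every previous member that is still running.
# Usage: refill_bucket OLD_BUCKET RUNNING_LIST UNALLOCATED_POOL SIZE
refill_bucket() {
  old=$1 running=$2 pool=$3 size=$4
  kept=$(mktemp) fill=$(mktemp)
  # Keep previous members that are still running, in their original order.
  grep -Fxf "$running" "$old" | head -n "$size" > "$kept"
  n=$(wc -l < "$kept" | tr -d ' ')
  # Fill only the vacant slots with running pool bridges not already kept.
  grep -Fxf "$running" "$pool" | grep -Fxvf "$kept" | head -n $((size - n)) > "$fill"
  cat "$kept" "$fill"
  rm -f "$kept" "$fill"
}
```

With this policy a bucket only changes when one of its bridges goes down, so repeated dumps do not leak additional bridges from the unallocated pool.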

A second bug seems to be that BridgeDB appends bridges to file buckets instead of overwriting the files. As a result, the files that external distributors use contain duplicate entries.
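The difference between appending and overwriting can be sketched in shell. The helper write_bucket is hypothetical and not part of BridgeDB; it replaces the bucket file on every dump instead of appending with >>, so running it twice never accumulates duplicate lines. Writing to a temporary file first and renaming it means a distributor reading the file never sees a half-written dump.

```shell
# Hypothetical sketch: replace a bucket file with the dump read from stdin.
# Usage: some_dump_command | write_bucket twitter-2011-03-09.brdgs
write_bucket() {
  target=$1
  # Temporary file in the same directory, so the final mv is a plain rename.
  tmp=$(mktemp "$target.XXXXXX") || return 1
  # Read the whole dump into the temporary file first.
  cat > "$tmp" || { rm -f "$tmp"; return 1; }
  # mktemp creates the file with mode 0600; make it readable for distributors.
  chmod 644 "$tmp"
  # Renaming over the target replaces the old contents in one step.
  mv -f "$tmp" "$target"
}
```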

Here's how one can reproduce the problem using sanitized bridge descriptors. Yes, this description is lengthy and ugly, but it works for testing BridgeDB in general, even if one doesn't have the original bridge descriptors handy.

Download and extract the sanitized bridge descriptors from January 2009:

https://metrics.torproject.org/data/bridge-descriptors-2009-01.tar.bz2

Create a single bridge-descriptors file containing all bridge descriptors from that month.

$ cd bridge-descriptors-2009-01/
$ echo "@purpose bridge" > purpose
$ echo "router-signature" > routersignature
$ find server-descriptors/ -type f | xargs -I{} cat purpose {} routersignature > bridge-descriptors
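To make sure the concatenation picked up every descriptor, one can compare the number of "@purpose bridge" headers with the number of files found. check_descriptor_count is a hypothetical helper, not part of BridgeDB, and it assumes that no descriptor file already contains its own "@purpose bridge" line.

```shell
# Hypothetical sanity check: one "@purpose bridge" header per descriptor file.
# Usage: check_descriptor_count server-descriptors bridge-descriptors
check_descriptor_count() {
  dir=$1 out=$2
  files=$(find "$dir" -type f | wc -l | tr -d ' ')
  headers=$(grep -c '^@purpose bridge$' "$out")
  echo "files=$files descriptors=$headers"
  # Succeeds only if every file contributed exactly one header.
  [ "$files" -eq "$headers" ]
}
```

Run it from inside bridge-descriptors-2009-01/ with the arguments shown in the usage comment.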

Copy the new bridge-descriptors file to BridgeDB's working directory (here: ~/run/).

$ cd ~/run/
$ cp bridge-descriptors-2009-01/bridge-descriptors .

Also copy the sanitized network status file from 2009-01-10 00:07:04 to BridgeDB's working directory and rename it to networkstatus-bridges.

$ cp bridge-descriptors-2009-01/statuses/10/20090110-000704-4A0CCD2DDC7995083D73F5D667100C8A5831F16D networkstatus-bridges

Configure BridgeDB to write 4 bridges to the file bucket twitter, and otherwise keep the default configuration. Start BridgeDB (note that it may take about 30 seconds to digest the 8.5 MB bridge-descriptors file) and tell it to dump bridges to file buckets.
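A minimal sketch of that configuration step, assuming BridgeDB reads a Python-syntax config file in which file buckets are set through a FILE_BUCKETS dictionary mapping bucket names to bridge counts; please check the option name against your own bridgedb.conf. set_file_bucket is a hypothetical helper that appends an override line.

```shell
# Hypothetical helper: append a file-bucket override to a BridgeDB config.
# ASSUMPTION: the config is Python syntax and uses a FILE_BUCKETS dict
# mapping bucket name -> number of bridges.
# Usage: set_file_bucket bridgedb.conf twitter 4
set_file_bucket() {
  conf=$1 name=$2 count=$3
  printf 'FILE_BUCKETS = {"%s": %d}\n' "$name" "$count" >> "$conf"
}
```

Under that assumption the last FILE_BUCKETS assignment in the file wins, so this override also drops any other buckets configured earlier.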

The result is a new file twitter-2011-03-09.brdgs with this content (may vary for you):

10.134.79.249:443
10.236.199.173:443
10.116.76.140:9001
10.51.76.151:18443

It also writes this unallocated-2011-03-09.brdgs file (again, content may vary):

10.126.198.237:443
10.251.69.61:9003
10.31.186.235:49001
10.81.88.5:9001

Replace networkstatus-bridges with the network status from 2009-01-10 03:07:09, roughly three hours later:

$ cp bridge-descriptors-2009-01/statuses/10/20090110-030709-4A0CCD2DDC7995083D73F5D667100C8A5831F16D networkstatus-bridges
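To check that a bridge such as 10.236.199.173:443 really kept running across the two statuses, one can list the bridges flagged Running in each file and intersect the lists. This sketch assumes the usual network-status layout, where each "r" line reads r nickname identity digest date time IP ORPort DirPort and is followed by an "s" line with the flags; running_bridges and still_running are hypothetical helper names.

```shell
# List "IP:ORPort" for every bridge whose "s" line carries the Running flag.
# ASSUMPTION: "r nick identity digest date time IP ORPort DirPort" lines,
# each followed by an "s" line listing that bridge's flags.
running_bridges() {
  awk '$1 == "r" { addr = $7 ":" $8; next }
       $1 == "s" && addr != "" {
         for (i = 2; i <= NF; i++) if ($i == "Running") print addr
         addr = ""
       }' "$1" | LC_ALL=C sort -u
}

# Print bridges that are Running in both network statuses.
# Usage: still_running OLD_STATUS NEW_STATUS
still_running() {
  a=$(mktemp) b=$(mktemp)
  running_bridges "$1" > "$a"
  running_bridges "$2" > "$b"
  LC_ALL=C comm -12 "$a" "$b"
  rm -f "$a" "$b"
}
```

Pass the 00:07:04 status file from statuses/10/ as the first argument and the new networkstatus-bridges as the second.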

Send BridgeDB a SIGHUP, wait at least 30 seconds, and tell it to dump bridges to file buckets again.
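The reload-and-dump step can be scripted with kill. The helper reload_and_dump is hypothetical: it sends SIGHUP, waits RELOAD_WAIT seconds (30 by default, matching the advice above), and then sends the dump signal. Which signal makes BridgeDB dump its file buckets is an assumption here; DUMP_SIGNAL defaults to USR1, so verify it against your BridgeDB version before relying on it.

```shell
# Hypothetical helper: reload BridgeDB, wait, then ask it to dump file buckets.
# ASSUMPTION: the dump signal defaults to USR1; override with DUMP_SIGNAL.
# Usage: reload_and_dump PID
reload_and_dump() {
  pid=$1
  kill -s HUP "$pid" || return 1
  # Give BridgeDB time to re-read the large descriptor file.
  sleep "${RELOAD_WAIT:-30}"
  kill -s "${DUMP_SIGNAL:-USR1}" "$pid"
}
```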

Here's my new twitter-2011-03-09.brdgs file:

10.134.79.249:443
10.236.199.173:443
10.116.76.140:9001
10.51.76.151:18443
10.51.76.151:18443
10.237.143.0:443
10.241.115.62:443
10.239.76.198:443

And my unallocated-2011-03-09.brdgs file:

10.126.198.237:443
10.251.69.61:9003
10.31.186.235:49001
10.81.88.5:9001
10.126.198.237:443
10.251.69.61:9003
10.31.186.235:49001
10.81.88.5:9001
10.134.79.249:443
10.236.199.173:443
10.116.76.140:9001

There are two bugs:

  • Why was 10.236.199.173:443 (2nd line in twitter-2011-03-09.brdgs) removed from this file bucket and put back into the unallocated ring (second-to-last line in unallocated-2011-03-09.brdgs)? Using the new, not-yet-merged pool assignments patch, I confirmed that this is the same bridge. This is the first bug described above.

  • Why are IP:port lines appended to these files? This is the second bug described above.
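Both symptoms can be checked mechanically on the dumped files. The helpers below are hypothetical: find_duplicates reports lines that occur more than once (the second bug), and dropped_from_bucket compares a saved copy of the first dump with the last SIZE lines of the current bucket file, which, because the files are appended to, is where the latest dump ends up (the first bug).

```shell
# Second bug: print every line that occurs more than once in a bucket file.
find_duplicates() {
  LC_ALL=C sort "$1" | LC_ALL=C uniq -d
}

# First bug: print bridges present in the previous dump but missing from the
# latest one, taken to be the last SIZE lines of the (appended-to) file.
# Usage: dropped_from_bucket PREVIOUS_DUMP CURRENT_FILE SIZE
dropped_from_bucket() {
  prev=$(mktemp) cur=$(mktemp)
  LC_ALL=C sort -u "$1" > "$prev"
  tail -n "$3" "$2" | LC_ALL=C sort -u > "$cur"
  LC_ALL=C comm -23 "$prev" "$cur"
  rm -f "$prev" "$cur"
}
```

For the twitter bucket above, with the first dump saved as twitter-first.brdgs (a hypothetical file name), dropped_from_bucket twitter-first.brdgs twitter-2011-03-09.brdgs 4 should list 10.116.76.140:9001, 10.134.79.249:443 and 10.236.199.173:443, and find_duplicates twitter-2011-03-09.brdgs should print 10.51.76.151:18443.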