Export BridgeDB's pool assignments
Following the (actually newly introduced) tradition of summarizing IRC discussions in Trac or email, here's what Roger and I discussed today:
I'm interested in learning whether keeping a certain fraction of bridges unassigned, that is, not distributing them via email or HTTP, is a good idea. AIUI, the idea was to keep a small set of fresh bridges in reserve in case we come up with a new distribution channel or want to give out fresh bridges manually. This idea might fail if people who run a bridge that ends up in the unallocated pool decide that their bridge is not useful. They might turn off their bridge, or delete their keys in order to get a new fingerprint and end up in another pool. If many people do so, we might be better off allocating all bridges to pools directly and starting a new pool whenever there's a new distribution channel. Given the high churn of bridges, that new pool might have a sufficient set of fresh bridges very soon. Also, if we want to give out bridges manually, we could give out bridges from the other pools, which may have higher uptime than bridges in the unallocated pool. Allocating all bridges also means we don't have to explain to bridge operators why their bridge is still useful even if it doesn't have any users right now.
So, we need to export pool assignments from BridgeDB somehow. Currently, we have log files of the following format:
    Jan 10 01:41:14 [DEBUG] Leaving bridge 22.214.171.124:443 dddddddddddddddddddddddddddddddddddddddd unallocated
    Jan 10 01:41:14 [DEBUG] Adding bridge 126.96.36.199:443 eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee to IP ring 1 (port-443 subring)
    Jan 10 01:41:14 [DEBUG] Adding bridge 188.8.131.52:443 eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee to IP ring 1 (stable subring)
    Jan 10 01:41:14 [DEBUG] Adding bridge 184.108.40.206:443 eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee to IP ring 1
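As a stopgap, assignments can be scraped out of log lines like these. The following is only a rough sketch: the regular expression is inferred from the sample lines above, not taken from the BridgeDB source, and the function name `assignments_from_log` is made up for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample log lines only (an assumption);
# any other BridgeDB log message is silently skipped.
LOG_LINE = re.compile(
    r"\[DEBUG\] (?:Leaving|Adding) bridge (?P<addr>\S+) "
    r"(?P<fp>[0-9a-fA-F]{40}) (?:unallocated|to (?P<ring>.+))$"
)


def assignments_from_log(lines):
    """Map fingerprint -> list of ring names; an empty list means unallocated."""
    pools = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match is None:
            continue
        # Touching the defaultdict entry records the bridge even if unallocated.
        rings = pools[match.group("fp").lower()]
        if match.group("ring") is not None:
            rings.append(match.group("ring"))
    return dict(pools)
```

This works only as long as the log wording stays exactly the same, which is one more reason to want a dedicated export format.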
If we want to analyze bridge pool assignments we need a better data format than this log format. Here's a proposed data format for bridge pool assignments (with sanitized IP addresses and fingerprints):
    bridge-pool-assignment 2011-01-10 01:41:14
    b 127.0.0.1:443 abcdef0123456789abcdef0123456789abcdef01
    b 127.0.0.1:443 0123456789abcdef0123456789abcdef01234567
    s IP ring 1 (port-443 subring)
    s IP ring 1 (stable subring)
    s IP ring 1
The timestamp in the bridge-pool-assignment line is the time when the assignment is written to disk (twice an hour). Lines starting with b contain the IP address, port, and fingerprint of a bridge. Lines starting with s contain the rings or subrings that a bridge is allocated to. For sanitizing purposes, I replaced bridge IP addresses with 127.0.0.1 and bridge identities with their SHA-1 hashes. That's the same approach that we take for sanitizing bridge descriptors.
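To make the proposal concrete, here is a minimal parser sketch for this format. It assumes that each s line belongs to the most recent b line above it, and that a b line with no s lines means the bridge is unallocated; the proposal does not spell out either point. The names `BridgeAssignment` and `parse_pool_assignment` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BridgeAssignment:
    address: str        # "IP:port" as written on the b line
    fingerprint: str    # 40 hex characters
    rings: list = field(default_factory=list)  # empty list = unallocated (assumed)


def parse_pool_assignment(text):
    """Return (timestamp, [BridgeAssignment, ...]) for one assignment document."""
    timestamp = None
    bridges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "bridge-pool-assignment":
            timestamp = rest
        elif keyword == "b":
            address, fingerprint = rest.split(" ", 1)
            bridges.append(BridgeAssignment(address, fingerprint))
        elif keyword == "s":
            # Assumption: an s line refers to the preceding b line.
            if not bridges:
                raise ValueError("'s' line before any 'b' line")
            bridges[-1].rings.append(rest)
        else:
            raise ValueError(f"unknown keyword: {keyword!r}")
    return timestamp, bridges
```

Run on the example above, this would give one bridge with no rings and one bridge with three ring entries, matching the log excerpt.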
Possible questions that I'm trying to answer with these data are:
Do bridges ever switch pools?
Is bridge uptime affected by the pool assignment?
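The first question could be approached roughly as follows, given a time-ordered series of parsed assignments. This is a sketch only: the input shape (a list of timestamp and fingerprint-to-pool mappings) and the function name `pool_switches` are assumptions made for illustration, and "pool" here means whatever top-level label the analysis settles on, e.g. "IP ring 1" or "unallocated".

```python
def pool_switches(snapshots):
    """Find bridges whose pool differs from the one they were last seen in.

    snapshots: iterable of (timestamp, {fingerprint: pool}) in time order.
    Returns a list of (timestamp, fingerprint, old_pool, new_pool).
    """
    last_seen = {}
    switches = []
    for timestamp, pools in snapshots:
        for fingerprint, pool in pools.items():
            old = last_seen.get(fingerprint)
            if old is not None and old != pool:
                switches.append((timestamp, fingerprint, old, pool))
            # Bridges absent from a snapshot keep their last known pool.
            last_seen[fingerprint] = pool
    return switches
```

Note that this compares fingerprints only, so an operator who deletes their keys to get a new fingerprint shows up as a new bridge, not as a switch.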
Are there reasons not to publish the sanitized versions of these bridge pool assignments? Are any sensitive data left that we need to remove?
Can we change the BridgeDB code to export its pool assignment in this format (without the sanitizing which I would do on a metrics machine)?
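For reference, the sanitizing step on the metrics side could look roughly like this. It is a sketch under one explicit assumption: the SHA-1 hash is taken over the 20 raw bytes of the fingerprint, not over its hex string. The function name `sanitize_b_line` is hypothetical.

```python
import hashlib


def sanitize_b_line(line):
    """Replace the IP address with 127.0.0.1 and the fingerprint with its SHA-1 hash."""
    keyword, address, fingerprint = line.split()
    if keyword != "b":
        raise ValueError("not a b line")
    port = address.rsplit(":", 1)[1]
    # Assumption: hash the binary identity, not the 40-character hex form.
    hashed = hashlib.sha1(bytes.fromhex(fingerprint)).hexdigest()
    return f"b 127.0.0.1:{port} {hashed}"
```

The bridge-pool-assignment and s lines would pass through unchanged, since they contain no addresses or identities.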