Sanitize TCP ports in bridge descriptors
We should consider sanitizing TCP ports in bridge descriptors. Let's add a new sanitizing step between steps 3 and 4 of the process described here:
https://collector.torproject.org/#bridge-descriptors
- Replace TCP port with TCP port hash: It may be less obvious that TCP ports need to be sanitized, but an unusual TCP port used by a high-value bridge might still stand out and provide yet another way to locate and block the bridge.
- Each non-zero TCP port is replaced with
  H(port | bridge identity | secret)[:2] % 65535 + 1
  written as a decimal number. The input "port" is the 2-byte long binary representation of the TCP port. The "bridge identity" is the 20-byte long binary representation of the bridge's long-term identity fingerprint. The "secret" is a 33-byte long secure random string that changes once per month for all descriptors and statuses published in that month. H() is SHA-256. The [:2] operator means that we pick the 2 most significant bytes of the result. TCP ports that are 0 in the original descriptor are left unchanged.
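
A minimal Python sketch of the proposed step, to make the formula concrete. This is not CollecTor's code. The function name sanitize_port is hypothetical, and two details are assumptions the text above does not settle: the port is encoded in big-endian (network) byte order, and the two digest bytes are read as a big-endian unsigned integer.

```python
import hashlib
import secrets


def sanitize_port(port: int, identity: bytes, secret: bytes) -> int:
    """Return the sanitized TCP port for one bridge (hypothetical helper)."""
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 2 bytes")
    if len(identity) != 20:
        raise ValueError("bridge identity must be 20 bytes")
    if len(secret) != 33:
        raise ValueError("secret must be 33 bytes")

    # Ports that are 0 in the original descriptor are left unchanged.
    if port == 0:
        return 0

    # H(port | bridge identity | secret), with H = SHA-256.
    # Assumption: the 2-byte port encoding is big-endian.
    digest = hashlib.sha256(port.to_bytes(2, "big") + identity + secret).digest()

    # [:2] picks the 2 most significant bytes of the result.
    # Assumption: they are interpreted as a big-endian unsigned integer.
    top_two = int.from_bytes(digest[:2], "big")

    # % 65535 + 1 maps the value into 1..65535, so the output is never 0.
    return top_two % 65535 + 1


if __name__ == "__main__":
    identity = bytes(20)               # placeholder fingerprint, not a real bridge
    secret = secrets.token_bytes(33)   # would be regenerated once per month
    print(sanitize_port(9001, identity, secret))
```

Because the bridge identity is part of the hash input, two bridges using the same port get unrelated sanitized ports, and because the secret changes monthly, the same bridge's sanitized port also changes from month to month.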
In order to make this change we'll need to write and test the code and re-process all bridge descriptors since 2008. The last part is going to take at least a week, maybe longer.