Teach bridgedb how to handle descriptors with IPv6 addresses

added component::circumvention/bridgedb ipv6 owner::aagbsn parent::4563 priority::medium resolution::fixed status::closed type::enhancement labels

I just looked at Bridges.py and ran a quick test to see what BridgeDB does when giving it a bridge descriptor or bridge network status as described in proposal 186. The result is that BridgeDB simply ignores the "or-address" and "a" lines as described in proposal 186. BridgeDB continues to give out the bridge's IPv4 address.

That's good news in that we can deploy the proposal 186 changes without worrying about BridgeDB.

But of course, if we want BridgeDB to give out IPv6 addresses, we'll have to do what the ticket subject says.

Trac:
Type: task to enhancement

Trac:
Owner: N/A to ln5
Status: new to accepted

Trac:
Parent: N/A to #3563 (moved)

progress update:

code: https://github.com/aagbsn/bridgedb/tree/4297-ipv6-bridges

live examples (populated with fake bridges): https://tor.extc.org https://tor.extc.org/ipv6

Note that address/port combinations returned by bridgedb are selected from the or-address lines of the bridge. The randomly generated examples are fully populated (8 or-address lines, 16 portspec entries), so the bridge lines appear random but actually aren't. Perhaps this is the wrong approach. Comments?

remaining: ipv6 FILE_BUCKETS, ipv6 client connections (untested), better tests for class PortList

blocking: sanity check/code review. Can we allocate a VM for soft-launch/testing?

Trac:
Status: accepted to needs_review

Can we allocate a VM for soft-launch/testing?

ponticum.tpo (thanks Peter!)

karsten, aagbsn: do you know the current status of #4297 (moved)
[12:57] ln5: when I last tested it, it gave out ipv6 bridges via https, but not via email.
[12:58] also, it took the ipv6 addresses from or-address lines, not from a lines (which are not contained in statuses),
[12:59] karsten: can i quote you in the ticket?
and I think bridgedb didn't make ipv6 addresses persistent in its database. I might be wrong about the latter.
sure

Trac:
Parent: #3563 (moved) to #4563 (moved)

Aaron, I think you should own this ticket rather than me. Please grab it if you agree.

Trac:
Cc: N/A to aagbsn

Trac:
Owner: ln5 to aagbsn
Status: needs_review to accepted

Is branch 4297-ipv6-bridges-rebased-2 of user/aagbsn/bridgedb.git the right thing to test?

Yes, that's the most recent work.

However, during the course of development for #5027 (moved) (continuing from #4097 (moved), and not in parallel) several bugs were found and fixed in the #5027 (moved) branch.

e.g.

  master
        \__4097-ipv6-bridges
                            \__5027-allocate-bridges-by-country

What needs to happen:

Cleanup/backport of fixes will need to occur if 4097 is to be deployed in advance of 5027. This was started; those -rebased* branches are work-in-progress.
Read ipv6 addresses from "a" lines, rather than or-address lines. I don't think there are any such 'a' lines in networkstatus-bridges or bridge-descriptors yet. Is that right?
Make ipv6 addresses persistent in BridgeDB's database. The one place where the Bridge address seems to matter is in Bucket.py. Presently BridgeDB does not store ipv6 addresses in its database; probably an oversight. One solution would be to add a new table in BridgeDB's database for or-addresses in order to accommodate variable-length or-addresses. Presently Bucket.dumpBridges() just writes an address:port on each line, and each line represents a single bridge. Bucket.dumpBridges() could be modified to write multiple lines per bridge. Will it be a problem that a single bridge may be represented by multiple lines without any indication that this is the case?
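A rough sketch of what a separate or-addresses table could look like, using Python's sqlite3 module. The table names, column names, and helper functions are all made up for illustration; they are not BridgeDB's actual schema or API.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not BridgeDB's real layout.
SCHEMA = """
CREATE TABLE bridges (
    id INTEGER PRIMARY KEY,
    fingerprint TEXT UNIQUE NOT NULL
);
CREATE TABLE or_addresses (
    bridge_id INTEGER NOT NULL REFERENCES bridges(id),
    address TEXT NOT NULL,   -- dotted quad, or bracketed IPv6 literal
    port INTEGER NOT NULL,
    UNIQUE (bridge_id, address, port)
);
"""

def open_db(path=":memory:"):
    """Open a database and create the two tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_or_address(conn, fingerprint, address, port):
    """Record one address:port for a bridge; duplicates are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO bridges (fingerprint) VALUES (?)",
        (fingerprint,))
    (bridge_id,) = conn.execute(
        "SELECT id FROM bridges WHERE fingerprint = ?",
        (fingerprint,)).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO or_addresses (bridge_id, address, port) "
        "VALUES (?, ?, ?)",
        (bridge_id, address, port))

def or_addresses_for(conn, fingerprint):
    """Return every stored (address, port) for a bridge, in insertion order."""
    rows = conn.execute(
        "SELECT a.address, a.port FROM or_addresses a "
        "JOIN bridges b ON b.id = a.bridge_id "
        "WHERE b.fingerprint = ? ORDER BY a.rowid",
        (fingerprint,))
    return [(address, port) for address, port in rows]
```

One row per address:port means a bridge can carry any number of or-addresses without changing the bridges table.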

Replying to aagbsn:

However, during the course of development for #5027 (moved) (continuing from #4097 (moved), and not in parallel) several bugs were found and fixed in the #5027 (moved) branch.

Whoops, that should be #4297 (moved), not #4097 (moved) throughout

Replying to aagbsn:

Make ipv6 addresses persistent in BridgeDB's database. The one place where the Bridge address seems to matter is in Bucket.py. Presently BridgeDB does not store ipv6 addresses in its database; probably an oversight. One solution would be to add a new table in BridgeDB's database for or-addresses in order to accommodate variable-length or-addresses. Presently Bucket.dumpBridges() just writes an address:port on each line, and each line represents a single bridge. Bucket.dumpBridges() could be modified to write multiple lines per bridge. Will it be a problem that a single bridge may be represented by multiple lines without any indication that this is the case?

Storing IPv6 addresses in the database probably makes sense.

With respect to file buckets, hmm. We don't use file buckets right now, do we? That means we'll have to speculate what a hypothetical user would want the files to look like. I could imagine that a single line per bridge with "<ipv4 address:port>[ <ipv6 address:port>]*" would be a useful format, but I'm not even a hypothetical user. The single-line-per-bridge format also makes sense for #5482 (moved) where we're going to add stability and reachability information to file buckets, and those are per-bridge, too. What does arma think about this all?
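To make the single-line format concrete, here is a minimal sketch of a formatter. The function names are hypothetical and this is not how Bucket.dumpBridges() works today; it only illustrates the "<ipv4 address:port>[ <ipv6 address:port>]*" layout suggested above.

```python
def format_addrport(address, port):
    """Render address:port, bracketing bare IPv6 literals."""
    # Brackets keep the colon before the port unambiguous for IPv6.
    if ":" in address and not address.startswith("["):
        address = "[%s]" % address
    return "%s:%d" % (address, port)

def bucket_line(ipv4, ipv6_list=()):
    """Build one bucket-file line: the IPv4 entry, then any IPv6 entries."""
    parts = [format_addrport(*ipv4)]
    parts.extend(format_addrport(addr, port) for addr, port in ipv6_list)
    return " ".join(parts)
```

For example, `bucket_line(("192.0.2.1", 443), [("2001:db8::1", 9001)])` gives `192.0.2.1:443 [2001:db8::1]:9001`.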

Trac:
Cc: aagbsn to aagbsn, arma

Replying to karsten:

Replying to aagbsn:

Make ipv6 addresses persistent in BridgeDB's database. The one place where the Bridge address seems to matter is in Bucket.py. Presently BridgeDB does not store ipv6 addresses in its database; probably an oversight. One solution would be to add a new table in BridgeDB's database for or-addresses in order to accommodate variable-length or-addresses. Presently Bucket.dumpBridges() just writes an address:port on each line, and each line represents a single bridge. Bucket.dumpBridges() could be modified to write multiple lines per bridge. Will it be a problem that a single bridge may be represented by multiple lines without any indication that this is the case?

Storing IPv6 addresses in the database probably makes sense.

With respect to file buckets, hmm. We don't use file buckets right now, do we? That means we'll have to speculate what a hypothetical user would want the files to look like. I could imagine that a single line per bridge with "<ipv4 address:port>[ <ipv6 address:port>]*" would be a useful format, but I'm not even a hypothetical user. The single-line-per-bridge format also makes sense for #5482 (moved) where we're going to add stability and reachability information to file buckets, and those are per-bridge, too. What does arma think about this all?

What do we do with bridges that listen on multiple ports or multiple addresses? (Or both?) Do you mean, they should be on a single line? Do we want to give out all the listening addresses and ports to a single client? Doesn't that circumvent the whole point of having multiple addresses and ports per bridge?

We want to avoid a scenario where a single bridge operator could represent a majority of bridges by listening on a few thousand ports. For that reason, BridgeDB does not treat each address:port as a bridge, but selects a valid address:port from the bridge returned by the bridge distributor (https, email). Perhaps we should do something similar here, and write a single line per bridge, along with stability and reachability information. Unfortunately, that information could be different for each address and the current implementation does not ensure that the same requesting (ip, email) will get the same address:port (Hmm. #5948 (moved))
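One possible way to give the same requester the same address:port every time is to derive the choice from a keyed hash of the requester and the bridge. This is only a sketch of that idea, not what BridgeDB currently does; the function name, the key, and the "client_id" parameter (an IP or email string) are assumptions.

```python
import hashlib
import hmac

def pick_addrport(secret_key, client_id, fingerprint, addrports):
    """Pick one (address, port) for a bridge, stable per client.

    secret_key is bytes; client_id and fingerprint are strings.
    """
    if not addrports:
        raise ValueError("bridge has no addresses")
    # Sort first so the result does not depend on descriptor line order.
    ordered = sorted(addrports)
    msg = ("%s|%s" % (client_id, fingerprint)).encode("utf-8")
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```

The pick stays the same only as long as the bridge's set of addresses does not change; adding or removing an or-address can move a client to a different entry.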

BridgeDB will also need a patch to support 'is blocked' status for each valid address (or even address:port, as a compact representation or in a database - 65535 ports * 4 bytes * 8 address lines could add up in a hurry) #5949 (moved)
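Spelling out the arithmetic in that parenthetical, taking the three factors exactly as written (65535 ports, 4 bytes per entry, 8 address lines) as the per-bridge worst case:

```python
PORTS = 65535          # possible port values, 1 through 65535
BYTES_PER_ENTRY = 4    # assumed storage per address:port entry
ADDRESS_LINES = 8      # address lines per bridge

def worst_case_bytes(ports=PORTS, bytes_per_entry=BYTES_PER_ENTRY,
                     address_lines=ADDRESS_LINES):
    """Upper bound on per-bridge storage if every port is tracked."""
    return ports * bytes_per_entry * address_lines

print(worst_case_bytes())  # 2097120 bytes, just under 2 MiB per bridge
```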

Replying to aagbsn:

What do we do with bridges that listen on multiple ports or multiple addresses? (Or both?) Do you mean, they should be on a single line? Do we want to give out all the listening addresses and ports to a single client? Doesn't that circumvent the whole point of having multiple addresses and ports per bridge?

We're talking about buckets here, right? That means we export bridges in the unallocated ring to a file to be mailed to people distributing them somehow. I don't know if these people would prefer a single line per bridge with all addresses/ports or one line per address/port.

Note that the number of addresses/ports per bridge is limited. Proposal 186 says there can be at most 8 additional addresses times 16 ports. Linus' implementation only allows for 1 additional address with 1 port, AFAIK.

We want to avoid a scenario where a single bridge operator could represent a majority of bridges by listening on a few thousand ports. For that reason, BridgeDB does not treat each address:port as a bridge, but selects a valid address:port from the bridge returned by the bridge distributor (https, email). Perhaps we should do something similar here, and write a single line per bridge, along with stability and reachability information.

Without knowing how bucket files will be used, I could imagine that selecting 1 IPv4 and 1 IPv6 address per bridge would be sufficient for most use cases.

Unfortunately, that information could be different for each address and the current implementation does not ensure that the same requesting (ip, email) will get the same address:port (Hmm. #5948 (moved) )

So, staying in the bucket case, two subsequent runs shouldn't include different addresses for the same bridge in the file. We could simply pick the first address for any given IP version or transport.
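A small sketch of "pick the first address for any given IP version", using the standard ipaddress module. The function name and the (address, port) tuple layout are assumptions for illustration.

```python
import ipaddress

def first_per_ip_version(addrports):
    """Map IP version (4 or 6) to the first (address, port) seen for it."""
    chosen = {}
    for address, port in addrports:
        # Strip the square brackets used around IPv6 literals.
        version = ipaddress.ip_address(address.strip("[]")).version
        chosen.setdefault(version, (address, port))
    return chosen
```

Because the result depends only on the order of the input list, two runs over the same descriptor produce the same file contents.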

BridgeDB will also need a patch to support 'is blocked' status for each valid address (or even address:port, as a compact representation or in a database - 65535 ports * 4 bytes * 8 address lines could add up in a hurry) #5949 (moved)

I wouldn't worry too much about database size here. But you're right that blocking information should be at bridge:address:port detail. If that makes things too complex, BridgeDB could only look at the bridge or bridge:address part.

Replying to karsten:

Replying to aagbsn:

What do we do with bridges that listen on multiple ports or multiple addresses? (Or both?) Do you mean, they should be on a single line? Do we want to give out all the listening addresses and ports to a single client? Doesn't that circumvent the whole point of having multiple addresses and ports per bridge?

We're talking about buckets here, right? That means we export bridges in the unallocated ring to a file to be mailed to people distributing them somehow. I don't know if these people would prefer a single line per bridge with all addresses/ports or one line per address/port.

I believe they should get a list of lines that can be fed into a Tor client. Cut-n-paste, keep it simple.

Note that the number of addresses/ports per bridge is limited. Proposal 186 says there can be at most 8 additional addresses times 16 ports. Linus' implementation only allows for 1 additional address with 1 port, AFAIK.

Correct me if I'm wrong, but doesn't the spec provide for port ranges?

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IP6ADDR | IP4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT | PORT "-" PORT
      PORT = a number between 1 and 65535 inclusive.

BridgeDB #4297 (moved) supports port ranges, at any rate.
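For reference, a minimal parser for the grammar quoted above, including the PORT "-" PORT form. This is an illustrative sketch, not the code in the #4297 branch; the function names are made up.

```python
def parse_portlist(portlist):
    """Expand a PORTLIST such as '9001,9003-9005' into a list of ports."""
    ports = []
    for spec in portlist.split(","):
        low, sep, high = spec.partition("-")
        start = int(low)
        end = int(high) if sep else start
        if not (1 <= start <= end <= 65535):
            raise ValueError("invalid PORTSPEC: %r" % spec)
        ports.extend(range(start, end + 1))
    return ports

def parse_or_address(line):
    """Split an or-address line into (address, [ports])."""
    keyword, _, rest = line.strip().partition(" ")
    if keyword != "or-address":
        raise ValueError("not an or-address line: %r" % line)
    # The last colon separates ADDRESS from PORTLIST, even for IPv6.
    address, _, portlist = rest.rpartition(":")
    if not address:
        raise ValueError("missing address or port: %r" % line)
    return address, parse_portlist(portlist)
```

Note that a single wide range expands to many entries, which is exactly the case the per-bridge selection is meant to keep from inflating one operator's share.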

We want to avoid a scenario where a single bridge operator could represent a majority of bridges by listening on a few thousand ports. For that reason, BridgeDB does not treat each address:port as a bridge, but selects a valid address:port from the bridge returned by the bridge distributor (https, email). Perhaps we should do something similar here, and write a single line per bridge, along with stability and reachability information.

Without knowing how bucket files will be used, I could imagine that selecting 1 IPv4 and 1 IPv6 address per bridge would be sufficient for most use cases.

Yes. Most bridges probably won't listen on multiple ports at first -- although it would be handy if the tor cloud images support multiple listening ports and/or addresses -- especially considering that more and more cloud providers offer ipv6 connectivity.

Unfortunately, that information could be different for each address and the current implementation does not ensure that the same requesting (ip, email) will get the same address:port (Hmm. #5948 (moved) )

So, staying in the bucket case, two subsequent runs shouldn't include different addresses for the same bridge in the file. We could simply pick the first address for any given IP version or transport.

That means that a bridge that gets (un)assigned to the bucket distributor will not utilize any of the additional addresses. Although, if the first address in the list gets blocked, it could be marked 'as blocked' and the next address in the list selected (if available).
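The fallback could be as simple as the following sketch, assuming blocked entries are kept as a set of (address, port) tuples. That representation and the function name are hypothetical; BridgeDB has no per-address blocking data yet (see #5949).

```python
def first_unblocked(addrports, blocked):
    """Return the first (address, port) not marked blocked, else None."""
    for addrport in addrports:
        if addrport not in blocked:
            return addrport
    return None
```

With nothing blocked this always returns the first entry, so the extra addresses only come into play once the earlier ones are marked.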

BridgeDB will also need a patch to support 'is blocked' status for each valid address (or even address:port, as a compact representation or in a database - 65535 ports * 4 bytes * 8 address lines could add up in a hurry) #5949 (moved)

I wouldn't worry too much about database size here. But you're right that blocking information should be at bridge:address:port detail. If that makes things too complex, BridgeDB could only look at the bridge or bridge:address part.

BridgeDB presently only looks at the bridge, because bridges only had one address:port.

I don't think it's too complex, but this enhancement shouldn't block deployment of #4297 (moved) if it turns out to be harder than anticipated.

Replying to aagbsn:

I believe they should get a list of lines that can be fed into a Tor client. Cut-n-paste, keep it simple.

Makes sense.

Correct me if I'm wrong, but doesn't the spec provide for port ranges?

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IP6ADDR | IP4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT | PORT "-" PORT
      PORT = a number between 1 and 65535 inclusive.

That's an older version of the proposal/spec. The current dir-spec.txt doesn't allow the PORT "-" PORT part anymore.

Replying to karsten:

Replying to aagbsn:

I believe they should get a list of lines that can be fed into a Tor client. Cut-n-paste, keep it simple.

Makes sense.

Correct me if I'm wrong, but doesn't the spec provide for port ranges?

      or-address SP ADDRESS ":" PORTLIST NL

      ADDRESS = IP6ADDR | IP4ADDR
      IPV6ADDR = an ipv6 address, surrounded by square brackets.
      IPV4ADDR = an ipv4 address, represented as a dotted quad.
      PORTLIST = PORTSPEC | PORTSPEC "," PORTLIST
      PORTSPEC = PORT | PORT "-" PORT
      PORT = a number between 1 and 65535 inclusive.

That's an older version of the proposal/spec. The current dir-spec.txt doesn't allow the PORT "-" PORT part anymore.

Oh dear. I thought that was a nice feature and fun to implement. Is it likely to come back in the future?

Replying to aagbsn:

Replying to karsten:

That's an older version of the proposal/spec. The current dir-spec.txt doesn't allow the PORT "-" PORT part anymore.

Oh dear. I thought that was a nice feature and fun to implement. Is it likely to come back in the future?

Probably not. See Nick's commit where he took port ranges out.

Trac:
Resolution: N/A to fixed
Status: accepted to closed

closed

mentioned in issue #4771 (moved)

mentioned in issue #5949 (moved)

mentioned in issue #6126 (moved)

mentioned in issue #12505 (moved)

mentioned in issue #15517 (moved)

moved to tpo/anti-censorship/bridgedb#4297 (closed)

mentioned in issue tpo/anti-censorship/bridgedb#15517 (closed)
