Let bridges indicate when they don't want BridgeDB to distribute their address

changed milestone to %Tor: 0.3.1.x-final in legacy/trac

added component::core tor/tor in Legacy / Trac milestone::Tor: 0.3.1.x-final in Legacy / Trac owner::nickm in Legacy / Trac points::.2 remaining in Legacy / Trac priority::medium in Legacy / Trac resolution::implemented in Legacy / Trac reviewer::isis in Legacy / Trac severity::normal in Legacy / Trac sponsor::M in Legacy / Trac status::closed in Legacy / Trac stem in Legacy / Trac tor-bridge in Legacy / Trac type::enhancement in Legacy / Trac labels

Definitely related to, and possibly a duplicate of legacy/trac#13727 (moved) and legacy/trac#13504 (moved).

I completely forgot that we had a discussion on this topic back in 2014. https://lists.torproject.org/pipermail/tor-dev/2014-October/007610.html I'm trying to remember what caused me to bring it up. At the time, we were in the middle of active probing research, and we definitely didn't want the public connecting to our honeypot bridges, so we set PublishServerDescriptor 0. I don't know why I would have cared about metrics for that experiment, though. Maybe it was some other reason.

Somewhat related are legacy/trac#7349 (moved) and https://lists.torproject.org/pipermail/tor-dev/2014-December/008028.html. A bridge might only want at PT port exposed through BridgeDB, because the presence of an easily detectable ORPort connection could blow the cover of, say, obfs4 on the same IP address.

The use case for this time is pretty narrow: testing whether censors scrape default bridges from the bundles, and how long it takes. The hypothesis that if a censor can scrape BridgeDB, then it can scrape bundles (therefore there is no reason to keep bundle bridges out of BridgeDB) is a reasonable one, but we can't assume it's true while running an experiment to test it. (It also would not surprise me if it is false: think of the times you unnecessarily elaborate engineering because you didn't adequately understand the problem; censors are not all world-class and they have these constraints too.) For this use case, I think the workaround of setting PublishServerDescriptor 1 while firewalling closed the ORPort is good enough.

My position is: all the Tor Browser default bridges should publish statistics, or else we are greatly undercounting users (the same goes for any bridge with a lot of users). Whether the Tor Browser default bridges also appear in BridgeDB is mostly immaterial, except in special cases like we're doing now. There are "default bridges" other than those in Tor Browser, for example those in Orbot; they don't have to be all the same or get burned at the same time. If we require all default bridges to be present in BridgeDB, and a censor is capable of scraping BridgeDB, then BridgeDB is a common weak link that will eventually get all the bridges blocked; whereas if they are not in BridgeDB, then it's as if we have different pools of bridges, the same way BridgeDB has smtp, http, etc. pools where the compromise of one doesn't lead to the compromise of another.

Trac:
Cc: asn, dcf, isis to asn, dcf, isis, mrphs

Trac:
Milestone: N/A to Tor: 0.3.0.x-final
Reviewer: N/A to N/A

My feature18329 branch implements this feature.

There's a new BridgeDistribution torrc option, and it passes along its argument into the new bridge-distribution-request line in the bridge descriptor.

I've left it deliberately opaque what the syntax is for the string -- the man page suggests that "none" is a good way to tell bridgedb you don't want it to give out your bridge address.

I've tested it on a running bridge and examined the resulting bridge descriptor, and it seems to work as intended.

Trac:
Status: new to needs_review

Trac:
Keywords: N/A deleted, review-group-12 added

Looks fine; needs a spec patch though.

Also, I think that promising some the semantics for "none" would be smart.

And we shouldn't actually merge this IMO until we get BridgeDB to respect it. Otherwise we'll be telling people that they can do something that won't actually work yet.

Trac:
Status: needs_review to needs_revision

Trac:
Reviewer: N/A to nickm

Trac:
Keywords: review-group-12 deleted, N/A added

Trac:
Points: N/A to .2 remaining

This should have a Stem patch to get BridgeDB to easily be able to respect the new line. Otherwise, IIRC, there's an "unrecogised_fields" dict for descriptors parsed with Stem, and I think currently this would be ending up in there.

Trac:
Keywords: N/A deleted, stem added

Replying to isis:

This should have a Stem patch to get BridgeDB to easily be able to respect the new line. Otherwise, IIRC, there's an "unrecogised_fields" dict for descriptors parsed with Stem, and I think currently this would be ending up in there.

I've created legacy/trac#21177 (closed) for tracking this.

We are about to freeze 030 and the patch from arma looks good to me. I agree with Nick that once BridgeDB has that capability (and documented!), we should merge this.

In the meantime, I'm moving this to 031 because I doubt very much that all the things will lined up by tomorrow. Also, here is an attempt at the spec patch: ticket18329_01

Trac:
Status: needs_revision to needs_review
Milestone: Tor: 0.3.0.x-final to Tor: 0.3.1.x-final

Hi David. For what it's worth a link is a lot handier than just saying it. Here's the patch...

https://gitweb.torproject.org/user/dgoulet/torspec.git/commit/?h=ticket18329_01

First thought is that this needs to formally define what 'info' looks like. See the other fields for examples.

Trac:
Status: needs_review to needs_revision

Hey, minor feedback: can we have the "info" be a uint_8 or a uint_16 instead of an arbitrary string? We're not really expecting there to be more than three or four distribution methods, plus "none", so "none" could just be "BridgeDistribution 0" and we can assign arbitrary integers to the others, like "1" means "distribute through however you want", 2" means "distribute through email", "3" means "distribute through https", and etc. for future distributors. Then the default is just "BridgeDistribution 1" and we tell people who want to turn it off to set it to 0, and all the other wingnuttier uses can be found if people RTM.

Hm. We do other stuff by number (consensus methods, protocol versions), so I think there's a precedent for that. What's the rationale?

Hm, you meant in the torrc too though? For that, usability does matter, though I may be forgetting the lesson of the bikeshed.

Replying to nickm:

Hm, you meant in the torrc too though? For that, usability does matter, though I may be forgetting the lesson of the bikeshed.

Yes, I meant in the torrc too, mostly under the assumption that most people that most bridge operators won't need to touch it. I'm no UX expert though, so I can default to whatever the results of the bikeshed were.

In my role as bikeshed factotum, I'd suggest that we just go with names here, with Tor restricted to only accept recognized names if they appear in torrc.

Looks like we're trying to make this ticket more complicated than it is.

The feature18329 branch is almost entirely about the new torrc option, so yes, it is all user-facing. I think we should stick with words, like "none", "email", "https", which makes it easily extensible to more values, without messing with Tor at all, if the future brings us new distribution strategies.

(If we used numbers instead, we would need to let the user configure a number that this Tor version doesn't know about, so people can choose other distribution approaches later. But then you'd need to keep everybody synchronized on what the numbers mean, and people could still list numbers that you haven't picked a definition for yet. tl;dr, I don't think it would solve much to use numbers, and I think it would indeed hurt the usability.)

The BridgeDB behavior can be very simple: if the field is there and bridgedb recognizes the value in the field, do the right thing with it, else if the field is there and bridgedb doesn't recognize it, don't distribute that bridge.

I agree with Damian that we could be a bit clearer on the torspec side. I think what dgoulet was going for was the same spec as the Contact field has. From Tor's perspective, it is essentially an "anything goes" string. We could change it to say "set of 'key and optional value' fields" or something if we liked, but I think the only effect of trying to constrain it here will be producing bugs in stem later if we decide to change it. Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.

Nick, what did you mean by "with Tor restricted to only accept recognized names if they appear in torrc"? You're hoping that we teach Tor more about the possible values of the field, and that it says something like "nope, never heard of that one, you can't set it" for ones it doesn't know about? Does that mean that people running Tor on (say) version 0.3.1.x will need to upgrade in order to list a name that we started using while (say) Tor 0.3.2.x was current? Why would we do that?

Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.

Hi Roger, the contact and platform fields would be particularly bad ones to emulate. They're the only fields in the spec that allow arbitrary bytes (they don't even need to be any form of recognizable text... and due to that in practice sometimes aren't).

It was a mistake to allow the contact and platform fields to be so nebulous. Hopefully we won't do that with any new ones. ;)

Replying to atagar:

It was a mistake to allow the contact and platform fields to be so nebulous. Hopefully we won't do that with any new ones. ;)

What was the mistake?

The mistake was letting these fields be so ill defined that they can be anything. Even garbage that is completely nonsensical and unparseable.

I'm strongly considering a proposal where any document we produce must be valid utf-8. This has a pretty small fallout currently (only 2 relays' descriptors would be rejected). I think we should just fix that mistake once and for all.

I'd love that. Definitely has my vote! :P

Replying to Sebastian:

I'm strongly considering a proposal where any document we produce must be valid utf-8. This has a pretty small fallout currently (only 2 relays' descriptors would be rejected). I think we should just fix that mistake once and for all.

+1

The character set stuff should probably move to legacy/trac#18938 (moved)?

Trac:
Cc: asn, dcf, isis, mrphs to asn, dcf, isis, mrphs, catalyst

Replying to arma:

[…]

I don't think it would solve much to use numbers, and I think it would indeed hurt the usability.

That's fair; you've got me convinced that the numbers thing would be bad usability-wise.

However, I'm still against "any bytes, any length, yolo."

The BridgeDB behavior can be very simple: if the field is there and bridgedb recognizes the value in the field, do the right thing with it, else if the field is there and bridgedb doesn't recognize it, don't distribute that bridge.

Why would we waste bandwidth passing around descriptors with junk in them? Why not just filter the junk during descriptor creation?

Nick, what did you mean by "with Tor restricted to only accept recognized names if they appear in torrc"? You're hoping that we teach Tor more about the possible values of the field, and that it says something like "nope, never heard of that one, you can't set it" for ones it doesn't know about? Does that mean that people running Tor on (say) version 0.3.1.x will need to upgrade in order to list a name that we started using while (say) Tor 0.3.2.x was current? Why would we do that?

Not to speak for Nick, but I think he meant just as you described: tor will say, "If you try to set a value which isn't in the list of things I recognise, I'm going to not include it in your descriptor."

To make this simpler moving forward, the whitelist should currently include the following recognised values: none, any, https, email, moat, and hyphae. The default is any.

This covers all current distributors, and the two which are planned for this year. (moat is the new distributor which sends Tor Launcher a CAPTCHA via a meek channel, and essentially acts exactly like the https distributor, but in XUL rather than HTML, in Tor Launcher. hyphae is the social distributor mentioned in legacy/trac#7520 (moved) and detailed here.) If we need to add a new distributor in the future — which I don't expect — you're correct that we'll need to get it added to tor's distributor whitelist. This is solvable by opening a ticket to get the new name added to the list, before starting work to build the distributor; this way, the name is likely recognised by the time the distributor is deployed.

I agree with Damian that we could be a bit clearer on the torspec side. I think what dgoulet was going for was the same spec as the Contact field has. From Tor's perspective, it is essentially an "anything goes" string. We could change it to say "set of 'key and optional value' fields" or something if we liked, but I think the only effect of trying to constrain it here will be producing bugs in stem later if we decide to change it. Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.

Here's draft text describing the torrc option:

"bridge-distribution-request" SP method NL

   [At most once]

   Each "method" describes how a Bridge address is distributed by BridgeDB. 
   Recognized methods are: "none", "any", "https", "email", "moat", "hyphae".
   If set to "none", BridgeDB will avoid distributing your bridge address.
   If set to "any", BridgeDB will choose how to distribute your bridge
   address. Choosing any of the other methods will tell BridgeDB to distribute
   your bridge via a specific method:

      * "https" specifies distribution via the web interface at
         https://bridges.torproject.org;
      * "email" specifies distribution via the email autoresponder at
        bridges@torproject.org;
      * "moat" specifies distribution via an interactive menu inside Tor
        Browser; and
      * "hyphae" specifies distribution via a cryptographically-secure,
        invitation-based system.

   (Default: "any")

Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)

Trac:
Cc: asn, dcf, isis, mrphs, catalyst to asn, dcf, isis, mrphs

Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)

Hi Isis, looks good except that we should specify what characters future methods can consist of. For instance, can future method names have spaces in it?

Each "method" describes how...

Use of the word 'each' here confuses me since the field can only appear once, and according to this can have only one value on this line.

(Oops, I accidentally removed catalyst! Sorry!)

Trac:
Cc: asn, dcf, isis, mrphs to asn, dcf, isis, mrphs, catalyst

Replying to arma:

My feature18329 branch implements this feature.

There's a new BridgeDistribution torrc option, and it passes along its argument into the new bridge-distribution-request line in the bridge descriptor.

https://gitweb.torproject.org/arma/tor.git/commit/?h=feature18329&id=da46e73142bd522f8ac7dfc9d1f113c6281aea85

Do we care about potentially having different distribution settings for the potentially multiple transports that may be defined in torrc? If someone has both ServerTransportPlugin obfs3 and ServerTransportPlugin obfs4, say, there would be no way to give them different BridgeDistribution settings.

Currently some torrc options are keyed on transport name (e.g. ServerTransportPlugin, ServerTransportListenAddr, ServerTransportOptions). Ought BridgeDistribution to be like that too, or is it too niche a use case?

By transport name would be the finest possible granularity. It would be cool if you could run multiple instances of obfs4 with different BridgeDistribution options, but you can't actually express multiple instances of the same transport name in torrc (see legacy/trac#11211 (moved)). So if someone wanted to do that, they would have to run multiple instances of tor with multiple fingerprints anyway.

Trac:
Cc: asn, dcf, isis, mrphs, catalyst to asn, dcf, isis, mrphs

Replying to atagar:

Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)

Hi Isis, looks good except that we should specify what characters future methods can consist of. For instance, can future method names have spaces in it?

Ah, good point. I'll specify printable ascii, no whitespace, plus - and _. So (KeywordChar | _)+ (and we'll optionally lowercase the letters input from the torrc). Does that sound okay?

Each "method" describes how...

Use of the word 'each' here confuses me since the field can only appear once, and according to this can have only one value on this line.

You're right, that should be taken out. (I first started writing something where the user could specify multiple comma-separated methods, e.g. email,https would choose email or https, but the logic for this is complicated and confusing, so I revised to allow just one "method".)

Here's the new draft:

    "bridge-distribution-request" SP Method NL                                  
                                                                                
        [At most once]                                                          
                                                                                
        The "Method" describes how a Bridge address is distributed by           
        BridgeDB. Recognized methods are: "none", "any", "https", "email",      
        "moat", "hyphae". If set to "none", BridgeDB will avoid distributing    
        your bridge address. If set to "any", BridgeDB will choose how to       
        distribute your bridge address. Choosing any of the other methods will  
        tell BridgeDB to distribute your bridge via a specific method:          
                                                                                
          - "https" specifies distribution via the web interface at             
             https://bridges.torproject.org;                                    
          - "email" specifies distribution via the email autoresponder at       
            bridges@torproject.org;                                             
          - "moat" specifies distribution via an interactive menu inside Tor    
            Browser; and                                                        
          - "hyphae" specifies distribution via a cryptographically-secure,     
            invitation-based system.                                            
                                                                                
        Potential future "Method" specifiers must be as follows:                
            Method = (KeywordChar | "_") +                                      
                                                                                
        (Default: "any")

Thanks Isis, looks good to me.

Replying to atagar:

Thanks Isis, looks good to me.

Okay, thanks for reviewing! The torspec patch is in my ticket18329_02 branch (based on David's branch from above).

So the next step is to go over Roger's patch again and see what needs tweaking.

Trac:
Milestone: Tor: 0.3.1.x-final to Tor: 0.3.2.x-final

Defer all needs_revision non-spec enhancements to 0.3.3.

Trac:
Milestone: Tor: 0.3.2.x-final to Tor: 0.3.3.x-final

Replying to isis:

Replying to atagar:

Thanks Isis, looks good to me.

Okay, thanks for reviewing! The torspec patch is in my ticket18329_02 branch (based on David's branch from above).

Hey nickm, do you think you could merge the spec patch so that atagar (or someone) can start on legacy/trac#21177 (closed)? Thanks!

Trac:
Status: needs_revision to needs_review
Milestone: Tor: 0.3.3.x-final to Tor: 0.3.2.x-final

Merged the spec patch, with an addendum about implementation status. Please let me know if it needs changes.

review-group-24 is now open.

Trac:
Keywords: N/A deleted, review-group-24 added

In my branch feature18329_029, I've rebased Roger's branch onto 0.2.9, and added some additional checks and documentation.

Most notably, I've made it so a bridge will always insert this kind of entry in its descriptor, whether or not the user has changed it from the default.

Trac:
Status: needs_review to accepted
Owner: N/A to nickm

Trac:
Status: accepted to needs_review

Trac:
Sponsor: N/A to SponsorM

Trac:
Reviewer: nickm to N/A

Oh shoot. I also wrote a patch for this last week, but didn't update the ticket.

I'll gladly review and then see if there's anything left from my patch that's useful (maybe unittests are still a useful addition). If so, I'll pick those parts out and put them in a new commit on top.

Trac:
Reviewer: N/A to isis

BTW, this also makes it trivial to fix legacy/trac#16564 (moved), since DirAuths can check for a "bridge-distribution" line in descriptors submitted to them and not include in the consensus those which do have it.

Great! Also feel free to declare that your patch is just plain better, if you think that's the right call.

Let bridges indicate when they don't want BridgeDB to distribute their address

Child items 0

Activity