Right now, bridges can decide whether they want to be a public bridge that gets distributed via BridgeDB or a private bridge that only gets used by clients who learn its address via some other, private channel. The default is that a bridge is a public bridges, unless it sets PublishServerDescriptor 0 in its torrc file. This works fine with respect to BridgeDB not distributing private bridges. But a lesser known problem is that a bridge that doesn't publish its descriptor also does not contribute to bridge usage statistics on Metrics that are based on bridge extra-info descriptors.
The major use case that comes to mind is a bundled bridge whose address is shipped together with Tor Browser or another application. In the past we tried to remind operators of these bridges to also publish descriptors, so that their statistics are included on Metrics. But it turns out that some censors, who carefully scrape bridge addresses from BridgeDB, do not extract bridge addresses from the various bundles. Still, bundled bridges see a large number of bridge users and we should really include them in the statistics.
Another use case could be private bridges that somebody sets up for themselves and their friends. Maybe these operators would be fine contributing to the statistics if that doesn't automatically mean they need to share their bridge with other users.
I think this feature is relatively easy to build. We would need:
a new descriptor line "bridgedb off", or something even more intuitive and extensible, that tells BridgeDB that this bridge's address should not be distributed,
a new torrc option or extension of an existing option, maybe "PublishServerDescriptor bridge-auth" or, again, something more intuitive, to include the line above in the descriptor, and
an extension of BridgeDB to ignore bridges with this line when parsing descriptors.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
I completely forgot that we had a discussion on this topic back in 2014.
https://lists.torproject.org/pipermail/tor-dev/2014-October/007610.html
I'm trying to remember what caused me to bring it up. At the time, we were in the middle of active probing research, and we definitely didn't want the public connecting to our honeypot bridges, so we set PublishServerDescriptor 0. I don't know why I would have cared about metrics for that experiment, though. Maybe it was some other reason.
The use case for this time is pretty narrow: testing whether censors scrape default bridges from the bundles, and how long it takes. The hypothesis that if a censor can scrape BridgeDB, then it can scrape bundles (therefore there is no reason to keep bundle bridges out of BridgeDB) is a reasonable one, but we can't assume it's true while running an experiment to test it. (It also would not surprise me if it is false: think of the times you unnecessarily elaborate engineering because you didn't adequately understand the problem; censors are not all world-class and they have these constraints too.) For this use case, I think the workaround of setting PublishServerDescriptor 1 while firewalling closed the ORPort is good enough.
My position is: all the Tor Browser default bridges should publish statistics, or else we are greatly undercounting users (the same goes for any bridge with a lot of users). Whether the Tor Browser default bridges also appear in BridgeDB is mostly immaterial, except in special cases like we're doing now. There are "default bridges" other than those in Tor Browser, for example those in Orbot; they don't have to be all the same or get burned at the same time. If we require all default bridges to be present in BridgeDB, and a censor is capable of scraping BridgeDB, then BridgeDB is a common weak link that will eventually get all the bridges blocked; whereas if they are not in BridgeDB, then it's as if we have different pools of bridges, the same way BridgeDB has smtp, http, etc. pools where the compromise of one doesn't lead to the compromise of another.
There's a new BridgeDistribution torrc option, and it passes along its argument into the new bridge-distribution-request line in the bridge descriptor.
I've left it deliberately opaque what the syntax is for the string -- the man page suggests that "none" is a good way to tell bridgedb you don't want it to give out your bridge address.
I've tested it on a running bridge and examined the resulting bridge descriptor, and it seems to work as intended.
Also, I think that promising some the semantics for "none" would be smart.
And we shouldn't actually merge this IMO until we get BridgeDB to respect it. Otherwise we'll be telling people that they can do something that won't actually work yet.
This should have a Stem patch to get BridgeDB to easily be able to respect the new line. Otherwise, IIRC, there's an "unrecogised_fields" dict for descriptors parsed with Stem, and I think currently this would be ending up in there.
This should have a Stem patch to get BridgeDB to easily be able to respect the new line. Otherwise, IIRC, there's an "unrecogised_fields" dict for descriptors parsed with Stem, and I think currently this would be ending up in there.
We are about to freeze 030 and the patch from arma looks good to me. I agree with Nick that once BridgeDB has that capability (and documented!), we should merge this.
In the meantime, I'm moving this to 031 because I doubt very much that all the things will lined up by tomorrow. Also, here is an attempt at the spec patch: ticket18329_01
Trac: Status: needs_revision to needs_review Milestone: Tor: 0.3.0.x-final to Tor: 0.3.1.x-final
Hey, minor feedback: can we have the "info" be a uint_8 or a uint_16 instead of an arbitrary string? We're not really expecting there to be more than three or four distribution methods, plus "none", so "none" could just be "BridgeDistribution 0" and we can assign arbitrary integers to the others, like "1" means "distribute through however you want", 2" means "distribute through email", "3" means "distribute through https", and etc. for future distributors. Then the default is just "BridgeDistribution 1" and we tell people who want to turn it off to set it to 0, and all the other wingnuttier uses can be found if people RTM.
Hm, you meant in the torrc too though? For that, usability does matter, though I may be forgetting the lesson of the bikeshed.
Yes, I meant in the torrc too, mostly under the assumption that most people that most bridge operators won't need to touch it. I'm no UX expert though, so I can default to whatever the results of the bikeshed were.
In my role as bikeshed factotum, I'd suggest that we just go with names here, with Tor restricted to only accept recognized names if they appear in torrc.
Looks like we're trying to make this ticket more complicated than it is.
The feature18329 branch is almost entirely about the new torrc option, so yes, it is all user-facing. I think we should stick with words, like "none", "email", "https", which makes it easily extensible to more values, without messing with Tor at all, if the future brings us new distribution strategies.
(If we used numbers instead, we would need to let the user configure a number that this Tor version doesn't know about, so people can choose other distribution approaches later. But then you'd need to keep everybody synchronized on what the numbers mean, and people could still list numbers that you haven't picked a definition for yet. tl;dr, I don't think it would solve much to use numbers, and I think it would indeed hurt the usability.)
The BridgeDB behavior can be very simple: if the field is there and bridgedb recognizes the value in the field, do the right thing with it, else if the field is there and bridgedb doesn't recognize it, don't distribute that bridge.
I agree with Damian that we could be a bit clearer on the torspec side. I think what dgoulet was going for was the same spec as the Contact field has. From Tor's perspective, it is essentially an "anything goes" string. We could change it to say "set of 'key and optional value' fields" or something if we liked, but I think the only effect of trying to constrain it here will be producing bugs in stem later if we decide to change it. Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.
Nick, what did you mean by "with Tor restricted to only accept recognized names if they appear in torrc"? You're hoping that we teach Tor more about the possible values of the field, and that it says something like "nope, never heard of that one, you can't set it" for ones it doesn't know about? Does that mean that people running Tor on (say) version 0.3.1.x will need to upgrade in order to list a name that we started using while (say) Tor 0.3.2.x was current? Why would we do that?
Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.
Hi Roger, the contact and platform fields would be particularly bad ones to emulate. They're the only fields in the spec that allow arbitrary bytes (they don't even need to be any form of recognizable text... and due to that in practice sometimes aren't).
It was a mistake to allow the contact and platform fields to be so nebulous. Hopefully we won't do that with any new ones. ;)
I'm strongly considering a proposal where any document we produce must be valid utf-8. This has a pretty small fallout currently (only 2 relays' descriptors would be rejected). I think we should just fix that mistake once and for all.
I'm strongly considering a proposal where any document we produce must be valid utf-8. This has a pretty small fallout currently (only 2 relays' descriptors would be rejected). I think we should just fix that mistake once and for all.
I don't think it would solve much to use numbers, and I think it would indeed hurt the usability.
That's fair; you've got me convinced that the numbers thing would be bad usability-wise.
However, I'm still against "any bytes, any length, yolo."
The BridgeDB behavior can be very simple: if the field is there and bridgedb recognizes the value in the field, do the right thing with it, else if the field is there and bridgedb doesn't recognize it, don't distribute that bridge.
Why would we waste bandwidth passing around descriptors with junk in them? Why not just filter the junk during descriptor creation?
Nick, what did you mean by "with Tor restricted to only accept recognized names if they appear in torrc"? You're hoping that we teach Tor more about the possible values of the field, and that it says something like "nope, never heard of that one, you can't set it" for ones it doesn't know about? Does that mean that people running Tor on (say) version 0.3.1.x will need to upgrade in order to list a name that we started using while (say) Tor 0.3.2.x was current? Why would we do that?
Not to speak for Nick, but I think he meant just as you described: tor will say, "If you try to set a value which isn't in the list of things I recognise, I'm going to not include it in your descriptor."
To make this simpler moving forward, the whitelist should currently include the following recognised values: none, any, https, email, moat, and hyphae. The default is any.
This covers all current distributors, and the two which are planned for this year. (moat is the new distributor which sends Tor Launcher a CAPTCHA via a meek channel, and essentially acts exactly like the https distributor, but in XUL rather than HTML, in Tor Launcher. hyphae is the social distributor mentioned in legacy/trac#7520 (moved) and detailed here.) If we need to add a new distributor in the future — which I don't expect — you're correct that we'll need to get it added to tor's distributor whitelist. This is solvable by opening a ticket to get the new name added to the list, before starting work to build the distributor; this way, the name is likely recognised by the time the distributor is deployed.
I agree with Damian that we could be a bit clearer on the torspec side. I think what dgoulet was going for was the same spec as the Contact field has. From Tor's perspective, it is essentially an "anything goes" string. We could change it to say "set of 'key and optional value' fields" or something if we liked, but I think the only effect of trying to constrain it here will be producing bugs in stem later if we decide to change it. Damian, what would you like to see the spec for the Contact field look like? Then we can use that here too.
Here's draft text describing the torrc option:
"bridge-distribution-request" SP method NL [At most once] Each "method" describes how a Bridge address is distributed by BridgeDB. Recognized methods are: "none", "any", "https", "email", "moat", "hyphae". If set to "none", BridgeDB will avoid distributing your bridge address. If set to "any", BridgeDB will choose how to distribute your bridge address. Choosing any of the other methods will tell BridgeDB to distribute your bridge via a specific method: * "https" specifies distribution via the web interface at https://bridges.torproject.org; * "email" specifies distribution via the email autoresponder at bridges@torproject.org; * "moat" specifies distribution via an interactive menu inside Tor Browser; and * "hyphae" specifies distribution via a cryptographically-secure, invitation-based system. (Default: "any")
Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)
Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)
Hi Isis, looks good except that we should specify what characters future methods can consist of. For instance, can future method names have spaces in it?
Each "method" describes how...
Use of the word 'each' here confuses me since the field can only appear once, and according to this can have only one value on this line.
There's a new BridgeDistribution torrc option, and it passes along its argument into the new bridge-distribution-request line in the bridge descriptor.
Do we care about potentially having different distribution settings for the potentially multiple transports that may be defined in torrc? If someone has both ServerTransportPlugin obfs3 and ServerTransportPlugin obfs4, say, there would be no way to give them different BridgeDistribution settings.
By transport name would be the finest possible granularity. It would be cool if you could run multiple instances of obfs4 with different BridgeDistribution options, but you can't actually express multiple instances of the same transport name in torrc (see legacy/trac#11211 (moved)). So if someone wanted to do that, they would have to run multiple instances of tor with multiple fingerprints anyway.
Damian, does that look okay from Stem's point of view? (Also, does this text make enough sense to the type of bridge operator who wants to micromanage their bridge?)
Hi Isis, looks good except that we should specify what characters future methods can consist of. For instance, can future method names have spaces in it?
Ah, good point. I'll specify printable ascii, no whitespace, plus - and _. So (KeywordChar | _)+ (and we'll optionally lowercase the letters input from the torrc). Does that sound okay?
Each "method" describes how...
Use of the word 'each' here confuses me since the field can only appear once, and according to this can have only one value on this line.
You're right, that should be taken out. (I first started writing something where the user could specify multiple comma-separated methods, e.g. email,https would choose email or https, but the logic for this is complicated and confusing, so I revised to allow just one "method".)
Here's the new draft:
"bridge-distribution-request" SP Method NL [At most once] The "Method" describes how a Bridge address is distributed by BridgeDB. Recognized methods are: "none", "any", "https", "email", "moat", "hyphae". If set to "none", BridgeDB will avoid distributing your bridge address. If set to "any", BridgeDB will choose how to distribute your bridge address. Choosing any of the other methods will tell BridgeDB to distribute your bridge via a specific method: - "https" specifies distribution via the web interface at https://bridges.torproject.org; - "email" specifies distribution via the email autoresponder at bridges@torproject.org; - "moat" specifies distribution via an interactive menu inside Tor Browser; and - "hyphae" specifies distribution via a cryptographically-secure, invitation-based system. Potential future "Method" specifiers must be as follows: Method = (KeywordChar | "_") + (Default: "any")
In my branch feature18329_029, I've rebased Roger's branch onto 0.2.9, and added some additional checks and documentation.
Most notably, I've made it so a bridge will always insert this kind of entry in its descriptor, whether or not the user has changed it from the default.
Trac: Status: needs_review to accepted Owner: N/Ato nickm
Oh shoot. I also wrote a patch for this last week, but didn't update the ticket.
I'll gladly review and then see if there's anything left from my patch that's useful (maybe unittests are still a useful addition). If so, I'll pick those parts out and put them in a new commit on top.
BTW, this also makes it trivial to fix legacy/trac#16564 (moved), since DirAuths can check for a "bridge-distribution" line in descriptors submitted to them and not include in the consensus those which do have it.