Currently, Tor encourages obfs4 bridges to have a static address. This reduces interruptions for users, since they won't need to frequently request new bridges, which has to be done manually. However, it also allows censors to block bridges by IP, leaving blocked bridges permanently inaccessible for some users.
Proposal
Support obfs4 bridges with a dynamic IP address by allowing users to perform an unattended update of the bridge's IP address. When a user retrieves an obfs4 bridge with a dynamic IP, they also receive a ticket. This ticket later lets them fetch an updated IP address for that bridge without solving a captcha or any other manual interaction. Tor Browser will do this on the user's behalf during the connection process, so the user experience is not affected when the bridge operator rotates the IP address.
The operator of the obfs4 bridge can then switch to a new IP when the bridge is blocked. This process can be automated if we have an API to detect blocks.
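To make the ticket idea above concrete, here is a minimal sketch in Go of what issuing and redeeming such a ticket could look like. This is purely illustrative: the Ticket fields, the signing scheme, and all function names are assumptions for this proposal, not an existing Tor or BridgeDB interface.

```go
package main

import (
	"crypto/ed25519"
	"encoding/json"
	"fmt"
	"time"
)

// Ticket binds a bridge fingerprint to an expiry time and is signed by the
// distributor, so redeeming it proves the client was already given this bridge.
// (Hypothetical structure; not an existing BridgeDB/rdsys type.)
type Ticket struct {
	BridgeFingerprint string    `json:"bridge_fingerprint"`
	Expires           time.Time `json:"expires"`
	Signature         []byte    `json:"signature"`
}

// ticketPayload is the byte string that actually gets signed.
func ticketPayload(fingerprint string, expires time.Time) []byte {
	payload, _ := json.Marshal(struct {
		F string    `json:"f"`
		E time.Time `json:"e"`
	}{fingerprint, expires})
	return payload
}

// issueTicket is what the distributor would do when first handing out a
// dynamic-IP bridge alongside its bridge line.
func issueTicket(priv ed25519.PrivateKey, fingerprint string, ttl time.Duration) Ticket {
	expires := time.Now().Add(ttl)
	return Ticket{
		BridgeFingerprint: fingerprint,
		Expires:           expires,
		Signature:         ed25519.Sign(priv, ticketPayload(fingerprint, expires)),
	}
}

// redeemTicket is what the distributor would do later: verify the ticket and,
// without any captcha, return the bridge's current bridge line (looked up by
// the supplied placeholder function).
func redeemTicket(pub ed25519.PublicKey, t Ticket, currentLine func(fp string) string) (string, error) {
	if time.Now().After(t.Expires) {
		return "", fmt.Errorf("ticket expired")
	}
	if !ed25519.Verify(pub, ticketPayload(t.BridgeFingerprint, t.Expires), t.Signature) {
		return "", fmt.Errorf("bad ticket signature")
	}
	return currentLine(t.BridgeFingerprint), nil
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(nil) // distributor's signing key
	ticket := issueTicket(priv, "0123456789ABCDEF0123456789ABCDEF01234567", 30*24*time.Hour)

	// Later, unattended: Tor Browser redeems the ticket to learn the new address.
	line, err := redeemTicket(pub, ticket, func(fp string) string {
		return "obfs4 203.0.113.5:443 " + fp + " cert=EXAMPLE iat-mode=0"
	})
	fmt.Println(line, err)
}
```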
Additional Background
The concept of a subscription (unattended proxy information update) is a feature supported by almost all Chinese anti-censorship tools. Here are some relevant discussions and screenshots.
An example of this is Shadowsocks SIP008, which is currently supported by Outline and many other clients such as SagerNet. (Conflict of interest declared.)
Here is a screenshot of the configuration screen for a "subscription" in SagerNet.
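For reference, a SIP008 "online configuration" document is roughly a JSON list of servers that the client re-fetches from a subscription URL at intervals. The Go sketch below shows approximately that shape; the field names are reproduced from memory of the spec and may not be exhaustive.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SIP008Server describes one proxy server entry in the subscription document.
// Field names are approximate; consult the actual SIP008 spec for the full set.
type SIP008Server struct {
	ID         string `json:"id"`
	Remarks    string `json:"remarks"`
	Server     string `json:"server"`
	ServerPort int    `json:"server_port"`
	Password   string `json:"password"`
	Method     string `json:"method"`
}

// SIP008Document is the top-level object the client fetches from its
// subscription URL and uses to replace its stored server list.
type SIP008Document struct {
	Version int            `json:"version"`
	Servers []SIP008Server `json:"servers"`
}

func main() {
	raw := `{"version":1,"servers":[{"id":"example","remarks":"demo","server":"198.51.100.7","server_port":8388,"password":"secret","method":"chacha20-ietf-poly1305"}]}`
	var doc SIP008Document
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		panic(err)
	}
	fmt.Printf("%d server(s), first at %s:%d\n", len(doc.Servers), doc.Servers[0].Server, doc.Servers[0].ServerPort)
}
```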
This was actually part of the original vision for Tor bridges, from back in 2006. The original feature still works: you can look up the descriptor of your bridge (with its current IP address) by querying the bridge authority with the bridge's fingerprint. The idea was that as long as you still have one working bridge, you can use it to reach the bridge authority and refresh your knowledge about your other bridges even if they have moved to new addresses: https://svn-archive.torproject.org/svn/projects/design-paper/blocking.html#subsec:relay-together
We never finished implementing that part of the design though, partly because domain fronting didn't exist back then so there wasn't a magic signaling channel to use if all of your bridges fail, and partly because we never sorted out how to organize the state and user experience properly on the Tor client side (see below).
Based on the original vision, let me suggest two improvements to the subscription idea here:
Rather than needing a separate ticket design, you can simply use the bridge's identity fingerprint as proof that you know about it. All components in Tor (e.g. metrics) are careful to keep the fingerprint secret, because the bridge authority lets you convert the fingerprint into an IP address.
If we want to get fancier, we could notice when a bridge goes offline completely (as opposed to changing its IP address or getting censored) and then when users ask for an update about that fingerprint, we could automatically give them a fresh different bridge, again without needing any captcha or other interaction. The reasoning is that their bridge went offline through no fault of the user, so we give them a "free" replacement.
That second improvement is something we need for systems like Salmon too, where we want a way to distinguish between "bridge got blocked" and "bridge went away" and we need to auto replace bridges that went away.
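A minimal sketch of the server-side behaviour described in the two improvements above, with all types and function names invented for illustration: the client presents a fingerprint it already knows, and the distributor either returns that bridge's current line (it only moved) or a "free" replacement (it went away entirely).

```go
package main

import "fmt"

type BridgeStatus int

const (
	BridgeOnline  BridgeStatus = iota // still running, possibly at a new address
	BridgeOffline                     // descriptor has not been seen for a while
)

type BridgeRecord struct {
	Fingerprint string
	BridgeLine  string // current bridge line, including the current IP
	Status      BridgeStatus
}

// refresh implements the policy sketched above: moved bridges are re-served with
// their current address, dead bridges earn the user a "free" replacement, and
// unknown fingerprints get nothing (so the fingerprint acts as proof of knowledge).
func refresh(db map[string]BridgeRecord, fingerprint string, replacement func() BridgeRecord) (string, bool) {
	rec, ok := db[fingerprint]
	if !ok {
		return "", false // never heard of it: no free bridge for you
	}
	if rec.Status == BridgeOffline {
		return replacement().BridgeLine, true // bridge went away through no fault of the user
	}
	return rec.BridgeLine, true // still running: hand back its current address
}

func main() {
	db := map[string]BridgeRecord{
		"AAAA": {Fingerprint: "AAAA", BridgeLine: "obfs4 192.0.2.10:443 AAAA cert=EXAMPLE iat-mode=0", Status: BridgeOnline},
		"BBBB": {Fingerprint: "BBBB", BridgeLine: "obfs4 192.0.2.11:443 BBBB cert=EXAMPLE iat-mode=0", Status: BridgeOffline},
	}
	fresh := func() BridgeRecord {
		return BridgeRecord{Fingerprint: "CCCC", BridgeLine: "obfs4 192.0.2.12:443 CCCC cert=EXAMPLE iat-mode=0", Status: BridgeOnline}
	}
	fmt.Println(refresh(db, "AAAA", fresh)) // current address of a bridge that only moved
	fmt.Println(refresh(db, "BBBB", fresh)) // free replacement for a bridge that went away
}
```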
And here are two design / logistical decisions we'll want to think through:
For the server side, do we want to make use of the already-existing functionality in the bridge authority, or do we want to build the feature again into bridgedb/rdsys? The reason to use the existing one in the bridge auth is because it's already there, but the reasons to move it to bridgedb are (a) because we have been slowly deprecating the whole idea of a bridge authority (e.g. the bridge auth no longer knows how to properly test reachability, since modern bridges use PTs like obfs4), and (b) then we could extend the idea more easily to do improvements like the 'replace with a fresh different bridge' approach above. So, I think I lean toward adding the functionality to bridgedb. Who knows, if we keep shrinking what the bridge auth does, maybe we can replace it with an onion service that just receives bridge descriptors and passes them to bridgedb. :)
For the client side, should we build this logic into Tor itself, or into a component of Tor Browser? It turned out to be quite awkward to build the original vision directly into Tor -- Tor thinks of the config file as what it should do, and the state file as what has happened to it, so if you learn about a fresh bridge address, do you modify the config file (no), or do you add it to the state file and then override what you see in the config file, or what? It's doable but we didn't find a way to make it elegant. That plus the fact that we want to deprecate C-Tor and replace it with Arti one day makes it likely that nobody will want to do major architectural changes inside C-Tor for this. But doing it in Tor Browser means that every new Tor-using app that wants the feature will have to build it too. Overall, I lean toward putting it in whatever component of Tor Browser knows how to talk moat.
I would assume that in most cases, if the bridge got blocked, the censor knows not only the IP of the bridge but also the fingerprint (or whatever mechanism we use to renew bridges). So getting the new IP with the fingerprint is not so useful, as the censor might use this same mechanism to update the firewall with the new IP.
But what @arma says makes a lot of sense: let you renew bridges if you know the fingerprint of some working bridges. That sounds very similar to Salmon, and for that kind of mechanism it might be better to follow Salmon or another protocol that has had serious research and simulation work behind it, and see what best limits the censor's options.
I see it might be useful to let people host bridges on dynamic IP addresses; home internet connections are getting pretty decent and it might help enlarge the bridge pool. So being able to get the new IP address of a bridge from its fingerprint is still useful by itself, even if it is not so useful against the censor, and it might be worth exploring how to make it work.
Could we make it so a bridge line consists only of the fingerprint and the PT type, with the rest of the data fetched from moat or something? I guess we don't want to do that on the tor side, as it would require being able to do domain fronting. It would not be hard to implement in TB, or as a simple client that tor talks to (as a PT?).
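As a very rough illustration of that idea (the short line format and the resolver are hypothetical, not an existing tor or moat feature), the expansion could look something like this:

```go
package main

import (
	"fmt"
	"strings"
)

// expandBridgeLine turns a hypothetical short form "obfs4 <FINGERPRINT>" into a
// full bridge line by asking a resolver (e.g. some moat-speaking component) for
// the current address and transport parameters of that fingerprint.
func expandBridgeLine(short string, resolve func(transport, fingerprint string) (addr, args string, err error)) (string, error) {
	parts := strings.Fields(short)
	if len(parts) != 2 {
		return "", fmt.Errorf("expected \"<transport> <fingerprint>\", got %q", short)
	}
	transport, fingerprint := parts[0], parts[1]
	addr, args, err := resolve(transport, fingerprint)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s %s %s %s", transport, addr, fingerprint, args), nil
}

func main() {
	line, err := expandBridgeLine("obfs4 0123456789ABCDEF0123456789ABCDEF01234567",
		func(transport, fingerprint string) (string, string, error) {
			// Stand-in for a domain-fronted moat query.
			return "203.0.113.5:443", "cert=EXAMPLE iat-mode=0", nil
		})
	fmt.Println(line, err)
}
```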
The work we are doing to automate censorship circumvention (bridgedb#40025 (closed)) provides bridges without a captcha, so it might reduce the need for stable IP addresses by making it easier to get new ones. But we might want to see how this goes before relying on it too much. It would also be nice to be able to have bridges with dynamic IP addresses for other distribution mechanisms too.
An adversary can block each new IP address as it discovers it, but with this system the operator can also switch IP addresses quickly. The current captcha-and-IP based bridge allocation system can be circumvented by resourceful adversaries using prisoner labour and ISP-provided proxies. If the addresses remain static, they can render most of the bridges inaccessible at comparatively low cost. The adversary takes time to block each IP address, but the user can connect instantly; as long as the process of changing IP addresses is automated, we are at an advantage.
A similar approach is used by SoftEther VPN's VPN Gate project. It uses an interesting method to deter adversaries from blocking by automatically obtaining IP addresses from the subscription. At the bottom of the page, it says:
Using the VPN Server List of VPN Gate Service as the IP Blocking List of your country's Censorship Firewall is prohibited by us. The VPN Server List sometimes contains wrong IP addresses. If you enter the IP address list into your Censorship Firewall, unexpected accidents will occur on the firewall. Therefore you must not use the VPN Server List for managing your Censorship Firewall's IP blocking list.
I wonder if they have placed some important IP addresses on it, resulting in incidents like this.
Location For Processing Dynamic Server Address Logic
I think there are two ideal locations for this logic.
moat: There could be two types of refresh: an attended refresh, in which the user gets new bridge information, and an unattended refresh, in which the previously known bridge lines are submitted to moat for renewal (a rough sketch follows after this list). This means no additional storage or special logic is required, and only minimal changes to Tor Browser are needed.
Chained PT: we could create a new PT that updates server information (including information about the subscription server), stores this information, sends service availability feedback to the subscription server, and invokes other PTs to do the actual proxy work. This would enable more advanced server information updates and availability observation that would, in theory, work for all kinds of PTs.
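Here is a minimal sketch of the unattended-refresh variant of the moat option above. The endpoint URL, JSON fields, and response format are all invented; the real moat protocol would need to define them, and the request would go over the existing domain-fronted (meek) channel that Tor Browser already uses to reach moat.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// refreshRequest carries the bridge lines the client currently knows.
type refreshRequest struct {
	BridgeLines []string `json:"bridge_lines"`
}

// refreshResponse carries updated lines, for the same bridges where possible.
type refreshResponse struct {
	BridgeLines []string `json:"bridge_lines"`
}

// unattendedRefresh posts the known bridge lines to a hypothetical refresh
// endpoint and returns whatever updated lines the distributor sends back.
func unattendedRefresh(endpoint string, known []string) ([]string, error) {
	body, err := json.Marshal(refreshRequest{BridgeLines: known})
	if err != nil {
		return nil, err
	}
	// In practice this would be tunnelled through domain fronting, not a direct POST.
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out refreshResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.BridgeLines, nil
}

func main() {
	lines, err := unattendedRefresh("https://bridges.example/moat/refresh",
		[]string{"obfs4 192.0.2.10:443 AAAA cert=EXAMPLE iat-mode=0"})
	fmt.Println(lines, err)
}
```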
Salmon or other research-based approaches
I believe Salmon is a good way for us to allocate bridges to users in the first place (e.g. providing the fingerprint of the bridge), and the user can then use a subscription system to update the bridge information. So Salmon supplements the captcha-and-IP based bridge allocation system, while the subscription complements both allocation systems equally.
[3:26:48 pm] <shelikhoo> armadev: Yes, and right now the only obstacle of refreshing bridges more frequently is we don't have a subscription system that allow users to receive new bridge info without any hassle
[3:28:08 pm] <+armadev> we have the moat set-up, and we have rdsys
[3:28:17 pm] <+armadev> we could add a "give me bridges, here's a bridge i knew" query to rdsys
[3:28:30 pm] <+armadev> and then you send in yesterday's bridge fingerprints and it tells you something new if there is something new"
[3:28:48 pm] <+armadev> that is, we are close to having that working. "simple matter of programming", etc
[3:29:24 pm] <+armadev> rdsys would need to track the mapping from "if you knew that bridge, here is the bridge you should know now" but one simple version of that mapping could be "it maps to the same place in the hash ring"
[3:38:10 pm] <+armadev> i think we are going to want that subscription feature,
[3:38:19 pm] <+armadev> because it is a building block toward future other designs, like a reputation based one
[3:38:29 pm] <+armadev> but also we can use it as-is so dynamic bridges are more useful
[3:39:37 pm] <+armadev> we also need to be careful to distinguish between "the bridge went away, so you get a new one" from "the bridge got blocked, so no, you don't get a new one"
[3:39:52 pm] <+armadev> otherwise we give a lot of power back to the censor
[3:40:37 pm] <shelikhoo> Yes... there are a lot of details we need to discuss like where this function will be implemented, but I prefer we have it
[3:41:27 pm] <shelikhoo> especially since this is an anti-censorship industry standard features now....
[3:41:39 pm] <+armadev> yep
[3:41:56 pm] <+armadev> (we had it, back in 2006ish, but then we lost it when we made other changes)
[3:42:24 pm] <+armadev> (this is what UpdateBridgesFromAuthority did)
[3:42:39 pm] <+armadev> (sort of ;)
[3:44:37 pm] <shelikhoo> (yes...)
I wonder if they have placed some important IP addresses on it
From David Fifield's thesis:
VPN Gate employed this idea [144 §4.2], mixing into their public proxy list the addresses of root DNS servers and Windows Update servers.
<...>
After VPN Gate deployed the countermeasure of mixing high-collateral-damage servers into their proxy list, the firewall stopped blocking for two days, then resumed again, with an additional check that an IP address really was a VPN Gate proxy before blocking it.
@shelikhoo: here is a design question we're going to want an answer for at some point: when we're using the subscription model, and bridge A goes down, do we send all the people who used bridge A to the same next bridge B? Or do we scatter the users of bridge A over all of the bridges in that same pool?
From a "total surface area exposed" perspective, it seems clear that we should map each bridge to a single "successor" bridge, and when the bridge goes away, we send all of its users to that single successor bridge. This way we maintain the "one proof of scarce resource gets you one working bridge" (e.g. one captcha solve gets you one bridge) goal.
Whereas in the GFW's view, it produces a weird phenomenon where batches of users stick together as they move from bridge to bridge. Is that dangerous or unsafe? It makes me nervous. I guess it isn't so bad if the only thing the users have in common is that they were originally assigned the same bridge. (It's not like in Salmon where maybe they have the same bridge because they are friends in real life.)
In conclusion, I think I have talked myself into the "single successor bridge" design simply because it's the only option that maintains the security properties.
But that choice leads to the next design question: how do we choose the successor for each bridge? I can imagine some naive approaches like "choose at random from all the bridges in that distribution pool at the moment", which will lead to larger and larger populations clumped around only a few bridges -- if the users of a given bridge are all of the users of any bridge who has ever had that bridge as its successor, then we'll end up with a very non-uniform assignment of users to current bridges.
In a world where users have (non-forgeable) accounts, we could use hash rings: your user id maps you to a unique place on the hash ring, and which bridge you should use is simply the next available bridge after your place on the hash ring. Then we would potentially split up users when reassigning them from a down bridge, but we would maintain proper load balancing across the network over time, and the number of bridges you can learn about per proof-of-scarce-resource is still limited. But nobody so far on this ticket has been talking about persistent non-forgeable client-side identities.
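For illustration, a hash-ring assignment along those lines could look like the sketch below (assuming, as noted, some stable non-forgeable user ID, which we do not currently have; names are invented):

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// ringPos hashes a string (user ID or bridge fingerprint) onto a 64-bit ring.
func ringPos(s string) uint64 {
	h := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(h[:8])
}

// assign returns the first available bridge at or after the user's ring position,
// wrapping around; when a bridge goes down, its users fall through to the next
// bridge on the ring, so reassignment stays roughly load-balanced.
func assign(userID string, bridges []string, available map[string]bool) string {
	sort.Slice(bridges, func(i, j int) bool { return ringPos(bridges[i]) < ringPos(bridges[j]) })
	u := ringPos(userID)
	for pass := 0; pass < 2; pass++ {
		for _, b := range bridges {
			if pass == 0 && ringPos(b) < u {
				continue // first pass: only bridges at or after the user's position
			}
			if available[b] {
				return b
			}
		}
	}
	return "" // no bridge available at all
}

func main() {
	bridges := []string{"bridgeA", "bridgeB", "bridgeC"}
	avail := map[string]bool{"bridgeA": true, "bridgeB": false, "bridgeC": true}
	fmt.Println(assign("user-42", bridges, avail)) // skips past any down bridge on the ring
}
```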
Second conclusion: there are still a lot of details to be worked out here, in terms of how exactly the subscription model should work. Somebody needs to think them all through and put together a self-consistent design.
I think the "single successor bridge" would work in our case.
Even if we were creating a video game, when the server address changes, all users will then connect to the new server address. The difference here is that the adversaries know the old address is now blocked, so they can guess that the new address the clients are connecting to serves a similar role, which is actually the aspect to worry about.
As for which bridge should be distributed as a replacement,
I suggest we could first try distributing the S96 dynamic bridges' reincarnations (the bridges set up to replace blocked bridges) as a way to get the client-server protocol working in the real world, and then use this mechanism to distribute Salmon/other bridges later.
Finally, about non-resettable identifiers:
I think we might wish to avoid this kind of thing when possible. We could collect things like push notification tokens, provable device identifiers (such as a secure element attestation key on Android, Apple, or TPM devices), or other documents like IDs or phone numbers. But this information could be used to identify users; for example, a phone number may have been the reason Ma Yaya was sent to prison, so it is not as if we could collect this information without a proportional amount of precaution.
@sready (and @meskio earlier) raise a good concern: are we planning to use the subscription model for shifting users to a new bridge when their current bridge gets blocked? If yes, we open up a new attack for the censors: find a bridge, block it, use the subscription feature to get a replacement, block it, iterate until they've found and blocked all the bridges. And if that's an intended use case, we need to have some plan for how we will handle that attack.
If I understand @shelikhoo's proposals above, the plan is:
Yes, that attack would work well to quickly learn all of our volunteer-run bridges. So when we're replacing blocked bridges, we need to replace them with @irl's dynamic automated bridges -- the ones that are super easy to spin up and spin down in an automated and non-interactive way, and which run on a large pool of addresses that the censors don't otherwise just want to block all of.
The reason why the censor won't just immediately learn and block all of those dynamic bridges is (a) it takes a while for the censor to actually add a new block to their firewall, so each iteration of the loop won't be instant, (b) we can spin up new bridges in response to censorship fast enough to keep up with this blocking, and (c) the address pool for these bridges is big enough that we will never run out.
For bridges from the more-static volunteer-run pool, we don't want to replace them with other bridges from this more-static pool in the case where the bridge is blocked, because yes we would quickly run out. In that case if they're blocked we can shift users to the dynamic automated pool; but if they simply go offline, we can replace them with other bridges from the more-static volunteer pool.
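A compact sketch of that replacement policy, with invented pool and type names, just to pin down the branching:

```go
package main

import "fmt"

type Pool int

const (
	VolunteerPool Pool = iota // scarce, long-lived, mostly static addresses
	DynamicPool               // plentiful, automated, cheap to spin up and down
)

type BridgeState struct {
	Blocked bool // blocked by the censor (the bridge itself is still running)
	Offline bool // bridge stopped publishing descriptors / went away
}

// replacementPool decides where a successor bridge should come from, so a
// censor-driven block-and-refresh loop never enumerates the volunteer pool.
func replacementPool(origin Pool, state BridgeState) Pool {
	if state.Blocked {
		return DynamicPool // never burn another volunteer bridge on a censor-driven loop
	}
	if state.Offline {
		return origin // bridge simply went away; replace it in kind
	}
	return origin // still fine: no replacement needed
}

func main() {
	fmt.Println(replacementPool(VolunteerPool, BridgeState{Blocked: true})) // prints 1 (DynamicPool)
	fmt.Println(replacementPool(VolunteerPool, BridgeState{Offline: true})) // prints 0 (VolunteerPool)
}
```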
This way the same bridge replacement mechanism can be used to "fast flux" your bridges in the face of blocking, and also to provide a smoother experience when volunteer-run bridges are inconsistently available. And in the future we'll use the same flow when Salmon users need a new bridge.
@shelikhoo, does this write-up match what you had in mind?
I don't think we have the capacity to independently verify whether any given bridge is blocked, as our current testing is limited to research purposes and dynamic bridges. Maybe we can just have new bridges mark which old bridges they are intended to replace, and make it easy for volunteers to run dynamic bridges with a detailed guide and automation scripts. A bridge can self-detect censorship with an active test (like pinging an IP behind the GFW) or infer it passively (by looking at connecting users' IP addresses). That way we get something that works immediately for dynamic bridges, and volunteers can use it in the same way as the funded project's bridges.
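As a rough sketch of those two self-detection ideas (the probe target, time window, and country check are placeholders; a real deployment would need GeoIP data and much more careful heuristics):

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// activeProbe dials a host that should normally be reachable from the bridge;
// repeated failures to reach a target behind the censor's firewall hint that
// the path (and possibly the bridge's address) is being interfered with.
func activeProbe(target string) bool {
	conn, err := net.DialTimeout("tcp", target, 5*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// passiveSignal flags possible blocking if the bridge has seen no connections
// from the country of interest within the given window. isFromCountry would be
// backed by a GeoIP lookup on a real bridge.
func passiveSignal(lastSeen map[string]time.Time, isFromCountry func(ip string) bool, window time.Duration) bool {
	for ip, t := range lastSeen {
		if isFromCountry(ip) && time.Since(t) < window {
			return false // recent in-country user: probably not blocked
		}
	}
	return true
}

func main() {
	ok := activeProbe("203.0.113.80:443") // placeholder address "behind" the firewall
	fmt.Println("active probe reachable:", ok)

	lastSeen := map[string]time.Time{"198.51.100.4": time.Now().Add(-30 * time.Minute)}
	suspect := passiveSignal(lastSeen, func(ip string) bool { return true }, time.Hour)
	fmt.Println("passive signal suspects blocking:", suspect)
}
```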
We could use the fingerprint for that: ask people to keep the same fingerprint if a bridge is a successor of a previous bridge. We already get that from people who run bridges on dynamic IP addresses, as their IP changes but their fingerprint stays stable.
It is looking more and more like another Snowflake.
Are we breaking the "distributed volunteer-run network" model this way? Dynamic bridges are already distributed by Moat in Russia and China, and by Telegram for new accounts. Now we are going to seamlessly shift another batch of people to them. Is it good for users? Is it good for bridges?
AFAIK, dynamic bridges currently live for days. In the past, China blocked bridges less than an hour after active probes. If they do the same with auto-discovery, the system fails.