Right now there is one snowflake bridge, and its fingerprint is hard-coded in tor browser.
Eventually we will have enough load, and/or want more resiliency, that we want to set up a second snowflake bridge.
To be able to do that, I think we need changes at the client, changes at the snowflake, and changes at the broker.
[Edit 2022-03: the three items I list next are no longer quite the best way to do this ticket. See the extensive and ongoing discussions below for what we currently think is the best plan.]
(A) At the snowflake side, the snowflake needs to tell the broker which bridge(s) it is willing to send traffic to. Additionally, we either want to declare that each snowflake sends to only one bridge, or we need to add a way for the client to tell the snowflake which bridge it wants to reach.
(B) At the broker side, we need it to be able to learn from snowflakes which bridge(s) they use, and we need it to be able to learn from clients which bridge they want to use, and we need it to match clients with snowflakes that will reach that bridge.
(C) At the client side, we need it to tell the broker which bridge it wants to use, and (depending on our design choice in A above) we might also need the client to be able to tell the snowflake which bridge it wants to use.
(There is an alternative approach, where we assume that every snowflake is always running the newest javascript, so it is willing to reach every bridge on our master list. Then the broker doesn't need to do anything new, and we just need to add a way for the client to tell the snowflake which bridge it wants. I don't have a good handle on how realistic this assumption is.)
You could simplify further by removing (A). Don't have every proxy keep a whitelist of bridges; rather let it be willing to connect to any address the broker gives it. How this would work is: the client sends a bridge fingerprint or other identifier to the broker; the broker looks up the fingerprint in its own whitelist mapping fingerprint to IP:port; the broker gives the IP:port to the proxy.
What you would lose with this design is a measure of proxies' self-defense against a malicious broker. The broker could get a proxy to initiate a WebSocket connection to any destination.
Another design alternative, requiring changes in core tor: let a bridge line describe not just a single bridge fingerprint, but a set of them. The client is satisfied if any fingerprint in the set matches. The broker (or the proxy) knows the current set of bridges, and randomly selects one without any control by the client.
Adding a new bridge to the set would require pushing out new bridge lines to users (i.e., making a new Tor Browser release). But if new bridges are only needed to increase capacity, that should be a frequent enough pace.
I don't think we need to make any major design changes to snowflake or Tor.
Instead, we can achieve what we want by configuring Tor and snowflake (and perhaps adding a small amount of code).
Another design alternative, requiring changes in core tor: let a bridge line describe not just a single bridge fingerprint, but a set of them. The client is satisfied if any fingerprint in the set matches. The broker (or the proxy) knows the current set of bridges, and randomly selects one without any control by the client.
Tor isn't really built to have more than one fingerprint per bridge. Instead, if Tor is configured with multiple bridge lines, it tries to connect to all of the bridges, then selects between available bridges at random.
Here's the current design:
each client bridge line has a broker, bridge, and (maybe?) STUN server
each broker knows its corresponding bridge
each proxy is allocated to a broker/bridge
This design can be gracefully upgraded to:
a multi-bridge client, by distributing different bridge lines with different brokers, bridges, and (at least 2) different STUN servers
a multi-bridge broker, by using a different port on the broker for each bridge
a multi-broker/bridge proxy, by having the proxy connect to multiple brokers, then assign client offers from each broker to the corresponding bridge
alternately, each proxy can choose a single bridge/broker at random
Adding a new bridge to the set would require pushing out new bridge lines to users (i.e., making a new Tor Browser release). But if new bridges are only needed to increase capacity, that should be a frequent enough pace.
New bridges are also needed if one of the bridges goes down.
Tor isn't really built to have more than one fingerprint per bridge.
Yes, I realize that. That is why I said "requiring changes in core tor." I'm only brainstorming.
Instead, if Tor is configured with multiple bridge lines, it tries to connect to all of the bridges, then selects between available bridges at random.
Here's the current design:
each client bridge line has a broker, bridge, and (maybe?) STUN server
each broker knows its corresponding bridge
each proxy is allocated to a broker/bridge
If I understand you, this would use multiple bridge lines in torrc, one for every valid possibility of bridge/broker. So for example, if there were one broker and three bridges with fingerprints 1234..., 5555..., and ABCD...:
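A sketch of what such a torrc could look like (the broker URL, dummy addresses, and padded-out fingerprints here are placeholders for illustration, not real values):

```
UseBridges 1
ClientTransportPlugin snowflake exec ./snowflake-client
# One snowflake bridge line per bridge; tor chooses among them locally.
Bridge snowflake 192.0.2.3:1 1234123412341234123412341234123412341234 url=https://snowflake-broker.example/ front=front.example.com
Bridge snowflake 192.0.2.3:2 5555555555555555555555555555555555555555 url=https://snowflake-broker.example/ front=front.example.com
Bridge snowflake 192.0.2.3:3 ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCD url=https://snowflake-broker.example/ front=front.example.com
```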
What is potentially unexpected about this approach is that, in my experience, tor does not select just one of its many bridge lines at random; rather it selects several and tries all of them simultaneously. So here, the snowflake-client would simultaneously send out three registration messages (over domain fronting or something else). I guess it isn't too big a problem, but it makes me worry a bit more about fingerprinting the registration process—especially if there are two brokers with two different domain fronts, where connecting to them both at the same time could be a tell that is not present in normal traffic.
Here is a torrc file that demonstrates that tor selects more than one of its bridge lines:
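A minimal torrc along these lines, assuming the two bridges that appear in the log excerpt below (the actual test likely used pluggable-transport bridge lines, judging by the proxy handshake messages; plain bridge lines exhibit the same multi-connection behavior):

```
UseBridges 1
# Two bridge lines; tor attempts connections to both of them.
Bridge 216.252.162.21:46089 0DB8799466902192B6C7576D58D4F7F714EC87C1
Bridge 85.31.186.26:443 91A6354697E6B02A386312F68D82CF86824D3606
```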
In the log you will see connections made to both bridges. This is why I was trying to think of a design that only requires one bridge line.
Nov 29 10:38:41.000 [info] connection_ap_make_link(): Making internal direct tunnel to 85.31.186.26:443 ...
Nov 29 10:38:41.000 [info] connection_ap_make_link(): Making internal direct tunnel to 216.252.162.21:46089 ...
Nov 29 10:38:41.000 [info] connection_read_proxy_handshake(): Proxy Client: connection to 216.252.162.21:46089 successful
Nov 29 10:38:42.000 [info] connection_read_proxy_handshake(): Proxy Client: connection to 85.31.186.26:443 successful
Nov 29 10:38:42.000 [info] add_an_entry_guard(): Chose $0DB8799466902192B6C7576D58D4F7F714EC87C1~noisebridge01 at 216.252.162.21 as new entry guard.
Nov 29 10:38:43.000 [info] add_an_entry_guard(): Chose $91A6354697E6B02A386312F68D82CF86824D3606~zipfelmuetze at 85.31.186.26 as new entry guard.
This design can be gracefully upgraded to:
a multi-bridge client, by distributing different bridge lines with different brokers, bridges, and (at least 2) different STUN servers
a multi-bridge broker, by using a different port on the broker for each bridge
I don't understand you here, "a different port on the broker." We envision the client connecting to the broker over some covert channel, like domain fronting or DNS, that doesn't allow control of the destination port. Why encode the selected bridge in transport-layer metadata anyway? The client registration message is basically a blob—already around 1000 bytes because of ICE metadata—that can encode k=v pairs, so you can augment it to contain the name or fingerprint of the desired bridge.
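As an illustration only (the field names here are hypothetical, not the actual rendezvous protocol), the registration blob could carry the desired bridge alongside the offer and NAT information it already contains:

```
{
  "offer": "<SDP offer, roughly 1000 bytes of ICE metadata>",
  "nat": "unrestricted",
  "bridge-fingerprint": "2B280B23E1107BB62ABFC40DDCC8824814F80A72"
}
```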
a multi-broker/bridge proxy, by having the proxy connect to multiple brokers, then assign client offers from each broker to the corresponding bridge
alternately, each proxy can choose a single bridge/broker at random
I get that you're going for redundancy and resilience with multiple brokers. It is a good idea to have multiple brokers running, but there may not be a need to actually encode knowledge of this fact at the client. The difficulty with bridge lines is that each one contains a fingerprint—so a client really does have to store locally a list of every bridge it may want to connect to. But with the broker we have additional layers of indirection. Current clients are using the broker https://snowflake-broker.bamsoftware.com/, but the string "snowflake-broker.bamsoftware.com" doesn't appear anywhere on the client—the actual server the broker is on could change its name or IP address without clients needing to know about it. For example, if one broker goes down, we can change the CDN configuration and point the domain front at a backup one. Or with DNS registration, we can change the IP address of the authoritative DNS server, or potentially even round-robin across multiple brokers, all on the backend. If the system ever gets really big, then the broker doesn't even have to be one thing: it can be a distributed system with e.g. a shared database and its own internal redundancy. I feel that these are implementation decisions that can achieve resilience without needing to be exposed to clients.
if Tor is configured with multiple bridge lines, it tries to connect to all of the bridges, then selects between available bridges at random.
According to some other issues, if there are multiple bridges that have the same fingerprint, Tor will actually only try connecting to one of them. But this behavior is apparently not by design and may change.
You could simplify further by removing (A). Don't have every proxy keep a whitelist of bridges; rather let it be willing to connect to any address the broker gives it. How this would work is: the client sends a bridge fingerprint or other identifier to the broker; the broker looks up the fingerprint in its own whitelist mapping fingerprint to IP:port; the broker gives the IP:port to the proxy.
I haven't seen this discussed further, and I want to sketch out the idea more fully so we can decide if it's worth pursuing. I see the concern for potential abuse if the broker becomes compromised, and there are a few other potential issues as well. I'm not necessarily advocating for it, just documenting it.
Design
The design for this change is pretty simple: passing information from the client to proxies is easy, as is validating that information at the broker; we already do both with the SDP offer and the client's NAT type.
Now that we've added logic to read in SOCKS arguments (#40059 (closed)), we can easily configure bridge lines with the bridge's IP address by introducing a new SOCKS argument:
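For example (the server= argument name and all the values here are illustrative, not final syntax):

```
Bridge snowflake 192.0.2.3:1 2B280B23E1107BB62ABFC40DDCC8824814F80A72 url=https://snowflake-broker.example/ front=front.example.com server=wss://snowflake.example.net/
```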
We then modify the client-broker and broker-proxy message protocols to pass the server address to the proxy. Once the proxy receives the offer from the broker, it will make a connection to the server address so that it's ready by the time the client connects. This could even be done before sending the answer, so that difficulty connecting can be reported back to the client.
Backward compatibility
This can be made backwards compatible for old clients easily by having proxies stick with the current default bridge if no server address is given. However, new clients matched with old proxies will have some difficulty. One option is to push the update out to proxies, wait a reasonable amount of time for the updates to be applied, and then kick the old proxies out of the network. We've done a similar kick before by bumping and checking the version numbers for the proxy-broker protocol.
Allow listing
Letting clients use our infrastructure to connect to arbitrary endpoints has drawbacks. If no validation/allow-listing is done on server addresses, Snowflake infrastructure can easily be used for non-Tor purposes and clients could point proxies towards any arbitrary destination.
The easiest place to implement allow-listing is at broker since it's centralized and easy to update. However, the point was raised above about the potential for a compromised or malicious broker to use proxies to connect to arbitrary locations.
Allow-listing at the proxies, as originally proposed, makes the broker matching behaviour more complicated. Clients would now have to be matched against proxies with compatible NATs and with a compatible set of allowed server addresses. Another way to do this is to enact multiple stages of the backward-incompatible rollout described above. When a new bridge is spun up that we want to add to our allow list, we: 1) push the update to the proxy allowlists, 2) wait a while, 3) kick out old proxies, and 4) advertise the new bridge to clients. This is slow, but mitigates the concern of bad broker behaviour.
If we perform allow listing at the proxies, it would be easy to implement it additionally at the broker, so that there are two filters for bad/unknown destinations.
Proxy-bridge connectivity issues
We introduced a snowflake server probe check into our web based proxies (#31391 (closed)) to try and reduce the failure rate of matched proxies by weeding out proxies that are unable to connect to the bridge before they poll the broker. This is not possible if we allow connections to arbitrary destinations.
It is still achievable if proxies contain a finite allow list of bridges, but not trivial. For example, if one bridge in the allow list is down (as happens often with any one of our built in obfs4 bridges), how does the snowflake know whether this connection failure is a problem on its side vs the bridge side? Do we allow it to poll? What if a client tries to connect to the downed bridge? We don't know how much removing this feature will affect the failure rate of matched proxies.
Letting clients use our infrastructure to connect to arbitrary endpoints has drawbacks. If no validation/allow-listing is done on server addresses, Snowflake infrastructure can easily be used for non-Tor purposes and clients could point proxies towards any arbitrary destination.
Instead of an IP:port, the bridge line could use some more abstract identifier. Since the bridge fingerprint is the critical thing, the fingerprint could be an identifier.
The broker could maintain a mapping from fingerprint to IP:port, which would simultaneously serve as an allowlist.
I was discussing this with @arlo and he suggested that the torrc file could be dynamically written (by Tor Browser), so that it contains a randomly selected snowflake bridge line, but only one.
we want to move into a design where there are multiple snowflake-servers and multiple bridges
idea: partition snowflake-servers and bridges into non-overlapping "pools"
i.e., snowflake-server A forwards to a set of bridges, snowflake-server B forwards to its set of bridges, and they share no bridges in common
(which is a generalization of the current case, where there is 1 snowflake-server and 1 bridge)
(there could be distinct brokers for different pools, but that is orthogonal)
partitioning the set of bridges has this effect: when a broker/proxy wants to connect a client's traffic to a snowflake-server, for a specific bridge there is one and only one snowflake-server that is associated with that bridge
therefore multiple connections in the same session, which use the same bridge, will also use the same snowflake-server, and therefore not lose turbo tunnel state
it can work like this:
the client torrc specifies the desired bridge fingerprint in its bridge line as a SOCKS param (redundantly with the fingerprint that is already there as part of the syntax)
e.g. Bridge snowflake 192.0.2.3:1 1111222233334444aaaabbbbccccdddd url=https://snowflake.torproject.net/ front=front.example.com ice=... fingerprint=1111222233334444aaaabbbbccccdddd
the client sends the desired bridge fingerprint along with its offer to the broker
the broker has a mapping of bridge fingerprints to snowflake-server WebSocket URLs. when the broker matches the client with a proxy, the broker informs the proxy of the WebSocket URL to connect to (i.e., which snowflake-server to connect to). it will always be the same snowflake-server URL for the same bridge fingerprint, because the mapping is consistent.
the proxy connects to the WebSocket URL provided by the broker, and it includes the bridge fingerprint in the URL query string when it makes the connection (the same way the client IP address is communicated: e.g. ?client_ip=1.2.3.4:5678&fingerprint=1111222233334444aaaabbbbccccdddd)
the snowflake-server has its own mapping of bridge fingerprint to ExtORPort addresses. if it's an existing turbo tunnel session, it just resumes the ExtORPort TCP it already has. if it's a new session, it connects to the ExtORPort address corresponding to the bridge fingerprint.
such a design alleviates the state-sharing concerns with multiple snowflake-servers. as long as a client uses a consistent bridge fingerprint during a session (which it will), it will get mapped to the same snowflake-server.
arlolra is planning to write a patch for including the bridge fingerprint as a bridge line SOCKS param, and passing it to the broker, which is a necessary step for any of this
then there can be multiple snowflake bridge lines in torrc, each with different bridge fingerprints. load balancing will come from the tor client's local random selection.
there is a "thundering herd" concern with the way tor currently uses multiple bridge lines. tor will attempt to connect to all of them at once, and keep using only one of them. This means N broker transactions and N STUN exchanges, and possibly N−1 proxies held idle.
maybe it's not such a big problem, if N is not too large
an alternative would be a modification to tor where it shuffles its bridge list, and tries only one at a time
another alternative is to write the torrc file dynamically: choose a random bridge line, then write torrc containing only that one randomly selected bridge line
need to consider backward compatibility: a client that doesn't communicate its desired fingerprint gets mapped to the existing default bridge
The most important point, to me, was sparked by something @shelikhoo said, about having more than one "pool" of bridges and snowflake-servers. Separately from the problem of the client informing the broker/proxy/snowflake-server what bridge it wants to use, there is the problem of how to handle more than one snowflake-server. (At some point we will want more than one snowflake-server, for scaling reasons.) An idea that would not work is that the client randomly selects its bridge, and the broker or proxy randomly selects the snowflake-server. The reason it doesn't work is that the snowflake-server is where the client's turbo tunnel session state lives. When a client's proxy connection dies, if its reconnection doesn't connect it to the same snowflake-server, it will not have the correct state.
The idea is to partition the set of available bridges, so that each snowflake-server has its own subset of bridges it is associated with, and the different subsets do not overlap. This way, there is only one snowflake-server associated with each bridge fingerprint. A bridge fingerprint serves as a key that identifies not only a bridge, but also the snowflake-server that is responsible for that bridge. A client randomly chooses one of the available bridge fingerprints, and as long as it uses the same bridge fingerprint throughout its session, it will be mapped to the same snowflake-server and all its session state.
In short, don't do this:
Do this:
A sketch of how it could work:
The client's bridge line includes its desired bridge fingerprint as a SOCKS param (redundantly with the bridge fingerprint that is already part of the bridge line):
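For example: Bridge snowflake 192.0.2.3:1 1111222233334444aaaabbbbccccdddd url=https://snowflake.torproject.net/ front=front.example.com ice=... fingerprint=1111222233334444aaaabbbbccccdddd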
The client's rendezvous message to the broker includes the bridge fingerprint.
The broker has a mapping of bridge fingerprint → snowflake-server WebSocket URL. When a proxy arrives to serve the client, the broker gives the proxy not only the client's WebRTC offer, but also the WebSocket URL of the snowflake-server it should connect to.
The proxy connects to the given WebSocket URL provided by the broker, including a query string parameter for the bridge fingerprint in its HTTP request: ?client_ip=1.2.3.4:5678&fingerprint=1111222233334444aaaabbbbccccdddd.
snowflake-server has its own mapping of bridge fingerprint → ExtORPort address. (Here, the "ExtORPort address" is probably actually a haproxy listening port that load-balances over multiple instances of tor, but that doesn't matter for this discussion.) If the connection's clientID names an existing turbo tunnel session, snowflake-server resumes its already connected TCP connection. If it's a new session, snowflake-server connects to the ExtORPort that the bridge fingerprint maps to.
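A minimal Go sketch of the two lookups in this flow (all names, URLs, and addresses are invented for illustration; this is not the actual broker or snowflake-server code):

```go
package main

import (
	"fmt"
	"strings"
)

// Broker side: a consistent mapping from bridge fingerprint to the
// snowflake-server (WebSocket URL) responsible for that bridge.
var bridgeToRelayURL = map[string]string{
	"1111222233334444AAAABBBBCCCCDDDD": "wss://snowflake-a.example.net/",
	"5555666677778888AAAABBBBCCCCDDDD": "wss://snowflake-b.example.net/",
}

// snowflake-server side: each server maps the bridge fingerprints it is
// responsible for to local ExtORPort addresses (in practice this might be
// a haproxy port that balances over several tor instances).
var bridgeToExtORPort = map[string]string{
	"1111222233334444AAAABBBBCCCCDDDD": "127.0.0.1:10001",
}

// relayURLForClient is the lookup the broker would perform, using the
// fingerprint from the client's rendezvous message, to decide which
// WebSocket URL to hand to the matched proxy.
func relayURLForClient(fingerprint string) (string, error) {
	url, ok := bridgeToRelayURL[strings.ToUpper(fingerprint)]
	if !ok {
		return "", fmt.Errorf("unknown bridge fingerprint %q", fingerprint)
	}
	return url, nil
}

func main() {
	// The proxy connects to this URL and forwards the fingerprint in the
	// query string (?client_ip=...&fingerprint=...); for a new session,
	// snowflake-server dials bridgeToExtORPort[fingerprint].
	url, err := relayURLForClient("1111222233334444aaaabbbbccccdddd")
	fmt.Println(url, err)
	fmt.Println(bridgeToExtORPort["1111222233334444AAAABBBBCCCCDDDD"])
}
```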
Our current setup, with only one snowflake-server, already naturally satisfies the "one snowflake-server per bridge fingerprint" property. So we don't need to worry about the topology until we reach the point of needing a second snowflake-server instance. The part about having a bridge fingerprint as a SOCKS param, and forwarding the fingerprint along to snowflake-server, will be useful anyway for multiple bridges.
For backward compatibility, we can treat a client that doesn't include a bridge fingerprint with its offer as if it wants the existing 2B280B23E1107BB62ABFC40DDCC8824814F80A72. But a client that does specify a fingerprint needs to be paired with a proxy that knows how to forward the ?fingerprint= URL query parameter to snowflake-server. Alternatively, we could make the bridge fingerprint part of the end-to-end protocol between the client and snowflake-server (like the turbo tunnel token the client prefixes to its streams), so that proxies don't need to be aware of it.
I was discussing this with @arlo and he suggested that the torrc file could be dynamically written (by Tor Browser), so that it contains a randomly selected snowflake bridge line, but only one.
I opened the Tor ticket tpo/core/tor#40578 for better handling the situation where we have many snowflake bridge lines but we don't want every snowflake user to reach out to every snowflake bridge each time. I think that's the right long term design, since having other snowflake bridge lines to fall back to if the first one fails is really useful for resiliency. Otherwise any time a bridge goes down, 1/k of the Snowflake population gets broken, and we're always going to be wondering each time a Snowflake user has a problem whether one of the bridges is flaky or what.
Since I'm trying to be more sustainable and not-do-all-the-things in Q2, I won't be doing that patch in Q2. But maybe it will fit in my Q3, or maybe somebody else will want to do it first. And hey, with luck maybe we will have more than one snowflake bridge line by then but not yet fifty of them. :)
The client tells the broker what bridge fingerprint it wants to use (!81 (merged), in progress)
Proxies tell the broker what bridge fingerprints they know about in their poll messages (@arlo offered to do it) OR proxies signal a willingness to connect to any WebSocket URL (see discussion below)
@shelikhoo will consider the options and make a decision
The broker has a list of known bridge fingerprints
The broker matches clients with proxies, considering not only NAT compatibility but also bridge fingerprint
The broker tells the proxy what snowflake-server to connect to in some fashion (i.e., as an explicit WebSocket URL, or as a bridge fingerprint that the proxy itself maps to a WebSocket URL)
A proxy that does not signal awareness of bridge fingerprints can be matched only with clients that either: do not specify a bridge fingerprint, or specify the current default 2B280B23E1107BB62ABFC40DDCC8824814F80A72
Proxies forward the requested bridge fingerprint to the snowflake-server (i.e., as fingerprint=XXXX in the URL)
snowflake-server has its own mapping of bridge fingerprint → local ExtORPort address
Needs a configuration file or something; the ExtORPort can no longer come from TOR_PT_EXTENDED_SERVER_PORT, since there may be more than one (see the sketch below)
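A hypothetical example of what such a snowflake-server configuration could look like (format and field names invented for illustration):

```
{
  "bridges": {
    "2B280B23E1107BB62ABFC40DDCC8824814F80A72": "127.0.0.1:10001",
    "1111222233334444AAAABBBBCCCCDDDD": "127.0.0.1:10002"
  }
}
```

Each entry maps a bridge fingerprint to the local ExtORPort (or haproxy) address that snowflake-server should dial for new sessions requesting that fingerprint.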
The more I think about it, the more I think the last two points (which enable there to be multiple bridge fingerprints per snowflake-server) are unneeded. We agreed in the meeting that these points are the lowest priority and not a blocker for anything else, but it may be preferable not to do them at all. What this would mean is that there is one and only one bridge fingerprint per snowflake-server. (Compare with the diagram in #28651 (comment 2783541): under S1 there would be only B1; under S2 there would be only B2.) What this means practically is that anyone who operates a Tor bridge (i.e., with a distinct fingerprint) would also operate the snowflake-server frontend to that bridge. The design is less flexible and more tightly coupled, but simpler.
An important design question came up: how should the broker instruct proxies what WebSocket server to connect to? The obvious solutions are that the broker sends an explicit WebSocket URL, or else a bridge fingerprint that the proxy itself maps to a WebSocket URL. The real critical question is: is a proxy willing to connect to any destination the broker tells it to connect to, or does each proxy have its own allowlist of WebSocket URLs it is willing to connect to? The former approach is simpler and more flexible; the latter permits proxies to defend themselves against a malicious or compromised broker. This also affects where the mapping of bridge fingerprint → WebSocket URL is stored. Is it stored only at the broker (the broker does the mapping and tells the proxy the result), or is a copy stored with every proxy? During the meeting, I assumed that there were considerations surrounding the WebExtension permissions manifest, but I was mistaken: you don't need to list the possible WebSocket servers in the manifest, and our current manifest does not. @shelikhoo is going to make the call here. (@shelikhoo I am coming around to your point of view about proxies not having their own allowlist, especially as I was mistaken about the permission manifest.)
If proxies are willing to connect to any WebSocket URL, there is no need for proxies to send a list of known bridge fingerprints / WebSocket servers. But proxies do still need to signal, somehow, that they are recent enough to know about the possibility of multiple bridges, and do not simply hardcode wss://snowflake.freehaven.net/. The signal could be, for example, a boolean field, or a minimum polling protocol version number.
If proxies enforce their own allowlist of snowflake-servers, an uncomplicated way to do client–proxy matching is to simply reject any proxy that does not support the complete list of bridge fingerprints the broker knows about.
I have considered all the options we have. Instead of choosing either existing option as-is, I decided to fuse the two to create a matchmaking system that protects the proxy from being instructed to connect to an arbitrary location, while eliminating the need to constantly synchronize the allowed proxy WebSocket addresses.
The broker still informs the proxy about the WebSocket address it will connect to. The proxy checks the FQDN in that WebSocket address to see whether it ends with an allowed domain name suffix, like snowflake.torproject.org, or matches it exactly if the allowed domain name 'suffix' starts with a caret ^. In this way, a malfunctioning broker cannot trick the proxy into connecting to an arbitrary location, while removing the need to constantly update an allow list. A proxy will connect to snowflake.torproject.org or eu-snowflake.torproject.org, but not example.org.
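A sketch of this matching rule in Go (function and variable names are mine, not the actual proxy code):

```go
package main

import (
	"fmt"
	"strings"
)

// relayHostAllowed implements the check described above: a pattern starting
// with '^' requires an exact match; otherwise the relay's host name only
// needs to end with the pattern.
func relayHostAllowed(relayHost, allowedPattern string) bool {
	if strings.HasPrefix(allowedPattern, "^") {
		return relayHost == strings.TrimPrefix(allowedPattern, "^")
	}
	return strings.HasSuffix(relayHost, allowedPattern)
}

func main() {
	pattern := "snowflake.torproject.org"
	for _, host := range []string{
		"snowflake.torproject.org",    // allowed: exact match
		"eu-snowflake.torproject.org", // allowed: ends with the pattern
		"example.org",                 // rejected
	} {
		fmt.Println(host, relayHostAllowed(host, pattern))
	}
}
```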
In order to update gracefully, the proxy will need to send its allowed domain name suffix to the broker; without this info, the broker assumes ^snowflake.torproject.org. The broker will reject a proxy's poll unless the proxy's allowed suffix is at least as broad as the broker's (that is, the proxy will accept any destination the broker might hand out). This reject-if-not-matching approach reduces the need to create more than one pool of shared proxies on the broker (which would significantly increase code complexity, considering that once a proxy is matched it needs to be withdrawn from all other pools). However, because of the design, this reduced availability only occurs when updating proxies, not when updating the list of snowflake servers.
The rollout process will be like this:
update broker with allowed domain name suffix of ^snowflake.torproject.org.
existing proxies (last-generation proxies) will keep working, as their assumed allowed domain name suffix is ^snowflake.torproject.org.
update proxy with allowed domain name suffix of snowflake.torproject.org.
new proxies (current-generation proxies) will work, as their allowed domain name is a suffix of the broker's
wait enough time to get almost everyone to update
update broker with allowed domain name suffix of snowflake.torproject.org.
last-generation proxies will be rejected; all proxies remaining in the pool understand the broker's WebSocket URL instruction and will allow that connection.
update clients to use alternative bridges.
existing clients are not affected, as they are assumed to request the default bridge's fingerprint.
In this rollout process, no transition-only code is introduced; instead, everything is handled by a generally useful proxy compatibility check.
I think that everywhere you have snowflake.torproject.org, you actually mean snowflake.torproject.net.
It's probably easier for administration if the allowed domain names are domain suffixes rather than string suffixes. That is, use names like eu.snowflake.torproject.net, not eu-snowflake.torproject.net.
The ^ notation is potentially confusing, because it looks like a regular expression beginning-of-string anchor, while the matching operation is actually the opposite: the domain suffix needs to be at the end of the matched string.
Yes. You are correct. I discovered that I should have used snowflake.torproject.net during the testing and fixed this already in the code.
I didn't use domain suffixes since it won't work with some wildcard certificates. If the certificate is for the tld.org and *.tld.org domains, then it will work with snowflake.tld.org and eu-snowflake.tld.org, but not eu.snowflake.tld.org. Many third-party software/services won't work well if the domain is not of the form something.public-suffix; we are not using them now, but if one day we need them we should be able to.
Yes. My initial rationale is that there is an implicit $ at the end of the string. I have documented this, but suggestions are welcomed.