Set up a Snowflake bridge staging server
The goals of this issue are:
- Share practical experience setting up a snowflake bridge.
- Write a Snowflake Bridge Installation Guide, like the existing Snowflake Broker Installation Guide.
- Install an experimental load balancing configuration in an attempt to increase the capacity of the bridge.
If the load balancing configuration works, we can then apply it to the production bridge. (Or change DNS so as to swap production and staging.)
We talked about scaling the snowflake bridge and load balancing ideas at the 2022-01-06 team meeting.
Meeting notes
- Scaling Snowflake bridge
  - Already increased from 4 to 8 CPUs
  - We should profile snowflake-server to reduce its CPU use
    - But dcf feels that the bottleneck is not the snowflake-server process (which is multithreaded and can use multiple CPU cores), but the tor process (which can only use 1 CPU core and is already constantly at 100%). See https://lists.torproject.org/pipermail/tor-relays/2021-December/020156.html
  - It turns out that it is possible to run multiple instances of tor with the same identity keys (hence the same fingerprint), either on the same host or on different hosts: https://lists.torproject.org/pipermail/tor-relays/2021-December/020157.html. Running multiple instances of tor is a way to scale tor beyond 1 CPU core, with snowflake-server distributing traffic over the available instances: https://lists.torproject.org/pipermail/tor-relays/2021-December/020182.html
  - With another shim (similar to moat-shim), you can keep ExtORPort metrics reporting in the load-balanced configuration:
    - https://gitlab.torproject.org/dcf/extor-static-cookie
    - https://lists.torproject.org/pipermail/tor-relays/2022-January/020183.html
    - Though I'm not sure whether, if Metrics receives multiple descriptors per day with the same fingerprint, it will sum them all or keep only one.
  - dcf has a bridge running now (with obfs4proxy, not snowflake-server) with this extor-static-cookie + load balancing configuration:
    - https://metrics.torproject.org/rs.html#details/07B9C6D7BE9685E83FA8C7A4FEB34CAD6CB77503
    - Though I have not yet tested Roger's caveat about DEFAULT_ONION_KEY_LIFETIME_DAYS: https://lists.torproject.org/pipermail/tor-relays/2022-January/020196.html
  - We could apply the same approach to the snowflake bridge. But it is a fairly big configuration change: we need to stop snowflake-server from being managed by tor, and instead run it independently (e.g., under runit or systemd) and have it talk to the tor instances' static ExtORPorts through a load balancer. It would be best to try this first on a staging server separate from production.
    - We tentatively plan to do this next Tuesday, 2022-01-11. dcf will be in touch and have a host ready for installation.
  - Once the tor bottleneck is removed, though, we are not far from the next likely bottleneck: total CPU on the host, shared between tor and snowflake-server. It might be time to think about migrating the snowflake bridge to different hosting. Or we could experiment with running snowflake-server on one host and N instances of tor on another (preferably nearby) host.
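The load-balanced arrangement in the notes, with snowflake-server connecting through a load balancer to several tor instances' static ExtORPorts, can be illustrated with a minimal round-robin TCP proxy. This is only a sketch of the idea, not the planned deployment, which would more likely use an off-the-shelf load balancer such as HAProxy. The listen port 10000 and the backend ports 10001 to 10004 are hypothetical placeholders.

```python
import asyncio
import itertools

# Hypothetical backend addresses: static ExtORPort (or shim) listeners of
# N tor instances that share one identity key. Not real defaults.
BACKENDS = [("127.0.0.1", 10001), ("127.0.0.1", 10002),
            ("127.0.0.1", 10003), ("127.0.0.1", 10004)]


class RoundRobin:
    """Hands out backend addresses in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


async def handle(client_reader, client_writer, balancer):
    # Each new incoming connection goes to the next backend in rotation.
    host, port = balancer.pick()
    try:
        up_reader, up_writer = await asyncio.open_connection(host, port)
    except OSError:
        client_writer.close()
        return
    # Relay both directions; exceptions (e.g. resets) are collected, not raised.
    await asyncio.gather(pipe(client_reader, up_writer),
                         pipe(up_reader, client_writer),
                         return_exceptions=True)


async def main(listen_port=10000):
    balancer = RoundRobin(BACKENDS)
    server = await asyncio.start_server(
        lambda r, w: handle(r, w, balancer), "127.0.0.1", listen_port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())`. Since snowflake-server would no longer be launched by tor, it would presumably be given the proxy's address through the pluggable-transport environment variable `TOR_PT_EXTENDED_SERVER_PORT` that tor would otherwise set. Round-robin per connection is the simplest policy; a least-connections policy might spread long-lived client sessions more evenly.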
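A shared, static authentication cookie is needed because of how ExtORPort authentication works. In the SAFE_COOKIE method (tor's ext-orport-spec), both sides prove knowledge of a 32-byte cookie by exchanging HMAC-SHA256 values computed over their nonces. Behind a load balancer, snowflake-server cannot know which tor instance will answer, so every backend must check the same cookie. As we understand it, providing that cookie is the role of extor-static-cookie. The sketch below computes the two hashes. The function names are illustrative only and do not come from any library.

```python
import hashlib
import hmac

# 32-byte header at the start of an ExtORPort auth cookie file (per
# ext-orport-spec); the 32-byte cookie itself follows the header.
COOKIE_HEADER = b"! Extended ORPort Auth Cookie !\x0a"


def read_cookie(data: bytes) -> bytes:
    """Extract the 32-byte cookie from the 64-byte contents of a cookie file."""
    if len(data) != 64 or not data.startswith(COOKIE_HEADER):
        raise ValueError("not an ExtORPort auth cookie file")
    return data[32:]


def safe_cookie_hashes(cookie: bytes, client_nonce: bytes, server_nonce: bytes):
    """Return (server_hash, client_hash) for ExtORPort SAFE_COOKIE auth."""
    nonces = client_nonce + server_nonce
    server_hash = hmac.new(
        cookie, b"ExtORPort authentication server-to-client hash" + nonces,
        hashlib.sha256).digest()
    client_hash = hmac.new(
        cookie, b"ExtORPort authentication client-to-server hash" + nonces,
        hashlib.sha256).digest()
    return server_hash, client_hash


def client_accepts_server(cookie, client_nonce, server_nonce, received_hash):
    """Client-side check of the ServerHash, in constant time."""
    expected, _ = safe_cookie_hashes(cookie, client_nonce, server_nonce)
    return hmac.compare_digest(expected, received_hash)
```

Two backends that read the same cookie file compute identical hashes for the same nonces, so authentication succeeds no matter which instance the load balancer picks. If each tor instance generated its own random cookie, the check would fail against every instance except the one whose cookie snowflake-server happened to read.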