The root account has my and @shelikhoo's public keys
copied from the existing broker in the authorized_keys file.
Other than that, I didn't do any configuration.
ssh -i ~/.ssh/broker-key root@37.218.242.175
I had to configure the VPS with less RAM and CPU than the existing broker,
because of resource limits on the account.
What we can do is get the new VPS fully installed and configured,
then, when it's time to do the migration, restart the old broker with fewer resources
and restart the new one with more.
We have a resource limit of 220.0 units.
The current broker uses 184.1 units:
32 GB
8 CPU cores
10 GiB disk
That leaves us with 35.9 units.
This is what I was able to provision under that limit,
costing 26.2 units:
4 GB
2 CPU cores
20 GiB disk
The RAM and CPU allocation is easy to change, but it requires powering off the VPS.
I have finished setting up nginx, ACME, and the HTTPS forwarder on the machine. This is what I did:
# Firstly, install nginx with stream plugin
apt install nginx libnginx-mod-stream

# Setup ALPN based tls acme: no need to worry about what to put on the http site anymore
# COPY common/conf/alpn_proxy_nginx /etc/nginx/modules-enabled/99-000-tlsacmeapln.conf

# Install curl and acme
apt install curl
useradd -m acmeworker
sudo -u acmeworker bash
curl https://get.acme.sh | sh -s email=shelikhoo@torproject.org

# The domain name is only used for testing, and only letsencrypt supports alpn based domain validation
./acme.sh --issue --domain 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io --server letsencrypt --alpn --tlsport 11443

# and finally restart nginx
curl https://ssl-config.mozilla.org/ffdhe2048.txt > /etc/nginx/ffdhe2048.txt
systemctl enable nginx
systemctl start nginx
nginx -s reload
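For context, the alpn_proxy_nginx file copied above (not reproduced in this thread) makes nginx route incoming TLS connections by ALPN, so that acme.sh's TLS-ALPN listener on port 11443 only receives ACME validation handshakes. A minimal sketch of that technique with nginx's stream module (the backend port other than 11443 is an assumption, not the actual file):

stream {
    # Peek at the ClientHello without terminating TLS, and route by ALPN
    map $ssl_preread_alpn_protocols $upstream {
        ~\bacme-tls/1\b 127.0.0.1:11443;  # ACME TLS-ALPN-01 validation (acme.sh --alpn --tlsport 11443)
        default         127.0.0.1:10443;  # ordinary HTTPS backend (assumed port)
    }

    server {
        listen 443;
        ssl_preread on;  # fills $ssl_preread_alpn_protocols
        proxy_pass $upstream;
    }
}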
I have finished working on the initial deployment of the broker and basic testing of it with a client and a standalone proxy. The SQS and AMP cache signaling channels are not tested yet.
# Create a user just for webapps
useradd -m webapp

# install additional packages required for the environment; if a different slim installation
# was used, it may require a different set of packages to be installed manually. Please do
# not lose hope when systemd fails in a weird way and the error message yields no result
# with a search engine
apt install systemd-container libpam-systemd

# sudo is not sufficient here as we are interacting with systemd and there are many
# environment variables that need to be set correctly
machinectl shell --uid=webapp

# Copy binaries to .config/broker
# Copy broker.service to .config/systemd/user

# Fetch geoip files
curl https://archive.torproject.org/tor-package-archive/torbrowser/13.0.14/tor-expert-bundle-linux-x86_64-13.0.14.tar.gz > tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
tar -xzvf tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
# Copy data files to .config/broker

systemctl enable --user broker.service
systemctl start --user broker.service

# return to root shell
# Copy broker.conf to /etc/nginx/sites-available/https
nginx -s reload
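The broker.service user unit itself is not reproduced above. As a hedged sketch only (the flag names and listen address are my assumptions, since nginx terminates TLS in front of the broker; check the broker binary's --help for the real flags), it would look something like this:

# ~/.config/systemd/user/broker.service (hypothetical sketch)
[Unit]
Description=Snowflake broker

[Service]
# TLS is terminated by nginx, so the broker would listen on plain HTTP locally;
# flag names and address are assumptions, not the deployed unit.
ExecStart=%h/.config/broker/broker -disable-tls -addr 127.0.0.1:8080
Restart=on-failure

[Install]
WantedBy=default.target

Note also that loginctl enable-linger webapp is required for the webapp user's services to start at boot without an active login session.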
# Fetch geoip files
curl https://archive.torproject.org/tor-package-archive/torbrowser/13.0.14/tor-expert-bundle-linux-x86_64-13.0.14.tar.gz > tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
tar -xzvf tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
# Copy data files to .config/broker
IMO the distribution tor-geoipdb package should be preferred
to a one-time download of a tarball.
If it's not automated, the geoipdb will never be updated, practically speaking.
# From tor's official setup guide https://support.torproject.org/apt/
curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --dearmor | tee /usr/share/keyrings/deb.torproject.org-keyring.gpg >/dev/null
apt install tor-geoipdb deb.torproject.org-keyring

# Although we only selected tor-geoipdb, a tor daemon is also installed at the same time. Let's disable it
systemctl disable tor.service
systemctl mask tor.service
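With the package installed, the broker can read the continuously updated files instead of the tarball copies. A sketch, assuming the broker's -geoipdb/-geoip6db flags (verify against the broker's --help):

# Debian's tor-geoipdb ships these files, which apt keeps up to date:
#   /usr/share/tor/geoip
#   /usr/share/tor/geoip6
broker -geoipdb /usr/share/tor/geoip -geoip6db /usr/share/tor/geoip6 ...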
Tell me when you want me to swap the RAM/CPU allocation of the old and new broker. (To give the new broker more CPU and RAM and the old broker less.) The old broker can continue running until you're ready to do the actual DNS change.
For Let's Encrypt autocert, you will need to either (1) copy the Let's Encrypt account credentials from the old broker to the new, or (2) ask the admin team to add a DNS CAA record that permits the Let's Encrypt account on the new broker to get new certificates.
The broker will automatically acquire a TLS certificate for the names given in --acme-hostnames the first time each name is accessed. If you use a subdomain of torproject.net, then you will need to get in touch with the Tor sysadmin team and ask to have a CAA DNS record created that authorizes a certain Let's Encrypt account to get certificates for that domain. See tpo/tpa/team#41462 (closed). You can use the autocert-account-id program to find the name of the account created in the /home/snowflake-broker/acme-cert-cache directory.
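For illustration, a CAA record that restricts issuance to a single Let's Encrypt account looks roughly like this (the accounturi value is a placeholder; the real one comes from autocert-account-id as described above):

; hypothetical CAA record, per RFC 8657
snowflake-broker.torproject.net. IN CAA 0 issue "letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/123456789"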
I have updated nginx with rate limit settings (see also #40329 (closed)); these settings will take effect as soon as we switch to this instance as the primary.
The exact limits are adjustable, and the ones currently shown are not final. We currently have no idea how common NAT gateways are.
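Since the deployed values are not shown here, as a generic illustration of the mechanism only, per-IP rate limiting in nginx looks like this (zone name, size, and rates are placeholders):

# in the http context
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    location / {
        # allow short bursts; excess requests are rejected
        limit_req zone=per_ip burst=20 nodelay;
    }
}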
@dcf Could you please remind me of the IPv6 address of the new broker? I was unable to locate it with the ip address command. It is needed to set up the NAT type testing probetest and enable IPv6 support for the broker.
There are instructions for assigning an IPv6 address in the
installation guide.
But for this host you must use the prefix 2a00:c6c0:0:151:4::/80,
not 2a00:c6c0:0:154:4::/80.
Set up an IPv6 address. You can use any address in the 2a00:c6c0:0:151:4::/80 prefix.
root# python -c 'import os; print ":".join(os.urandom(2).encode("hex") for _ in range(3))'
d8aa:b4e6:c89f
root# vi /etc/network/interfaces
iface eth0 inet6 static
    address 2a00:c6c0:0:154:4:d8aa:b4e6:c89f
    netmask 64
    gateway 2a00:c6c0:0:154::1
root# etckeeper commit "Add IPv6 address."
root# reboot
Porting that little script to python3 is not too hard:
python3 -c 'import os; print(":".join(os.urandom(2).hex() for _ in range(3)))'
I am not sure about the subnet thing.
eclips.is support told me that we were "assigned" the IPv6 block 2a00:c6c0:0:151:4::/80 for all our instances.
The example configuration they gave me had netmask 64.
The /80 might have to do with their internal accounting, or something.
The configuration looks fine.
I think what actually happened here is that each client is allocated/reserved a /80 subnet, while in the network configuration the /64 network is the address range of neighbors (reachable with broadcast messages). So they are actually not the same thing.
As for the python... Yeah, it is not hard to port...
I am currently setting up the NAT Type Test Helper (it is also named probetest, but I wish it could be renamed).
The network namespace setup script was installed first:
/var/lib/probenattest/init-netns.sh
#!/bin/bash
ip netns add net0
ip link add veth-a type veth peer name veth-b
ip link set veth-a netns net0
ip netns exec net0 ip link set lo up
ip netns exec net0 ip address add 10.0.0.2/24 dev veth-a
ip netns exec net0 ip address add fc00::2/7 dev veth-a
ip netns exec net0 ip link set veth-a up
ip address add 10.0.0.1/24 dev veth-b
ip address add fc00::1/7 dev veth-b
ip link set veth-b up
ip netns exec net0 ip route add default via 10.0.0.1 dev veth-a
ip netns exec net0 ip route add default via fc00::1 dev veth-a
mkdir -p /etc/netns/net0/
ln -sf /etc/resolv.conf /etc/netns/net0/
ln -sf /etc/hosts /etc/netns/net0/
echo 1 > /proc/sys/net/ipv4/ip_forward
sysctl -w net.ipv6.conf.all.forwarding=1
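(A few quick sanity checks after running the script, my suggestion rather than part of the original procedure; note that traffic leaving the namespace for the outside world additionally needs the NAT rules configured with ferm below:)

ip netns list                          # should list net0
ip netns exec net0 ping -c 1 10.0.0.1  # host end of the veth pair
ip netns exec net0 ping -c 1 fc00::1   # same check over IPv6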
and the following startup namespace setup service was set up and enabled:
/etc/systemd/system/probeNatTestSetup.service
[Unit]
Description=Probe NAT Test Setup
Before=ferm.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=%S/probenattest/init-netns.sh

[Install]
WantedBy=default.target
WantedBy=ferm.service
WARNING: edited; systemd could brick the system with the previous version of this config.
The following action was then taken:
apt install ferm
apt remove iptables-persistent
And ferm is configured with:
/etc/ferm/ferm.conf
# static public-facing ip addresses
@def $IPv4_WORLD = 37.218.242.175;
@def $IPv6_WORLD = 2a00:c6c0:0:151:4:ae99:c0a9:d585;

# static private ip address
@def $IPv4_PRIVATE = 10.0.0.2;
@def $IPv6_PRIVATE = fc00::2;

domain (ip ip6) {
    table filter {
        chain INPUT {
            policy DROP;

            # connection tracking
            mod state state INVALID DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;

            # allow local packet
            interface lo ACCEPT;

            # respond to ping
            proto icmp ACCEPT;

            # allow SSH connections
            proto tcp dport ssh ACCEPT;

            # allow HTTP connections (for ACME HTTP-01 challenge)
            proto tcp dport http ACCEPT;
            proto tcp dport https ACCEPT;

            # allow HTTPS-ALT connections
            proto tcp dport 8443 ACCEPT;
        }
        chain OUTPUT {
            policy ACCEPT;

            # connection tracking
            #mod state state INVALID DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;
        }
        chain FORWARD {
            policy DROP;

            # connection tracking
            mod state state INVALID DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;

            # forward packets to subnet
            @if @eq($DOMAIN, ip) {
                daddr $IPv4_PRIVATE ACCEPT;
                saddr $IPv4_PRIVATE ACCEPT;
            } @else {
                daddr $IPv6_PRIVATE ACCEPT;
                saddr $IPv6_PRIVATE ACCEPT;
            }
        }
    }

    # PRE- and POST- ROUTING rules for probetest
    table nat {
        chain POSTROUTING {
            @if @eq($DOMAIN, ip) {
                saddr "$IPv4_PRIVATE/24" outerface eth0 SNAT to $IPv4_WORLD random;
            } @else {
                saddr "$IPv6_PRIVATE/7" outerface eth0 SNAT to $IPv6_WORLD random;
            }
        }
        chain PREROUTING {
            @if @eq($DOMAIN, ip) {
                proto tcp dport 8443 interface eth0 DNAT to $IPv4_PRIVATE;
            } @else {
                proto tcp dport 8443 interface eth0 DNAT to $IPv6_PRIVATE;
            }
        }
    }
}
And after that, just reboot to confirm everything works! This will easily be the most stressful part unless one has console access to the machine.
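To take some of the stress out of that step, ferm can preview or safely apply a rule set before you commit to a reboot (my suggestion, not a step from the original procedure):

# print the generated iptables rules without applying them
ferm --noexec --lines /etc/ferm/ferm.conf
# apply interactively: ferm rolls the rules back automatically unless you
# confirm within the timeout, so a bad rule set cannot lock you out over SSH
ferm --interactive /etc/ferm/ferm.conf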
/etc/systemd/system/probeNatTestd.service

[Unit]
Description=Snowflake Nat Type Test Daemon
After=probeNatTestSetup.service

[Service]
# --addr 10.0.0.2:8081 is added to ensure that, should NetworkNamespacePath fail to apply, the unit will fail.
# Do NOT remove it unless the reason it was added is understood.
ExecStart=%S/probenattestd/probenattestd -disable-tls --addr 10.0.0.2:8081
WorkingDirectory=%S/probenattestd
RestartSec=5s
Restart=on-failure
StateDirectory=probenattestd
NetworkNamespacePath=/var/run/netns/net0

[Install]
WantedBy=default.target
and then enable and start it with systemctl enable probeNatTestd.service and systemctl start probeNatTestd.service.
WARNING: in systemd, failure to apply NetworkNamespacePath=/var/run/netns/net0 is silent: if systemd is unable to apply it, the service will run anyway without it, without even a line of warning. This behaviour is against best practice, and one should be aware of it when operating the service. --addr 10.0.0.2:8081 is there to make sure the service will fail and make noise when the NetworkNamespacePath is not applied; be mindful of this when changing it.
--addr 10.0.0.2:8081 is there to make sure the service will fail and make noise when the NetworkNamespacePath is not applied; be mindful of this when changing it.
Okay, please add this information as a comment in the service file itself.
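One way to positively verify that the namespace was applied (my suggestion, not part of the original setup) is to compare the network namespace inode of the running daemon against that of net0:

# the two inode numbers should match if NetworkNamespacePath took effect
stat -Lc %i /proc/$(systemctl show -p MainPID --value probeNatTestd.service)/ns/net
stat -Lc %i /var/run/netns/net0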
After a network configuration mishap today, I reinstalled the OS to begin the installation process from scratch. The IP address is the same but the SSH keys have changed.
Thanks! The misconfiguration was about systemd dependencies: I introduced a dependency cycle between ferm.service, the network namespace setup (probeNatTestSetup.service), sysinit.target, and multi-user.target. I have found the root issue with a local VM, and resumed the process of deploying to the snowflake broker.
The server setup above was replayed on the reinstalled server.
ACME auto-renewal was set up using the online resource https://github.com/unknowndevQwQ/acme.sh-systemd:
[Unit]
Description=Renew certificates acquired via acme.sh
After=network.target network-online.target nss-lookup.target
Wants=network-online.target nss-lookup.target
Documentation=https://github.com/acmesh-official/acme.sh/wiki

[Service]
# If the version of systemd is 240 or above, uncomment Type=exec and comment out Type=simple
#Type=exec
Type=simple
# The --home argument should be the location of the acme.sh configuration directory.
# This is the user unit; by default there is no need to set the --home folder
#ExecStart=/usr/bin/acme.sh --cron --home %h/.acme.sh
ExecStart=%h/.acme.sh/acme.sh --cron
# acme.sh returns 2 when renewal is skipped (i.e. certs up to date)
SuccessExitStatus=0 2
Restart=on-failure
Why acme.sh, and not certbot, which will be managed automatically by the system package manager? It's fine if there's a good reason, but if not, we should optimize for low maintenance and reducing external dependencies.
Certbot depends on an external runtime (Python) and its dependencies, and, when managed by the Debian package manager, will always be out of date (acme.sh, on the other hand, only needs bash and the curl command),
and it is not designed to run as an underprivileged user in a least-privilege setup. This setup is designed so that each component only has the permissions it really needs; here, the ACME component runs as the unprivileged user acmeworker.
The autoReloadNginx.service is there so that the certificate refresh process, which runs in a privileged context, does not need to be initiated by the unprivileged user acmeworker.
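autoReloadNginx itself is not reproduced in this thread; a plausible reconstruction (my sketch, not the deployed files; the 00:01 firing time matches the systemctl list-timers output later in this thread) would be:

# /etc/systemd/system/autoReloadNginx.service (sketch)
[Unit]
Description=Reload nginx so renewed certificates take effect

[Service]
Type=oneshot
ExecStart=/usr/sbin/nginx -s reload

# /etc/systemd/system/autoReloadNginx.timer (sketch)
[Unit]
Description=Daily nginx reload

[Timer]
OnCalendar=*-*-* 00:01:00

[Install]
WantedBy=timers.target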
I can go ahead and switch to certbot, but it will not be able to have the same least-privilege setup that drops the ACME process's root privileges. Should I go ahead and do that?
My preference is for certbot.
Generally I think we should not have curl | bash
anywhere in the setup procedure.
My position is that we should optimize for low maintenance
and following standard procedures whenever possible.
Every little thing that is custom
is not only something we have to maintain documentation for,
it's also something that somebody will have to spend time learning
before they can deal with a problem on the server.
Generally: try to reduce the number of components,
and install the components in a standard way when possible.
We don't want to spend our "weirdness budget"
on mundane sysadmin stuff, save that for our own software.
The situation we want to avoid is one where you have set up things
in a way that you like personally, but for anyone else to work with it
they need to find you to find out how it works.
# This is not supported by certbot developers
apt install certbot
certbot register -m shelikhoo@torproject.org
# maybe replace this domain with the production domain name if necessary
certbot certonly -d 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem
Key is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/privkey.pem
And apply them to nginx.
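Applying them amounts to pointing the TLS server block at certbot's live paths and making sure nginx reloads after each renewal. A sketch (certbot's renewal-hooks/deploy directory is standard certbot behaviour; the hook file name is my choice):

# in the relevant nginx server block, using the paths from the certbot output above:
#   ssl_certificate     /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem;
#   ssl_certificate_key /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/privkey.pem;

# executables in this directory run after every successful renewal:
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-nginx <<'EOF'
#!/bin/sh
systemctl reload nginx
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx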
root@snowflake-broker-40349:/etc/nginx# certbot certonly -d 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
^C
root@snowflake-broker-40349:/etc/nginx# certbot register -m shelikhoo@torproject.org
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.4-April-3-2024.pdf. You must agree in
order to register with the ACME server. Do you agree?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: y
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Would you be willing, once your first certificate is successfully issued, to
share your email address with the Electronic Frontier Foundation, a founding
partner of the Let's Encrypt project and the non-profit organization that
develops Certbot? We'd like to send you email about our work encrypting the web,
EFF news, campaigns, and ways to support digital freedom.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: n
Account registered.
root@snowflake-broker-40349:/etc/nginx# certbot certonly -d 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
Saving debug log to /var/log/letsencrypt/letsencrypt.log
How would you like to authenticate with the ACME CA?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: Spin up a temporary webserver (standalone)
2: Place files in webroot directory (webroot)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 1
Requesting a certificate for 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem
Key is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/privkey.pem
This certificate expires on 2024-12-29.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
root@snowflake-broker-40349:/etc/nginx# systemctl list-timers
NEXT                         LEFT         LAST                         PASSED         UNIT                          ACTIVATES
Mon 2024-09-30 22:57:44 UTC  8h left      Mon 2024-09-30 14:08:56 UTC  11min ago      apt-daily.timer               apt-daily.service
Tue 2024-10-01 00:00:00 UTC  9h left      Mon 2024-09-30 00:00:00 UTC  14h ago        dpkg-db-backup.timer          dpkg-db-backup.service
Tue 2024-10-01 00:00:00 UTC  9h left      Mon 2024-09-30 00:00:00 UTC  14h ago        logrotate.timer               logrotate.service
Tue 2024-10-01 00:01:00 UTC  9h left      Mon 2024-09-30 00:01:01 UTC  14h ago        autoReloadNginx.timer         autoReloadNginx.service
Tue 2024-10-01 00:14:59 UTC  9h left      Mon 2024-09-30 06:43:49 UTC  7h ago         man-db.timer                  man-db.service
Tue 2024-10-01 02:12:30 UTC  11h left     -                            -              certbot.timer                 certbot.service
Tue 2024-10-01 06:13:06 UTC  15h left     Mon 2024-09-30 06:37:56 UTC  7h ago         apt-daily-upgrade.timer       apt-daily-upgrade.service
Tue 2024-10-01 06:25:00 UTC  16h left     Mon 2024-09-30 06:25:01 UTC  7h ago         ntpsec-rotate-stats.timer     ntpsec-rotate-stats.service
Tue 2024-10-01 14:01:09 UTC  23h left     Mon 2024-09-30 14:01:09 UTC  19min ago      etckeeper.timer               etckeeper.service
Tue 2024-10-01 14:01:09 UTC  23h left     Mon 2024-09-30 14:01:09 UTC  19min ago      systemd-tmpfiles-clean.timer  systemd-tmpfiles-clean.service
Sun 2024-10-06 03:10:06 UTC  5 days left  Sun 2024-09-29 03:10:56 UTC  1 day 11h ago  e2scrub_all.timer             e2scrub_all.service
Mon 2024-10-07 00:51:07 UTC  6 days left  Mon 2024-09-30 01:27:56 UTC  12h ago        fstrim.timer                  fstrim.service

12 timers listed.
Pass --all to see loaded but inactive timers, too.
root@snowflake-broker-40349:/etc/nginx# certbot renew --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing
/etc/letsencrypt/renewal/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Account registered.
Simulating renewal of an existing certificate for 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations, all simulated renewals succeeded:
  /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem (success)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Adjusted the nginx config to use the IP address supplied by the PROXY protocol:
/etc/nginx/conf.d/logFormat.conf
What do you mean "the IP address supplied by the proxy protocol"?
Nginx should be configured to log nothing at all.
I don't see the reason for specifying a log format.
This makes me nervous – is there a chance Nginx could log client or proxy IP addresses?
We must take proactive steps to prevent such logging from happening.
Nginx can be configured to log IP addresses. I am configuring logging here to verify that the IP-based rate limiting is working. I will turn it off. Sorry, I was a little too familiar with nginx and just followed routine procedure without thinking about the specific problem at hand...
I was thinking about whether the error log should be turned off as well. It will not record anything from clients during normal workflow, and can only be turned off globally.
Example error log:
, server: , request: "GET /blog/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:25 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/workspace/drupal/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /workspace/drupal/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:25 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/panel/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /panel/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:25 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/public/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /public/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:26 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/apps/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /apps/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:26 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/app/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /app/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:27 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /index.php?s=/index/\think\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=Hello HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:28 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/public/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /public/index.php?s=/index/\think\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=Hello HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:28 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /index.php?lang=../../../../../../../../usr/local/lib/php/pearcmd&+config-create+/&/<?echo(md5("hi"));?>+/tmp/index1.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:28 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /index.php?lang=../../../../../../../../tmp/index1 HTTP/1.1", host: "37.218.242.175:443"
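For reference, silencing nginx globally amounts to something like the following in /etc/nginx/nginx.conf (a sketch; access logging can be disabled outright, while error_log can only be raised in severity or discarded):

# in the http (and stream) context
access_log off;
# send only crit-and-above to /dev/null, effectively discarding the error log
error_log /dev/null crit;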
As a sanity check with regard to logging, I recommend running a comprehensive port scan against the host, and then grepping /var/log and journalctl to see if the source IP address of the scan appears anywhere.
It's OK if it appears in SSH logs.
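Concretely, that check could look something like this (203.0.113.5 stands in for the scanning machine's address):

# from an external machine whose IP address you know:
nmap -p- 37.218.242.175

# then, on the broker:
grep -r '203.0.113.5' /var/log/
journalctl --since today | grep '203.0.113.5'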
As a final artifact of this process,
I would like you to take all the notes you have taken
and combine them into a single document, like
Snowflake-Broker-Installation-Guide.
You can overwrite the Snowflake-Broker-Installation-Guide wiki page directly,
or put it on some other draft wiki page.
The idea is to have a single set of instructions that anyone can follow
in order to reinstall the broker or do maintenance on it.
It will then become the reference documentation that we edit and maintain going forward.
Domain name updated to: snowflake-broker-debianupgradestaging-j33r3zahe.torproject.net
CAA record testing finished without incident.
Certificate is saved at: /etc/letsencrypt/live/snowflake-broker-debianupgradestaging-j33r3zahe.torproject.net/fullchain.pem
Key is saved at: /etc/letsencrypt/live/snowflake-broker-debianupgradestaging-j33r3zahe.torproject.net/privkey.pem
Everything seems to work fine running it from the console; I see it connecting to my proxy. If I try this line in TB it works, but I never manage to connect to my proxy. I think it is using the main broker; I guess there is something pinned there.
I can see my client connected to the proxy I was running with the exact commands you provided. Would you mind trying again with the tor command line version to see if the unexpected behaviour goes away?
Sorry, I see I didn't explain myself correctly. It did work fine, as for you, when running tor manually from the command line. But I didn't manage to connect to it in TB; I assume the broker is somehow pinned in TB.
I have installed prometheus-node-exporter, unattended-upgrades, man-db, screen, and rsync, and there are no more blockers for swapping the brokers. The wiki was updated with the actual deployment steps.
@dcf Please go ahead and adjust the resource allocation for the old and new broker machines. Do NOT shut down the old machine yet. Pending change: snowflake-webext!85 (merged)
The new broker host is restarted with the configuration 32 GiB / 8 CPU cores.
Because of the recent eclips.is / Greenhost changes,
the web interface does not hard-limit me to a maximum amount of resources.
I did not have to reduce the resource allocation of the current broker (or even restart it)
in order to increase the resource allocation of the new broker.
So both brokers are currently running with 32 GiB / 8 CPU cores.
Of course, we'll want to shut down the current broker soon,
so that we are not using double the resources for long.
My procedure:
"Console" tab
2024-11-12 19:46:17 Click "Turn off" button.
"Configuration" tab
Click "Update VPS" button
Select 32 GiB / 8 CPU cores
"Console" tab
2024-11-12 19:48:28 Click "Turn on" button.
I was able to log in again with SSH at 2024-11-12 19:49:18.
The prometheus metrics show that the client polls plummeted around 16:30 UTC yesterday and they haven't recovered. This would coincide with the DNS change: tpo/tpa/team#41878 (comment 3134084)
I wonder if our prometheus server is still polling metrics from the old broker's exporter and not the new one, as this graph is still flat; but looking at the actual exporter, it looks like clients are connecting. Russia alone has ~550k connections since Monday:
The number of requests seen from our CDN account did not drop with the number of client polls, so I'm guessing that's where the problem is/was (instead they spiked significantly, which is what I would expect if this is where the problem is).
Our CDN resource is configured to connect to the domain snowflake-broker.torproject.net. I wonder if some of the data centres had trouble with the DNS change. I just re-added the broker URL to try and force an update. It's probably too late to check today but we'll see if it did anything tomorrow.
We've received some reports from snowflake proxy operators on IRC that they previously had an unrestricted NAT type, but are now failing to open their probetest data channel and are classified as restricted. Looking at the metrics posted 2024-11-27 12:43:50 for our new broker machine: https://snowflake-broker.torproject.net/metrics
Somewhat related, our prometheus metrics, which we believe to be coming from the old broker machine still, show that while client polls have dropped off almost completely, unrestricted proxy polls have only slightly dropped
I wonder if there is a similar problem going on with (standalone) proxies as with the prometheus scraper, where proxy operators need to restart in order to re-resolve the domain name to the new IP address? That doesn't explain the probetest failures, just the low number of unrestricted proxies.
This shows a low number of proxies in general, and zero unrestricted proxies.
These numbers can't be right, can they?? 76 proxies could not possibly have resulted in snowflake-proxy-poll-with-relay-url-count 45397176; that would be 45397176 / 76 proxies / 24 hours / 60 minutes ≈ 414 polls per minute per proxy.
Also client-snowflake-match-count 1941088 is much higher than client-denied-count 73872
After a prometheus restart, we're pulling metrics from the new broker, and sure enough unrestricted proxy polls have dropped to close to zero, which seems to indicate that most of them are still polling the old broker
Looks like it is standalone proxies that are disproportionately affected:
When prometheus was restarted to pull metrics from the new broker, the number of currently available standalone proxies dropped while all others rose from zero.
I can confirm that theory. I have a standalone proxy that I haven't restarted since before the switch and I see it connected to the old broker but not the new:
root@stuart:/opt# ss |grep 37.218.245.111
tcp   ESTAB  0  0  192.168.1.7:56048  37.218.245.111:https
root@stuart:/opt# ss |grep 37.218.242.175
root@stuart:/opt#
I'm guessing the reason the client IP addresses aren't affected is that they already have an X-Forwarded-For header from the CDN, and the reason we see more than one proxy is that some proxies may be behind an additional proxy layer that adds it.
The new broker host is restarted with the configuration 32 GiB / 8 CPU cores.
For cost reasons, I have reduced the configuration to 8 GiB / 4 CPU cores.
That is the largest configuration that fits in the EUR 50 / month credit.
This change occurred at 2025-02-04 17:42:52.