The root account has my and @shelikhoo's public keys
copied from the existing broker in the authorized_keys file.
Other than that, I didn't do any configuration.
ssh -i ~/.ssh/broker-key root@37.218.242.175
I had to configure the VPS with less RAM and CPU than the existing broker,
because of resource limits on the account.
What we can do is get the new VPS fully installed and configured,
then, when it's time to do the migration, restart the old broker with fewer resources
and restart the new one with more.
We have a resource limit of 220.0 units.
The current broker uses 184.1 units:
32 GB
8 CPU cores
10 GiB disk
That leaves us with 35.9 units.
This is what I was able to provision under that limit,
costing 26.2 units:
4 GB
2 CPU cores
20 GiB disk
The RAM and CPU allocation is easy to change, but it requires powering off the VPS.
I have finished setting up nginx, ACME, and the HTTPS forwarder on the machine. This is what I did:
# Firstly, install nginx with stream plugin
apt install nginx libnginx-mod-stream

# Setup ALPN based tls acme: no need to worry about what to put on the http site anymore
# COPY common/conf/alpn_proxy_nginx /etc/nginx/modules-enabled/99-000-tlsacmeapln.conf

# Install curl and acme
apt install curl
useradd -m acmeworker
sudo -u acmeworker bash
curl https://get.acme.sh | sh -s email=shelikhoo@torproject.org

# The domain name is only used for testing, and only letsencrypt supports alpn based domain validation
./acme.sh --issue --domain 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io --server letsencrypt --alpn --tlsport 11443

# and finally restart nginx
curl https://ssl-config.mozilla.org/ffdhe2048.txt > /etc/nginx/ffdhe2048.txt
systemctl enable nginx
systemctl start nginx
nginx -s reload
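For context, the alpn_proxy_nginx file copied above (not reproduced in this thread) makes nginx route incoming TLS connections by ALPN, so that acme.sh's TLS-ALPN listener on port 11443 only receives ACME validation handshakes. A minimal sketch of that technique with nginx's stream module (the backend port other than 11443 is an assumption, not the actual file):

stream {
    # Peek at the ClientHello without terminating TLS, and route by ALPN
    map $ssl_preread_alpn_protocols $upstream {
        ~\bacme-tls/1\b 127.0.0.1:11443;  # ACME TLS-ALPN-01 validation (acme.sh --alpn --tlsport 11443)
        default         127.0.0.1:10443;  # ordinary HTTPS backend (assumed port)
    }

    server {
        listen 443;
        ssl_preread on;  # fills $ssl_preread_alpn_protocols
        proxy_pass $upstream;
    }
}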
I have finished working on the initial deployment of the broker and basic testing of it with a client and a standalone proxy. The SQS and AMP cache signaling channels are not tested yet.
# Create a user just for webapps
useradd -m webapp

# install additional packages required for the environment; if a different slim installation
# was used, it may require a different set of packages to be installed manually. Please do
# not lose hope when systemd fails in a weird way and the error message yields no result
# with a search engine
apt install systemd-container libpam-systemd

# sudo is not sufficient here as we are interacting with systemd and there are many
# environment variables that need to be set correctly
machinectl shell --uid=webapp

# Copy binaries to .config/broker
# Copy broker.service to .config/systemd/user

# Fetch geoip files
curl https://archive.torproject.org/tor-package-archive/torbrowser/13.0.14/tor-expert-bundle-linux-x86_64-13.0.14.tar.gz > tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
tar -xzvf tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
# Copy data files to .config/broker

systemctl enable --user broker.service
systemctl start --user broker.service

# return to root shell
# Copy broker.conf to /etc/nginx/sites-available/https
nginx -s reload
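The broker.service user unit itself is not reproduced above. As a hedged sketch only (the flag names and listen address are my assumptions, since nginx terminates TLS in front of the broker; check the broker binary's --help for the real flags), it would look something like this:

# ~/.config/systemd/user/broker.service (hypothetical sketch)
[Unit]
Description=Snowflake broker

[Service]
# TLS is terminated by nginx, so the broker would listen on plain HTTP locally;
# flag names and address are assumptions, not the deployed unit.
ExecStart=%h/.config/broker/broker -disable-tls -addr 127.0.0.1:8080
Restart=on-failure

[Install]
WantedBy=default.target

Note also that loginctl enable-linger webapp is required for the webapp user's services to start at boot without an active login session.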
# Fetch geoip files
curl https://archive.torproject.org/tor-package-archive/torbrowser/13.0.14/tor-expert-bundle-linux-x86_64-13.0.14.tar.gz > tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
tar -xzvf tor-expert-bundle-linux-x86_64-13.0.14.tar.gz
# Copy data files to .config/broker
IMO the distribution tor-geoipdb package should be preferred
to a one-time download of a tarball.
If it's not automated, the geoipdb will never be updated, practically speaking.
# From tor's official setup guide https://support.torproject.org/apt/
curl https://deb.torproject.org/torproject.org/A3C4F0F979CAA22CDBA8F512EE8CBC9E886DDD89.asc | gpg --dearmor | tee /usr/share/keyrings/deb.torproject.org-keyring.gpg >/dev/null
apt install tor-geoipdb deb.torproject.org-keyring

# Although we only selected tor-geoipdb, a tor daemon is also installed at the same time. Let's disable it
systemctl disable tor.service
systemctl mask tor.service
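With the package installed, the broker can read the continuously updated files instead of the tarball copies. A sketch, assuming the broker's -geoipdb/-geoip6db flags (verify against the broker's --help):

# Debian's tor-geoipdb ships these files, which apt keeps up to date:
#   /usr/share/tor/geoip
#   /usr/share/tor/geoip6
broker -geoipdb /usr/share/tor/geoip -geoip6db /usr/share/tor/geoip6 ...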
Tell me when you want me to swap the RAM/CPU allocation of the old and new broker. (To give the new broker more CPU and RAM and the old broker less.) The old broker can continue running until you're ready to do the actual DNS change.
For Let's Encrypt autocert, you will need to either (1) copy the Let's Encrypt account credentials from the old broker to the new, or (2) ask the admin team to add a DNS CAA record that permits the Let's Encrypt account on the new broker to get new certificates.
The broker will automatically acquire a TLS certificate for the names given in --acme-hostnames the first time each name is accessed. If you use a subdomain of torproject.net, then you will need to get in touch with the Tor sysadmin team and ask to have a CAA DNS record created that authorizes a certain Let's Encrypt account to get certificates for that domain. See tpo/tpa/team#41462 (closed). You can use the autocert-account-id program to find the name of the account created in the /home/snowflake-broker/acme-cert-cache directory.
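For illustration, a CAA record that restricts issuance to a single Let's Encrypt account looks roughly like this (the accounturi value is a placeholder; the real one comes from autocert-account-id as described above):

; hypothetical CAA record, per RFC 8657
snowflake-broker.torproject.net. IN CAA 0 issue "letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/123456789"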
I have updated nginx with rate limit settings (see also #40329 (closed)); these settings will take effect as soon as we switch to this instance as the primary.
The exact limits are adjustable, and the ones currently shown are not final. We currently have no idea how common NAT gateways are.
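Since the deployed values are not shown here, as a generic illustration of the mechanism only, per-IP rate limiting in nginx looks like this (zone name, size, and rates are placeholders):

# in the http context
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

server {
    location / {
        # allow short bursts; excess requests are rejected
        limit_req zone=per_ip burst=20 nodelay;
    }
}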
@dcf Could you please remind me of the IPv6 address of the new broker? I was unable to locate it with the ip address command. It is needed to set up the NAT type testing probetest and enable IPv6 support for the broker.
There are instructions for assigning an IPv6 address in the
installation guide.
But for this host you must use the prefix 2a00:c6c0:0:151:4::/80,
not 2a00:c6c0:0:154:4::/80.
Set up an IPv6 address. You can use any address in the 2a00:c6c0:0:151:4::/80 prefix.
root# python -c 'import os; print ":".join(os.urandom(2).encode("hex") for _ in range(3))'
d8aa:b4e6:c89f
root# vi /etc/network/interfaces
iface eth0 inet6 static
    address 2a00:c6c0:0:154:4:d8aa:b4e6:c89f
    netmask 64
    gateway 2a00:c6c0:0:154::1
root# etckeeper commit "Add IPv6 address."
root# reboot
Porting that little script to python3 is not too hard:
python3 -c 'import os; print(":".join(os.urandom(2).hex() for _ in range(3)))'
I am not sure about the subnet thing.
eclips.is support told me that we were "assigned" the IPv6 block 2a00:c6c0:0:151:4::/80 for all our instances.
The example configuration they gave me had netmask 64.
The /80 might have to do with their internal accounting, or something.
The configuration looks fine.
I think what actually happened here is that each client is allocated/reserved a /80 subnet, while in the network configuration the /64 network is the address range of neighbors (reachable with broadcast messages). So they are actually not the same thing.
As for the python... Yeah, it is not hard to port...
I am currently setting up the NAT Type Test Helper (it is also named probetest, but I wish it could be renamed).
The network namespace setup script was installed first:
/var/lib/probenattest/init-netns.sh
#!/bin/bash
ip netns add net0
ip link add veth-a type veth peer name veth-b
ip link set veth-a netns net0
ip netns exec net0 ip link set lo up
ip netns exec net0 ip address add 10.0.0.2/24 dev veth-a
ip netns exec net0 ip address add fc00::2/7 dev veth-a
ip netns exec net0 ip link set veth-a up
ip address add 10.0.0.1/24 dev veth-b
ip address add fc00::1/7 dev veth-b
ip link set veth-b up
ip netns exec net0 ip route add default via 10.0.0.1 dev veth-a
ip netns exec net0 ip route add default via fc00::1 dev veth-a
mkdir -p /etc/netns/net0/
ln -sf /etc/resolv.conf /etc/netns/net0/
ln -sf /etc/hosts /etc/netns/net0/
echo 1 > /proc/sys/net/ipv4/ip_forward
sysctl -w net.ipv6.conf.all.forwarding=1
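(A few quick sanity checks after running the script, my suggestion rather than part of the original procedure; note that traffic leaving the namespace for the outside world additionally needs the NAT rules configured with ferm below:)

ip netns list                          # should list net0
ip netns exec net0 ping -c 1 10.0.0.1  # host end of the veth pair
ip netns exec net0 ping -c 1 fc00::1   # same check over IPv6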
and the following startup namespace setup service was set up and enabled:
/etc/systemd/system/probeNatTestSetup.service
[Unit]
Description=Probe NAT Test Setup
Before=ferm.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=%S/probenattest/init-netns.sh

[Install]
WantedBy=default.target
WantedBy=ferm.service
WARNING: edited; systemd could brick the system with the previous version of this config.
The following action was then taken:
apt install ferm
apt remove iptables-persistent
And ferm is configured with:
/etc/ferm/ferm.conf
# static public-facing ip addresses
@def $IPv4_WORLD = 37.218.242.175;
@def $IPv6_WORLD = 2a00:c6c0:0:151:4:ae99:c0a9:d585;

# static private ip address
@def $IPv4_PRIVATE = 10.0.0.2;
@def $IPv6_PRIVATE = fc00::2;

domain (ip ip6) {
    table filter {
        chain INPUT {
            policy DROP;

            # connection tracking
            mod state state INVALID DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;

            # allow local packet
            interface lo ACCEPT;

            # respond to ping
            proto icmp ACCEPT;

            # allow SSH connections
            proto tcp dport ssh ACCEPT;

            # allow HTTP connections (for ACME HTTP-01 challenge)
            proto tcp dport http ACCEPT;
            proto tcp dport https ACCEPT;

            # allow HTTPS-ALT connections
            proto tcp dport 8443 ACCEPT;
        }
        chain OUTPUT {
            policy ACCEPT;

            # connection tracking
            #mod state state INVALID DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;
        }
        chain FORWARD {
            policy DROP;

            # connection tracking
            mod state state INVALID DROP;
            mod state state (ESTABLISHED RELATED) ACCEPT;

            # forward packets to subnet
            @if @eq($DOMAIN, ip) {
                daddr $IPv4_PRIVATE ACCEPT;
                saddr $IPv4_PRIVATE ACCEPT;
            } @else {
                daddr $IPv6_PRIVATE ACCEPT;
                saddr $IPv6_PRIVATE ACCEPT;
            }
        }
    }

    # PRE- and POST- ROUTING rules for probetest
    table nat {
        chain POSTROUTING {
            @if @eq($DOMAIN, ip) {
                saddr "$IPv4_PRIVATE/24" outerface eth0 SNAT to $IPv4_WORLD random;
            } @else {
                saddr "$IPv6_PRIVATE/7" outerface eth0 SNAT to $IPv6_WORLD random;
            }
        }
        chain PREROUTING {
            @if @eq($DOMAIN, ip) {
                proto tcp dport 8443 interface eth0 DNAT to $IPv4_PRIVATE;
            } @else {
                proto tcp dport 8443 interface eth0 DNAT to $IPv6_PRIVATE;
            }
        }
    }
}
And after that, just reboot to confirm everything works! This will easily be the most stressful part unless one has console access to the machine.
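To take some of the stress out of that step, ferm can preview or safely apply a rule set before you commit to a reboot (my suggestion, not a step from the original procedure):

# print the generated iptables rules without applying them
ferm --noexec --lines /etc/ferm/ferm.conf
# apply interactively: ferm rolls the rules back automatically unless you
# confirm within the timeout, so a bad rule set cannot lock you out over SSH
ferm --interactive /etc/ferm/ferm.conf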
/etc/systemd/system/probeNatTestd.service

[Unit]
Description=Snowflake Nat Type Test Daemon
After=probeNatTestSetup.service

[Service]
# --addr 10.0.0.2:8081 is added to ensure that, should NetworkNamespacePath fail to apply, the unit will fail.
# Do NOT remove it unless the reason it was added is understood.
ExecStart=%S/probenattestd/probenattestd -disable-tls --addr 10.0.0.2:8081
WorkingDirectory=%S/probenattestd
RestartSec=5s
Restart=on-failure
StateDirectory=probenattestd
NetworkNamespacePath=/var/run/netns/net0

[Install]
WantedBy=default.target
and then enable and start it with systemctl enable probeNatTestd.service and systemctl start probeNatTestd.service.
WARNING: in systemd, failure to apply NetworkNamespacePath=/var/run/netns/net0 is silent: if systemd is unable to apply it, the service will run anyway without it, without even a line of warning. This behaviour is against best practice, and one should be aware of it when operating the service. --addr 10.0.0.2:8081 is there to make sure the service will fail and make noise when the NetworkNamespacePath is not applied; be mindful of this when changing it.
--addr 10.0.0.2:8081 is there to make sure the service will fail and make noise when the NetworkNamespacePath is not applied; be mindful of this when changing it.
Okay, please add this information as a comment in the service file itself.
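One way to positively verify that the namespace was applied (my suggestion, not part of the original setup) is to compare the network namespace inode of the running daemon against that of net0:

# the two inode numbers should match if NetworkNamespacePath took effect
stat -Lc %i /proc/$(systemctl show -p MainPID --value probeNatTestd.service)/ns/net
stat -Lc %i /var/run/netns/net0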
After a network configuration mishap today, I reinstalled the OS to begin the installation process from scratch. The IP address is the same but the SSH keys have changed.
Thanks! The misconfiguration was about systemd dependencies: I introduced a dependency cycle between ferm.service, the network namespace setup (probeNatTestSetup.service), sysinit.target, and multi-user.target. I have found the root issue with a local VM, and resumed the process of deploying to the snowflake broker.
The server setup above was replayed on the reinstalled server.
ACME auto-renewal was set up using the online resource https://github.com/unknowndevQwQ/acme.sh-systemd:
[Unit]
Description=Renew certificates acquired via acme.sh
After=network.target network-online.target nss-lookup.target
Wants=network-online.target nss-lookup.target
Documentation=https://github.com/acmesh-official/acme.sh/wiki

[Service]
# If the version of systemd is 240 or above, uncomment Type=exec and comment out Type=simple
#Type=exec
Type=simple
# The --home argument should be the location of the acme.sh configuration directory.
# This is the user unit; by default there is no need to set the --home folder
#ExecStart=/usr/bin/acme.sh --cron --home %h/.acme.sh
ExecStart=%h/.acme.sh/acme.sh --cron
# acme.sh returns 2 when renewal is skipped (i.e. certs up to date)
SuccessExitStatus=0 2
Restart=on-failure
Why acme.sh, and not certbot, which will be managed automatically by the system package manager? It's fine if there's a good reason, but if not, we should optimize for low maintenance and reducing external dependencies.
Certbot depends on an external runtime (Python) and its dependencies, and, when managed by the Debian package manager, will always be out of date (acme.sh, on the other hand, only needs bash and the curl command),
and it is not designed to run as an underprivileged user in a least-privilege setup. This setup is designed so that each component only has the permissions it really needs; here, the ACME component runs as the unprivileged user acmeworker.
The autoReloadNginx.service is there so that the certificate refresh process, which runs in a privileged context, does not need to be initiated by the unprivileged user acmeworker.
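autoReloadNginx itself is not reproduced in this thread; a plausible reconstruction (my sketch, not the deployed files; the 00:01 firing time matches the systemctl list-timers output later in this thread) would be:

# /etc/systemd/system/autoReloadNginx.service (sketch)
[Unit]
Description=Reload nginx so renewed certificates take effect

[Service]
Type=oneshot
ExecStart=/usr/sbin/nginx -s reload

# /etc/systemd/system/autoReloadNginx.timer (sketch)
[Unit]
Description=Daily nginx reload

[Timer]
OnCalendar=*-*-* 00:01:00

[Install]
WantedBy=timers.target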
I can go ahead and switch to certbot, but it will not be able to have the same least-privilege setup that drops the ACME process's root privileges. Should I go ahead and do that?
My preference is for certbot.
Generally I think we should not have curl | bash
anywhere in the setup procedure.
My position is that we should optimize for low maintenance
and following standard procedures whenever possible.
Every little thing that is custom
is not only something we have to maintain documentation for,
it's also something that somebody will have to spend time learning
before they can deal with a problem on the server.
Generally: try to reduce the number of components,
and install the components in a standard way when possible.
We don't want to spend our "weirdness budget"
on mundane sysadmin stuff, save that for our own software.
The situation we want to avoid is one where you have set up things
in a way that you like personally, but for anyone else to work with it
they need to find you to find out how it works.
# This is not supported by certbot developers
apt install certbot
certbot register -m shelikhoo@torproject.org
# maybe replace this domain with the production domain name if necessary
certbot certonly -d 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem
Key is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/privkey.pem
And apply them to nginx.
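Applying them amounts to pointing the TLS server block at certbot's live paths and making sure nginx reloads after each renewal. A sketch (certbot's renewal-hooks/deploy directory is standard certbot behaviour; the hook file name is my choice):

# in the relevant nginx server block, using the paths from the certbot output above:
#   ssl_certificate     /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem;
#   ssl_certificate_key /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/privkey.pem;

# executables in this directory run after every successful renewal:
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-nginx <<'EOF'
#!/bin/sh
systemctl reload nginx
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx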
root@snowflake-broker-40349:/etc/nginx# certbot certonly -d 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
^C
root@snowflake-broker-40349:/etc/nginx# certbot register -m shelikhoo@torproject.org
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Please read the Terms of Service at
https://letsencrypt.org/documents/LE-SA-v1.4-April-3-2024.pdf. You must agree in
order to register with the ACME server. Do you agree?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: y
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Would you be willing, once your first certificate is successfully issued, to
share your email address with the Electronic Frontier Foundation, a founding
partner of the Let's Encrypt project and the non-profit organization that
develops Certbot? We'd like to send you email about our work encrypting the web,
EFF news, campaigns, and ways to support digital freedom.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(Y)es/(N)o: n
Account registered.
root@snowflake-broker-40349:/etc/nginx# certbot certonly -d 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
Saving debug log to /var/log/letsencrypt/letsencrypt.log
How would you like to authenticate with the ACME CA?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: Spin up a temporary webserver (standalone)
2: Place files in webroot directory (webroot)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 1
Requesting a certificate for 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem
Key is saved at: /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/privkey.pem
This certificate expires on 2024-12-29.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
root@snowflake-broker-40349:/etc/nginx# systemctl list-timers
NEXT                         LEFT         LAST                         PASSED         UNIT                          ACTIVATES
Mon 2024-09-30 22:57:44 UTC  8h left      Mon 2024-09-30 14:08:56 UTC  11min ago      apt-daily.timer               apt-daily.service
Tue 2024-10-01 00:00:00 UTC  9h left      Mon 2024-09-30 00:00:00 UTC  14h ago        dpkg-db-backup.timer          dpkg-db-backup.service
Tue 2024-10-01 00:00:00 UTC  9h left      Mon 2024-09-30 00:00:00 UTC  14h ago        logrotate.timer               logrotate.service
Tue 2024-10-01 00:01:00 UTC  9h left      Mon 2024-09-30 00:01:01 UTC  14h ago        autoReloadNginx.timer         autoReloadNginx.service
Tue 2024-10-01 00:14:59 UTC  9h left      Mon 2024-09-30 06:43:49 UTC  7h ago         man-db.timer                  man-db.service
Tue 2024-10-01 02:12:30 UTC  11h left     -                            -              certbot.timer                 certbot.service
Tue 2024-10-01 06:13:06 UTC  15h left     Mon 2024-09-30 06:37:56 UTC  7h ago         apt-daily-upgrade.timer       apt-daily-upgrade.service
Tue 2024-10-01 06:25:00 UTC  16h left     Mon 2024-09-30 06:25:01 UTC  7h ago         ntpsec-rotate-stats.timer     ntpsec-rotate-stats.service
Tue 2024-10-01 14:01:09 UTC  23h left     Mon 2024-09-30 14:01:09 UTC  19min ago      etckeeper.timer               etckeeper.service
Tue 2024-10-01 14:01:09 UTC  23h left     Mon 2024-09-30 14:01:09 UTC  19min ago      systemd-tmpfiles-clean.timer  systemd-tmpfiles-clean.service
Sun 2024-10-06 03:10:06 UTC  5 days left  Sun 2024-09-29 03:10:56 UTC  1 day 11h ago  e2scrub_all.timer             e2scrub_all.service
Mon 2024-10-07 00:51:07 UTC  6 days left  Mon 2024-09-30 01:27:56 UTC  12h ago        fstrim.timer                  fstrim.service

12 timers listed.
Pass --all to see loaded but inactive timers, too.
root@snowflake-broker-40349:/etc/nginx# certbot renew --dry-run
Saving debug log to /var/log/letsencrypt/letsencrypt.log
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing
/etc/letsencrypt/renewal/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Account registered.
Simulating renewal of an existing certificate for 0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations, all simulated renewals succeeded:
  /etc/letsencrypt/live/0tzfb4f02pigk12zhf4ebovqvzl8abcq2ckd-37-218-242-175.sslip.io/fullchain.pem (success)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Adjusted the nginx config to use the IP address supplied by the PROXY protocol:
/etc/nginx/conf.d/logFormat.conf
What do you mean "the IP address supplied by the proxy protocol"?
Nginx should be configured to log nothing at all.
I don't see the reason for specifying a log format.
This makes me nervous – is there a chance Nginx could log client or proxy IP addresses?
We must take proactive steps to prevent such logging from happening.
Nginx can be configured to log IP addresses. I am configuring logging here to verify that the IP-based rate limiting is working. I will turn it off. Sorry, I was a little too familiar with nginx and just followed routine procedure without thinking about the specific problem at hand...
I was thinking about whether the error log should be turned off as well. It will not record anything from clients during normal workflow, and can only be turned off globally.
Example error log:
, server: , request: "GET /blog/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:25 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/workspace/drupal/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /workspace/drupal/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:25 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/panel/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /panel/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:25 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/public/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /public/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:26 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/apps/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /apps/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:26 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/app/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /app/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:27 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /index.php?s=/index/\think\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=Hello HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:28 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/public/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /public/index.php?s=/index/\think\app/invokefunction&function=call_user_func_array&vars[0]=md5&vars[1][]=Hello HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:28 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /index.php?lang=../../../../../../../../usr/local/lib/php/pearcmd&+config-create+/&/<?echo(md5("hi"));?>+/tmp/index1.php HTTP/1.1", host: "37.218.242.175:443"
2024/09/26 10:57:28 [error] 60540#60540: *8321 open() "/usr/share/nginx/html/index.php" failed (2: No such file or directory), client: 127.0.0.1, server: , request: "GET /index.php?lang=../../../../../../../../tmp/index1 HTTP/1.1", host: "37.218.242.175:443"
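For reference, silencing nginx globally amounts to something like the following in /etc/nginx/nginx.conf (a sketch; access logging can be disabled outright, while error_log can only be raised in severity or discarded):

# in the http (and stream) context
access_log off;
# send only crit-and-above to /dev/null, effectively discarding the error log
error_log /dev/null crit;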
As a sanity check with regard to logging, I recommend running a comprehensive port scan against the host, and then grepping /var/log and journalctl to see if the source IP address of the scan appears anywhere.
It's OK if it appears in SSH logs.
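Concretely, that check could look something like this (203.0.113.5 stands in for the scanning machine's address):

# from an external machine whose IP address you know:
nmap -p- 37.218.242.175

# then, on the broker:
grep -r '203.0.113.5' /var/log/
journalctl --since today | grep '203.0.113.5'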
As a final artifact of this process,
I would like you to take all the notes you have taken
and combine them into a single document, like
Snowflake-Broker-Installation-Guide.
You can overwrite the Snowflake-Broker-Installation-Guide wiki page directly,
or put it on some other draft wiki page.
The idea is to have a single set of instructions that anyone can follow
in order to reinstall the broker or do maintenance on it.
It will then become the reference documentation that we edit and maintain going forward.
Domain name updated to: snowflake-broker-debianupgradestaging-j33r3zahe.torproject.net
CAA record testing finished without incident.
Certificate is saved at: /etc/letsencrypt/live/snowflake-broker-debianupgradestaging-j33r3zahe.torproject.net/fullchain.pem
Key is saved at: /etc/letsencrypt/live/snowflake-broker-debianupgradestaging-j33r3zahe.torproject.net/privkey.pem
Everything seems to work fine running it from the console; I see it connecting to my proxy. If I try this line in TB it works, but I never manage to connect to my proxy. I think it is using the main broker; I guess there is something pinned there.
I can see my client connected to the proxy I was running with the exact commands you provided. Would you mind trying again with the tor command line version to see if the unexpected behaviour goes away?
Sorry, I see I didn't explain myself correctly. It did work fine, as for you, when running tor manually from the command line. But I didn't manage to connect to it in TB; I assume the broker is somehow pinned in TB.
I have installed prometheus-node-exporter, unattended-upgrades, man-db, screen, and rsync, and there are no more blockers for swapping the brokers. The wiki was updated with the actual deployment steps.
@dcf Please go ahead and adjust the resource allocation for the old and new broker machines. Do NOT shut down the old machine yet. Pending change: snowflake-webext!85 (merged)
The new broker host is restarted with the configuration 32 GiB / 8 CPU cores.
Because of the recent eclips.is / Greenhost changes,
the web interface does not hard-limit me to a maximum amount of resources.
I did not have to reduce the resource allocation of the current broker (or even restart it)
in order to increase the resource allocation of the new broker.
So both brokers are currently running with 32 GiB / 8 CPU cores.
Of course, we'll want to shut down the current broker soon,
so that we are not using double the resources for long.
My procedure:
"Console" tab
2024-11-12 19:46:17 Click "Turn off" button.
"Configuration" tab
Click "Update VPS" button
Select 32 GiB / 8 CPU cores
"Console" tab
2024-11-12 19:48:28 Click "Turn on" button.
I was able to log in again with SSH at 2024-11-12 19:49:18.
The prometheus metrics show that the client polls plummeted around 16:30 UTC yesterday and they haven't recovered. This would coincide with the DNS change: tpo/tpa/team#41878 (comment 3134084)
I wonder if our prometheus server is still polling metrics from the old broker's exporter and not the new one, as this graph is still flat; but looking at the actual exporter, it looks like clients are connecting. Russia alone has ~550k connections since Monday:
The number of requests seen from our CDN account did not drop with the number of client polls, so I'm guessing that's where the problem is/was (instead they spiked significantly, which is what I would expect if this is where the problem is).
Our CDN resource is configured to connect to the domain snowflake-broker.torproject.net. I wonder if some of the data centres had trouble with the DNS change. I just re-added the broker URL to try and force an update. It's probably too late to check today but we'll see if it did anything tomorrow.
We've received some reports from snowflake proxy operators on IRC that they previously had an unrestricted NAT type, but are now failing to open their probetest data channel and are classified as restricted. Looking at the metrics posted 2024-11-27 12:43:50 for our new broker machine: https://snowflake-broker.torproject.net/metrics
Somewhat related, our prometheus metrics, which we believe to be coming from the old broker machine still, show that while client polls have dropped off almost completely, unrestricted proxy polls have only slightly dropped
I wonder if there is a similar problem going on with (standalone) proxies as with the prometheus scraper, where proxy operators need to restart in order to re-resolve the domain name to the new IP address? That doesn't explain the probetest failures, just the low number of unrestricted proxies.
This shows a low number of proxies in general, and zero unrestricted proxies.
These numbers can't be right, can they?? 76 proxies could not possibly have resulted in snowflake-proxy-poll-with-relay-url-count 45397176; that would be 45397176 / 76 proxies / 24 hours / 60 minutes ≈ 414 polls per minute per proxy.
Also client-snowflake-match-count 1941088 is much higher than client-denied-count 73872
After a prometheus restart, we're pulling metrics from the new broker, and sure enough unrestricted proxy polls have dropped to close to zero, which seems to indicate that most of them are still polling the old broker
Looks like it is standalone proxies that are disproportionately affected:
When prometheus was restarted to pull metrics from the new broker, the number of currently available standalone proxies dropped while all others rose from zero.
I can confirm that theory. I have a standalone proxy that I haven't restarted since before the switch and I see it connected to the old broker but not the new:
root@stuart:/opt# ss |grep 37.218.245.111
tcp   ESTAB  0  0  192.168.1.7:56048  37.218.245.111:https
root@stuart:/opt# ss |grep 37.218.242.175
root@stuart:/opt#
I'm guessing the reason the client IP addresses aren't affected is that they already have an X-Forwarded-For header from the CDN, and the reason we see more than one proxy is that some proxies may be behind an additional proxy layer that adds it.
The new broker host is restarted with the configuration 32 GiB / 8 CPU cores.
For cost reasons, I have reduced the configuration to 8 GiB / 4 CPU cores.
That is the largest configuration that fits in the EUR 50 / month credit.
This change occurred at 2025-02-04 17:42:52.