Update from the Serge end.
Since I started running Serge, I have dumped the output of openfiles for the tor daemon and TCP connections on the ORPort every few minutes. I kept the limits for both a reasonable 20% higher than the norm, and there were only periodic [warn] messages about the high number of connections.
I increased the related sysctls, although it's likely that the ORPort connection limit was the only one that mattered. As of a few days ago, it seems the new baseline for ORPort connections is 15% higher than in the past.
The [warn] messages about too many connections ceased, and we are now in the new normal. I now have a cron job that checks every few minutes whether the number of connections jumps more than ~15% above the new normal.
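The check the cron job performs could look roughly like the sketch below. The function name, the example counts, and the way the connection count is obtained are all assumptions for illustration, not the actual script running on Serge.

```python
def exceeds_baseline(current: int, baseline: int, tolerance: float = 0.15) -> bool:
    """Return True when `current` is more than `tolerance` above `baseline`.

    `baseline` is the "new normal" ORPort connection count and `tolerance`
    is the ~15% margin mentioned above. Both values are supplied by the
    caller; how they are measured is outside this sketch.
    """
    return current > baseline * (1 + tolerance)


if __name__ == "__main__":
    # Hypothetical numbers, only to show the call.
    if exceeds_baseline(current=12000, baseline=10000):
        print("ORPort connections are well above the baseline")
```

A cron entry would call something like this with a freshly sampled count and send mail or log a line when it returns True.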
Is there an update on bridges failing to get the running flag?
From metrics, it looks like the number of bridges has stabilized, although it has been down over the recent past.
In reference to dir auth operators, it's maybe not useful to simply contrast the community and the network. Dir auth operators aren't just random contributors, like a relay operator or some anonymous diff contributor; they play a highly centralized role. Maintaining guidelines for the dir auths is logical, insofar as they don't inhibit their integrity. I don't get how the CoC would do that.
"Independence" for dir auth operators, from my understanding, is about maintaining independent control of their boxes for the sake of the integrity of the network.
I'm also a bit confused as to why accepting the parameters of this meaning of "community" evokes any issue.
If there are concrete past or hypothetical examples of why it's an issue, that should be raised to clarify the actual issue.
Thanks, GeKo.
I get a 404 on https://gitlab.torproject.org/gk/network-tools-private/-/blob/main/Network-health-tools.md
@gman999 added...
Some edits inline based on https://forum.torproject.net/t/workshop-sysadmin-101-for-new-relay-operators/3400
Join us June 4th at 1900 UTC for new and prospective Tor relay and bridge operators on the basic “sysadmin foo” required to contribute to the network.
So you want to contribute to the Tor network by running a relay or maybe a bridge?
So you want to contribute to the open-source Tor network by running a relay or maybe a bridge?
The Tor network is the most important open-source tool for evading surveillance and bypassing internet censorship. Volunteer-operated Tor relays and bridges are vital to the health and integrity of the Tor network. Millions of users rely on relays and bridges to stay safe, and how you configure and maintain that relay or bridge is critical.
maybe drop "open-source" here then?
Your role is vital.
Volunteers aren't a nice enhancement. They are a core feature.
Running a relay or a bridge raises frequent questions.
Should I run a relay or a bridge? Should I run a relay or a bridge from a residential/home internet connection? Which operating system should I run for my Tor node (hint: the one you are most comfortable with securing and maintaining)? More generally, what does it take to keep that relay or bridge operating safely, for both you and Tor users?
This workshop will start with a presentation seeking to approach some of the core issues that arise when running a Tor node. The session will move into an “ask me anything” discussion to approach other common and less common questions.
This workshop will start with a presentation approaching some of the ...
Geared towards current and prospective Tor bridge and relay operators, particularly those relatively new to running public internet services.
The 90-minute event will be [g]eared towards..
Seasoned Linux and BSD Tor operators will be attending the event ready to address the discussion.
A shorter version:
Join us June 4th at 1900 UTC for new and prospective Tor relay and bridge operators on the basic "sysadmin foo" required to contribute to the network.
A short introduction will approach some of the basics, like "Should I run a bridge or a relay?" and "Which operating system should I run?" (hint: the one you are most comfortable running). Feel free to come with your questions and to answer other people's questions.
adding @kushal in case not seeing the ticket...
Announcement proposal for the event:
So you want to contribute to the Tor network by running a relay or maybe a bridge?
The Tor network is the most important tool for evading surveillance and bypassing internet censorship. And Tor relays and bridges are vital to the health and integrity of the Tor network. Millions of users rely on relays and bridges to stay safe, and how you configure and maintain that relay or bridge is critical.
Your role is vital.
Running a relay or a bridge raises frequent questions.
should I run a relay or a bridge?
should I run a relay or a bridge from a residential/home internet connection?
which operating system should I run for my Tor node (hint: the one you are most comfortable with securing and maintaining)?
more generally, what does it take to keep that relay or bridge operating safely, for both you and Tor users?
This event will start with a presentation seeking to approach some of the core issues that arise when running a Tor node. The session will move into an "ask me anything" discussion to approach other common and less common questions.
Geared towards current and prospective Tor bridge and relay operators, particularly those relatively new to running public internet services.
Seasoned Linux and BSD Tor operators will be attending the event ready to address the discussion.
mentioning @arma @trinity-1686a @Frosttall.
frosttall (irc handle) noticed a bridge marked down on metrics.tpo earlier. Serge had a restart (not HUP/reload) around the same time as the reported issue:
Mar 23 20:44:20.000 [notice] Clean shutdown finished. Exiting.
Mar 23 20:44:23.000 [notice] Tor 0.4.6.10 opening log file.
The ultimate conclusion was that Serge takes 30 minutes after a restart before voting again on running bridges.
arma pointed this code snippet out:
feature/dirauth/dirauth_options.inc:CONF_VAR(TestingAuthDirTimeToLearnReachability, INTERVAL, 0, "30 minutes")
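To make the timing concrete, here is a small sketch that adds the 30-minute default from the snippet above to the restart time in the log. The year (2022) is an assumption, since the log lines carry no year, and the function name is made up for this example.

```python
from datetime import datetime, timedelta

# Default taken from the CONF_VAR line quoted above.
LEARN_REACHABILITY = timedelta(minutes=30)


def running_votes_resume(start: datetime,
                         delay: timedelta = LEARN_REACHABILITY) -> datetime:
    """Earliest time the authority would vote on Running again after `start`."""
    return start + delay


# "Mar 23 20:44:23 ... opening log file" from the log; the year is assumed.
restart = datetime(2022, 3, 23, 20, 44, 23)
print(running_votes_resume(restart))  # 2022-03-23 21:14:23
```

Under that assumption, any bridge status published between the restart and roughly 21:14 UTC would show no Running bridges.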
Apparently, this has been going on for a while. Most recently, 20220207-002949, 20220208-195458, 20220205-204058, 20220205-175303, and 20220206-044751 were also times when there was no consensus/"all bridges down" as per Serge's reporting. trinity reported that there were examples in 2020 and 2021. Again, it is very strange that none of us noticed that there were no reported running bridges.
The issue came up recently, and I thought someone on the bridgedb end had handled it. There were no formal 'outages' on Serge since the IPv6 routing issues a month ago, and I didn't connect the outages with tor restarting.
A workaround arma mentioned was ignoring Serge's reporting if no bridges are marked Running.
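A rough sketch of that workaround follows. The data shape (a dict mapping bridge fingerprint to a set of flags) and the function name are hypothetical; this is not how bridgedb or any other consumer actually represents Serge's status documents.

```python
def usable_bridge_statuses(statuses):
    """Return `statuses` unless no bridge at all is marked Running.

    `statuses` maps a bridge fingerprint to the set of flags reported for
    it. When not a single bridge has the Running flag, the report is
    treated as an artifact of the authority restarting, and None is
    returned so the caller can keep using its previous data.
    """
    if not any("Running" in flags for flags in statuses.values()):
        return None
    return statuses
```

The caller would then fall back to the last report that did contain Running bridges rather than marking everything as down.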
I had a longer conversation with someone at NYI about this, which mostly confirmed what I said ^.
Verizon is a regular culprit in quietly dropping traffic, usually going one way but not the other, and there are apparently often hidden hops not displayed in a traceroute or mtr.
Yes, I believe we had this on Trac in the past.
We should probably determine the purpose of maintaining this. I mean, the port maintainers should have their own mechanisms for knowing when the tor port they maintain needs updating. OpenBSD uses portroach (.openbsd.org), but there's no central method of notifying the maintainer.
I assume it could basically be some wget/curl/lynx and grep of the current version out of port trees and package lists.
And in the future, we could extend to other basic TPO software, tor browser, nyx, etc.
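The "grep the version out" step might look something like the sketch below. It only handles text that has already been fetched; the `tor-<version>` naming pattern and the sample strings are assumptions, since each ports tree or package list names things its own way.

```python
import re

# Matches e.g. "tor-0.4.6.10" but not "monitor-1.2.3".
VERSION_RE = re.compile(r"\btor-(\d+(?:\.\d+){2,3})\b")


def find_tor_version(text):
    """Return the first tor version string found in `text`, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def version_tuple(version):
    """Turn "0.4.6.10" into (0, 4, 6, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(port_version, upstream_version):
    """True when the packaged version is older than the upstream one."""
    return version_tuple(port_version) < version_tuple(upstream_version)
```

Comparing as integer tuples matters here: as plain strings, "0.4.6.9" would sort after "0.4.6.10".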
Trying to make sense of this ticket, I feel there are a lot of conclusions but not enough data collected.
I'm assuming the problem is just some route filtering at this point, since it's the simplest and most obvious answer. It might not be the answer, but it will be helpful getting there... or at least eliminating the most obvious answer.
If connectivity to moria1 is the core issue, outside of other possible attacks, iptables, etc., and there's a spike in relays moria1 doesn't believe are running, the best route might be to:
from the ASs/IPs that can't reach moria1, dump some traceroute or better yet mtr output over some period of time. We can then see the packet loss, changing routes, and where things are dropping. Then we can maybe find the problem hops.
there's also the approach of working toward the middle from both ends with support, i.e., arma working upstream from the local IP and upstream from the moria1 IP. The goal would be getting closer to the issue, apparently in some route deep in the mix, and the closer you get, the easier it will be for someone to resolve. Needless to say, I get the wariness of dealing with Verizon, but some time on the phone might get you to Level II support. Didn't Neel C deal with some Verizon routing stuff in the past, on a different issue?
the more involved approach for later might be to bootstrap somehow off ooni data, since they have a wide range of internet points of view, which means we can start preempting this in the future.
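For the first step above, once the traceroute/mtr dumps are collected, something like the following could help spot the problem hop. It assumes the output has already been reduced to (host, loss percentage) pairs in path order; that record shape, the function name, and the 5% threshold are made up for this sketch. It looks for the first hop from which loss persists all the way to the destination, since loss at a single intermediate hop that clears up afterwards is often just ICMP rate limiting.

```python
def first_persistent_loss_hop(hops, threshold=5.0):
    """Return the index of the first hop from which loss never recovers.

    `hops` is a list of (host, loss_pct) tuples in path order. Walking
    backwards from the destination, the loop extends the run of hops
    whose loss is at or above `threshold`. It returns the index where
    that run starts, or None when the final hop shows no such loss.
    """
    start = None
    for index in range(len(hops) - 1, -1, -1):
        if hops[index][1] >= threshold:
            start = index
        else:
            break
    return start
```

Running this over samples taken at different times from the same AS would show whether the drop point is stable or moves with route changes.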
Just a few quick and undeveloped thoughts...
Sure Gaba. Appreciated.
Better yet, maybe someone on the anti-censorship team should consider taking the ticket over, since I'm not running Snowflake.
I merely raised the notion in the relay operators meeting to address what seemed like a problem in CPU usage for Snowflake users, and am filing the ticket to hopefully initiate the discussion.
As per the issue raised at the 20220305 Relay Operators meetup, we discussed possible high CPU usage with the Snowflake browser addon. Relates to "High CPU load on idle proxies" #40112
We should get contributions from users detailing:
snowflake_version,browser,browser_version,installed_addons,CPU,CPU_snowflake_usage,RAM,RAM_snowflake_usage,operating_system
We should likely include average Snowflake usage and maybe consider basic hardware specs on the device. Contributions should not have other applications running, to better control the results.
Any enhancements to this survey are welcome.
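If contributions come in as CSV with the fields listed above as the header, a small reader like this sketch could check them before comparison. The CSV format itself and the function name are assumptions; the ticket only lists the field names.

```python
import csv
import io

# Field names exactly as listed in the ticket.
FIELDS = [
    "snowflake_version", "browser", "browser_version", "installed_addons",
    "CPU", "CPU_snowflake_usage", "RAM", "RAM_snowflake_usage",
    "operating_system",
]


def read_survey(text):
    """Parse CSV survey text and return a list of row dicts.

    Raises ValueError when the header differs from FIELDS or when a row
    has too many or too few cells.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        # DictReader puts extra cells under the None key and fills
        # missing cells with None, so both cases are detectable here.
        if None in row or any(value is None for value in row.values()):
            raise ValueError(f"wrong number of cells near line {reader.line_num}")
        rows.append(row)
    return rows
```

Rejecting malformed rows early keeps the later comparison of CPU and RAM figures from silently mixing up columns.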