
Network Health Team

About us

Welcome to the Network Health page! Our team, along with many dedicated individuals in the Tor community, is committed to ensuring the well-being of the Tor network, its nodes, and the community of operators. Our objectives are network security, functionality, and reliability for all users. We focus our efforts on five areas, grouped under these three objectives.

Security

Security involves protecting the network from malicious activities, ensuring the integrity and confidentiality of data transmitted across the network.

This area involves setting standards and actively removing threats from the network.

Track community standards about what makes a good relay
- Publish up-to-date expectations for relay operators
- Set best practices for how to set relay families
- Detect and resolve bad relays
  - Exitmap, sybil detection, hsdir traps, etc.
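Sybil detection and relay-family best practices meet in one simple check: several relays running from the same netblock without declaring each other as family. The sketch below is a hypothetical heuristic, not the team's actual tooling; the `Relay` record, the /24 grouping and the `min_size` threshold are all assumptions, and real detection combines many more signals (contact info, platform, first-seen time, and so on).

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relay:
    # Hypothetical minimal relay record; real data would come from
    # consensus documents or Onionoo.
    fingerprint: str
    address: str  # IPv4 address as a string
    family: frozenset = field(default_factory=frozenset)  # declared family fingerprints


def suspicious_groups(relays, prefix_len=24, min_size=3):
    """Group relays by IPv4 prefix and return groups of at least
    `min_size` relays whose members have not all declared each other
    as family."""
    groups = defaultdict(list)
    for relay in relays:
        net = ipaddress.ip_network(f"{relay.address}/{prefix_len}", strict=False)
        groups[net].append(relay)

    flagged = {}
    for net, members in groups.items():
        if len(members) < min_size:
            continue
        fps = {r.fingerprint for r in members}
        # A complete family: every member lists all the other members.
        complete = all(fps - {r.fingerprint} <= r.family for r in members)
        if not complete:
            flagged[net] = sorted(fps)
    return flagged
```

A flagged group is only a lead for manual review: many legitimate operators run several relays in one netblock and simply have not configured their family yet.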

This area focuses on identifying and mitigating anomalies that can pose security risks. It is essential for maintaining a secure operating environment within the Tor network.

Anomaly analysis / network health engineer [with network team]
- Establish baselines of expected network behavior
- Look for and resolve denial of service issues
- Track connectivity issues between relays
- Look for relays hitting resource limits
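A baseline can be as simple as a rolling mean and standard deviation over a metric such as the number of running relays per consensus. The following is a minimal z-score sketch, not the team's detection algorithm; the function name, the window size and the threshold are assumptions chosen for illustration.

```python
import statistics


def find_anomalies(series, window=24, threshold=3.0):
    """Return (index, value) pairs whose value deviates from the mean of
    the preceding `window` samples by more than `threshold` population
    standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev == 0:
            # Flat baseline: any change at all counts as a deviation.
            if series[i] != mean:
                anomalies.append((i, series[i]))
            continue
        if abs(series[i] - mean) / stdev > threshold:
            anomalies.append((i, series[i]))
    return anomalies
```

With hourly samples, `window=24` compares each hour against the previous day; seasonal patterns (weekday versus weekend load, for example) would need a longer or seasonally adjusted baseline.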

Functionality

Functionality means keeping the network's performance within healthy baselines, so that it remains accessible and usable for its users.

This area supports the functionality of the network by ensuring that growth and usage are monitored and optimized for performance, helping to manage the network efficiently based on actual usage patterns.

Make sure usage/growth stats are collected and accurate
- Track network performance and relay diversity using various metrics
- Count users [with network team]
- Monitor bridge growth and usage [with censorship team]
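Tor does not track individual users, so user counts are estimates. Tor Metrics' published methodology for directly connecting users counts the directory requests reported by relays, extrapolates them to the whole network, and divides by an assumed average of ten requests per client per day. The sketch below captures only that final arithmetic; how the reported fraction is derived is left out, and the function and parameter names are hypothetical.

```python
def estimate_daily_users(reported_requests, reported_fraction, requests_per_client=10):
    """Estimate daily users from directory requests.

    reported_requests: directory requests reported by the relays that
        published statistics for the day.
    reported_fraction: estimated fraction (0 < f <= 1) of all directory
        requests in the network handled by those relays.
    requests_per_client: assumed average number of directory requests a
        client makes per day.
    """
    if not 0 < reported_fraction <= 1:
        raise ValueError("reported_fraction must be in (0, 1]")
    # Extrapolate from the reporting relays to the whole network.
    total_requests = reported_requests / reported_fraction
    return total_requests / requests_per_client
```

For example, 500,000 requests reported by relays that handle a quarter of all directory traffic extrapolate to 2,000,000 requests, or about 200,000 daily users.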

These efforts enhance the functionality of the network by providing necessary support and resources to those operating it, ensuring that it runs smoothly and effectively.

Relay advocacy [with community team]
- Maintain docs for setting up and running relays and bridges
- Grow a cohesive community of relay operators so they have peers
  - Keep relays on the right tor versions
- Relaunch a gamification / badge system for lauding good relay progress
- Strengthen relationships with non-profit orgs that run relays
- Help companies that want to offset their Tor network load
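Keeping relays on the right tor versions starts with comparing what each relay runs against a minimum. This sketch assumes bare version strings such as `0.4.8.12` (for example, taken from a server descriptor's `platform` line with the leading "Tor " removed) and a hypothetical minimum version; the versions the directory authorities actually recommend are listed in the consensus `server-versions` line.

```python
import re

# MAJOR.MINOR.MICRO[.PATCHLEVEL][-STATUS_TAG]
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?(?:-(\S+))?")


def parse_tor_version(version):
    """Parse a tor version string such as '0.4.8.12' or '0.4.9.1-alpha'
    into a comparable tuple of integers (the status tag is ignored)."""
    m = _VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized tor version: {version!r}")
    major, minor, micro, patch, _tag = m.groups()
    return (int(major), int(minor), int(micro), int(patch or 0))


def outdated_relays(relay_versions, minimum):
    """Return the sorted nicknames of relays running a version older
    than `minimum`. `relay_versions` maps nickname -> version string."""
    floor = parse_tor_version(minimum)
    return sorted(
        nick for nick, ver in relay_versions.items()
        if parse_tor_version(ver) < floor
    )
```

Comparing integer tuples avoids the classic string-comparison bug where "0.4.8.9" would sort after "0.4.8.12".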

Reliability

Reliability refers to the network's ability to consistently perform its intended function under normal and stress conditions, maintaining service continuity.

Maintain the components of the network
- Maintain directory authority relationships
- Keep bandwidth authorities working (including setting the right balance between speed and location diversity)
- Have enough Tor Browser default bridges, and keep them running smoothly [with censorship team]
- Update the fallbackdirs list

This focus area is critical for reliability as it involves maintaining core network infrastructure, which ensures the network remains operational and robust against various challenges and demands.

Communication Channels

Just join #tor-dev: somebody from the team may be around, or may show up later and get back to you.

We hold our meetings on IRC, on the OFTC network.

Meeting	Time (UTC)	Location
Primary team meeting	Monday 13:00	#tor-meeting

The Network Health team's asynchronous communication channels are the network-health@, tor-relays@, and tor-dev@ mailing lists, depending on which is most applicable. These lists are public in the sense that anyone can subscribe, send emails, and read the archives. Feel free to subscribe and just listen if you want, and feel free to post if you have a question that you think is on topic.

For metrics-related topics, our asynchronous channel is the network-health@ mailing list. Anyone can subscribe and read the archives, but first posts are moderated: your first post will be reviewed to make sure it is on topic and not spam, and all further posts will go directly to the list.

General Priorities

Detect and resolve bad relays
- Exitmap, sybil detection, hsdir traps, etc.
Anomaly analysis / network health engineer [with network team]
- Establish baselines of expected network behavior
- Monitor network disruptions and other problems
Relay advocacy [with community team]
- Strengthen relationships with non-profit orgs that run relays
- Maintain docs for setting up and running relays and bridges
Make sure usage/growth stats are collected and accurate
- Track network performance, relay diversity by various metrics
Maintain the components of the network to keep it healthy
- Keep bandwidth authorities working (including setting the right balance between speed and location diversity)

PRIORITIES FOR 2025 [Q1-Q2]

Community Advocacy and Support

Relay Community Engagement
- Establish community-driven behavioral agreements and consequences for relay operators. (P112-O2)
Maintain the components of the network
- Work with directory authority operators to plan the transition from C tor to Arti. (P141)

Network Health Engineering and Anomaly Analysis

Relay Attack Mitigation
- Evaluate and implement solutions to relay attacks. (P112-O3, excluding O3.5)
- Address bandwidth inflation on the Tor network. (P112-O3.5)
Connectivity Tracking
- Track relay-to-relay connectivity. (GSOC)
Onbasca Refactoring
- Refactor and redesign onbasca.
Anomaly Analysis
- Conduct ad hoc anomaly analysis on the network as needed.
- Develop and implement algorithms for anomaly detection (P183).
SBWS maintenance and development
- Maintain and develop sbws (P183)
Measure Arti performance
- Ensure Arti collects performance metrics and delivers them to the metrics pipeline. (P141)
- Create a test network with authorities, middle nodes, and exit nodes. (P141)
- Discuss with the Arti team which metrics we want from Arti-based relays. (P141)

Detection and Resolution of Bad Relays

Tooling Improvements
- Improve tools for detecting bad relays.
Detection and Resolution
- Run bad-relay detection scripts regularly.

Metrics pipeline development and improvement

Metrics Services Infrastructure
- Deploy Network Status API for metrics services.
- Improve monitoring and alerting for metrics services.
- Enhance data collection methods. (P183)
- Rewrite metrics-lib in Rust. (GSOC)
Metrics Website
- Rebuild the Tor metrics website.

Support for Researchers

Network Experiments
- Provide support for researchers conducting network experiments.
Tor Research Safety Board work
- Evaluate research proposals as needed

Past priorities and roadmaps
Priorities discussion pad

Active Projects

Project 112 - Combating Malicious Relays
Project 141 - Arti Relays
Project 183 - Exception reporting framework

Network Health

We are concerned with the well-being of the Tor network and its individual relays. There are currently five main areas of work involved in that effort, guided by processes and policies we have developed over time:

Over time, Tor Project staff, external contributors, and volunteers have developed many tools for working in these areas. Some of them are in active use, while others are obsolete or not part of our day-to-day work. For an overview, see:

We also need to make sure that the underlying data collection is sound and working. We have therefore set up extensive monitoring, so that we are notified about potential issues and can fix them as quickly as possible:

Data collection and services monitoring

Metrics

We provide a set of monitoring and observability software tools and services for the public Tor network.

General Guides

Services and Tools

Below are some of the long-term projects maintained under the metrics umbrella. These services are designed and developed so that they can be mirrored, should the need arise.

CollecTor is the friendly data collecting service
ExoneraTor helps to find out whether an IP address was used as a Tor relay
Metrics Website is the primary place to learn interesting facts about the Tor network
metrics-lib is a Java library that fetches and parses Tor descriptors.
Onionoo is a web-based protocol to learn about currently running Tor relays and bridges
Exit Scanner/TorDNSEL/Tor Check
OnionPerf
Metrics Timeline
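Much of this tooling starts from the network status consensus that CollecTor archives and metrics-lib parses. The Python sketch below is not metrics-lib; it is a minimal, assumption-laden parser for only the `r`, `s` and `w` lines of a full-flavor (non-microdescriptor) consensus, following the line layout in Tor's directory protocol specification. It ignores IPv6 `a` lines, other keywords and signatures, and the `RouterStatus` type is hypothetical.

```python
import base64
from dataclasses import dataclass, field


@dataclass
class RouterStatus:
    nickname: str
    fingerprint: str  # hex-encoded 20-byte identity digest
    address: str
    or_port: int
    flags: list = field(default_factory=list)
    bandwidth: int = 0


def parse_router_statuses(consensus_text):
    """Parse the r/s/w lines of a full-flavor consensus."""
    statuses = []
    current = None
    for line in consensus_text.splitlines():
        keyword, _, rest = line.partition(" ")
        if keyword == "r":
            # r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort
            parts = rest.split()
            nickname, identity = parts[0], parts[1]
            # The identity is base64 of a 20-byte digest with the trailing
            # "=" padding stripped, so restore it before decoding.
            fingerprint = base64.b64decode(identity + "=").hex().upper()
            current = RouterStatus(nickname, fingerprint, parts[5], int(parts[6]))
            statuses.append(current)
        elif keyword == "s" and current is not None:
            current.flags = rest.split()
        elif keyword == "w" and current is not None:
            for item in rest.split():
                key, _, value = item.partition("=")
                if key == "Bandwidth":
                    current.bandwidth = int(value)
    return statuses
```

Given the entry `r ExampleRelay AAECAwQFBgcICQoLDA0ODxAREhM m1QGwTGrTyD9MmdBDkzIM2dfjoQ 2024-01-01 00:00:00 192.0.2.10 9001 0` followed by `s Fast Running Stable Valid` and `w Bandwidth=1200`, the parser yields one status with fingerprint `000102030405060708090A0B0C0D0E0F10111213`, ORPort 9001, four flags and a bandwidth weight of 1200.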

Tools and services data flow diagram (plantuml source)

Inventory

Metrics products are hosted on TPA-maintained hardware, except for some OnionPerf installations, which are administered via Ansible.

We maintain a list of metrics VMs and the services they host.

Services Ops

We maintain a list of services and their operational documentation.

How to get involved

There are several areas where you could get involved:

Contribute to one of our projects
Get involved with network data analysis
Help us redesign our metrics data pipeline
- Network Status API (NSA for short ;)
- descriptorParser
- TagTor
- tor_fusion (parse network documents in Rust)

Resources

Developer meeting notes

Stockholm's meeting notes