Team

Meta project for non-specific project tickets, project management, and general information in the wiki.

Network Health Team

About us

Welcome to the Network Health page! There are several people in the Tor community taking care of the network's health.

The five areas that we focus on are:

  1. Track community standards about what makes a good relay
    • Publish up-to-date expectations for relay operators
    • Set best practices for how to set relay families
    • Detect and resolve bad relays
      • Exitmap, sybil detection, hsdir traps, etc.
  2. Anomaly analysis / network health engineer [with network team]
    • Establish baselines of expected network behavior
    • Look for and resolve denial of service issues
    • Track connectivity issues between relays
    • Look for relays hitting resource limits
  3. Make sure usage/growth stats are collected and accurate
    • Track network performance, relay diversity by various metrics
    • Count users [with network team]
    • Monitor bridge growth and usage [with censorship team]
  4. Relay advocacy [with community team]
    • Maintain docs for setting up and running relays and bridges
    • Grow a cohesive community of relay operators so they have peers
      • Keep relays on the right tor versions
    • Relaunch a gamification / badge system for lauding good relay progress
    • Strengthen relationships with non-profit orgs that run relays
    • Help companies that want to offset their tor network load
  5. Maintain the components of the network
    • Maintain directory authority relationships
    • Keep bandwidth authorities working (including setting the right balance between speed and location diversity)
    • Have enough tor browser default bridges, and keep them running smoothly [with censorship team]
    • Update the fallbackdirs list

Communication Channels

Just go to #tor-dev, and somebody from the team might either be around or appear later and get back to you.

We use IRC for our meetings, which take place on the OFTC network.

Team meeting           UTC            Location
Primary team meeting   Monday 16:00   #tor-meeting

The Network Health team's asynchronous communication channels are the network-health@, tor-relays@, and tor-dev@ mailing lists, depending on which is most applicable. These lists are public in the sense that anyone can subscribe, send emails, and read archives. Feel free to subscribe and just listen if you want, and feel free to post if you have a question that you think is on topic.

For metrics-related topics, our asynchronous communication channel is the network-health@ mailing list. This list is public in the sense that anyone can subscribe and read archives. However, it is moderated on the first post: your first post will be reviewed to make sure it is on topic and not spam, and all further posts will go directly to the list. Feel free to subscribe and just listen if you want, and feel free to post if you have a question that you think is on topic.

General Priorities

  1. Detect and resolve bad relays
    • Exitmap, sybil detection, hsdir traps, etc.
  2. Anomaly analysis / network health engineer [with network team]
    • Establish baselines of expected network behavior
    • Monitor network disruption or problems
  3. Relay advocacy [with community team]
    • Strengthen relationships with non-profit orgs that run relays
    • Maintain docs for setting up and running relays and bridges
  4. Make sure usage/growth stats are collected and accurate
    • Track network performance, relay diversity by various metrics
  5. Maintain the components of the network to keep it healthy
    • Keep bandwidth authorities working (including setting the right balance between speed and location diversity)

PRIORITIES FOR 2023

  • Various work within our fighting malicious relays project: Sponsor 112 - Q1-Q4

    • O1:
      • Network anomaly detection
        • better understand the relay dynamics of the Tor network
        • build mechanisms to annotate, understand, tag, and track relays and how they behave and churn on the network
      • Bad-relay tooling improvements - Q1-Q4
      • Deploy a data store for metrics services
      • Map out possible plans for quantifying and improving our trust in relays/operators
      • Deploy a data API for metrics services
    • O2:
      • Establish community-driven behavioral agreements and consequences for relay operators - Q1-Q4
      • Support OTF fellow on Relay Operators Community Health Research
    • O3:
      • Evaluate and implement solutions to relay attacks - Q2-Q4
  • Run bad-relay detection scripts - Q1-Q4

  • Consider tickets from other teams - Q1-Q4

    • dgoulet - network team
    • gus - community/comms
    • meskio - anti-censorship
  • Support for researchers for network experiments - Q1-Q4

  • Improve user support for relay operators

    • Help relay operators understand their contribution (better docs; think through what to show them that reflects their contribution, explains why their relay isn't performing as they want, etc.)
  • Relay operator meetups - Q1-Q4

  • Keep moderating and answering the tor-relays mailing list - Q1-Q4

  • Handle EOL relays

  • Improve monitoring and alerting for metrics services

  • Rebuild Tor metrics website - Q3-Q4

  • Relay-to-Relay connectivity tracking - Q3-Q4

  • Work on onbasca (refactoring and redesign) - Q2-Q4

  • Start accumulating a list of metrics we want to have from arti-based relays

  • Fix any sbws critical issues that may come up - Q1-Q4

  • Surprise 'anomaly analysis' on the network as needed

  • Evaluate metrics for relays and VPN client and their possible privacy issues/risks

  • Track network diversity metrics and build a plan for improving them

  • Things that we should look for and escalate to ourselves if we notice:

    • Are there changes to tor or tor browser that would improve our bad-relays treadmill?
    • Are there any current "issues inside the Tor network or protocol that allow attacks that harm UX for other users"? - Sponsor 112 O3
      • finish the PoW onion service idea so people don't feel the need to DoS the network anymore

Roadmap Q1 2023 - January to March

MUST HAVE

  • Sponsor 61

    • Objective 2: Decrease latency for end users by deploying smarter load balancing mechanisms.
      • O2.1: Reduce the number of slow and extremely slow sessions for our users by developing and deploying load balancing improvements. - Q1
        • Fix any sbws critical issues that may come up - likely whole year
  • Sponsor 112

    • Objective 1: Implement expanded network monitoring system and tools
      • O1.1: Build mechanisms to annotate, understand, tag and track relays to document their behavior and churn on the network - Q1 (Hiro)
        • Deploy a data store for metrics services (Hiro)
        • Deploy a data API for metrics services (Hiro)
      • O1.2: Improve network health monitoring tools - Q1-Q2 (GeKo)
      • O1.3: Improve and deploy tools to automatically detect malicious relay activity (Juga) - Q1-Q3
    • Objective 2: Establish community-driven behavioral agreements and consequences for relay operators
      • O2.1: Develop evaluation criteria for determining if behavior expectation and consequence solutions are appropriate - Q1 (ggus/GeKo)
      • O2.2: Evaluate promising solutions for relay operator codes of conduct, policies, agreements and methods for enforcement - whole year (ggus/acute/GeKo)
        • Support OTF fellow on Relay Operators Community Health Research
  • Run bad-relay detection scripts - whole year

  • Metrics infrastructure maintenance - whole year

  • Consider tickets from other teams - whole year

  • Support for researchers for network experiments - whole year

  • Improve user support for relay operators

    • Handle EOL relays - whole year
    • Relay operator meetups - whole year
    • Keep moderating and answering the tor-relays mailing list - whole year

NICE TO HAVE

  • Work on onbasca (refactoring and redesign): release will happen once congestion control is out and tested, maybe Q1
    • Deploy on 1 bwauth first
    • Fix bugs
    • Feature-parity with sbws
  • Surprise 'anomaly analysis' on the network as needed
  • Evaluate metrics for relays and VPN client and their possible privacy issues/risks
  • Network anomaly detection: use the current monitoring infrastructure to detect the anomalies it can already catch.
  • Start accumulating a list of metrics we want to have from arti-based relays
  • Track network diversity metrics and build a plan for improving them
  • Things that we should look for and escalate to ourselves if we notice - whole year
    • Are there changes to tor or tor browser that would improve our bad-relays treadmill?
    • Are there any current "issues inside the Tor network or protocol that allow attacks that harm UX for other users"?
    • Finish the PoW onion service idea so people don't feel the need to DoS the network anymore - Q1-Q3
  • Bring back Tor Weather - Q1

Past priorities and roadmaps

Active Sponsor Projects

Network Health

We are concerned with the well-being of the Tor network and its particular relays. There are currently four main areas of work involved in that effort, guided by processes and policies we have developed over time:

Metrics

We provide a set of monitoring and observability software tools and services for the public Tor network.

General Guides

Products

A product is a software codebase maintained by the Network Health team. Not all metrics-related products are mentioned in this section. Instead, we list some long-term products here that are, or will be, released in a way that lets a third party run a mirror of the service.

Inventory

Metrics products are hosted on TPA-maintained hardware, except for some onionperf installations, which are administered via Ansible.

We maintain a list of metrics VMs and the services they host.

Services Ops

We maintain a list of services and their operational documentation.

Resources

Developer meeting notes

Other