
Team

Meta project for tickets not tied to a specific project, for project management, and for general information in the wiki.

Network Health Team

About us

Welcome to the Network Health page! Several people in the Tor community take care of the network's health.

The five areas that we focus on are:

  1. Track community standards about what makes a good relay
    • Publish up-to-date expectations for relay operators
    • Set best practices for configuring relay families
    • Detect and resolve bad relays
      • Exitmap, sybil detection, hsdir traps, etc.
  2. Anomaly analysis / network health engineer [with network team]
    • Establish baselines of expected network behavior
    • Look for and resolve denial of service issues
    • Track connectivity issues between relays
    • Look for relays hitting resource limits
  3. Make sure usage/growth stats are collected and accurate
    • Track network performance and relay diversity across various metrics
    • Count users [with network team]
    • Monitor bridge growth and usage [with censorship team]
  4. Relay advocacy [with community team]
    • Maintain docs for setting up and running relays and bridges
    • Grow a cohesive community of relay operators so they have peers
      • Keep relays on the right tor versions
    • Relaunch a gamification / badge system for lauding good relay progress
    • Strengthen relationships with non-profit orgs that run relays
    • Help companies that want to offset their tor network load
  5. Maintain the components of the network
    • Maintain directory authority relationships
    • Keep bandwidth authorities working (including setting the right balance between speed and location diversity)
    • Have enough Tor Browser default bridges, and keep them running smoothly [with censorship team]
    • Update the fallbackdirs list

Communication Channels

Just go to #tor-dev: somebody from the team may be around, or may show up later and get back to you.

We hold our meetings on IRC, on the OFTC network.

Team meeting          | Time (UTC)   | Location
Primary team meeting  | Monday 16:00 | #tor-meeting

The Network Health team's asynchronous communication channels are the network-health@, tor-relays@, and tor-dev@ mailing lists, depending on which is most applicable. These lists are public in the sense that anyone can subscribe, send email, and read the archives. Feel free to subscribe and just listen, and feel free to post if you have a question that you think is on topic.

For metrics-related topics, our asynchronous communication channel is the network-health@ mailing list. This list is public in the sense that anyone can subscribe and read the archives, but first posts are moderated: your first post is reviewed to make sure it is on topic and not spam, and all later posts go directly to the list.

General Priorities

  1. Detect and resolve bad relays
    • Exitmap, sybil detection, hsdir traps, etc.
  2. Anomaly analysis / network health engineer [with network team]
    • Establish baselines of expected network behavior
    • Monitor network disruptions and problems
  3. Relay advocacy [with community team]
    • Strengthen relationships with non-profit orgs that run relays
    • Maintain docs for setting up and running relays and bridges
  4. Make sure usage/growth stats are collected and accurate
    • Track network performance and relay diversity across various metrics
  5. Maintain the components of the network to keep it healthy
    • Keep bandwidth authorities working (including setting the right balance between speed and location diversity)

PRIORITIES FOR 2022

[Must have]

  • Various work within our performance and scalability project: Sponsor 61 - Q1/Q2/Q3
  • Run bad-relay detection scripts - Q1/Q2/Q3/Q4
  • Support for researchers for network experiments - Q1/Q2/Q3/Q4
    • Pass the tor-research-safety-board torch to GeKo and get it organized again - Q1
  • Consider tickets from other teams - Q1/Q2/Q3/Q4
    • dgoulet - network team
    • gus - community/comms
    • roger - anti-censorship
  • Improve user support for relay operators
    • Help relay operators understand their contribution (better docs; think through what to show them that reflects their contribution, why their relay isn't performing as they expect, etc.)
    • Follow on from the Simply Secure metrics redesign, too
  • Handle EOL relays - Q1/Q3
  • Relay operator meetups - Q1/Q2/Q3/Q4
  • Keep moderating and answering the tor-relays mailing list - Q1/Q2/Q3/Q4
  • Improve monitoring and alerting for metrics services - Q1
  • Support OTF fellow on Relay Operators Community Health Research - Q1
  • Help Miko (Outreachy intern) where needed (tpo/community/team#46) - Q1
  • Bad-relay tooling improvements - Q1
  • Fix any sbws critical issues that may come up - Q1/Q2/Q3/Q4
  • Rebuild the Tor Metrics website - Q2
  • Deploy a data store for metrics services - Q1/Q2/Q3
    • Document current data models for metrics services - Q1
  • Various work within our network health project: Sponsor 112 - Q4
    • Map out possible plans for quantifying and improving our trust in relays/operators - Q3 and/or Q4
    • O2.2: Evaluate promising solutions for operator codes of conduct, contracts, and methods for enforcement. Develop a proof-of-concept - Q4 2022
    • O3.1: Reduce incentives to run bad relays by fixing known open issues - Q4 2022
    • O3.2: Evaluate and implement solutions to relay attacks - Q4 2022

[Nice to have]

  • Work on sbws2 (refactoring and redesign) - Q1/Q2/Q3/Q4
  • Start accumulating a list of metrics we want to have from arti-based relays
  • Surprise 'anomaly analysis' on the network as needed
  • Support the unblock tor campaign
  • Track network diversity metrics and build a plan for improving diversity
  • Network anomaly detection - related to Sponsor 112 O1
    • better understand the relay dynamics of the Tor network
    • build mechanisms to annotate, understand, tag, and track relays and how they behave and churn on the network
  • Things that we should look for and escalate to ourselves if we notice:
    • Are there changes to tor or Tor Browser that would improve our bad-relays treadmill?
    • Are there any current "issues inside the Tor network or protocol that allow attacks that harm UX for other users"? - Sponsor 112 O3
      • Finish the proof-of-work (PoW) onion service idea so people don't feel the need to DoS the network anymore
  • Document current data models for metrics services - Q1
  • Evaluate metrics for relays and the VPN client, and their possible privacy issues/risks
  • Deploy a data API for metrics services

Roadmap Q2 2022 - April to June

MUST HAVE

Sponsor 61
  • Objective 2: Decrease latency for end users by deploying smarter load balancing mechanisms.
    • O2.1: Reduce the number of slow and extremely slow sessions for our users by developing and deploying load balancing improvements.
  • Objective 4: Improve our ability to proactively detect, diagnose, and resolve user-facing performance issues.
    • O4.2: Find and fix performance-impacting issues and bugs discovered from monitoring and scanning.

  • Run bad-relay detection scripts

  • Bad-relay tooling improvements (Juga)

  • Fix any sbws critical issues that may come up

  • Support for researchers for network experiments

  • Consider tickets from other teams

  • Support OTF fellow on Relay Operators Community Health Research

  • Relay operator meetups

  • Keep moderating and answering the tor-relays mailing list

  • Handle EOL relays

  • Support the GSoC mentee (from one of the candidates: https://torweather.herokuapp.com/)

  • Improve monitoring and alerting for metrics services

  • Deploy a data store for metrics services

NICE TO HAVE

  • Work on sbws2 (refactoring and redesign)

    • Deploy in one bwauth
    • Fix bugs
    • Reach feature parity with sbws
  • Surprise 'anomaly analysis' on the network as needed

  • Think about metrics for the VPN client and their possible privacy issues/risks

  • Network anomaly detection: use the current monitoring infrastructure to catch the anomalies it is already able to detect.

Roadmap Q1 2022 - January to March

  • S61 Objective 2: Decrease latency for end users by deploying smarter load balancing mechanisms
  • S61 O2.1: Reduce the number of slow and extremely slow sessions for our users by developing and deploying load balancing improvements
  • S61 Objective 4: Improve our ability to proactively detect, diagnose, and resolve user-facing performance issues
  • S61 O4.2: Find and fix performance-impacting issues and bugs discovered from monitoring and scanning
  • Bad-relay tooling improvements
  • Support OTF fellow on Relay Operators Community Health Research
  • Help Miko (Outreachy intern) where needed (tpo/community/team#46)
  • Improve monitoring and alerting for metrics services
  • Plan the deployment of a data store for metrics services
    • Document current data models for metrics services
  • Fix any sbws critical issues that may come up
  • Handle EOL relays
  • Relay operator meetups
  • Keep moderating and answering the tor-relays mailing list
  • Plan the work for rebuilding the Tor Metrics website
  • Work on sbws2 (refactoring and redesign)
  • Think about metrics for the VPN client and their possible privacy issues/risks
  • Pass the tor-research-safety-board torch to GeKo and get it organized again

OKRs

Priorities from previous years

Previous roadmaps

Active Sponsor Projects

Metrics

We provide a set of monitoring and observability software tools and services for the public Tor network.

General Guides

Products

A product is a software codebase maintained by the Network Health team. Not all metrics-related products are listed in this section; instead, we list long-term products that are, or will be, released in a way that lets a third party decide to run a mirror of that service.

Inventory

Metrics products are hosted on TPA-maintained hardware, except for some OnionPerf installations, which are administered via Ansible.

We maintain a list of metrics VMs and the services they host.

Services Ops

We maintain a list of services and their operational documentation.

Resources

Developer meeting notes

Other