Network Health
We will organize potential network health team work according to priorities. This will involve:
- Getting a rough understanding of what is in scope and where items overlap with other teams' work
- Prioritizing the genuine network health tasks
- Thinking about interactions with other teams that work on related topics (Do we need new channels for that? Do we need special contact persons in the respective teams? ...)
Material:
We'll start with arma's mail about the roadmap for a potential network health team: https://lists.torproject.org/pipermail/tor-project/2018-December/002138.html. (Re-)reading that thread with the session scope outlined above in mind seems like a good idea.
Facilitator(s): GeKo + arma
Audience: Anyone interested in Tor network health
Duration: 1 hour
Prep
You do not need any prep to attend this session. (Re-)reading the mail linked in the Material section above is recommended, though.
Desired outcomes
- Having a prioritized list of things a network health team/person would work on
- Having a clear understanding of where the network health work overlaps with other areas and how to avoid duplication of work
Notes
1. Overview
- Standards for good relays (community team owns most of it)
- Documenting community standards about good relays [gus has done some work - community team ownership]
- Best practices for relay families [nobody currently; phase 2]
- Detecting and resolving bad relays [bad relay team has been working on that]
- Anomaly Analysis (nobody; Network Health engineer would own)
- Baselines for performance, usage, load, etc [bad-relays team]
- Finding DoS issues [network team]
- Tracking relay connectivity [nobody currently; phase 2; phase 1 examine current data]
- Looking for resource limit issues [nobody currently; phase 1 - examine current data]
- Look for default bridges hitting resource limits [nobody currently]
- exception reports (maybe relay operators should get them separately) [nobody currently]
- bandwidth authorities behavior [nobody currently]
- Ensure current usage stats are accurate (nobody owns; network health should)
- Tracking users, performance, relays by various metrics [metrics team, do we have all the metrics we need/want?]
- count users [with network team and metrics team; but currently nobody actively]
- Monitoring bridge growth and usage [with censorship team, metrics team]
- Relay advocacy; creating a community (nobody owns; community team should)
- maintain docs for setting up and running relays and bridges [gus - community team (check whether it gets maintained)]
- grow a cohesive community of relay operators so they have peers [nobody currently]
- Keeping relays on the right Tor version [nobody currently]
- Gamification/rewards/incentives for relays [nobody currently, could be a fellow or intern]
- strengthen non-profits that run relays [nobody currently]
- Communicate with companies that make heavy use of the Tor network [nobody currently]
- Maintain components of network (nobody; network health/network team should)
- maintain directory authority relationships [okay]
- keep bandwidth authorities working (including setting the right balance between speed and location diversity) [juga, pastly, network team]
- have enough tor browser default bridges, and keep them running smoothly [with censorship team; s30]
- update the fallbackdirs list [teor, gus]
2. What should we do with funding
- Ticket brainstorming captured on post-its
- Categories:
- Active scanning (e.g. DNS errors, Exitmap, OnionPerf)
- Metrics scripts/apps/APIs for querying/custom reports
- Analysis of reports (make analysis of data we have easier)
- Relay reporting
- Resource use (CPU, memory pressure, ...)
- Protocol violations
- Heartbeat log (Getting data out of the relay; What is acceptable here?)
- Bugfixes to user counting, other metrics
- Metrics for estimating how relay campaigns go (diff of capacity, numbers...)
3. Who owns what; who should own what?
- Documented above