

Opened Nov 08, 2011 by Karsten Loesing (@karsten)

Develop a Java/Python API that wraps relay descriptor sources and provides unified access to them

Quite a few metrics tools process archived and current relay descriptors to provide aggregate statistics, make descriptor archives searchable, or monitor the Tor network. These tools share a non-trivial amount of code that imports relay descriptors from various sources. Copying code is bad. Let's write an API that all these metrics tools can use and that makes developing new tools easier.

Note that this API is different from existing Tor controller APIs, which connect to a Tor process's control port and provide the descriptors that process knows about. The new API doesn't need to connect to a Tor control port (though it could), but it may read the cached descriptors from a Tor data directory, along with importing relay descriptors from other sources. Of course, the two APIs can be combined, but there's also a reason for the API described here to exist separately: none of the metrics tools needs to control a Tor process.

There are two major sources for relay descriptors:

  • Local directories: We can read relay descriptors from the cached-* files of a local Tor data directory or from the output directory of the directory-archive script or metrics-db. Some of these local directories can grow quite large, so we'll need an efficient way to exclude descriptors that we already know. Also, some files in these directories contain multiple relay descriptors while others contain only one. We'll want to support an arbitrary number of local directories in the new API.

  • Directory authorities/mirrors: We can download relay descriptors from the directory authorities or directory mirrors via Tor's directory protocol. We should keep downloads to a minimum and only download missing descriptors. We should also download compressed descriptors if possible. In some cases we're interested in whether a directory authority serves a descriptor (e.g., in the consensus-health script). In most cases we want to set a timeout for downloading descriptors.
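
To make the two sources a bit more concrete, here is a rough Python sketch, not a proposed implementation. All function and parameter names are made up for illustration. The first helper skips local files whose modification time matches what the caller saw in an earlier run. The second builds a download URL for only the missing descriptors; the `/tor/server/d/<digest>+<digest>` resource and the `.z` suffix for compressed answers reflect my reading of Tor's directory protocol and should be checked against the spec. The third applies a timeout and assumes the `.z` body is zlib-compressed.

```python
import os
import urllib.request
import zlib


def find_new_files(directories, already_seen):
    """Yield (path, mtime) for files that are not recorded in already_seen.

    already_seen maps absolute path -> modification time from an earlier
    run. The caller supplies it, so this helper keeps no state itself.
    """
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in sorted(files):
                path = os.path.abspath(os.path.join(root, name))
                mtime = os.path.getmtime(path)
                if already_seen.get(path) == mtime:
                    continue  # unchanged since the last run, skip it
                yield path, mtime


def descriptor_url(host, port, missing_digests, compressed=True):
    """Build a URL that requests only the descriptors we don't have yet.

    Assumption: server descriptors are requested by digest, with several
    digests joined by '+', and '.z' appended for a compressed answer.
    """
    resource = "/tor/server/d/" + "+".join(sorted(missing_digests))
    if compressed:
        resource += ".z"
    return "http://%s:%d%s" % (host, port, resource)


def fetch(url, timeout_seconds=60):
    """Download one resource, giving up after timeout_seconds."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        body = response.read()
    if url.endswith(".z"):
        # Assumption: the body is a plain zlib stream.
        body = zlib.decompress(body)
    return body
```

A tool such as the consensus-health script, which cares whether a specific authority serves a descriptor, would call `fetch` once per authority and treat an HTTP error or timeout as "not served". Splitting files that contain multiple descriptors is left out of this sketch.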

We should design the new API so that it's stateless across executions and has no configuration of its own. A tool that uses the API should first initialize it by creating relay descriptor data sources and then request descriptors to process.
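
As a strawman for discussion, the "create sources, then request descriptors" flow could look roughly like this in Python. The class and method names (`LocalDirectorySource`, `DescriptorReader`, `add_source`, `read_descriptors`) are hypothetical and only meant to show the shape: everything the API needs is passed in by the calling tool, and nothing is written to disk or read from a configuration file.

```python
import os


class LocalDirectorySource:
    """Hypothetical source that reads every file below one directory."""

    def __init__(self, directory, excluded_paths=()):
        self.directory = directory
        # Paths the calling tool has already processed. The tool decides
        # how to persist this between runs; the API stores nothing.
        self.excluded_paths = set(excluded_paths)

    def read_descriptors(self):
        """Yield (path, raw bytes) for every file that is not excluded."""
        for root, _dirs, files in os.walk(self.directory):
            for name in sorted(files):
                path = os.path.join(root, name)
                if path in self.excluded_paths:
                    continue
                with open(path, "rb") as descriptor_file:
                    yield path, descriptor_file.read()


class DescriptorReader:
    """Hypothetical entry point that combines any number of sources."""

    def __init__(self):
        self._sources = []

    def add_source(self, source):
        self._sources.append(source)
        return self  # allows chaining several add_source() calls

    def read_descriptors(self):
        """Yield descriptors from all sources in the order they were added."""
        for source in self._sources:
            yield from source.read_descriptors()
```

A tool would create one `DescriptorReader`, add a `LocalDirectorySource` for each directory it cares about, and iterate over `read_descriptors()`. A source that downloads from directory authorities would only need to offer the same `read_descriptors()` method. The raw bytes of a file may still contain multiple descriptors; parsing is out of scope here.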

The following tools may use the new API once it's ready: metrics-db, the part of metrics-web that aggregates statistics, the ExoneraTor database, the relay search database, the consensus-health script, the descriptor-health script, and the basic monitoring infrastructure.

Reference: legacy/trac#4439