O.1.4. Evaluate and deploy tools for data management

One of our key objectives is to evaluate and deploy tools that improve data management across the pipeline. This includes assessing solutions for storage, querying, retention, and archiving to ensure they meet our performance, scalability, and reliability requirements.

This objective builds upon O.1.2 and O.1.3 by focusing on three main services—aggregator-rs, collector, and tor_fusion—which have been identified as critical to stabilizing our data pipeline. Prioritizing these components will help ensure reliable data processing and improve overall system robustness.

Here are three key results that will mark the successful completion of this objective:

  • Deploy aggregator-rs to process relay and bridge statuses. We plan to deploy aggregator-rs so that the status data of relays and bridges is processed more efficiently. With aggregator-rs, we can handle higher data throughput, reduce processing latency, and streamline the overall pipeline. This deployment is a key step toward modernizing our infrastructure.

  • https://gitlab.torproject.org/tpo/network-health/metrics/aggreagator-rs

  • Restructure tor_fusion to compute aggregate statistics about the Tor network more effectively. It currently processes data for relays and bridges; support for onion service statistics is still under development.

  • tpo/network-health/metrics/tor_fusion#9

  • Rewrite collector to improve performance and enhance its ability to recover from failures. The current implementation struggles to process large quantities of data during specific network events. The new version will be optimized for faster data ingestion, better fault tolerance, and more robust recovery mechanisms.

Edited by Hiro