O.1.2. Review and update data collection tools

We should conduct a thorough review of the tools responsible for data collection to ensure they are effective, efficient, and aligned with the team’s evolving needs. Our current data collection pipeline consists of the following components:

  • descriptorParser: This tool parses raw descriptor data into a structured format for database ingestion. As part of the review, we should assess how to ingest and archive historical data in a way that remains accessible to downstream services, including the metrics service and the network status API.

  • collector: As the primary data aggregation utility, collector gathers input from various sources. We should evaluate its performance, the robustness of its error handling, and its ability to cope with varying document volumes, including potential future growth. A planned rewrite of the service aims to improve memory usage and overall stability.

  • tor-fusion: This service performs network-wide data aggregation. We should revisit its overall design and review the structure and content of the data it stores to identify opportunities for optimization, improved transparency, and better alignment with current analysis needs.

Edited by Hiro