Create a tool to detect issues in the bandwidth files given their key/values
Bandwidth files contain structured key/value data describing relays and their measured bandwidth over time. However, there is currently no standalone tool that systematically validates these files for logical inconsistencies or suspicious values.

It would be useful to create a tool that parses bandwidth files and reports potential issues based on the contained key/value pairs.

Example checks:

- A relay has been seen in fewer consensuses than expected given other fields.
- Inconsistencies between “known” and “measured” metrics.
- Issues similar to those previously tracked in legacy/trac#29954 and related child tickets.
- Potential incorporation of logic from legacy/trac#30735.

### Why this matters

Bandwidth files influence how relays are weighted in the Tor network. If inconsistencies or anomalies go undetected:

- Relay weights may be inaccurate.
- Measurement data may silently degrade.
- Bugs in the measurement pipeline may go unnoticed.
- Historical regressions may reappear without warning.

Having a validation tool improves:

- Debuggability
- Data quality assurance
- CI integration for measurement pipelines
- Transparency for operators and developers

### Goals

Build a standalone tool (CLI preferred) that:

- Parses bandwidth files. (Can leverage existing services, like [descriptorParser](gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/).)
- Extracts relay-level key/value entries.
- Runs a series of validation rules.
- Outputs a structured report of detected issues.

The tool should be usable:

- Manually (CLI invocation),
- In CI pipelines,
- For regression detection.

### Example Validation Rules

1. Consensus Count Consistency

   If:

   - `consensus_count` (or similar field) indicates a relay was known for N consensuses,
   - But other fields suggest fewer observations,

   report:

   ```
   Relay <fingerprint> seen in fewer consensuses than expected.
   Expected: X
   Observed: Y
   ```

2. Missing Required Keys

   If required keys are absent for a relay entry: flag as error or warning.

   Example: `Relay <fingerprint> missing required key: bw`

3. Invalid Value Ranges

   Examples:

   - Negative bandwidth values.
   - Timestamps in the future.
   - Zero values where not allowed.
   - Percentiles outside valid ranges.

4. Cross-Field Logical Inconsistencies

   Examples:

   - measured_at timestamp older than published_at.
   - Relay marked as measured but missing measurement result.
   - Relay marked “unmeasured” but has non-zero bandwidth.

5. Historical/Legacy Checks

   Reimplement or integrate checks previously discussed in:

   - legacy/trac#29954 and child tickets
   - legacy/trac#30735

   The goal is to prevent known classes of data issues from resurfacing.
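
### Illustrative sketch

To make the parse, validate, report flow from the Goals more concrete, here is a minimal Python sketch of how rules 2 and 3 might be wired together. It is only a starting point under stated assumptions, not a proposed final design: it assumes relay lines are whitespace-separated `key=value` pairs, that the fingerprint lives in a `node_id` key, and that `node_id` and `bw` are the required keys. The names `Issue`, `parse_relay_line`, `check_required_keys`, `check_bw_range`, `RULES` and `validate` are hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

RelayEntry = Dict[str, str]

# Assumption: these are the keys every relay line must carry.
REQUIRED_KEYS = ("node_id", "bw")


@dataclass
class Issue:
    """One finding in the structured report (hypothetical shape)."""
    fingerprint: str
    severity: str  # "error" or "warning"
    message: str


def parse_relay_line(line: str) -> RelayEntry:
    """Split a 'key=value key=value' relay line into a dict.

    Tokens without '=' are ignored in this sketch.
    """
    entry: RelayEntry = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            entry[key] = value
    return entry


def check_required_keys(entry: RelayEntry) -> Iterator[Issue]:
    """Rule 2: flag every required key that is absent."""
    fingerprint = entry.get("node_id", "<unknown>")
    for key in REQUIRED_KEYS:
        if key not in entry:
            yield Issue(fingerprint, "error", f"missing required key: {key}")


def check_bw_range(entry: RelayEntry) -> Iterator[Issue]:
    """Rule 3 (partial): flag non-integer or negative bandwidth values."""
    raw = entry.get("bw")
    if raw is None:
        return  # absence is already reported by check_required_keys
    fingerprint = entry.get("node_id", "<unknown>")
    try:
        bw = int(raw)
    except ValueError:
        yield Issue(fingerprint, "error", f"bw is not an integer: {raw!r}")
        return
    if bw < 0:
        yield Issue(fingerprint, "error", f"negative bandwidth: {bw}")


# New rules are added by appending another function to this list.
RULES: List[Callable[[RelayEntry], Iterator[Issue]]] = [
    check_required_keys,
    check_bw_range,
]


def validate(relay_lines: Iterable[str]) -> List[Issue]:
    """Run every rule against every non-empty relay line."""
    issues: List[Issue] = []
    for line in relay_lines:
        if not line.strip():
            continue
        entry = parse_relay_line(line)
        for rule in RULES:
            issues.extend(rule(entry))
    return issues
```

A caller could print each finding as `f"Relay {issue.fingerprint} {issue.message}"` to match the message format above, and a CI job could fail whenever any issue has severity `"error"`. Keeping each rule as a small independent function would let the legacy checks be added one at a time without touching the parser.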