tor_fusion: parsing Tor network data efficiently in Rust

tor_fusion is a project to parse Tor network documents in the Rust programming language.

Links:

This is the README for the tor_fusion project as a whole. If you want to find more practical information on parsing Tor network documents and extracting metrics, you might want to check out these links:

Why rewrite how network documents are parsed?

The data analysis community is evolving and moving to different tools than those in use when the metrics pipeline was first developed.

At the same time, nodes and services on the public Tor network produce far more data than they did when metrics started as a project in Tor.

We are in the process of restructuring our pipeline so that it is easier to maintain over time, and so that we can offer better resources to our community and process data more efficiently.

Rust stands out as a practical choice for processing Tor network metrics because of its performance and memory safety.

Rust offers efficiency when handling large datasets. Additionally, developers can use an array of libraries explicitly designed for data analysis to streamline data processing that is not specific to Tor, offering more flexibility to researchers.

What documents are supported?

We currently parse only OnionPerf analysis files. The long-term plan is to embed Arti to download and parse all types of documents produced by the various network nodes and services.

Deployment on Tor Project machines

tor_fusion is deployed on the metricsdb-01.torproject.org machine via Puppet and runs alongside descriptorParser on metricsdb.

When new code is merged into main, it gets deployed automagically and built directly on the machine.

The scripts used to build and run tor_fusion are also deployed via Puppet from the metrics-bin repository.

Run

First build tor_fusion via cargo:

$ cargo build --release

Latest tested versions are:

rustc 1.77.0 (aedd173a2 2024-03-17)
cargo 1.77.0 (3fe68eabf 2024-02-29)

You need to configure a PostgreSQL database for tor_fusion to load the data into, using a config.toml file. An example is provided in this repository: config.toml.example

The tables needed by tor_fusion can be found in the metrics-sql-tables repository.

Then:

# decompress the onionperf analysis file:
$ xz -d onionperf-analysis.json.xz
# run the binary against the json file:
$ ./target/release/tor_fusion onionperf-analysis.json
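
To load several files in one go, the two steps above could be wrapped in a small shell function. This is only a sketch and not part of tor_fusion: the load_all name and the TOR_FUSION variable are hypothetical, and it assumes that the binary takes one JSON path per invocation, as in the command above, and that config.toml is already in place.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with tor_fusion.
# TOR_FUSION is an assumed variable name; it defaults to the release binary.
TOR_FUSION="${TOR_FUSION:-./target/release/tor_fusion}"

# load_all DIR: decompress every *.json.xz file in DIR and run tor_fusion on it.
load_all() {
    dir="$1"
    for f in "$dir"/*.json.xz; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        # -d decompresses, -k keeps the original .xz file next to the output.
        xz -dk "$f" || return 1
        # Strip the .xz suffix to get the path of the decompressed JSON file.
        "$TOR_FUSION" "${f%.xz}" || return 1
    done
}
```

With the function loaded into the current shell, calling load_all with a directory of analysis files processes each one in turn and stops at the first failure.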