... | ... | @@ -101,4 +101,39 @@ described above. |
|
|
- Table comparing the C Tor and Arti onion service implementations: https://onionservices.torproject.org/dev/implementations/
|
|
|
- The original state management CLI implementation plan: https://gitlab.torproject.org/tpo/core/arti/-/blob/main/doc/dev/notes/state-management-cli.md
|
|
|
|
|
|
## 3. Relay to relay connectivity in the Tor network
|
|
|
|
|
|
- mentors: juga, GeKo
|
|
|
- hours: 175h
|
|
|
- skills:
|
|
|
- rust
|
|
|
- data analysis
|
|
|
- graph theory
|
|
|
- graph databases
|
|
|
- expected outcome:
|
|
|
- updated and streamlined partitioning detection tool (erpc)
|
|
|
- a module to analyze the partitions in the Tor network and visualize them
|
|
|
- difficulty: medium
|
|
|
|
|
|
### Problem
|
|
|
|
|
|
In an ideal world, any Tor relay would be able to reach any other Tor relay when building paths through the network, since partitioning in the Tor network weakens Tor's anonymity guarantees. During GSoC 2023,
|
|
|
[erpc](https://gitlab.torproject.org/tpo/network-health/erpc) was built, a tool
|
|
|
implemented in Rust to check for partitions in the Tor network by building two-hop
|
|
|
circuits between all the relays.
|
|
|
It stores the results in a graph database (Neo4j). The graph vertices are the relay fingerprints and the edges are the relay pairs involved in each circuit. The message obtained while building the circuit and the timestamp are also stored.
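
To make the data model above concrete, here is a minimal in-memory sketch of such a graph. It is not erpc's actual schema: the type names `CircuitAttempt` and `CircuitGraph`, the field names, and the use of a Unix timestamp are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// One two-hop circuit attempt, i.e. one directed edge (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct CircuitAttempt {
    pub first_hop: String,  // fingerprint of the first relay
    pub second_hop: String, // fingerprint of the second relay
    pub message: String,    // message obtained while building the circuit
    pub timestamp: u64,     // assumed to be seconds since the Unix epoch
}

/// Directed graph: vertices are fingerprints, edges are circuit attempts.
#[derive(Debug, Default)]
pub struct CircuitGraph {
    edges: HashMap<String, Vec<CircuitAttempt>>,
}

impl CircuitGraph {
    /// Store an attempt as an edge from `first_hop` to `second_hop`.
    pub fn record(&mut self, attempt: CircuitAttempt) {
        // Make sure the destination also exists as a vertex.
        self.edges.entry(attempt.second_hop.clone()).or_default();
        self.edges
            .entry(attempt.first_hop.clone())
            .or_default()
            .push(attempt);
    }

    /// Number of distinct relays seen so far.
    pub fn vertex_count(&self) -> usize {
        self.edges.len()
    }

    /// Number of recorded circuit attempts.
    pub fn edge_count(&self) -> usize {
        self.edges.values().map(Vec::len).sum()
    }
}
```

Keeping the message and timestamp on the edge means repeated measurements of the same relay pair can coexist, which is one reason the dataset can grow quickly.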
|
|
|
|
|
|
This data needs to be analyzed to find partitions in the Tor network and present them in a meaningful way.
|
|
|
|
|
|
### Proposal
|
|
|
|
|
|
This project would involve updating and optimizing erpc to keep our dataset manageable. Additionally, it requires research into which algorithms are most suitable for finding the partitions in the Tor network. Since the network is currently stored as a directed graph, we can apply community detection and clustering algorithms.
|
|
|
Neo4j already offers several clustering algorithms within its Graph Data Science (GDS) library.
|
|
|
The project would also involve writing the code to apply the partitioning algorithms and present the results.
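
As one possible baseline for such code, the sketch below computes strongly connected components with Kosaraju's algorithm. This is a stand-in chosen for simplicity, not the algorithm the project has settled on. It assumes the input contains only edges for circuits that were built successfully, and the function name `strongly_connected_components` is hypothetical.

```rust
use std::collections::HashMap;

/// Group relays into strongly connected components (Kosaraju's algorithm).
/// `edges` holds (source, destination) fingerprints of successful circuits.
pub fn strongly_connected_components(edges: &[(&str, &str)]) -> Vec<Vec<String>> {
    // Map each fingerprint to a dense index.
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut names: Vec<&str> = Vec::new();
    for &(a, b) in edges {
        for fp in [a, b] {
            if !index.contains_key(fp) {
                index.insert(fp, names.len());
                names.push(fp);
            }
        }
    }
    let n = names.len();
    let mut fwd = vec![Vec::new(); n];
    let mut rev = vec![Vec::new(); n];
    for &(a, b) in edges {
        fwd[index[a]].push(index[b]);
        rev[index[b]].push(index[a]);
    }

    // Pass 1: iterative DFS on the forward graph, recording finish order.
    let mut visited = vec![false; n];
    let mut order = Vec::with_capacity(n);
    for start in 0..n {
        if visited[start] {
            continue;
        }
        visited[start] = true;
        let mut stack = vec![(start, 0usize)];
        while let Some(frame) = stack.last_mut() {
            let node = frame.0;
            if frame.1 < fwd[node].len() {
                let child = fwd[node][frame.1];
                frame.1 += 1;
                if !visited[child] {
                    visited[child] = true;
                    stack.push((child, 0));
                }
            } else {
                order.push(node);
                stack.pop();
            }
        }
    }

    // Pass 2: walk the reversed graph in decreasing finish order.
    let mut component = vec![usize::MAX; n];
    let mut components: Vec<Vec<String>> = Vec::new();
    for &root in order.iter().rev() {
        if component[root] != usize::MAX {
            continue;
        }
        let id = components.len();
        component[root] = id;
        let mut members = Vec::new();
        let mut stack = vec![root];
        while let Some(node) = stack.pop() {
            members.push(names[node].to_string());
            for &prev in &rev[node] {
                if component[prev] == usize::MAX {
                    component[prev] = id;
                    stack.push(prev);
                }
            }
        }
        members.sort();
        components.push(members);
    }
    components.sort(); // deterministic output order
    components
}
```

In a fully connected network this would return a single component; more than one component means some relays cannot reach each other in both directions through successful circuits.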
|
|
|
For example, a first approach could be listing the relay fingerprints and the number of other relays they're able to build a circuit to.
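
A minimal sketch of that first approach follows. It assumes the measurements have already been exported as `(source, destination, success)` tuples; the function name `reachable_counts` and the tuple layout are illustrative and not part of erpc.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For every relay fingerprint, count the distinct other relays it
/// successfully built a circuit to.
pub fn reachable_counts(attempts: &[(&str, &str, bool)]) -> BTreeMap<String, usize> {
    let mut reached: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for &(src, dst, ok) in attempts {
        // Register both relays so that relays reaching nobody appear with 0.
        reached.entry(dst.to_string()).or_default();
        let targets = reached.entry(src.to_string()).or_default();
        if ok && src != dst {
            // A set deduplicates repeated measurements of the same pair.
            targets.insert(dst.to_string());
        }
    }
    reached
        .into_iter()
        .map(|(fp, targets)| (fp, targets.len()))
        .collect()
}
```

Relays whose count is far below the number of relays in the network are natural starting points for a closer look.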
|
|
|
This can be further improved in several ways, for instance by adding properties to the vertices such as country, ASN, flags, family, etc. This would make it possible to analyze the cyclic non-exit subgraphs and the acyclic exit subgraphs separately. It would also be possible to detect whether some subgraphs are not connected because of families, ASNs or other reasons.
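
As a sketch of how one such property could be used, the function below counts failed circuits per (source ASN, destination ASN) pair. The lookup table `asn_of`, the function name `failures_by_asn_pair`, and the input layout are assumptions for illustration; how the ASN would actually be attached to a vertex is left open here.

```rust
use std::collections::{BTreeMap, HashMap};

/// Count failed circuits per (source ASN, destination ASN) pair.
/// `asn_of` maps a relay fingerprint to its ASN (hypothetical lookup table);
/// `failed` holds (source, destination) fingerprints of failed circuits.
pub fn failures_by_asn_pair(
    asn_of: &HashMap<&str, u32>,
    failed: &[(&str, &str)],
) -> BTreeMap<(u32, u32), usize> {
    let mut counts = BTreeMap::new();
    for &(src, dst) in failed {
        // Relays without a known ASN are skipped rather than guessed.
        if let (Some(&a), Some(&b)) = (asn_of.get(src), asn_of.get(dst)) {
            *counts.entry((a, b)).or_insert(0) += 1;
        }
    }
    counts
}
```

A pair of ASNs with a disproportionate number of failures would hint that the missing connectivity is related to the networks involved rather than to individual relays.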
|
|
|
During the analysis and implementation it would be helpful to visualize parts of the graph. The project would therefore also involve selecting an open source graph visualization tool and implementing the code to automatically analyze and visualize subgraphs.
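
One lightweight option, shown here only as an example, is to export a subgraph in Graphviz DOT format and render it with the open source `dot` tool. Graphviz is an assumption, not a tool the project has chosen, and the function name `to_dot` and the color scheme are illustrative.

```rust
/// Render a subgraph as Graphviz DOT text.
/// `edges` holds (source, destination, success) for each circuit attempt;
/// failed circuits are drawn in red.
pub fn to_dot(edges: &[(&str, &str, bool)]) -> String {
    let mut out = String::from("digraph relays {\n");
    for &(src, dst, ok) in edges {
        let color = if ok { "black" } else { "red" };
        // Fingerprints are hexadecimal, so quoting them is sufficient here.
        out.push_str(&format!(
            "    \"{}\" -> \"{}\" [color={}];\n",
            src, dst, color
        ));
    }
    out.push_str("}\n");
    out
}
```

The resulting text can be written to a file and rendered with, for example, `dot -Tsvg relays.dot -o relays.svg`.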
|
|
|
|
|
|
Resources:
|
|
|
- https://gitlab.torproject.org/tpo/team/-/wikis/gsoc-previous-years#1-relay-to-relay-connectivity-in-the-tor-network
|
|
|
- https://en.wikipedia.org/wiki/Graph_partition |