Update gsoc pages. Authored by Gaba
The following is a page to hold all the Google Summer of Code's projects for 2025. You can find the projects we had in previous years on [this page](gsoc-previous-years).
## Why does the Tor Project participate in GSoC?
@@ -11,119 +11,3 @@ Tor, as an open-source project, benefits from participating in programs that:
- provide a structure to mentor a student
- pay students so time is not a barrier to participation
- let mentors learn and practice leadership and communication skills
# GOOGLE SUMMER OF CODE'S PROJECTS FOR 2025
## 1. Project "Rewrite metrics-lib in Rust"
- Mentors:
- [Hiro](https://gitlab.torproject.org/hiro)
- [Sarthik](https://gitlab.torproject.org/sarthikg)
- Hours required: 350 hours
- Skills required: Rust + some Java
- Expected outcome: **A library to process Tor network documents in Rust is available.**
- Difficulty: medium
### Background
Tor Metrics Library is a Java library that fetches and parses Tor descriptors.
Metrics lib provides a Java API for processing Tor network data from the [CollecTor](https://collector.torproject.org) service for statistical analysis and for building services and applications.
### Proposal
The metrics pipeline is being restructured and is slowly moving away from a mostly Java codebase to a Rust and Python toolbelt.
This project would involve a complete re-implementation of the Tor metrics library in Rust.
[Metrics-lib](https://metrics.torproject.org/metrics-lib.html) is a Java-based API that is used to parse and validate Tor network documents. The Rust rewrite should provide the same parsing and validation functionality as metrics-lib and, in addition, allow exporting the documents to external storage, for example as Parquet files saved to object storage or as rows in a table in a PostgreSQL database.
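To give a feel for the parsing and validation work involved, here is a minimal sketch that extracts and checks the `router` line of a server descriptor. It assumes the `router nickname address ORPort SOCKSPort DirPort` layout from the Tor directory specification and only reads the first three fields. The names `RouterLine`, `ParseError` and `parse_router_line` are hypothetical and do not come from metrics-lib; a real rewrite would cover far more keywords and document types.

```rust
use std::net::Ipv4Addr;

/// Hypothetical type for illustration; not part of metrics-lib.
#[derive(Debug, PartialEq)]
pub struct RouterLine {
    pub nickname: String,
    pub address: Ipv4Addr,
    pub or_port: u16,
}

/// Hypothetical error type naming the field that failed validation.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    MissingRouterLine,
    BadField(&'static str),
}

/// Find the "router" line in a descriptor and validate its first three fields.
pub fn parse_router_line(descriptor: &str) -> Result<RouterLine, ParseError> {
    let line = descriptor
        .lines()
        .find(|l| l.starts_with("router "))
        .ok_or(ParseError::MissingRouterLine)?;
    // Skip the "router" keyword itself.
    let mut fields = line.split_whitespace().skip(1);

    // Nicknames are 1 to 19 alphanumeric characters.
    let nickname = fields.next().ok_or(ParseError::BadField("nickname"))?;
    if nickname.len() > 19 || !nickname.chars().all(|c| c.is_ascii_alphanumeric()) {
        return Err(ParseError::BadField("nickname"));
    }

    let address = fields
        .next()
        .and_then(|s| s.parse::<Ipv4Addr>().ok())
        .ok_or(ParseError::BadField("address"))?;

    let or_port = fields
        .next()
        .and_then(|s| s.parse::<u16>().ok())
        .ok_or(ParseError::BadField("ORPort"))?;

    Ok(RouterLine {
        nickname: nickname.to_string(),
        address,
        or_port,
    })
}
```

Returning a typed error that names the failing field is one possible design; it keeps validation failures easy to report when processing many documents in bulk.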
_Resources:_
- https://gitlab.torproject.org/tpo/network-health/metrics/library
- https://metrics.torproject.org/metrics-lib.html
- https://gitlab.torproject.org/tpo/network-health/metrics/descriptorParser/
- https://gitlab.torproject.org/tpo/network-health/team/-/wikis/metrics/development/home
## 2. Project "Onion Service Support Tooling for Arti"
- Mentors:
- [Gabi](https://gitlab.torproject.org/gabi-250/)
- [Wesley](https://gitlab.torproject.org/wesleyac/)
- Hours required: 175 hours
- Skills required: Knowledge of Rust (experience with async programming is a plus); ability to use Git
- Expected outcome: **the `arti` CLI is extended with more commands for key and state management; constructive discussions leading to changes to, or recommendations for, Arti's APIs and documentation.**
- Difficulty: Medium
### Problem
Arti has two state management subcommands, `arti hss` and `arti hsc`, for managing the state of onion services and onion service clients, respectively. These commands are currently very limited in functionality, and do not support many of the features onion service clients and operators will require.
### Proposal
This project is about contributing to the tooling onion service clients and operators will need for managing the on-disk state and keys of their Arti onion services. It will involve extending the existing state management commands, as well as potentially adding new ones, and contributing to Arti's APIs and documentation.
For example, the extra functionality we need includes but is not limited to:
- a subcommand for listing keys and certificates from the configured keystores
- a subcommand for performing consistency, validity, and integrity checks on the specified stores (this might also take an optional `--fix` flag, to fix the detected issues, if possible)
- an `arti hss destroy-and-recreate` subcommand, for generating a new identity (set of keys) for an existing onion service (this command will replace all the keys and state of the service)
- an `arti hss destroy` subcommand, for removing the persistent state and all the keys of an onion service
- miscellaneous low-level "plumbing" subcommands, which deal with individual files from the keystore and state directories (for example, `arti keys-raw remove-by-path`)
- a C Tor to Arti key migration tool, which will enable onion service operators to seamlessly migrate from C Tor to Arti
- field-formatted output that other programs can parse easily, possibly enabled by a special flag, where it makes sense; similar to (but maybe better than) `gpg(1)`'s `--with-colons` and comparable options in many other CLI tools
- `man` pages for each CLI or subcommand
A successful project will involve implementing some, or all, of the functionality
described above.
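As an illustration of the field-formatted output idea, the sketch below prints one record per line with colon-separated fields. Everything in it is an assumption made for the example: the `KeyEntry` struct, its fields, and the percent-style escaping are hypothetical and are not Arti's types or an agreed output format.

```rust
/// Hypothetical record describing one keystore entry; not an Arti type.
pub struct KeyEntry {
    pub keystore_id: String,
    pub key_type: String,
    pub path: String,
}

/// Escape the characters that would break a one-record-per-line,
/// colon-separated format. The escaping scheme is an assumption.
fn escape_field(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    for c in field.chars() {
        match c {
            '%' => out.push_str("%25"),
            ':' => out.push_str("%3A"),
            '\n' => out.push_str("%0A"),
            _ => out.push(c),
        }
    }
    out
}

/// Render one entry as `keystore_id:key_type:path`.
pub fn format_entry(entry: &KeyEntry) -> String {
    [
        entry.keystore_id.as_str(),
        entry.key_type.as_str(),
        entry.path.as_str(),
    ]
    .iter()
    .map(|f| escape_field(f))
    .collect::<Vec<_>>()
    .join(":")
}
```

Escaping the separator inside field values means a consumer can always split a line on `:` and get a fixed number of fields, which is the main property scripts rely on in this kind of output.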
### Resources
- The Arti repository: https://gitlab.torproject.org/tpo/core/arti
- About onion services: https://onionservices.torproject.org/technology/
- Table comparing the C Tor and Arti onion service implementations: https://onionservices.torproject.org/dev/implementations/
- The original state management CLI implementation plan: https://gitlab.torproject.org/tpo/core/arti/-/blob/main/doc/dev/notes/state-management-cli.md
## 3. Relay to relay connectivity in the Tor network
- mentors:
- [juga](https://gitlab.torproject.org/juga)
- [gk](https://gitlab.torproject.org/gk)
- hours: 175h
- skills:
- rust
- data analysis
- graph theory
- graph databases
- expected outcome:
- updated and streamlined partitioning detection tool (erpc)
- have a module to analyze the partitions in the Tor network and visualize it
- difficulty: medium
- contact us by joining https://matrix.to/#/#tor-network-health:matrix.org
- project documentation: https://tpo.pages.torproject.net/network-health/erpc
### Problem
In an ideal world, any Tor relay would be able to reach any other Tor relay when trying to build paths through the network, as partitioning in the Tor network is bad for Tor's anonymity guarantees. During GSoC 2023, [erpc](https://gitlab.torproject.org/tpo/network-health/erpc) was built: a tool implemented in Rust that checks for partitions in the Tor network by building two-hop circuits between all the relays.
It stores the results in a graph database (neo4j). The graph vertices are the fingerprints of the relay and the edges are the relay pairs involved in the circuit. Also stored is the message obtained building the circuit and the timestamp.
This data needs to be analyzed to find partitions in the Tor network and present them in a meaningful way.
### Proposal
This project would involve updating and optimizing erpc to keep our dataset manageable. Additionally, it needs research into which algorithms are most suitable to find the partitions in the Tor network. Since the network is currently stored as a directed graph, we can apply community detection and clustering algorithms.
Neo4j already offers several clustering algorithms within its Graph Data Science (GDS) library.
The project would also involve writing the code to apply the partitioning algorithms and present the results.
For example, a first approach could be listing the relay fingerprints and the number of other relays they're able to build a circuit to.
This can be further improved in several ways, for instance by adding properties to the vertices like country, ASN, flags, family, etc. This would make it possible to analyze the cyclic non-exit subgraphs and the acyclic exit subgraphs separately. It'd also be possible to detect whether some subgraphs are not connected because of families, ASN or other reasons.
During the analysis and implementation it would be helpful to visualize parts of the graph. The project would therefore also involve selecting an open source graph visualization tool and implementing the code to automatically analyze and visualize subgraphs.
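The "first approach" mentioned above, plus a very simple partition check, could be sketched as follows. This is an in-memory illustration only: it assumes the successful circuits have already been exported from the database as `(from, to)` fingerprint pairs, and the names `Edge`, `out_degrees` and `weak_components` are hypothetical. Weakly connected components ignore edge direction, so they are a coarse stand-in for the community detection and clustering algorithms the project would actually evaluate.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One successful circuit from relay `from` to relay `to`, by fingerprint.
/// Hypothetical in-memory form of an edge; erpc's stored schema may differ.
pub type Edge<'a> = (&'a str, &'a str);

/// For each relay, count the distinct relays it could build a circuit to.
pub fn out_degrees<'a>(edges: &[Edge<'a>]) -> BTreeMap<&'a str, usize> {
    let mut targets: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for &(from, to) in edges {
        targets.entry(from).or_default().insert(to);
        // Relays that only appear as a destination still get an entry.
        targets.entry(to).or_default();
    }
    targets
        .into_iter()
        .map(|(relay, set)| (relay, set.len()))
        .collect()
}

/// Group relays into weakly connected components (edge direction ignored).
/// More than one component means some relays never reached each other.
pub fn weak_components<'a>(edges: &[Edge<'a>]) -> Vec<BTreeSet<&'a str>> {
    let mut adj: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().insert(b);
        adj.entry(b).or_default().insert(a);
    }

    let mut seen: BTreeSet<&'a str> = BTreeSet::new();
    let mut components = Vec::new();
    for &start in adj.keys() {
        if !seen.insert(start) {
            continue; // already assigned to an earlier component
        }
        // Depth-first traversal from `start`.
        let mut component = BTreeSet::new();
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            component.insert(node);
            for &next in &adj[node] {
                if seen.insert(next) {
                    stack.push(next);
                }
            }
        }
        components.push(component);
    }
    components
}
```

Because circuits are directional, a follow-up analysis would likely need strongly connected components or the clustering algorithms in Neo4j's GDS library rather than this undirected grouping.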
### Resources:
- https://gitlab.torproject.org/tpo/team/-/wikis/gsoc-previous-years#1-relay-to-relay-connectivity-in-the-tor-network
- https://en.wikipedia.org/wiki/Graph_partition