The proposed project involves two elements: (1) porting guard, middle, exit nodes, bridge servers, and directory authorities, including support for pluggable transports, from the C implementation to Arti and (2) working with directory authority and relay operators to transition the software running their relays from the C implementation to Arti.
The impacts of this project are wide ranging:
- Because moving to Rust means a drastic reduction in risk of exploits made possible by C vulnerabilities, end users are safer—and more able to confidently and securely do their work, organize, and exercise their human rights online.
- Moving to Rust will increase the pace of development of new features. This means that the Tor Project can more effectively react to fast-changing censorship and surveillance threats, releasing improvements for users faster. Improvements reaching users faster means they will have a more reliable circumvention option, even during turbulent situations.
- This project makes the infrastructure behind the Tor network more sustainable for a relatively small nonprofit to maintain. The project’s limited developer time can be used more efficiently and their work can benefit users faster.
- Finally, Arti makes it easier for other applications to embed Tor, which is a request we hear from developers frequently. Making Tor easier to embed means that users will have more secure, private circumvention options for their online activities beyond the applications the Tor Project develops.
**The goal of this project is to make users safer by ensuring the network is powered by a more secure implementation of Tor.**
**Objective 1: Arti supports directory authorities**
Directory authorities are a group of special-purpose relays on the Tor network that maintain the list of currently-running relays and every hour publish a consensus view of the network. In Tor terms, a consensus is a single signed document compiled and voted on by the directory authorities once per hour, ensuring that all clients have the same information about the relays that make up the Tor network. Currently, eight relays are considered directory authorities; they were chosen for these positions because of their operators’ long-term contributions to the Tor network and Tor community.¹
To achieve this Objective, we will port Tor’s existing directory authority implementation to Rust and establish a minimally-disruptive path to migrate existing directory authorities to the new code base. This Objective requires explicit, collaborative communication with directory authority operators, conducting tests with them, and engaging them in the creation of a transition plan.
O1.1 Port the ability for directory authorities to form a view of network status: In this Activity, we will develop in Arti the mechanisms necessary for directory authorities to receive, manage, probe, track, consume, and configure different kinds of information from all Tor relays in order to form the network status view that is distributed to clients. Directory authorities collect information like reliability, bandwidth measurements, and router descriptors. They additionally assign status flags to relays (see O1.8 for status flag implementation details)—all of this needs to be transitioned from C to Arti.
O1.2 Port the ability for directory authorities to manage authority keys: Each directory authority has a “directory signing key.” Directory authorities use this key to provide a signed list of all the known relays to clients using the network. This means that unless an adversary can control a majority of the directory authorities, they can't trick a client into using other Tor relays. This Activity includes developing ways for directory authorities to generate and manage signing and identity keys, to store keys in an encrypted way, to consume authority signing keys and certificates, and to alert when keys and certificates are close to expiration.
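As an illustration of the expiration alerting described above, the following is a minimal sketch only. The `SigningCert` type, the `ExpiryStatus` enum, and the warning window are hypothetical names chosen for this example; they are not Arti or C Tor APIs, and the real certificate format carries far more than an expiry time.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical summary of a signing-key certificate (not an Arti type).
struct SigningCert {
    expires: SystemTime,
}

/// Result of checking a certificate against the current time.
#[derive(Debug, PartialEq)]
enum ExpiryStatus {
    Ok,
    /// The certificate is still valid but expires within the warning window.
    ExpiresSoon(Duration),
    Expired,
}

/// Classify a certificate as valid, close to expiration, or expired.
fn check_expiry(cert: &SigningCert, now: SystemTime, warn_window: Duration) -> ExpiryStatus {
    match cert.expires.duration_since(now) {
        // `duration_since` fails when `expires` is earlier than `now`.
        Err(_) => ExpiryStatus::Expired,
        Ok(remaining) if remaining.is_zero() => ExpiryStatus::Expired,
        Ok(remaining) if remaining <= warn_window => ExpiryStatus::ExpiresSoon(remaining),
        Ok(_) => ExpiryStatus::Ok,
    }
}

fn main() {
    let day = Duration::from_secs(86_400);
    let now = SystemTime::now();
    let cert = SigningCert { expires: now + day * 10 };
    // With a 30-day warning window, a certificate with 10 days left triggers an alert.
    println!("{:?}", check_expiry(&cert, now, day * 30));
}
```

An operator-facing tool could run a check like this periodically and log a warning whenever the result is not `Ok`.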
O1.3 Port the system for directory authorities to generate votes: Directory authorities vote on a consensus view of the network once per hour and thus need a mechanism to generate these votes. In this Activity, we will port the directory authority’s system to generate votes into Arti.
O1.4 Port the ability for directory authorities to compute consensus documents: Directory authorities need to take the votes and form, encode, compute, and sign the consensus once per hour. This process includes computing a consensus on included relays, descriptors and microdescriptors for each relay, flags and other properties of each relay, parameters and the weight of these parameters, and recommended versions of Tor. In this Activity, we will port the system for directory authorities to compute consensus documents into Arti.
O1.5 Port directory authority voting protocol: The directory authorities use a protocol to handle how they vote. This protocol instructs directory authorities how to post votes to other authorities, how to download votes from other authorities, what timing to use for voting operations, how to push and fetch consensus signatures, and how to publish the consensus. In this Activity, we will port the directory authority voting protocol into Arti.
O1.6 Run conformance testing on directory authority voting protocol: In this Activity, we will fuzz test the performance of the consensus voting protocol to ensure that it is well-behaved under pathological inputs. We’ll also fuzz test the protocol against the C implementation’s protocol to ensure that both protocols give the same results.
O1.7 Add directory authority denial of service (DoS) protections: Directory authorities, like all parts of the Tor network, need protections against DoS attacks. In this Activity, we will port directory authority-specific DoS protection code to Arti. Denial-of-service attacks against Tor and against the directory authorities are real and observed in the wild. Over the last several years, DoS attacks have increased and have been negatively affecting the network while also negatively impacting the work we’ve completed to improve performance and usability of Tor. The defenses that Tor currently has against these attacks were added because without them, the DoS attacks would be successful at degrading network performance, security, or correctness. If we do not port these defenses to Arti, we expect that these attacks will once again be used on the live network, causing authorities to crash or degrade their performance beyond the point of usability.
In addition to the DoS protections that regular relays and directory caches require, directory authorities need to defend against excessive numbers of concurrent HTTP requests, excessive uploads, and other forms of socket and memory exhaustion. We will implement in Rust the same set of denial-of-service protections for directory authorities that currently exist in C. Most of these features are implemented in the mainline C Tor distribution in the same places and/or via the same paths as our other out-of-memory, socket-exhaustion, or address-duplication code. Through the years, the directory authority operators have also added small patches to their setup to improve resistance to DoS attacks. We will solicit these patches from the directory authority operators and port their functionality to Arti as well, if we believe it is necessary and feasible.
We will identify the mechanisms used to prevent denial-of-service attempts against directory authorities in the C implementation and double-check that they are well and correctly documented and specified, improving or extending our documentation if necessary. Then we will ensure that each of these mechanisms has a corresponding implementation in Arti.
O1.8 Add directory authority security improvements: In this Activity, we will implement necessary changes to improve the security of directory authorities that have been raised by the research community, including resolving long-standing relay flag assignment issues and putting into place a mechanism to protect against hidden tampering with consensus documents.
Tor directory authorities assign "flags" to relays, depending on various conditions. These flags are used to place relays into different circuit positions (e.g., "Guard", "Exit"), circuit properties (e.g., "Fast", "Stable"), or roles (e.g., "Authority"). The flags are voted on by each individual directory authority, each hour, depending on their view of the Tor network. Those votes are then merged in the consensus to produce a unified view of each relay in the network.
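The merging step can be sketched as follows. This is a deliberately simplified illustration that keeps a flag when a strict majority of the supplied votes assign it; the `Vote` type and `merge_flags` function are hypothetical, and the actual rules in the directory protocol specification have additional conditions (for example, which authorities vote on a given flag).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One authority's set of flags for a single relay (hypothetical simplified type).
type Vote = BTreeSet<&'static str>;

/// Keep each flag that a strict majority of the votes assign to the relay.
fn merge_flags(votes: &[Vote]) -> BTreeSet<&'static str> {
    // Count how many votes list each flag.
    let mut counts: BTreeMap<&'static str, usize> = BTreeMap::new();
    for vote in votes {
        for flag in vote {
            *counts.entry(*flag).or_insert(0) += 1;
        }
    }
    // A flag survives only if more than half of the votes include it.
    counts
        .into_iter()
        .filter(|(_, n)| *n * 2 > votes.len())
        .map(|(flag, _)| flag)
        .collect()
}

fn main() {
    let votes: Vec<Vote> = vec![
        BTreeSet::from(["Fast", "Guard", "Stable"]),
        BTreeSet::from(["Fast", "Stable"]),
        BTreeSet::from(["Fast"]),
    ];
    // "Guard" appears in only one of three votes, so it is dropped.
    println!("{:?}", merge_flags(&votes));
}
```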
The current methodology for how directory authorities choose which flags to assign to relays is detailed in the Tor directory protocol specification, v3, Section 3.4.2. There are known issues with flag assignments concerning "Fast", "Stable" and "Guard" flags that have existed for a significant amount of time. These issues include edge cases with possible security exposure for users; network attack vectors; flag assignment problems exposed by new features, such as Congestion Control; long-standing problems in the flag-weight balancing equations that produce imbalanced or inconsistent results, causing flags to flap back and forth and create churn; and relays getting the "Exit" flag when they have no ability to actually exit traffic.
Resolving these flag assignment issues in directory authorities has been hampered by the nature of the C implementation. During the directory authority re-implementation work, we must implement flag assignment, so this is an opportunity to make progress on resolving these issues with new development work. If we do not fix the flag assignment issues that have persisted in Tor, we will be forced to implement the buggy flag assignment that exists in the C implementation and reproduce the bugs deliberately to maintain compatibility.
Within the scope of this Activity, we will decide which directory authority flag assignment issues need attention by combining the implementation of our flag assignment specifications with an evaluation of appropriate strategies for resolving outstanding assignment concerns. We will use the following criteria to prioritize issues and evaluate our success:
• The urgency and severity of the directory authority flag assignment issue
• The clarity and simplicity of the solution to address incorrect, bad, or otherwise harmful behavior
• The balance between addressing the issue and maintaining existing functionality within the directory authority re-implementation project timeline and budget constraints
• The extent to which unresolved issues hinder or delay the implementation of flag assignments in the directory authority context
• Whether resolving the issue may introduce any privacy, security, or network disruptive implications
After evaluating issues and solutions based on these criteria, we will proceed with implementing the most appropriate path forward. Our success in this process will be measured by assessing consensus documents produced by directory authorities to confirm that flag assignments are being properly implemented and assignment issues have been resolved.
Any changes to directory authority flag assignments that deviate from existing specifications must undergo our established process for updating specifications to accurately reflect actual implemented directory authority flag assignments, as detailed in Tor's proposal process and further refined in O3.4.
Additional effort in this Activity is allocated to implementing consensus hash chains. It is possible for directory authorities to either tamper with consensus documents or deliver a targeted consensus to specific clients without detection. Detecting and alerting on consensus equivocation lays the groundwork for developing mitigation strategies to defend against such an attack. In order to detect when such attacks happen, we have developed a proposal (defined in Tor Proposal #239²) that details the implementation of a consensus hash chain. As part of this Activity, we intend to implement this proposal, which will allow directory authorities to validate the previous consensus, detect if it was incompatible, and raise warnings. Clients would then be able to disable operations, for safety reasons, if this were to occur.
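The general idea of hash chaining can be sketched as below. This is an illustration under loud assumptions: the `Consensus` struct and its fields are hypothetical, the exact document fields are defined by the proposal rather than by this sketch, and `toy_digest` is a non-cryptographic FNV-1a stand-in used only to keep the example self-contained. A real implementation would use a cryptographic digest.

```rust
/// Toy 64-bit FNV-1a hash. NOT cryptographic; a stand-in for a real digest.
fn toy_digest(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical, heavily simplified consensus document.
struct Consensus {
    body: String,
    /// Digest of the previous consensus, if there was one.
    prev_digest: Option<u64>,
}

impl Consensus {
    /// Digest over the body and the link to the previous document.
    fn digest(&self) -> u64 {
        let mut bytes = self.body.as_bytes().to_vec();
        if let Some(prev) = self.prev_digest {
            bytes.extend_from_slice(&prev.to_be_bytes());
        }
        toy_digest(&bytes)
    }
}

/// True if every document correctly references the digest of its predecessor.
fn chain_is_consistent(chain: &[Consensus]) -> bool {
    chain
        .windows(2)
        .all(|pair| pair[1].prev_digest == Some(pair[0].digest()))
}

fn main() {
    let first = Consensus { body: "relay-a relay-b".to_string(), prev_digest: None };
    let second = Consensus { body: "relay-a relay-c".to_string(), prev_digest: Some(first.digest()) };
    let mut chain = vec![first, second];
    println!("consistent before tampering: {}", chain_is_consistent(&chain));
    // Altering an earlier document breaks the link stored in its successor.
    chain[0].body = "relay-a relay-x".to_string();
    println!("consistent after tampering: {}", chain_is_consistent(&chain));
}
```

Because each document commits to its predecessor, a party that serves a different earlier consensus to some clients produces a chain that no longer validates, which is the property that makes equivocation detectable.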
As with any new development, implementation can expose unspecified behavior or other necessary deviations from the proposal itself. Any such deviations encountered during implementation will be addressed and noted through Tor’s proposal process, with the end result ultimately defined in our specification documents.
O1.9 Develop C implementation to Arti migration tool: In this Activity, we will develop a mechanism that will allow directory authority operators to migrate their directory authority from C to Arti. This includes evaluating available mechanisms for transitioning from the C implementation to Rust and building additional mechanisms as needed.
Directory authority operators are volunteering their time and resources to maintain these important servers in the Tor network. For the transition to an Arti Tor network to be successful, we need to help operators as much as possible to transition their servers.
We will create a migration tool that can accept the configuration and state of a C directory authority as input, and provide the configuration and state of an Arti directory authority as output. We may also identify further requirements of this tool as a part of refining the transition plan described in O1.12. Since the exact set of data and configurations to be migrated is somewhat flexible, we will evaluate the quality of our solution based on its reported ease-of-use among authority operators and by its ability, in testing, to transform a working C authority configuration into a working Rust authority configuration.
If we do not provide this tool, directory authority operators will need to perform this migration manually. A manual process creates a risk of hard-to-debug and/or security-critical mistakes, and is likely to meet opposition among directory authority operators.
O1.10 Port extra-info handling mechanisms: Extra-info relay descriptors contain relay information that Tor clients do not need in order to function. These are self-published, like server descriptors, but not downloaded by clients by default. Extra-info documents are the primary place where network statistics are recorded and reported by Tor relays. If authorities do not receive and report this information, there is no way to monitor and report information about the state of the network.
In this Activity, we will port mechanisms that allow directory authorities to parse, receive, and serve extra-info descriptors. The feature consists of receiving, storing, and serving extra-info documents as described in `dir-spec.txt` sections 3.2, 3.7, and appendix B. While writing the code that handles caching, fetching, receiving, and storing router descriptors, we will ensure that it also accepts and validates extra-info documents. In the C code base, authority support for this feature is found in the `dirserv_add_extrainfo()` function.
O1.11 Port features to run as a bridge authority: A bridge authority is a special-purpose relay that maintains the list of bridges in the Tor network. The bridge authority is distinct from directory authorities because it does not vote in the consensus protocol. Instead, it serves to aggregate relay descriptors sent to it by bridges, checking their cryptographic validity and testing that the bridges’ ORPorts within these descriptors are reachable. It then sends these descriptors to the bridge distribution mechanisms (e.g., BridgeDB, rdsys), which distribute bridges to clients. In this Activity, we will port the existing bridge authority implementation to Arti.
Bridges are a key part of Tor's anti-censorship mechanism. Without a bridge authority, bridges can't be cataloged and distributed to users. If we do not implement bridge authority functionality in Arti, then we would have to either drop anti-censorship support, redesign Tor’s entire anti-censorship system not to require a bridge authority, or leave this key component of our infrastructure implemented in C.
We will develop the features and configuration options that make a bridge authority differ in its behavior from a regular authority. (With a bridge authority, descriptors are not voted upon or published publicly, but are rather delivered out-of-band to a bridge distribution mechanism.) The rules for testing bridges are also different, and some request types are disabled in order to make sure that bridge descriptors are not leaked. In C, the functionality that implements bridge authorities is concentrated in the `bridgeauth.c` file, in the code that it calls, and the code surrounding it.
We will audit the behavior of the C bridge authority and ensure that it matches our specifications, creating new documentation as needed. Then we will, for each of the features described and behaviors required, implement it in Rust and make sure that it is tested and correct.
O1.12 Work with directory authority operators to plan transition from C to Arti: In this Activity, we will plan a parallel phase-in/phase-out approach for directory authorities, in close collaboration with these operators, to transition to Arti. This requires communication and careful coordination with the directory authorities to facilitate a smooth transition. Transitioning directory authorities involves multiple delicate, critical components that need to be carefully handled in a well-coordinated way. As part of this process, we will gather feedback from the directory authorities, who have significant experience running Tor relays, in order to make improvements that ease transition.
Administration of directory authorities is substantially more complex than administration of relays: there are a number of tools that only authority operators use or require. Further, Tor's design means that the directory authorities cannot be arbitrarily reconfigured as independently as relays can: if the authorities do not agree exactly about certain key behaviors, they will not reach a consensus, and the network won't work. Therefore, the problem of planning and supporting an authority transition is substantially different from the problem of doing so for relay transitions.
In this Activity, we will first develop a phased transition plan to switch the network from C authorities to Arti authorities. Then we will consult with the authority operators, adjusting the plan as needed, to ensure that they believe the plan is achievable, sensible, and robust. If during this Activity we identify new necessary tooling for migration that we have not yet identified, we will develop those tools as part of O1.9. (We have not currently identified any such tooling, but any planning activity carries the possibility of finding unexpected second-order requirements.)
**Objective 2: Arti supports all necessary relay types**
To achieve this Objective, we will build relay mode in Arti such that it can replace Tor for relays in the network. This includes supporting all necessary relay types: guards, middle relays, exits, and bridges (with and without pluggable transports).
Notably absent from the above list are directory authorities. This work is distinctly defined in Objective 1 “Arti supports Directory Authorities.” Separating the work to port directory authorities from the work to port relays allows us to implement a multi-phased approach for a smoother transition of the network. The Tor relays on the network can effectively migrate to the Rust-based Arti code independently from migrating the directory authorities.
O2.1 Implement core relay functionality: Guard, middle, exit, and bridge relays all share the same core functionality, and in this Activity, we will port that functionality from the C implementation to Arti, and thoroughly test the functionality to ensure we achieve feature parity. This includes porting the way relays handle handshakes; architecture for handling cells; ORPort implementation; mechanisms for key management and descriptor generation; support for directory cache and DirPort; mechanisms for use of the consensus; and relay-cell forwarding on circuits.
O2.2 Port exit relay support: Exit relays need specific features. In this Activity, we will port to Arti, from the C implementation, functionality required to support exit relays, and thoroughly test the features to ensure feature parity: exit support for TCP and UDP, policy support, and using DNS at exit.
O2.3 Port relay-side pluggable transport support: In this Activity, we will port to Arti the C Tor functionality required to support pluggable transports, and thoroughly test that the ported code provides feature parity.
O2.4 Reimplement in Arti relay-side administrator support: In the final Activity of Objective 2, we will reimplement the tools that exist in C Tor for relay operators to administer their relays, including implementing bandwidth accounting and restrictions, integrating core limits and systemd support, adding a mechanism for resource limitation handling, adding useful human-readable log entries, and adding syslog integration.
The goal in this Activity is specifically to bring feature parity to the administrative tools that the C implementation and its surrounding ecosystem provide to relay operators. We believe that we must do this because if we do not, relay operators will find Arti to be an unworkable or unsatisfactory replacement for the C implementation and they will not be able to migrate successfully.
First, we will describe the specific work that we know we must do in this Activity. Then we will describe why we believe there will be additional, emergent items in this area, how we will prioritize them, and how we will evaluate our success.
The specific known features to be reimplemented in Rust, and the development efforts to do so are as follows:
• Bandwidth limits are features used to control the average and burst number of bytes a relay uses. This feature allows relay operators to control how much bandwidth their relay uses. Bandwidth limit features allow relay operators to keep their ISP bills under control and ensure they only donate bandwidth that they can afford. (This corresponds to the C implementation's `BandwidthRate`, `BandwidthBurst`, `RelayBandwidthRate`, and `RelayBandwidthBurst` options, primarily implemented in `src/core/mainloop/connection.c` and `src/lib/evloop/token_bucket.c`.)
◦ At this time, we plan to reimplement this feature by employing wrappers on Arti's existing network backend code to record the amount of traffic sent or received on the network, and to throttle traffic when it would otherwise be greater than the amount permitted. We will try to identify existing library solutions that can integrate with our network backend, if appropriate. We will include a mechanism for configuring these limits.
• Bandwidth accounting is a feature used to limit the total number of bytes produced and consumed over a longer time frame of days, weeks, or months. This feature allows relay operators to control how a specified amount of bandwidth is used over time, which is another mechanism for relay operators to control their ISP bills and ensure they only donate bandwidth that they can afford. (This corresponds to the C implementation's `AccountingMax` option, primarily implemented in `src/feature/hibernate/`.)
◦ At this time, we plan to reimplement this feature by porting the C codebase's existing mechanisms for scheduling "hibernation" intervals to Arti. We'll integrate this code to monitor our total bandwidth usage (monitored as described in the item above) to decide when to begin "hibernating". The hibernation mechanism itself will extend Arti's current "dormancy" code to minimize functional duplication. The accounting code will use Arti's existing "persistent state manager" code to record partial activity to a state file so that relays can be rebooted without exceeding their bandwidth allocations.
• Limits on the number of CPU cores to be allocated to the relay process are features used to prevent exhaustion of resources and/or excessive hosting bills. Again, this allows relay operators to ensure they only donate the CPU resources that they can afford. (This corresponds to the C implementation's `NumCpus` option, primarily implemented in the `get_num_cpus()` function and in the other functions that call it.)
◦ At this time, we plan to reimplement these features by adding a configuration option to control the number of CPUs used, by using existing library code to learn the maximum number of cores available and to tell the operating system to limit the number of cores used (when applicable). For our existing asynchronous networking backends, we will write code so that they get configured with the correct number of worker threads to use the configured number of cores.
• Integration with the `systemd` service monitoring system is a feature used to surface information about whether a relay has crashed or become unresponsive. This allows relay operators to effectively monitor the status of their relay. (In C this is primarily implemented via the blocks of code in `hibernate.c`, `mainloop.c`, and `main.c` marked with `#ifdef HAVE_SYSTEMD` and `#ifdef HAVE_SYSTEMD_209`.)
◦ At this time, we plan to reimplement this feature by adding optional "process status" features to our runtime, to expose information about Arti's PID and liveness, and then by exposing those features via APIs required by systemd.
• Resource limitation for maximum open network connection usage is a feature used to ensure that Tor relays don't exhaust system resources or crash other programs. (In C this is handled primarily via the `ConnLimit` option, and the `set_max_file_descriptors()` function and surrounding code.)
◦ At this time, we plan to reimplement this feature in two pieces. First, we will write code to detect the number of network connections available, to keep track of the number we are using, and to adjust system limits as configured. Second, we will write code to handle the case where we have run out of available connections gracefully, and not crash or produce excessive errors.
• Integration with existing system logging methods is a feature used to support better integration with existing sysadmin tooling. This allows relay operators to more easily monitor and maintain the status of their relay. (In C this is handled by the `Log syslog` option, and implemented in the `logfile_deliver()` function and related invocation and configuration code.)
◦ At this time, we plan to reimplement this feature by using an existing Rust "syslog" system library that integrates with our logging subsystem (currently the `tracing` library). If none exists, we will implement one.
• Ongoing improvements to logging and diagnostic messages reported by Arti are features used to help relay operators understand logs and easily monitor and maintain their relays. Improved messages also help Tor developers diagnose problems reported by relay operators. (In C this includes the totality of log messages throughout the codebase, and usability improvements such as rate-limiting log messages as implemented with the `log_fn_ratelim()` function.)
◦ At this time, we plan to reimplement these features by learning, from operators in production and on the test network, which information they find overly verbose and which information they find missing. Then we will add or downgrade logging as needed. If needed, we'll reimplement or adapt a mechanism for coalescing similar log messages to avoid filling up the logs.
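To make the bandwidth limit item in the list above more concrete, here is a minimal token bucket sketch. It is an assumption-laden illustration, not Arti code and not a port of `token_bucket.c`: the `TokenBucket` type and its methods are hypothetical, and the `rate` and `burst` fields only loosely mirror the roles of `BandwidthRate` and `BandwidthBurst`.

```rust
use std::time::Duration;

/// Hypothetical token bucket used to throttle traffic (not an Arti type).
struct TokenBucket {
    rate: u64,   // bytes added to the bucket per second
    burst: u64,  // maximum number of bytes the bucket can hold
    tokens: u64, // bytes currently available to spend
}

impl TokenBucket {
    /// Create a bucket that starts full.
    fn new(rate: u64, burst: u64) -> Self {
        TokenBucket { rate, burst, tokens: burst }
    }

    /// Add tokens for the elapsed time, never exceeding the burst size.
    fn refill(&mut self, elapsed: Duration) {
        let added = self.rate as u128 * elapsed.as_millis() / 1000;
        let added = added.min(u64::MAX as u128) as u64;
        self.tokens = self.tokens.saturating_add(added).min(self.burst);
    }

    /// Spend tokens for up to `wanted` bytes and return how many may be sent now.
    fn take(&mut self, wanted: u64) -> u64 {
        let granted = wanted.min(self.tokens);
        self.tokens -= granted;
        granted
    }
}

fn main() {
    let mut bucket = TokenBucket::new(1_000, 5_000);
    // The initial burst allows 5000 bytes; the remainder must wait for a refill.
    println!("granted now: {}", bucket.take(6_000));
    bucket.refill(Duration::from_secs(2));
    println!("granted after 2s: {}", bucket.take(6_000));
}
```

A wrapper around the network backend could call `take` before each write and defer any bytes that were not granted until the next refill.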
Above and elsewhere, when we describe our plans for how to reimplement a given feature, we are referring to our current best anticipated plans. We may make changes to how we reimplement these given features as needed if we find that the originally anticipated means for delivering a feature can be replaced with one that can be delivered more effectively, efficiently, or reliably. We are outlining our planned development efforts in this way in order to clarify the anticipated workflow, but if we find a more effective path to deliver it, we will do so in the interests of cost containment and product quality.
Some effort in this Activity is allocated for solving issues raised by operators when it comes to administration of relays in Arti, based on the feedback we collect in O4.3. We will determine which issues raised by relay operators need to be resolved as part of this Activity by evaluating:
• The urgency and severity of the reported issue
• The difficulty of resolving the issue
• The extent to which the issue prevents or delays relay operators from migrating to Arti
• The extent to which the project budget and remaining time permit handling the request
• Whether or not the issue creates any privacy or security implications
After these requests are evaluated and we decide whether or not to resolve them, we will gauge our success based on whether relay operators report that these issues have been resolved and whether relay operators are able to migrate to Arti successfully, following the targets set in the M&E plan.
**Objective 3: Arti is stable enough for general usage**
To achieve this Objective, we must test and tune Arti so we can ensure the new relay implementation is stable enough for general usage on supported modern operating systems. Across this Objective we will be analyzing the privacy implications of Arti, its robustness, and ensuring that Arti builds reproducibly.
O3.1 Ensure parity with C implementation: We will begin this Objective with an Activity to ensure that Arti is in parity with the C implementation. To do so, we will reverse engineer security requirements from existing C code and translate those to Arti. Through this process we will be identifying missing features and implementing them, as well as evaluating privacy implications of Arti to ensure it does not introduce any new privacy vulnerabilities.
O3.2 Reimplement, develop, and test protections against external attacks: Our next Activity to ensure stability is to reimplement, develop, and test protections for Arti against malicious input and external attacks like DoS. We will then conduct extensive fuzzing to ensure these protections are working as expected and to improve them where they are failing. The C implementation includes numerous security features and practices that protect it from forms of active attack. These attacks are not theoretical: they have been observed on the actual network. If we did not reimplement these features, Arti would be susceptible to attacks that have already been observed on the live Tor network, including attacks that would enable an attacker to crash some or all of the network, degrade performance, or deanonymize user traffic. Relay operators and users would not be able to migrate safely under such conditions, and if they tried, we expect that active attackers would crash their relays at will.
Features that exist in the C codebase that we will port to Arti:
• Support for detection of and response to memory-related denial-of-service attacks. (The C implementation provides this feature via the `MaxMemInQueues` option and the related `cell_queues_check_size()` function.) Building this will require listing the places in our code where we allocate memory based on externally triggerable actions, and instrumenting those places to track their total amount of allocated memory. Then we'll need to implement code to monitor these total amounts and terminate connections or circuits (or take other appropriate responses) when the cell queues become too full. Our work here will be guided by the algorithms in the C implementation, which in turn are based on the work of Jansen, Tschorsch, Johnson, and Scheuermann (2014).
• Support for detection of and response to other resource-based denial-of-service attacks to prevent an attacker from opening huge numbers of connections or circuits from the same IP or network. (The C implementation provides this feature via the code in `src/core/or/dos.c`.) Building this will require analyzing and porting algorithms from that C file to Arti and refactoring them as appropriate to account for Arti's more modular code structure.
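The memory-related protection in the first item above can be sketched roughly as follows. This is a simplified illustration, not the algorithm in `cell_queues_check_size()`: the `CircuitQueue` record, the `low_water` threshold, and the function name are hypothetical, and the policy shown (close the circuits with the oldest queued data first) is stated here as an assumption for the example.

```rust
/// Hypothetical per-circuit accounting record (not an Arti type).
#[derive(Debug)]
struct CircuitQueue {
    id: u32,
    queued_bytes: usize,
    /// Age of the oldest cell still waiting in this circuit's queue.
    oldest_cell_age_ms: u64,
}

/// If total queued memory exceeds `max_bytes`, choose circuits to close,
/// oldest queued data first, until the total drops to `low_water` or below.
fn select_circuits_to_close(
    circuits: &[CircuitQueue],
    max_bytes: usize,
    low_water: usize,
) -> Vec<u32> {
    let mut total: usize = circuits.iter().map(|c| c.queued_bytes).sum();
    if total <= max_bytes {
        return Vec::new();
    }
    // Sort so that the circuit with the oldest queued cell comes first.
    let mut by_age: Vec<&CircuitQueue> = circuits.iter().collect();
    by_age.sort_by(|a, b| b.oldest_cell_age_ms.cmp(&a.oldest_cell_age_ms));

    let mut victims = Vec::new();
    for circuit in by_age {
        if total <= low_water {
            break;
        }
        total -= circuit.queued_bytes;
        victims.push(circuit.id);
    }
    victims
}

fn main() {
    let circuits = vec![
        CircuitQueue { id: 1, queued_bytes: 400, oldest_cell_age_ms: 50 },
        CircuitQueue { id: 2, queued_bytes: 300, oldest_cell_age_ms: 900 },
        CircuitQueue { id: 3, queued_bytes: 500, oldest_cell_age_ms: 200 },
    ];
    println!("{:?}", select_circuits_to_close(&circuits, 1_000, 600));
}
```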
Beyond reimplementing existing protections against DoS attacks, we also must harden Arti against hostile inputs to reduce the likelihood of a successful exploitation of any programming errors in our codebase. We will do this by developing specific tests, utilizing those tests, and conducting extensive fuzzing that will allow us to detect and resolve security errors.
This is critical because the C implementation, flawed as the C language is, is the product of over 20 years of active research, refinement, and testing. As such, new successful exploits against it are rare. Although Rust is a safer language than C, and security-relevant mistakes are harder to make in Rust, Arti will not boast the same benefit of years of experience and code inspection by security researchers. As such, it is critical for the safety of Tor users that we take measures to limit the impact of any mistakes that we make during development.
Although these efforts are not reimplementing specific security features, they are efforts to bring Arti as close to parity with the C implementation as possible by replicating a process of thorough refinement and testing to discover and resolve exploits.
To accomplish this, we will be developing:
• Tests to ensure that unhandled programming errors (called "panics" in Rust) are contained to a single protocol context (a circuit, a stream, or a channel), and do not cause the termination of any more protocol objects than are needed. To develop these tests, we will add a fault injection mechanism to the stream, circuit, and channel mechanisms, in order to simulate the case where a programming error has caused a "panic," and verify that only the offending protocol object is closed. If we did not validate that the error containment mechanism worked, we would risk the possibility of shipping code without working error containment—and then having an error that otherwise would be minor in its effects turn into an opportunity for a network-wide DoS attack or worse.
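A minimal sketch of such a containment test is shown below. It is synchronous and uses only the Rust standard library, whereas Arti's real stream, circuit, and channel code is asynchronous; the `Circuit` type, the `inject_fault` flag, and `deliver_to_all` are hypothetical names introduced for illustration. The sketch also assumes the program is built with unwinding panics rather than `panic = "abort"`.

```rust
use std::collections::BTreeMap;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for per-circuit state.
pub struct Circuit {
    pub cells_handled: u32,
    /// Test-only fault injection: panic while handling the next cell.
    pub inject_fault: bool,
}

impl Circuit {
    fn handle_cell(&mut self) {
        if self.inject_fault {
            panic!("injected fault");
        }
        self.cells_handled += 1;
    }
}

/// Deliver one cell to every circuit. A panic inside one circuit's
/// handler closes only that circuit; the others keep running.
/// Returns the IDs of the circuits that were closed.
pub fn deliver_to_all(circuits: &mut BTreeMap<u32, Circuit>) -> Vec<u32> {
    let mut closed = Vec::new();
    for (id, circ) in circuits.iter_mut() {
        // Catch the unwind at the boundary of the protocol object.
        let outcome = panic::catch_unwind(AssertUnwindSafe(|| circ.handle_cell()));
        if outcome.is_err() {
            closed.push(*id);
        }
    }
    for id in &closed {
        circuits.remove(id);
    }
    closed
}
```

A test built on this pattern sets `inject_fault` on one circuit, delivers a cell to all of them, and asserts that exactly that circuit was closed while the rest made progress.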
We will also be writing tests using:
• Existing automated “fuzzing” tools to search for inputs that can cause our program to crash. We will apply tools like `cargo-fuzz`, which is already used for Arti's existing input-handling code, to all of the new input-parsing code we write for this project. We will also apply it to types of code that Arti does not currently expose for fuzzing, notably our connection and circuit state machines. If we were to ship unfuzzed parser state machines, we would significantly raise the odds of an attacker discovering an unpatched vulnerability and using it to attack the network. (Attackers commonly use fuzzers themselves to find software vulnerabilities.)
Once these tests are in place, we will:
• Conduct testing and fuzzing on Arti, using the tools mentioned above. If we find errors or potential security issues as a result of running these tests, we will locate and resolve the bugs responsible.
We believe that this work is essential in order to ensure that we are delivering a secure project. Since programming mistakes can't be avoided completely, it's important to limit their impact and minimize the odds that hostile parties find them before they are fixed. Because Tor is a tool used by human rights defenders, journalists, minority communities, and marginalized populations that are the target of surveillance and attempted deanonymizing attacks online, delivering an insecure Arti implementation would put these users at great risk. To remove this Activity would be to downgrade the confidence we have in the security of our tool.
Even if we are lucky enough not to have any relevant programming errors, by carrying out these mitigation strategies and being transparent about our process, we will increase confidence among relay operators that they will not lose security by migrating to Arti and increase confidence among users that the move to Arti will not decrease the efficacy of privacy and security properties they rely on when using Tor.
O3.3 Ensure stability on less common relay platforms: Finally, we will work on less common platforms used for relays (FreeBSD, OpenBSD, and Windows) that may not have been tested during work in Objective 2, and ensure that Arti is stable on them.
O3.4 Adjust to bring in alignment, as necessary, specification documents for all technologies and/or protocols being updated under this project: The Tor Project's development processes involve updating Tor specifications as Tor is developed. We document all security, privacy, and design requirements and features as part of authoring, updating, and maintaining all specifications. In this project in particular, we will rely on existing specifications to properly reimplement relay operations in Arti—thus, we will regularly reference, review, and update specification documents throughout the delivery of this project. If any deviations are found in the specifications, or any updates are necessary, we will reconcile or update them. If we discover one is wrong or inadequate, we will follow our standard practice for resolving issues in specifications:
• If the correction is obvious, update the specification immediately
• If the correction is not obvious, open a ticket to determine what correction is necessary. Determine, based on analysis of the C code and our protocol design, what the specification should say. Update the specification when this analysis is complete
• If the existing C behavior is incorrect, or the specification is overly tied to the C behavior, write a protocol-change proposal for the better behavior, leading to a specification change
We will follow the above practices to keep up-to-date the specification documents for all technologies and/or protocols being updated under this project, along with documentation for their security, privacy, and design requirements and features.
Our current specifications are maintained in our specifications and proposals repository. The specifications most relevant to Tor relay operation are: `tor-spec`, `cert-spec`, `dir-spec`, `dos-spec`, `ext-orport-spec` (for bridges), and `padding-spec`. Not every specification listed will need refinement or editing during the course of this project, but all of these specifications will be referenced and reviewed.
Additionally, some documents in the `proposals` subdirectory of the repository mentioned above —those marked with "FINISHED" status in 000-index.txt—are proposals that are implemented in some version of Tor; the proposals themselves still need to be merged into the specifications proper. We will ensure that any of these proposals that are relevant to the work completed in this project are merged into the appropriate specification.
Objective 4: Arti relay implementation performs as well or better than C implementation
To achieve this Objective, we must profile and verify Arti relay performance and ensure that the Arti relay implementation performs as well as or better than the C implementation of Tor.
O4.1 Ensure Arti performance is measurable: In this Activity, we will port or adapt the necessary tools for us to measure Arti’s relay performance using Shadow, a tool that allows us to simulate the real Tor network. This work allows us to measure the performance of Arti reproducibly and reliably, without conducting experiments on the live Tor network (thus, protecting users from any negative impacts). We will ensure that it is possible to use data directly from Arti to analyze and reason about performance characteristics of the application.
We will reimplement and/or adapt the following tools/features to measure Arti performance:
• OnionPerf3: measures performance of bulk file downloads over Tor. Together with its predecessor, Torperf, OnionPerf has been used to measure long-term performance trends in the C implementation of the Tor network since 2009. It can also be used to perform short-term experiments to compare the performance of different Tor configurations or implementations.
• MetricsPort4: exposes internal performance counters, gauges, etc., in the OpenMetrics format used by modern observability and monitoring systems. Used for requesting internal metrics data.
• Supporting Shadow scripts5: scripts that allow us to measure specific tuning experiments using Shadow.
Not all of these tools can use Arti directly, due to differing APIs and other external interfaces. In some cases, the best solution for reimplementation will involve adapting the tool to work with Arti, rewriting the tool to work with Arti, or extending Arti to work with the existing tool. In certain cases, we will expand these tools to measure new metrics that are not currently collected but are necessary to identify performance regression or problems and successfully demonstrate that Arti is behaving as expected. We will evaluate, document the evaluation, and choose an appropriate approach based on the expected effort and benefits of each option.
Beyond the reimplementation and/or adaptation of features and tools above, we will also implement tooling and infrastructure inside of Arti that will allow us to interpret timing data from the different phases of circuit handling throughout Arti.
Ensuring Arti performance is measurable is absolutely critical to the success of this project. Without the data we can collect as a result of this Activity, we will not be able to confirm that Arti offers performance and speed that match or exceed those of the C implementation of Tor. Users and relay operators will not migrate to Arti if the performance is unknown or significantly worse. Performance metrics are critical for us to be able to evaluate Tor performance in simulations and on the real network, and they empower us to more easily identify why performance problems are occurring so we can fix them.
Equally important, the M&E Plan of this project relies on this Activity. We cannot meaningfully report on our performance-related Objectives or measure our Indicators without this Activity.
O4.2 Ensure Arti collects performance metrics and delivers them to the metrics pipeline: In this Activity we will make sure that Arti collects performance metrics and that these metrics are delivered to, and displayed on, the public metrics portal (metrics.torproject.org) by ensuring Arti’s extra-info and metrics-port performance data is included in the metrics pipeline.
Specifically, we will ensure that Arti collects the same set of relevant network performance data as the C implementation and re-exports it to the existing metrics pipeline. (There is a small subset of usage metrics that are related to now-obsolete versions of the Tor protocols. Arti will not support these protocol versions so this information will not be collected and reported by Arti.)
The specific information to be collected is covered in the "extrainfo" document, which is specified in section 2.1.2 of `dir-spec.txt`6. In the C implementation, this information is exposed via the `router_build_fresh_signed_extrainfo()` function, and collected in various function-specific cases throughout the codebase.
Just like in O4.1, the collection and performance reporting of metrics is critical to the success of the project. Without this information, we would not be able to report important statistics, such as the number of users on the network, the number of users visiting from various countries, the usage of various pluggable transports, and the prevalence of different protocols. We use this information to detect network trends and anomalies, which is extremely important during a migration such as the one planned in this project.
Equally important, the M&E Plan of this project relies on this Activity. We cannot meaningfully report on our Objectives or measure our Indicators without this Activity.
O4.3 React to measured performance on Arti relays to improve tuning parameters: With the results made possible through O4.1 and O4.2, we will measure performance and tune Arti with that information, including tuning flow control and UDP buffering for exits. Our experience with C relay development leads us to expect that the first version of a major new networking program will have some unexpected performance issues. If we do not resolve these issues to the best of our ability before we encourage relay operators to upgrade to Arti, we will be knowingly shipping a tool with unknown, potentially degraded, performance quality, which negatively impacts the work of human rights defenders, journalists, and activists who use Tor. This is why it's critical to run performance testing and respond to the results of that testing.
Many of the existing network traffic scheduling algorithms that we plan to port from the C relay implementation have tuning parameters, but these parameters have been tuned in part based on their observed effects on the C implementation. We believe that Arti's network behavior will likely differ enough that those parameters or algorithms may not be optimal.
To address this, after we have implemented the performance metrics of O4.2 above, we will need to use them to:
• Observe performance on the test network and live network to identify any areas where performance lags behind that of the C implementation. We will also solicit reports of bad observed performance from users.
• Run Shadow simulations (see O4.1) with different settings for key parameters related to the algorithms of O4.4 to validate whether choices made for C are still reasonable for Arti.
• Adjust tuning parameters or algorithms as appropriate. We will choose interventions here based on simplicity and cost-to-impact ratio.
If we did not do this work, we could not confidently say that Arti would match or beat the performance of the C implementation and it's likely we would ship a version of Arti with unknown performance bottlenecks making it less efficient than the C relay implementation. Shipping a version of Arti with performance problems would undermine our effort to get relay operators to migrate and undermine user confidence in Tor. In order to release a version of Arti that at the very least matches the performance of the C implementation, this testing and tuning is a vital activity that cannot be removed.
O4.4 Adapt existing mechanisms for improving network performance: In this Activity we will adapt performance improvements that already exist in the C implementation for use in Arti.
The earliest versions of Tor (from around 2003-2004) used a "greedy" algorithm to handle traffic: as soon as a relay had something to send on the network, it would send it immediately, giving all data equal priority. This algorithm proved to be grossly inefficient: it resulted in bad performance for interactive applications such as chat or voice, since they were drowned out by bandwidth-heavy downloads. It also caused huge latencies, since operating system kernel buffers were always full and unable to send out new traffic until already-queued traffic was delivered.
Because of these problems, researchers developed, and we have subsequently implemented, several algorithmic improvements that C relays now use when choosing which traffic to handle first. We will reimplement these algorithmic improvements in Arti. If we did not implement these improvements in Arti relays, we expect that we would revert to the worse Tor performance of the past, which would lead to unsatisfied users, and relay operators unwilling to migrate to a worse-performing tool. This would undermine migration efforts.
The algorithms that we will reimplement from C into Arti are:
• Exponentially weighted moving average (EWMA) queue-history-based cell selection algorithm: Relays use this algorithm to decide which of several circuits on a channel should have priority for sending traffic. (This is principally implemented in `circuitmux.c` and `circuitmux_ewma.c` in our C implementation.)
◦ At this time, we plan to reimplement EWMA by extracting and specifying the current algorithm, then implementing it in Arti. This will likely require some refactoring on the existing Arti protocol code, which has been written to allow modular improvements in this area.
◦ Some details in the existing EWMA algorithm were selected with strong ties to the details of the C implementation and of `libevent`, its low-level network concurrency library. It is possible, but not certain, that we will discover some of these details are not a good match for the asynchronous futures-based network libraries (`tokio`, `async-std`, etc.) used with Rust. If this occurs, we will adjust the algorithms as appropriate so that Arti relays can provide similar behavior to C relays without having to share their internal code structure. We will choose any such adjustments in consultation with researchers, including, when possible, those who designed the original algorithms. We will select adjustments in order to deliver comparable performance to the C implementation.
◦ If it proves necessary to adjust the algorithms for compatibility with existing Rust practices, doing so would be a better and less costly outcome than shipping a mismatched, unchanged algorithm, because keeping the algorithm unchanged would require rewriting external Rust libraries to match the behavior of the C implementation.
• Kernel informed socket transport (KIST) and/or "KIST-Lite" scheduling algorithms: Relays use these algorithms to decide globally which circuits across all channels should get priority first, based on observed kernel buffer sizes. (This is principally implemented in the C implementation in `scheduler.c` and `scheduler_kist.c`.)
◦ At this time, we plan to reimplement this by evaluating which of the algorithms will best conform to the details of our Rust implementation's network code, writing adapter code as needed to expose the details it needs, and investigating how to interface the newer kernel APIs with Arti’s async libraries. These newer APIs expose more information and events that should make it possible to schedule dynamically, thus further improving performance. Then we will write and test the appropriate scheduling code, based on our specifications and the C implementation, to ensure that all the circuits handled by Arti are multiplexed to share the relay’s bandwidth fairly, according to the selected algorithm.
◦ Similar to EWMA, some details in the existing KIST algorithms were selected with strong ties to the details of the C implementation, and it is possible, but not certain, that we will discover some of these details are not a good match for the asynchronous futures-based network libraries used with Rust. If this occurs, we will adjust the algorithms as appropriate so that Arti relays can provide similar behavior to C relays without having to share their internal code structure. We will select adjustments in order to deliver comparable performance to the C implementation.
◦ If it proves necessary to adjust the algorithms for compatibility with existing Rust practices, doing so would be a better and less costly outcome than shipping a mismatched, unchanged algorithm, because keeping the algorithm unchanged would require rewriting external Rust libraries to match the behavior of the C implementation.
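To make the EWMA idea above concrete, the following is a simplified sketch of queue-history-based circuit selection: each circuit carries an activity estimate that decays exponentially over time, and the quietest circuit is served first. It is not a transcription of `circuitmux_ewma.c`, whose bookkeeping differs in detail; the names `CellEwma` and `pick_next`, and the use of a half-life in seconds as the tuning parameter, are assumptions made for illustration.

```rust
/// Per-circuit activity estimate that decays exponentially with time.
#[derive(Debug, Clone, Copy)]
pub struct CellEwma {
    value: f64,
    last_update_secs: f64,
}

impl CellEwma {
    pub fn new(now_secs: f64) -> Self {
        CellEwma { value: 0.0, last_update_secs: now_secs }
    }

    /// Decay the stored value so that it halves every `halflife_secs`.
    fn decay_to(&mut self, now_secs: f64, halflife_secs: f64) {
        let elapsed = (now_secs - self.last_update_secs).max(0.0);
        self.value *= 0.5_f64.powf(elapsed / halflife_secs);
        self.last_update_secs = now_secs;
    }

    /// Record that one cell was sent on this circuit.
    pub fn note_cell_sent(&mut self, now_secs: f64, halflife_secs: f64) {
        self.decay_to(now_secs, halflife_secs);
        self.value += 1.0;
    }

    /// The decayed activity estimate as of `now_secs`.
    pub fn value_at(&self, now_secs: f64, halflife_secs: f64) -> f64 {
        let mut copy = *self;
        copy.decay_to(now_secs, halflife_secs);
        copy.value
    }
}

/// Return the index of the circuit with the lowest decayed activity,
/// i.e. the one that has recently sent the least.
pub fn pick_next(circuits: &[CellEwma], now_secs: f64, halflife_secs: f64) -> Option<usize> {
    circuits
        .iter()
        .enumerate()
        .min_by(|a, b| {
            a.1.value_at(now_secs, halflife_secs)
                .total_cmp(&b.1.value_at(now_secs, halflife_secs))
        })
        .map(|(i, _)| i)
}
```

With a scheme like this, a circuit carrying a bulk download accumulates a high value and yields to a lightly used interactive circuit, which is the behavior the "greedy" algorithm lacked. The half-life is exactly the kind of tuning parameter that O4.3 would revisit for Arti.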
The algorithms that we will need to re-adjust based on work in this Activity:
• Round-trip time congestion control (RTTCC) and Conflux: RTTCC is an algorithm to improve speed and reduce memory requirements for fast Tor relays by reducing queue lengths. Conflux7 is a dynamic traffic-splitting approach that assigns traffic to an overlay path based on its measured latency. Together, congestion control and Conflux are used to provide performance improvements. (In the C implementation, RTTCC is implemented principally via the `src/core/or/congestion_control*.c` files, and Conflux is principally implemented in `src/core/or/conflux*.c`.)
◦ Congestion control and Conflux are already implemented in Arti for a specific part of the Tor protocol. We will need to complete some minor adaptation, tuning, and refactoring to both in order to maximize performance and adjust to work completed above in this Activity.
Objective 5: The Tor network is significantly transitioned to Arti Relays
In this Objective we will create the test network needed for testing all work created in this project, create a framework for evaluating the transition from the C implementation to Rust, develop any needed tools for relay operators to migrate their relays, and engage the community in a public campaign to encourage transition to Arti. This Objective is not about transitioning all relays to Arti—as stated above, the Tor Project does not control the volunteer relay operators and cannot force them to transition—but is about making a concerted effort to encourage a significant number of operators to transition through support and easy-to-use migration tools.
O5.1 Create and maintain a test network with all Tor relays: This Activity will take place at the beginning of the project, as the test network will be used across all Objectives to test, tune, and verify the work conducted. To create a test network, we will use virtual machines to deploy an entire Tor network using test versions of relays using iterative versions of Arti. We’ll also deploy test network support infrastructure, like the metrics pipeline, Prometheus, and log monitoring. This Activity will facilitate all other work in this project as well as the ability to collect information necessary for the Monitoring & Evaluation Plan.
O5.2 Develop and deploy tools for streamlined migration for relay operators: In this Activity we will develop tools that make it as easy as possible for existing relay operators to migrate their C relays to Arti. This includes specific migration tools as well as developing distribution packages for Debian, Fedora, and commonly used relay operating systems.
Relay operators are nearly all volunteers; they run relays in their spare time and with their own investment. Asking relay operators to migrate their relays to Arti is above and beyond a regular update of the C implementation—this will be a major change. Offering easy, well-suited tools to migrate relays improves the chances that operators will migrate their relays promptly and correctly without loss of security.
We will develop the following tools to support relay operators:
• Instructions and scripts as needed to convert data and configurations stored by the C implementation to and from a format usable by Arti. If we did not do this, operators would not be able to retain existing relay identities, statistics, or configurations when moving to Arti. Losing configuration information would cause relays to behave in a way that their operators didn't expect, like losing exit policies or families or resource limits. Losing identity information would cause the relays to be treated by the directory authorities and the rest of the network as if they were completely new relays, losing their history and reputation information. Note these are distinct migration scripts from O1.9, which is oriented around directory authority migration; these are specifically for relay migration.
• Packages of Arti for use with operating systems commonly used by relay operators. If we did not do this, operators would not be able to keep Arti up-to-date with the same processes they use for applying security updates to the rest of their systems and would likely either risk unpatched security bugs or have to adopt new and unfamiliar practices that would lower their odds of migration.
We also expect to find that some operators rely on a variety of ad-hoc tools to configure, launch, monitor, maintain, and secure their Tor relays. We cannot and do not plan to port all of these tools to work with Arti, but we expect that in some cases a small effort to adapt a tool will pay off in a large number of relay operators becoming willing to migrate.
We will identify specific tools that the relay operator community would like us to port by engaging with this community in Objective 5. We will determine which tools are necessary to port by using the following evaluation criteria:
• Expected difficulty and effort needed to port or adapt the tool
• Comparative difficulty in adjusting the tool to fit Arti versus adjusting Arti to fit the tool
• Reported necessity or utility of the tool for relay operators to migrate
• Expected chances that if we do not port the tool ourselves, somebody else will do so, as many such tools were originally developed by a member of the Tor community
• Extent to which the issue prevents or delays relay operators from migrating to Arti
O5.3 Create a group of early adopter relay operators to test Arti: The path to a successful transition includes involving relay operators in a testing process so we can gather and integrate their feedback before running a public transition campaign. This activity includes the work to build a list and do outreach to engaged relay operators for the purposes of experimenting with Arti; working with these operators to spin up Arti relays; gathering their feedback about the process; and writing documentation for operators to set up or transition relays to Arti and publish that documentation on community.torproject.org.
O5.4 Run public campaign to migrate to Arti: In this Activity, we will create an evaluation framework for monitoring the transition; set up and run a campaign for encouraging relay operators to transition from C to Arti; help relay operators transition; and use the evaluation framework to monitor the success of the campaign over time and adjust approach as necessary. To do so, we will:
• Create an evaluation framework for public campaign
◦ Investigate the state of the network at the start of O5.4 to better understand how to effectively compare C and Arti implementations. For example, we will research questions like:
▪ How many C-Tor relay groups do we have with more than one relay per IP address?
▪ How many relay operators do we have and how many of them have contact information?
▪ How many C-Tor relays can a single Arti replace per IP address?
◦ Based on the investigation above, define metrics to use in the evaluation framework, set baseline measurements for these metrics, and set targets for these metrics to use in the campaign evaluation framework. (If any of these metrics are also Indicators in the M&E plan, we will also update the M&E plan at this time.) Metrics for the evaluation framework, for example, may include:
▪ Number of relay families/operators transitioned from C to Arti over X days
▪ Number of bridge operators transitioned from C to Arti over X days
▪ Number of help requests received and addressed over X days
▪ Number of bug reports received and addressed over X days
▪ Amount of bandwidth provided by Arti relays
◦ Create a process to modify the campaign plan based on different outcomes of the midterm evaluation.
• Run public campaign
◦ Create and utilize a feedback loop such that feedback from relay operators is delivered to developers to triage and address
◦ Update relay operator documentation on the Tor community portal
◦ Utilize social media, the tor-relays@ mailing list, the Tor Forum, regular relay operator meetups, and other relevant channels to:
▪ Encourage relay operators to upgrade
▪ Solicit feedback from relay operators about the upgrade experience
▪ Respond to help requests across support channels
▪ Provide public updates about the progress of the transition
◦ Hold a workshop for relay operators about how to migrate their relays to Arti
• Monitor success of public campaign
◦ At campaign midterm, utilize evaluation criteria to measure success of the campaign so far
◦ If midterm evaluation reveals that we are not reaching targets, utilize the process previously created to modify the campaign plan
◦ At the end of the campaign, evaluate transition against target numbers specified in the M&E plan
Objective 6: Promote and encourage information sharing and collaboration between Internet Freedom Program Implementers
To achieve this objective we will work with DRL to host a DRL implementers meeting in March of 2024.
O6.1 Convene a meeting of DRL Internet Freedom program implementers to encourage and promote information sharing and peer-to-peer learning. In this activity we will research and determine a location for the March 2024 Implementers Meeting and submit the location to DRL for approval. We will obtain quotes for various venues, staffing, catering, and other associated costs, submit a detailed budget for GO approval, and then host the DRL Implementers Meeting.