This document describes the formal criteria for evaluating Pluggable Transports for integration and deployment as part of the various Tor Project software packages.
The evaluation criteria are divided into three components, “Review Coverage and Reviewability Evaluation”, “Design Evaluation” and “Implementation Evaluation”, each of which covers different but related considerations examined when reviewing a given design.
== 1. Review Coverage and Reviewability Evaluation ==
The “Review Coverage and Reviewability Evaluation” portion of the evaluation criteria examines how easy it is to review the design for deployment. These criteria establish whether further examination is possible, and whether there is an actual software component to deploy if the design is accepted.
=== 1.1 Is the software published, and is it entirely free / open source software? ===
This criterion seeks to establish whether the given software exists, and whether it can be deployed to users on a large scale without encountering licensing problems. All Pluggable Transports that will be deployed to users must pass this criterion, and most will without any special considerations.
However, some designs call for non-free (and thus non-distributable) components to implement part of their functionality (e.g. Skype, or a copy of Windows in a VM image). It is important to note that "non-free" in this context includes software that is "Gratis" but not "Libre", as distributing binaries built by third parties is still fraught with licensing issues and affects end-user system security.
=== 1.2 How well documented is the design? ===
This criterion seeks to establish the difficulty of reviewing the design of a given Pluggable Transport by examining the amount of formal documentation that is available. Most if not all Pluggable Transports that wish to be considered for deployment should have a formal design document, a threat model, and a formal specification. These documents are essential for obtaining a high-level understanding of how a given Pluggable Transport is supposed to work.[[BR]][[BR]]Additionally, when the existing documentation is examined, attention should be given to the testability of the security and unblockability claims made by the authors.
=== 1.3 How much existing review has been done? Is the project active? ===
This criterion examines how much peer review the design has already received, for example in papers by the academic community. Additionally, it examines the state of the project surrounding the design, taking into account how much continued attention and development the design currently receives from its inventors and implementers.
=== 1.4 What is the design's deployment history? ===
This criterion examines a given Pluggable Transport's deployment history. Deployment history is more useful when evaluating established designs than brand new ones. Where a deployment history exists, factors such as the type and number of users the design received, the amount of publicity associated with prior deployments, and the results of each deployment are examined.
== 2. Design Evaluation ==
The “Design Evaluation” portion of the evaluation criteria examines the Pluggable Transport design itself to categorize the characteristics, capabilities, features, security, and drawbacks of the design in a systematic manner.
=== 2.1 How difficult or expensive will it be to block the design? ===
This criterion examines how difficult we believe it will be for various adversaries to block a given Pluggable Transport.[[BR]][[BR]]Factors to be considered here include, but are not limited to, the external services or protocols a given design relies on being available, how much collateral damage would be required to censor the protocol, how much development time an adversary must invest to block the protocol (e.g. writing DPI rules), and in what fraction of censoring countries a given design is expected to function.[[BR]][[BR]]It is worth noting that not all Pluggable Transports are oriented around censorship circumvention, and some are used as test-beds for research in other related areas (e.g. obfsproxy-wfpadtools is a Pluggable Transport used by researchers to investigate website fingerprinting defenses, but provides no extra anonymity or blocking resistance).
=== 2.2 What impact on anonymity does the design have, if any? ===
This criterion examines how a given Pluggable Transport design impacts the user's anonymity. While many Pluggable Transports focus solely on censorship circumvention in the form of reachability, some designs attempt to increase the user's anonymity, for example by adding padding to defend against website fingerprinting attacks. On the opposite end of the spectrum, certain transports reduce the user's anonymity as a consequence of the design. While such drawbacks are not deal-breaking for deployment, they must be carefully considered.
=== 2.3 What is the design's overhead in terms of computational costs and bandwidth? ===
This criterion examines how expensive a given Pluggable Transport design is for the user and the network entry point in terms of computational resources such as CPU, memory, and network bandwidth. As network entry points (Bridges in Tor terminology) are administered by volunteers, having a good idea of how much additional CPU, memory, and network bandwidth is required from operators is an important deployment consideration.[[BR]][[BR]]Examining these factors also helps to establish whether there are any limitations on the environments in which a given design can be deployed. For example, a transport that massively inflates the amount of data sent and received over the network due to heavy use of steganography may not be suited to low-bandwidth environments, regardless of the additional blocking resistance obtained by such trade-offs.[[BR]][[BR]]Similar considerations apply to CPU cycles, where the cost of additional layers of cryptography, for example, imposes a minimum set of required hardware characteristics to provide adequate end-user performance.
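The bandwidth consideration above can be made concrete with a small calculation. The following is a minimal sketch, not part of the formal criteria: the function names and the example figures are hypothetical and chosen only for illustration.

```python
def bandwidth_overhead(payload_bytes: int, wire_bytes: int) -> float:
    """Return the extra bytes sent on the wire as a fraction of the payload.

    payload_bytes: bytes of Tor traffic the user actually wanted to move.
    wire_bytes: bytes the transport put on the network to carry that payload.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return (wire_bytes - payload_bytes) / payload_bytes


def required_link_rate(target_payload_rate: float, overhead: float) -> float:
    """Return the link rate needed to sustain a payload rate at a given overhead."""
    return target_payload_rate * (1.0 + overhead)


# Hypothetical example: a transport that sends 4000 bytes for every
# 1000 bytes of payload has an overhead of 3.0 (that is, 300%).
overhead = bandwidth_overhead(1000, 4000)

# Sustaining 100 kB/s of payload through such a transport would need a
# link that can carry 400 kB/s.
needed = required_link_rate(100.0, overhead)
```

Measuring `wire_bytes` for a real transport requires capturing traffic on both ends; the arithmetic is only useful once those measurements exist.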
=== 2.4 How resilient is the design to active probing? ===
This criterion examines how well a given design fares against active probing attacks similar to those carried out by the Chinese state firewall (obtaining a list of “interesting” IP addresses / ports by passively monitoring traffic, and following up with active connections later to determine what services and protocols are actually being offered). While this criterion partially falls under 2.1, it is an attack that is explicitly worth considering, as existing censors in the real world have demonstrated the capability and willingness to mount such attacks.
== 3. Implementation Evaluation ==
The “Implementation Evaluation” portion of the criteria evaluates the concrete software implementation of a given Pluggable Transport design to see whether it is suitable for deployment and, if not, how much work would be required to bring the software to a deployable state.
=== 3.1 Does the design use the Tor Project's Pluggable Transport Application Programming Interface (API) already? ===
The Tor Project has a standard recommended approach for integrating Pluggable Transport implementations so that each transport can be invoked and managed in a modular fashion by the Tor process. The API additionally allows Tor and the associated back-end infrastructure to handle things such as the publication and distribution of endpoints to users, and the collection of user and usage statistics for a given transport.[[BR]][[BR]]As the interface is well specified and libraries exist for multiple languages, all Pluggable Transport designs should support this API before being considered for further evaluation.
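To give a feel for what supporting the API involves, the following is a minimal sketch of the client-side startup exchange for a managed transport, in which Tor passes configuration through environment variables and the transport answers with status lines on standard output. The environment variable names and line formats follow the Pluggable Transport specification as we understand it and should be checked against the current specification; the transport name `example`, the function name, and the listener address are hypothetical.

```python
SUPPORTED_PT_VERSION = "1"
TRANSPORT_NAME = "example"  # hypothetical transport name, for illustration only


def client_handshake_lines(env, listen_addr="127.0.0.1:40000"):
    """Return the status lines a client-side transport would report to Tor.

    env: a mapping holding the environment Tor launched the transport with.
    listen_addr: address of the local SOCKS listener (assumed already bound).
    """
    versions = env.get("TOR_PT_MANAGED_TRANSPORT_VER")
    if versions is None:
        return ["ENV-ERROR no TOR_PT_MANAGED_TRANSPORT_VER environment variable"]
    if SUPPORTED_PT_VERSION not in versions.split(","):
        return ["VERSION-ERROR no-version"]

    lines = ["VERSION " + SUPPORTED_PT_VERSION]

    requested = env.get("TOR_PT_CLIENT_TRANSPORTS")
    if requested is None:
        lines.append("ENV-ERROR no TOR_PT_CLIENT_TRANSPORTS environment variable")
        return lines

    # Only announce a listener for the transport if Tor asked for it.
    if TRANSPORT_NAME in requested.split(","):
        lines.append("CMETHOD %s socks5 %s" % (TRANSPORT_NAME, listen_addr))

    lines.append("CMETHODS DONE")
    return lines
```

A real transport would pass `os.environ` as `env`, write each returned line to standard output, and flush after every line. In practice the existing Pluggable Transport libraries handle this exchange, so an implementation rarely needs to write it by hand.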
=== 3.2 Is the implementation cross-platform? How about mobile support? ===
The Tor Project officially supports Linux, OS X and Microsoft Windows as client platforms. Pluggable Transport implementations that aim to be useful for the most users should support all three platforms as clients at a bare minimum, though ideally a given implementation should also be capable of running as the “server” on each of the supported platforms as well.
Mobile support, if any, is also worth examining, as there have been recent efforts to increase the circumvention measures available to users on mobile devices. While the fundamental design and integration strategy on the currently supported mobile platform (Android) is not particularly different, there are unique mobile-related porting issues that arise and thus should be noted, especially if it is believed that a given Pluggable Transport design would be useful to a mobile user base.
=== 3.3 What is the implementation's build process like, how easy is it to deploy, and what is deployment scaling like? ===
This criterion examines how easy it is to deploy a given implementation, from the build integration and endpoint distribution perspectives.
The Tor Project uses a deterministic build system for client binaries for added user security. If a given implementation cannot be built deterministically, or depends on a large number of third-party components that require separate investigation and auditing, these factors will negatively affect the odds of widespread deployment.[[BR]][[BR]]On the endpoint distribution side, for most implementations to see success, there must be a sufficient number of volunteer-run entry points into the Tor network reachable via a given Pluggable Transport. If deployment by volunteer Bridge operators is excessively difficult, or requires massive amounts of Bridge-side resources, then this can negatively affect the uptake of a given Pluggable Transport.
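A first, rough check of whether an implementation builds deterministically is to build it twice from the same source and compare the outputs byte for byte. The following is a minimal sketch of that comparison using SHA-256 digests. It is not the Tor Project's build tooling, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first_artifact, second_artifact):
    """Return True if two build outputs are byte-for-byte identical."""
    return sha256_of(first_artifact) == sha256_of(second_artifact)
```

A mismatch only shows that something differs between the two builds. Finding the cause, such as embedded timestamps or build paths, requires inspecting the artifacts themselves.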
=== 3.4 How is the implementation's code from a security and maintainability perspective? ===
This criterion aims to evaluate a given implementation based on code quality. Providing secure, reliable and safe software is important, and while the underlying Tor protocol provides certain security properties such as encryption and authentication, Pluggable Transports can introduce new security risks or vulnerabilities if not designed and implemented correctly.[[BR]][[BR]]Additionally, code maintainability comes into play when integrating third-party components, as the Tor Project often ends up in the position of continuing to maintain and support the given code base once it is deployed. Well-documented implementations with unit tests and integration tests ease this burden and will be easier to deploy.
=== 3.5 How well-instrumented is the implementation in terms of collecting usage / performance / etc metrics? ===
This criterion aims to evaluate how easy it will be to obtain valuable metrics regarding how well a given design performs once it has been deployed. A large portion of this functionality is handled by the Extended OR Port protocol portion of the Pluggable Transports API, so most implementations should get this for free by virtue of being a Pluggable Transport API compliant module.[[BR]][[BR]]Certain designs introduce additional difficulties in gathering certain types of statistics (e.g. client information is missing because the data passes through an intermediary before reaching the Tor server instance).