Year 1: 1 December 2010 - 30 November 2011.
Phase 1 = 1st six months of project year 1.
Phase 2 = 2nd six months of project year 1.
Tasks without dates are assumed to be due end of phase.
Category A (highest priority): aka must do
- Support camouflaged transport
- proposal for modular transport spec (before Dan's class starts; approx. 3/15). Roger, Nick, Tom. (#2758); done: Modular transports are a way to decouple protocol-level obfuscation from the core Tor protocol in order to better resist client-bridge censorship. Our approach is to specify a means to add pluggable transport implementations to Tor clients and bridges so that they can negotiate a superencipherment for the Tor protocol. We described the necessary changes to Tor in proposal 180.
- proof-of-concept transport plugins (for team use; not required for use by user in country): Nick, Tom, Steven
- http headers (#2759); done: We built a socks proxy that will stick the Tor transport in http headers. It isn't designed to fool a human looking at the traffic, but it seems to fool wireshark. We have a proof of concept, it comes with a screenshot, and it's even been used in-country (which is arguably a flaw, not a feature, but it did happen).
- superencryption (#2760); done: We built obfsproxy which is a protocol obfuscation layer for TCP protocols. obfsproxy does not provide authentication or data integrity and does not hide data lengths. It is more suitable for providing a layer of obfuscation for an existing authenticated protocol, like SSH or TLS.
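The obfsproxy description above can be illustrated with a toy sketch. This is not obfsproxy's actual protocol. The shared seed, the SHA-256 counter keystream, and the function names are all assumptions made for illustration, and the whole stream is treated as one buffer. The sketch only shows why a pure obfuscation layer of this kind offers no integrity and does not hide lengths.

```python
import hashlib


def keystream(seed: bytes, length: int) -> bytes:
    """Hypothetical keystream: SHA-256 over (seed || 8-byte counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def obfuscate(seed: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    ks = keystream(seed, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


deobfuscate = obfuscate  # XOR is its own inverse


if __name__ == "__main__":
    seed = b"shared-seed"  # hypothetical; a real layer would negotiate keys
    record = b"\x16\x03\x01 TLS-looking bytes"
    wire = obfuscate(seed, record)
    assert deobfuscate(seed, wire) == record
    # Length is preserved: an observer still sees record sizes.
    assert len(wire) == len(record)
    # No integrity: a flipped bit on the wire silently flips in the output.
    tampered = bytes([wire[0] ^ 0x01]) + wire[1:]
    assert deobfuscate(seed, tampered)[0] == record[0] ^ 0x01
```

Because a tampered byte decodes to a different byte without any error, a layer like this belongs underneath an already-authenticated protocol such as TLS or SSH.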
- Hooks / Support for Rendezvous SponsorF
(Needs better definition before Tor would know what work, if any, needs to be done.)
- Metrics Karsten, Tom.
- problem statement for DNS/ftp reflectors to measure bridge reachability (#2531); done: Roger talked to Dan's student about the topic, and the student did a project on it. I believe not much came of the project. We've folded the idea into a broader blog post about measuring bridge reachability.
- problem statement for entropy-of-network analysis (#2530); done: We need a better understanding of how much anonymity the Tor network provides against a partial network adversary who observes and/or operates some of the network. Specifically, we want to understand the chances that this class of adversary can observe a user's traffic coming into the network and also the corresponding traffic exiting the network. We wrote a blog post describing the research problem of measuring the safety of the Tor network.
- get bwauth and torperf data up on metrics.tp.o (#2394, #2534); done: Torperf measures Tor's performance by downloading files of various sizes and noting how long substeps take. The Torperf performance data are available on the metrics website. The bandwidth authority scanners measure the bandwidth of relays in the Tor network to adjust the relays' self-advertised bandwidth values. The measurement results that the directory authorities include in their votes are available either for a single consensus, e.g., the votes published on November 1, 2011, 00:00 UTC (7.4M), or in the v3 vote tarballs of the descriptor archives.
- put bridge pool assignments on metrics.tp.o (#2537); done: BridgeDB gives out bridge addresses to censored users via email or http and reserves a few addresses for giving them out manually. We now publish archives containing the sanitized assignments of bridges to distribution mechanisms, i.e., which bridges are given out by which mechanism.
- analysis of passive bridge reachability data; done: Bridges count how many users connect to them per day and by country and publish these aggregated statistics. The research question here was whether we can learn that a bridge is blocked only by looking at these statistics. We performed a case study that looks at Chinese bridge users in the first half of 2010. The result is that these statistics will be useful in the future, but that we need other data to confirm that a bridge was actually blocked.
- analyze bridge churn (#2794); done: Bridges, like relays, vary widely in their uptime characteristics. We analyzed bridge churn by investigating how BridgeDB could measure bridge stability over time in order to give out at least one stable bridge per user. We suggest implementing the bridge stability metric described in the analysis in BridgeDB and making it configurable so that the requirement parameters can be tweaked if needed.
- write a censorship detector (similar to the consensus health checker) that analyses bridge stats as soon as they come in and sends early warnings about country-wide blocking events Tom, Nick? (#2718); done: The aggregated statistics that relays and bridges report can be used to detect country-wide blocking. We started building an anomaly-based censorship-detection system for Tor that uses relay statistics to warn about possible censorship events. Possible censorship events can be visualized on the metrics website. The detection algorithm still requires some fine-tuning. The source code of the current censorship detector is available. The next steps will be to refine the detection algorithm and implement it in a notification service.
- a "comparison of datagram tor designs" analysis (May 1). Steven and others (#1855); done: Joel Reardon identified in 2008 that the major cause of latency in the Tor network is delay in the output queue at Tor nodes, resulting from TCP flow control. Since then, a couple of Tor datagram designs have been proposed. We now have a Comparison of Tor Datagram Designs.
- microdescriptor roll-out for client-side (May 1) Nick (#1748); done: Microdescriptors are a feature designed to greatly reduce the amount of data that needs to be transmitted to implement the Tor directory protocol. Proposal 158 and proposal 162 contain the design details. Microdescriptors have been enabled by default for clients since Tor 0.2.3.3-alpha.
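The censorship-detector item in the Metrics list does not spell out its detection algorithm, so the following is an assumption, not the deployed detector. One way such a check can work: compare each country's daily user estimate with its value a week earlier, model the "normal" week-over-week ratio from the other countries, and flag countries whose ratio falls outside a z-score band. The seven-day lag, the leave-one-out model, the `z` threshold, and `flag_anomalies` are all hypothetical.

```python
import statistics


def flag_anomalies(today, week_ago, z=2.58):
    """Hypothetical anomaly check on per-country daily user estimates.

    today, week_ago: dict mapping country code -> estimated daily users.
    Returns {country: "down" | "up"} for countries whose week-over-week
    ratio falls outside the band predicted by the other countries.
    """
    ratios = {
        cc: today[cc] / week_ago[cc]
        for cc in today
        if cc in week_ago and week_ago[cc] > 0
    }
    flags = {}
    for cc, ratio in ratios.items():
        # Model "normal" change from all other countries (leave-one-out),
        # so a country's own drop does not widen its expected range.
        others = [r for other, r in ratios.items() if other != cc]
        if len(others) < 2:
            continue  # stdev needs at least two samples
        mean = statistics.mean(others)
        spread = statistics.stdev(others)
        if ratio < mean - z * spread:
            flags[cc] = "down"  # possible blocking event
        elif ratio > mean + z * spread:
            flags[cc] = "up"  # unusual growth, also worth a look
    return flags
```

With made-up numbers where one country drops from 1,200 to 300 users while four others stay within a few percent, only that country is flagged "down". Real relay and bridge statistics are noisier, which is consistent with the item's note that the algorithm still needs fine-tuning.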
- Bandwidth authority improvements Mike, Aagbsn, Tomb?
- Better tracking for measurement feedback (#1976, attempt by June 1; succeed by Nov. 1); done: The bandwidth authority scanners measure relay bandwidth by downloading files through two-hop paths. We developed a new feedback mechanism that transforms this measurement process into a PID (proportional–integral–derivative) control loop, designed to minimize the error between each relay's measured bandwidth and the network average. The system may need tuning, and/or may still drive the network into unexpected failure modes, such as socket overload or CPU overload on relays, as it tries to load balance and converge on minimum error. The PID algorithm is enabled on four out of five bandwidth scanners.
- Upgrade to new SQLAlchemy release (#2391, June 30); done: The bandwidth scanners ran into some errors when using older SQL libraries. Upgrading to newer versions required some API migration. The most recent bandwidth scanner version supports the latest SQL libraries.
- Help Aagbsn understand system, write spec (#2861, June 30); done: The bandwidth scanners measure the bandwidth of relays in the Tor network to adjust the relays' self-advertised bandwidth values. We now have a Bandwidth Scanner specification explaining how the scanners are measuring relay bandwidth and aggregating scanner results.
- Set up a 5th scanner (June 30); done: A subset of the eight current directory authorities each run a bandwidth scanner. At least three directory authorities need to include bandwidth scanner results in their votes for those results to be used in the consensus. We now have five bandwidth scanners set up, so that the consensus still gets the required three votes if up to two of them fail.
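The PID feedback item above can be illustrated with a minimal sketch of one controller step per relay. The gains, the error defined as a fraction of the network average, the state fields, and `pid_update` are assumptions for illustration, not the bandwidth scanners' actual code or parameters.

```python
from dataclasses import dataclass


@dataclass
class PidState:
    """Per-relay controller state (hypothetical field names)."""
    integral: float = 0.0
    prev_error: float = 0.0


def pid_update(state, measured, network_avg, advertised,
               kp=1.0, ki=0.1, kd=0.5):
    """One feedback step: returns the bandwidth value to vote for.

    The error is the relay's measured bandwidth relative to the network
    average, so relays that measure faster than average get more weight
    and slower ones get less. Gains are illustrative only.
    """
    if network_avg <= 0:
        raise ValueError("network average must be positive")
    error = (measured - network_avg) / network_avg
    state.integral += error                   # accumulates persistent offset
    derivative = error - state.prev_error     # reacts to sudden changes
    state.prev_error = error
    adjustment = kp * error + ki * state.integral + kd * derivative
    # Never vote a negative bandwidth.
    return max(0.0, advertised * (1.0 + adjustment))
```

For a relay measured at 120 against a network average of 100 and advertised at 1000, the first step votes about 1320 and a second identical measurement about 1240: the integral term keeps pushing while the error persists, while the derivative term only contributes when the error changes. The clamp at zero avoids nonsensical votes but does nothing about the socket or CPU overload the item warns about.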
- a "why else is tor slow" draft (Nov. 1) Roger; deferred: It's too early for a comprehensive document here. Instead we need to flesh out all the various suggestions for performance improvements (research, design, implementation) and categorize and prioritize them. That's happening on the performance roadmap, and will be a major priority during the Year 2 work.
- analysis for increasing the set of guards (Nov 1); done: We fleshed out the questions in this blog post. The next steps are a) to interest some professors and grad students in answering the questions, and b) to guess some better parameters and put them in place while we wait for the research results.
- meetings with sponsor F, tor talk for Stanford, interact with ISI Roger, Mike (#3034); done: Roger met with ISI in March 2011 and with sponsor F in April and periodically thereafter, taught a lecture for Dan's class, and showed up for the presentations day at the end of the class.
- write up research task summaries, and bibliography recommendations, for Dan's class Roger (#2535); done: See the ticket for summaries and recommendations.
- interact with sponsor F; done: Roger participated in a sponsor F meeting in August, called in to another sponsor F meeting, met with the STORM researchers periodically, and presented at both PI meetings.
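For the guard-set analysis item, a back-of-the-envelope model shows one side of the trade-off. Assume (hypothetically) that each guard pick independently lands on an adversary-run guard with probability f, its share of guard bandwidth, and that each rotation draws fresh guards. Then the chance a client has ever used a bad guard is 1 - (1 - f)^(n*r) for n guards and r rotations. The function name and model are illustrative assumptions.

```python
def p_any_bad_guard(f, n_guards, rotations=1):
    """Chance a client has used at least one adversary-run guard.

    Simplifying assumptions (hypothetical model): each guard pick
    independently hits the adversary with probability f, and every
    rotation draws n_guards fresh guards with no overlap.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a probability")
    picks = n_guards * rotations
    return 1.0 - (1.0 - f) ** picks
```

With f = 0.05, three guards give about a 14% chance (1 - 0.95^3 = 0.142625) versus 5% for a single guard, and more rotations push the number higher. The model ignores load balancing, guard availability, and overlap between rotations.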
Category B (lower priority): aka shall do
- Bridge user/operator usability
- analyze security tradeoffs from using a socks proxy vs a bridge (#2764); done: Looks like a fine idea. The downsides are primarily in terms of convenience (collecting statistics, having a fingerprint so the user can follow the proxy if it changes addresses, etc). The discussion also led to #4624, where we suggest a config option to let the user specify if she's configuring bridges for the primary purpose of reachability or the primary purpose of security. See also this blog post. Next steps are that we should move forward with projects like #3466.
- libevent2 working for windows bundles? (June 1) Erinn (#2007, others?); done: There are three major components necessary to limit the use of kernel socket buffers on Windows and thereby (presumably) fix bug 98. The first part was to port Tor's original networking backend to use the Libevent2 "bufferevent" implementation, to "make Tor work with libevent2 on windows." This is now complete, and we are continuing to shake out tricky bugs in it. The second part is to enable the IOCP implementation in Libevent 2 and debug it as needed. This is implemented and under testing: basic network functionality works in initial testing, but some tricky bugs remain. The final part is to switch the IOCP implementation in Libevent 2 to store all of its buffered network data in the program's memory rather than in the kernel's memory. We have made the switch and are testing it now. Only the last phases of testing and debugging remain. However, this code has historically been difficult to debug, and the bugs have been difficult to reproduce, due in part to the diversity of network environments, Windows versions, and configurations, and to the complexity of the code. We now have all the major components of this solution implemented, have solved all the bugs we understand, and will continue to hunt for more bugs to fix.
- bridge-by-default bundles (June 1) Erinn (#1538); done: The Vidalia Bridge Bundle for Windows is configured to be a bridge by default.
- assess options for preventing relays from enumerating bridges. Analyze security of bridge users using guard-flagged relays for their middle hop, or of bridges forcing an entry guard as an extra hop. done: Adversaries are able to enumerate bridges by running a relay and looking for relay-like connections from non-relays. A likely defense is to make bridges redirect their clients' connections through guard nodes, so that no single relay can compromise more than a small number of bridges. There is an initial design draft, based on a session from our July developers' meeting, in proposal 188. We think that this will prevent or mitigate some of the more urgent enumeration attacks. A bigger picture overview of the issues can be found in this blog post.
- bridgedb spec (#1606); done: BridgeDB processes bridge descriptor files to learn about new bridges, maintains persistent assignments of bridges to distributors, and decides which bridges to give out upon user requests. The new BridgeDB specification specifies how BridgeDB works internally. Some of the decisions in BridgeDB may be suboptimal. The specification document is meant to specify current behavior as of April 2011, not to specify ideal behavior.
- analyze FC paper proposing adaptive bridge address distribution. done: There are five building blocks involved here: 1) A way to discover how much use a bridge is seeing from a given country: see the WECSR10 paper and usage graphs. 2) A way to get fresh bridge addresses over time: see this blog post. 3) A way to discover when a bridge is blocked in a given country: see this blog post; the bottom of this blog post also gives an overview of the Proximax idea. 4) Distribution strategies that rely on different mechanisms to make enumeration difficult. 5) Make defeating the distribution strategies the best way to identify or block bridges, which means solving most of the items in this blog post.
- bridge users can remember bridges across restarts; done: We assume that there are at least two different types of bridge users: those who just need any way to reach the Tor network and those who are trying to hide the fact that they're using Tor. We wrote and deployed a patch for the bridge authority to count bridge descriptor fetches in a 24-hour interval. The collected data are the total number of descriptor fetches, the number of unique descriptors fetched, and a few percentiles of descriptor fetches per bridge. We also wrote a blog post examining different bridge usage scenarios, partially based on the collected data, and a Tor proposal to store more information about a bridge across restarts than is currently possible.
- tor integrates upnp / pmp for port forwarding without Vidalia? (tor-fw-helper, incl #1983); done: Tor can now negotiate with UPnP/NAT-PMP-compliant routers to automatically configure port forwarding on routers that need it because of NAT and/or firewall configuration. Both UPnP and NAT-PMP are operational and have been tested under Windows, *nix, and Mac OS X.
- bridges on ipv6; done: Enabling Tor clients to connect to bridges via IPv6 requires a few substeps. We started by writing proposal 186, which enables relays and bridges to advertise more than one address. We implemented the parts of proposal 186 that apply to clients and private bridges (i.e., no bridge authority involvement). We also finished rewriting the Tor bridge code to accept connections on an IPv6 address, as well as the Tor client code to connect to a private bridge on an IPv6 address. This lets a bridge have two addresses, one IPv4 and one IPv6, and advertise both, giving full functionality for private bridges, albeit with the configuration file as the only way of configuring the client (i.e., no BridgeDB or Vidalia support).
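The descriptor-fetch statistics in the "bridge users can remember bridges" item (total fetches, unique descriptors, and per-bridge percentiles over a 24-hour interval) can be summarized with a sketch like the one below. The input format (one descriptor identifier per fetch, assuming one descriptor per bridge in the interval), the nearest-rank percentile definition, the chosen percentiles, and `fetch_stats` are assumptions; the deployed bridge-authority patch may compute them differently.

```python
import math
from collections import Counter


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: smallest value with >= pct% of values at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def fetch_stats(fetched_ids, percentiles=(25, 50, 75, 90)):
    """Summarize one 24-hour interval of bridge descriptor fetches.

    fetched_ids: one entry per fetch, naming the descriptor requested
    (hypothetical input; assumes one descriptor per bridge per interval).
    """
    per_bridge = Counter(fetched_ids)       # fetches per descriptor
    counts = sorted(per_bridge.values())    # ascending, for percentiles
    pcts = {p: nearest_rank(counts, p) for p in percentiles} if counts else {}
    return {
        "total_fetches": len(fetched_ids),
        "unique_descriptors": len(per_bridge),
        "per_bridge_percentiles": pcts,
    }
```

For four descriptors requested 5, 1, 3, and 10 times, this reports 19 total fetches, 4 unique descriptors, and percentiles {25: 1, 50: 3, 75: 5, 90: 10}. Reporting percentiles rather than per-bridge counts keeps the published data aggregated.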
Category C (lowest priority): aka should do
- Cleaning up better (only if funded or volunteers found)
- browser-level, solvable by torbutton
- OS-level: provide sponsor F guidance on bundle usage
- torbutton, chrome/mobile Mike, Nathan?
- Attempt Firefox Mobile release of Torbutton (#1506); done: We attempted to make a Firefox Mobile port of Torbutton, but failed because of the multi-process APIs that Mozilla decided to debut on mobile. The new multi-process APIs radically alter the observer behavior and event delivery reliability, and do not appear to be as complete as their non-mobile equivalents. We are currently not planning to make another attempt.
- Stable release of Torbutton (#3071); done: Torbutton 1.4 was released on July 2, 2011.
- Continue to keep in touch with Chrome folks (#1770, #1816, #1925, #2956, #3072); done: It would be great if we could create a Chrome extension to upgrade Google Chrome's Incognito Mode into a full privacy mode that protects against all network adversaries, giving Chrome users real privacy-by-design if and when they want it. We deployed a version of HTTPS-Everywhere for Chrome that now has the same security properties as the Firefox version. The addon uses a subset of the APIs we need for Tor mode. We also created a list of Important Google Chrome Bugs, and the Chrome developers created a ticket to keep track of these bugs. The next step will be to discuss the Tor Browser design document with the Chrome developers.
- Fork Firefox 4 for Tor Browser Bundle; done: We needed to create and maintain a series of patches for Firefox 4 for use in our Tor Browser Bundles. The latest Tor Browser Bundle contains a patched Firefox 7.0.1.
- CSS Font fixes; done: According to the Panopticlick data set, the font list (which they obtained through plugins) was the second most identifiable chunk of data they saw, behind plugins themselves. We block plugins, but fonts are still available through CSS. We created ticket #2872 to track progress on this task and completed it in December 2011.
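The Panopticlick finding in the CSS font item is expressed in bits of identifying information: a value shared by a fraction p of browsers reveals -log2(p) bits, and the average over all browsers is the attribute's entropy. A minimal sketch follows; representing each browser's font list as a hashable value and the function names are assumptions.

```python
import math
from collections import Counter


def surprisal_bits(observed, value):
    """Bits of identifying information a value reveals: -log2(share)."""
    counts = Counter(observed)
    if counts[value] == 0:
        raise ValueError("value never observed")
    return -math.log2(counts[value] / len(observed))


def entropy_bits(observed):
    """Average surprisal (Shannon entropy) of an attribute, in bits."""
    counts = Counter(observed)
    n = len(observed)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

For eight browsers where four share font list A, two share B, and C and D are unique, A reveals 1 bit, B 2 bits, each unique list 3 bits, and the attribute's entropy is 1.75 bits. Any defense that makes Tor Browser users' font lists look alike drives these numbers toward zero.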
- Cloud experiments Runa
- perform experiments on cloud providers, e.g., using the change-my-IP interface (does it work? what does it cost? other interesting issues) (#3033); done: The Tor Cloud on Amazon EC2 is a user-friendly way of setting up Tor bridges on the Amazon EC2 cloud computing platform. Users are able to launch their own virtual machines and computing resources with flexible and cost-effective terms. We also looked into changing the IP address of a cloud image (#3606) and found that users can get a new IP address by stopping and starting their EC2 instance.
- Tor maintenance
- put out a tor 0.2.2.x stable release Nick, Roger (#3032); done: Tor 0.2.2.32, the first stable release in the 0.2.2 branch, was released on August 29, 2011.
- eliminate tls renegotiation Nick, Roger; done: Tor 0.2.3.6 is the first version to contain a new handshake protocol (v3) for authenticating Tors to each other over TLS. The v3 protocol should allow us to become more resistant to fingerprinting than previous protocols, and should require less TLS hacking for future Tor implementations. It implements proposal 176. The v3 protocol is used between any two Tors that are both running Tor version 0.2.3.6 or later. There are still bugs in the v3 handshake code; the most significant of these are fixed in Tor 0.2.3.7.
- proposal to support alternate crypto ops backward-compatibly Nick, Roger, Tom; done: Tor's original choice of cryptographic algorithms (SHA1 for a digest function, AES128 for encryption, DH1024 for key agreement, and RSA1024 for authentication) has remained unchanged since the first widely used Tor releases in 2003-2004. However, some of these algorithms and key sizes are showing their age, as are some of the ways that our protocols put them together. Nick wrote a draft proposal discussing which algorithms we might want to use in the next year or two, how we might want to use them, and how to migrate to new algorithms and protocols without breaking the existing Tor network. This proposal is getting extensive discussion from the community and will likely spawn more drafts.