revamp secs 8 and 9 (cb97bf8c) · Commits · ZerXes / Tor

doc/tor-design.tex

+139 −225

Original line number	Diff line number	Diff line
		@@ -1351,11 +1351,6 @@ acknowledge his existence.
		\Section{Attacks and Defenses}
		\label{sec:attacks}

		% XXX In sec9 we should talk about bandwidth classes, which will
		% enable us to accept a lot more ORs than if we continue to
		% require 10mbit connections for all ORs. -RD


		Below we summarize a variety of attacks, and discuss how well our
		design withstands them.

		@@ -1647,236 +1642,157 @@ by the session key shared by the client and server.
		\Section{Open Questions in Low-latency Anonymity}
		\label{sec:maintaining-anonymity}

		% There must be a better intro than this! -NM
		In addition to the open problems discussed in
		Section~\ref{subsec:non-goals}, many other questions remain to be
		solved by future research before we can be confident of our security.
		Section~\ref{subsec:non-goals}, many other questions must be solved
		before we can be confident of Tor's security.

		Many of these open issues are questions of balance. For example,
		how often should users rotate to fresh circuits? Too-frequent
		rotation is inefficient, expensive, and may lead to intersection attacks
		and predecessor attacks \cite{wright03},
		but too-infrequent rotation
		makes the user's traffic linkable. Along with opening a fresh
		circuit, clients can also limit linkability by exiting from a middle point
		of the circuit, or by truncating and re-extending the circuit; but
		more analysis is needed to determine the proper trade-off.

		A similar question surrounds timing of directory operations:
		how often should directories be updated? With too-infrequent
		updates clients receive an inaccurate picture of the network; with
		too-frequent updates the directory servers are overloaded.

		%do different exit policies at different exit nodes trash anonymity sets,
		%or not mess with them much?
		%% Why would they? By routing traffic to certain nodes preferentially?

		%[XXX Choosing paths and path lengths: I'm not writing this bit till
		% Arma's pathselection stuff is in. -NM]
		%Alice always chooses her path to contain at least
		%three nodes unrelated to herself and her destination, choosing the
		%number of nodes beyond the third from a geometric distribution;
		%explain why. -NM

		%%%% Roger said that he'd put a path selection paragraph into section
		%%%% 4 that would replace this.
		%
		%I probably should have noted that this means loops will be on at least
		%five hop routes, which should be rare given the distribution. I'm
		%realizing that this is reproducing some of the thought that led to a
		%default of five hops in the original onion routing design. There were
		%some different assumptions, which I won't spell out now. Note that
		%enclave level protections really change these assumptions. If most
		%circuits are just two hops, then just a single link observer will be
		%able to tell that two enclaves are communicating with high probability.
		%So, it would seem that enclaves should have a four node minimum circuit
		%to prevent trivial circuit insider identification of the whole circuit,
		%and three hop minimum for circuits from an enclave to some nonclave
		%responder. But then... we would have to make everyone obey these rules
		%or a node that through timing inferred it was on a four hop circuit
		%would know that it was probably carrying enclave to enclave traffic.
		%Which... if there were even a moderate number of bad nodes in the
		%network would make it advantageous to break the connection to conduct
		%a reformation intersection attack. Ahhh! I gotta stop thinking
		%about this and work on the paper some before the family wakes up.
		%On Sat, Oct 25, 2003 at 06:57:12AM -0400, Paul Syverson wrote:
		%> Which... if there were even a moderate number of bad nodes in the
		%> network would make it advantageous to break the connection to conduct
		%> a reformation intersection attack. Ahhh! I gotta stop thinking
		%> about this and work on the paper some before the family wakes up.
		%This is the sort of issue that should go in the 'maintaining anonymity
		%with tor' section towards the end. :)
		%Email from between roger and me to beginning of section above. Fix and move.
		how often should users rotate to fresh circuits? Frequent rotation
		is inefficient, expensive, and may lead to intersection attacks and
		predecessor attacks \cite{wright03}, but infrequent rotation makes the
		user's traffic linkable. Along with opening a fresh circuit, clients can
		also limit linkability by exiting from a middle point of the circuit,
		or by truncating and re-extending the circuit; but more analysis is
		needed to determine the proper trade-off.

		A similar question surrounds timing of directory operations: how often
		should directories be updated? Clients that update infrequently receive
		an inaccurate picture of the network, but frequent updates can overload
		the directory servers. More generally, we must find more
		decentralized yet practical ways to distribute up-to-date snapshots of
		network status without introducing new attacks.

		How should we choose path lengths? If she uses only two hops, then both
		these nodes are certain that by colluding they will learn about Alice
		and Bob. Our current approach is that Alice always chooses at least three
		nodes unrelated to herself and her destination. Thus normally she chooses
		three nodes, but if she is running an OR and her destination is on an OR,
		she uses five. Should Alice choose a nondeterministic path length (say,
		increasing it from a geometric distribution), to foil an attacker who
		uses timing to learn that he is the fifth hop and thus concludes that
		both Alice and the responder are on ORs?

		Throughout this paper, we have assumed that end-to-end traffic
		confirmation will immediately and automatically defeat a low-latency
		anonymity system. Even high-latency anonymity
		systems can be vulnerable to end-to-end traffic confirmation, if the
		traffic volumes are high enough, and if users' habits are sufficiently
		distinct \cite{limits-open,statistical-disclosure}. \emph{Can
		anything be done to make low-latency systems resist these attacks as
		well as high-latency systems?}
		Tor already makes some effort to conceal the starts and
		ends of streams by wrapping all long-range control commands in
		identical-looking relay cells, but more analysis is needed. Link
		padding could frustrate passive observers who count packets; long-range
		padding could work against observers who own the first hop in a
		circuit. But more research needs to be done in order to find an
		efficient and practical approach. Volunteers prefer not to run
		constant-bandwidth padding; but more sophisticated traffic shaping
		approaches remain somewhat unanalyzed.
		%[XXX is this so?]
		Recent work
		on long-range padding \cite{defensive-dropping} shows promise. One
		could also try to reduce correlation in packet timing by batching and
		re-ordering packets, but it is unclear whether this could improve
		anonymity without introducing so much latency as to render the
		anonymity system. Even high-latency anonymity systems can be
		vulnerable to end-to-end traffic confirmation, if the traffic volumes
		are high enough, and if users' habits are sufficiently distinct
		\cite{limits-open,statistical-disclosure}. Can anything be done to
		make low-latency systems resist these attacks as well as high-latency
		systems? Tor already makes some effort to conceal the starts and ends of
		streams by wrapping all long-range control commands in identical-looking
		relay cells. Link padding could frustrate passive observers who count
		packets; long-range padding could work against observers who own the
		first hop in a circuit. But more research remains to find an efficient
		and practical approach. Volunteers prefer not to run constant-bandwidth
		padding; but no convincing traffic shaping approach has ever been
		specified. Recent work on long-range padding \cite{defensive-dropping}
		shows promise. One could also try to reduce correlation in packet timing
		by batching and re-ordering packets, but it is unclear whether this could
		improve anonymity without introducing so much latency as to render the
		network unusable.

		Even if passive timing attacks were wholly solved, active timing
		attacks would remain. \emph{What can
		be done to address attackers who can introduce timing patterns into
		a user's traffic?} % [XXX mention likely approaches]

		%%% I think we cover this by framing the problem as ``Can we make
		%%% end-to-end characteristics of low-latency systems as good as
		%%% those of high-latency systems?'' Eliminating long-term
		%%% intersection is a hard problem.
		%
		%Even regardless of link padding from Alice to the cloud, there will be
		%times when Alice is simply not online. Link padding, at the edges or
		%inside the cloud, does not help for this.

		In order to scale to many users, and to prevent an
		attacker from observing the whole network at once, it may be necessary
		for low-latency anonymity systems to support far more servers than Tor
		currently anticipates. This introduces several issues. First, if
		approval by a centralized set of directory servers is no longer
		feasible, what mechanism should be used to prevent adversaries from
		signing up many spurious servers?
		Second, if clients can no longer have a complete
		picture of the network at all times, how can they perform
		discovery while preventing attackers from manipulating or exploiting
		gaps in client knowledge? Third, if there are too many servers
		for every server to constantly communicate with every other, what kind
		of non-clique topology should the network use? Restricted-route
		topologies promise comparable anonymity with better scalability
		\cite{danezis-pets03}, but whatever topology we choose, we need some
		way to keep attackers from manipulating their position within it.
		Fourth, since no centralized authority is tracking server reliability,
		How do we prevent unreliable servers from rendering the network
		unusable? Fifth, do clients receive so much anonymity benefit from
		running their own servers that we should expect them all to do so, or
		do we need to find another incentive structure to motivate them?
		(Tarzan and MorphMix present possible solutions.)

		% [[ XXX how to approve new nodes (advogato, sybil, captcha (RTT));]

		Alternatively, it may be the case that one of these problems proves
		intractable, or that the drawbacks to many-server systems prove
		greater than the benefits. Nevertheless, we may still do well to
		consider non-clique topologies. A cascade topology may provide more
		defense against traffic confirmation.
		% XXX Why would it? Cite. -NM
		%
		% Huh? Do you mean for simple attacks just because of larger anonymity
		% sets? -PS
		Does the hydra topology (many input nodes, few output nodes) work
		better? Are we going to get a hydra anyway because most nodes will be
		middleman nodes?

		As mentioned in Section~\ref{subsec:dos}, Tor could improve its
		robustness against node failure by buffering transmitted stream data
		at the network's edges until the data has been acknowledged by the
		other end of the stream. The efficacy of this approach remains to be
		tested, however, and there may be more effective means for ensuring
		reliable connections in the presence of unreliable nodes.

		%%% Keeping this original paragraph for a little while, since it
		%%% is not the same as what's written there now.
		%
		%Because Tor depends on TLS and TCP to provide a reliable transport,
		%when one of the servers goes down, all the circuits (and thus streams)
		%traveling over that server must break. This reduces anonymity because
		%everybody needs to reconnect right then (does it? how much?) and
		%because exit connections all break at the same time, and it also harms
		%usability. It seems the problem is even worse in a peer-to-peer
		%environment, because so far such systems don't really provide an
		%incentive for nodes to stay connected when they're done browsing, so
		%we would expect a much higher churn rate than for onion routing.
		%there ways of allowing streams to survive the loss of a node in the
		%path?

		% Roger or Paul suggested that we say something about incentives,
		% too, but I think that's a better candidate for our future work
		% section. After all, we will doubtlessly learn very much about why
		% people do or don't run and use Tor in the near future. -NM

		%We should run a squid at each exit node, to provide comparable anonymity
		%to private exit nodes for cache hits, to speed everything up, and to
		%have a buffer for funny stuff coming out of port 80.
		% on the other hand, it hampers PFS, because ORs have pages in the cache.
		%I previously elsewhere suggested bulk transfer proxies to carve
		%up big things so that they could be downloaded in less noticeable
		%pieces over several normal looking connections. We could suggest
		%similarly one or a handful of squid nodes that might serve up
		%some of the more sensitive but common material, especially if
		%the relevant sites didn't want to or couldn't run their own OR.
		%This would be better than having everyone run a squid which would
		%just help identify after the fact the different history of that
		%node's activity. All this kind of speculation needs to move to
		%future work section I guess. -PS]
		Common wisdom suggests that Alice should run her own onion router for best
		anonymity, because traffic coming through her node could plausibly have
		come from elsewhere. How much mixing do we need before this is actually
		effective, or is it immediately beneficial because many real-world
		adversaries won't be able to observe Alice's router?

		To scale to many users, and to prevent an attacker from observing the
		whole network at once, it may be necessary for low-latency anonymity
		systems to support far more servers than Tor currently anticipates.
		This introduces several issues. First, if approval by a centralized set
		of directory servers is no longer feasible, what mechanism should be used
		to prevent adversaries from signing up many colluding servers? Second,
		if clients can no longer have a complete picture of the network at all
		times, how can they perform discovery while preventing attackers from
		manipulating or exploiting gaps in client knowledge? Third, if there
		are too many servers for every server to constantly communicate with
		every other, what kind of non-clique topology should the network use?
		Restricted-route topologies promise comparable anonymity with better
		scalability \cite{danezis-pets03}, but whatever topology we choose, we
		need some way to keep attackers from manipulating their position within
		it \cite{casc-rep}. Fourth, since no centralized authority is tracking
		server reliability, How do we prevent unreliable servers from rendering
		the network unusable? Fifth, do clients receive so much anonymity benefit
		from running their own servers that we should expect them all to do so
		\cite{econymics}, or do we need to find another incentive structure to
		motivate them? Tarzan and MorphMix present possible solutions.

		% advogato, captcha

		A cascade topology with long-range padding and mixing may provide more
		defense against traffic confirmation against a large adversary, because
		it aggregates many users. Does the hydra topology (many input nodes,
		few output nodes) work better against some adversaries? Are we going to
		get a hydra anyway because most nodes will be middleman nodes?

		When a Tor node goes down, all its circuits (and thus streams) must break.
		Do users abandon the system because of this brittleness? How well
		does the method in Section~\ref{subsec:dos} allow streams to survive
		node failure? If affected users rebuild circuits immediately, how much
		anonymity is lost? It seems the problem is even worse in a peer-to-peer
		environment---so far such systems don't provide an incentive for peers to
		stay connected when they're done retrieving content, so we would expect
		a higher churn rate.

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\Section{Future Directions}
		\label{sec:conclusion}

		Tor brings together many innovations into
		a unified deployable system. But there are still several attacks that
		work quite well, as well as a number of sustainability and run-time
		issues remaining to be ironed out. In particular:

		% Many of these (Scalability, cover traffic, morphmix)
		% are duplicates from open problems.
		%

		\emph{Scalability:} Tor's emphasis on design simplicity and
		deployability has led us to adopt a clique topology, a
		semi-centralized model for directories and trusts, and a
		full-network-visibility model for client knowledge. None of these
		properties will scale to more than a few hundred servers.
		Promising approaches to better scalability exist (see
		Section~\ref{sec:maintaining-anonymity}), but more deployment
		experience would be helpful in learning the relative importance of
		these bottlenecks.

		\emph{Incentives:} Volunteers may want to run nodes for publicity
		or better anonymity \cite{econymics}.
		more users -> more anonymity

		\emph{Cover traffic:} Currently we avoid cover traffic because
		whereas its costs in performance and bandwidth are clear, and because its
		security benefits are not well understood. With more research
		\cite{SS03,defensive-dropping}, this price/value ratio may change,
		both for link-level cover traffic and also long-range cover traffic.

		\emph{Better directory distribution:} Even with the threshold
		directory agreement algorithm described in Section~\ref{subsec:dirservers},
		directory distribution is still performance-critical. We must find more
		decentralized yet practical ways to distribute up-to-date snapshots of
		network status without introducing new attacks. Also, directory
		retrieval presents a scaling problem, since clients currently
		download a description of the entire network state every 15
		minutes. As the state grows larger and clients more numerous, we
		may need to move to a solution in which clients only receive
		incremental updates to directory state.

		\emph{Implementing location-hidden servers:} While
		Section~\ref{sec:rendezvous} describes a design for rendezvous
		points and location-hidden servers, these features have not yet been
		implemented. While doing so we are likely to encounter additional
		issues that must be resolved, both in terms of usability and anonymity.
		Tor brings together many innovations into a unified deployable system. The
		immediate next steps include:

		\emph{Scalability:} Tor's emphasis on design simplicity and deployability
		has led us to adopt a clique topology, a semi-centralized model for
		directories and trusts, and a full-network-visibility model for client
		knowledge. These properties will not scale past a few hundred servers.
		Section~\ref{sec:maintaining-anonymity} describes some promising
		approaches, but more deployment experience will be helpful in learning
		the relative importance of these bottlenecks.

		\emph{Bandwidth classes:} In this paper we assume all onion routers have
		good bandwidth and latency. We should adapt the Morphmix model,
		where nodes advertise their bandwidth level (DSL, T1, T3), and
		Alice avoids bottlenecks in her path by choosing nodes that match or
		exceed her bandwidth. In this way DSL users can join the Tor network.

		\emph{Incentives:} Volunteers who run nodes are rewarded with publicity
		and possibly better anonymity \cite{econymics}. More nodes means increased
		scalability, and more users can mean more anonymity. We need to continue
		examining the incentive structures for participating in Tor.

		\emph{Cover traffic:} Currently Tor avoids cover traffic because its costs
		in performance and bandwidth are clear, whereas its security benefits are
		not well-understood. We must pursue more research on both link-level cover
		traffic and long-range cover traffic to determine some simple padding
		schemes that offer provable protection against our chosen adversary.

		%%\emph{Offer two relay cell sizes:} Traffic on the Internet tends to be
		%%large for bulk transfers and small for interactive traffic. One cell
		%%size cannot be optimal for both types of traffic.
		% This should go in the spec and todo, but not the paper yet. -RD

		\emph{Caching at exit nodes:} We should run a caching web proxy at each
		exit node, to provide anonymity for cached pages (Alice's request never
		leaves the Tor network), to improve speed, and to reduce bandwidth cost.
		%XXX and to have a layer to block to block funny stuff out of port 80.
		% is that a useful thing to say?
		On the other hand, forward security is weakened because routers have the
		pages in their cache. We must find the right balance between usability
		and security.

		\emph{Better directory distribution:} Directory retrieval presents
		a scaling problem, since clients currently download a description of
		the entire network state every 15 minutes. As the state grows larger
		and clients more numerous, we may need to move to a solution in which
		clients only receive incremental updates to directory state.

		\emph{Implement location-hidden services:} The design in
		Section~\ref{sec:rendezvous} has not yet been implemented. While doing
		so we are likely to encounter additional issues that must be resolved,
		both in terms of usability and anonymity.

		\emph{Further specification review:} Although we have a public,
		byte-level specification for the Tor protocols, this document has
		@@ -1889,7 +1805,6 @@ designer of MorphMix to make the common elements of our two systems
		share a common specification and implementation. So far, this seems
		to be relatively straightforward. Interoperability will allow testing
		and direct comparison of the two designs for trust and scalability.
		% XXXX Bandwidth classes.

		\emph{Wider-scale deployment:} The original goal of Tor was to
		gain experience in deploying an anonymizing overlay network, and
		@@ -1900,7 +1815,6 @@ able to evaluate some of our design decisions, including our
		robustness/latency trade-offs, our performance trade-offs (including
		cell size), our abuse-prevention mechanisms, and
		our overall usability.
		% XXX large and small cells on same network.

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%