more fixes. i declare this the first draft.

svn:r3598

Commit e3266768 authored 20 years ago by Roger Dingledine
--- a/doc/design-paper/challenges.tex
+++ b/doc/design-paper/challenges.tex
 \documentclass{llncs}
-% XXXX NM: Fold ``bandwidth and usability'' into ``Tor and file-sharing'' --
-% ``bandwidth and file-sharing''.

 \usepackage{url}
 \usepackage{amsmath}
 \usepackage{epsfig}

-\setlength{\textwidth}{6.1in}
-\setlength{\textheight}{8.5in}
-\setlength{\topmargin}{1cm}
-\setlength{\oddsidemargin}{.5cm}
-\setlength{\evensidemargin}{.5cm}
+\setlength{\textwidth}{5.9in}
+\setlength{\textheight}{8.4in}
+\setlength{\topmargin}{.5cm}
+\setlength{\oddsidemargin}{1cm}
+\setlength{\evensidemargin}{1cm}

 \newenvironment{tightlist}{\begin{list}{$\bullet$}{
  \setlength{\itemsep}{0mm}
@@ -122,7 +120,7 @@ giving an effective vector for physical or online attackers.
 Tor provides these protections even when a portion of its
 infrastructure is compromised.

-To connect to a remove server via Tor, the client software learns a signed
+To connect to a remote server via Tor, the client software learns a signed
 list of Tor nodes from one of several central \emph{directory servers}, and
 incrementally creates a private pathway or \emph{circuit} of encrypted
 connections through authenticated Tor nodes on the network, negotiating a
@@ -373,10 +371,10 @@ eavesdropper can perform traffic analysis on the entire network.
 %financial health as well as network security.
 The Java
 Anon Proxy~\cite{web-mix} provides similar functionality to Tor but
-handles only web browsing rather than arbitrary TCP\@.
+handles only web browsing rather than all TCP\@.
 %Some peer-to-peer file-sharing overlay networks such as
 %Freenet~\cite{freenet} and Mute~\cite{mute}
-Zero-Knowledge Systems' commercial Freedom
+Zero-Knowledge Systems' Freedom
 network~\cite{freedom21-security} was even more flexible than Tor in
 transporting arbitrary IP packets, and also supported
 pseudonymity in addition to anonymity; but it has
@@ -387,7 +385,7 @@ more scalable peer-to-peer designs like Tarzan~\cite{tarzan:ccs02} and
 MorphMix~\cite{morphmix:fc04} have been proposed in the literature, but
 have not been fielded. These systems differ somewhat
 in threat model and presumably practical resistance to threats.
-Note that MorphMix and Tor differ only in
+Note that MorphMix differs from Tor only in
 node discovery and circuit setup; so Tor's architecture is flexible
 enough to contain a MorphMix experiment.
 We direct the interested reader
@@ -461,7 +459,7 @@ attacks, because its network has fewer edges. JAP was born out of
 the ISDN mix design~\cite{isdn-mixes}, where padding made sense because
 every user had a fixed bandwidth allocation and altering the timing
 pattern of packets could be immediately detected. But in its current context
-as a general Internet web anonymizer, adding sufficient padding to JAP
+as an Internet web anonymizer, adding sufficient padding to JAP
 would probably be prohibitively expensive and ineffective against a
 minimally active attacker.\footnote{Even if JAP could
 fund higher-capacity nodes indefinitely, our experience
@@ -621,7 +619,7 @@ any anonymizing network: their intensive bandwidth requirement, and the
 degree to which they are associated (correctly or not) with copyright
 infringement.

-As noted above, high-bandwidth protocols can make the network unresponsive,
+High-bandwidth protocols can make the network unresponsive,
 but tend to be somewhat self-correcting as lack of bandwidth drives away
 users who need it.  Issues of copyright violation,
 however, are more interesting.  Typical exit node operators want to help
@@ -636,7 +634,7 @@ So when letters arrive, operators are likely to face
 pressure to block file-sharing applications entirely, in order to avoid the
 hassle.

-But blocking file-sharing is not easy: many popular
+But blocking file-sharing is not easy: popular
 protocols have evolved to run on non-standard ports to
 get around other port-based bans.  Thus, exit node operators who want to
 block file-sharing would have to find some way to integrate Tor with a
@@ -726,20 +724,20 @@ nodes, open proxies, and service abusers, these systems hope to make
 ongoing abuse difficult.  Although the system is imperfect, it works
 tolerably well for them in practice.

-But of course, we would prefer that legitimate anonymous users be able to
-access abuse-prone services.  One conceivable approach would be to require
+Of course, we would prefer that legitimate anonymous users be able to
+access abuse-prone services.  One conceivable approach would require
 would-be IRC users, for instance, to register accounts if they want to
 access the IRC network from Tor.  In practice this would not
 significantly impede abuse if creating new accounts were easily automatable;
 this is why services use IP blocking.  To deter abuse, pseudonymous
 identities need to require a significant switching cost in resources or human
 time.  Some popular webmail applications
-impose cost with Reverse Turing Tests, but these may not be costly enough to
-deter abusers.  Freedom used blind signatures to limit
+impose cost with Reverse Turing Tests, but this step may not deter all
+abusers.  Freedom used blind signatures to limit
 the number of pseudonyms for each paying account, but Tor has neither the
 ability nor the desire to collect payment.

-We stress that as far as we can tell, most Tor uses so far are not
+We stress that as far as we can tell, most Tor uses are not
 abusive. Most services have not complained, and others are actively
 working to find ways besides banning to cope with the abuse. For example,
 the Freenode IRC network had a problem with a coordinated group of
@@ -891,8 +889,8 @@ prevent individual machines within the enclave from running Tor
 clients~\cite{or-jsac98,or-discex00}.

 Of course, Tor's default path length of
-three is insufficient for these enclaves, since the entry and/or exit
-themselves are sensitive. Tor thus increments the path length by one
+three is insufficient for these enclaves, since the entry or exit
+themselves are sensitive. Tor thus increments path length by one
 for each sensitive endpoint in the circuit.
 Enclaves also help to protect against end-to-end attacks, since it's
 possible that traffic coming from the node has simply been relayed from
@@ -1208,49 +1206,47 @@ further study.
 \subsection{Trust and discovery}
 \label{subsec:trust-and-discovery}

-The published Tor design adopted a deliberately simplistic design for
+The published Tor design uses a deliberately simplistic design for
 authorizing new nodes and informing clients about Tor nodes and their status.
-In preliminary Tor designs, all nodes periodically uploaded a
-signed description
+All nodes periodically upload a signed description
 of their locations, keys, and capabilities to each of several well-known {\it
-  directory servers}.  These directory servers constructed a signed summary
+  directory servers}.  These directory servers construct a signed summary
 of all known Tor nodes (a ``directory''), and a signed statement of which
 nodes they
-believed to be operational at any given time (a ``network status'').  Clients
-periodically downloaded a directory to learn the latest nodes and
-keys, and more frequently downloaded a network status to learn which nodes were
+believe to be operational then (a ``network status'').  Clients
+periodically download a directory to learn the latest nodes and
+keys, and more frequently download a network status to learn which nodes are
 likely to be running.  Tor nodes also operate as directory caches, to
-lighten the bandwidth on the authoritative directory servers.
+lighten the bandwidth on the directory servers.

-In order to prevent Sybil attacks (wherein an adversary signs up many
-purportedly independent nodes to increase her chances of observing
-a stream as it enters and leaves the network), the early Tor directory design
-required the operators of the authoritative directory servers to manually
-approve new nodes.  Unapproved nodes were included in the directory,
+To prevent Sybil attacks (wherein an adversary signs up many
+purportedly independent nodes to increase her network view),
+this design
+requires the directory server operators to manually
+approve new nodes.  Unapproved nodes are included in the directory,
 but clients
-did not use them at the start or end of their circuits.  In practice,
-directory administrators performed little actual verification, and tended to
-approve any Tor node whose operator could compose a coherent email.
+do not use them at the start or end of their circuits.  In practice,
+directory administrators perform little actual verification, and tend to
+approve any Tor node whose operator can compose a coherent email.
 This procedure
-may have prevented trivial automated Sybil attacks, but would do little
+may prevent trivial automated Sybil attacks, but will do little
 against a clever and determined attacker.

 There are a number of flaws in this system that need to be addressed as we
-move forward.  They include:
-\begin{tightlist}
-\item Each directory server represents an independent point of failure; if
-  any one were compromised, it could immediately compromise all of its users
-  by recommending only compromised nodes.
-\item The more nodes join the network, the more unreasonable it
-  becomes to expect clients to know about them all.  Directories
-  become infeasibly large, and downloading the list of nodes becomes
-  burdensome.
-\item The validation scheme may do as much harm as it does good.  It is not
-  only incapable of preventing clever attackers from mounting Sybil attacks,
-  but may deter node operators from joining the network.  (For instance, if
-  they expect the validation process to be difficult, or if they do not share
-  any languages in common with the directory server operators.)
-\end{tightlist}
+move forward. First,
+each directory server represents an independent point of failure: any
+compromised directory server could start recommending only compromised
+nodes.
+Second, as more nodes join the network, %the more unreasonable it
+%becomes to expect clients to know about them all.
+directories
+become infeasibly large, and downloading the list of nodes becomes
+burdensome.
+Third, the validation scheme may do as much harm as it does good.  It not
+only can't prevent clever attackers from mounting Sybil attacks,
+but it may deter node operators from joining the network, if
+they expect the validation process to be difficult, or they do not share
+any languages in common with the directory server operators.

 We could try to move the system in several directions, depending on our
 choice of threat model and requirements.  If we did not need to increase
@@ -1261,18 +1257,17 @@ But, we can only do that if can simultaneously make node capacity
 scale much more than we anticipate to be feasible soon, and if we can find
 entities willing to run such nodes, an equally daunting prospect.

-
 In order to address the first two issues, it seems wise to move to a system
 including a number of semi-trusted directory servers, no one of which can
 compromise a user on its own.  Ultimately, of course, we cannot escape the
 problem of a first introducer: since most users will run Tor in whatever
 configuration the software ships with, the Tor distribution itself will
-remain a potential single point of failure so long as it includes the seed
+remain a single point of failure so long as it includes the seed
 keys for directory servers, a list of directory servers, or any other means
 to learn which nodes are on the network.  But omitting this information
-from the Tor distribution would only delegate the trust problem to the
-individual users, most of whom are presumably less informed about how to make
-trust decisions than the Tor developers.
+from the Tor distribution would only delegate the trust problem to each
+individual user. %, most of whom are presumably less informed about how to make
+%trust decisions than the Tor developers.

 %Network discovery, sybil, node admission, scaling. It seems that the code
 %will ship with something and that's our trust root. We could try to get
@@ -1310,20 +1305,19 @@ for views of a node's latency and/or bandwidth to vary wildly between
 observers.  Further, it is unclear whether total bandwidth is really
 the right measure; perhaps clients should instead be considering nodes
 based on unused bandwidth or observed throughput.
-% XXXX say more here?
-
 %How to measure performance without letting people selectively deny service
 %by distinguishing pings. Heck, just how to measure performance at all. In
 %practice people have funny firewalls that don't match up to their exit
 %policies and Tor doesn't deal.
-
+%
 %Network investigation: Is all this bandwidth publishing thing a good idea?
 %How can we collect stats better? Note weasel's smokeping, at
 %http://seppia.noreply.org/cgi-bin/smokeping.cgi?target=Tor
 %which probably gives george and steven enough info to break tor?
-
-Even if we can collect and use this network information effectively, we need
-to make sure that it is not more useful to attackers than to us.  While it
+%
+And even if we can collect and use this network information effectively,
+we must ensure
+that it is not more useful to attackers than to us.  While it
 seems plausible that bandwidth data alone is not enough to reveal
 sender-recipient connections under most circumstances, it could certainly
 reveal the path taken by large traffic flows under low-usage circumstances.
@@ -1331,24 +1325,27 @@ reveal the path taken by large traffic flows under low-usage circumstances.
 \subsection{Non-clique topologies}

 Tor's comparatively weak threat model may allow easier scaling than
-other mix-net
+other
 designs.  High-latency mix networks need to avoid partitioning attacks, where
 network splits let an attacker distinguish users in different partitions.
 Since Tor assumes the adversary cannot cheaply observe nodes at will,
 a network split may not decrease protection much.
 Thus, one option when the scale of a Tor network
 exceeds some size is simply to split it. Nodes could be allocated into
-partitions while hampering collobrating hostile nodes from taking over
+partitions while hampering collaborating hostile nodes from taking over
 a single partition~\cite{casc-rep}.
 Clients could switch between
-networks, even on a per-circuit basis.  Future analysis may uncover
-other dangers beyond those affecting mix-nets.
+networks, even on a per-circuit basis.
+%Future analysis may uncover
+%other dangers beyond those affecting mix-nets.

-More conservatively, we can try to scale a single Tor network.  Potential
+More conservatively, we can try to scale a single Tor network. Likely
 problems with adding more servers to a single Tor network include an
 explosion in the number of sockets needed on each server as more servers
-join, and an increase in coordination overhead as keeping everyone's view of
-the network consistent becomes increasingly difficult.
+join, and increased coordination overhead to keep each user's view of
+the network consistent. As we grow, we will also have more instances of
+servers that can't reach each other simply due to Internet topology or
+routing problems.

 %include restricting the number of sockets and the amount of bandwidth
 %used by each node.  The number of sockets is determined by the network's
@@ -1369,9 +1366,7 @@ extend to Tor, which has a weaker threat model but higher performance
 requirements: instead of analyzing the
 probability of an attacker's viewing whole paths, we will need to examine the
 attacker's likelihood of compromising the endpoints.
-
-% Nick edits these next 2 grafs.
-
+%
 Tor may not need an expander graph per se: it
 may be enough to have a single subnet that is highly connected, like
 an internet backbone. %  As an
@@ -1382,22 +1377,22 @@ an internet backbone. %  As an
 %center and anyone out of the center that they want to.  Then the
 %network easily scales to c. 2500 nodes with commensurate increase in
 %bandwidth.
-There are many open questions: how to distribute directory information
-(presumably information about the center nodes could
-be given to any new nodes with their codebase), whether center nodes
-will need to function as a `backbone', and so one. As above,
+There are many open questions: how to distribute connectivity information
+(presumably nodes will learn about the center nodes
+when they download Tor), whether center nodes
+will need to function as a `backbone', and so on. As above,
 this could create problems for the expected anonymity for a mix-net,
 but for a low-latency network where anonymity derives largely from
 the edges, it may be feasible.

-In a sense, Tor already has a non-clique topology.
-Individuals can set up and run Tor nodes without informing the
-directory servers. This allows groups to run a
-local Tor network of private nodes that connects to the public Tor
-network. This network is hidden behind the Tor network, and its
-only visible connection to Tor is at those points where it connects.
-As far as the public network, or anyone observing it, is concerned,
-they are running clients.
+%In a sense, Tor already has a non-clique topology.
+%Individuals can set up and run Tor nodes without informing the
+%directory servers. This allows groups to run a
+%local Tor network of private nodes that connects to the public Tor
+%network. This network is hidden behind the Tor network, and its
+%only visible connection to Tor is at those points where it connects.
+%As far as the public network, or anyone observing it, is concerned,
+%they are running clients.

 \section{The Future}
 \label{sec:conclusion}
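The diff above describes two node-selection rules. First, unapproved nodes may appear in the directory but are never used at the start or end of a circuit. Second, Tor's default path length of three grows by one for each sensitive endpoint.

Below is a minimal Python sketch of both rules, written under stated assumptions. It is not Tor's actual path-selection code. The `Node` class, its `approved` flag, and the functions `path_length` and `choose_path` are hypothetical names introduced only for illustration. The sketch also ignores factors a real client would weigh, such as bandwidth and exit policy.

```python
import random
from dataclasses import dataclass

# Default path length stated in the paper text.
DEFAULT_PATH_LENGTH = 3


@dataclass(frozen=True)
class Node:
    nickname: str
    # Hypothetical flag: True if directory operators manually approved the node.
    approved: bool


def path_length(entry_sensitive: bool, exit_sensitive: bool) -> int:
    """Default length plus one extra hop for each sensitive endpoint."""
    return DEFAULT_PATH_LENGTH + int(entry_sensitive) + int(exit_sensitive)


def choose_path(nodes, entry_sensitive=False, exit_sensitive=False, rng=random):
    """Pick a circuit whose first and last hops are approved nodes.

    Middle hops may be any other node, approved or not.
    """
    length = path_length(entry_sensitive, exit_sensitive)
    approved = [n for n in nodes if n.approved]
    if len(approved) < 2:
        raise ValueError("need at least two approved nodes for the endpoints")
    # Two distinct approved nodes serve as entry and exit.
    entry, exit_node = rng.sample(approved, 2)
    # Exclude the chosen endpoints (by identity) from the middle-hop pool.
    remaining = [n for n in nodes if n is not entry and n is not exit_node]
    if len(remaining) < length - 2:
        raise ValueError("not enough nodes for the middle hops")
    middle = rng.sample(remaining, length - 2)
    return [entry, *middle, exit_node]
```

For example, `choose_path(nodes, entry_sensitive=True, rng=random.Random(1))` returns a four-hop path whose first and last hops are approved. Keeping the endpoint restriction separate from the length rule mirrors the text, where approval only constrains circuit positions and sensitivity only adds hops.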