Separate general fingerprinting defenses from randomization discussion. (646e0e73) · Commits · The Tor Project / Applications / tor-browser-spec

design-doc/design.xml

+153 −64

		@@ -1585,98 +1585,187 @@ url="https://amiunique.org/">Am I Unique</ulink>.
		<title>General Fingerprinting Defenses</title>
		<para>

		XXX: Strategies vs approaches? Approaches will include things like
		virtualization, spoofing, reimplementation, permissions, and disabling features.

		Without looking at a particular fingerprinting vector there are basically two
		strategies to thwart fingerprinting attacks in general:
		When implemented after an API or feature has been standardized and widely
		deployed, defenses to fingerprinting issues tend to take one of the following
		forms: value spoofing, subsystem reimplementation, virtualization, site
		permissions, and feature removal.

		</para>
		<orderedlist>
		<listitem>
		Making users uniform: This would render fingerprinting moot as it only works
		if there are detectable differences between targets.
		</listitem>
		<listitem>
		Giving randomized values back: This would bury the real device
		characteristics within noise. That way a fingerprinter cannot be sure to
		identify a user upon (re-)visit of a website which is rendering
		fingerprinting ineffective.
		</listitem>
		<listitem>Virtualization..</listitem>
		<listitem>Disabling features</listitem>
		</orderedlist>
		<listitem><command>Value Spoofing</command>
		<para>

		Although there is some research <ulink
		url="http://research.microsoft.com/pubs/209989/tr1.pdf">suggesting</ulink> the
		second approach we think the former is currently a better suited heuristic for
		Tor Browser for a couple of reasons:
		Value spoofing can be used for simple cases where the browser provides some
		aspect of the user's configuration details, devices, hardware, or operating
		system directly to a website. It becomes less useful when the fingerprinting
		method instead relies on API behavior.

		<itemizedlist>
		<listitem>
		</para>
		</listitem>
		<listitem><command>Subsystem Reimplementation</command>
		<para>

		It might not be possible to randomize all fingerprintable characteristics.
		While it seems plausible that many end-user configuration details that the
		browser currently exposes may be replaced by false information, this approach
		seems to break down when it is applied to deeper issues. In particular, it is
		not clear how to randomize the capabilities of hardware attached to a computer
		in such a way that it convincingly behaves like other hardware, while still
		providing a consistent experience to the user from site to site. Similarly,
		concealing operating system version differences through randomization will
		require an implementation of the underlying support code for every version
		your randomization is trying to mimic.
		In cases where simple spoofing is not enough to properly conceal underlying
		device characteristics or operating system details, the underlying
		subsystem that provides the functionality for a feature or API may need
		to be completely reimplemented. This is most common in cases where
		customizable or version-specific aspects of the user's operating system are
		visible through the browser's featureset or APIs, usually because the browser
		directly exposes OS-provided implementations of underlying features. In these
		cases, such OS-provided implementations must be replaced by a generic
		implementation, or at least an implementation wrapper that makes an effort to
		conceal any user-customized aspects of the system.

		In both cases, randomization requires virtualization of many underlying
		implementations, whereas uniformity only requires virtualization of one
		implementation.
		</para>
		</listitem>
		<listitem><command>Virtualization</command>
		<para>

		Virtualization is needed when simply reimplementing a feature in a different
		way is insufficient to fully conceal the underlying behavior. This is most
		common in instances of device and hardware fingerprinting, but since the
		notion of time can also be virtualized, it also can apply to any instance
		where an accurate measure of wallclock time is required for a fingerprinting
		vector to attain high accuracy.

		</para>
		</listitem>
		<listitem><command>Site Permissions</command>
		<para>

		XXX Virtualization
		In the event that virtualization is too expensive in terms of performance or
		engineering effort, and the relative expected usage of a feature is rare, site
		permissions can be used to prevent the usage of a feature except in cases
		where the user actually wishes to use it. Unfortunately, this mechanism
		becomes less effective once a feature becomes widely overused and abused by
		many websites, as warning fatigue quickly sets in for most users.

		</para>
		</listitem>
		<listitem>
		Usability.
		<listitem><command>Feature/Functionality Removal</command>
		<para>

		When extremely invasive features serve only a narrow domain or use case, or
		there are alternate ways of accomplishing the same task, features and/or
		certain aspects of their functionality may be simply removed.

		</para>
		</listitem>
		<listitem>
		</orderedlist>
		</sect3>
		<sect3>
		<title>Randomization or Uniformity?</title>
		<para>

		When applying a form of defense to a specific fingerprinting vector or source,
		there are two general strategies available. Either the behavior for all
		users of a single browser implementation can be made as uniform as
		possible, or the user agent can attempt to randomize its behavior, so that
		each interaction between a user and a site provides a different fingerprint.

		</para>
		<para>

		Although <ulink url="http://research.microsoft.com/pubs/209989/tr1.pdf">some
		research suggests</ulink> that randomization can be effective, so far striving
		for uniformity has generally proved to be a better strategy for Tor Browser
		for the following reasons:

		It might not be easy to randomize values in a way that they are not
		distinguishable from noise. In particular, naive randomization
		</para>

		<orderedlist>
		<listitem><command>Randomization is not a shortcut</command>
		<para>

		While it appears that many end-user configuration details that the browser
		currently exposes may be safely replaced by false information, randomization
		of these details must be just as exhaustive as an approach that seeks to make
		these behaviors uniform. In the face of either strategy, the adversary can
		still make use of those features which have not been altered to be either
		sufficiently uniform or sufficiently random.

		</para>
		<para>

		Furthermore, the randomization approach seems to break down when it is applied
		to deeper issues where underlying system functionality is directly exposed. In
		particular, it is not clear how to randomize the capabilities of hardware
		attached to a computer in such a way that it either convincingly behaves like
		other hardware, or where the exact properties of the hardware that vary from
		user to user are sufficiently randomized. Similarly, truly concealing operating
		system version differences through randomization may require reimplementation
		of the underlying operating system functionality to ensure that every version
		that your randomization is trying to blend in with is covered by the range of
		possible behaviors.

		</para>
		</listitem>
		<listitem>
		<listitem><command>Evaluation and measurement difficulties</command>
		<para>

		The fact that randomization causes behaviors to differ slightly with every
		visit makes it appealing at first glance, but this same property makes it very
		difficult to objectively measure its effectiveness. By contrast, an
		implementation that strives for uniformity is very simple to measure. Despite
		their current flaws, a properly designed version of <ulink
		url="https://panopticlick.eff.org/">Panopticlick</ulink> or <ulink
		url="https://amiunique.org/">Am I Unique</ulink> could report the entropy and
		uniqueness rates for all users of a single user agent version, without the
		need for complicated statistics about the variance of the measured behaviors.

		</para>
		<para>

		Hard to measure success.
		Randomization (especially incomplete randomization) may also provide a false
		sense of security. When a fingerprinting attempt makes naive use of randomized
		information, a fingerprint will appear unstable, but may not actually be
		sufficiently randomized to prevent a dedicated adversary. Sophisticated
		fingerprinting mechanisms may either ignore randomized information, or
		incorporate knowledge of the distribution and range of randomized values into
		the creation of a more stable fingerprint (by either removing the randomness,
		modeling it, or averaging it).

		</para>
		</listitem>
		<listitem>
		<listitem><command>Usability issues</command>
		<para>

		Completeness. Randomization may provide a false sense of security - any items
		that are not randomized, or for which the randomization can be averaged away
		will still be desirable targets.
		When randomization is introduced to features that affect site behavior, it can
		be very distracting for this behavior to change between visits of a given
		site. For simple cases such as when this information affects layout behavior,
		this will lead to visual nuisances. However, when this information affects
		reported functionality or hardware characteristics, sometimes a site will
		function one way on one visit, and another way on a subsequent visit.

		</para>
		</listitem>
		<listitem>
		<listitem><command>Performance costs</command>

		<para>

		Randomizing involves performance costs. This is especially true if the
		fingerprinting surface is large (like in a modern browser) and one needs more
		elaborate randomizing strategies (including randomized virtualization) to
		ensure that the randomization fully conceals the true behavior.
		ensure that the randomization fully conceals the true behavior. Many calls to
		a cryptographically secure random number generator during the course of a page
		load will both exhaust available entropy pools and lead to
		increased computation while loading a page.

		</listitem>
		<listitem>
		Randomizing itself might introduce a new fingerprinting vector as the
		process of generating the values for the fingerprintable attributes
		could be susceptible to timing side-channel attacks.
		</listitem>
		</itemizedlist>
		We'll see in the next section that the idea of making users uniform does not
		work either in the general way expressed above mainly due to usability issues.
		However, we believe that it avoids a lot of the complications involved in
		randomization even if just used as a guiding principle.
		</para>
		</sect3>
		</listitem>
		<listitem><command>Increased vulnerability surface</command>
		<para>

		Randomizing itself might introduce a new fingerprinting vector as the process
		of generating the values for the fingerprintable attributes could itself be
		susceptible to side-channel attacks, analysis, or exploitation.

		</para>
		</listitem>
		</orderedlist>
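		<para>

		The concern that randomization can be averaged away may be illustrated with
		a minimal sketch. It assumes a hypothetical defense that adds zero-mean
		uniform noise to a single numeric attribute; the function names, the noise
		range, and the attribute value of 1080 are illustrative assumptions only and
		do not describe any Tor Browser implementation. A single reading looks
		unstable, but the mean of many readings converges on the concealed value.

		</para>

```python
import random
import statistics


def noisy_reading(true_value, rng, noise=5.0):
    """Hypothetical naive defense: report the true value plus uniform noise."""
    return true_value + rng.uniform(-noise, noise)


def averaged_estimate(true_value, rng, samples):
    """Adversary side: average repeated readings so zero-mean noise cancels."""
    return statistics.fmean(
        noisy_reading(true_value, rng) for _ in range(samples)
    )


# Seeded generator so the sketch is repeatable; a real defense would not
# expose its seed, but the averaging argument does not depend on knowing it.
rng = random.Random(0)
true_value = 1080.0  # assumed attribute value, for illustration only

single = noisy_reading(true_value, rng)
estimate = averaged_estimate(true_value, rng, samples=10000)

print("single reading:", single)
print("averaged estimate:", estimate)
```

		<para>

		Noise that is held constant for a session or per site would resist this
		particular averaging, but such a defense would still need to be modeled and
		measured against the other concerns listed above.

		</para>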
		</sect3>
		<sect3 id="fingerprinting-defenses">
		<title>Fingerprinting Defenses in the Tor Browser</title>
		<title>Specific Fingerprinting Defenses in the Tor Browser</title>
		<para>

		The following defenses are listed roughly in order of most severe