Moved limitations section to adversary model; removed old adversary attacks text (authored by Richard Pospesel)
@@ -34,49 +34,51 @@ June 15, 2018

2.3 [Philosophy](#23-philosophy)
3. [Adversary Model](#3-adversary-model)
3.1 [Adversary Goals](#31-adversary-goals)
3.2 [Adversary Positioning](#32-adversary-positioning)
3.3 [Adversary Attacks](#33-adversary-attacks)
3.4 [Limitations](#34-limitations)
4. [Implementation](#4-implementation)
4.1 [Proxy Obedience](#41-proxy-obedience)
4.2 [State Separation](#42-state-separation)
4.3 [Disk Avoidance](#43-disk-avoidance)
4.4 [Application Data Isolation](#44-application-data-isolation)
4.5 [Cross-Origin Identifier Unlinkability](#45-cross-origin-identifier-unlinkability)
4.6 [Cross-Origin Fingerprinting Unlinkability](#46-cross-origin-fingerprinting-unlinkability)
4.7 [Long-Term Unlinkability via "New Identity" button](#47-long-term-unlinkability-via-new-identity-button)
4.8 [Other Security Measures](#48-other-security-measures)
5. [Build Security and Package Integrity](#5-build-security-and-package-integrity)
5.1 [Achieving Binary Reproducibility](#51-achieving-binary-reproducibility)
5.2 [Package Signatures and Verification](#52-package-signatures-and-verification)
5.3 [Anonymous Verification](#53-anonymous-verification)
5.4 [Update Safety](#54-update-safety)
6. [Towards Transparency in Navigation Tracking](#6-towards-transparency-in-navigation-tracking)
6.1 [Deprecation Wishlist](#61-deprecation-wishlist)
6.2 [Promising Standards](#62-promising-standards)

## 1. Introduction
@@ -194,36 +196,6 @@ In addition to the above design requirements, the technology decisions about the
We believe that if we do not stay current with the support of new web technologies, we cannot hope to substantially influence or be involved in their proper deployment or privacy realization.
However, we will likely disable high-risk features pending analysis, audit, and mitigation.
## 3. Adversary Model

The browser's adversaries have a number of possible goals, capabilities, and attack types that can be used to illustrate the design requirements for the browser.

@@ -411,106 +383,35 @@ The adversary can perform the following attacks from a number of possible positions

- System logs
- Recent files lists
### 3.4 Limitations

1. **Application data isolation**

   In the past, we have made [application data isolation](https://2019.www.torproject.org/projects/torbrowser/design/#app-data-isolation) an explicit goal, whereby all evidence of the existence of Tor Browser usage can be removed via secure deletion of the installation folder.
   This is not generally achievable.
   To hypothetically solve this problem in the general case, we would need to modify the browser to either work around any data-leaking external API calls or implement cleanup functionality for each platform to wipe the offending data from disk.
   Some of this cleanup would necessarily require elevated privileges (e.g. Admin or root) to clean up leaks made by the operating system itself, which goes against our principle of least privilege.
   We would also need continual audits to identify all of the conditions under which the user's operating system itself leaks information about their browsing session, for each supported operating system and CPU architecture; a sketch of such an audit appears after this list.
   Practically speaking, it is not possible to provide this functionality with the level of confidence required for cases where physical access is a concern.
   The majority of deployed Tor Browser installs run on platforms which either explicitly disrespect user agency and privacy (for-profit platforms such as Android, macOS, and Windows) or whose threat model may be less extreme than that of some of our users (the various flavours of Linux and BSD).
   Users whose threat model *does* include the need to hide evidence of their usage of Tor Browser should use Tor Browser with the [Tails operating system](https://tails.net/).
   Tails is a purpose-built Linux-based operating system which is ephemeral by default and also supports full-disk encryption for optional persistent storage if needed.
   It essentially provides whole-operating-system data isolation to its users, with a level of confidence unachievable by Tor Browser on its own.

1. **Arbitrary code execution**

   In the general case, we must also presume the adversary does not have the ability to run arbitrary code outside of the browser's sandbox.
   That is to say, we presume the user's system has not been exploited and is free of malware, keyloggers, rootkits, etc.
   For the purposes of our adversary model, we presume that the user's operating system is not compromised or otherwise working against the user's own interests.
   This assumption is most likely not true in the general case, particularly on the aforementioned for-profit platforms or on computers which the user shares with others.
   However, the browser is ultimately just another process running with limited privileges within a larger ecosystem over which it has no control.
   We are therefore unable to make promises about the browser's capabilities or protections in such environments.
   We would again direct users whose threat model necessitates being unable to trust their computer to use the [Tails operating system](https://tails.net/).
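To make the auditing burden concrete, here is a minimal sketch (in Python, chosen purely for illustration; it is not part of Tor Browser) of the kind of per-platform residue audit such cleanup functionality would have to start from. The locations checked are a few well-known examples of OS-maintained usage records, not an exhaustive or authoritative list, and names such as `CANDIDATE_RESIDUE` are hypothetical:

```python
# Hypothetical, illustrative sketch only: a dry-run audit of a few well-known
# locations where the host operating system itself records evidence of
# application usage, independently of anything the browser writes. A real
# cleanup tool would need continually audited, per-OS and per-architecture
# coverage, plus elevated privileges for some entries.
import os
import platform
from pathlib import Path

CANDIDATE_RESIDUE = {
    "Windows": [
        # Recent-files list maintained by the Windows shell
        Path(os.environ.get("APPDATA", "")) / "Microsoft" / "Windows" / "Recent",
        # Execution evidence; wiping requires Administrator rights
        Path(r"C:\Windows\Prefetch"),
    ],
    "Darwin": [
        # macOS "recent items" shared file lists
        Path.home() / "Library" / "Application Support" / "com.apple.sharedfilelist",
    ],
    "Linux": [
        # GTK/GNOME recently-used documents
        Path.home() / ".local" / "share" / "recently-used.xbel",
        # System logs; wiping requires root
        Path("/var/log"),
    ],
}

def audit() -> None:
    """Report which candidate residue locations exist; never deletes anything."""
    for residue in CANDIDATE_RESIDUE.get(platform.system(), []):
        status = "present" if residue.exists() else "absent"
        print(f"{status:>7}  {residue}")

if __name__ == "__main__":
    audit()
```

Even this toy audit shows the shape of the problem: several of the locations sit outside the user's home directory, so wiping them would require exactly the elevated privileges that our principle of least privilege rules out.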
## 4. Implementation
...