This is a privacy-preserving mechanism for checking HTTPS certificates against the [https://eff.org/observatory EFF SSL Observatory] to see if they might be malicious or known to have been compromised. The Observatory also uses this as a collection mechanism to detect certificates that are not visible on public IPs from our data center (we expect that most man-in-the-middle attacks are not publicly visible).
This is a privacy-preserving mechanism for checking HTTPS certificates against the [EFF SSL Observatory](https://eff.org/observatory) to see if they might be malicious or known to have been compromised. The Observatory also uses this as a collection mechanism to detect certificates that are not visible on public IPs from our data center (we expect that most man-in-the-middle attacks are not publicly visible).
The feature can run from Firefox extensions like HTTPS Everywhere or Torbutton. This is an opt-in feature, and we suggest you run it over Tor. You can run it without Tor if you would like.
...
...
@@ -9,9 +9,9 @@ When a participating client sees a new HTTPS TLS certificate, it compares its fi
To prevent submission of private infrastructure certificates, the client also maintains a list of fingerprints of the superset of root CAs trusted by all versions of Firefox, as well as popular 3rd party CAs such as CACert. If a certificate chain is rooted in a CA not in this set, it is assumed to be private, and the certificates it signs are not submitted. Additionally, if the Observatory server cannot resolve the domain name in question, the certificate will not be recorded by default (if the .
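The root CA check described above can be sketched as follows. This is a minimal illustration, not the client's actual code: the function name `should_submit`, the `allow_nonstandard_roots` flag (modelled on the "non-standard root CAs" advanced option), and the placeholder fingerprints are all assumptions.

```python
# Hypothetical sketch: decide whether a chain may be submitted, based on
# whether its root CA fingerprint is in the client's list of known roots.

# Placeholder fingerprints (not real CA fingerprints), stored lowercase.
TRUSTED_ROOT_FPS = frozenset(
    fp.lower()
    for fp in (
        "00112233445566778899AABBCCDDEEFF00112233",
        "FFEEDDCCBBAA99887766554433221100FFEEDDCC",
    )
)


def should_submit(root_fp, trusted_root_fps=TRUSTED_ROOT_FPS,
                  allow_nonstandard_roots=False):
    """Return True if a chain rooted at `root_fp` may be submitted.

    A chain whose root is not in the known set is assumed to be private
    and is skipped, unless the user has explicitly opted in.
    """
    if allow_nonstandard_roots:
        return True
    return root_fp.lower() in trusted_root_fps
```

For example, `should_submit("00112233445566778899aabbccddeeff00112233")` is true, while a chain rooted in an unknown CA is only submitted when `allow_nonstandard_roots=True`.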
The certificate is POSTed to https://observatory.eff.org/submit_cert. The EFF also runs a Tor [wiki:doc/ExitEnclave ExitEnclave] on this host, which prevents certain circuit activity correlation attacks against Tor. To prevent DNS requests for this hostname from leaving through a different exit, we will submit the requests to the IP address as opposed to the hostname. [Note: we are not currently addressing the server by IP address. People have observed that, since your exit node can already see the server IP address and SNI handshake of your TLS connections, it can't learn anything more from the padded size and timing of your Observatory submission. It does see that you are submitting to the Observatory, though.]
The certificate is POSTed to https://observatory.eff.org/submit_cert. The EFF also runs a Tor [ExitEnclave](./doc/ExitEnclave) on this host, which prevents certain circuit activity correlation attacks against Tor. To prevent DNS requests for this hostname from leaving through a different exit, we will submit the requests to the IP address as opposed to the hostname. [Note: we are not currently addressing the server by IP address. People have observed that, since your exit node can already see the server IP address and SNI handshake of your TLS connections, it can't learn anything more from the padded size and timing of your Observatory submission. It does see that you are submitting to the Observatory, though.]
= Client UI and configuration Variables =
# Client UI and configuration Variables
As of May 2011, the planned UI is as follows:
...
...
@@ -35,29 +35,29 @@ When you visit https://www.example.com, the Observatory will learn that
somebody visited that site, but will not know who or what page they looked at.
Mouseover the options for further details:
{{{
```
(*) Check certificates using Tor for anonymity
( ) Check certificates even if Tor is not available (TOOLTIP: We will still try to keep the data anonymous, but this option is less secure)
}}}
```
Advanced options >>
{{{
```
[ ] Check/submit certificates for private DNS domains
[ ] Check/submit certificates with non-standard root CAs
[ ] Do not check/submit certificates for the following domain wildcards:
[input field]
}}}
```
= Submission API =
# Submission API
== submit_cert ==
## submit_cert
{{{
```
POST /submit_cert
}}}
```
=== Arguments ===
### Arguments
domain:
The value of the host piece of the url. If there is a port specified, it is present after a ':'
...
...
@@ -72,7 +72,7 @@ private_opt_in:
padding:
An arbitrary amount of random data, used to pad the POST to a total of 4096*2^n^ bytes.
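As a rough illustration of the padding rule, the sketch below computes the smallest size of the form 4096 * 2^n that fits a request, and how many padding bytes that implies. The function names are hypothetical, and it assumes `unpadded_length` already counts everything that the "total" is measured over; the text does not say exactly which bytes are included.

```python
BLOCK_SIZE = 4096  # base size from the padding rule


def padded_size(unpadded_length):
    """Smallest 4096 * 2**n (n >= 0) that is >= unpadded_length."""
    size = BLOCK_SIZE
    while size < unpadded_length:
        size *= 2
    return size


def padding_length(unpadded_length):
    """Number of padding bytes needed to reach the padded size."""
    return padded_size(unpadded_length) - unpadded_length
```

Under these assumptions, a 4000-byte request would get 96 bytes of padding, and a 4097-byte request would be padded up to 8192 bytes.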
=== Response ===
### Response
Normally, the status code is 200, with XXX OBSOLETE the body being "1" or "0".
...
...
@@ -84,12 +84,12 @@ If the fplist contains a certificate that the observatory knows to be dangerous
If there is an error, an appropriate response code is set and the error message is included in the body.
= Server Side Design =
# Server Side Design
== Input SQL schema ==
## Input SQL schema
Current schema (this is still not final; we will probably change the reports table to be more efficient and comprehensive):
{{{
```
CREATE TABLE `certs` (
`fp` binary(36) NOT NULL,
`raw_cert` blob NOT NULL,
...
...
@@ -128,9 +128,9 @@ CREATE TABLE `bad_cert` (
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
}}}
```
Previous schema:
{{{
```
CREATE TABLE certs (
fp binary(36) NOT NULL, -- sha1(cert) + md5(cert); because sha256 is not easily available in the client
...
...
@@ -149,9 +149,9 @@ CREATE TABLE reports (
KEY (fp),
timestamp datetime
) ENGINE=MyISAM DEFAULT CHARSET=UTF8;
}}}
```
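The comment on the `fp` column in the previous schema describes the fingerprint as sha1(cert) + md5(cert), which accounts for the 36 bytes of `binary(36)` (20 + 16). A minimal sketch of that computation, assuming the hashes are taken over the raw certificate bytes and concatenated in that order (the function name `observatory_fp` is hypothetical):

```python
import hashlib


def observatory_fp(cert_bytes):
    """Concatenate the SHA-1 and MD5 digests of the certificate bytes.

    Assumption: `cert_bytes` is the raw (e.g. DER-encoded) certificate.
    The result is 20 + 16 = 36 bytes, matching the binary(36) column.
    """
    return hashlib.sha1(cert_bytes).digest() + hashlib.md5(cert_bytes).digest()
```

The same input always yields the same 36-byte value, so it can serve as a key for the `certs` and `reports` tables.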
== DB Integration With Observatory ==
## DB Integration With Observatory
The Observatory project's non-distributed system uses a schema that is partly dynamically generated. This suits a system with discrete scans that complete infrequently; it will need to be adjusted to handle the higher frequency of updates the distributed observatory has, as well as to include information about the source of the certificate chains. During a data import operation, certificate chains are processed, the SSL messages are interpreted, certificates are extracted, and their contents are added to "certs" tables. More than one of these certs tables exists, allowing the process to be run over a period of time and in parallel. These certs tables are created by hack_parse.py and use BLOB or TEXT columns for most data (things like subject, issuer, and extensions). Validation of certificates and their chains happens in multiple passes, and is mostly coordinated by stich_tables.py, which also creates more usable tables like valid_certs. This "finishing process" involves a few other scripts, and results in time stamps, extraction of names, creation of other summary tables like roots, seen, and all_certs, and the addition of indexes. The process leaves the certs tables in place, as they are still sometimes needed.
...
...
@@ -163,7 +163,7 @@ Currently the only reliable way to determine certificate chains is to group the
The "seen" table has an entry for each certificate, each time it is seen in a chain. Its fields are currently only IP, fingerprint, fetchtime, path and valid. This table has a lot of rows (12.67M in the December 2010 dataset). To determine which certificate a row in the seen table refers to, a join on fingerprint is used. This isn't enough information to distinguish between submissions to a distributed observatory and scans from a central one, or to handle ideas like source ASN.
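The join on fingerprint might look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the real database. The column names and types, the `subject` column on `all_certs`, the reading of `path` as a position in the chain, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified versions of the tables described above.
CREATE TABLE all_certs (fingerprint TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE seen (ip TEXT, fingerprint TEXT, fetchtime INTEGER,
                   path INTEGER, valid INTEGER);
INSERT INTO all_certs VALUES ('fp-a', 'CN=www.example.com');
INSERT INTO all_certs VALUES ('fp-b', 'CN=Example CA');
INSERT INTO seen VALUES ('192.0.2.1', 'fp-a', 1291161600, 0, 1);
INSERT INTO seen VALUES ('192.0.2.1', 'fp-b', 1291161600, 1, 1);
INSERT INTO seen VALUES ('192.0.2.2', 'fp-a', 1291248000, 0, 1);
""")

# Resolve each sighting to its certificate by joining on fingerprint.
rows = conn.execute("""
SELECT seen.ip, seen.path, all_certs.subject
FROM seen
JOIN all_certs ON seen.fingerprint = all_certs.fingerprint
ORDER BY seen.ip, seen.path
""").fetchall()
```

Nothing in the result says where a sighting came from, which is the gap the paragraph above points out.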
== Improving the schema ==
## Improving the schema
To make certificate chains easier to find and the certificate order easier to ascertain, and to support a wider description of data sources, we need to make chains a clearer concept. The seen table could be augmented to do this and to include the optional data from distributed observatory submissions.
Proposed additions to "seen"
...
...
@@ -188,7 +188,7 @@ When new submissions are available from the distributed observatory, an import s
Storing a parsed representation would make it a little simpler to do higher-risk certificate parsing operations on a separate machine from some of the other finishing tasks.