This is a privacy-preserving mechanism for checking HTTPS certificates against the EFF SSL Observatory to see if they might be malicious or known to have been compromised. The Observatory also uses this as a collection mechanism to detect certificates that are not visible on public IPs from our data center (we expect that most man-in-the-middle attacks are not publicly visible).
The feature can run from Firefox extensions like HTTPS Everywhere or Torbutton. It is opt-in, and we suggest running it over Tor, though you can run it without Tor if you prefer.
When a participating client sees a new HTTPS TLS certificate, it compares its fingerprint against a local list of the top T most popular TLS fingerprints. If it does not find the fingerprint in this list, it submits the entire certificate chain to the EFF SSL Observatory using Tor or other available proxies.
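As a rough sketch, the client-side check above might look like the following. This is illustrative rather than the actual extension code (which runs inside Firefox, not Python); `TOP_FINGERPRINTS` and `should_submit` are hypothetical names, and the fingerprint format follows the sha1+md5 scheme used in the schema later in this document.

```python
import hashlib

# Illustrative stand-in for the local list of the top T most popular
# TLS certificate fingerprints shipped with the client.
TOP_FINGERPRINTS: set = set()

def fingerprint(der_cert: bytes) -> str:
    # The Observatory keys certs on sha1(cert) + md5(cert), since
    # SHA-256 is not easily available in the client.
    return hashlib.sha1(der_cert).hexdigest() + hashlib.md5(der_cert).hexdigest()

def should_submit(chain: list) -> bool:
    # Submit the whole chain only when the leaf certificate's
    # fingerprint is absent from the local popularity list.
    return fingerprint(chain[0]) not in TOP_FINGERPRINTS
```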
To prevent submission of private infrastructure certificates, the client also maintains a list of fingerprints of the superset of root CAs trusted by all versions of Firefox, as well as popular 3rd party CAs such as CACert. If a certificate chain is rooted in a CA not in this set, it is assumed to be private, and the certificates it signs are not submitted. Additionally, if the Observatory server cannot resolve the domain name in question, the certificate will not be recorded by default (unless the client has opted in to submitting certificates for private DNS domains).
The certificate is POSTed to https://observatory.eff.org/submit_cert. The EFF also runs a Tor Exit Enclave on this host, which prevents certain circuit-activity correlation attacks against Tor. To prevent DNS requests for this hostname from leaving through a different exit, we will submit the requests to the IP rather than the hostname. [Actually, we are not currently addressing the server by IP address, and people have observed that since your exit node can already see the server IP address and the SNI handshake of your TLS connections, it can't learn anything more by seeing the padded size and timing of your Observatory submission. It does see that you are submitting to the Observatory, though.]
Client UI and Configuration Variables
As of May 2011, the planned UI is as follows:
The Observatory is opt-in. Its settings are controlled by a settings page. If the user has Torbutton installed, they will also receive a popup at install/upgrade time asking whether they want to opt in. If they don't have Torbutton installed, they will have to find the Observatory settings to turn it on.
POPUP WINDOW FOR USERS W/ TORBUTTON:
:HTTPS Everywhere can detect attacks against your browser by sending certificates to the Observatory. Turn it on?
(YES / NO / Details and privacy info / Ask me later )
DETAILED SETTINGS PAGE.
HTTPS Everywhere can use EFF's SSL Observatory. It does two things: (1) sends copies of HTTPS certificates to the Observatory, to help us detect 'man in the middle' attacks and improve the Web's security; and (2) lets us warn you about insecure connections or attacks on your browser.
When you visit https://www.example.com, the Observatory will learn that somebody visited that site, but will not know who or what page they looked at. Mouseover the options for further details:
(*) Check certificates using Tor for anonymity
( ) Check certificates even if Tor is not available (TOOLTIP: We will still try to keep the data anonymous, but this option is less secure)
Advanced options >>
[ ] Check/submit certificates for private DNS domains
[ ] Check/submit certificates with non-standard root CAs
[ ] Do not check/submit certificates for the following domain wildcards: [input field]
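The domain-wildcard exclusion option could be implemented with simple pattern matching. The sketch below assumes fnmatch-style wildcard semantics, which is an assumption for illustration rather than the extension's actual behavior:

```python
from fnmatch import fnmatch

# Hypothetical exclusion list, as entered in the input field above.
EXCLUDED_WILDCARDS = ["*.corp.example.com", "intranet.*"]

def is_excluded(domain: str) -> bool:
    # Skip checking/submitting certs for any matching domain.
    return any(fnmatch(domain, pattern) for pattern in EXCLUDED_WILDCARDS)
```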
= Submission API =
- domain: The value of the host piece of the URL. If there is a port specified, it is present after a ':'.
- server_ip: The server IP. May be -1 if unknown; may also be inaccurate due to API limitations.
- certlist: A JSON-encoded array of the base64 representation of each certificate in the chain.
- client_asn: The Autonomous System number of the client (or their Tor exit, if the cert was used through Tor). May be -1 if unknown.
- private_opt_in: Whether the client opts in to the logging of certs for non-public DNS names (either a "1" or "0").
- padding: An arbitrary amount of random data, used to pad the POST to a total of 4096*2^n^ bytes.
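Assembling a padded submission body under the field definitions above can be sketched as follows. The helper name is illustrative; hex padding is used here only because it survives URL encoding without changing length, so the final body size lands exactly on a 4096*2^n^ boundary.

```python
import base64
import json
import os
import urllib.parse

def build_submission(domain, server_ip, chain_der, client_asn=-1, private_opt_in=0):
    # Assemble the POST fields described above; names follow the API.
    fields = {
        "domain": domain,
        "server_ip": server_ip,
        "certlist": json.dumps([base64.b64encode(c).decode() for c in chain_der]),
        "client_asn": str(client_asn),
        "private_opt_in": str(private_opt_in),
        "padding": "",
    }
    body = urllib.parse.urlencode(fields)
    # Pad the POST to the next 4096 * 2^n boundary to mask its size.
    target = 4096
    while target < len(body):
        target *= 2
    pad_len = target - len(body)
    # Hex characters are not escaped by urlencode, so the final body
    # comes out to exactly `target` bytes.
    fields["padding"] = os.urandom(pad_len).hex()[:pad_len]
    return urllib.parse.urlencode(fields)
```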
Normally, the status code is 200, with the body being "1" or "0" (XXX OBSOLETE).
The return value is 1 if (cert_sha256 not in certs) and (private_opt_in or visible_in_DNS(domain))
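The return-value rule above can be sketched like this; the known-certs lookup and DNS check are stand-ins for the server's real implementation:

```python
def submission_response(cert_sha256, domain, private_opt_in, known_certs, visible_in_dns):
    # Returns "1" when the cert is new to the Observatory and either
    # the client opted in to private names or the domain resolves
    # publicly; "0" otherwise. visible_in_dns is a callable standing
    # in for the server-side DNS resolution check.
    is_new = cert_sha256 not in known_certs
    return "1" if is_new and (private_opt_in or visible_in_dns(domain)) else "0"
```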
If the certlist contains a certificate that the Observatory knows to be dangerous (e.g., revoked or using a broken key), the status code is 403 and a JSON object describing the problem(s) is returned, with the following structure:
If there is an error, an appropriate response code is set and the error message is included in the body.
Server Side Design
Input SQL schema
Current schema (this is still not final; we will probably change the reports table to be more efficient and comprehensive):
CREATE TABLE `certs` (
  `fp` binary(36) NOT NULL,
  `raw_cert` blob NOT NULL,
  `known_bad` varchar(255) NOT NULL,
  `bad_cert_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`fp`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `reports` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fp` binary(36) NOT NULL,
  `server_ip` varchar(39) NOT NULL,
  `domain` varchar(255) NOT NULL,
  `client_asn` int(11) NOT NULL DEFAULT '-1',
  `chain_fp` binary(32) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `fp` (`fp`),
  KEY `cfp` (`chain_fp`)
) ENGINE=MyISAM AUTO_INCREMENT=57479 DEFAULT CHARSET=utf8;

CREATE TABLE `chains` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `chain_fp` binary(32) NOT NULL,
  `first_seen` datetime DEFAULT NULL,
  `last_seen` datetime DEFAULT NULL,
  `count` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `chain_fp` (`chain_fp`)
) ENGINE=MyISAM AUTO_INCREMENT=20378 DEFAULT CHARSET=utf8;

CREATE TABLE `bad_cert` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `short_desc` varchar(255) NOT NULL,
  `long_desc` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE certs (
  fp binary(36) NOT NULL,  -- sha1(cert) + md5(cert); because sha256 is not easily available in the client
  raw_cert blob NOT NULL,
  known_bad varchar(255) NOT NULL,
  PRIMARY KEY (fp)
) ENGINE=MyISAM DEFAULT CHARSET=UTF8;

CREATE TABLE reports (
  id int(11) NOT NULL AUTO_INCREMENT,
  fp binary(36) NOT NULL,  -- sha1(cert) + md5(cert)
  server_ip varchar(39) NOT NULL,
  domain varchar(262) NOT NULL,  -- 256 chars for domain name, plus up to 7 for a port
  client_asn int(11) NOT NULL DEFAULT '-1',
  timestamp datetime,
  PRIMARY KEY (id),
  KEY (fp)
) ENGINE=MyISAM DEFAULT CHARSET=UTF8;
DB Integration With Observatory
The Observatory project's non-distributed system uses a schema that is partly dynamically generated. This is better suited to a system with discrete scans that complete infrequently, and it will need to be adjusted to handle the higher frequency of updates the distributed observatory produces, as well as to include information about the source of the certificate chains. During a data import operation, certificate chains are processed, the SSL messages are interpreted, certificates are extracted, and their contents are added to "certs" tables. More than one of these certs tables exists, allowing the process to be run over a period of time and in parallel. These certs tables are created by hack_parse.py and use BLOBs or TEXT for most data types (things like subject, issuer, and extensions). Validation of certificates and their chains happens in multiple passes, and is mostly coordinated by stich_tables.py, which also creates more usable tables like valid_certs. This "finishing process" involves a few other scripts, and results in timestamps, extraction of names, creation of other summary tables like roots, seen, and all_certs, and the addition of indexes. The process leaves behind the certs tables, which are still sometimes needed.
Key weaknesses impacting integration:
- Determining certificate chain order
- Dealing with data from different collection times
Currently the only reliable way to determine certificate chains is to group the certificates by the path from which they were loaded. This can be done in the individual certs tables, or on the seen table, but not in tables that hold unique certificates, like valid_certs or all_certs. Path is just the name of the results file that was the transcript of the SSL connection. The "id" values of the entries in the certs table are sequential with respect to any given path, and the order of the ids allows you to determine the order of the certificates as presented in the chain. This makes working with certificate chains and determining their position slow and burdensome, and it is unlikely to work with chains that lack a path, such as those from the distributed observatory.
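To illustrate the path-grouping approach described above, here is a toy reconstruction in SQLite. Table and column names are simplified stand-ins for the real certs tables, not the production schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE certs (id INTEGER PRIMARY KEY, path TEXT, fingerprint TEXT)")
con.executemany(
    "INSERT INTO certs (id, path, fingerprint) VALUES (?, ?, ?)",
    [
        # One path per SSL transcript; sequential ids give chain order.
        (1, "results/1.2.3.4:443", "leaf-fp"),
        (2, "results/1.2.3.4:443", "intermediate-fp"),
        (3, "results/1.2.3.4:443", "root-fp"),
        (4, "results/5.6.7.8:443", "other-leaf-fp"),
    ],
)

def chain_for(path):
    # Group by path and order by id, as described above.
    rows = con.execute(
        "SELECT fingerprint FROM certs WHERE path = ? ORDER BY id", (path,)
    )
    return [fp for (fp,) in rows]
```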
The "seen" table has an entry for each certificate, each time it is seen in a chain. Its fields are currently only IP, fingerprint, fetchtime, path and valid. This table has a lot of rows (12.67M in the December 2010 dataset). To determine what certificate the seen table is referring to a join on fingerprint is used. This isn't enough information to distinguish between submissions to a distributed observatory and scans from a central one, or handle ideas like source ASN.
Improving the schema
To make certificate chains easier to find, to make the certificate order easier to ascertain, and to support a wider description of data sources, we need to make chains a clearer concept. The seen table could be augmented to do this and to include the optional data from distributed observatory submissions.
Proposed additions to "seen"
- domain - varchar(255). Optional domain, available from the distributed observatory.
- client_asn - int(11). Optional ASN if known from the distributed observatory. Might include this for centralized scans too, as it seems useful.
- ChainID - int(11) (the type used for other numeric keys like certid). A value unique to each certificate chain, and shared by all the seen entries for certs in that chain.
- CertPosition - tinyint. A small number: 0 for the leaf, monotonically increasing down the cert chain. The combination of ChainID and CertPosition is unique.
- Source - Not sure how to represent this, but an indicator of where this cert came from (distributed observatory? which one? central scan?).
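The proposed additions can be sketched as a schema in SQLite (types transcribed from the MySQL proposals above; since the Source representation is still undecided, TEXT is used here purely as a placeholder):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE seen (
        fingerprint    TEXT NOT NULL,
        collectiontime TEXT,              -- proposed rename of fetchtime
        domain         TEXT,              -- optional, distributed observatory only
        client_asn     INTEGER DEFAULT -1,
        chain_id       INTEGER NOT NULL,  -- shared by all certs in a chain
        cert_position  INTEGER NOT NULL,  -- 0 = leaf, increasing toward the root
        source         TEXT               -- placeholder for the Source indicator
    )
""")
# The combination of ChainID and CertPosition is unique.
con.execute("CREATE UNIQUE INDEX chain_pos ON seen (chain_id, cert_position)")

def chain_fingerprints(chain_id):
    # Reconstruct a chain directly, without needing a path.
    rows = con.execute(
        "SELECT fingerprint FROM seen WHERE chain_id = ? ORDER BY cert_position",
        (chain_id,),
    )
    return [fp for (fp,) in rows]
```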
Proposed change to "seen"
- The "fetchtime" timestamp could be renamed to "collectiontime". It would remain as the time at which a cert chain was received.
When new submissions are available from the distributed observatory, an import script similar to hack_parse could process them into a certs table, and then re-run the same finishing process used to create valid_certs, validation information and other useful tables. The loading process could be altered to either:
- Store DER encodings of each cert in the DB
- Store text representations of OpenSSL's interpretation of each cert in the DB
Storing a parsed representation would make it a little simpler to do higher-risk certificate parsing operations on a separate machine from some of the other finishing tasks.