Commit f3f289f1 authored by Karsten Loesing

Temp commit: tweak exit list spec based on discussion.

parent 1eeefc46
+36 −68
@@ -13,128 +13,96 @@ This document defines the Tor exit list document format as written by Tor exit l

1. Document meta-format

Exit lists follow the same document meta-format as Tor descriptors.
Exit lists follow the same document meta-format as specified in Section 1.2 of the Tor directory protocol, version 3.

The highest level object is a Document, which consists of one or more Items. Every Item begins with a KeywordLine, followed by zero or more Objects. A KeywordLine begins with a Keyword, optionally followed by whitespace and more non-newline characters, and ends with a newline. A Keyword is a sequence of one or more characters in the set [A-Za-z0-9-]. An Object is a block of encoded data in pseudo-Privacy-Enhanced-Mail (PEM) style format: that is, lines of encoded data MAY be wrapped by inserting an ascii linefeed ("LF", also called newline, or "NL" here) character (cf. RFC 4648 §3.1). When line wrapping, implementations MUST wrap lines at 64 characters. Upon decoding, implementations MUST ignore and discard all linefeed characters.

More formally:

NL = The ascii LF character (hex value 0x0a).
Document ::= (Item | NL)+
Item ::= KeywordLine Object*
KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL
Keyword = KeywordChar+
KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'
ArgumentChar ::= any printing ASCII character except NL.
WS = (SP | TAB)+
Object ::= BeginLine Base64-encoded-data EndLine
BeginLine ::= "-----BEGIN " Keyword "-----" NL
EndLine ::= "-----END " Keyword "-----" NL

A Keyword may not be "-----BEGIN".

The BeginLine and EndLine of an Object must use the same keyword.

When interpreting a Document, software MUST ignore any KeywordLine that starts with a keyword it doesn't recognize; future implementations MUST NOT require current clients to understand any KeywordLine not currently described.
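
As an illustration of the grammar above, the following Python sketch parses a single KeywordLine and skips lines whose keyword is not recognized. The names parse_keyword_line and recognized_items are hypothetical and not part of the spec. The sketch ignores Objects, and it accepts any non-newline argument characters, which is more permissive than ArgumentChar.

```python
import re

# KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL
# Keyword characters are [A-Za-z0-9-]; WS is one or more spaces or tabs.
KEYWORD_LINE = re.compile(r"^([A-Za-z0-9-]+)(?:[ \t]+(.+))?$")


def parse_keyword_line(line):
    """Return (keyword, arguments) for one line without its NL, or None."""
    match = KEYWORD_LINE.match(line)
    # "-----BEGIN" consists only of keyword characters but is not a Keyword.
    if match is None or match.group(1) == "-----BEGIN":
        return None
    return match.group(1), match.group(2) or ""


def recognized_items(text, known):
    """Yield parsed lines whose keyword is in `known`; ignore all others."""
    for line in text.split("\n"):
        parsed = parse_keyword_line(line)
        if parsed is not None and parsed[0] in known:
            yield parsed
```

A caller would pass the set of keywords it understands, so that lines added by future versions are skipped rather than rejected.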

In our document descriptions below, we tag Items with a multiplicity in brackets. Possible tags are:

"At start, exactly once": These items MUST occur in every instance of the document type, and MUST appear exactly once, and MUST be the first item in their documents.

"Exactly once": These items MUST occur exactly one time in every instance of the document type.

"At end, exactly once": These items MUST occur in every instance of the document type, and MUST appear exactly once, and MUST be the last item in their documents.

"At most once": These items MAY occur zero or one times in any instance of the document type, but MUST NOT occur more than once.

"Any number": These items MAY occur zero, one, or more times in any instance of the document type.

"Once or more": These items MUST occur at least once in any instance of the document type, and MAY occur more.

For forward compatibility, each item MUST allow extra arguments at the end of the line unless otherwise noted. Whenever an item DOES NOT allow extra arguments, we will tag it with "no extra arguments".
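
The multiplicity tags above can be checked mechanically. The following Python sketch is one possible way to do so, assuming the tags are written in lower case and the keywords of one document (or one exit list entry) are given in order. The names BOUNDS and check_multiplicity are hypothetical.

```python
from collections import Counter

# Allowed (minimum, maximum) occurrence counts per tag; None means unbounded.
BOUNDS = {
    "exactly once": (1, 1),
    "at most once": (0, 1),
    "any number": (0, None),
    "once or more": (1, None),
}


def check_multiplicity(keywords, rules):
    """keywords: keywords in document order; rules: keyword -> tag.

    Returns a list of human-readable violations (empty if there are none).
    """
    counts = Counter(keywords)
    problems = []
    for keyword, tag in rules.items():
        position = None
        if tag.startswith("at start, "):
            position, tag = 0, tag[len("at start, "):]
        elif tag.startswith("at end, "):
            position, tag = -1, tag[len("at end, "):]
        low, high = BOUNDS[tag]
        n = counts[keyword]
        if n < low or (high is not None and n > high):
            problems.append(f"{keyword}: found {n} occurrence(s), expected {tag}")
        elif position is not None and keywords[position] != keyword:
            where = "first" if position == 0 else "last"
            problems.append(f"{keyword}: must be the {where} item")
    return problems
```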
Timestamps contained in exit lists use the same date/time format as timestamps in Tor directory protocol data formats, YYYY-MM-DD SP HH:MM:SS.
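
A minimal Python sketch of reading such a timestamp follows. The function name parse_timestamp is hypothetical. Attaching UTC is an assumption based on the item descriptions in this document, which all state their times in UTC.

```python
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' and mark the result as UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)
```

Malformed input raises ValueError, which a reader of exit lists may want to catch per line rather than per document.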


2. Document header

"ExitList" SP version NL
"exit-list" SP version NL

[At most once in version 1.]
[At start, exactly once in version 2 or later.]
[At start, exactly once, only version 2 or later.]

Version of this document format.

"ScannerIdentity" SP identity NL
"identity" SP identity NL

[At most once, version 2 or later.]

Identity of the exit scanner host. Identity is a fingerprint (a HASH_LEN-byte hash of the ASN.1-encoded public key, encoded in hex) for a router's identity key.

"ScannerAddress" SP address NL
"ScannerAddress6" SP address NL
"address4" SP address NL
"address6" SP address NL

[Any number.]
[Any number, version 2 or later.]

Address of the exit scanner host, which is an IPv4 address, represented as a dotted quad ("ScannerAddress" line), or an IPv6 address, surrounded by square brackets ("ScannerAddress6" line).
Address of the exit scanner host, which is an IPv4 address, represented as a dotted quad ("address4" line), or an IPv6 address, surrounded by square brackets ("address6" line).
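
The two address forms could be read as in the following Python sketch, which uses the standard ipaddress module. The function name parse_address is hypothetical, and the sketch covers only the version 2 keywords.

```python
import ipaddress


def parse_address(keyword, value):
    """Parse the argument of an "address4" or "address6" line."""
    if keyword == "address4":
        # Dotted quad, e.g. 192.0.2.1
        return ipaddress.IPv4Address(value)
    if keyword == "address6":
        # IPv6 addresses are surrounded by square brackets.
        if not (value.startswith("[") and value.endswith("]")):
            raise ValueError("IPv6 address must be surrounded by square brackets")
        return ipaddress.IPv6Address(value[1:-1])
    raise ValueError(f"not an address keyword: {keyword}")
```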

"ScannerLocation" SP country SP asn NL
"location" SP country SP asn NL

[At most once.]
[At most once, version 2 or later.]

Location of this exit scanner host. Country is loosely based on ISO 3166-1 alpha-2 with possible extensions added by whichever data source is used for resolving the host's IP address. asn is the autonomous system number.
Location of this exit scanner host: country part is loosely based on ISO 3166-1 alpha-2 with possible extensions added by whichever data source is used for resolving the host's IP address; asn is the autonomous system number.

"ScannerContact" SP contact NL
"contact" SP contact NL

[At most once.]
[At most once, version 2 or later.]

A human-readable string describing a way to contact the exit scanner's administrator, preferably including an email address and a PGP key fingerprint.
A human-readable string describing a way to contact the exit scanner's administrator, preferably including an email address and a PGP key fingerprint. This line is similar to the "contact" line specified in the Tor directory protocol, version 3.

"ScannerSoftware" SP software NL
"software" SP software NL

[At most once.]
[At most once, version 2 or later.]

A human-readable string describing the name and version of the software that created this exit list.

"Created" SP YYYY-MM-DD SP HH:MM:SS NL
"created" SP created NL

[At most once.]
[At most once, version 2 or later.]

The time, in UTC, when this exit list was generated. The software generating this exit list may use its own schedule for generating exit lists. Typically, it would generate a new list every hour, but this is not required.

"Downloaded" SP YYYY-MM-DD SP HH:MM:SS NL
"Downloaded" SP downloaded NL

[At most once, version 1 only.]

[At most once.]
The time, in UTC, when this exit list was downloaded from the exit scanner. Only included by the software downloading the exit list if it doesn't already include a "created" timestamp.

The time, in UTC, when this exit list was downloaded from the exit scanner. Only included by the software downloading the exit list if it doesn't already include a "Created" timestamp.

3. Document body

The document body of an exit list contains zero or more exit list entries. Multiplicities refer to the exit list entry.

"ExitNode" SP identity NL
"ExitNode" SP identity NL    [version 1]
"exit-node" SP identity NL   [version 2 or later]

[At start, exactly once.]

Identity is a fingerprint (a HASH_LEN-byte hash of the ASN.1-encoded public key, encoded in hex) for a router's identity key.

"Published" SP YYYY-MM-DD SP HH:MM:SS NL
"Published" SP published NL

[Exactly once.]
[Exactly once, version 1 only.]

The time, in UTC, when the last known descriptor was published by the router. The software may use this timestamp to decide not to perform another test until a newer descriptor arrives.

"LastStatus" SP YYYY-MM-DD SP HH:MM:SS NL
"LastStatus" SP received NL

[Exactly once.]
[Exactly once, version 1 only.]

The time, in UTC, when the software last received a network status update for this router. This time typically does not match statuses' publication or valid-after time. The software may use this timestamp to decide when to discard a router.

"ExitAddress" SP address SP YYYY-MM-DD SP HH:MM:SS NL
"ExitAddress6" SP address SP YYYY-MM-DD SP HH:MM:SS NL
"ExitAddress" SP address SP scanned NL     [version 1]
"exit-address4" SP address SP scanned NL   [version 2 or later]
"exit-address6" SP address SP scanned NL   [version 2 or later]

[Once or more.]

An address used by the router as exit address and the time, in UTC, when this address was last seen in an exit scan. Address can be an IPv4 address, represented as a dotted quad ("ExitAddress" line), or an IPv6 address, surrounded by square brackets ("ExitAddress6" line).
An address used by the router as exit address and the time, in UTC, when this address was last seen in an exit scan. Address can be an IPv4 address, represented as a dotted quad ("ExitAddress" or "exit-address4" line), or an IPv6 address, surrounded by square brackets ("exit-address6" line).
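
To show how the body items fit together, the following Python sketch groups body lines into exit list entries, accepting both the version 1 and the version 2 keywords. The function name split_entries and the dictionary layout are hypothetical. Timestamps and addresses are kept as strings, and all other lines are ignored.

```python
def split_entries(lines):
    """Group body lines into entries; each entry starts at an
    "ExitNode" (version 1) or "exit-node" (version 2 or later) line."""
    entries = []
    for line in lines:
        keyword, _, rest = line.partition(" ")
        if keyword in ("ExitNode", "exit-node"):
            entries.append({"identity": rest, "addresses": []})
        elif keyword in ("ExitAddress", "exit-address4", "exit-address6") and entries:
            # The address is the first argument; the scan time follows it.
            address, _, scanned = rest.partition(" ")
            entries[-1]["addresses"].append((address, scanned))
    return entries
```

The fingerprints and addresses used to try this out would be made-up values, for example addresses from the documentation ranges 192.0.2.0/24 and 2001:db8::/32.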


4. Document footer

empty
Exit lists currently do not contain a document footer. They end with the last contained exit list entry.