= Part 1: Analysis of Referenced Descriptor Completeness
This page summarizes the current findings.
The discussion and questions can be found [https://trac.torproject.org/projects/tor/ticket/18798 here].
== Log Entries
The archiving component of CollecTor logs the missing descriptors of various types in a special format.
The following log entry explanation was extracted from Karsten's description in [https://trac.torproject.org/projects/tor/ticket/18798#comment:2 ticket 18798].
1. The first line means that there's a microdescriptor with digest `38F2..` missing from the microdescriptor consensus with valid-after time `2016-04-11 22:00:00`. That missing microdescriptor adds a value of `0.0279` to the total missing descriptor count which is then `0.0279`. The idea is to only warn if that total value passes `1.0`.
1. The second line says that the same missing microdescriptor is also referenced from the microdescriptor consensus with valid-after time `2016-04-11 23:00:00`. Given that we shouldn't double-count that missing descriptor, we're not increasing the total count there.
1. The third line mentions another microdescriptor with digest `597C..` that is missing, and in this case it's referenced from the microdescriptor consensus with valid-after time `2016-04-11 23:00:00`. That one raises the total count by another `0.0279` to then `0.0558`.
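The counting rule described in these three points can be sketched as follows. This is only an illustration, not CollecTor's actual code: the function name `total_missing_weight`, the tuple layout, and the constant `WARN_THRESHOLD` are assumptions made for this example, and the weight `0.0279` is taken as given from the log explanation above.

```python
# Hypothetical sketch: each distinct missing digest contributes its weight once,
# no matter how many consensuses reference it.
WARN_THRESHOLD = 1.0  # per the explanation above: only warn if the total passes 1.0


def total_missing_weight(entries):
    """Sum the weights of distinct missing descriptors.

    entries: iterable of (digest, valid_after, weight) tuples
             (an assumed, already-parsed form of the log lines).
    """
    seen = set()
    total = 0.0
    for digest, _valid_after, weight in entries:
        if digest in seen:
            continue  # already counted via another referrer, do not double-count
        seen.add(digest)
        total += weight
    return total


# The three log lines explained above, with digests abbreviated as in the text.
entries = [
    ("38F2..", "2016-04-11 22:00:00", 0.0279),
    ("38F2..", "2016-04-11 23:00:00", 0.0279),
    ("597C..", "2016-04-11 23:00:00", 0.0279),
]
total = total_missing_weight(entries)
should_warn = total > WARN_THRESHOLD
```

With these three entries the total comes to roughly `0.0558`, well below the warning threshold.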
Log entries listing missing descriptors come in the following types:
- `S-`: a server descriptor references an extra-info descriptor that is missing,
- `V-`: a vote references a server descriptor that we're missing,
- `C-`: a consensus references a server descriptor that we're missing, and
- `M-`: a microdescriptor consensus references a microdescriptor that is missing (see above).
== Method
The missing descriptor log entries are parsed and collected in sets according to the time-stamp of the log entry and the referrer type.
Using sets, we avoid counting a missing descriptor more than once when it is referenced by multiple entities of the same type (e.g. different votes or different microdescriptor consensuses). Missing server descriptors are listed separately for votes and consensuses, i.e., a missing server descriptor referenced by both a vote and a consensus increases the count for both types.
From these sets two numbers are calculated for each time-stamp and referrer type:
* the number of currently missing descriptors of a certain type belonging to a certain type of referrer and
* the number of new missing descriptors for each time-stamp compared to the previous run.
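A minimal sketch of this set-based counting is shown below. It assumes the log entries have already been parsed into `(timestamp, referrer_type, digest)` tuples. That layout, the function name `count_missing`, and the sample digests are hypothetical and not taken from the actual analysis scripts.

```python
from collections import defaultdict


def count_missing(records):
    """Return {(timestamp, referrer_type): (currently_missing, newly_missing)}.

    records: iterable of (timestamp, referrer_type, digest) tuples, where
             timestamps are strings that sort chronologically (assumed format).
    """
    # One set per time-stamp and referrer type, so duplicates collapse.
    by_run = defaultdict(set)
    for timestamp, referrer_type, digest in records:
        by_run[(timestamp, referrer_type)].add(digest)

    previous = {}  # referrer_type -> set seen in the previous run of that type
    results = {}
    for timestamp, referrer_type in sorted(by_run):
        current = by_run[(timestamp, referrer_type)]
        new = current - previous.get(referrer_type, set())
        results[(timestamp, referrer_type)] = (len(current), len(new))
        previous[referrer_type] = current
    return results


records = [
    ("2016-04-11 22:00:00", "M", "digest-a"),
    ("2016-04-11 22:00:00", "M", "digest-a"),  # duplicate reference, counted once
    ("2016-04-11 23:00:00", "M", "digest-a"),
    ("2016-04-11 23:00:00", "M", "digest-b"),
    ("2016-04-11 23:00:00", "V", "digest-c"),
    ("2016-04-11 23:00:00", "C", "digest-c"),  # same descriptor, counted per type
]
results = count_missing(records)
```

One simplification to be aware of: a run in which a referrer type has no missing descriptors produces no entry here, so the next run is compared against the last run that did have entries.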
== Data
The log files cover 2016-03-08 to 2016-04-13, with gaps from 2016-03-09 to 2016-03-18 and from 2016-03-24 to 2016-03-31.
There was one known incident of a full server hard drive that prevented storing descriptors around 2016-03-19.
Another peak in missing descriptors is visible around 2016-04-01, which is also explained by a full hard drive.
=== Deciles
The following deciles are calculated without excluding the peaks:
Each of the following diagrams shows the total number of missing descriptors in a lighter color and the number of newly encountered missing descriptors in a darker color.
The y-axis depicts the count, the x-axis the time of measurement.
Counts are discrete, so the lines connecting the data points only serve to make the plots easier to read; they are **not** an interpolation for the time between measurements.