Raw import from Trac using Trac markup language. Authored by Alexander Hansen Færøy.
[[TOC]]
= Part 1: Analysis of Referenced Descriptor Completeness
This page summarizes the current findings.
The discussion and questions can be found [https://trac.torproject.org/projects/tor/ticket/18798 here].
== Log Entries
The archiving component of CollecTor logs the missing descriptors of various types in a special format.
The following log entry explanation was extracted from Karsten's description in [https://trac.torproject.org/projects/tor/ticket/18798#comment:2 ticket 18798].
{{{
M-2016-04-11T22:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0279 -> 0.0279)
M-2016-04-11T23:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0279 -> 0.0279)
M-2016-04-11T23:00:00Z -> D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95 (0.0279 -> 0.0558)
M-2016-04-12T00:00:00Z -> D-38F20E16457647CCFF5BD131692D5FCA129E87DC210B456DA983AB291141C85D (0.0280 -> 0.0558)
M-2016-04-12T00:00:00Z -> D-597C4455AF049B147337BBFF35CE4817676339FF5C94E971A05D416FD1A2DD95 (0.0280 -> 0.0558)
}}}
1. The first line means that a microdescriptor with digest `38F2..`, referenced by the microdescriptor consensus with valid-after time `2016-04-11 22:00:00`, is missing. That missing microdescriptor adds a value of `0.0279` to the total missing descriptor count, which is then `0.0279`. The idea is to only warn if that total value passes `1.0`.
1. The second line says that the same missing microdescriptor is also referenced from the microdescriptor consensus with valid-after time `2016-04-11 23:00:00`. Given that we shouldn't double-count that missing descriptor, we're not increasing the total count there.
1. The third line mentions another microdescriptor with digest `597C..` that is missing, and in this case it's referenced from the microdescriptor consensus with valid-after time `2016-04-11 23:00:00`. That one raises the total count by another `0.0279` to then `0.0558`.
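The accumulation described above can be sketched in a few lines of Python. This is only an illustration of the "do not double-count" rule, not CollecTor's actual code; the function name `running_totals`, the tuple layout and the constant `WARN_THRESHOLD` are assumptions made for this example, and the digests are abbreviated as in the text.

```python
WARN_THRESHOLD = 1.0  # the text says to warn only if the total passes 1.0


def running_totals(entries):
    """Yield the cumulative weight after each (referrer, missing, weight) entry.

    A missing descriptor contributes its weight only the first time it is seen.
    """
    seen = set()
    total = 0.0
    for _referrer, missing, weight in entries:
        if missing not in seen:
            seen.add(missing)
            total += weight
        yield total


# The five sample entries from the log excerpt, digests abbreviated.
entries = [
    ("M-2016-04-11T22:00:00Z", "D-38F2..", 0.0279),
    ("M-2016-04-11T23:00:00Z", "D-38F2..", 0.0279),
    ("M-2016-04-11T23:00:00Z", "D-597C..", 0.0279),
    ("M-2016-04-12T00:00:00Z", "D-38F2..", 0.0280),
    ("M-2016-04-12T00:00:00Z", "D-597C..", 0.0280),
]

totals = [round(t, 4) for t in running_totals(entries)]
# totals == [0.0279, 0.0279, 0.0558, 0.0558, 0.0558]
should_warn = totals[-1] > WARN_THRESHOLD
```

The resulting totals match the right-hand values in parentheses of the five sample entries.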
Other examples of log entries listing missing descriptors are:
{{{
C-2016-03-19T07:00:00Z -> S-BD9E2444C8416A29467463F6B228CEB75B1216B7 (0.0281 -> 0.0281)
S-000A13E991700CB0A356CD08DDC0CDAB022F8B7E -> E-8A8DB3818A2CEE9D2844F8A9AD6FB89E04CFA7D1 (0.0100 -> 8.6512)
V-2016-03-19T09:00:00Z-14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4 -> S-010612B70E18CB3E0CCA72A464E8FD683FDF029B (0.0254 -> 15.5266)
}}}
The prefix of the referrer on the left-hand side identifies which of the four reference types an entry describes:
- `S-`: a server descriptor references an extra-info descriptor that is missing,
- `V-`: a vote references a server descriptor that we're missing,
- `C-`: a consensus references a server descriptor that we're missing, and
- `M-`: a microdescriptor consensus references a microdescriptor that is missing (see above).
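A minimal sketch of how such an entry could be split into its parts is shown below. It assumes that a line consists only of the payload shown in the excerpts (real log lines may carry an additional log time-stamp or level prefix that is not shown here); the names `LINE_RE`, `REFERRER_TYPES` and `parse_entry` are hypothetical.

```python
import re

# referrer -> missing descriptor (weight of this entry -> running total)
LINE_RE = re.compile(
    r"^(?P<referrer>\S+) -> (?P<missing>\S+) "
    r"\((?P<weight>\d+\.\d+) -> (?P<total>\d+\.\d+)\)$"
)

# Referrer prefixes as listed above.
REFERRER_TYPES = {
    "S": "server descriptor",
    "V": "vote",
    "C": "consensus",
    "M": "microdescriptor consensus",
}


def parse_entry(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    referrer = match.group("referrer")
    if referrer[0] not in REFERRER_TYPES:
        return None
    return {
        "referrer_type": referrer[0],
        "referrer": referrer,
        "missing": match.group("missing"),
        "weight": float(match.group("weight")),
        "total": float(match.group("total")),
    }


entry = parse_entry(
    "S-000A13E991700CB0A356CD08DDC0CDAB022F8B7E -> "
    "E-8A8DB3818A2CEE9D2844F8A9AD6FB89E04CFA7D1 (0.0100 -> 8.6512)"
)
```

Here `entry["referrer_type"]` is `"S"`, i.e. a server descriptor referencing a missing extra-info descriptor.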
== Method
The missing descriptor log entries are parsed and collected in sets according to the time-stamp of the log entry and the referrer type.
Using sets avoids counting a missing descriptor more than once when it is referenced by multiple entities (e.g., different votes or different microdescriptor consensuses). Missing server descriptors are counted separately for votes and consensuses, i.e., a missing server descriptor referenced by both a vote and a consensus increases the count for both types.
From these sets two numbers are calculated for each time-stamp and referrer type:
* the number of currently missing descriptors of a certain type belonging to a certain type of referrer and
* the number of new missing descriptors for each time-stamp compared to the previous run.
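The two numbers can be derived from the sets roughly as follows. This is a sketch under assumptions, not the script that was actually used: the function name `count_missing`, the input layout `(run_time, referrer_type, missing_id)` and the sample identifiers are all hypothetical, and run time-stamps are assumed to sort chronologically as strings.

```python
from collections import defaultdict


def count_missing(observations):
    """Map (run_time, referrer_type) to (currently missing, newly missing).

    observations: iterable of (run_time, referrer_type, missing_id) tuples.
    """
    sets = defaultdict(set)
    for run_time, referrer_type, missing_id in observations:
        # A set keeps each missing descriptor once per run and referrer type.
        sets[(run_time, referrer_type)].add(missing_id)

    run_times = sorted({run_time for run_time, _ in sets})
    referrer_types = sorted({rtype for _, rtype in sets})

    counts = {}
    for rtype in referrer_types:
        previous = set()
        for run_time in run_times:
            current = sets.get((run_time, rtype), set())
            # "New" means missing now but not missing in the previous run.
            counts[(run_time, rtype)] = (len(current), len(current - previous))
            previous = current
    return counts


# Hypothetical sample: two runs, descriptor ids are placeholders.
observations = [
    ("run-1", "V", "S-aaa"),
    ("run-1", "V", "S-aaa"),  # same descriptor referenced by a second vote
    ("run-1", "C", "S-aaa"),  # also counted for the consensus type
    ("run-2", "V", "S-aaa"),
    ("run-2", "V", "S-bbb"),
]
counts = count_missing(observations)
```

In this sample the vote type has one missing descriptor in the first run and two in the second, of which one is new; the consensus type has one in the first run and none in the second.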
== Data
The log files cover the period from 2016-03-08 to 2016-04-13, with gaps from 2016-03-09 to 2016-03-18 and from 2016-03-24 to 2016-03-31.
There was one known incident of a full server hard drive that prevented storing descriptors around 2016-03-19.
Another peak in missing descriptors is visible around 2016-04-01, which is also explained by a full hard drive.
=== Deciles
The following deciles are calculated without excluding the peaks:
||referenced by||0%||10%||20%||30%||40%||50%||60%||70%||80%||90%|| 100%||
||consensus|| 0|| 0|| 0|| 0|| 0|| 0|| 0|| 0|| 0|| 0|| 1339||
||votes|| 0|| 0|| 0|| 0|| 0|| 0|| 0|| 0|| 1|| 3|| 2375||
||server|| 0|| 2|| 2|| 3|| 7|| 11|| 16|| 19 || 26|| 35|| 55||
||microconsensus|| 0|| 3|| 4|| 5|| 7|| 8|| 12|| 15|| 26|| 56|| 798||
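For reference, deciles of a list of per-run counts can be computed as in the following sketch. The interpolation method used for the table above is not stated, so the choice of `method="inclusive"` here is an assumption, and the helper name `deciles` is hypothetical.

```python
import statistics


def deciles(counts):
    """Return the 0%, 10%, ..., 100% values of a list of counts.

    Needs at least two data points; the 0% and 100% entries are the
    minimum and the maximum.
    """
    data = sorted(counts)
    # statistics.quantiles returns the nine inner cut points for n=10.
    inner = statistics.quantiles(data, n=10, method="inclusive")
    return [data[0], *inner, data[-1]]


# Toy input, not taken from the log files.
example = deciles(list(range(11)))
```

Because the peaks are not excluded, the 100% column simply reports the largest count observed for each referrer type.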
== Graphs
Each of the following diagrams shows the total number of missing descriptors in a lighter color and the number of newly encountered missing descriptors in a darker color.
The y-axis depicts the count, the x-axis the time of measurement.
Counts are discrete; the lines connecting the data points are only a visual aid and are **not** an interpolation for the time between measurements.
=== Total Picture
[[Image(mdesc-all-0308-0413-2016.png, width=900)]]
=== April 1st Closeup
[[Image(mdesc-0401-2016.png, width=900)]]
=== April 2nd to April 13th
[[Image(mdesc-0402-0413-2016.png, width=900)]]