Add note to dir-spec.txt that cell-stats can be inaccurate
Rob and I found that cell-stats can be inaccurate under certain circumstances. We should update dir-spec.txt to say this. Making cell-stats more accurate should wait until we're clearer about what other cell-stats we'd like to see.
See the attached diagram for an example. For cell stats, we only look at processed cells, shown here in purple. For the mean queue length (cell-queued-cells), we sum up the waiting times of processed cells, here 750 + 750, and divide by the length of the measurement interval. This circuit was opened before the statistics interval and not closed during it, so the interval length is 1000. As a result, we arrive at a mean queue length of 1.5. That's not quite correct. As one can see, there is one cell in the queue most of the time, with only two short phases in which there are two cells.
The correct way to calculate the mean queue length would be to sum up only the waiting times within the statistics interval, and to include the green cell that was not processed during the interval. The result would be a mean queue length of 1.25.
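The difference between the two calculations can be sketched in Python. The `Cell` type, the function names, and the timestamps below are hypothetical: the timestamps are not read off the diagram. They were chosen so that the queue holds one cell most of the time and two cells during two short phases, with two processed cells waiting 750 each and one cell still queued at the end of an interval of length 1000.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    enqueued: float            # time the cell entered the circuit queue
    dequeued: Optional[float]  # time it was processed, or None if still queued


def current_mean_queue_length(cells: List[Cell], start: float, end: float) -> float:
    """Current approach (as described above): sum the full waiting times of
    cells processed within [start, end] and divide by the interval length."""
    processed = [c for c in cells
                 if c.dequeued is not None and start <= c.dequeued <= end]
    total_wait = sum(c.dequeued - c.enqueued for c in processed)
    return total_wait / (end - start)


def clipped_mean_queue_length(cells: List[Cell], start: float, end: float) -> float:
    """Proposed approach: count only the part of each cell's wait that falls
    inside [start, end], including cells still queued at the interval end."""
    total = 0.0
    for c in cells:
        leave = end if c.dequeued is None else min(c.dequeued, end)
        overlap = leave - max(c.enqueued, start)
        if overlap > 0:  # skip cells that never waited inside the interval
            total += overlap
    return total / (end - start)


# Hypothetical timestamps, interval [0, 1000].
cells = [
    Cell(enqueued=-500, dequeued=250),  # processed, queued before the interval
    Cell(enqueued=150, dequeued=900),   # processed within the interval
    Cell(enqueued=750, dequeued=None),  # "green" cell, not processed in time
]
print(current_mean_queue_length(cells, 0, 1000))  # 1.5
print(clipped_mean_queue_length(cells, 0, 1000))  # 1.25
```

Clipping each wait to the interval boundaries turns the sum into the time integral of the queue length over the interval, which is why it matches the "one cell most of the time" picture.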
Similarly, the mean waiting time (cell-time-in-queue) is currently defined as the mean waiting time of processed cells, so (750 + 750) / 2 = 750. We might also consider only waiting times inside the observed interval and include non-processed cells. However, this is more complicated to calculate in a meaningful way.
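A rough Python sketch of why the waiting-time case is harder, assuming the same hypothetical timestamps as in a queue-length example (none of them come from the diagram). The "naive clipped" variant is only one possible interpretation, not a proposed definition: it averages truncated waits (the cell queued before the interval and the cell still queued at its end) together with a complete one, so the result is hard to interpret as a waiting time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    enqueued: float            # time the cell entered the circuit queue
    dequeued: Optional[float]  # time it was processed, or None if still queued


def current_mean_time_in_queue(cells: List[Cell], start: float, end: float) -> float:
    """Current definition: mean full waiting time of cells processed in [start, end]."""
    waits = [c.dequeued - c.enqueued for c in cells
             if c.dequeued is not None and start <= c.dequeued <= end]
    return sum(waits) / len(waits) if waits else 0.0


def naive_clipped_mean_time_in_queue(cells: List[Cell], start: float, end: float) -> float:
    """Hypothetical alternative: average only the in-interval part of each wait,
    including unprocessed cells. Mixes complete and truncated waits."""
    waits = []
    for c in cells:
        leave = end if c.dequeued is None else min(c.dequeued, end)
        overlap = leave - max(c.enqueued, start)
        if overlap > 0:
            waits.append(overlap)
    return sum(waits) / len(waits) if waits else 0.0


cells = [Cell(-500, 250), Cell(150, 900), Cell(750, None)]
print(current_mean_time_in_queue(cells, 0, 1000))                   # 750.0
print(round(naive_clipped_mean_time_in_queue(cells, 0, 1000), 1))   # 416.7
```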
These inaccuracies are hardly relevant for 24-hour intervals, because only a small fraction of circuits are open across two such intervals. But they matter for shorter intervals, like those in Rob's simulated Shadow networks. We think that adding a note to dir-spec.txt now, and possibly coming up with more accurate cell-stats later, should be fine.
Please find branch cell-stats-note in my public torspec repository.