Analyze number of bytes spent on answering directory requests
The new code for counting the number of bytes spent on answering directory requests was merged into 0.2.2.15-alpha, and since August 25, 2010, at least 10 % of all relays have upgraded to that version. Time to pretend these 10 % of relays are a representative subset of all relays/directory mirrors.
I attached four graphs to this ticket:
- dir-bytes-perc-2010-08-31.png shows the percentage of relays reporting directory bytes. This percentage crossed the 10 % line on August 25, which is why the subsequent graphs start on this date.
- dir-bytes-total-2010-08-31.png shows the summed {read,write}-history lines of all extra-info descriptors as solid lines and the summed and normalized dirreq-{read,write}-history lines as dashed lines. It's obvious that a directory mirror writes more directory bytes than it reads, because it receives short HTTP-like requests and replies with much larger directory objects. What wasn't obvious to me before is that relays also write more bytes in total than they read. This difference could be roughly the same as the difference between written and read directory bytes.
- dir-bytes-diff-2010-08-31.png again shows written and read directory bytes, plus the difference between written and read directory bytes (dashed blue line) and the difference between total written and total read bytes (solid purple line). If the theory were correct that the difference between total written and total read bytes can be explained by answering directory requests, the blue and purple lines would overlap. It could be that we need more than 20 % of all relays to report directory bytes. Or there could be another explanation for the remaining difference.
- dir-bytes-frac-2010-08-31.png finally gives an early answer to the original question: Relays spend roughly 0.2 % of read bytes and 3.5 % of written bytes on answering directory requests.
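For readers who want to reproduce numbers like these, here is a minimal sketch of the arithmetic behind the four graphs, not the script that actually produced them. It assumes each extra-info descriptor is available as a plain string and that the history lines have the form "keyword YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,..." as described in Tor's directory protocol specification. As a hypothetical choice, "normalized" is taken to mean dividing the reporting relays' dirreq sums by the fraction of relays that report them, which is the "representative subset" assumption from above. The sketch also ignores time binning, so it yields one number per set of descriptors rather than a daily time series. The names parse_history_line and summarize are made up for illustration.

```python
# Keywords of the history lines we care about in extra-info descriptors.
HISTORY_KEYS = ("read-history", "write-history",
                "dirreq-read-history", "dirreq-write-history")


def parse_history_line(line):
    """Parse 'keyword YYYY-MM-DD HH:MM:SS (NSEC s) NUM,NUM,...'.

    Returns (keyword, total bytes over all intervals on the line).
    A line without any values counts as 0 bytes.
    """
    parts = line.split()
    keyword = parts[0]
    # parts[1:5] are date, time, '(NSEC' and 's)'; parts[5] holds the values.
    values = parts[5] if len(parts) > 5 else ""
    return keyword, sum(int(v) for v in values.split(",") if v)


def summarize(descriptors):
    """Compute the quantities shown in the four graphs (hypothetical sketch).

    descriptors: list of extra-info descriptor texts, one per relay.
    """
    totals = dict.fromkeys(HISTORY_KEYS, 0)
    reporting = 0
    for desc in descriptors:
        found = dict(parse_history_line(line) for line in desc.splitlines()
                     if line.split(" ", 1)[0] in HISTORY_KEYS)
        # Total bytes are summed over all relays.
        totals["read-history"] += found.get("read-history", 0)
        totals["write-history"] += found.get("write-history", 0)
        # Directory bytes are summed only over relays that report both lines.
        if "dirreq-read-history" in found and "dirreq-write-history" in found:
            reporting += 1
            totals["dirreq-read-history"] += found["dirreq-read-history"]
            totals["dirreq-write-history"] += found["dirreq-write-history"]
    if reporting == 0:
        raise ValueError("no relay reports directory bytes")
    share = reporting / len(descriptors)
    # Assumption: reporting relays are representative, so extrapolate
    # their directory bytes to all relays by dividing by their share.
    dir_read = totals["dirreq-read-history"] / share
    dir_write = totals["dirreq-write-history"] / share
    return {
        "perc_reporting": 100 * share,                  # perc graph
        "dir_read": dir_read,                           # total graph, dashed
        "dir_write": dir_write,
        "dir_diff": dir_write - dir_read,               # diff graph, blue
        "total_diff": totals["write-history"] - totals["read-history"],
        "frac_read": 100 * dir_read / totals["read-history"],    # frac graph
        "frac_write": 100 * dir_write / totals["write-history"],
    }
```

A byte-weighted extrapolation (scaling by the reporting relays' share of total bytes instead of their share of relay count) would be an equally plausible reading of "normalized" and may give noticeably different results if the early upgraders are mostly fast relays.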
Mike, is this the kind of answer you expected? What else should we analyze from the given data?
I'm going to re-run this analysis two to four weeks from now, when more relays have upgraded, to get more solid results.