Measure HSDir usage to guide parameter choices
Split off from legacy/trac#24425:
Replying to [ticket:24425#comment:5 asn]:
Replying to [ticket:24425#comment:4 teor]:
If you write down a list of exactly what you want to know, we can probably collect some stats on ~18 HSDirs using PrivCount. ... here are some basic ideas:
- How many v2/v3/both descs per HSDir?
How is this different to "rate of incoming"?
If you mean "cached right now", then I'd need a timeframe so I could design an event. I could do this in December or January.
- How much total RAM do all v2/v3/both descs occupy on your HSDirs? (max, min, avg, mean over your 18 HSDirs)
I think we have some of the data, but I'd need a list of the objects that contribute to RAM usage. Do you just want descriptors, or is there a replay cache? I could do this in December or January.
- Size variance of v2/v3 descs? (max, min, avg, mean)
Already implemented as a histogram, needs defined bin sizes.
- What's the rate of incoming v2/v3/both descs?
Already implemented, needs a time period.
- How many failed requests for HS descriptors are there over time? (As a percentage of total requests?)
I'm going to implement this in December.
These are just the obvious stats that I came up with. We can come up with more stuff as we see some results and understand the space better.
Let me know if you need help in turning the above sentences into methodologies.
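As a rough illustration of the arithmetic behind three of the stats above (size histogram, incoming rate, failed-request percentage), here is a plain-Python sketch. It is not PrivCount code and skips PrivCount's noise and cross-relay aggregation. The event records (`DescUpload`, `DescFetch`) and the bin edges are hypothetical placeholders, since the ticket has not yet fixed bin sizes or a time period.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable, List, Optional

# HYPOTHETICAL bin edges in bytes; real bin sizes still need to be chosen.
BIN_EDGES = [1024, 4096, 16384, 65536]


@dataclass
class DescUpload:
    """One descriptor stored at an HSDir (hypothetical event record)."""
    version: int      # 2 or 3
    size_bytes: int


@dataclass
class DescFetch:
    """One client request for a descriptor (hypothetical event record)."""
    version: int
    succeeded: bool


def size_histogram(uploads: Iterable[DescUpload],
                   edges: List[int] = BIN_EDGES) -> List[int]:
    """Bin 0 is [0, edges[0]), bin i is [edges[i-1], edges[i]),
    and the last bin counts every size >= edges[-1]."""
    counts = [0] * (len(edges) + 1)
    for u in uploads:
        counts[bisect_right(edges, u.size_bytes)] += 1
    return counts


def uploads_per_hour(uploads: Iterable[DescUpload], period_seconds: float,
                     version: Optional[int] = None) -> float:
    """Upload rate over one collection period; version=None counts v2 and v3."""
    n = sum(1 for u in uploads if version is None or u.version == version)
    return n * 3600 / period_seconds


def failure_percentage(fetches: Iterable[DescFetch]) -> float:
    """Failed fetches as a percentage of all fetches (0.0 if there were none)."""
    fetches = list(fetches)
    if not fetches:
        return 0.0
    failed = sum(1 for f in fetches if not f.succeeded)
    return 100 * failed / len(fetches)


# Toy events, not measurements.
uploads = [DescUpload(2, 3000), DescUpload(3, 20000), DescUpload(3, 500)]
fetches = [DescFetch(3, True), DescFetch(3, False),
           DescFetch(2, True), DescFetch(2, True)]
print(size_histogram(uploads))                     # [1, 1, 0, 1, 0]
print(uploads_per_hour(uploads, 3600, version=3))  # 2.0
print(failure_percentage(fetches))                 # 25.0
```

Using `bisect_right` keeps each bin half-open, so a descriptor that is exactly on an edge always lands in the higher bin.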
We will also need an estimate of how much one client or service would contribute to each statistic in 10 minutes.
Is that to figure out the noise for differential privacy? Let's try to come up with the final stats list and then we can figure this out.
Yes, that's fine. They only need to be rough estimates.
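To show why the per-client estimate matters, the sketch below uses the textbook Laplace mechanism, where the noise scale is the per-client contribution (the sensitivity) divided by the privacy parameter epsilon. PrivCount computes its noise in its own way, and that calculation is not reproduced here. The Laplace mechanism is only a stand-in. The sensitivity and epsilon values are hypothetical, not estimates from this ticket.

```python
import math
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism scale b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def noisy_count(true_count: float, sensitivity: float, epsilon: float,
                rng: random.Random) -> float:
    """Add Laplace(0, b) noise: the difference of two i.i.d. exponentials
    with mean b is Laplace-distributed with scale b."""
    b = laplace_scale(sensitivity, epsilon)
    noise = rng.expovariate(1 / b) - rng.expovariate(1 / b)
    return true_count + noise


# HYPOTHETICAL inputs, not measurements: assume one client or service adds
# at most 4 to a counter in a 10-minute window, and pick epsilon = 0.5.
sensitivity = 4
epsilon = 0.5
b = laplace_scale(sensitivity, epsilon)
print(b)                  # 8.0
print(math.sqrt(2) * b)   # noise standard deviation, about 11.3
print(noisy_count(1000, sensitivity, epsilon, random.Random(1)))
```

The noise grows linearly with the per-client contribution, so even a rough upper bound per statistic is enough to judge whether a counter would stay useful once noise is added.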