PrivCount in Tor
PrivCount makes Tor relay statistics more secure. It securely aggregates relay statistics and adds noise to them, which makes it much harder to identify individual Tor users from their network usage. PrivCount uses differential privacy to ensure that the final statistics hide individual users' activity.
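At a high level, each relay adds its own noise to a counter and then splits the noisy value into shares, so no single party ever sees an individual relay's unnoised totals. Below is a minimal Python sketch of that general idea using simple additive secret sharing; proposal 288 specifies Shamir secret sharing instead, and the modulus, noise level, and counts here are illustrative assumptions.

```python
# Minimal sketch of the general PrivCount idea, not the exact protocol:
# each Data Collector (relay) adds differential-privacy noise to its
# counter, then splits the noisy value into additive shares, one per
# Tally Reporter. Each TR only ever sees sums of shares, so only the
# noisy network-wide total can be recovered.
import random

MODULUS = 2**64       # counters are summed modulo a fixed width (illustrative)
NOISE_SIGMA = 100.0   # per-relay noise standard deviation (illustrative)

def make_shares(counter_value, num_tally_reporters):
    """Blind a noisy counter into one additive share per Tally Reporter."""
    noisy = counter_value + round(random.gauss(0, NOISE_SIGMA))
    shares = [random.randrange(MODULUS) for _ in range(num_tally_reporters - 1)]
    shares.append((noisy - sum(shares)) % MODULUS)
    return shares

# Example: three relays report to two Tally Reporters.
relay_counts = [1200, 950, 430]
tr_inboxes = [[], []]
for count in relay_counts:
    for inbox, share in zip(tr_inboxes, make_shares(count, len(tr_inboxes))):
        inbox.append(share)

# Each TR publishes only the sum of the shares it received; adding the
# published totals reveals the noisy aggregate, never a single relay's value.
published = [sum(inbox) % MODULUS for inbox in tr_inboxes]
print(sum(published) % MODULUS)   # close to sum(relay_counts), plus noise
```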
Background
Proposals
- Proposal 288: Privacy-Preserving Statistics with Privcount in Tor (Shamir version)
- Proposal 280: Privacy-Preserving Statistics with Privcount in Tor (Superseded)
Notes
- Mexico City 2018: PrivCount In Tor Status
- Mexico City 2018: PrivCount In Tor Technical Workshop
- Rome 2018: PrivCount In Tor Explanation
- Rome 2018: PrivCount In Tor Statistics Priorities
- Seattle 2018: PrivCount In Tor Planning
- July 2018: PrivCount Research Retrospective
Next Steps
Optimise for "simplest possible decisions at first" so that we can deploy it.
Upcoming Tickets
[[TicketQuery(order=id,desc=1,format=table,col=resolution|summary|component|milestone|modified|owner|reporter|cc|parent,id=25669&or&id=26637&or&id=23061&or&id=25381&or&id=25153,status!=closed)]]
Noise
- need to design an API for allocating noise using an optimisation method that Aaron created. For that we need an action bound and an estimated value; the estimated value is not a security parameter, but the action bound is
- we'll need to do measurements with an actual client implementation to discover an appropriate action bound for our desired anonymity set size/security bounds
- should we protect the average case statistically, or some factor of the average?
- need a detailed spec of which stats we collect and what their noise levels are
- versioning for stats, for when we want to change and/or tweak their noisiness
- With PrivCount, mixed versioning is tricky because the total noise across all statistics on all reporting relays determines user privacy
- do we just pick the latest counter version, as long as enough relays support it? (it's not safe to report multiple copies of counters)
- if a statistic's version is too old or we believe its noise to be insufficient to maintain privacy:
- we should have a mechanism for telling those clients to simply not report that data
- or we could increase the noise on old statistics
- how do we impose a delay when the noise parameters change? (this delay ensures differential privacy even when the old and new counters are compared)
- or should we try to monotonically increase counter noise?
- we still need to specify how to allocate noise between counters, between relay partitions (to avoid outliers), between relays, and between consensuses
- need threat modelling and decisions on potential bad relays that decide to stop adding noise to their collected statistics
- the proposed attack is that a relay could decline to add noise in order to learn more from the data collected by other relays
- we could decide not to care, because any relay that wanted to be malicious could more effectively do so by exposing its own users
- we could add additional noise based on consensus weight
- we can allocate noise based on the number of relays N in the network, such that each relay gets 1/Nth of the noise
- if the noise budget is X, each relay adds X*(1/R) to the noise, where R is the number of relays that support PrivCount in Tor. (Technically it's X*sqrt(1/R), because noise standard deviation isn't additive; variances are.)
- another proposed attack is that a relay adds infinite noise to destroy the statistic for the day
- with N relay partitions, we can resist O(N) malicious or broken relays destroying stats for a day, but we probably need better resistance than that
- the tally reporters can run a large-noise round where they add large additional noise to each relay (multiple times the total network noise), total each relay individually, and eliminate the ones that are larger than the expected noise. This leaks an extra bit of information (large/not large) to a malicious tally reporter.
- run a simulation of splitting the noise from 6000 relays, and work out if the integer truncation makes our noise too low. (Or we could just use ceil() to be safe.) A rough sketch of such a simulation follows this list.
- if we ran PrivCount on all our current statistics, how many of them would we no longer be able to collect because it's not possible to add sufficient noise?
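As a starting point for the simulation mentioned above, here is a minimal sketch. The noise budget, relay count, and the choice of a Gaussian standard deviation as the budget are illustrative assumptions.

```python
# Rough sketch of the noise-split simulation suggested above: divide a
# total noise budget (expressed as a Gaussian standard deviation) across
# R relays using sigma_relay = X * sqrt(1/R), then compare integer
# truncation against ceil(). The budget and relay count are example values.
import math

def split_noise(total_sigma, num_relays, rounding):
    """Per-relay sigma so that R independent noises sum to total_sigma.

    Standard deviations are not additive; variances are, so each relay
    gets total_sigma * sqrt(1/R) before rounding to an integer.
    """
    per_relay = total_sigma * math.sqrt(1.0 / num_relays)
    return rounding(per_relay)

def achieved_total_sigma(per_relay_sigma, num_relays):
    """Total sigma when every relay adds independent noise of per_relay_sigma."""
    return per_relay_sigma * math.sqrt(num_relays)

X = 1000.0      # example total noise budget (standard deviation)
R = 6000        # roughly the number of relays in the network

for name, rounding in (("truncate", math.floor), ("ceil", math.ceil)):
    per_relay = split_noise(X, R, rounding)
    total = achieved_total_sigma(per_relay, R)
    print(f"{name}: per-relay sigma = {per_relay}, achieved total sigma = {total:.1f}")

# With X = 1000 and R = 6000, the exact per-relay sigma is about 12.9;
# truncating to 12 gives a total of ~930 (too little noise), while
# ceil() gives 13 and a total of ~1007 (slightly more than the budget).
```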
Cryptography
- Should we use a multi-level scheme for the signing keys? That is, have an identity key for each TR and each DC, and use those to sign short-term keys?
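One way this could look, as a non-authoritative sketch: an identity key per TR/DC certifies a short-term key, and the short-term key signs round data. The key type (Ed25519 via the Python cryptography library), the prefix string, and the certificate layout below are assumptions for illustration, not a decided design.

```python
# Minimal sketch of a two-level signing-key scheme: a long-term identity
# key for a TR or DC certifies a short-term key, which is then used to
# sign protocol messages. Ed25519 and the certificate layout are
# illustrative assumptions.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def raw(public_key):
    """Raw 32-byte encoding of an Ed25519 public key."""
    return public_key.public_bytes(Encoding.Raw, PublicFormat.Raw)

# Long-term identity key (rotated rarely, could be kept offline).
identity_key = Ed25519PrivateKey.generate()

# Short-term key for one collection round, certified by the identity key.
short_term_key = Ed25519PrivateKey.generate()
cert = identity_key.sign(b"privcount-short-term-key:" + raw(short_term_key.public_key()))

# A message for the current round is signed with the short-term key only.
message = b"example counter report"
signature = short_term_key.sign(message)

# A verifier checks the chain: identity key -> short-term key -> message.
identity_key.public_key().verify(cert, b"privcount-short-term-key:" + raw(short_term_key.public_key()))
short_term_key.public_key().verify(signature, message)
print("certificate and message signature verified")
```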
Configuration
- How to tell the DCs the parameters of the system (a hypothetical sketch follows this list), including:
- who the TRs are, and what their keys are?
- what the counters are, and how much noise to add to each?
- when the collection intervals start and end?
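For illustration only, the parameters listed above could be bundled into a single configuration document along these lines; every field name and value here is a hypothetical placeholder, not a decided format.

```python
# Hypothetical shape for the parameters a DC needs; all field names and
# values are illustrative placeholders, not a decided format.
SYSTEM_PARAMETERS = {
    "tally_reporters": [
        # who the TRs are, and what their keys are
        {"nickname": "tr1", "address": "tr1.example.org:8443", "ed25519_id": "<base64 key>"},
        {"nickname": "tr2", "address": "tr2.example.org:8443", "ed25519_id": "<base64 key>"},
    ],
    "counters": {
        # what the counters are, and how much noise (sigma) to add to each
        "exit-streams": {"version": 2, "sigma": 1500.0},
        "entry-connections": {"version": 1, "sigma": 400.0},
    },
    "collection": {
        # when the collection intervals start and end
        "start": "2018-10-01T00:00:00Z",
        "period_seconds": 86400,
    },
}
```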
Transmission
- What to say about persistence on the DC side?
- How data is uploaded to DCs?
Aggregation
- How the TRs agree on which DCs' counters to collect?
Tickets
PrivCount Parent Ticket #22898:
[[TicketQuery(order=id,desc=1,format=table,col=resolution|summary|component|milestone|modified|owner|reporter|cc|parent,parent=#22898,status!=closed)]]
All tickets tagged PrivCount:
[[TicketQuery(order=id,desc=1,format=table,col=resolution|summary|component|milestone|modified|owner|reporter|cc|parent,keywords~=privcount,status!=closed)]]