Remove data structure containing unique IP address sets
Relays keep a data structure of unique connecting IP addresses for statistics and for informational purposes.
We should consider removing that data structure. There's a privacy risk in gathering unique IP address sets in memory and in reporting aggregate statistics based on them. If we don't need these statistics, we should stop reporting them and stop gathering the underlying data.
The main (and only?) data structure containing unique IP address sets is clientmap in src/or/geoip.c. If we remove that data structure, we would also have to remove:

1. the dirreq-v3-ips line from extra-info descriptors,
2. all "bridge statistics", including the bridge-stats-end, bridge-ips, bridge-ip-versions, and bridge-ip-transports lines from extra-info descriptors,
3. all "entry node statistics", including entry-stats-end and entry-ips from extra-info descriptors,
4. the log line "Heartbeat: In the last %d hours, I have seen %d unique clients.", and
5. the CLIENTS_SEEN controller event.
Items 1 and 3 are not used. Item 2 is used by Metrics to estimate the number of daily bridge users, so we'd need to implement legacy/trac#8786 before removing bridge statistics. atagar thinks that item 4 was added by Sebastian a few years back so that relay operators with certain simple use cases don't need to open a control port and run something like arm. Item 5 is used by arm for one of its dialogs, and atagar thinks losing it is not the end of the world.
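For anyone who wants to see how much of a given extra-info descriptor would go away, here is a rough sketch of a check. Only the keyword names come from the list above. The function name, the constant name, and the sample descriptor are made up for illustration, and this is not how Metrics or arm actually parse descriptors.

```python
# Keywords copied from the list in this ticket. Everything else in this
# sketch (names, sample text) is hypothetical.
AFFECTED_KEYWORDS = frozenset({
    "dirreq-v3-ips",
    "bridge-stats-end",
    "bridge-ips",
    "bridge-ip-versions",
    "bridge-ip-transports",
    "entry-stats-end",
    "entry-ips",
})


def affected_lines(descriptor_text):
    """Return the descriptor lines whose first word is an affected keyword."""
    found = []
    for line in descriptor_text.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if parts and parts[0] in AFFECTED_KEYWORDS:
            found.append(line)
    return found


# Made-up sample for illustration, not a real descriptor.
SAMPLE = """\
extra-info ExampleRelay 0000000000000000000000000000000000000000
published 2014-01-01 00:00:00
dirreq-v3-ips us=8,de=8
dirreq-v3-reqs us=16,de=8
entry-stats-end 2014-01-01 00:00:00 (86400 s)
entry-ips us=8
"""

if __name__ == "__main__":
    for line in affected_lines(SAMPLE):
        print(line)
```

Matching on the first word of each line keeps the check from being confused by keywords that share a prefix, such as bridge-ips and bridge-ip-versions.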
Thoughts?