Looking at the analysis, our storage for family lists accounted for 1.3MB out of 24.9MB total allocation. That's around 5% of our total allocation, not counting malloc overhead.
Fortunately, this will be pretty easy to fix.
I've got an untested, undocumented, unintegrated sketch of a design in a branch called ticket27359. It needs to be tested, documented, and integrated with microdesc. (Maybe someday it could be integrated with routerdesc too, but that wouldn't work for authorities yet.)
The idea of the scheme is to encode families in a more compact representation: roughly, as a sorted array of 21-byte elements, where each element is a one-byte tag followed by 20 bytes holding either a SHA-1 ID or a nickname. These objects are reference-counted, so that they can be shared by relays with the same family members.
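To make the layout concrete, here is a minimal Python sketch of that encoding. It is not the code in the ticket27359 branch: the tag values, the function names, the NUL padding of nicknames, and the lowercasing of nicknames are all assumptions made for illustration. It does rely on nicknames being at most 19 characters, so they fit in the 20-byte field.

```python
# Hypothetical tag values; the real branch may use different ones.
TAG_ID = 0
TAG_NICKNAME = 1
ELEMENT_LEN = 21  # one tag byte + 20 payload bytes


def encode_member(token: str) -> bytes:
    """Encode one family entry as a 21-byte element."""
    if token.startswith("$"):
        digest = bytes.fromhex(token[1:])  # ValueError on malformed hex
        if len(digest) != 20:
            raise ValueError("hex ID must be 40 hex digits: %r" % token)
        return bytes([TAG_ID]) + digest
    # Assumption: nicknames compare case-insensitively, so store lowercase.
    name = token.lower().encode("ascii")
    if not 1 <= len(name) <= 19:
        raise ValueError("bad nickname length: %r" % token)
    return bytes([TAG_NICKNAME]) + name.ljust(20, b"\0")


def encode_family(tokens) -> bytes:
    """Deduplicate, sort, and concatenate the encoded elements."""
    return b"".join(sorted({encode_member(t) for t in tokens}))
```

Because the elements are deduplicated and sorted, two family lines that list the same members in a different order (or with different hex case) produce byte-identical encodings, which is what makes sharing possible.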
To optimize the number of family objects that are shared by multiple relays, I want to add each relay's hex ID to its own family -- doing this will make all the relays in the same family have the same encoded family. (It's harmless to consider a relay a member of its own family.)
I've run into a few issues with this, and I'm not sure which of them need to be solved pre-merge, and how.
A small number of relays specify invalid hex IDs in their family lines -- typically, with the wrong number of hex digits.
A small number of relays use the $hexid=nickname syntax, which can't be encoded in the manner above, and which is probably not what they want anyway.
Authorities need to process family lines exactly as they are received, and rely on a lossless encoding of their inputs.
When parsing a microdescriptor, we do not know which relay it is associated with, so we can't easily add its own ID to its family when parsing it. (Further, it is in theory possible for two misconfigured relays to wind up with the same microdescriptor, I think.)
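For the first two issues, one possible starting point is to classify each family entry before deciding how to handle it. The sketch below is an assumption-laden illustration, not Tor's parser: the category names and the `classify` function are hypothetical, and it only recognizes the `$hexid=nickname` form mentioned above. What to do with the problematic categories (drop, keep verbatim, or reject) is exactly the open question.

```python
import re

HEX_ID = re.compile(r"\$[0-9A-Fa-f]{40}")
NICKNAME = re.compile(r"[A-Za-z0-9]{1,19}")


def classify(token: str) -> str:
    """Sort a family entry into one of a few hypothetical categories."""
    if HEX_ID.fullmatch(token):
        return "hex_id"
    if token.startswith("$"):
        if "=" in token:
            # $hexid=nickname: does not fit a single 21-byte element.
            return "hex_id_with_nickname"
        # Typically the wrong number of hex digits.
        return "invalid_hex_id"
    if NICKNAME.fullmatch(token):
        return "nickname"
    return "invalid"


def split_family_line(tokens):
    """Group tokens by category so each group can be handled separately."""
    groups = {}
    for tok in tokens:
        groups.setdefault(classify(tok), []).append(tok)
    return groups
```

A client-side consumer could then encode only the `hex_id` and `nickname` groups, while an authority would still need the original line preserved losslessly.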
I'm not sure quite what to do about each of these issues.