Store raw descriptor contents as UTF-8 encoded Strings rather than byte[]

When reading descriptors from disk, we store raw descriptor contents as byte[] and return them from Descriptor#getRawDescriptorBytes(). We also store partial raw descriptor contents and return them from DirSourceEntry#getDirSourceEntryBytes() and NetworkStatusEntry#getStatusEntryBytes().

Storing byte[] can be useful when writing raw contents back to disk, because we can be sure that the contents are exactly the same as when we read them from disk. In particular, we don't have to worry about character encoding.
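To make the encoding concern concrete, here is a minimal sketch (plain JDK, not metrics-lib code; the class and method names are made up for illustration). It decodes bytes as UTF-8 and re-encodes them, then checks whether the result is byte-identical. ASCII and valid UTF-8 input survives, but a byte sequence that is not valid UTF-8 gets replaced with U+FFFD during decoding and does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripSketch {

  /** Decodes bytes as UTF-8 and re-encodes them; true if nothing changed. */
  static boolean survivesRoundTrip(byte[] raw) {
    String decoded = new String(raw, StandardCharsets.UTF_8);
    byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);
    return Arrays.equals(raw, reencoded);
  }

  public static void main(String[] args) {
    // Made-up descriptor-like line containing only ASCII characters.
    byte[] ascii = "router example 192.0.2.1 9001 0 0\n"
        .getBytes(StandardCharsets.US_ASCII);
    // Made-up line with a lone 0xE9 byte (ISO-8859-1 e-acute), which is
    // not valid UTF-8 on its own.
    byte[] invalid = {'c', 'o', 'n', 't', 'a', 'c', 't', ' ', (byte) 0xE9, '\n'};

    System.out.println(survivesRoundTrip(ascii));   // prints "true"
    System.out.println(survivesRoundTrip(invalid)); // prints "false"
  }
}
```

So a switch to String would be lossless only for contents that are valid UTF-8 to begin with, which is worth keeping in mind for the write-back case.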

However, support for handling (large) byte[] content is limited. Today I looked into ways to handle large descriptor files (#20395), and I found that most libraries work best with character streams, not byte streams. And I only briefly considered implementing Knuth-Morris-Pratt myself...

So, I looked at the four main code bases using metrics-lib (CollecTor, ExoneraTor, metrics-web, Onionoo) to see which of them use raw descriptor bytes and how. After all, if we're not using them ourselves, we might as well get rid of them. Here's what I found:

  1. Onionoo's DescriptorQueue uses raw bytes to keep statistics on processed bytes, which seems like something that would still work reasonably well with character lengths.
  2. CollecTor's DescriptorPersistence indeed uses raw descriptor bytes to write descriptors obtained from another CollecTor instance to disk. We'd have to change that.
  3. CollecTor's VotePersistence uses raw descriptor bytes to calculate the digest of votes, which is something we should implement in metrics-lib directly (#20333).
  4. ExoneraTor's ExoneraTorDatabaseImporter imports raw status entry bytes into the database, but we know that those are just ASCII, so this would work as well with UTF-8 strings.
  5. metrics-web's RelayDescriptorDatabaseImporter also imports raw status entry bytes into the database, which works with strings for the same reason as above.
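As a rough illustration of items 1, 4, and 5, the following sketch (plain JDK, hypothetical class and method names, made-up sample strings) compares String#length() with the UTF-8 byte count. For ASCII-only contents the two agree, so statistics based on character lengths would match the byte counts. They diverge as soon as a non-ASCII character shows up.

```java
import java.nio.charset.StandardCharsets;

public class LengthSketch {

  /** Number of bytes the string occupies when encoded as UTF-8. */
  static int utf8ByteLength(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  /** True if the character length equals the UTF-8 byte length. */
  static boolean lengthsAgree(String s) {
    return s.length() == utf8ByteLength(s);
  }

  public static void main(String[] args) {
    String asciiOnly = "r example\n";      // made-up ASCII-only entry
    String nonAscii = "M\u00fcller";       // contains U+00FC (u-umlaut)

    System.out.println(lengthsAgree(asciiOnly)); // prints "true"
    System.out.println(nonAscii.length());       // prints "6"
    System.out.println(utf8ByteLength(nonAscii)); // prints "7"
  }
}
```

For Onionoo's statistics the difference is probably negligible, which matches the "reasonably well" assessment in item 1.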

I might have overlooked something.

But if not, CollecTor's DescriptorPersistence is the only place where we really need byte[] rather than String. If we can change that, we can switch from Descriptor#getRawDescriptorBytes() to Descriptor#getRawDescriptor() and deprecate the former (and do the same for the two methods returning partial contents).
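One possible shape for that transition, as a sketch only: the interface below is a hypothetical stand-in, not the real Descriptor interface, and it assumes Java 8 default methods are acceptable. The String getter becomes the primary accessor, and the deprecated byte[] getter is derived from it by encoding as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class DeprecationSketch {

  /** Hypothetical stand-in for the Descriptor interface, not the real one. */
  interface DescriptorSketch {

    /** Proposed accessor: raw descriptor contents as a String. */
    String getRawDescriptor();

    /** Old accessor, kept for a transition period and derived from the String. */
    @Deprecated
    default byte[] getRawDescriptorBytes() {
      return getRawDescriptor().getBytes(StandardCharsets.UTF_8);
    }
  }

  /** Wraps a String in the sketch interface (single abstract method, so a lambda works). */
  static DescriptorSketch of(String contents) {
    return () -> contents;
  }

  /** Helper showing what callers of the deprecated method would still get. */
  static byte[] legacyBytes(String contents) {
    return of(contents).getRawDescriptorBytes();
  }

  public static void main(String[] args) {
    DescriptorSketch d = of("router example\n"); // made-up contents
    System.out.println(d.getRawDescriptor().length());  // prints "15"
    System.out.println(legacyBytes("router example\n").length); // prints "15"
  }
}
```

Existing callers would keep working during the deprecation period, with the caveat that the derived bytes equal the original file bytes only if those were valid UTF-8.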

And then we can resume #20395 with a much more complete toolbox.
