Authorities should reject non-UTF-8 content in ExtraInfo descriptors
In #18656, we discovered that authorities don't validate that ExtraInfo descriptors are printable ASCII before accepting them.
Authorities (and HSDirs) should check that every directory extrainfo document they receive consists only of "printing ASCII" UTF-8, as defined in torspec... prop285:
https://gitweb.torproject.org/torspec.git/tree/proposals/285-utf-8.txt
I've heard others say that the following lines allow non-ASCII content, but I'm not sure whether that's actually the case or, if it is, how many relays it would affect:
* the "platform" line in relay descriptors, which is a "human-readable string",
* the contact "info" line in relay descriptors, which has an undefined format.
Edit: allowing users to spell their names correctly is important. That's why we'll use UTF-8 for relay descriptors, votes, and consensuses.
If those lines do allow non-ASCII content, I'd recommend we make them all ASCII for consistency, update torspec to clarify, and include it as a "major" change in an 0.2.x tor release. (This means that some users will be unable to spell their names correctly. But there was never any guarantee that 8-bit characters in "info" would be interpreted as users intended. I think security is more important here.)
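To make the two candidate checks concrete, here is a rough sketch in Python (not tor's C code, and not taken from any existing tor function). All three function names are hypothetical. It assumes "printing ASCII" means bytes 0x20 through 0x7E plus the newline that separates descriptor lines, and it only covers byte-level validity; any further restrictions in prop285 are not modelled here.

```python
def is_valid_utf8(document: bytes) -> bool:
    """Return True if the raw document bytes decode as strict UTF-8."""
    try:
        document.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def is_printing_ascii(document: bytes) -> bool:
    """Return True if every byte is printing ASCII or a newline.

    Assumption: "printing ASCII" is 0x20..0x7E; newline (0x0A) is also
    allowed because descriptors are line-based.
    """
    return all(0x20 <= b <= 0x7E or b == 0x0A for b in document)


def should_accept_extrainfo(document: bytes, ascii_only: bool = False) -> bool:
    """Hypothetical acceptance check an authority could run on upload.

    ascii_only=True models the stricter "make everything ASCII" option;
    the default models the UTF-8 rule from the ticket title.
    """
    if ascii_only:
        return is_printing_ascii(document)
    return is_valid_utf8(document)
```

For example, a contact line containing "José" encoded as UTF-8 passes the default check but fails with `ascii_only=True`, while the same name encoded as Latin-1 (a bare 0xE9 byte) fails both. That is the trade-off discussed above: the ASCII-only rule is simpler to reason about, but it rejects correctly spelled names.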