Trac: Summary: Authorities should reject non-UTF-8 in relay descriptors to Authorities and relays should reject non-UTF-8 in relay descriptors Description: Part of #24033 (moved).
When we check different kinds of directory documents, we'll want the BOM check as part of a descriptor UTF-8 checking function in util_string.c. Would you mind moving it in this branch?
Since the UTF-8 checking function accepts a length, we should probably also check for NUL. If we fail on the first NUL, this check becomes a last-ditch memory safety check, as most bytes in RAM are NUL.
We should probably also log a bug warning if the function encounters a NUL byte:
if (BUG(failure-condition) { failure-action;}
or
if (success-condition) { return}tor_assert_nonfatal_unreached();
I wonder if the compiler that we're using has been updated to a broken version.
I pushed the common ancestor of your branch and master to master-unicode-descriptors1, and the current master to master. If they both fail, I'll open a ticket to disable that build:
https://github.com/teor2345/tor/branches
Since the UTF-8 checking function accepts a length, we should probably also check for NUL. If we fail on the first NUL, this check becomes a last-ditch memory safety check, as most bytes in RAM are NUL.
We should probably also log a bug warning if the function encounters a NUL byte:
Receiving a NUL byte isn't a bug though. This function is processing untrusted input from fetch_from_buf_http() that might contain anything including NUL bytes.
Related a few past tickets where this has bitten us.
Just a quick note that dirauths could use Stem as a tool for rejecting malformed content. It does stricter validation than the tor binary that descriptors are conformant with the spec. I've been performing this check through DocTor since 2013, filing tickets each time more bad data makes it into the consensus...
Bad data chokes not only Stem, but metrics-lib and anything else that ingests it.
Clearly in an ideal world the tor binary itself would do better validation but in the absence of that if we took advantage of Stem's validator I wouldn't need to keep filing tickets every few months. Using Stem on dirauths to reject malformed descriptors would prevent these issues upfront, saving Karsten and I hassle.
If that's a no-go I could also redirect the DocTor check I mention above to email other folks (Nick? teor? Maybe the network team list?) so I don't need to file tickets each time this comes up.
Oh, and I think that the changes file still mentions relays?
The changes file was reverted actually. Fixed.
But this ticket says 'and relays.' Could edit the summary and have a separate ticket for making relays reject in the future?
Sure. I changed the title.
The separate ticket for relays checking their own descriptors is #27414 (moved).
(The torrc check should be enough, but I'd like to add another check right before the descriptor and extrainfo are uploaded.)
The separate ticket for relays rejecting other relays' descriptors is #27369 (moved).
Trac: Summary: Authorities and relays should reject non-UTF-8 in relay descriptors to Authorities should reject non-UTF-8 in relay descriptors
Since the UTF-8 checking function accepts a length, we should probably also check for NUL. If we fail on the first NUL, this check becomes a last-ditch memory safety check, as most bytes in RAM are NUL.
We should probably also log a bug warning if the function encounters a NUL byte:
Receiving a NUL byte isn't a bug though. This function is processing untrusted input from fetch_from_buf_http() that might contain anything including NUL bytes.
Right, it's not a bug in tor, but it is a protocol violation. (NUL bytes are ok in compressed content. But when it's uncompressed, it should all be UTF-8.)
Created a squashed version of the branch at unicode-descriptors-dirauth1 that's rebased on top of some additions to the unit tests in #27373 (moved).
Thanks!
Can this not make it into 0.3.5, it being the next LTS?
I'm not sure.
This code is a major change on directory authorities, and it hasn't had much testing at all.
Only a few relay operators are affected, but there hasn't been any warning about the change.
Also, LTS isn't used for directory authorities:
Some features and configurations are currently only maintained in the two most recent stable releases, either because they are less mature than the rest of Tor, because there are relatively few users who need them, or for some other reason:Running a directory authority.…
LGTM, but was missing tests for the new no_bom function. Other than that, looks sane and simple.
I added some test in my branch unicode-descriptors-dirauth2.
Also check out the PR: https://github.com/torproject/tor/pull/464
(all new patches should come with a github PR).