Add UTF-8 validation unit tests (#32845) · Issues · The Tor Project / Core / Tor


We should add unit tests for the following UTF-8 sequences. Their validity varies between programming languages, so we should follow the common case, as long as it matches the Unicode standard.

Invalid:
- surrogate: EDA081
- nullsurrog: 3000EDA081
- threehigh: EDBFBF
- fourhigh: F490BFBF
- fivebyte: FB80808080
- sixbyte: FD80808080
- sixhigh: FDBFBFBFBF

Valid:
- fourbyte: F0908D88
- fourbyte2: F0BFBFBF

Valid in the Unicode standard, but invalid in torrcs and directory documents:
- nullbyte: 3031320033

See proposal 285 for details, including the null byte exception: https://gitweb.torproject.org/torspec.git/tree/proposals/285-utf-8.txt

Test case source: `POC||GTFO 19`, page 43: https://www.alchemistowl.org/pocorgtfo/
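A minimal sketch of how these vectors could drive a table-based test. `utf8_is_valid`, `allow_nul`, `struct utf8_vector` and `run_test_vectors` are hypothetical names, not Tor's actual API. The sketch assumes the strict RFC 3629 rules: no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF, and no 5- or 6-byte forms. It also assumes the torrc/directory-document rule is "standard UTF-8, minus NUL bytes". Each vector is checked in both modes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical validator: true iff buf[0..len) is well-formed UTF-8.
 * If allow_nul is false, NUL bytes are also rejected (torrc/dirdoc mode). */
static bool
utf8_is_valid(const uint8_t *buf, size_t len, bool allow_nul)
{
  size_t i = 0;
  while (i < len) {
    uint8_t b = buf[i];
    size_t n;      /* number of continuation bytes */
    uint32_t cp;   /* code point being decoded */
    uint32_t min;  /* smallest code point allowed for this length */
    if (b < 0x80) {
      if (b == 0 && !allow_nul)
        return false;
      i++;
      continue;
    } else if ((b & 0xE0) == 0xC0) {
      n = 1; cp = b & 0x1F; min = 0x80;
    } else if ((b & 0xF0) == 0xE0) {
      n = 2; cp = b & 0x0F; min = 0x800;
    } else if ((b & 0xF8) == 0xF0) {
      n = 3; cp = b & 0x07; min = 0x10000;
    } else {
      /* Stray continuation byte, or a 5/6-byte lead byte (F8..FF). */
      return false;
    }
    if (len - i - 1 < n)  /* sequence truncated by end of buffer */
      return false;
    for (size_t k = 1; k <= n; k++) {
      uint8_t c = buf[i + k];
      if ((c & 0xC0) != 0x80)  /* not a continuation byte */
        return false;
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF)     /* overlong, or beyond Unicode */
      return false;
    if (cp >= 0xD800 && cp <= 0xDFFF)  /* UTF-16 surrogate */
      return false;
    i += n + 1;
  }
  return true;
}

struct utf8_vector {
  const char *name;
  const uint8_t *bytes;
  size_t len;
  bool valid_unicode;  /* expected result with allow_nul = true */
  bool valid_tor;      /* expected result with allow_nul = false */
};

/* sizeof(lit) - 1 keeps embedded NUL bytes in the length. */
#define VEC(nm, lit, u, t) { nm, (const uint8_t *)(lit), sizeof(lit) - 1, u, t }

static const struct utf8_vector vectors[] = {
  VEC("surrogate",  "\xED\xA0\x81",         false, false),
  VEC("nullsurrog", "\x30\x00\xED\xA0\x81", false, false),
  VEC("threehigh",  "\xED\xBF\xBF",         false, false),
  VEC("fourhigh",   "\xF4\x90\xBF\xBF",     false, false),
  VEC("fivebyte",   "\xFB\x80\x80\x80\x80", false, false),
  VEC("sixbyte",    "\xFD\x80\x80\x80\x80", false, false),
  VEC("sixhigh",    "\xFD\xBF\xBF\xBF\xBF", false, false),
  VEC("fourbyte",   "\xF0\x90\x8D\x88",     true,  true),
  VEC("fourbyte2",  "\xF0\xBF\xBF\xBF",     true,  true),
  VEC("nullbyte",   "\x30\x31\x32\x00\x33", true,  false),
};

/* Returns the number of vectors whose result differs from the expectation. */
static int
run_test_vectors(void)
{
  int failures = 0;
  for (size_t i = 0; i < sizeof(vectors) / sizeof(vectors[0]); i++) {
    const struct utf8_vector *v = &vectors[i];
    bool u = utf8_is_valid(v->bytes, v->len, true);
    bool t = utf8_is_valid(v->bytes, v->len, false);
    if (u != v->valid_unicode || t != v->valid_tor) {
      printf("FAIL %s: unicode=%d tor=%d\n", v->name, u, t);
      failures++;
    }
  }
  return failures;
}
```

Because the sketch rejects every lead byte from F8 to FF outright, the `fivebyte`, `sixbyte` and `sixhigh` vectors fail regardless of how many continuation bytes follow. `fourhigh` decodes to U+110FFF and is rejected by the range check. Keeping the expected results for both modes in one table makes the NUL exception visible next to the other cases.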
