Enable -Wnormalized=nfkc when available to avoid source code identifier confusion
In https://people.torproject.org/~nickm/warnings.html, nickm asks:
We use -Wnormalized=id now; should we switch?
Yes, we should switch to -Wnormalized=nfkc, as a precaution against patches that are submitted with similar-looking characters. Ideally, we would use -Wnormalized=ban-unicode-in-identifiers, but that's not something gcc has implemented yet.
The GCC documentation at https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html explains:
Some characters in ISO 10646 have distinct meanings but look identical in some fonts or display methodologies, especially once formatting has been applied. For instance \u207F, “SUPERSCRIPT LATIN SMALL LETTER N”, displays just like a regular n that has been placed in a superscript. ISO 10646 defines the NFKC normalization scheme to convert all these into a standard form as well, and GCC warns if your code is not in NFKC if you use -Wnormalized=nfkc. This warning is comparable to warning about every identifier that contains the letter O because it might be confused with the digit 0, and so is not the default, but may be useful as a local coding convention if the programming environment cannot be fixed to display these characters distinctly.
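To make the quoted behaviour concrete, here is a rough sketch in Python of what "not in NFKC" means for an identifier. It is only an illustration using the standard unicodedata module, not how gcc implements the check; the helper names (is_nfkc, flag_non_nfkc) and the sample identifiers are made up for this ticket.

```python
import unicodedata


def is_nfkc(identifier: str) -> bool:
    """Return True if the identifier is already in NFKC form."""
    return unicodedata.normalize("NFKC", identifier) == identifier


def flag_non_nfkc(identifiers):
    """Return the identifiers that change under NFKC normalization."""
    return [name for name in identifiers if not is_nfkc(name)]


if __name__ == "__main__":
    # Hypothetical identifiers; the second ends in U+207F
    # (SUPERSCRIPT LATIN SMALL LETTER N) instead of a plain "n".
    names = ["len", "le\u207f", "circuit_id"]
    for name in flag_non_nfkc(names):
        normalized = unicodedata.normalize("NFKC", name)
        print(f"{name!r} is not in NFKC; it normalizes to {normalized!r}")
```

Note that NFKC only folds compatibility variants like the superscript n above. Look-alike letters from other scripts (for example a Cyrillic "а" in place of a Latin "a") are already in NFKC form and would not be flagged, which is why banning non-ASCII identifiers outright would be the stronger check.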
clang hasn't implemented -Wnormalized yet: