practracker.py codec exception in some locales
practracker.py, implemented in #29221 (moved), seems to have a locale dependency when run under python3. If the locale isn't a UTF-8 locale, non-ASCII (UTF-8 encoded) characters in source files can result in an exception:
$ LANG=en_US.US-ASCII make check-best-practices PYTHON=python
python ../scripts/maint/practracker/practracker.py ..
mirkwood:build-norust tlyu$ LANG=en_US.US-ASCII make check-best-practices
python3 ../scripts/maint/practracker/practracker.py ..
Traceback (most recent call last):
  File "../scripts/maint/practracker/practracker.py", line 151, in <module>
    main()
  File "../scripts/maint/practracker/practracker.py", line 134, in main
    found_new_issues = consider_all_metrics(files_list)
  File "../scripts/maint/practracker/practracker.py", line 89, in consider_all_metrics
    found_new_issues |= consider_metrics_for_file(fname, f)
  File "../scripts/maint/practracker/practracker.py", line 104, in consider_metrics_for_file
    found_new_issues |= consider_file_size(fname, f)
  File "../scripts/maint/practracker/practracker.py", line 51, in consider_file_size
    file_size = metrics.get_file_len(f)
  File "/Users/tlyu/src/tor/scripts/maint/practracker/metrics.py", line 11, in get_file_len
    for i, l in enumerate(f):
  File "/Users/tlyu/src/brew/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
make: *** [check-best-practices] Error 1
I'm also seeing this on gitlab.com CI, but I don't know offhand what its locale environment variables are.
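One way to find out what the CI environment is using might be to print the relevant settings from Python itself. This is only a diagnostic sketch: describe_locale() is a hypothetical helper, not something in practracker. It relies on the fact that python3's text-mode open() normally falls back to locale.getpreferredencoding(False) when no encoding is given.

```python
import locale
import os


def describe_locale():
    """Return the settings that normally decide python3's default text encoding.

    Hypothetical diagnostic helper; not part of practracker.
    """
    # Environment variables consulted by the C library, highest priority first.
    info = {name: os.environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    # Encoding that open() uses in text mode when encoding= is not passed.
    info["preferred_encoding"] = locale.getpreferredencoding(False)
    return info


locale_info = describe_locale()
for key in sorted(locale_info):
    print("%s=%r" % (key, locale_info[key]))
```

Running this as a CI step should show whether the preferred encoding there is something other than UTF-8.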
We might want to use the encoding= keyword parameter to open(), but I think that would no longer be python2 compatible.
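A possible way around the python2 concern is io.open(), which accepts encoding= on Python 2.6+ and is the same function as the built-in open() on python3. Below is a minimal sketch of that idea, not a patch against practracker: open_utf8() and count_lines() are hypothetical names, and it assumes the source files are meant to be UTF-8.

```python
import io
import os
import tempfile


def open_utf8(path):
    """Open path as UTF-8 text, independent of the locale's encoding.

    Hypothetical helper; io.open() takes encoding= on both python2 and python3.
    """
    return io.open(path, "r", encoding="utf-8")


def count_lines(path):
    """Count the lines in a file, decoding it as UTF-8."""
    with open_utf8(path) as f:
        return sum(1 for _ in f)


# Demo: a two-line file containing the byte 0xc3, as in the traceback above.
fd, demo_path = tempfile.mkstemp(suffix=".c")
with os.fdopen(fd, "wb") as out:
    out.write(b"/* Copyright J\xc3\xb6rg */\nint x;\n")
demo_count = count_lines(demo_path)
os.remove(demo_path)
print(demo_count)
```

Because the encoding is passed explicitly, the result should not depend on LANG or LC_ALL. Files that are not valid UTF-8 would still raise UnicodeDecodeError unless an errors= policy is also chosen.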