Compass' command-line script can't encode unicode characters

Today I found that the task legacy/trac#6329 (moved) script fails when its output contains unicode characters and is piped into tail or less: the script exits with a traceback. When writing to the terminal directly, Python is happy.

Here's how to reproduce the problem:

  • Clone the metrics-tasks repository.

  • Navigate to the legacy/trac#6329 (moved) script and make it download required data: cd task-6329/; ./tor-relays-stats.py -d

  • Find a unicode character in an AS name: grep -B1 "as_name.*\\\\u" details.json

  • Display relays in that AS, e.g. AS28548: ./tor-relays-stats.py -i -a 28548 | tail

Python should print out the following traceback:

Traceback (most recent call last):
  File "./tor-relays-stats.py", line 197, in <module>
    short=70 if options.short else None)
  File "./tor-relays-stats.py", line 110, in print_groups
    print formatted_group[:short]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 144: ordinal not in range(128)
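
The same error can be triggered without the script or the downloaded data. This is only a minimal sketch, assuming the piped case boils down to Python encoding the string with the ASCII codec; the character is the one named in the traceback above.

```python
# Minimal sketch: encode the character from the traceback with the ASCII codec.
# Assumption: this mirrors what happens inside print when stdout is a pipe.
try:
    u"\xf3".encode("ascii")
    message = None
except UnicodeEncodeError as error:
    # Keep the message so it can be compared with the traceback above.
    message = str(error)
```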

I found that a possible solution is to replace all non-ASCII characters with '?'s, but that doesn't seem very elegant:

-                              exit, guard, country, as_number, as_name)
+                              exit, guard, country, as_number, as_name.encode('ascii', 'replace'))

Are there better solutions?
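
One direction that might work, as a sketch only: Python 2 leaves sys.stdout.encoding set to None when stdout is a pipe, so the script could fall back to UTF-8 in that case instead of dropping characters. The helper name encode_for_stream and the sample AS name below are made up for illustration and are not part of tor-relays-stats.py.

```python
def encode_for_stream(text, stream_encoding, fallback="utf-8"):
    """Encode text for an output stream (hypothetical helper).

    stream_encoding is what the stream reports, e.g. sys.stdout.encoding,
    which Python 2 sets to None when stdout is a pipe. In that case the
    fallback encoding is used instead of ASCII.
    """
    return text.encode(stream_encoding or fallback, "replace")

# Made-up AS name containing the character from the traceback.
sample = u"Telef\xf3nica"

# Piped output: no stream encoding reported, so UTF-8 keeps the character.
piped = encode_for_stream(sample, None)

# The workaround from the patch above, for comparison: the character is lost.
ascii_only = sample.encode("ascii", "replace")
```

In print_groups this would presumably mean encoding formatted_group[:short] with sys.stdout.encoding or 'utf-8' right before printing, so the truncation still counts characters rather than bytes. Setting PYTHONIOENCODING=utf-8 in the environment before running the script should have a similar effect without a code change, but it puts the burden on the user.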