
i18n support

GeopJr requested to merge GeopJr/onionsproutsbot:i18n into rewrite

This MR adds i18n support using gettext. The idea is to pass the language along in every callback query and to set it at the start of each handler.

How does the bot handle the languages?

Apart from the regular textdomain binding, you'll see two private functions in i18n.py.

_get_available_langs: Using gettext#find with babel, we get a list of the available/translated languages

_get_full_names: Using the above function, we create a mapping of all the locales in the form {locale: { full_name, translation }}. full_name is the locale's name in its own language, obtained with babel (eg. el_GR => Ελληνικά), and translation is a gettext#translation (with a fallback).

Both of these functions are quite expensive, so we only execute them once and store the result in a variable, available_locales.

get_translation: The "public" function that returns the correct gettext#translation using available_locales (and a param).
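To make the flow above concrete, here is a minimal sketch of the same pattern using only the standard library. It is not the code in this MR: the DOMAIN and LOCALE_DIR values and the hard-coded candidate list are assumptions for illustration (the MR discovers locales with babel), and the full_name lookup is left out.

```python
import gettext
from pathlib import Path

DOMAIN = "onionsproutsbot"    # assumed textdomain name
LOCALE_DIR = Path("locales")  # assumed catalog location


def _get_available_langs(candidates):
    """Keep only the locales for which gettext.find locates a compiled catalog."""
    return [
        code
        for code in candidates
        if gettext.find(DOMAIN, localedir=str(LOCALE_DIR), languages=[code])
    ]


def _get_translations(candidates):
    """Map each available locale to its translation object."""
    return {
        code: gettext.translation(
            DOMAIN, localedir=str(LOCALE_DIR), languages=[code], fallback=True
        )
        for code in _get_available_langs(candidates)
    }


# Built once at import time, because every lookup touches the filesystem.
# The candidate list is a hypothetical stand-in for real locale discovery.
available_locales = _get_translations(["en", "el_GR", "de"])


def get_translation(lang_code):
    """Return the gettext function for lang_code, or a pass-through fallback."""
    translation = available_locales.get(lang_code)
    if translation is None:
        # Unknown or unreliable language codes fall back to the source strings.
        translation = gettext.NullTranslations()
    return translation.gettext
```

With this shape, a handler only needs `_ = get_translation(code)` at the top and can wrap its strings as usual.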

In bot.py, we run i18n.setup_gettext() on init and replace the start command with a new one. In the new start command, we bind the shorthand to the gettext#translation for the language Telegram reports for the user: _ = i18n.get_translation(user.language_code). That value can be unreliable, but gettext should handle it through the fallback.

We then return a list of all available languages using the available_locales variable, sorted so that user.language_code comes first. For example, if Telegram reports that the user speaks Greek, Greek is the first button in the list, as it's most likely the one they want.
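The ordering can be sketched as below. The helper name order_locales is hypothetical, and the sketch assumes that only the primary language subtag matters when comparing Telegram's language_code against the locale names.

```python
def order_locales(locales, language_code):
    """Return the locale codes with the user's language moved to the front."""
    if not language_code:
        return list(locales)
    # Telegram reports IETF tags such as "el" or "en-US"; compare the
    # primary subtag only.
    wanted = language_code.replace("-", "_").split("_")[0].lower()
    # sorted() is stable, so the remaining locales keep their original order.
    return sorted(locales, key=lambda code: code.split("_")[0].lower() != wanted)
```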

Now all callback queries get a suffix of :{locale} which they parse to translate correctly.
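As a rough illustration of the suffix scheme, the two helpers below build and split such callback data. Their names are hypothetical and they are not taken from the MR. Keep in mind that Telegram limits callback data to 64 bytes, so the locale suffix counts against that budget.

```python
def build_callback_data(action, *args, locale):
    """Join the action, its arguments and the locale with ':' separators."""
    return ":".join([action, *args, locale])


def split_locale(data):
    """Split 'payload:locale' into (payload, locale) at the last ':'."""
    payload, _sep, locale = data.rpartition(":")
    return payload, locale
```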

How do translations get compiled and bundled with setup.py?

In setup.py, a new parameter, data_files, was added; its value comes from create_catalogs. That function goes through the locales folder, finds all the .po files, compiles them with the host's msgfmt and returns them as a list in the format data_files expects.
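A minimal sketch of such a function follows, assuming the catalogs live under locales/<locale>/LC_MESSAGES/ and should be installed below share/locale. Both paths and the compiler parameter are assumptions for illustration; the real create_catalogs may differ.

```python
import subprocess
from pathlib import Path


def _msgfmt(po_path, mo_path):
    """Compile one catalog with the host's msgfmt binary."""
    subprocess.run(["msgfmt", str(po_path), "-o", str(mo_path)], check=True)


def create_catalogs(locales_dir="locales", install_root="share/locale",
                    compiler=_msgfmt):
    """Compile every .po file and return (target_dir, [files]) tuples."""
    data_files = []
    for po_path in sorted(Path(locales_dir).rglob("*.po")):
        mo_path = po_path.with_suffix(".mo")
        compiler(po_path, mo_path)
        # Mirror the folder layout of locales_dir below install_root.
        target_dir = Path(install_root) / po_path.parent.relative_to(locales_dir)
        data_files.append((target_dir.as_posix(), [mo_path.as_posix()]))
    return data_files
```

The result can then be passed as setup(..., data_files=create_catalogs()). Injecting the compiler keeps the function testable on machines without msgfmt.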

Creating an archive with python setup.py sdist shows the folder correctly bundled.

Additional changes

There are two additional changes that might deserve their own MRs but 🤷.

  • Harden the regex filters: Filters were too loose; someone could provide fewer arguments and the bot would crash (eg. index out of bounds if the platform is missing). Additionally (this is speculation), someone could run multiple commands at the same time and "DoS" the bot. They are now hardened, eg. ^download_tor:[^:]+:[^:]+:[^:]+$, which allows only download_tor:something:something:something and nothing else.

  • Answer callback on wrong data: In a previous MR, I made it return None when a wrong platform or locale was provided. I have now changed that to await client.answer_callback_query(callback.id) so Telegram doesn't show the little clock icon.
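The effect of the hardened filter can be checked with plain re, as in the sketch below. The helper name and the sample field values are hypothetical; only the pattern itself comes from this MR.

```python
import re

# Pattern quoted from the MR: exactly three non-empty, colon-free fields.
DOWNLOAD_TOR = re.compile(r"^download_tor:[^:]+:[^:]+:[^:]+$")


def is_valid_download_request(data):
    """Return True only for download_tor:<field>:<field>:<field>."""
    return DOWNLOAD_TOR.match(data) is not None
```

Data with too few fields, too many fields or empty fields no longer reaches the handler, so the index lookups inside it are safe.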

Gettext

I'll include some gettext info:

POTFILES & Generating osbtg.pot

.pot is the "template" file for all the translations. You can add files to be included in the string search in the POTFILES file (one per line).

Then using xgettext -L Python --files-from=locales/POTFILES --output=locales/onionsproutsbot.pot, the file will automatically be generated with all the strings that need translations extracted.

Marking strings for translation & translating them

There's a shorthand, an underscore, that translates strings. All you have to do is wrap the string in it, eg. _("What is Tor?"). However, when formatting strings, you can't use f"Hey {name}": by the time gettext sees the string it has already been formatted, so it no longer matches anything gettext knows about. Instead, fall back to the other formatting option, _("Hey %s") % "Devon". This way, gettext first translates "Hey %s" and the formatting happens afterwards.

Other info

Gettext supports a number of language quirks, like plurals. The way get_translation is set up at the moment, it only calls gettext.gettext since I didn't find anything requiring plurals. If needed in the future, I'll add support for it.
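For reference, plural support in the standard library goes through ngettext, roughly as sketched here. The message text and the describe helper are hypothetical, and NullTranslations stands in for a real catalog, so only the English singular/plural rule applies.

```python
import gettext

# Stand-in for a loaded catalog; it applies the default "n == 1" rule.
translation = gettext.NullTranslations()


def describe(count):
    """Pick the singular or plural form, then insert the number."""
    template = translation.ngettext("%d mirror found", "%d mirrors found", count)
    return template % count
```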

Lastly, gettext files have some other stuff going on (binary translations, additional files, creating new entries etc.). I've written about some of them on ugcg/gettext even if it's mostly GTK flavored. You probably don't have to worry about creating new entries as Weblate or other translation frontends will probably handle them for you.

What else can be done?

titles.locales can theoretically be replaced with babel, similarly to _get_available_langs(), instead of manually maintaining the list.

What I didn't test

I installed osbtg with --user; I'm unsure how the locales will behave when it is installed as root.

fixes: #9 (closed)

Edited by GeopJr
