i18n support
This MR adds i18n support using gettext. The idea is to pass the language along with every callback query and to set it at the start of each handler.
How does the bot handle the languages?
Apart from the regular textdomain binding, in i18n.py you'll see two private functions.
_get_available_langs: Using gettext#find with babel, we get a list of the available/translated languages.
_get_full_names: Using the above function, we create a mapping of all the locales as {locale: { full_name, translation }}. full_name is the locale's name in its own language (using babel, eg. el_GR => Ελληνικά) and translation is a gettext#translation (with a fallback).
Both of these functions are quite expensive, so we only execute them once and keep the result in a variable, available_locales.
get_translation: The "public" function that takes a locale as its param and returns the correct gettext#translation using available_locales.
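To make the flow above a bit more concrete, here is a minimal sketch using only the standard library. It is not the code from this MR: the domain name, the locales directory and the helper find_available_langs are assumptions for illustration, the babel-based full-name lookup is left out, and functools.lru_cache stands in for the available_locales variable as the "compute once" mechanism.

```python
import gettext
from functools import lru_cache

DOMAIN = "onionsproutsbot"   # assumed text domain
LOCALE_DIR = "locales"       # assumed catalog directory


def find_available_langs(candidates, localedir=LOCALE_DIR, domain=DOMAIN):
    """Return the candidate locales for which a compiled catalog exists."""
    available = []
    for lang in candidates:
        # gettext.find returns the path of a matching .mo file, or None.
        if gettext.find(domain, localedir=localedir, languages=[lang]):
            available.append(lang)
    return available


@lru_cache(maxsize=None)
def get_translation(lang, localedir=LOCALE_DIR, domain=DOMAIN):
    """Return a gettext function for lang (cached after the first call)."""
    translation = gettext.translation(
        domain,
        localedir=localedir,
        # A missing language code (None) simply takes the fallback path.
        languages=[lang] if lang else [],
        # fallback=True yields NullTranslations instead of raising, so an
        # unknown locale returns the original, untranslated strings.
        fallback=True,
    )
    return translation.gettext
```

With fallback=True, an unreliable or unknown language code is harmless: the returned function just hands back the source string.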
In bot.py, we run i18n.setup_gettext() on init and replace the start command with a new one. In the new start command, we set the shorthand to the gettext#translation of the language Telegram reports for the user (even if it's unreliable, gettext should handle it): _ = i18n.get_translation(user.language_code).
We then return a list of all available languages using the available_locales variable, sorted by user.language_code. So if, eg., Telegram reports that the user speaks Greek, Greek would be the first button in the list, as it's most likely the one they want.
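The ordering described above could look roughly like this. It's only a sketch: order_locales is a hypothetical helper, and it assumes Telegram's language_code is an IETF-style tag (eg. "el" or "en-US") that may also be missing.

```python
def order_locales(available, user_language_code):
    """Put the locales matching the user's language first, keep the rest in order."""
    preferred = (user_language_code or "").replace("-", "_").lower()
    base = preferred.split("_")[0]

    def rank(locale):
        candidate = locale.lower()
        if preferred and candidate == preferred:
            return 0  # exact match, eg. el_GR for "el-GR"
        if base and candidate.split("_")[0] == base:
            return 1  # same language, other region, eg. el_GR for "el"
        return 2      # everything else

    # sorted() is stable, so locales with the same rank keep their order.
    return sorted(available, key=rank)
```

For example, order_locales(["en_US", "el_GR", "de_DE"], "el") puts el_GR first and leaves the other two in their original order.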
Now the data of every callback query gets a :{locale} suffix, which each handler parses in order to translate correctly.
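As an illustration of the suffix idea, here is a small sketch for building and splitting such callback data. Both helpers are hypothetical names and not part of this MR; the 64-byte check reflects the callback_data limit documented in the Telegram Bot API.

```python
MAX_CALLBACK_BYTES = 64  # Telegram's documented limit for callback_data


def build_callback_data(action, *args, locale):
    """Join the action, its arguments and the locale, eg. 'download_tor:linux:el_GR'."""
    data = ":".join([action, *args, locale])
    if len(data.encode("utf-8")) > MAX_CALLBACK_BYTES:
        raise ValueError("callback data too long: %r" % data)
    return data


def split_locale(data):
    """Split off the trailing locale: 'a:b:el_GR' -> ('a:b', 'el_GR')."""
    payload, _sep, locale = data.rpartition(":")
    return payload, locale
```

A handler would then call split_locale on the incoming data and pass the locale to i18n.get_translation before building its reply.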
How do translations get compiled and bundled with setup.py?
In setup.py, a new parameter was added, data_files, which calls create_catalogs.
That function goes through the locales folder, finds all the .po files, compiles them using msgfmt (from the host) and returns them as a list in the format data_files accepts.
Creating an archive with python setup.py sdist shows the folder correctly bundled.
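For reference, a function along these lines might look like the sketch below. It is not the create_catalogs from this MR: the name collect_catalogs, the locales/<lang>/LC_MESSAGES/*.po layout, the share/locale install prefix and the compile_po switch are all assumptions for illustration. It needs msgfmt on the host's PATH when compiling.

```python
import subprocess
from pathlib import Path


def collect_catalogs(locales_dir="locales", compile_po=True):
    """Compile every .po under locales_dir and return a data_files-style list."""
    root = Path(locales_dir)
    data_files = []
    for po_file in sorted(root.glob("*/LC_MESSAGES/*.po")):
        mo_file = po_file.with_suffix(".mo")
        if compile_po:
            # msgfmt comes from the host's gettext tools; fail loudly on errors.
            subprocess.run(
                ["msgfmt", "-o", str(mo_file), str(po_file)], check=True
            )
        # Assumed install location: share/locale/<lang>/LC_MESSAGES
        target_dir = Path("share", "locale") / po_file.parent.relative_to(root)
        # data_files expects (target directory, [source files]) tuples.
        data_files.append((target_dir.as_posix(), [mo_file.as_posix()]))
    return data_files
```

In setup.py this would be wired up as setup(..., data_files=collect_catalogs()).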
Additional changes
There are two additional changes that might deserve their own MRs, but are included here:
- Harden the regex filters: Filters were too loose; someone could provide fewer arguments and the bot would crash (eg. index out of bounds if the platform is missing). Additionally (this is speculation), someone could run multiple commands at the same time and "DoS" the bot. They are now hardened, eg. ^download_tor:[^:]+:[^:]+:[^:]+$, which allows only download_tor:something:something:something and nothing else.
- Answer callback on wrong data: In a previous MR, I made it return None when a wrong platform or locale was provided. I now changed that to await client.answer_callback_query(callback.id) so Telegram doesn't show the little clock icon.
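To show what the hardened pattern accepts and rejects, here is a quick check with Python's re module. The pattern is the one quoted above; is_valid_download and the sample values are made up for illustration.

```python
import re

# Exactly three non-empty, colon-free fields after the command name.
DOWNLOAD_TOR = re.compile(r"^download_tor:[^:]+:[^:]+:[^:]+$")


def is_valid_download(data):
    """True only for download_tor:<field>:<field>:<field>."""
    return DOWNLOAD_TOR.match(data) is not None


examples = [
    "download_tor:linux:x86_64:el_GR",   # accepted: three fields
    "download_tor:linux",                # rejected: fields missing
    "download_tor:linux::el_GR",         # rejected: empty field
    "download_tor:a:b:c:download_tor",   # rejected: extra field
]
for data in examples:
    print(data, is_valid_download(data))
```

Since a handler only runs when the pattern matches, splitting the data on ":" can no longer raise an index error for a missing field.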
Gettext
I'll include some gettext info:
POTFILES & Generating osbtg.pot
.pot is the "template" file for all the translations. You can add files to be included in the string search in the POTFILES file (one per line).
Then, using xgettext -L Python --files-from=locales/POTFILES --output=locales/onionsproutsbot.pot, the file will automatically be generated with all the strings that need translation extracted.
Marking for & Translating strings
There's a shorthand used, an underscore, that translates strings. All you have to do is wrap the string in it, eg. _("What is Tor?"). However, when formatting strings, you can't use f"Hey {name}", because the string gets formatted before gettext sees it, and the result is no longer something gettext knows about. Instead, fall back to the other formatting option, _("Hey %s") % "Devon". This way, gettext first translates "Hey %s" and the formatting happens afterwards.
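The difference is easy to see with a stand-in catalog. DictTranslations below is a hypothetical helper backed by a plain dict (a real setup would load a compiled .mo file), and the Greek string is just sample data.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Stand-in catalog backed by a dict, for illustration only."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Unknown messages come back unchanged, like a fallback catalog.
        return self._messages.get(message, message)


_ = DictTranslations({"Hey %s": "Γεια σου %s"}).gettext
name = "Devon"

# The template is looked up first and formatted afterwards.
translated = _("Hey %s") % name

# The f-string is formatted first, so gettext is asked for "Hey Devon",
# which is not in the catalog, and the text stays untranslated.
untranslated = _(f"Hey {name}")

print(translated)
print(untranslated)
```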
Other info
Gettext supports a number of language quirks, like plurals. The way get_translation is set up at the moment, it only exposes gettext.gettext, since I didn't find any strings requiring plurals. If needed in the future, I'll add support for it.
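If plurals are ever needed, the standard library already covers them through ngettext. A rough sketch, with gettext.NullTranslations standing in for a real catalog and describe_downloads being a made-up example function:

```python
import gettext

# Stand-in for a real catalog; NullTranslations applies the English rule
# (singular for n == 1, plural otherwise) to the source strings.
translation = gettext.NullTranslations()


def describe_downloads(count):
    """Pick the singular or plural template, then format it."""
    template = translation.ngettext("%d download", "%d downloads", count)
    return template % count
```

With a real catalog, the plural form would be chosen by the rule in that catalog's Plural-Forms header instead of the English one.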
Lastly, gettext files have some other stuff going on (binary translations, additional files, creating new entries etc.). I've written about some of them on ugcg/gettext even if it's mostly GTK flavored. You probably don't have to worry about creating new entries as Weblate or other translation frontends will probably handle them for you.
What else can be done?
titles.locales can theoretically be replaced with babel, similarly to _get_available_langs(), instead of manually maintaining the list.
What I didn't test
I installed osbtg with --user, so I'm unsure how the locales will behave when it's installed as root.
fixes: #9 (closed)