# Lektor i18n plugin
This plugin enables a smarter way to translate a [Lektor](http://getlektor.com) static website using good old PO files, so you can keep using your beloved translation processes and tools.
## Principles
The idea of this plugin is to capture the **sentences** or **paragraphs** from your **content** and **templates**, and populate a standard *Gettext* [PO file](https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html). Users can then translate these files very easily with their usual tools. The plugin merges the translations into new [_alternative_](https://www.getlektor.com/docs/content/alts/) content files, providing a translated website.
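For example, with French as a target language, a hypothetical page would be translated from the original `contents.lr`:

    title: Home
    ---
    body: Welcome to our website.

into a generated `contents+fr.lr` alternative:

    title: Accueil
    ---
    body: Bienvenue sur notre site web.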
## Configuration
### Configuration file
#### `configs/i18n.ini`
    content = en
    translations = fr,es,it
    i18npath = i18n
    translate_paragraphwise = False
Where:
* `content` is the language used to write the `contents.lr` files (default is `en`).
* `translations` is the comma-separated list of target languages (the ones you want to translate into).
* `i18npath` is the directory where translation files will be produced and stored (default is `i18n`). This path must be relative to the project root.
* `translate_paragraphwise` specifies whether translation strings are created per line or per paragraph. The latter is helpful for documents with text wrapped at 80-character boundaries. It is set to `False` by default.
#### `babel.cfg`
If you plan to localise your templates as well, you can use
`{{ _("some string") }}` in your templates. To make this work, the `pybabel` command must be available, which means installing Babel (`pip install babel`, or `pip3` depending on your setup). A `babel.cfg` file also has to exist in your project root with this content:
    [jinja2: **/templates/**.html]
    encoding = utf-8
    extensions=jinja2.ext.autoescape,jinja2.ext.with_
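For instance, a hypothetical template whose strings would be collected by `pybabel`:

    <!-- templates/page.html -->
    <h1>{{ _("Welcome") }}</h1>
    <p>{{ _("This paragraph will be extracted into the templates POT file.") }}</p>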
### Translatable fields
For a field to be marked as translatable, an option has to be set in the field definition. Both model and flowblock fields are subject to translation.
In `flowblocks/*.ini` and/or `models/*.ini`, mark a field as translatable with:
    [model]
    name = Page
    label = {{ this.title }}

    [fields.title]
    label = Title
    type = string
    translate = True

    [fields.body]
    label = Body
    type = markdown
    translate = True
Both `title` and `body` are now translatable. This means that during the parsing phase, all sentences from the `title` or `body` fields of `contents.lr` files using the `Page` model will populate the collected PO file.
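For instance, a hypothetical `contents.lr` using the `Page` model:

    title: About us
    ---
    body: We build static websites.

would add entries like these to the collected PO file:

    msgid "About us"
    msgstr ""

    msgid "We build static websites."
    msgstr ""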
Another flowblock example:
    [block]
    name = Section Block
    button_label = Section

    [fields.title]
    label = Title
    type = string
    translate = True

    [fields.body]
    label = Body
    type = markdown
    translate = True

    [fields.image]
    label = Image
    type = select
    source = record.attachments.images

    [fields.image_position]
    label = Image Position
    type = select
    choices = left, right
    choice_labels = Left, Right
    default = right
Here again, `body` and `title` will be translated. But `image` and `image_position` won't.
### Non-English content
Thanks to a limitation of `msginit`, it is not so easy to translate a website whose default language is anything but English.
So if your default content language is not English, you will have to edit the generated `contents+en.po` file and remove the pre-filled translations by hand...
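For instance, in a hypothetical project with French content, `msginit` pre-fills the English entries with the French source text, which then has to be emptied:

    # as generated by msginit:
    msgid "Bienvenue sur notre site web."
    msgstr "Bienvenue sur notre site web."

    # after manual cleanup:
    msgid "Bienvenue sur notre site web."
    msgstr ""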
## Installation
### Prerequisites
#### Lektor
This plugin has been tested with `Lektor 3.0.x`.
#### GetText
Both Gettext and Babel are required. On a Debian/Ubuntu system, this means a simple:

    sudo apt-get install gettext python3-babel
On macOS, use a decent package manager, like MacPorts or Homebrew. With Homebrew:

    brew install gettext

and then pip to fetch Babel:

    pip install babel
### Installation
Very straightforward:

    $ lektor plugins add lektor-i18n

Verify the installation with a simple:

    $ lektor plugins list
    ...
    lektor-i18n (version 0.1)
    ...
## Usage
The translation mechanism is hooked into the build system. So translating a website just means building the website.
    $ lektor build
On the first call, a new `i18n` directory (the name can be changed in the configuration file) will be created at the top of the Lektor tree.
This directory will be populated with a single `contents.pot` file, compiling all the sentences found by the plugin. The list of fields eligible for translation is configured in the models/flows definitions, with `translate = True` added to each field.
For each translation language (again from the configuration file), a `contents+<language>.po` file will be created or updated. These are the files that need to be translated with your preferred tool (like [POEdit](http://poedit.net) or [Transifex](http://transifex.com)).
All translation files (`contents+*.po`) are then compiled and merged with the original `contents.lr` files to produce the `contents+<language>.lr` files in their respective directories.
Due to the way the Lektor build system is designed, all these steps happen on every build. This means that sometimes, after translating the `contents+*.po` files, it might be required to run the build twice before the translations appear in the final HTML files.
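With the default settings and `translations = fr,es,it`, the `i18n` directory ends up looking roughly like this (a sketch; backup files produced by `msgmerge` are omitted):

    i18n/
    ├── contents.pot                     # collected translation template
    ├── contents+fr.po                   # one PO file per target language
    ├── contents+es.po
    ├── contents+it.po
    └── _compiled/
        ├── fr/LC_MESSAGES/contents.mo   # compiled catalogs used during the build
        ├── es/LC_MESSAGES/contents.mo
        └── it/LC_MESSAGES/contents.mo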
### Project file
It is still the user's responsibility to modify the project file in order to include the expected languages:
    [alternatives.en]
    name = English
    primary = yes
    locale = en_US

    [alternatives.fr]
    name = French
    url_prefix = /fr/
    locale = fr
See [Lektor Documentation](https://www.getlektor.com/docs/content/alts/) for more information.
## Support
This plugin is provided as-is by [NumeriCube](http://numericube.com), a human-sized Paris-based company providing tailored services to smart customers.
We will be happy to help you with this plugin if needed. Just file an issue on our [GitHub account](https://github.com/numericube/lektor-i18n-plugin/).
# -*- coding: utf-8 -*-
#pylint: disable=wrong-import-position
import sys
PY3 = sys.version_info > (3,)
import collections
import datetime
import gettext
import os
from os.path import relpath, join, exists, dirname
from pprint import PrettyPrinter
import re
import tempfile
import time
if PY3:
from urllib.parse import urljoin
else:
from urlparse import urljoin
from lektor.pluginsystem import Plugin
from lektor.db import Page
from lektor.metaformat import tokenize
from lektor.reporter import reporter
from lektor.types.flow import FlowType, process_flowblock_data
from lektor.utils import portable_popen, locate_executable
from lektor.environment import PRIMARY_ALT
from lektor.filecontents import FileContents
from lektor.context import get_ctx
command_re = re.compile(r'([a-zA-Z0-9._-]+):\s*(.*?)?\s*$')  # 'key: value' lines; dash last so it stays literal
# derived from lektor.types.flow but allows more dash signs
block2re = re.compile(r'^###(#+)\s*([^#]*?)\s*###(#+)\s*$')
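# e.g. it matches a flow block separator like '#### my-block ####'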
POT_HEADER = """msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\\n"
"Report-Msgid-Bugs-To: \\n"
"POT-Creation-Date: %(NOW)s\\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\\n"
"Language-Team: %(LANGUAGE)s <LL@li.org>\\n"
"Language: %(LANGUAGE)s\\n"
"MIME-Version: 1.0\\n"
"Content-Type: text/plain; charset=UTF-8\\n"
"Content-Transfer-Encoding: 8bit\\n"
"""
# python2/3 compatibility layer
encode = lambda s: (s if PY3 else s.encode('UTF-8'))
def trans(translator, s):
"""Thin gettext translation wrapper to allow compatibility with both Python2
and 3."""
if PY3:
return translator.gettext(s)
else:
return translator.ugettext(s)
def truncate(s, length=32):
return (s[:length] + '..') if len(s) > length else s
#pylint: disable=too-few-public-methods,redefined-variable-type
class TemplateTranslator(object):
def __init__(self, i18npath):
self.i18npath = i18npath
self.__lastlang = None
self.translator = None
self.init_translator()
def init_translator(self):
ctx = get_ctx()
if not ctx:
            self.translator = gettext.GNUTranslations()
            return
if not self.__lastlang == ctx.locale:
self.__lastlang = ctx.locale
self.translator = gettext.translation("contents",
join(self.i18npath, '_compiled'),
languages=[ctx.locale], fallback=True)
def gettext(self, x):
        self.init_translator()  # language could have changed
return self.translator.gettext(x)
def ngettext(self, *x):
self.init_translator()
return self.translator.ngettext(*x)
class Translations():
"""Memory of translations"""
def __init__(self):
# dict like {'text' : ['source1', 'source2',...],}
self.translations = collections.OrderedDict()
def add(self, text, source):
        if text not in self.translations:
self.translations[text]=[]
reporter.report_debug_info('added to translation memory : ', truncate(text))
        if source not in self.translations[text]:
self.translations[text].append(source)
def __repr__(self):
return PrettyPrinter(2).pformat(self.translations)
def as_pot(self, content_language):
"""returns a POT version of the translation dictionnary"""
now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M')
now += '+%s'%(time.tzname[0])
result = POT_HEADER % {'LANGUAGE' : content_language, 'NOW' : now}
for msg, paths in self.translations.items():
result += "#: %s\n"%" ".join(paths)
for token, repl in {'\n': '\\n', '\t': '\\t', '"': '\\"'}.items():
msg = msg.replace(token, repl)
result+='msgid "%s"\n' % msg
result+='msgstr ""\n\n'
return result
def write_pot(self, pot_filename, language):
if not os.path.exists(os.path.dirname(pot_filename)):
os.makedirs(os.path.dirname(pot_filename))
with open(pot_filename,'w') as f:
f.write(encode(self.as_pot(language)))
def merge_pot(self, from_filenames, to_filename):
msgcat=locate_executable('msgcat')
cmdline=[msgcat, "--use-first"]
cmdline.extend(from_filenames)
cmdline.extend(("-o", to_filename))
reporter.report_debug_info('msgcat cmd line', cmdline)
portable_popen(cmdline).wait()
def parse_templates(self, to_filename):
pybabel=locate_executable('pybabel')
cmdline=[pybabel, 'extract', '-F', 'babel.cfg', "-o", to_filename, "./"]
reporter.report_debug_info('pybabel cmd line', cmdline)
portable_popen(cmdline).wait()
translations = Translations() # let's have a singleton
class POFile():
FILENAME_PATTERN = "contents+%s.po"
def __init__(self, language, i18npath):
self.language=language
self.i18npath=i18npath
def _exists(self):
"""Returns True if <language>.po file exists in i18npath"""
filename=self.FILENAME_PATTERN%self.language
return exists( join(self.i18npath, filename) )
def _msg_init(self):
"""Generates the first <language>.po file"""
msginit=locate_executable('msginit')
cmdline=[msginit, "-i", "contents.pot", "-l", self.language, "-o", self.FILENAME_PATTERN%self.language, "--no-translator"]
reporter.report_debug_info('msginit cmd line', cmdline)
portable_popen(cmdline, cwd=self.i18npath).wait()
def _msg_merge(self):
"""Merges an existing <language>.po file with .pot file"""
msgmerge=locate_executable('msgmerge')
cmdline=[msgmerge, self.FILENAME_PATTERN%self.language, "contents.pot", "-U", "-N", "--backup=simple"]
reporter.report_debug_info('msgmerge cmd line', cmdline)
portable_popen(cmdline, cwd=self.i18npath).wait()
def _prepare_locale_dir(self):
"""Prepares the i18n/<language>/LC_MESSAGES/ to store the .mo file ; returns the dirname"""
directory = join('_compiled',self.language, "LC_MESSAGES")
try:
os.makedirs(join(self.i18npath, directory))
except OSError:
pass # already exists, no big deal
return directory
def _msg_fmt(self, locale_dirname):
"""Compile an existing <language>.po file into a .mo file"""
msgfmt=locate_executable('msgfmt')
cmdline=[msgfmt, self.FILENAME_PATTERN%self.language, "-o", join(locale_dirname,"contents.mo")]
reporter.report_debug_info('msgfmt cmd line', cmdline)
portable_popen(cmdline, cwd=self.i18npath).wait()
def generate(self):
if self._exists():
self._msg_merge()
else:
self._msg_init()
locale_dirname=self._prepare_locale_dir()
self._msg_fmt(locale_dirname)
def line_starts_new_block(line, prev_line):
    """Detect a new block in a lektor document. Blocks are delimited by a line
    containing 3 or more dashes, which also matches the definition of a
    markdown level-2 heading. This function therefore returns False if no
    colon was found in the previous line, i.e. if the dashes do not follow a
    'key: value' pair and are probably a heading instead."""
if not prev_line or ':' not in prev_line:
return False # could be a markdown heading
line = line.strip()
return line == u'-' * len(line) and len(line) >= 3
def split_paragraphs(document):
    """Split a document (a string, or a list of lines) into paragraphs,
    cutting at runs of one or more blank lines."""
    if isinstance(document, (list, tuple)):
        document = ''.join(document)  # list of lines
return re.split('\n(?:\\s*\n){1,}', document)
# We cannot check for unused arguments here, they're mandated by the plugin API.
#pylint:disable=unused-argument
class I18NPlugin(Plugin):
name = u'i18n'
description = u'Internationalisation helper'
#pylint: disable=attribute-defined-outside-init
def on_setup_env(self):
"""Setup `env` for the plugin"""
# Read configuration
self.enabled = self.get_config().get('enable', 'true') in ('true','True','1')
if not self.enabled:
reporter.report_generic('I18N plugin disabled in configs/i18n.ini')
self.i18npath = self.get_config().get('i18npath', 'i18n')
self.url_prefix = self.get_config().get('url_prefix', 'http://localhost/')
        # whether or not to use a paragraph as smallest translatable unit
self.trans_parwise = self.get_config().get('translate_paragraphwise',
'false') in ('true','True','1')
self.content_language=self.get_config().get('content', 'en')
self.env.jinja_env.add_extension('jinja2.ext.i18n')
self.env.jinja_env.policies['ext.i18n.trimmed'] = True # do a .strip()
self.env.jinja_env.install_gettext_translations(TemplateTranslator(self.i18npath))
        # ToDo: is this still required?
try:
self.translations_languages=self.get_config().get('translations').replace(' ','').split(',')
except AttributeError:
raise RuntimeError('Please specify the "translations" configuration option in configs/i18n.ini')
if not self.content_language in self.translations_languages:
self.translations_languages.append(self.content_language)
    def process_node(self, fields, sections, source, zone, root_path):
        """For a given node, identify all fields to translate, and add new
        fields to the translation memory. Flow blocks are handled recursively."""
for field in fields:
if ('translate' in field.options) \
and (source.alt in (PRIMARY_ALT, self.content_language)) \
and (field.options['translate'] in ('True', 'true', '1', 1)):
                if field.name in sections:
section = sections[field.name]
# if blockwise, each paragraph is one translatable message,
# otherwise each line
chunks = (split_paragraphs(section) if self.trans_parwise
else [x.strip() for x in section if x.strip()])
for chunk in chunks:
translations.add(chunk.strip('\r\n'),
"%s (%s:%s.%s)" % (
urljoin(self.url_prefix, source.url_path),
relpath(source.source_filename, root_path),
zone, field.name)
)
if isinstance(field.type, FlowType):
                if field.name in sections:  # dict.has_key() was removed in Python 3
section = sections[field.name]
for blockname, blockvalue in process_flowblock_data("".join(section)):
flowblockmodel = source.pad.db.flowblocks[blockname]
blockcontent=dict(tokenize(blockvalue))
self.process_node(flowblockmodel.fields, blockcontent, source, blockname, root_path)
def __parse_source_structure(self, lines):
"""Parse structure of source file. In short, there are two types of
chunks: those which need to be translated ('translatable') and those
which don't ('raw'). "title: test" could be split into:
        [('raw', 'title: '), ('translatable', 'test')]
NOTE: There is no guarantee that multiple raw blocks couldn't occur and
in fact due to implementation details, this actually happens."""
blocks = []
count_lines_block = 0 # counting the number of lines of the current block
is_content = False
prev_line = None
for line in lines:
stripped_line = line.strip()
if not stripped_line: # empty line
blocks.append(('raw', '\n'))
continue
# line like "---*" or a new block tag
if line_starts_new_block(stripped_line, prev_line) or \
block2re.search(stripped_line):
count_lines_block=0
is_content = False
blocks.append(('raw', line))
else:
count_lines_block+=1
match = command_re.search(stripped_line)
if count_lines_block==1 and not is_content and match: # handle first line, while not in content
key, value = match.groups()
blocks.append(('raw', encode(key) + ':'))
if value:
blocks.append(('raw', ' '))
blocks.append(('translatable', encode(value)))
blocks.append(('raw', '\n'))
else:
is_content=True
if is_content:
blocks.append(('translatable', line))
prev_line = line
# join neighbour blocks of same type
newblocks = []
for type, data in blocks:
if len(newblocks) > 0 and newblocks[-1][0] == type: # same type, merge
newblocks[-1][1] += data
else:
newblocks.append([type, data])
return newblocks
def on_before_build(self, builder, build_state, source, prog):
"""Before building a page, produce all its alternatives (=translated pages)
using the gettext translations available."""
if self.enabled and isinstance(source,Page) and source.alt in (PRIMARY_ALT, self.content_language):
contents = None
for fn in source.iter_source_filenames():
try:
contents=FileContents(fn)
except IOError:
pass # next
for language in self.translations_languages:
translator = gettext.translation("contents",
join(self.i18npath,'_compiled'), languages=[language], fallback = True)
translated_filename = join(dirname(source.source_filename),
"contents+%s.lr"%language)
with contents.open(encoding='utf-8') as file:
chunks = self.__parse_source_structure(file.readlines())
with open(translated_filename,"w") as f:
for type, content in chunks: # see __parse_source_structure
if type == 'raw':
f.write(content)
elif type == 'translatable':
if self.trans_parwise: # translate per paragraph
f.write(self.__trans_parwise(content,
translator))
else:
f.write(self.__trans_linewise(content,
translator))
else:
raise RuntimeError("Unknown chunk type detected, this is a bug")
def __trans_linewise(self, content, translator):
"""Translate the chunk linewise."""
lines = []
for line in content.split('\n'):
line_stripped = line.strip()
            trans_stripline = trans(translator, line_stripped)  # translate the stripped version
# and re-inject the stripped translation into original line (not stripped)
lines.append(line.replace(line_stripped,
trans_stripline, 1))
return '\n'.join(lines)
def __trans_parwise(self, content, translator):
"""Extract translatable strings block-wise, query for translation of
block and re-inject result."""
result = []
for paragraph in split_paragraphs(content):
stripped = paragraph.strip('\n\r')
paragraph = paragraph.replace(stripped, trans(translator,
stripped))
result.append(paragraph)
return '\n\n'.join(result)
def on_after_build(self, builder, build_state, source, prog):
if self.enabled and isinstance(source,Page):
try:
text = source.contents.as_text()
except IOError:
pass
else:
fields = source.datamodel.fields
sections = dict(tokenize(text.splitlines())) # {'sectionname':[list of section texts]}
self.process_node(fields, sections, source, source.datamodel.id, builder.env.root_path)
def on_before_build_all(self, builder, **extra):
if self.enabled:
reporter.report_generic("i18n activated, with main language %s"% self.content_language )
templates_pot_filename = join(tempfile.gettempdir(), 'templates.pot')
reporter.report_generic("Parsing templates for i18n into %s" \
% relpath(templates_pot_filename, builder.env.root_path))
translations.parse_templates(templates_pot_filename)
def on_after_build_all(self, builder, **extra):
"""Once the build process is over :
- write the translation template `contents.pot` on the filesystem,
- write all translation contents+<language>.po files """
if not self.enabled:
return
contents_pot_filename = join(builder.env.root_path, self.i18npath, 'contents.pot')
pots = [contents_pot_filename,
join(tempfile.gettempdir(), 'templates.pot'),
join(builder.env.root_path, self.i18npath, 'plugins.pot')]
# write out contents.pot from web site contents
translations.write_pot(pots[0], self.content_language)
reporter.report_generic("%s generated" % relpath(pots[0],
builder.env.root_path))
pots = [p for p in pots if os.path.exists(p) ] # only keep existing ones
if len(pots) > 1:
translations.merge_pot(pots, contents_pot_filename)
reporter.report_generic("Merged POT files %s" % ', '.join(
relpath(p, builder.env.root_path) for p in pots))
for language in self.translations_languages:
po_file=POFile(language, self.i18npath)
po_file.generate()
from setuptools import setup