I am working on a rather multilingual project in the office currently. And, as always, we tried a few alternatives before ending up with gettext again. For those of you who don’t know, gettext is the de facto standard for managing language translations in software, especially when it comes to messages and user interface elements. It’s a nice, powerful system, but it’s a bit awkward when it comes to web development.
Anyways, we started using it in a bit of a rush, without doing all the necessary planning, and quite soon ended up in a bit of a mess. Different people used different editors to update translations, and each person’s environment was set up in a different way. All of that made its way into the PO files that hold the translations. Moreover, we never really defined the procedure for updating translations. That became a bigger problem when we realized that Arabic had only 50 strings, while English had 220 and Chinese 350. All languages were supposed to have exactly the same number of strings, even if the actual translations were missing.
So today I had to rethink and redefine how we do it. First of all, I had to figure out and try the process outside of the project. It took me a good couple of hours to brush up on my gettext knowledge and find some useful documentation online. Here is a very helpful article that got me started.
After reading the article and a few manuals, and playing with the actual commands, I decided on the following:
- The source of all translations will be a single POT file. This file will be completely dropped and regenerated every time any strings are updated in the source code.
- Each language will have a PO file of its own. However, the strings for the language won’t be extracted from the source code, but from the common POT file.
- All editors will use the current project folder as the primary path. In other words, “.” instead of the full path to “/var/www/foobar”. This makes all file references in PO/POT files relative to the project folder, regardless of each contributor’s setup.
- Updating the language files (PO) and building the MO files will be a part of the project build/deploy script, to make sure everything stays as up to date as possible.
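The relative path rule is the easiest one to break without noticing, so it may be worth checking for. Here is a minimal sketch, assuming the usual "#:" reference comments that gettext tools write into PO/POT files. The function name find_absolute_refs is made up for this example and is not part of gettext, and the check only catches Unix-style paths starting with "/".

```shell
#!/bin/bash
# Hypothetical helper: list source references that use absolute paths.
# gettext tools write references as comment lines starting with "#:",
# possibly with several space-separated references on one line.
find_absolute_refs() {
    # -H prints the file name, -n the line number. The pattern matches any
    # reference line where a path starts with "/" right after a space.
    grep -Hn '^#:.* /' "$@"
}

# Possible use in a build script (commented out, file names are examples):
# if find_absolute_refs *.po *.pot; then
#     echo "Absolute paths found, regenerate from the project folder" >&2
#     exit 1
# fi
```

grep exits with a non-zero status when nothing matches, so the function can be used directly in an if statement like the one in the comment.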
Now for the actual code. Here is the shell script that does the job. (Here is a link to the Gist, in case I update it in the future.)
#!/bin/bash
# Regenerates the POT template, then updates the PO file and builds
# the MO file for every locale listed in LANGS.
DOMAIN="project_tag"
POT="$DOMAIN.pot"
LANGS="en_US ru_RU"
SOURCES="*.php"

# Create template
echo "Creating POT"
rm -f "$POT"
# $SOURCES is left unquoted on purpose, so that the shell expands the glob
xgettext \
    --copyright-holder="2012 My Company Ltd" \
    --package-name="Project Name" \
    --package-version="1.0" \
    --msgid-bugs-address="translations@company.com" \
    --language=PHP \
    --sort-output \
    --keyword=__ \
    --keyword=_e \
    --from-code=UTF-8 \
    --output="$POT" \
    --default-domain="$DOMAIN" \
    $SOURCES

# Create languages
# The loop variable is LOCALE rather than LANG, because LANG is the
# environment variable that sets the locale of the commands being run.
for LOCALE in $LANGS
do
    if [ ! -e "$LOCALE.po" ]
    then
        echo "Creating language file for $LOCALE"
        msginit --no-translator --locale="$LOCALE.UTF-8" --output-file="$LOCALE.po" --input="$POT"
    fi
    echo "Updating language file for $LOCALE from $POT"
    msgmerge --sort-output --update --backup=off "$LOCALE.po" "$POT"
    echo "Converting $LOCALE.po to $LOCALE.mo"
    msgfmt --check --verbose --output-file="$LOCALE.mo" "$LOCALE.po"
done
Now, all you need to do is run the script once to get the default POT file and a PO file for every language. You can edit the translations in the PO files as much as you want. Then simply run the script again and it will update the generated MO files. No parameters, no manuals, no nothing. If you need to add another language, just put the appropriate locale in the $LANGS variable and run the script again. You are good to go.
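Since the whole exercise started with languages having different numbers of strings, a quick safety net for that might look like the sketch below. It assumes that every entry in a PO or POT file has exactly one line starting with msgid followed by a quote, header entry included. The function names count_entries and check_entry_counts are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: count the entries in a PO or POT file.
# Each entry has one line starting with 'msgid "', including the header
# entry (msgid ""). Obsolete entries ("#~ msgid") and plural forms
# (msgid_plural) do not match the pattern.
count_entries() {
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    grep -c '^msgid "' "$1" || true
}

# Hypothetical helper: compare each PO file against the POT file.
# Usage: check_entry_counts template.pot first.po second.po ...
# Returns non-zero if any PO file has a different number of entries.
check_entry_counts() {
    local pot="$1"
    shift
    local expected actual po status=0
    expected=$(count_entries "$pot")
    for po in "$@"
    do
        actual=$(count_entries "$po")
        if [ "$actual" != "$expected" ]
        then
            echo "$po: $actual entries, expected $expected" >&2
            status=1
        fi
    done
    return $status
}
```

With the file names from the script, that would be something like: check_entry_counts project_tag.pot en_US.po ru_RU.po. Right after a run of the script the numbers should already match, so this is mostly useful for catching PO files that somebody edited or replaced by hand.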
Enjoy!