I am currently working on a rather multilingual project at the office. As always, we tried a few alternatives before ending up with gettext again. For those of you who don’t know, gettext is the de facto standard for managing translations in software, especially for messages and user interface elements. It’s a nice, powerful system, but it gets a bit awkward when it comes to web development.
Anyway, we started using it in a bit of a rush, without doing the necessary planning, and quite soon ended up in a mess. Different people used different editors to update translations, and each person’s environment was set up in a different way. All of that made its way into the PO files that hold the translations. On top of that, we never really defined a procedure for updating translations. That became a bigger problem when we realized that Arabic had only 50 translated strings, while English had 220 and Chinese 350. All languages were supposed to have exactly the same number of strings, even if the actual translations were missing.
So today I had to rethink and redefine how we do it. First of all, I had to figure out and try the process outside of the project. It took me a good couple of hours to brush up on my gettext knowledge and find some useful documentation online. Here is a very helpful article that got me started.
After reading the article and a few manuals, and playing with the actual commands, I decided on the following:
- The source of all translations will be a single POT file. This file will be deleted and regenerated from scratch every time any strings are updated in the source code.
- Each language will have a PO file of its own. However, the strings for the language won’t be extracted from the source code, but from the common POT file.
- All editors will use the current project folder as the primary path. In other words, “.” instead of a full path such as “/var/www/foobar”. This will make all file references in PO/POT files relative to the project folder, ignoring the specifics of each contributor’s setup.
- Updating the language files (PO) from the template (POT) and building the MO files will be part of the project build/deploy script, to make sure everything stays as up to date as possible.
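PO files that already contain absolute source references from earlier edits can be brought in line with the relative-path rule above with a one-off cleanup. What follows is only a minimal sketch and not part of the build script: the function name `relativize_refs` is made up for illustration, and `/var/www/foobar` stands for whatever absolute prefix a contributor's editor happened to write. It rewrites only the `#:` reference comments and leaves `msgid` and `msgstr` lines alone.

```shell
#!/bin/bash
# Hypothetical helper: rewrite absolute source references in a PO/POT file
# so that they start with "./" instead of a contributor-specific prefix.
# Usage: relativize_refs FILE PREFIX
# Assumption: PREFIX contains no "|" and no characters that are special in
# sed regular expressions beyond an ordinary path.
relativize_refs() {
    local file="$1" prefix="$2" tmp
    tmp="$(mktemp)" || return 1
    # Only "#:" lines hold source references; replace every occurrence of
    # "PREFIX/" on those lines with "./" and copy all other lines unchanged.
    if sed "/^#:/ s|${prefix}/|./|g" "$file" > "$tmp"; then
        # Write back through redirection so the file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

It could be called as `relativize_refs ru_RU.po /var/www/foobar` once per affected file. A temporary file is used instead of in-place `sed` because the in-place option differs between sed implementations.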
Now for the actual code. Here is the shell script that does the job. (Here is a link to the Gist, in case I update it in the future.)
#!/bin/bash
DOMAIN="project_tag"
POT="$DOMAIN.pot"
LANGS="en_US ru_RU"
SOURCES="*.php"
# Create template
echo "Creating POT"
rm -f "$POT"
# $SOURCES is left unquoted on purpose so that the shell expands the glob
xgettext \
    --copyright-holder="2012 My Company Ltd" \
    --package-name="Project Name" \
    --package-version="1.0" \
    --msgid-bugs-address="translations@company.com" \
    --language=PHP \
    --sort-output \
    --keyword=__ \
    --keyword=_e \
    --from-code=UTF-8 \
    --output="$POT" \
    --default-domain="$DOMAIN" \
    $SOURCES
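# Create languages
# The loop variable is named LOCALE rather than LANG, because assigning to LANG
# can also change the locale setting that the gettext tools read from the environment.
for LOCALE in $LANGS
do
    if [ ! -e "$LOCALE.po" ]
    then
        echo "Creating language file for $LOCALE"
        msginit --no-translator --locale="$LOCALE.UTF-8" --output-file="$LOCALE.po" --input="$POT"
    fi
    echo "Updating language file for $LOCALE from $POT"
    msgmerge --sort-output --update --backup=off "$LOCALE.po" "$POT"
    echo "Converting $LOCALE.po to $LOCALE.mo"
    msgfmt --check --verbose --output-file="$LOCALE.mo" "$LOCALE.po"
done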
Now, all you need to do is run the script once to get the default POT file and a PO file for every language. You can edit the PO files with translations as much as you want. Then simply run the script again and it will merge in any new strings and regenerate the MO files. No parameters, no manuals, no nothing. If you need to add another language, just add the appropriate locale to the $LANGS variable and run the script again. You are good to go.
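To confirm that every language really ends up with the same number of strings, a small check can be run after the script. This is a rough sketch rather than anything from the Gist: the helper names `count_entries` and `check_counts` are hypothetical, and the count relies on the assumption that each file starts with the usual header entry (`msgid ""`), as files written by xgettext and msginit do.

```shell
#!/bin/bash
# Hypothetical helper: count translatable entries in a PO/POT file.
# Every entry has exactly one line starting with "msgid " (plural forms use
# "msgid_plural", obsolete entries use "#~ msgid", so neither is counted).
# One is subtracted for the header entry, which is assumed to be present.
count_entries() {
    local total
    total="$(grep -c '^msgid ' "$1" || true)"
    echo $(( total - 1 ))
}

# Hypothetical helper: compare each PO file with the POT file.
# Usage: check_counts POT PO...
# Prints one line per mismatch and returns non-zero if any file differs.
check_counts() {
    local pot="$1" expected actual po status=0
    shift
    expected="$(count_entries "$pot")"
    for po in "$@"; do
        actual="$(count_entries "$po")"
        if [ "$actual" -ne "$expected" ]; then
            echo "$po: $actual entries, expected $expected"
            status=1
        fi
    done
    return "$status"
}
```

With the file names used in the script, this could be called as `check_counts project_tag.pot en_US.po ru_RU.po`. It only compares the number of entries; for translated versus untranslated counts per file, `msgfmt --statistics` is the tool to reach for.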
Enjoy!