I’m currently writing a multilingual application using Zend Framework. The framework is great: it is well thought out and follows most of the industry’s best practices.
To handle translations I’ve used the Zend_Translate component, more specifically its Xliff adapter.
I am a complete newbie regarding translation management, so my analysis may be completely wrong. But so far, I am not at all comfortable with the Zend_Translate approach.
Below is the Zend approach:
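// we create the $translate instance
// (the .mo files of this documentation example are gettext catalogs, hence the 'gettext' adapter)
$translate = new Zend_Translate('gettext', '/my/path/source-de.mo', 'de');
$translate->addTranslation('/my/path/fr-source.mo', 'fr');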
// later in our view script
print $translate->_("Here is line one !");
And here is how it works (taken from the online documentation):
You always must imagine how translation works. First Zend_Translate looks if the set language has a translation for the given message id or string. If no translation string has been found it refers to the next lower language as defined within Zend_Locale. So “de_AT” becomes “de” only. If there is no translation found for “de” either, then the original message is returned. This way you always have an output, in case the message translation does not exist in your message storage. Zend_Translate never throws an error or exception when translating strings.
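To make the quoted behaviour concrete, here is a minimal sketch of that fallback chain in plain PHP. It is not Zend_Translate's actual code: the function name `translate_with_fallback` and the `$messages` array layout are assumptions made for illustration only.

```php
<?php
// Hypothetical helper, not part of Zend_Translate: it only mimics the documented fallback.
function translate_with_fallback(array $messages, $locale, $messageId)
{
    // Try the full locale first, then its language part ("de_AT" becomes "de").
    $candidates = array($locale);
    $parts = explode('_', $locale);
    if (count($parts) > 1) {
        $candidates[] = $parts[0];
    }
    foreach ($candidates as $candidate) {
        if (isset($messages[$candidate][$messageId])) {
            return $messages[$candidate][$messageId];
        }
    }
    // Nothing found: the message id itself comes back, without any error.
    return $messageId;
}

// Assumed message storage: locale => (source string => translation).
$messages = array(
    'de' => array('Here is line one !' => 'Hier ist Zeile eins!'),
);

// Found through the "de" fallback.
echo translate_with_fallback($messages, 'de_AT', 'Here is line one !'), "\n";
// The key differs by one space: the source string is returned silently.
echo translate_with_fallback($messages, 'de_AT', 'Here is line one!'), "\n";
```

The second call shows how the source string doubles as lookup key: the smallest change to it is enough to lose the translation without any warning.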
To me, there are two problems with this approach.
-
duplicates within the source language
Let’s say I made a mistake in one of my translations. To correct it, I have to:
- update the content within the translation file
Now let’s look at the process for correcting a mistake in the source language (in the example above, there is an extra space before the exclamation mark). To correct it, we have to:
1. update the content within the source file
2. update the source entry within each translation file (such as ‘fr’, ‘es’, ‘de’, ‘cn’…)
3. find every PHP script that calls $translate with this string and update the string there
Step 1 is simply necessary, just as it was when correcting translated content. Step 2 is done automatically by translation tools (such as Pootle) and is logical: the source has changed, so every translation file must update its source content to stay synchronized with the source language.
But step 3 is not normal. Basically, correcting a typo makes you update your PHP files even though all your content is already externalized. It’s a bit like mixing formatting and HTML within the same document; in the end we settled on CSS for formatting and HTML for semantics.
This approach will lead to mistakes, for sure, since a single typo has to be corrected in multiple places in your PHP code.
-
errors hard to debug
As a general rule, I prefer explicit over implicit. Here, if $translate does not find any entry matching the passed argument, it silently falls back to that argument.
This is very dangerous. Let’s say we corrected the typo above as described, but forgot to update the source content in one of our translation files. If we refresh the page, that content is now displayed in the source language instead of the translated one, and I might not notice if it is a short sentence within a huge page full of text. The worst part is that in this case the translated content exists, but it is not displayed because I forgot to update the source string used as its key.
I would rather get an exception, so that after updating my source content I can refresh my page in every language and see immediately whether I forgot to update a translation file.
My approach is to:
-
use string ids as keys for translation array
Instead of calling ‘print $translate->_("Here is line one!");’ you call ‘print $translate->_("line_one");’, where line_one refers to ‘Here is line one!’ in the source content.
Therefore, if you want to correct the text, you only have to update the language files and no longer any PHP file.
-
throw exception if an id is not found in source language
Doing this, you know at once that the string you refer to does not exist in the source language. You then simply need to add the string to the source language and resynchronize all translations with it (normally done automatically by translation tools).
I go even further: when I load a translation file, I check that, if the source content for a parsed id has already been loaded from another translation file, it is exactly the same. If not, I throw an exception, because it means that my translation files are not all synchronized with the source content. I then simply need to update them using my translation tool.
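The synchronization check can be sketched as follows. This is a simplified stand-alone illustration, not the adapter itself: the helper name `merge_translation_file`, the `'src'` key and the shape of `$entries` are all assumptions.

```php
<?php
// Hypothetical helper: merges one parsed translation file into a shared catalog and
// refuses a file whose source text disagrees with what has already been loaded.
function merge_translation_file(array $catalog, $locale, array $entries)
{
    foreach ($entries as $id => $entry) {
        if (isset($catalog['src'][$id]) && $catalog['src'][$id] !== $entry['source']) {
            throw new RuntimeException(sprintf(
                'translation file for "%s" is out of sync: source of id "%s" differs',
                $locale,
                $id
            ));
        }
        $catalog['src'][$id] = $entry['source'];
        $catalog[$locale][$id] = $entry['target'];
    }
    return $catalog;
}

$catalog = array();
$catalog = merge_translation_file($catalog, 'fr', array(
    'line_one' => array('source' => 'Here is line one!', 'target' => 'Ceci est la première ligne !'),
));

try {
    // This German file still carries the old source text (extra space before "!").
    $catalog = merge_translation_file($catalog, 'de', array(
        'line_one' => array('source' => 'Here is line one !', 'target' => 'Hier ist Zeile eins!'),
    ));
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Because the function works on a copy of the catalog and only returns it on success, a rejected file leaves the previously loaded content untouched.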
This approach looks less error-prone to me and far more natural. With it, content is handled within dedicated language files and is no longer mixed into PHP scripts. You don’t have duplicates, and you always know within a second if your translation files are not synchronized (and they always should be), i.e. language problems stay with the language files and no longer with the PHP files.
And with this approach, we can still have some untranslated content and default to the source language, as currently implemented within Zend_Translate. You simply need to have the source entry in your translation file, mapped to an empty string or a null value.
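Put together, the lookup I have in mind behaves roughly like the sketch below. It is a simplified illustration rather than the adapter code: the function name `translate_by_id`, the catalog layout and the use of `OutOfBoundsException` are assumptions.

```php
<?php
// Hypothetical lookup, assuming a catalog keyed by locale, then by string id.
function translate_by_id(array $catalog, $sourceLocale, $locale, $id)
{
    if (!isset($catalog[$sourceLocale][$id])) {
        // Unknown id: fail loudly instead of printing the id on the page.
        throw new OutOfBoundsException(sprintf('no content found for id "%s"', $id));
    }
    // isset() is false for null, so null and empty targets both count as "not translated yet".
    if (isset($catalog[$locale][$id]) && $catalog[$locale][$id] !== '') {
        return $catalog[$locale][$id];
    }
    // Default to the loaded source content, never to the id itself.
    return $catalog[$sourceLocale][$id];
}

$catalog = array(
    'en' => array('line_one' => 'Here is line one!', 'line_two' => 'Here is line two!'),
    'fr' => array('line_one' => 'Ceci est la première ligne !', 'line_two' => ''),
);

echo translate_by_id($catalog, 'en', 'fr', 'line_one'), "\n"; // translated text
echo translate_by_id($catalog, 'en', 'fr', 'line_two'), "\n"; // empty target: source text
try {
    translate_by_id($catalog, 'en', 'fr', 'line_three');
} catch (OutOfBoundsException $e) {
    echo $e->getMessage(), "\n"; // the id does not exist in the source language
}
```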
The idea is pretty simple (fully separate content from PHP code, as we do with style and HTML); I hope I managed to explain it clearly.
[extra content]
Basically, a translation array could look like this:
$_translate = array(
    'line_one' => array(
        'src' => 'Here is line one!',
        'fr' => 'Ceci est la première ligne !'
    )
);
As stated, I use Pootle with the XLIFF format. What I do is pretty simple: I edit the .pot files manually and replace the line number with my string id. I then implemented an XLIFF parser that automatically uses this id to create an array similar to the one above.
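As a rough illustration of that parsing step, the sketch below reads a simplified XLIFF fragment with SimpleXML and keys the result by string id. The sample markup is an assumption about what po2xliff produces (the XLIFF namespace declaration is left out to keep the example short), and the real adapter further down uses PHP's event-based xml_parser instead.

```php
<?php
// Simplified sample: assumed shape of a po2xliff export, namespace declaration omitted.
$xliff = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.1">
  <file original="messages.po" source-language="en-US" target-language="fr" datatype="po">
    <body>
      <trans-unit id="1">
        <source>Here is line one!</source>
        <target>Ceci est la première ligne !</target>
        <context-group name="po-reference" purpose="location">
          <context context-type="linenumber">line_one</context>
        </context-group>
      </trans-unit>
    </body>
  </file>
</xliff>
XML;

$xml = simplexml_load_string($xliff);
$target = (string) $xml->file['target-language'];

$translate = array();
foreach ($xml->file->body->{'trans-unit'} as $unit) {
    // The string id sits where the line number would normally be.
    $id = null;
    foreach ($unit->{'context-group'}->context as $context) {
        if ((string) $context['context-type'] === 'linenumber') {
            $id = trim((string) $context);
        }
    }
    if ($id === null) {
        continue; // no id: this unit cannot be keyed
    }
    $translate[$id] = array(
        'src'   => (string) $unit->source,
        $target => (string) $unit->target,
    );
}

print_r($translate);
```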
[my customized xliff adapter]
Feel free to use it (you can download file content here: XliffAdapter.php)
<?php
/**
* Xliff is an xml-format used to handle translation content
*
* At first I thought of extending Zend_Translate_Adapter_Xliff, but I just need to make small changes to existing adapter. Unfortunately changes must occur within private methods => I've copy-pasted Zend_Translate_Adapter_Xliff content in this class and made my changes.
*
* With Zend_Translate_Adapter_Xliff, array keys of $this->_translate are source strings, I don't want to use them as keys because:
* 1. they can be very long
* 2. if you make changes to them, first I need to update source string within php files where we call for translation. Then I need to update source string into pootle .pot files too (and from this point update .po file for each translations + regenerate xliff files)
* => double amount of work + not really readable + not good to have to edit php files each time you correct a typo in your source language
*
* Instead I do the following:
* 1. when creating Pootle .pot files, make sure to include source file + a unique id (through all translations) instead of line number
* 2. this unique id will appear when generating .xlf files with po2xliff, within <context context-type="linenumber"> node.
* 3. this class here is designed so that this id will be used as array key instead of source string.
*
* By doing this, the process to update source language (French in this case) is far simpler (you don't change php code in any file):
* - simply update Pootle .pot file
* - update pootle files for all languages using Pootle web interface (+ make translation updates into languages if required too)
* - regenerate xliff files and export them into languages folder (using pootle's po2xliff python script)
*/
require_once 'Zend/Locale.php';
require_once 'Zend/Translate/Adapter.php';
require_once 'Zend/Translate/Exception.php';
require_once 'Zend/Registry.php'; // Zend_Registry is used below to read the current locale
class Library_XliffAdapter extends Zend_Translate_Adapter {
/**
* Pootle's xlf generator sets source-language to "en-US" (which differs from the "en_US" Zend would expect). I don't know where to change that, but in any case the .pot files are generated using French as base language => we set it here and ignore the source-language value
*/
const SOURCE_LANG = 'fr';
/**
* Does not load an already loaded file => keep memory of loaded files
*/
protected $_loaded_files = array();
// Internal variables
private $_file = false;
private $_cleared = array();
private $_transunit = null;
private $_source = null;
private $_target = null;
private $_scontent = null;
private $_tcontent = null;
private $_stag = false;
private $_ttag = false;
#rd_modification: added case "context"
private $_rd_tlinenumber = false;
private $_rd_linenumber = null;
/**
* Generates the xliff adapter
* This adapter reads with php's xml_parser
*
* @param string $data Translation data
* @param string|Zend_Locale $locale OPTIONAL Locale/Language to set, identical with locale identifier,
* see Zend_Locale for more information
* @param array $options OPTIONAL Options to set
*/
public function __construct($data, $locale = null, array $options = array())
{
parent::__construct($data, $locale, $options);
}
/**
* Override ->translate() method.
* Zend_Translate_Adapter->translate($message_id, $locale = null) behaves as follow:
* - if isset($this->_translate[$locale][$message_id]), return $this->_translate[$locale][$message_id]; (trying to go from regional locale to language locale if not defined)
* - otherwise: return $message_id;
* It behaves like this because it expects $message_id to store source string, so that at least source string is to be displayed.
* In fact we already have source strings saved into $this->_translate too, but we do not use them! instead we use $message_id, which I don't like to be a source string (because you have to update php files each time you find a typo in your source language, which is a nonsense to me)
*
* My subclass does more or less the same thing (i.e. it tries to return translated content and, failing that, defaults to the source string -- really using the loaded source content and not $message_id) but throws an exception if $message_id is found neither in the translated content nor in the source strings. Therefore you do know when some content simply does not exist in your xliff files as source content.
*
* @param string $message_id
* @param string|Zend_Locale $locale
* @return string
*/
public function translate($message_id, $locale = null, $wrong_param = null)
{
if ($locale === null) {
$locale = (string)Zend_Registry::get('Zend_Locale');
}
$content = parent::translate($message_id, $locale);
if ($content == $message_id) { // means translation not found
if (isset($this->_translate[$this->_source][$message_id])) {
$content = $this->_translate[$this->_source][$message_id];
} else {
throw new Zend_Translate_Exception(sprintf('no content found for id "%s"', $message_id));
}
}
return $content;
}
/**
* Test if passed string_id exists
*
* @param string $message_id
* @return bool
*/
public function hasMessageId($message_id)
{
return isset($this->_translate[$this->_source][$message_id]);
}
/**
* Parse all ->_translate entries to search for a locale that match all components of given array.
*
* Return an array of locales matching all these components, or an empty array otherwise.
* When testing this, we cannot test against the content of the $this->_translate[$locale] array only. If a uri component has not been translated yet, the language would be considered as not matching $parts (when it actually does, since non-translated components default to the source language). The source language could then be the only one matching all parts => be considered as the $uri language => trigger a uri rewriting and create an infinite loop of uri redirections!
* To avoid this, we test the array components not against the $translated array, but against array_merge($untranslated, $translated), therefore covering all entries, both translated and non-translated ones.
*
* @param array $parts
* @return array
*/
public function getLocalesMatchingAllParts(array $parts)
{
$matching_locales = array();
foreach (array_keys($this->_translate) as $locale) {
$universe = $this->_getTranslatedArrayWithDefaultContent($locale);
if (count(array_intersect($parts, $universe)) == count($parts)) array_push($matching_locales, $locale);
}
return $matching_locales;
}
/**
* Return an array merging both translated and not-yet translated components for given locale
*
* @param string $locale
* @return array
*/
protected function _getTranslatedArrayWithDefaultContent($locale)
{
if ($locale === $this->_source) {
return $this->_translate[$locale];
} else {
return array_merge($this->_translate[$this->_source], $this->_translate[$locale]);
}
}
/**
* Take as argument a locale plus translated strings, and return an array of corresponding message ids (matching both translated and not yet translated components).
*
* If passed $strings is of type string, return a string, otherwise return an array
*
* WARNING: returns the first matching key only => make sure you don't have duplicates among translated values
*
* @param string $locale
* @param string|array $strings
* @return string|array
*/
public function getMessageIds($locale, $strings = null)
{
$universe = $this->_getTranslatedArrayWithDefaultContent($locale);
if (null === $strings) return array_keys($universe);
$ids = array();
foreach ((array)$strings as $str) {
$ids[$str] = $str ? array_search($str, $universe) : null;
}
return is_string($strings) ? reset($ids) : $ids;
}
/**
* Load translation data (XLIFF file reader)
*
* Does not load a translation file that has already been loaded
* I've added "force_locale" option too to load this specific locale and not system-wide locale
*
* @param string $filename XLIFF file to add (resolved to a full path by _resolveFilename())
* @param string $locale Locale/Language to add data for, identical with locale identifier,
* see Zend_Locale for more information
* @param array $options OPTIONAL Options to use
* @throws Zend_Translate_Exception
*/
protected function _loadTranslationData($filename, $locale, array $options = array())
{
$options = $options + $this->_options;
if (!empty($options['clear'])) { // !empty() avoids a notice when the 'clear' option is not set
$this->_translate = array();
}
if (!$filename) return;
#rd_modification: in my case, $locale will not be used, and always refer to Zend_Registry::get('Zend_Locale') instead
if (!Zend_Registry::isRegistered('Zend_Locale')) { // can happen if 'Zend_Locale' has not been defined yet and an error occurred
require_once 'controllers/plugins/Language.php';
Zend_Registry::set('Zend_Locale', Controller_Plugin_Language::getQuickDefaultLocale());
}
$this->_target = isset($options['force_locale']) ? $options['force_locale'] : (string)Zend_Registry::get('Zend_Locale');
$filename = $this->_resolveFilename($filename);
if (in_array($filename, $this->_loaded_files)) return;
array_push($this->_loaded_files, $filename);
if (!is_readable($filename)) {
require_once 'Zend/Translate/Exception.php';
throw new Zend_Translate_Exception('Translation file \'' . $filename . '\' is not readable.');
}
$encoding = $this->_findEncoding($filename);
$this->_file = xml_parser_create($encoding);
xml_set_object($this->_file, $this);
xml_parser_set_option($this->_file, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($this->_file, "_startElement", "_endElement");
xml_set_character_data_handler($this->_file, "_contentElement");
if (!xml_parse($this->_file, file_get_contents($filename))) {
$ex = sprintf('XML error: %s at line %d',
xml_error_string(xml_get_error_code($this->_file)),
xml_get_current_line_number($this->_file));
xml_parser_free($this->_file);
require_once 'Zend/Translate/Exception.php';
throw new Zend_Translate_Exception($ex);
}
}
/**
* Transform file name into file path (rd_modification/addition)
*
* @param string $filename
* @return string
*/
protected function _resolveFilename($filename)
{
$basepath = '../application/languages/%s/%s';
return sprintf($basepath, $this->_target, $filename);
}
private function _startElement($file, $name, $attrib)
{
if ($this->_stag === true) {
$this->_scontent .= "<".$name;
foreach($attrib as $key => $value) {
$this->_scontent .= " $key=\"$value\"";
}
$this->_scontent .= ">";
} else if ($this->_ttag === true) {
$this->_tcontent .= "<".$name;
foreach($attrib as $key => $value) {
$this->_tcontent .= " $key=\"$value\"";
}
$this->_tcontent .= ">";
} else {
switch(strtolower($name)) {
case 'file':
#rd_modification: cf. self::SOURCE_LANG comment
$this->_source = self::SOURCE_LANG;
#rd_modification: cf. bug http://framework.zend.com/issues/browse/ZF-4087
#$this->_target = array_key_exists('target-language', $attrib) ? $attrib['target-language'] : null;
$this->_target = array_key_exists('target-language', $attrib) ? $attrib['target-language'] : $this->_target;
#rd_modified: to allow multiple translation files to be loaded within same $translate instance
#$this->_translate[$this->_source] = array();
#$this->_translate[$this->_target] = array();
if (!isset($this->_translate[$this->_source]) || !is_array($this->_translate[$this->_source]))
$this->_translate[$this->_source] = array();
if (!isset($this->_translate[$this->_target]) || !is_array($this->_translate[$this->_target]))
$this->_translate[$this->_target] = array();
break;
case 'trans-unit':
$this->_transunit = true;
break;
case 'source':
if ($this->_transunit === true) {
$this->_scontent = null;
$this->_stag = true;
$this->_ttag = false;
}
break;
case 'target':
if ($this->_transunit === true) {
$this->_tcontent = null;
$this->_ttag = true;
$this->_stag = false;
}
break;
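#rd_modification: added case "context"
case 'context':
if ($this->_transunit === true) {
// guard against <context> elements that carry no context-type attribute
$this->_rd_tlinenumber = isset($attrib['context-type']) && $attrib['context-type'] == 'linenumber';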
$this->_rd_linenumber = null;
}
break;
default:
break;
}
}
}
private function _endElement($file, $name)
{
if (($this->_stag === true) and ($name !== 'source')) {
$this->_scontent .= "</".$name.">";
} else if (($this->_ttag === true) and ($name !== 'target')) {
$this->_tcontent .= "</".$name.">";
} else {
switch (strtolower($name)) {
case 'trans-unit':
$this->_transunit = null;
$this->_scontent = null;
$this->_tcontent = null;
break;
case 'source':
if (!empty($this->_scontent) and !empty($this->_tcontent) or
(isset($this->_translate[$this->_source][$this->_scontent]) === false)) {
$this->_translate[$this->_source][$this->_scontent] = $this->_scontent;
}
$this->_stag = false;
break;
case 'target':
if (!empty($this->_scontent) and !empty($this->_tcontent) or
(isset($this->_translate[$this->_source][$this->_scontent]) === false)) {
$this->_translate[$this->_target][$this->_scontent] = $this->_tcontent;
}
$this->_ttag = false;
break;
#rd_modification: added case "context", cf. improvement http://framework.zend.com/issues/browse/ZF-4114
#replace key $this->_scontent with $this->_rd_linenumber
case 'context':
if (!$this->_rd_tlinenumber) break;
if (isset($this->_translate[$this->_source][$this->_rd_linenumber]) && $this->_translate[$this->_source][$this->_rd_linenumber] != $this->_scontent) {
throw new Zend_Translate_Exception(sprintf('duplicate translation id "%s" (used twice, with different source values)', $this->_rd_linenumber));
}
if (isset($this->_translate[$this->_source][$this->_scontent])) {
$this->_translate[$this->_source][$this->_rd_linenumber] = $this->_translate[$this->_source][$this->_scontent];
unset($this->_translate[$this->_source][$this->_scontent]);
}
if (isset($this->_translate[$this->_target][$this->_scontent])) {
$this->_translate[$this->_target][$this->_rd_linenumber] = $this->_translate[$this->_target][$this->_scontent];
unset($this->_translate[$this->_target][$this->_scontent]);
}
$this->_rd_tlinenumber = false;
break;
default:
break;
}
}
}
private function _contentElement($file, $data)
{
if (($this->_transunit !== null) and ($this->_source !== null) and ($this->_stag === true)) {
$this->_scontent .= $data;
}
if (($this->_transunit !== null) and ($this->_target !== null) and ($this->_ttag === true)) {
$this->_tcontent .= $data;
}
#rd_modification: added case "context", cf. improvement http://framework.zend.com/issues/browse/ZF-4114
if ($this->_transunit !== null and $this->_rd_tlinenumber === true) {
$this->_rd_linenumber .= $data;
}
}
private function _findEncoding($filename)
{
$file = file_get_contents($filename, false, null, 0, 100); // second argument is a boolean (use_include_path)
if (strpos($file, "encoding") !== false) {
#rd_modification: cf. bug http://framework.zend.com/issues/browse/ZF-4085
#$encoding = substr($file, strpos($file, "encoding") + 10);
#$encoding = substr($encoding, 0, strpos($encoding, '"'));
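$encoding = substr($file, strpos($file, "encoding") + 9);
// $encoding[0] is the opening quote (single or double): search for the closing one from offset 1,
// otherwise strpos() matches the opening quote itself and an empty string is returned
$encoding = substr($encoding, 1, strpos($encoding, $encoding[0], 1) - 1);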
return $encoding;
}
return 'UTF-8';
}
/**
* Returns the adapter name
*
* @return string
*/
public function toString()
{
return "Xliff";
}
}
[sources]
- http://framework.zend.com/manual/en/zend.translate.html