Multilingual Document Support

resplin

Obsolete Pages{{Obsolete}}


Overview


This page discusses the multilingual capabilities of the core Alfresco repository. Translating the Alfresco interfaces is discussed on the page Language Packs.

Information about how Alfresco user interfaces handle multilingual content is here:


Alfresco supports multilingual content at several levels:


  • Multilingual Text Properties
    • It is possible to declare text properties that support multilingual values.
    • Backward compatibility is maintained with existing API clients and metadata using a specific interceptor.
  • Grouping of Translations
    • A data model is defined for managing documents and their translations.
    • A service was written to support the management of the translation structure.
      • The AnalysisMode
      • The Indexing Analysis mode
  • Locale-specific Filtering
    • Where required, multilingual (ML) text fields are filtered to the user's preference.
    • Document translations are filtered according to the user's preference.
  • Searching
    • Searching on Multilingual properties.
    • Searching on content in a particular language

There are two locale properties on content:


  • sys:locale
  • cm:content.locale which is dependent on the user's locale settings (if the client's locale preference is set to 'de', any content uploaded is set to 'de' even if the sys:locale is set to something else)

This page refers to example code contained in this archive: FirstFoundationMultilingual.7z.sample


Multilingual Text Properties


Dictionary Support


  • The definition of a d:mltext data type.
  • Modification of the existing cm:title and cm:description

Persistence Support


An elementary type, d:mltext, was added to the dictionary model (dictionaryModel.xml). If one of your text properties needs to be localized, declare it with the d:mltext elementary type. The MLPropertyInterceptor hides and controls the behavior of localized node properties. It intercepts the node service methods getProperty, getProperties, setProperty and setProperties. The interception mechanism can be switched on or off: setMLAware(false) activates the interceptor and setMLAware(true) deactivates it. The setting only affects the current thread.

By setting the content locale, you tell the property interceptor which language the value passed to setProperty is written in. The same applies to getProperty: the content locale selects the language of the value that is returned.

See: doEnterData1 in the given example.


Locale Support


I18NUtil controls the default locale and the content locale. setLocale(Locale locale) sets the locale for the current thread (a ThreadLocal variable). setContentLocale(Locale locale) sets the content locale. The content locale determines the preferred language for d:mltext properties and content.
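
The interplay between the content locale and the property interceptor can be pictured with a small stand-alone model. The sketch below does not use any Alfresco classes: MlTextModel, its two thread-local settings, and the "exact locale, then language only" fallback are assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Stand-alone model of a multilingual text property. Hypothetical; not Alfresco code. */
final class MlTextModel {
    // Per-thread settings, mirroring the thread-scoped behaviour described above.
    private static final ThreadLocal<Locale> CONTENT_LOCALE =
            ThreadLocal.withInitial(Locale::getDefault);
    private static final ThreadLocal<Boolean> ML_AWARE =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // One value per locale, as a multilingual text property would hold.
    private final Map<Locale, String> values = new LinkedHashMap<>();

    static void setContentLocale(Locale locale) { CONTENT_LOCALE.set(locale); }

    static void setMLAware(boolean aware) { ML_AWARE.set(aware); }

    /** Stores the value under the current thread's content locale. */
    void setProperty(String value) {
        values.put(CONTENT_LOCALE.get(), value);
    }

    /**
     * ML-aware callers receive every translation. Other callers receive one
     * string chosen by the content locale: exact match first, then language only.
     * The fallback rule is an assumption made for this sketch.
     */
    Object getProperty() {
        if (ML_AWARE.get()) {
            return new LinkedHashMap<>(values);
        }
        Locale wanted = CONTENT_LOCALE.get();
        String exact = values.get(wanted);
        if (exact != null) {
            return exact;
        }
        return values.get(Locale.forLanguageTag(wanted.getLanguage()));
    }

    public static void main(String[] args) {
        MlTextModel title = new MlTextModel();
        setContentLocale(Locale.ENGLISH);
        title.setProperty("Banana");
        setContentLocale(Locale.FRENCH);
        title.setProperty("Banane");

        System.out.println(title.getProperty()); // plain string for the French content locale
        setMLAware(true);
        System.out.println(title.getProperty()); // map holding both translations
    }
}
```

A client that is not ML-aware only ever sees a plain string, which is how backward compatibility with existing API clients can be kept while the stored value holds several translations.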


AnalysisMode


Analysis modes govern how d:mltext properties are indexed and searched.

Example: see FirstMLFoundationClient.doDemoAnalysisMode().

The possible values for AnalysisMode are:

LOCALE_ONLY
No expansion is performed; the locale is kept as it is.

LOCALE_AND_ALL
Same as the previous mode, plus the empty locale, which represents all existing locales.

LOCALE_AND_ALL_CONTAINING_LOCALES
If a locale is LANG_COUNTRY_VARIANT, it is expanded to LANG_COUNTRY_VARIANT, LANG_COUNTRY and LANG.

LOCALE_AND_ALL_CONTAINING_LOCALES_AND_ALL
Same as the previous mode, plus the empty locale representing all locales.

LOCALE_AND_ALL_CONTAINED_LOCALES
The locale as it is, plus its expansion using Locale.getAvailableLocales(), keeping the locales that match the root given by the locale.

ALL_ONLY
Only the empty locale representing all locales.

ALL_LANGUAGES
Takes the language part of the locale and expands it using Locale.getAvailableLocales(), filtering on that language.

ALL_LANGUAGES_AND_ALL
Same as the previous mode, plus the empty locale representing all locales.

EXACT_LANGUAGE
Reduces the locale to exactly its language part.

EXACT_LANGUAGE_AND_ALL
Reduces the locale to its language part and adds the empty locale representing all locales.

EXACT_COUNRTY
Look at the given examples.

EXACT_COUNTRY_AND_ALL
Look at the given examples.

ALL_COUNTRIES
Look at the given examples.

ALL_COUNTRIES_AND_ALL

Examples:



******************************************************
Locale to expand:fr
ExpandingMode:LOCALE_ONLY
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:LOCALE_ONLY
*fr_FR*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:LOCALE_AND_ALL
**
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:LOCALE_AND_ALL
**
*fr_FR*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:LOCALE_AND_ALL_CONTAINING_LOCALES
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:LOCALE_AND_ALL_CONTAINING_LOCALES
*fr_FR*
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:LOCALE_AND_ALL_CONTAINING_LOCALES_AND_ALL
**
*fr_FR*
*fr*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:LOCALE_AND_ALL_CONTAINING_LOCALES_AND_ALL
**
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:LOCALE_AND_ALL_CONTAINED_LOCALES
*fr_FR*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:LOCALE_AND_ALL_CONTAINED_LOCALES
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:ALL_ONLY
**
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:ALL_ONLY
**
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:ALL_LANGUAGES
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:ALL_LANGUAGES
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:ALL_LANGUAGES_AND_ALL
**
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:ALL_LANGUAGES_AND_ALL
**
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:EXACT_LANGUAGE
*fr*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:EXACT_LANGUAGE
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:EXACT_LANGUAGE_AND_ALL
**
*fr*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:EXACT_LANGUAGE_AND_ALL
**
*fr*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:EXACT_COUNRTY
*fr_FR*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:EXACT_COUNRTY
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:EXACT_COUNTRY_AND_ALL
**
*fr_FR*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:EXACT_COUNTRY_AND_ALL
**
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
Locale to expand:fr_FR
ExpandingMode:ALL_COUNTRIES
*fr_FR*
******************************************************
******************************************************
Locale to expand:fr
ExpandingMode:ALL_COUNTRIES
*fr_CA*
*fr_FR*
*fr*
*fr_CH*
*fr_LU*
*fr_BE*
******************************************************
******************************************************
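
A few of the expansions listed above can be reproduced with a short stand-alone sketch. This is not the Alfresco implementation: the class LocaleExpansionSketch, its Mode enum (a hypothetical subset of the real modes) and the helper methods are assumptions made for illustration. The ALL_LANGUAGES branch depends on the locales installed in the running JVM, so its result may differ from the listing.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative re-implementation of a few expansion modes. Not the Alfresco source. */
final class LocaleExpansionSketch {

    /** Hypothetical subset of the modes listed above. */
    enum Mode {
        LOCALE_ONLY,
        LOCALE_AND_ALL,
        LOCALE_AND_ALL_CONTAINING_LOCALES,
        EXACT_LANGUAGE,
        EXACT_LANGUAGE_AND_ALL,
        ALL_LANGUAGES
    }

    /** The empty locale stands for "all locales"; it prints as ** in the listing above. */
    static final Locale ALL = Locale.ROOT;

    static Set<Locale> expand(Locale locale, Mode mode) {
        Set<Locale> result = new LinkedHashSet<>();
        switch (mode) {
            case LOCALE_ONLY:
                result.add(locale);
                break;
            case LOCALE_AND_ALL:
                result.add(ALL);
                result.add(locale);
                break;
            case LOCALE_AND_ALL_CONTAINING_LOCALES:
                // language_country_variant -> itself, language_country, language
                result.add(locale);
                if (!locale.getCountry().isEmpty()) {
                    result.add(languageAndCountry(locale));
                }
                result.add(languageOnly(locale));
                break;
            case EXACT_LANGUAGE:
                result.add(languageOnly(locale));
                break;
            case EXACT_LANGUAGE_AND_ALL:
                result.add(ALL);
                result.add(languageOnly(locale));
                break;
            case ALL_LANGUAGES:
                // Depends on the locales installed in the running JVM.
                result.add(languageOnly(locale));
                for (Locale candidate : Locale.getAvailableLocales()) {
                    if (candidate.getLanguage().equals(locale.getLanguage())) {
                        result.add(candidate);
                    }
                }
                break;
        }
        return result;
    }

    private static Locale languageOnly(Locale locale) {
        return new Locale.Builder().setLanguage(locale.getLanguage()).build();
    }

    private static Locale languageAndCountry(Locale locale) {
        return new Locale.Builder()
                .setLanguage(locale.getLanguage())
                .setRegion(locale.getCountry())
                .build();
    }

    public static void main(String[] args) {
        System.out.println(expand(Locale.FRANCE, Mode.LOCALE_AND_ALL_CONTAINING_LOCALES)); // [fr_FR, fr]
        System.out.println(expand(Locale.FRANCE, Mode.EXACT_LANGUAGE));                    // [fr]
    }
}
```

The sets are insertion-ordered so that the output follows the same order as the listing, with the empty locale first in the "_AND_ALL" modes.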
 

Searching support


The default value for defaultMLSearchAnalysisMode is passed as a parameter to the luceneIndexerAndSearcherFactory. The current default value is EXACT_LANGUAGE_AND_ALL.
This parameter sets the default analysis mode used when querying. Example:

Suppose you are searching for the term “A” in French (locale = fr), the analysis mode is ALL_COUNTRIES, and the locale in your search parameters is «fr». The query terms are then expanded so that “A” is looked up in the sections of the Lucene field for fr_CA, fr_FR, fr, fr_CH, fr_LU and fr_BE. The query matches if the term “A” appears in the field in any one of those locales.


Indexing Support


All multi-lingual fields are indexed carrying locale information. Their content is tokenised according to the locale. So the text for each locale is indexed and tokenised in its own right. The tokens are prefixed in the index with locale information.

When searching, the locale(s) to use can be specified on the SearchParameters or taken from the user's default locale.
The search is restricted to the strings stored for just those locales. By default, the locale 'fr' will only match 'fr' and not 'fr_CA'. How locales expand can be configured in the search parameters. 'fr' can match 'fr' only, or 'fr' and all countries and all variants. 'fr_CA_SomeVariant' can match only 'fr_CA_SomeVariant', or it can match 'fr_CA_SomeVariant', 'fr_CA' and 'fr'.

If cm:mltitle were an ML string, it could be queried in Lucene using:

@cm\:mltitle:"banana"

The locales specified on the search parameters would govern which locales are matched.

It is intended that, at some point, locales can be specified directly in the Lucene query for interlingual searches, e.g.:

@cm\:mltitle_en:"banana" @cm\:mltitle_ja:"バナナ"

The tokenisation for each locale is picked up as defined by the data dictionary localisation. By default, the locales are:
default(en), cn, cs, da, de, el, en, es, fr, it, ja, ko, nl, no, pt_BR, pt, ru, and sv. Some locales have alternatives.

Remark: before changing defaultMLSearchAnalysisMode on the luceneIndexerAndSearcherFactory, make sure the new value does exactly what you expect. Consider the term duplication caused by an indexing mode such as ALL_COUNTRIES: a term given to the indexer with locale=fr is duplicated for fr_CA, fr_FR, fr, fr_CH, fr_LU and fr_BE. The consequence is a larger index.

In the part of the index that holds «ALL» (the text in all languages), a gap of 1000 terms is introduced between languages at indexing time, so that a phrase cannot match across the boundary between two languages.
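
The effect of such a gap can be shown with a toy model of token positions. The sketch below is neither Lucene nor Alfresco code: PositionGapSketch, its Token type and the simplified two-word phrase check are assumptions made for illustration. Only the gap size of 1000 comes from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of token positions; not Lucene or Alfresco code. */
final class PositionGapSketch {
    static final int LANGUAGE_GAP = 1000; // the gap quoted in the text above

    /** A token together with the position recorded for it. */
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    /** Assigns consecutive positions within a language and jumps by the gap between languages. */
    static List<Token> index(List<List<String>> textsPerLanguage) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (List<String> words : textsPerLanguage) {
            for (String word : words) {
                tokens.add(new Token(word, position++));
            }
            position += LANGUAGE_GAP;
        }
        return tokens;
    }

    /** A two-word phrase matches only when the words sit at adjacent positions. */
    static boolean phraseMatches(List<Token> tokens, String first, String second) {
        for (Token a : tokens) {
            for (Token b : tokens) {
                if (a.text.equals(first) && b.text.equals(second)
                        && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Token> tokens = index(Arrays.asList(
                Arrays.asList("yellow", "banana"),    // English text
                Arrays.asList("banane", "jaune")));   // French text
        System.out.println(phraseMatches(tokens, "yellow", "banana")); // true
        System.out.println(phraseMatches(tokens, "banana", "banane")); // false
    }
}
```

Without the gap, the last English word and the first French word would sit at adjacent positions and the second phrase would match by accident.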

For a deeper understanding, have a look at FirstMLFoundationClient.doSearchData1(). The comments there should help.


Multilingual Document Translations


Model


Details on model localization are here:
http://wiki.alfresco.com/wiki/Data_Dictionary_Guide#Model_Localization


  • Extensions to contentModel.xml

<type name='cm:mlRoot'>
  <title>Multilingual Root</title>
  <parent>sys:container</parent>
  <associations>
    <child-association name='cm:mlContainer'>
      <source>
        <mandatory>false</mandatory>
        <many>false</many>
      </source>
      <target>
        <class>cm:mlContainer</class>
        <mandatory>false</mandatory>
        <many>true</many>
      </target>
    </child-association>
  </associations>
</type>
 
<type name='cm:mlContainer'>
  <title>Multilingual Container</title>
  <parent>sys:container</parent>
  <associations>
    <child-association name='cm:mlChild'>
      <source>
        <mandatory>false</mandatory>
        <many>false</many>
      </source>
      <target>
        <class>cm:mlDocument</class>
        <mandatory>true</mandatory>
        <many>true</many>
      </target>
    </child-association>
  </associations>
  <mandatory-aspects>
    <aspect>cm:versionable</aspect>
  </mandatory-aspects>
</type>
 
<aspect name='cm:mlDocument'>
  <title>Multilingual Document</title>
  <mandatory-aspects>
    <aspect>sys:localized</aspect>
    <aspect>cm:versionable</aspect>
  </mandatory-aspects>
</aspect>

<aspect name='cm:titled'>
  <title>Titled</title>
  <properties>
    <property name='cm:title'>
      <title>Title</title>
      <type>d:mltext</type>
    </property>
    <property name='cm:description'>
      <title>Description</title>
      <type>d:mltext</type>
    </property>
  </properties>
</aspect>

The cm:mlRoot type represents the root of a «well known location» that contains all the cm:mlContainer nodes. The cm:mlRoot is located just under the root of the container. Each cm:mlContainer contains all the translations of one single document.


MultilingualContentService


Controlling the language for content


The analysis mode for the cm:content property type is MLAnalysisMode.ALL_ONLY. This setting cannot be changed.

By default, content fields are indexed using an analyzer that corresponds to the locale of the server.

Be aware that some analyzers, such as frenchAnalyser, modify words before they are passed to the indexer (for example, by removing a trailing «e»).

All content properties are indexed with the same analyzer. To search on content properties in a given language, the document node needs a property holding the language code (for example fr or en). The MultilingualContentService adds this for you.



@PublicService
public interface MultilingualContentService
{
    /**
     * Rename an existing sys:localized by adding locale suffixes to the base name.
     * Where there are name clashes with existing documents, a numerical naming scheme will be
     * adopted.
     *
     * @param localizedNodeRef An existing sys:localized
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"localizedNodeRef"})
    void renameWithMLExtension(NodeRef localizedNodeRef);

    /**
     * Make an existing document into a translation by adding the cm:mlDocument aspect and
     * creating a cm:mlContainer parent. If it is already a translation, then nothing is done.
     *
     * @param contentNodeRef An existing cm:content
     * @return Returns the cm:mlContainer translation parent
     *
     * @see org.alfresco.model.ContentModel#ASPECT_MULTILINGUAL_DOCUMENT
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"contentNodeRef", "locale"})
    NodeRef makeTranslation(NodeRef contentNodeRef, Locale locale);

    /**
     * Make a translation out of an existing document. The necessary translation structures
     * will be created as needed.
     *
     * @param newTranslationNodeRef An existing cm:content
     * @param translationOfNodeRef An existing cm:mlDocument or cm:mlContainer
     * @return Returns the cm:mlContainer translation parent
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"newTranslationNodeRef", "translationOfNodeRef", "locale"})
    NodeRef addTranslation(NodeRef newTranslationNodeRef, NodeRef translationOfNodeRef, Locale locale);

    /**
     *
     * @return Returns the cm:mlContainer translation parent
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"translationNodeRef"})
    NodeRef getTranslationContainer(NodeRef translationNodeRef);

    /**
     * Create a new edition of an existing cm:mlContainer using any one of the
     * associated cm:mlDocument translations.
     *
     * @param translationNodeRef The specific cm:mlDocument to use as the starting point
     * of the new edition. All other translations will be removed.
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"translationNodeRef"})
    void createEdition(NodeRef translationNodeRef);

    /**
     * Gets the set of sibling translations associated with the given cm:mlDocument or
     * cm:mlContainer.
     *
     * @param translationOfNodeRef An existing cm:mlDocument or cm:mlContainer
     * @return Returns a map of translation nodes keyed by locale
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"translationOfNodeRef"})
    Map<Locale, NodeRef> getTranslations(NodeRef translationOfNodeRef);

    /**
     * Given a cm:mlDocument, this method attempts to find the best translation for the given
     * locale. If there is not even a
     * {@link org.alfresco.i18n.I18NUtil#getNearestLocale(Locale, Set) partial match}, then null
     * is returned.
     *
     * @param translationNodeRef the cm:mlDocument
     * @param locale the target locale
     * @return Returns the best match for the locale, or null if there
     * is no near match.
     *
     * @see #getTranslations(NodeRef)
     * @see org.alfresco.i18n.I18NUtil#getNearestLocale(Locale, Set)
     */
    @Auditable(key = Auditable.Key.ARG_0, parameters = {"translationNodeRef", "locale"})
    NodeRef getTranslationForLocale(NodeRef translationNodeRef, Locale locale);
}
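
The "best translation for a locale" idea behind getTranslationForLocale can be illustrated with a stand-alone lookup over a map like the one returned by getTranslations. The sketch below is hypothetical and is not the algorithm used by I18NUtil.getNearestLocale: the class BestTranslationSketch, its fallback order, and the use of strings in place of NodeRef values are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical best-match lookup; a stand-in, not the Alfresco implementation. */
final class BestTranslationSketch {

    /**
     * Returns the translation whose locale is closest to the target:
     * exact match, then the bare language, then any locale sharing the language.
     * Returns null when nothing shares the target's language.
     */
    static <T> T bestMatch(Map<Locale, T> translations, Locale target) {
        T exact = translations.get(target);
        if (exact != null) {
            return exact;
        }
        T languageOnly = translations.get(Locale.forLanguageTag(target.getLanguage()));
        if (languageOnly != null) {
            return languageOnly;
        }
        for (Map.Entry<Locale, T> entry : translations.entrySet()) {
            if (entry.getKey().getLanguage().equals(target.getLanguage())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Strings stand in for the NodeRef values of the real map.
        Map<Locale, String> translations = new LinkedHashMap<>();
        translations.put(Locale.FRENCH, "report_fr.txt");
        translations.put(Locale.UK, "report_en_GB.txt");

        System.out.println(bestMatch(translations, Locale.CANADA_FRENCH)); // report_fr.txt
        System.out.println(bestMatch(translations, Locale.US));            // report_en_GB.txt
        System.out.println(bestMatch(translations, Locale.GERMAN));        // null
    }
}
```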

To see how multilingual content is created, have a look at doCreateMultiContent in the FirstMLFoundationClient example. When content is created with Alfresco, the locale associated with the content is the one set in the system using I18NUtil.setLocale(<locale value>).

Code extract:



// Set the locale that will be associated with the content written below.
I18NUtil.setLocale(language_of_the_content);

// Write some content to the new node.
ContentService contentService = serviceRegistry.getContentService();
ContentWriter writer = contentService.getWriter(childNode, ContentModel.PROP_CONTENT, true);
writer.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
writer.setEncoding("UTF-8");
writer.putContent(cont);

Remark: associating the language with the content should also be possible using MultilingualContentService.


Searching content in translations


The analysis mode is not relevant when searching and indexing content.
For querying, the analysis mode is only relevant for the clauses involving d:mltext properties.

Retrieving translations in a particular language is possible using a query like:



String query = "ASPECT:\"" + ContentModel.ASPECT_MULTILINGUAL_DOCUMENT.toString() + "\" and " +
        "@" + LuceneQueryParser.escape(ContentModel.PROP_CONTENT.toString()) + ".locale:fr";
/* The default server locale is used in that case, but it is not relevant because the mode is ALL_ONLY */
executeQuery(storeRef, searchService, MLAnalysisMode.ALL_ONLY, query, null);

Please have a look at FirstMLFoundationClient.doSearchContent(ServiceRegistry serviceRegistry).


 
Remarks:
There is a typo in the EXACT_COUNRTY constant as it is defined in the code.

The previous search can be combined with searches on d:mltext properties.


Web Scripts and Templates


Accessing a web script via /alfresco/wcs returns content in whatever language is selected by the user's profile in Alfresco Explorer.

Accessing a web script via /alfresco/service returns content in whatever language is requested by the browser.

For email sent out by scheduled tasks, Alfresco has neither a user profile language nor a browser language, so it uses the language that Tomcat runs under. Set that with JAVA_OPTS (JAVA_OPTS=%JAVA_OPTS% -Duser.language=fr -Duser.country=FR). Alternatively, you can replace the email template with a copy in your preferred language.


Editions


The repository supports versioning all of the translated content for a single document together through a feature called 'editions'. This functionality is currently only exposed in the Explorer UI.

http://docs.alfresco.com/4.0/topic/com.alfresco.enterprise.doc/tasks/tuh-multilingual.html


Limitations


Alfresco does not currently have support for bidirectional text, right-to-left text, or vertical text.
https://issues.alfresco.com/jira/browse/ALF-17472


Tags


If you are working with a document set, each translation needs to have its own set of language specific tags.

Tags have a locale set on them, but are not translatable because they do not have the 'titled' aspect.

I18N