Translation and localization

DITA has markup that facilitates translation and localization. This markup includes the @xml:lang attribute, the @dir attribute, and the @translate attribute.

The @xml:lang attribute

The @xml:lang attribute specifies the language and optional locale of the content that is contained in an element. The @xml:lang attribute is described in the XML Recommendation.

Since the @xml:lang attribute is an inherent property of the XML document, it does not behave in the same way as other DITA metadata attributes do.

Within topic and map documents, the @xml:lang attribute applies to the content and attributes that are contained by the element on which it is specified. This means that it supplies a value for lower-level elements in the containment hierarchy that do not supply their own value for the @xml:lang attribute. However, any such value is overridden when an @xml:lang attribute with a different value is specified on lower-level elements in the containment hierarchy.
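The following hypothetical topic is a minimal sketch of this inheritance. Its @id value and text are illustrative only. The effective value of the @xml:lang attribute is en-US for most of the body, fr-FR for one inline phrase, and de-DE for one section and everything it contains.

```xml
<topic id="inheritance-example" xml:lang="en-US">
  <title>Language inheritance</title>
  <body>
    <!-- No @xml:lang here: the effective value en-US is inherited from <topic> -->
    <p>This paragraph is in US English.</p>
    <!-- The paragraph inherits en-US; the <ph> element overrides it with fr-FR -->
    <p>The French term for page layout is <ph xml:lang="fr-FR">mise en page</ph>.</p>
    <!-- The override on <section> also applies to its <title> and <p> children -->
    <section xml:lang="de-DE">
      <title>Abschnitt auf Deutsch</title>
      <p>Dieser Absatz ist auf Deutsch.</p>
    </section>
  </body>
</topic>
```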

When the @xml:lang attribute is specified on a topic reference, it does not apply to the referenced resource. This means that the value of the @xml:lang attribute on a topic reference (or the root element of the map) does not automatically supply a default value for the referenced topic or DITA map.
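A minimal sketch of this rule, using the hypothetical files installing-fr.dita and reference.ditamap. The @xml:lang values on the <map> and <topicref> elements describe only content inside the map, such as the map title and the navigation title. They do not set the language of the referenced files.

```xml
<map xml:lang="en-US">
  <!-- The map title inherits en-US from the root <map> element -->
  <title>Product documentation</title>
  <!-- fr-FR applies only to content inside this <topicref>, such as the navigation
       title; installing-fr.dita must set @xml:lang on its own root element -->
  <topicref href="installing-fr.dita" xml:lang="fr-FR">
    <topicmeta>
      <navtitle>Installation</navtitle>
    </topicmeta>
  </topicref>
  <!-- The submap takes its language from the root element of reference.ditamap,
       not from this map; if that root sets none, a processor default applies -->
  <mapref href="reference.ditamap"/>
</map>
```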

For topic and map documents, if no value for the @xml:lang attribute is specified either explicitly or on a higher-level element in the containment hierarchy, a processor-determined default value is assumed.

Recommendations for the @xml:lang attribute

Specifying the @xml:lang attribute in the DITA source facilitates translation and helps ensure that processors will handle content appropriately. Accordingly, this specification makes certain best-practices recommendations for where the @xml:lang attribute should be set.

Setting the @xml:lang attribute in the source-language document facilitates the translation process. Some translation tools do not support adding new markup to the document that is being translated. If the source-language content does not set the @xml:lang attribute, it might be difficult or impossible for a translator to add the @xml:lang attribute to the translated document.

In addition, setting the @xml:lang attribute in the DITA source ensures that processors handle content in a language- and locale-appropriate way. If the @xml:lang attribute is not set, processors might assume a default value that is inappropriate for the DITA content.

The following table outlines the recommended use of the @xml:lang attribute in topics and maps. These recommendations ensure that DITA resources have an effective default language.

DITA resource | Recommended use
DITA topic document that contains a single language | Specify the @xml:lang attribute on the root element of the document.
DITA topic document that contains more than one language | Specify the primary language and locale that applies to the topic on the highest-level element that contains content. If part of a topic is written in a different language, enclose that content in an element with the @xml:lang attribute set appropriately. This applies to both block and inline elements that use the alternate language.
DITA map | Specify the @xml:lang attribute on the root element of the map. This applies both to the root map and to any submaps.

Processing expectations regarding the @xml:lang attribute

When the @xml:lang attribute is specified as recommended, a language for the content is clearly indicated. However, when the @xml:lang attribute is not specified, processors might need to assign a default value.

If the root element of a map or a top-level topic has no value for the @xml:lang attribute, a processor SHOULD assume a default value. The processor's default value can be fixed, configurable, or derived from the content itself, for example, from the @xml:lang attribute on the root map.

When a @conref or @conkeyref attribute is used to include content from one element into another, the processor MUST use the effective value of the @xml:lang attribute from the referenced element. If the referenced element does not have an explicit value for the @xml:lang attribute, the processor SHOULD use the same default value that it uses for topics that do not set the @xml:lang attribute.

Processors SHOULD render each element in a way that is appropriate for its language as identified by the @xml:lang attribute.

Example: content reference and the @xml:lang attribute

This section is non-normative.

This example outlines how processors determine the effective value of the @xml:lang attribute for content that is referenced by the @conref or @conkeyref attribute.

In this scenario, a company has a notices topic that contains warnings in multiple languages. The notices topic specifies an @xml:lang attribute of en. However, it contains content that is reused from topics that explicitly set the @xml:lang attribute to fr and de.

The following code block shows the content of the DITA topic that contains the referencing elements:

Figure 1. Topic that contains the conrefs
<topic xml:lang="en" id="notices">
 <title>NOTICES</title>
 <shortdesc>Be sure to read all product safety information before using the product.</shortdesc>
 <body>
   <note id="warning-english" conref="warnings-en.dita#warnings/general"/>
   <note id="warning-french" conref="warnings-fr.dita#warnings/general"/>
   <note id="warning-german" conref="warnings-de.dita#warnings/general"/>
   <!-- ... All supported languages for the product ... -->
 </body>
</topic>

The following code blocks show the content of the topics that contain the referenced elements:

Figure 2. English warnings topic: warnings-en.dita
<topic id="warnings" xml:lang="en">
 <title>Reusable warnings (English)</title>
 <body>
  <note id="general">General notice about using the product...</note>
  <note id="water">Warning about using the product near water...</note>
  <!-- Other reusable warnings -->
 </body>
</topic>
Figure 3. French warnings topic: warnings-fr.dita
<topic id="warnings" xml:lang="fr">
 <title>Reusable warnings (French)</title>
 <body>
  <note id="general">(French translation of: General notice about using the product...)</note>
  <note id="water">(French translation of: Warning about using the product near water...)</note>
  <!-- Other reusable warnings -->
 </body>
</topic>
Figure 4. German warnings topic: warnings-de.dita
<topic id="warnings" xml:lang="de">
 <title>Reusable warnings (German)</title>
 <body>
  <note id="general">(German translation of: General notice about using the product...)</note>
  <note id="water">(German translation of: Warning about using the product near water...)</note>
  <!-- Other reusable warnings -->
 </body>
</topic>

When the notices topic that contains the referencing <note> elements is processed, the following occurs:

  • The <note> element with the @id attribute set to warning-french has an effective value for the @xml:lang attribute of fr.
  • The <note> element with the @id attribute set to warning-german has an effective value for the @xml:lang attribute of de.

In each case, the effective value of the @xml:lang attribute for the note is determined by the value of the @xml:lang attribute that is specified on the topic that contains the referenced element, instead of the value of the @xml:lang attribute that is specified on the notices topic that contains the referencing elements.

The @dir attribute

The @dir attribute provides instructions to processors about how bidirectional text is rendered.

The @dir attribute identifies or overrides the text directionality. The following values are valid:

lro
Indicates an override of the Unicode Bidirectional Algorithm, forcing the element into left-to-right mode.
ltr
Indicates left-to-right.
rlo
Indicates an override of the Unicode Bidirectional Algorithm, forcing the element into right-to-left mode.
rtl
Indicates right-to-left.
-dita-use-conref-target
See "Using the -dita-use-conref-target value" for more information.
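A minimal sketch of the @dir attribute in a right-to-left topic. The topic is hypothetical, and the parenthesized strings stand in for Arabic text. The sketch sets @dir on the root element explicitly instead of assuming that a processor infers the direction from the @xml:lang value.

```xml
<topic id="bidi-example" xml:lang="ar" dir="rtl">
  <!-- Base direction of the topic: right to left -->
  <title>(Arabic title)</title>
  <body>
    <!-- The <ph> element marks an embedded left-to-right run inside RTL text;
         the Unicode Bidirectional Algorithm still applies within the phrase -->
    <p>(Arabic text) <ph dir="ltr" xml:lang="en">DITA Open Toolkit</ph> (Arabic text)</p>
  </body>
</topic>
```

The lro and rlo values are stronger than ltr and rtl: instead of marking a run of text with a direction, they override the Unicode Bidirectional Algorithm and force the content of the element into a single direction.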

The Unicode Bidirectional Algorithm

The Unicode Bidirectional Algorithm plays a critical role in ensuring that bidirectional text is correctly rendered.

Bidirectional text contains text in both directionalities: right-to-left (RTL) and left-to-right (LTR). Common examples of bidirectional text include the following:

  • Documents in RTL languages, such as Arabic, Hebrew, Farsi, Urdu, and Yiddish, that include numerals or embedded sections of LTR text
  • Documents that contain text in both LTR and RTL languages, for example, a topic that lists the title of a movie in multiple languages

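The Unicode Bidirectional Algorithm specifies how to determine the display order of characters in bidirectional text, based on the directional properties of the characters. For more information about the Unicode Bidirectional Algorithm, see Unicode Standard Annex #9, Unicode Bidirectional Algorithm.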

Processing expectations regarding the Unicode Bidirectional Algorithm

Processor support for the Unicode Bidirectional Algorithm is critical.

DITA processors SHOULD fully support the Unicode Bidirectional Algorithm. This ensures that processors can correctly render the script and directionality of each language that is used in a document.

The @translate attribute

The @translate attribute provides information about whether the content of an element should be translated.

The following values are valid: yes, no, and -dita-use-conref-target.

A few elements have a default value of no for the @translate attribute. These elements include <draft-comment> and <required-cleanup>, both of which are designed to hold content that is not intended for publication.
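A minimal sketch of the @translate attribute in a paragraph. ExampleSync is a hypothetical product name that is marked to be left untranslated. The <draft-comment> element needs no attribute because, as described above, it defaults to translate="no".

```xml
<p>Sign in to <ph translate="no">ExampleSync</ph> before you start the installation.
  <!-- No @translate needed here: <draft-comment> defaults to translate="no" -->
  <draft-comment>Confirm the final product name before release.</draft-comment>
</p>
```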

The non-normative appendix, Element-by-element recommendations for translators, includes information on whether the element is block or inline, whether the element contents are likely to be suitable for translation, and whether the element has attributes whose values might need translation.