Translation and localization
DITA has markup that facilitates translation and localization.
This markup includes the @xml:lang
attribute, the
@dir
attribute, and the @translate
attribute.
The @xml:lang attribute
The @xml:lang
attribute specifies the
language and optional locale of the content that is contained in an
element. The @xml:lang
attribute is described in the
XML Recommendation.
Since the @xml:lang
attribute is an inherent
property of the XML document, it does not behave in the same way as
other DITA metadata attributes do.
Within topic and map documents, the @xml:lang
attribute applies to the content and attributes that are contained by
the element on which it is specified. This means that it supplies a
value for lower-level elements in the containment hierarchy that do
not supply their own value for the @xml:lang
attribute. However, any such value is overridden when an
@xml:lang
attribute with a different value is
specified on lower-level elements in the containment hierarchy.
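As a minimal, non-normative sketch of this inheritance, the following topic sets the @xml:lang attribute on the root element and overrides it on two lower-level elements. The topic ID, the language choices, and the placeholder text are hypothetical; the comments note the effective value at each level.

```xml
<topic id="greetings" xml:lang="en">
  <!-- Effective value for the title: en (supplied by the root element) -->
  <title>Greetings</title>
  <body>
    <!-- The paragraph has no @xml:lang of its own, so its effective value is en -->
    <p>The usual greeting in Quebec is
      <!-- Overrides the inherited value for this phrase only: fr-CA -->
      <ph xml:lang="fr-CA">(French greeting)</ph>.</p>
    <!-- Overrides the inherited value for the quotation and its contents: de -->
    <lq xml:lang="de">(German quotation)</lq>
  </body>
</topic>
```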
When the @xml:lang
attribute is specified on a topic reference, it does not
apply to the referenced resource. This means that the value of the @xml:lang
attribute on a topic reference (or the root element of the map) does not automatically supply
a default value for the referenced topic or DITA map.
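The following non-normative sketch illustrates this behavior, assuming a hypothetical map that references a hypothetical topic file named avis.dita. The @xml:lang value on the topic reference applies to content inside the map, such as the navigation title, but not to the referenced topic.

```xml
<map xml:lang="en">
  <title>Product guide</title>
  <!-- @xml:lang="fr" applies to what this topic reference contains in the map -->
  <topicref href="avis.dita" xml:lang="fr">
    <topicmeta>
      <!-- Effective value for the navigation title: fr -->
      <navtitle>(French navigation title)</navtitle>
    </topicmeta>
  </topicref>
  <!-- avis.dita does not receive fr (or en) from this map.
       It needs its own @xml:lang attribute on its root element. -->
</map>
```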
For topic and map documents, if no value for the @xml:lang
attribute is specified
explicitly or on a higher-level element in the containment hierarchy, a processor-determined
default value is assumed.
Recommendations for the @xml:lang attribute
Specifying the @xml:lang
attribute in the
DITA source facilitates translation and helps ensure that processors
will handle content appropriately. Accordingly, this specification
makes certain best-practices recommendations
for where the @xml:lang
attribute should be
set.
Setting the @xml:lang
attribute in the source-language document facilitates
the translation process. Some translation tools do not support adding new markup to the
document that is being translated, so if the source language content does not set the
@xml:lang
attribute, it might be difficult or impossible for a translator to
add the @xml:lang
attribute to the translated document.
In addition, setting the @xml:lang
attribute in the
DITA source ensures that processors handle content in a language- and
locale-appropriate way. If the @xml:lang
attribute is
not set, processors might assume a default value which is
inappropriate for the DITA content.
The following table outlines the recommended use of the @xml:lang
attribute
in topics and maps. These recommendations ensure that DITA resources have an effective default
language.
DITA resource | Recommended use |
---|---|
DITA topic document that contains a single language | Specify the @xml:lang attribute on the root element of the document. |
DITA topic document that contains more than one language | Specify the primary language and locale that applies to the topic on the highest-level element that contains content. If part of a topic is written in a different language, enclose that content in an element with the @xml:lang attribute set appropriately. This applies to both block and inline elements that use the alternate language. |
DITA map | Specify the @xml:lang attribute on the root element of the map. This applies both to the root map and any submaps. |
Processing expectations regarding the @xml:lang attribute
When the @xml:lang
attribute is specified as
recommended, a language for the content is clearly indicated. However,
when the @xml:lang
attribute is not specified,
processors might need to assign a default value.
If the root element of a map or a top-level topic has no value for the
@xml:lang
attribute, a processor SHOULD
assume a default value. The default value of the processor can be fixed, configurable,
or derived from the content itself, such as the @xml:lang
attribute on the
root map.
When a @conref
or @conkeyref
attribute is used to include content from one element into another,
the processor MUST use the
effective value of the @xml:lang
attribute from the
referenced element. If the referenced element does
not have an explicit value for the @xml:lang
attribute, the processor SHOULD
use the default value.
Processors SHOULD render each
element in a way that is appropriate for its language as identified
by the @xml:lang
attribute.
Example: content reference and the @xml:lang attribute
This section is non-normative.
This example outlines how processors
determine the effective value of the @xml:lang
attribute for content that is referenced by the @conref
or @conkeyref
attribute.
In this scenario, a company has a notices topic
that contains warnings in multiple languages. The notices topic
specifies an @xml:lang
attribute of
en. However, it contains content that is reused
from topics that explicitly set the @xml:lang
attribute to fr and
de.
The following code block shows the content of the DITA topic that contains the referencing elements:
<topic xml:lang="en" id="notices">
<title>NOTICES</title>
<shortdesc>Be sure to read all product safety information before using the product.</shortdesc>
<body>
<note id="warning-english" conref="warnings-en.dita#warnings/general"/>
<note id="warning-french" conref="warnings-fr.dita#warnings/general"/>
<note id="warning-german" conref="warnings-de.dita#warnings/general"/>
<!-- ... All supported languages for the product ... -->
</body>
</topic>
The following code blocks show the content of the topics that contain the referenced elements:
<topic id="warnings" xml:lang="en">
<title>Reusable warnings (English)</title>
<body>
<note id="general">General notice about using the product...</note>
<note id="water">Warning about using the product near water...</note>
<!-- Other reusable warnings -->
</body>
</topic>
<topic id="warnings" xml:lang="fr">
<title>Reusable warnings (French)</title>
<body>
<note id="general">(French translation of: General notice about using the product...)</note>
<note id="water">(French translation of: Warning about using the product near water...)</note>
<!-- Other reusable warnings -->
</body>
</topic>
<topic id="warnings" xml:lang="de">
<title>Reusable warnings (German)</title>
<body>
<note id="general">(German translation of: General notice about using the product...)</note>
<note id="water">(German translation of: Warning about using the product near water...)</note>
<!-- Other reusable warnings -->
</body>
</topic>
When the topic that contains the conrefed notes is processed, the following occurs:
- The <note> element with the @id attribute set to warning-french has an effective value for the @xml:lang attribute of fr.
- The <note> element with the @id attribute set to warning-german has an effective value for the @xml:lang attribute of de.
In each case, the effective value of the @xml:lang
attribute for the note is determined by the value of the
@xml:lang
attribute that is specified on the
topic that contains the referenced element, instead of the value of
the @xml:lang
attribute that is specified on the
notices topic that contains the referencing elements.
The @dir attribute
The @dir
attribute provides instructions to
processors about how bidirectional text is rendered.
The @dir
attribute identifies or overrides the
text directionality. The following values are valid:
- lro: Indicates an override of the Unicode Bidirectional Algorithm, forcing the element into left-to-right mode.
- ltr: Indicates left-to-right.
- rlo: Indicates an override of the Unicode Bidirectional Algorithm, forcing the element into right-to-left mode.
- rtl: Indicates right-to-left.
- -dita-use-conref-target: See Using the -dita-use-conref-target value for more information.
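As a non-normative illustration of the difference between identifying and overriding directionality, the following sketch uses rtl on a paragraph and lro on a phrase inside it. The scenario (an Arabic sentence that embeds a part number) and the placeholder text are assumptions made for the example.

```xml
<!-- rtl identifies the base direction; the Unicode Bidirectional
     Algorithm still orders the characters inside the paragraph -->
<p xml:lang="ar" dir="rtl">(Arabic sentence that introduces the part number)
  <!-- lro overrides the algorithm: every character in this phrase
       is laid out left to right, whatever its own directionality -->
  <ph dir="lro">(part number that must always display left to right)</ph>
</p>
```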
The Unicode Bidirectional Algorithm
The Unicode Bidirectional Algorithm plays a critical role in ensuring that bidirectional text is correctly rendered.
Bidirectional text is text that contains text in both text directionalities, right-to-left (RTL) and left-to-right (LTR). Common examples of bidirectional text include the following:
- Documents in RTL languages such as Arabic, Hebrew, Farsi, Urdu, and Yiddish that include numerics or embedded sections of LTR text
- Documents that contain text in both LTR and RTL languages, for example, a topic that lists the names of a movie in multiple languages
The Unicode Bidirectional Algorithm specifies how text should be rendered for a given language. For more information about the Unicode Bidirectional Algorithm, see the following resources:
- Unicode Bidirectional Algorithm, Unicode Standard Annex #9
- Specifying the direction of text and tables: the dir attribute, HTML 4.01 Specification
- Inline markup and bidirectional text in HTML, W3C internationalization article
- XHTML Bi-directional Text Attribute Module, XHTML 2.0 W3C Working Draft 22
Recommended usage of the @dir attribute
Typically, processors that fully support the Unicode
Bidirectional Algorithm handle bidirectional text without the need to
specify directionality in the DITA source, if the
@xml:lang
attribute is specified on the highest-level
element.
The need to specify the @dir
attribute primarily
occurs in the following situations:
- Processors that do not fully support the Unicode Bidirectional Algorithm
- Documents that contain bidirectional text and characters with neutral bidirectionality
For the above situations, we recommend that DITA source documents,
in addition to specifying the @xml:lang
attribute,
also specify the @dir
attribute on the highest-level
element that is necessary.
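A minimal sketch of this recommendation might look like the following, assuming a hypothetical Arabic topic that contains one English paragraph. Both attributes are set once on the root element and again only on the element where the language and direction change.

```xml
<topic id="safety-ar" xml:lang="ar" dir="rtl">
  <title>(Arabic title)</title>
  <body>
    <!-- Uses the language and direction set on the root element -->
    <p>(Arabic paragraph)</p>
    <!-- Left-to-right content in an otherwise right-to-left topic -->
    <p xml:lang="en" dir="ltr">Model X-100, revision 2.</p>
  </body>
</topic>
```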
Processing expectations regarding the Unicode Bidirectional Algorithm
Processor support for the Unicode Bidirectional Algorithm is critical.
DITA processors SHOULD fully support the Unicode Bidirectional Algorithm. This ensures that processors can implement the script and directionality for each language that is used in a document.
The @translate attribute
The @translate
attribute provides information
about whether the content of an element should be
translated.
The following values are valid: yes, no, and -dita-use-conref-target.
A few elements have the @translate
attribute set by
default to no. These elements include
<draft-comment> and <required-cleanup>, all elements that are
designed to hold content that is not intended for publication.
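The following non-normative sketch shows one way these values might be used. The product name AcmeControl is hypothetical, and the fragment assumes that the surrounding topic is otherwise translatable.

```xml
<body>
  <!-- The product name stays as-is in every language; the rest of the
       paragraph is translated -->
  <p>Open the <ph translate="no">AcmeControl</ph> console to review the logs.</p>
  <!-- No @translate attribute is needed here: <draft-comment> is one of the
       elements whose @translate attribute defaults to no -->
  <draft-comment>Confirm the console name with engineering.</draft-comment>
</body>
```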
The non-normative appendix, Element-by-element recommendations for translators, includes information on whether the element is block or inline, whether the element contents are likely to be suitable for translation, and whether the element has attributes whose values might need translation.