Specialization

The specialization feature of DITA allows for the creation of new element types and attributes that are explicitly and formally derived from existing types. This facilitates interchange of conforming DITA content and ensures a minimum level of common processing for all DITA content. It also allows specialization-aware processors to add specialization-specific processing to existing base processing.

Overview of specialization

Specialization allows information architects to define new kinds of information (new structural types or new domains of information), while reusing as much of existing design and code as possible, and minimizing or eliminating the costs of interchange, migration, and maintenance.

Specialization modules enable information architects to create new element types and attributes. These new element types and attributes are derived from existing element types and attributes.

In traditional XML applications, all semantics for a given element instance are bound to the element type, such as <para> for a paragraph or <title> for a title. The XML specification provides no built-in mechanism for relating two element types to say "element type B is a subtype of element type A".

In contrast, the DITA specialization mechanism provides a standard mechanism for declaring that an element type or attribute is derived from an ancestor type. This means that a specialized type inherits the semantics and default processing behavior from its ancestor type. Additional processing behavior optionally can be associated with the specialized descendant type.

For example, the <section> element type is part of the DITA base. It represents an organizational division in a topic. Within the task information type (itself a specialization of <topic>), the <section> element type is further specialized to other element types (such as <prereq> and <context>) that provide more precise semantics about the type of organizational division that they represent. The specialized element types inherit both semantic meaning and default processing from the ancestor elements.

There are two types of DITA specializations:

Structural specialization

Structural specializations are developed from either topic or map types. Structural specializations enable information architects to add new document types to DITA. The structures defined in the new document types either directly use, or inherit from, elements found in other document types. For example, concept, task, and reference are specialized from topic, and bookmap is specialized from map.

Domain specialization

Domain specializations are developed from elements defined within topic or map, or from the @props or @base attributes. They define markup for a specific information domain or subject area. Domain specializations can be added to document-type shells.

Each type of specialization module represents an "is a" hierarchy, in object-oriented terms, with each structural type or domain being a subclass of its parent. For example, a specialization of task is still a task, and a specialization of the user interface domain is still part of the user interface domain. A given domain can be used with any map or topic type. In addition, specific structural types might require the use of specific domains.

Use specialization when you need a new structural type or domain. Specialization is appropriate in the following circumstances:

  • You need to create markup to represent new semantics (meaningful categories of information). This might enable you to have increased consistency or descriptiveness in your content model.
  • You have specific needs for output processing and formatting that cannot be addressed using the current content model.

Do not use specialization to simply eliminate element types from specific content models. Use constraint modules to restrict content models and attribute lists without changing semantics.

Modularization

Modularization is at the core of DITA design and implementation. It enables reuse and extension of the DITA specialization hierarchy.

The DITA XML grammar files are a set of module files that declare the markup and entities that are required for each specialization. The document-type shell then integrates the modules that are needed for a particular authoring and publishing context.

Because all the pieces are modular, the task of developing a new information type or domain is simplified. An information architect can start with existing base types (topic or map)—or with an existing specialization if it comes close to matching their business requirements—and only develop an extension that adds the extra semantics or functionality that is required. A specialization reuses elements from ancestor modules, but it only needs to declare the elements and attributes that are unique to the specialization. This saves considerable time and effort, reduces error, enforces consistency, and makes interoperability possible.

Because all the pieces are modular, it is simpler to reuse different modules in different contexts.

For example, a company that produces machines can use the hazard statement domain, while a company that produces software can use the software, user interface, and programming domains. A company that produces health information for consumers can avoid using the standard domains. Instead, it develops a new domain that contains the elements necessary for capturing and tracking the comments made by medical professionals who review information for accuracy and completeness.

Because all the pieces are modular, new modules can be created and put into use without affecting existing document-type shells.

For example, a marketing division of a company can develop a new specialization for message campaigns and have their content authors begin using that specialization, without affecting any of the other information types that they have in place.

Vocabulary modules

A DITA element type or attribute is declared in exactly one vocabulary module.

The following terminology is used to refer to DITA vocabulary modules:

structural module
A vocabulary module that defines a top-level map or topic type.
element domain module
A vocabulary module that defines one or more specialized element types that can be integrated into maps or topics.
attribute domain module
A vocabulary module that defines exactly one specialization of either the @base or @props attribute.

For structural types, the module name is typically the same as the root element. For example, "task" is the name of the structural vocabulary module whose root element is <task>.

For element domain modules, the module name is typically a name that reflects the subject domain to which the domain applies, such as "highlight" or "software". Domain modules often have an associated short name, such as hi-d for the highlighting domain or sw-d for the software domain.

The name (or short name) of an element domain module is used to identify the module in @class attribute values. While module names need not be globally unique, module names must be unique within the scope of a given specialization hierarchy. The short name must be a valid XML name token.

Structural modules based on topic MAY define additional topic types that are then allowed to occur as subordinate topics within the top-level topic.

For example, a top-level topic type might require the use of subordinate topic types that would only ever be meaningful in the context of their containing type and thus would never be candidates for standalone authoring or aggregation using maps. In that case, the subordinate topic type can be declared in the module for the top-level topic type that uses it. However, in most cases, potential subordinate topics are best defined in their own vocabulary modules.

Domain elements intended for use in topics MUST ultimately be specialized from elements that are defined in the topic module. Domain elements intended for use in maps MUST ultimately be specialized from elements defined by or used in the map module. Maps share some element types with topics but no map-specific elements can be used within topics.

Structural modules can also define specializations of, or reuse elements from, domain or other structural modules. When this happens, the structural module becomes dependent on those modules.

Specialization rules for element types

There are certain rules that apply to element type specializations.

Characteristics

A specialized element type has the following characteristics:

  • A properly-formed @class attribute that specifies the specialization hierarchy of the element
  • A content model that is the same or less inclusive than that of the element from which it was specialized
  • A set of attributes that are the same or a subset of those of the element from which it was specialized, except for specializations of @base or @props
  • Values or value ranges of attributes that are the same or a subset of those of the element from which it was specialized

Namespaces

DITA elements are never in a namespace. Only the @DITAArchVersion attribute is in a DITA-defined namespace. All other attributes, except for those defined by the XML standard, are in no namespace.

This limitation is imposed by the details of the @class attribute syntax, which makes it impractical to have namespace-qualified names for either vocabulary modules or individual element types or attributes. Elements included as descendants of the DITA <foreign> element type can be in any namespace.

Note (non-normative):
Domain modules that are intended for wide use should define element type names that are unlikely to conflict with names used in other domains, for example, by using a domain-specific prefix on all names.

Specialization rules for attributes

There are certain rules that apply to attribute specializations.

A specialized attribute has the following characteristics:

  • It is specialized from @props or @base.
  • It can be integrated into a document-type shell either globally, which makes it available on all elements, or on specific elements by using an expansion module.
  • It does not have values or value ranges that are more extensive than those of the attribute from which it was specialized.
  • Its values must be alphanumeric, space-delimited values.
  • In generalized form, the values must conform to the rules for attribute generalization.

The @class attribute rules and syntax

The specialization hierarchy of each DITA element is declared as the value of the @class attribute. The @class attribute provides a mapping from the current name of the element to its more general equivalents. The @class attribute also can provide a mapping from the current name to more specialized equivalents. All specialization-aware processing can be defined in terms of @class attribute values.

The @class attribute tells a processor what general classes of elements the current element belongs to. DITA scopes elements by module type instead of document type. Examples of module types are topic type, domain type, or map type. This enables document-type developers to combine multiple module types in a single document without complicating transformation logic.

The sequence of values in the @class attribute is important because it tells processors which value is the most general and which is the most specific. This sequence is what enables both specialization-aware processing and generalization.
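The following non-normative sketch, in Python, shows one way a processor might read that sequence. The function name class_tokens is hypothetical and is not defined by DITA; the sketch assumes only that a @class value consists of a marker character followed by space-separated module/type tokens.

```python
def class_tokens(class_value):
    """Split a @class value into (module, type) pairs, most general first."""
    parts = class_value.split()
    # parts[0] is the "-" or "+" marker; the rest are module/type tokens.
    return [tuple(token.split("/", 1)) for token in parts[1:]]


tokens = class_tokens("- topic/li task/step ")
most_general = tokens[0]    # ("topic", "li")
most_specific = tokens[-1]  # ("task", "step")
```

Because the order is fixed, the first pair always names the base type and the last pair always names the element as it appears in the document.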

Syntax

Values for the @class attribute have the following syntax requirements:

  • An initial "-" or "+" character followed by one or more spaces. Use "-" for element types that are defined in structural vocabulary modules, and use "+" for element types that are defined in domain modules.
  • A sequence of one or more tokens of the form "modulename/typename", with each token separated by one or more spaces, where modulename is the short name of the vocabulary module and typename is the element type name. Tokens are ordered left to right from most general to most specialized.

    These tokens provide a mapping for every structural type or domain in the ancestry of the specialized element. The specialization hierarchy for a given element type must reflect any intermediate modules between the base type and the specialization type, even those in which no element renaming occurs.

  • At least one trailing space character (" "). The trailing space ensures that string matches on the tokens can always include a leading and trailing space in order to reliably match full tokens.
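The syntax requirements above can be sketched as a small check in Python. This is a non-normative illustration: the pattern and the function names is_well_formed and has_token are assumptions made for this example, and the pattern does not verify that module or type names are valid XML names.

```python
import re

# Hypothetical pattern for the syntax described above: a "-" or "+",
# then one or more space-separated module/type tokens, then at least
# one trailing space.
CLASS_PATTERN = re.compile(r"[-+]( +[^\s/]+/[^\s/]+)+ +")


def is_well_formed(class_value):
    """Return True if the whole value fits the assumed pattern."""
    return CLASS_PATTERN.fullmatch(class_value) is not None


def has_token(class_value, token):
    """Return True if the value contains the full module/type token."""
    # Padding the token with spaces prevents "topic/li" from matching
    # inside a longer token such as "topic/link".
    return f" {token} " in class_value
```

The has_token function relies on the trailing space: without it, the last token in the value would not be followed by a space and the padded match would fail.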

Rules

Every DITA element (except the <dita> element that is used as the root of a ditabase document) MUST declare a @class attribute.

When the @class attribute is declared in an XML grammar, it MUST be declared with a default value. In order to support generalization round-tripping (generalizing specialized content into a generic form and then returning it to the specialized form) the default value MUST NOT be fixed. This allows a generalization process to overwrite the default values that are defined by a general document type with specialized values taken from the document being generalized.

A vocabulary module MUST NOT change the @class attribute for elements that it does not specialize, but simply reuses by reference from more generic levels.

Authors SHOULD NOT modify the @class attribute. The @class attribute and its value are generally not surfaced in authored DITA topics, although they might be made explicit as part of a processing operation.

Example: DTD declaration for @class attribute for the <step> element

This section is non-normative.

The following code sample lists the DTD declaration for the @class attribute for the <step> element:

<!ATTLIST step         class  CDATA "- topic/li task/step ">

This indicates that the <step> element is specialized from the <li> element in the topic module. It also indicates explicitly that the <step> element is available in a task topic. This declaration enables round-trip migration between upper-level and lower-level types without the loss of information.
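As a non-normative illustration of how this value supports fallback processing, the following Python sketch selects a handler for the most specific token that a processor knows about. The function select_handler and the handler names are hypothetical; real processors typically express the same idea as template or rule matching on @class.

```python
def select_handler(class_value, handlers):
    """Return the handler for the most specific token that has one."""
    tokens = class_value.split()[1:]  # drop the "-" or "+" marker
    for token in reversed(tokens):    # most specialized token first
        if token in handlers:
            return handlers[token]
    return None


step_class = "- topic/li task/step "

# Hypothetical handler table for a processor that knows only base types.
base_handlers = {"topic/li": "render-list-item"}
base_result = select_handler(step_class, base_handlers)

# A task-aware processor adds a more specific handler.
task_handlers = {**base_handlers, "task/step": "render-step"}
task_result = select_handler(step_class, task_handlers)
```

With only base handlers, a <step> is processed as a list item; adding a task-specific handler overrides that behavior without changing the base rule.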

Example: Element with @class attribute made explicit

This section is non-normative.

The following code sample shows the value of the @class attribute for the <wintitle> element:

<wintitle class="+ topic/keyword ui-d/wintitle ">A specialized keyword</wintitle>

Example: @class attribute with intermediate value

This section is non-normative.

The following code sample shows the value of a @class attribute for an element in the guiTask module, which is specialized from <task>. The element is specialized from <keyword> in the base topic vocabulary, rather than from an element in the task module:

<windowName class="- topic/keyword task/keyword guiTask/windowName ">...</windowName>

The intermediate values are necessary so that generalizing and specializing transformations can map the values simply and accurately. For example, if task/keyword were missing as a value and a user decided to generalize this guiTask topic up to a task topic, the transformation would have to guess whether to map to keyword (appropriate if task is more general than guiTask, which it is) or leave it as windowName (appropriate if task were more specialized, which it is not). By always providing mappings for more general values, processors can apply a simple rule: when an element has no mapping for the target module, that module is treated as more specialized than anything in the element's hierarchy, so the last value in the list is appropriate. For example, when generalizing <guiTask> to <task>, if a <p> element has no target value for <task>, it is safe to assume that <p> does not specialize from <task> and does not need to be generalized.
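The rule described above can be sketched in a few lines of Python. This is a non-normative illustration; the function name generalized_name is hypothetical, and a complete generalization process would also need to rewrite attributes and preserve the original @class value.

```python
def generalized_name(class_value, target_module):
    """Pick the element name to use when generalizing to target_module."""
    tokens = [t.split("/", 1) for t in class_value.split()[1:]]
    for module, typename in tokens:
        if module == target_module:
            return typename
    # No mapping for the target module: keep the most specialized name.
    return tokens[-1][1]


window_class = "- topic/keyword task/keyword guiTask/windowName "
to_task = generalized_name(window_class, "task")
to_topic = generalized_name(window_class, "topic")
unchanged = generalized_name("- topic/p ", "task")
```

Here the specialized element maps to keyword for both the task and topic targets, while a <p> element, which has no task mapping, keeps its name.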

The @specializations attribute rules and syntax

The @specializations attribute enables processors to determine what attribute specializations are available in a document. The attribute is declared on the root element for each topic or map type. Each attribute domain defines a token to declare the extension. The effective value of the @specializations attribute is composed of these tokens.

Syntax and rules

The @props and @base attributes are the only two core attributes available for specialization.

Each specialization of the @props and @base attributes MUST provide a token for use by the @specializations attribute.

The @specializations token for an attribute specialization begins with either @props or @base followed by a slash, followed by the name of the new attribute:

'@', props-or-base, ('/', attname)+

For example:

  • If @props is specialized to create @myNewProp, this results in the following token: @props/myNewProp
  • If @base is specialized to create @myFirstBase, this results in the following token: @base/myFirstBase
  • If that specialized attribute @myFirstBase is further specialized to create @mySecondBase, this results in the following token: @base/myFirstBase/mySecondBase

Note that the value for the @specializations attribute is not authored. Instead, the value is defaulted based on the modules that are included in the document-type shell.
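The token syntax can be illustrated with a short, non-normative Python sketch. The pattern and the function name parse_token are assumptions made for this example; in particular, the sketch approximates attribute names with a simple XML-name-like expression rather than the full XML name production.

```python
import re

# Hypothetical pattern for one token: "@", then props or base, then one
# or more "/name" steps. Attribute names are assumed to be XML-name-like.
TOKEN_PATTERN = re.compile(r"@(props|base)((?:/[A-Za-z_][\w.-]*)+)")


def parse_token(token):
    """Return (base attribute, [specialized names, most general first])."""
    match = TOKEN_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f"not a @specializations token: {token!r}")
    return match.group(1), match.group(2).strip("/").split("/")


first = parse_token("@props/myNewProp")
second = parse_token("@base/myFirstBase/mySecondBase")
```

The returned list preserves the specialization order, so the last name is always the attribute that the token declares.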

Example: @specializations attribute for a task with multiple domains

This section is non-normative.

In this example, a document-type shell integrates the task structural module and the following domain modules:

Domain                       Domain short name
User interface               ui-d
Software                     sw-d
@deliveryTarget attribute    deliveryTarget
@platform attribute          platform
@product attribute           product

The value of the @specializations attribute includes one value from each attribute module. The effective value is the following:

specializations="@props/deliveryTarget @props/platform @props/product"

If the document-type shell also used a specialization of the @platform attribute that describes the hardware platform, the new @hardwarePlatform attribute domain would add an additional value to the @specializations attribute:

specializations="@props/deliveryTarget @props/platform @props/platform/hardwarePlatform @props/product"
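A processor can use the effective value to learn how each specialized attribute relates to @props or @base, for example so that a filtering rule for @platform might also be considered for @hardwarePlatform. The following Python sketch is a non-normative illustration; the function name attribute_ancestry is hypothetical.

```python
def attribute_ancestry(specializations_value):
    """Map each specialized attribute to its ancestors, most general first."""
    ancestry = {}
    for token in specializations_value.split():
        names = token.lstrip("@").split("/")
        # The last name is the declared attribute; the rest are ancestors.
        ancestry[names[-1]] = names[:-1]
    return ancestry


value = ("@props/deliveryTarget @props/platform "
         "@props/platform/hardwarePlatform @props/product")
ancestry = attribute_ancestry(value)
hardware_ancestors = ancestry["hardwarePlatform"]  # ["props", "platform"]
product_ancestors = ancestry["product"]            # ["props"]
```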

Specializing to include non-DITA content

You can extend DITA to incorporate standard vocabularies for non-textual content, such as MathML and SVG, as markup within DITA documents. This is done by specializing the <foreign> element.

There are three methods of incorporating foreign content into DITA.

  • A domain specialization of the <foreign> element. This is the usual implementation.
  • A structural specialization using the <foreign> element. This affords more control over the content model.
  • Directly embedding the non-DITA content within the <foreign> element. If the non-DITA content has interoperability or vocabulary naming issues such as those that are addressed by specialization in DITA, they must be addressed by means that are appropriate to the non-DITA content.

Do not use the <foreign> element to include textual content or metadata in DITA documents.

Example: Creating an element domain specialization for SVG

This section is non-normative.

The following code sample, which is from the svgDomain.ent file, shows the domain declaration for the SVG domain.

<!-- ============================================================= -->
<!--                   SVG DOMAIN ENTITIES                         -->
<!-- ============================================================= -->

<!-- SVG elements must be prefixed, otherwise they conflict with
     existing DITA elements (e.g., <desc> and <title>).
  -->
<!ENTITY % NS.prefixed "INCLUDE" >
<!ENTITY % SVG.prefix "svg" >

<!ENTITY % svg-d-foreign
   "svg-container
   "
>

Note that the SVG-specific %SVG.prefix; parameter entity is declared. This establishes the default namespace prefix to be used for the SVG content embedded with this domain. The prefix can be overridden in a document-type shell by declaring the parameter entity before the reference to the svgDomain.ent file. Other foreign domains might need similar entities when required by the new vocabulary.

For more information, see the svgDomain.mod file that is shipped with the DITA Technical Content edition. For an example of including the SVG domain in a document-type shell, see task.dtd.

Sharing elements across specializations

Specialization enables reuse of elements from ancestor specializations. However, it is also possible to reuse elements from non-ancestor specializations.

A structural specialization can incorporate elements from unrelated domains or other structural specializations by referencing them in the content model of a specialized element. The elements included in this manner must be specialized from ancestor content that is valid in the new context. If the reusing and reused specializations share common ancestry, the reused elements must be valid in the reusing context at every level they share in common.

Although a well-designed structural specialization hierarchy with controlled use of domains is still the primary means of sharing and reusing elements in DITA, the ability to also share elements declared elsewhere in the hierarchy allows for situations where relevant markup comes from multiple sources and would otherwise be developed redundantly.

Example: A specialization of <concept> reuses an element from the task module

This section is non-normative.

A specialized concept topic could declare a specialized <process> section that contains the <steps> element that is defined in the task module. This is possible because of the following factors:

  • The <steps> element is specialized from <ol>.
  • The <process> element is specialized from <section>, and the content model of <section> includes <ol>.

The <steps> element in <process> can always be generalized back to <ol> in <section>.
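The reasoning in this example can be sketched as a simple check in Python. This is a non-normative and deliberately simplified illustration: it compares only the base-level ancestors, not every level that the two specializations share, and the function names, the myConcept module name, and the abbreviated content model are all hypothetical.

```python
def base_type(class_value):
    """Return the most general type name from a @class value."""
    return class_value.split()[1].split("/", 1)[1]


def can_reuse(reused_class, container_class, base_content_models):
    """Check that the reused element generalizes to content that the
    container's base type allows."""
    allowed = base_content_models.get(base_type(container_class), set())
    return base_type(reused_class) in allowed


# Hypothetical, heavily abbreviated content model for <section>.
base_content_models = {"section": {"p", "ol", "ul"}}

steps_class = "- topic/ol task/steps "
process_class = "- topic/section concept/section myConcept/process "
steps_allowed = can_reuse(steps_class, process_class, base_content_models)
```

The check succeeds because <steps> generalizes to <ol> and <process> generalizes to <section>, whose content model includes <ol>.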

Example: A specialization of <reference> reuses an element from the programming domain

This section is non-normative.

A specialized reference topic could declare a specialized list (<apilist>) in which each <apilistitem> contains an <apiname> element that is borrowed from the programming domain.