Specialization
The specialization feature of DITA allows for the creation of new element types and attributes that are explicitly and formally derived from existing types. This facilitates interchange of conforming DITA content and ensures a minimum level of common processing for all DITA content. It also allows specialization-aware processors to add specialization-specific processing to existing base processing.
Overview of specialization
Specialization allows information architects to define new kinds of information (new structural types or new domains of information), while reusing as much of existing design and code as possible, and minimizing or eliminating the costs of interchange, migration, and maintenance.
Specialization modules enable information architects to create new element types and attributes. These new element types and attributes are derived from existing element types and attributes.
In traditional XML applications, all semantics for a given element instance are bound to the element type, such as <para> for a paragraph or <title> for a title. The XML specification provides no built-in mechanism for relating two element types to say "element type B is a subtype of element type A".
In contrast, the DITA specialization mechanism provides a standard mechanism for declaring that an element type or attribute is derived from an ancestor type. This means that a specialized type inherits the semantics and default processing behavior from its ancestor type. Additional processing behavior optionally can be associated with the specialized descendant type.
For example, the <section> element type is part of the DITA base. It represents an organizational division in a topic. Within the task information type (itself a specialization of <topic>), the <section> element type is further specialized to other element types (such as <prereq> and <context>) that provide more precise semantics about the type of organizational division that they represent. The specialized element types inherit both semantic meaning and default processing from the ancestor elements.
There are two types of DITA specializations:
- Structural specialization
- Structural specializations are developed from either topic or map types. Structural specializations enable information architects to add new document types to DITA. The structures defined in the new document types either directly use, or inherit from, elements found in other document types. For example, concept, task, and reference are specialized from topic, and bookmap is specialized from map.
- Domain specialization
- Domain specializations are developed from elements defined within topic or map, or from the @props or @base attributes. They define markup for a specific information domain or subject area. Domain specializations can be added to document-type shells.
Each type of specialization module represents an "is a" hierarchy, in object-oriented terms, with each structural type or domain being a subclass of its parent. For example, a specialization of task is still a task, and a specialization of the user interface domain is still part of the user interface domain. A given domain can be used with any map or topic type. In addition, specific structural types might require the use of specific domains.
Use specialization when you need a new structural type or domain. Specialization is appropriate in the following circumstances:
- You need to create markup to represent new semantics (meaningful categories of information). This might enable you to have increased consistency or descriptiveness in your content model.
- You have specific needs for output processing and formatting that cannot be addressed using the current content model.
Do not use specialization to simply eliminate element types from specific content models. Use constraint modules to restrict content models and attribute lists without changing semantics.
Modularization
Modularization is at the core of DITA design and implementation. It enables reuse and extension of the DITA specialization hierarchy.
The DITA XML grammar files are a set of module files that declare the markup and entities that are required for each specialization. The document-type shell then integrates the modules that are needed for a particular authoring and publishing context.
Because all the pieces are modular, the task of developing a new information type or domain is simplified. An information architect can start with existing base types (topic or map)—or with an existing specialization if it comes close to matching their business requirements—and only develop an extension that adds the extra semantics or functionality that is required. A specialization reuses elements from ancestor modules, but it only needs to declare the elements and attributes that are unique to the specialization. This saves considerable time and effort, reduces error, enforces consistency, and makes interoperability possible.
Because all the pieces are modular, it is simpler to reuse different modules in different contexts.
For example, a company that produces machines can use the hazard statement domain, while a company that produces software can use the software, user interface, and programming domains. A company that produces health information for consumers can avoid using the standard domains. Instead, it develops a new domain that contains the elements necessary for capturing and tracking the comments made by medical professionals who review information for accuracy and completeness.
Because all the pieces are modular, new modules can be created and put into use without affecting existing document-type shells.
For example, a marketing division of a company can develop a new specialization for message campaigns and have their content authors begin using that specialization, without affecting any of the other information types that they have in place.
Vocabulary modules
A DITA element type or attribute is declared in exactly one vocabulary module.
The following terminology is used to refer to DITA vocabulary modules:
- structural module
- A vocabulary module that defines a top-level map or topic type.
- element domain module
- A vocabulary module that defines one or more specialized element types that can be integrated into maps or topics.
- attribute domain module
- A vocabulary module that defines exactly one specialization of either the @base or @props attribute.
For structural types, the module name is typically the same as the root element. For example, "task" is the name of the structural vocabulary module whose root element is <task>.
For element domain modules, the module name is typically a name that reflects the subject domain to which the domain applies, such as "highlight" or "software". Domain modules often have an associated short name, such as hi-d for the highlighting domain or sw-d for the software domain.
The name (or short name) of an element domain module is used to identify the module in @class attribute values. While module names need not be globally unique, module names must be unique within the scope of a given specialization hierarchy. The short name must be a valid XML name token.
Structural modules based on topic MAY define additional topic types that are then allowed to occur as subordinate topics within the top-level topic.
For example, a top-level topic type might require the use of subordinate topic types that would only ever be meaningful in the context of their containing type and thus would never be candidates for standalone authoring or aggregation using maps. In that case, the subordinate topic type can be declared in the module for the top-level topic type that uses it. However, in most cases, potential subordinate topics are best defined in their own vocabulary modules.
Domain elements intended for use in topics MUST ultimately be specialized from elements that are defined in the topic module. Domain elements intended for use in maps MUST ultimately be specialized from elements defined by or used in the map module. Maps share some element types with topics but no map-specific elements can be used within topics.
Structural modules also can define specializations of, or reuse elements from, domain or other structural modules. When this happens, the structural module becomes dependent on the modules whose elements it specializes or reuses.
Specialization rules for element types
There are certain rules that apply to element type specializations.
- Characteristics
- A specialized element type has the following characteristics:
  - A properly-formed @class attribute that specifies the specialization hierarchy of the element
  - A content model that is the same or less inclusive than that of the element from which it was specialized
  - A set of attributes that are the same or a subset of those of the element from which it was specialized, except for specializations of @base or @props
  - Values or value ranges of attributes that are the same or a subset of those of the element from which it was specialized
- Namespaces
- DITA elements are never in a namespace. Only the @DITAArchVersion attribute is in a DITA-defined namespace. All other attributes, except for those defined by the XML standard, are in no namespace. This limitation is imposed by the details of the @class attribute syntax, which makes it impractical to have namespace-qualified names for either vocabulary modules or individual element types or attributes. Elements included as descendants of the DITA <foreign> element type can be in any namespace.
Specialization rules for attributes
There are certain rules that apply to attribute specializations.
A specialized attribute has the following characteristics:
- It is specialized from @props or @base.
- It can be integrated into a document-type shell either globally, which makes it available on all elements, or it can be assigned to specific elements by using an expansion module.
- It does not have values or value ranges that are more extensive than those of the attribute from which it was specialized.
- Its values must be alphanumeric, space-delimited values.
- In generalized form, the values must conform to the rules for attribute generalization.
The @class attribute rules and syntax
The specialization hierarchy of each DITA element is declared as the value of the @class attribute. The @class attribute provides a mapping from the current name of the element to its more general equivalents.
The @class attribute also can provide a mapping from the current name to more specialized equivalents. All specialization-aware processing can be defined in terms of @class attribute values.
The @class attribute tells a processor what general classes of elements the current element belongs to. DITA scopes elements by module type instead of document type. Examples of module types are topic type, domain type, or map type. This enables document-type developers to combine multiple module types in a single document without complicating transformation logic.
The sequence of values in the @class attribute is important because it tells processors which value is the most general and which is the most specific. This sequence is what enables both specialization-aware processing and generalization.
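As a rough illustration of specialization-aware matching, the following Python sketch tests whether an element belongs to a more general class by looking for a complete token in its @class value. The function name `has_class` is hypothetical and is not defined by DITA; the sample value is the one declared for the <step> element.

```python
def has_class(class_value: str, token: str) -> bool:
    """Return True if the @class value contains the full module/type token."""
    # Pad the token with spaces so that "topic/l" cannot match "topic/li".
    return f" {token} " in class_value


step_class = "- topic/li task/step "

print(has_class(step_class, "topic/li"))   # a <step> is also an <li>
print(has_class(step_class, "task/step"))  # and it is a task <step>
print(has_class(step_class, "topic/l"))    # partial tokens do not match
```

A processor written this way can apply its <li> handling to <step> without knowing anything about the task module.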
Syntax
Values for the @class attribute have the following syntax requirements:
- An initial "-" or "+" character followed by one or more spaces. Use "-" for element types that are defined in structural vocabulary modules, and use "+" for element types that are defined in domain modules.
- A sequence of one or more tokens of the form "modulename/typename", with each token separated by one or more spaces, where modulename is the short name of the vocabulary module and typename is the element type name. Tokens are ordered left to right from most general to most specialized. These tokens provide a mapping for every structural type or domain in the ancestry of the specialized element. The specialization hierarchy for a given element type must reflect any intermediate modules between the base type and the specialization type, even those in which no element renaming occurs.
- At least one trailing space character (" "). The trailing space ensures that string matches on the tokens can always include a leading and trailing space in order to reliably match full tokens.
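The syntax requirements above can be sketched as a small Python parser. This is only an illustrative check, not a normative validator: the function name `parse_class` is hypothetical, and the sketch does not verify that module or type names are valid XML names.

```python
from typing import List, Tuple


def parse_class(class_value: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a @class value into its prefix and (module, type) pairs."""
    if not class_value or class_value[0] not in "-+":
        raise ValueError("value must start with '-' or '+'")
    if not class_value.endswith(" "):
        raise ValueError("value must end with a space")
    prefix, rest = class_value[0], class_value[1:]
    if not rest.startswith(" "):
        raise ValueError("prefix must be followed by a space")
    pairs = []
    for token in rest.split():
        module, sep, typename = token.partition("/")
        # Each token must be exactly "modulename/typename".
        if not (module and sep and typename) or "/" in typename:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((module, typename))
    if not pairs:
        raise ValueError("at least one token is required")
    return prefix, pairs


prefix, pairs = parse_class("+ topic/keyword ui-d/wintitle ")
print(prefix)  # "+" marks an element type defined in a domain module
print(pairs)   # ordered from most general to most specialized
```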
Rules
Every DITA element (except the <dita> element that is used as the root of a ditabase document) MUST declare a @class attribute.
When the @class attribute is declared in an XML grammar, it MUST be declared with a default value. In order to support generalization round-tripping (generalizing specialized content into a generic form and then returning it to the specialized form) the default value MUST NOT be fixed. This allows a generalization process to overwrite the default values that are defined by a general document type with specialized values taken from the document being generalized.
A vocabulary module MUST NOT change the @class attribute for elements that it does not specialize, but simply reuses by reference from more generic levels.
Authors SHOULD NOT modify the @class attribute. The @class attribute and its value are generally not surfaced in authored DITA topics, although they might be made explicit as part of a processing operation.
Example: DTD declaration for @class attribute for the <step> element
This section is non-normative.
The following code sample lists the DTD declaration for the @class attribute for the <step> element:
<!ATTLIST step class CDATA "- topic/li task/step ">
This indicates that the <step> element is specialized from the <li> element in the topic module. It also indicates explicitly that the <step> element is available in a task topic. This declaration enables round-trip migration between upper level and lower level types without the loss of information.
Example: Element with @class attribute made explicit
This section is non-normative.
The following code sample shows the value of the @class attribute for the <wintitle> element:
<wintitle class="+ topic/keyword ui-d/wintitle ">A specialized keyword</wintitle>
Example: @class attribute with intermediate value
This section is non-normative.
The following code sample shows the value of a @class attribute for an element in the guiTask module, which is specialized from <task>. The element is specialized from <keyword> in the base topic vocabulary, rather than from an element in the task module:
<windowName class="- topic/keyword task/keyword guiTask/windowName ">...</windowName>
The intermediate values are necessary so that generalizing and specializing transformations can map the values simply and accurately. For example, if task/keyword was missing as a value, and a user decided to generalize this guiTask up to a task topic, then the transformation would have to guess whether to map to keyword (appropriate if task is more general than guiTask, which it is) or leave it as windowName (appropriate if task were more specialized, which it isn't). By always providing mappings for more general values, processors can then apply the simple rule that missing mappings must by default be to more specialized values than the one we are generalizing to, which means the last value in the list is appropriate. For example, when generalizing <guitask> to <task>, if a <p> element has no target value for <task>, we can safely assume that <p> does not specialize from <task> and does not need to be generalized.
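The generalization rule described above can be sketched in Python as follows. The function name `generalize_name` is hypothetical, and the sketch handles only element renaming; it assumes a well-formed @class value and ignores the attribute handling that a real generalization process also needs.

```python
def generalize_name(class_value: str, target_module: str) -> str:
    """Return the element name to use when generalizing to target_module."""
    tokens = class_value.split()[1:]  # drop the leading "-" or "+"
    for token in tokens:
        module, _, typename = token.partition("/")
        if module == target_module:
            return typename
    # No mapping for the target module: by the rule above, the last value
    # in the list is appropriate, so the element keeps its current name.
    return tokens[-1].partition("/")[2]


print(generalize_name("- topic/li task/step ", "topic"))  # step becomes li
print(generalize_name("- topic/p ", "task"))              # p stays p
```

Because every intermediate module appears in the value, the lookup never has to guess where a module sits in the hierarchy.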
The @specializations attribute rules and syntax
The @specializations attribute enables processors to determine what attribute specializations are available in a document. The attribute is declared on the root element for each topic or map type. Each attribute domain defines a token to declare the extension. The effective value of the @specializations attribute is composed of these tokens.
Syntax and rules
The @props and @base attributes are the only two core attributes available for specialization.
Each specialization of the @props and @base attributes MUST provide a token for use by the @specializations attribute.
The @specializations token for an attribute specialization begins with either @props or @base followed by a slash, followed by the name of the new attribute:
'@', props-or-base, ('/', attname)+
- If @props is specialized to create @myNewProp, this results in the following token: @props/myNewProp
- If @base is specialized to create @myFirstBase, this results in the following token: @base/myFirstBase
- If that specialized attribute @myFirstBase is further specialized to create @mySecondBase, this results in the following token: @base/myFirstBase/mySecondBase
Note that the value for the @specializations attribute is not authored. Instead, the value is defaulted based on the modules that are included in the document-type shell.
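The token grammar above can be approximated with a regular expression, as in this Python sketch. The function name `parse_token` is hypothetical, and the character class used for attribute names (anything except "/", "@", and white space) is a simplifying assumption rather than the XML name production.

```python
import re
from typing import List, Tuple

# '@', props-or-base, ('/', attname)+
TOKEN_RE = re.compile(r"@(props|base)((?:/[^/\s@]+)+)")


def parse_token(token: str) -> Tuple[str, List[str]]:
    """Split a @specializations token into its base attribute and name chain."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a valid @specializations token: {token!r}")
    base = match.group(1)
    # group(2) looks like "/myFirstBase/mySecondBase".
    chain = match.group(2).lstrip("/").split("/")
    return base, chain


print(parse_token("@props/myNewProp"))
print(parse_token("@base/myFirstBase/mySecondBase"))
```

The last name in the chain is the specialized attribute itself; the names before it are its ancestors.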
Example: @specializations attribute for a task with multiple domains
This section is non-normative.
In this example, a document-type shell integrates the task structural module and the following domain modules:
Domain | Domain short name |
---|---|
User interface | ui-d |
Software | sw-d |
@deliveryTarget attribute | deliveryTarget |
@platform attribute | platform |
@product attribute | product |
The value of the @specializations attribute includes one value from each attribute module.
The effective value is the following:
specializations="@props/deliveryTarget @props/platform @props/product"
If the document-type shell also used a specialization of the @platform attribute that describes the hardware platform, the new @hardwarePlatform attribute domain would add an additional value to the @specializations attribute:
specializations="@props/deliveryTarget @props/platform @props/platform/hardwarePlatform @props/product"
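As a sketch of how the effective value is composed, the following Python function joins one token per attribute domain module. The function name and the tuple representation of each attribute's ancestry are assumptions made for illustration; element domains such as ui-d and sw-d contribute nothing to this attribute and are therefore not listed.

```python
from typing import Iterable, Tuple


def effective_specializations(chains: Iterable[Tuple[str, ...]]) -> str:
    """Build the @specializations value from one ancestry chain per attribute."""
    return " ".join("@" + "/".join(chain) for chain in chains)


# One entry per attribute domain module integrated by the shell.
attribute_modules = [
    ("props", "deliveryTarget"),
    ("props", "platform"),
    ("props", "platform", "hardwarePlatform"),
    ("props", "product"),
]

print(effective_specializations(attribute_modules))
```

With these four modules, the result is the second value shown in the example above.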
Specializing to include non-DITA content
You can extend DITA to incorporate standard vocabularies for non-textual content, such as MathML and SVG, as markup within DITA documents. This is done by specializing the <foreign> element.
There are three methods of incorporating foreign content into DITA.
- A domain specialization of the <foreign> element. This is the usual implementation.
- A structural specialization using the <foreign> element. This affords more control over the content model.
- Directly embedding the non-DITA content within a <foreign> element. If the non-DITA content has interoperability or vocabulary naming issues such as those that are addressed by specialization in DITA, they must be addressed by means that are appropriate to the non-DITA content.
Do not use the <foreign> element to include textual content or metadata in DITA documents.
Example: Creating an element domain specialization for SVG
This section is non-normative.
The following code sample, which is from the svgDomain.ent file, shows the domain declaration for the SVG domain.
<!-- ============================================================= -->
<!-- SVG DOMAIN ENTITIES -->
<!-- ============================================================= -->
<!-- SVG elements must be prefixed, otherwise they conflict with
existing DITA elements (e.g., <desc> and <title>.
-->
<!ENTITY % NS.prefixed "INCLUDE" >
<!ENTITY % SVG.prefix "svg" >
<!ENTITY % svg-d-foreign
"svg-container
"
>
Note that the SVG-specific %SVG.prefix; parameter entity is declared. This establishes the default namespace prefix to be used for the SVG content embedded with this domain. The namespace can be overridden in a document-type shell by declaring the parameter entity before the reference to the svgDomain.ent file. Other foreign domains might need similar entities when required by the new vocabulary.
For more information, see the svgDomain.mod file that is shipped with the DITA Technical Content edition. For an example of including the SVG domain in a document-type shell, see task.dtd.
Sharing elements across specializations
Specialization enables reuse of elements from ancestor specializations. However, it is also possible to reuse elements from non-ancestor specializations.
A structural specialization can incorporate elements from unrelated domains or other structural specializations by referencing them in the content model of a specialized element. The elements included in this manner must be specialized from ancestor content that is valid in the new context. If the reusing and reused specializations share common ancestry, the reused elements must be valid in the reusing context at every level they share in common.
Although a well-designed structural specialization hierarchy with controlled use of domains is still the primary means of sharing and reusing elements in DITA, the ability to also share elements declared elsewhere in the hierarchy allows for situations where relevant markup comes from multiple sources and would otherwise be developed redundantly.
Example: A specialization of <concept> reuses an element from the task module
This section is non-normative.
A specialized concept topic could declare a specialized <process> section that contains the <steps> element that is defined in the task module. This is possible because of the following factors:
- The <steps> element is specialized from <ol>.
- The <process> element is specialized from <section>, and the content model of <section> includes <ol>.
The <steps> element in <process> always can be generalized back to <ol> in <section>.
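The reasoning in this example can be sketched as a check on @class values. Everything named here is an assumption for illustration: the `processConcept` module name, the @class value given to <process>, and the heavily abbreviated content model for <section>. The sketch tests validity only at the base topic level, whereas the rule requires validity at every level that the two specializations share.

```python
def generalize_to(class_value: str, module: str) -> str:
    """Return the type name that the element maps to in the given module."""
    for token in class_value.split()[1:]:
        mod, _, typename = token.partition("/")
        if mod == module:
            return typename
    raise KeyError(module)


# Hypothetical, abbreviated content model for the base topic module.
TOPIC_CONTENT_MODEL = {"section": {"title", "p", "ol", "ul"}}


def can_reuse(container_class: str, reused_class: str) -> bool:
    """Check that the reused element generalizes to valid base-level content."""
    container = generalize_to(container_class, "topic")
    reused = generalize_to(reused_class, "topic")
    return reused in TOPIC_CONTENT_MODEL.get(container, set())


process_class = "- topic/section concept/section processConcept/process "
steps_class = "- topic/ol task/steps "

print(can_reuse(process_class, steps_class))  # <ol> is allowed in <section>
```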
Example: A specialization of <reference> reuses an element from the programming domain
This section is non-normative.
A specialized reference topic could declare a specialized list (<apilist>) in which each <apilistitem> contains an <apiname> element that is borrowed from the programming domain.