Generalization

Generalization is the process of reversing a specialization. It converts specialized elements or attributes into the original types from which they were derived.

Overview of generalization

Specialized content can be generalized to any ancestor type. The generalization process can preserve information about the former level of specialization to allow round-tripping between specialized and unspecialized forms of the same content.

All DITA documents contain a mix of markup from at least one structural type and zero or more domains. When generalizing the document, any individual structural type or domain can be left as-is, or it can be generalized to any of its ancestors. If the document will be edited or processed in generalized form, it might be necessary to have a document-type shell that includes all non-generalized modules from the original document-type shell.

Generalization serves several purposes:

  • It can be used to migrate content. For example, if a specialization is unsuccessful or is no longer needed, the content can be generalized back to a less specialized form.
  • It can be used for temporary round-tripping. For example, if content is shared with a process that is not specialization aware, it can be temporarily generalized for that process and then returned to specialized form.
  • It can allow reuse of specialized content in an environment that does not support the specialization. Similar to round-tripping, content can be generalized for sharing, without the need to re-specialize.

When generalizing for migration, the @class attribute and @specializations attribute need to be absent from the generalized instance document, so that the default values in the document-type shell are used.

When generalizing for round-tripping, the @class attribute and @specializations attribute SHOULD retain the original specialized values in the generalized instance document.

Note that when using constraints, a document instance can always be converted from a constrained document type to an unconstrained document type merely by switching the binding of the document instance to the less restricted document-type shell. No renaming of elements is needed to remove constraints.

However, a document whose document-type shell uses expansion modules might not be interchangeable without first generalizing the element and attribute types that were introduced by the expansion modules.

Element generalization

Elements are generalized by examining the @class attribute. When a generalization process detects that an element belongs to one of the modules that is being generalized, the element is renamed to a more general form.

For example, the <step> element has a @class attribute value of "- topic/li task/step ". If the task module is generalized, the <step> element is renamed to its more general form from the topic module: <li>.

For specific concerns when generalizing structural types with dependencies on non-ancestor modules, see Generalization with cross-specialization dependencies.

While the tag name of a given element is normally the same as the type name of the last token in the @class value, this is not required. For example, if a generalization process has already run on the element, the @class attribute could contain tokens from two or more modules based on the original specialization. In that case, the element name could already match the first token or an intermediate token in the @class attribute. A second generalization process could end up renaming the element again or could leave it alone, depending on the target module or document type.

Processor expectations when generalizing elements

Generalization processors convert elements from one or more modules into their less specialized form. The list of modules can be supplied to a generalization processor, or it can be inferred based on knowledge of a target document-type shell.

The person or application initiating a generalization process can supply the source and target modules for each generalization, for example, "generalize from reference to topic". Multiple target modules can be specified, for example, "generalize from reference to topic and from user-interface domain to topic". When the source and target modules are not supplied, the generalization process is assumed to be from all structural types to the base (topic or map), and no generalization is performed for domains.

The person or application initiating a generalization process also can supply the target document-type shell. When the target document-type shell is not supplied, the generalized document will not contain a reference to a document-type shell.

A generalization processor SHOULD be able to handle cases where it is given:
  • Only source modules for generalization (in which case the designated source types are generalized to topic or map)
  • Only target modules for generalization (in which case all descendants of each target are generalized to that target)
  • Both (in which case only the specified descendants of each target are generalized to that target)

For each structural type instance, the generalization processor checks whether the structural type instance is a candidate for generalization, or whether it has domains that are candidates for generalization. It is important to be selective about which structural type instances to process; if the process simply generalizes every element based on its @class attribute values, an instruction to generalize "reference" to "topic" could leave a specialization of reference with an invalid content model, since any elements it reuses from "reference" would have been renamed to topic-level equivalents.

The @class attribute for the root element of the structural type is checked before generalizing structural types:

Source module unspecified Source module specified
Target module unspecified Generalize this structural type to its base ancestor Check whether the root element of the topic type matches a specified source module; generalize to its base ancestor if it does, otherwise ignore the structural type instance unless it has domains to generalize.
Target module specified Check whether the @class attribute contains the target module. If it does contain the target, rename the element to the value associated with the target module. Otherwise, ignore the element. It is an error if the root element matches a specified source but its @class attribute does not contain the target. If the root element matches a specified source module and its @class attribute does contain the target module, generalize to the target module. Otherwise, ignore the structural type instance unless it has domains to generalize.

For each element in a candidate structural type instance:

Source module unspecified Source module specified
Target module unspecified If the @class attribute starts with "-" (part of a structural type), rename the element to its base ancestor equivalent. Otherwise ignore it. Check whether the last value of the @class attribute matches a specified source; generalize to its base ancestor if it does, otherwise ignore the element.
Target module specified Check whether the @class attribute contains the target module; rename the element to the value associated with the target module if it does contain the target, otherwise ignore the element. It is an error if the last value in the @class attribute matches a specified source but the previous values do not include the target. If the last value in the @class attribute matches a specified source module and the previous values do include the target module, rename the element to the value associated with the target module. Otherwise, ignore the element.

When renaming elements during round-trip generalization, the generalization processor SHOULD preserve the values of all attributes. When renaming elements during one-way or migration generalization, the process SHOULD preserve the values of all attributes except the @class attribute, which is supplied by the target document type.

Attribute generalization

DITA provides a syntax to generalize attributes that have been specialized from the @props or @base attribute.

Specialization-aware processors MUST process both the specialized and generalized forms of an attribute as equivalent in their values.

When a specialized attribute is generalized to an ancestor attribute, the value of the ancestor attribute consists of the name of the specialized attribute followed by its specialized value in parentheses.

For example, if @jobrole is an attribute specialized from @person, which in turn is specialized from @props:

  • jobrole="programmer" can be generalized to person="jobrole(programmer)" or to props="jobrole(programmer)"
  • props="jobrole(programmer)" can be respecialized to person="jobrole(programmer)" or to jobrole="programmer"

In this example, processors performing generalization and respecialization can use the @specializations attribute to determine the ancestry of the specialized @jobrole attribute, and therefore the validity of the specialized @person attribute as an intermediate target for generalization.

If more than one attribute is generalized, the value of each is separately represented in this way in the value of the ancestor attribute.

Generalized attributes are typically not expected to be authored or edited directly. They are used by processors to preserve the values of the specialized attributes during the time or in the circumstances in which the document is in a generalized form.

A single element MUST NOT contain both generalized and specialized values for the same attribute.

For example, the following <p> element provides two values for the @jobrole attribute, one in a generalized syntax and the other in a specialized syntax:

<p person="jobrole(programmer)" jobrole="admin">
    <!-- ... -->
</p>

This is an error condition, since it means the document has been only partially generalized, or that the document has been generalized and then edited using a specialized document type.

Generalization with cross-specialization dependencies

Dependencies across specializations limit generalization targets to those that either preserve the dependency or eliminate them. Some generalization targets will not be valid and need to be detected before generalization occurs.

When a structural specialization has a dependency on a domain specialization, then the domain cannot be generalized without also generalizing the reusing structural specialization.

For example, a structural specialization <codeConcept> might incorporate and require the <codeblock> element from the programming domain. A generalization process that turns programming domain elements back into topic elements would convert <codeblock> to <pre>, making a document that uses <codeConcept> invalid. However, codeConcept<> could be generalized to concept or topic, without generalizing programming domain elements, as long as the target document type includes the programming domain.

When a structural specialization has a dependency on another structural specialization, then both must be generalized together to a common ancestor.

For example, if the task elements in checklist were generalized without also generalizing checklist elements, then the checklist content models that referenced task elements would be broken. And if the checklist elements were generalized to topic without also generalizing the task elements, then the task elements would be out of place, since they cannot be validly present in topic. However, checklist and task can be generalized together to any ancestor they have in common: in this case topic.

When possible, generalizing processes SHOULD detect invalid generalization target combinations and report them as errors.