Indexes
Processors can generate indexes from the content of indexing elements.
Index overview
DITA provides several elements to enable indexing. Whether and how an index is rendered will vary based on implementation decisions and rendering formats.
Here are some definitions:
- An index is a mapping from <indexterm> elements to locations in the DITA content.
- A generated index is a mapping of index terms to rendered locations.
While DITA provides several elements that support indexing, how those elements are used will vary by implementation.
- A publishing format like PDF might use a back-of-the-book style index with page numbers, which typically involves merging index elements and generating page numbers.
- Another publishing format might have no rendered index, but it would instead use the content of index elements to help weight search results.
- Some implementations might choose to supplement a generated index with additional content, such as treating a specialized <keyword> element as both normal content and an index entry.
- Implementations might have different ways to render indexing edge cases, based on either implementation capabilities or style preferences.
While DITA defines markup for indexing and specifies exactly the point to which an <indexterm> refers, it cannot force DITA documents to use consistent patterns that work for all formats. Implementations should consider edge cases and how to treat them.
- Index processors typically ignore leading and trailing whitespace characters.
- Processors might want to treat two entries separately if they are defined with different capitalization.
- Processors need to determine how to handle nested markup, such as a <keyword> element that is located within an <indexterm> element.
- Because <index-see> is used to refer to a term that is used instead of the current entry, processors should consider how to handle a case where an index term is used both as a page locator and with an <index-see> element for redirection.
- Similarly, processors should consider how to handle the case where an index term is defined with both an <index-see> and an <index-see-also> element.
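The edge cases above can be illustrated with a short markup sketch. The terms and the <keywords> wrapper are hypothetical, chosen only for illustration; how each case is rendered remains an implementation decision.

```xml
<keywords>
  <!-- Nested markup: the processor decides how the <keyword> content
       contributes to the index entry text. -->
  <indexterm>the <keyword>install</keyword> command</indexterm>

  <!-- Entries that differ only in capitalization: merged or kept apart? -->
  <indexterm>Setup</indexterm>
  <indexterm>setup</indexterm>

  <!-- The same term used as a locator here ... -->
  <indexterm>deployment</indexterm>
  <!-- ... and as a redirection here. -->
  <indexterm>deployment<index-see>installation</index-see></indexterm>

  <!-- Both <index-see> and <index-see-also> on one term. -->
  <indexterm>configuration
    <index-see>settings</index-see>
    <index-see-also>preferences</index-see-also>
  </indexterm>
</keywords>
```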
Index elements
The content of <indexterm> elements provides the text for the entries in an index. <indexterm> elements can be nested to create additional levels of indexing, such as secondary and tertiary index entries.
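As a minimal sketch of nesting, the following hypothetical markup defines a three-level entry. The terms are invented for illustration; a generated index might render them as "cheese", then "sheep milk cheeses", then "pecorino", with the exact layout depending on the implementation.

```xml
<keywords>
  <!-- Primary entry -->
  <indexterm>cheese
    <!-- Secondary entry -->
    <indexterm>sheep milk cheeses
      <!-- Tertiary entry -->
      <indexterm>pecorino</indexterm>
    </indexterm>
  </indexterm>
</keywords>
```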
The following elements contain information that processors can use to generate indexes:
<indexterm>
- Defines a term that can contribute to an index. Matching values of @start and @end attributes on <indexterm> elements can specify an index range.
<index-see>
- Defines a term to use as a see reference. See references direct a reader to the preferred term.
<index-see-also>
- Defines a term to use as a see also reference. See also references direct a reader to an alternate index entry for additional information.
How the index elements are combined, the location of <indexterm> elements, and the hierarchy of the DITA maps all affect how the index elements are processed and the index entries that are generated.
Location of <indexterm> elements
<indexterm> elements can occur in topic prologs, anywhere else in DITA topics, and in DITA maps.
The location of an <indexterm> element determines the point in the document that the element references.
- Topic prologs
- An <indexterm> element that is located in a topic prolog is a point reference to the title of the topic. If an <indexterm> element has an @end attribute, it is a point reference to the end of the topic, including any sub-topics.
- Anywhere else in a DITA topic
- An <indexterm> element that is located in a topic (and not in the topic prolog) is a point reference to the location where the <indexterm> element occurs.
- DITA maps
- An <indexterm> element that is contained within a <topicref> element is a point reference to the title of the topic. If an <indexterm> element has an @end attribute, it is a point reference to the end of the branch that is specified by the topic reference. If the topic reference is not bound to a resource, the <indexterm> element has no stated purpose.
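The difference between a prolog location and a body location can be sketched as follows. The topic ID, title, and terms are hypothetical.

```xml
<topic id="backups">
  <title>Backing up data</title>
  <prolog>
    <metadata>
      <keywords>
        <!-- In the prolog: a point reference to the topic title -->
        <indexterm>backups</indexterm>
      </keywords>
    </metadata>
  </prolog>
  <body>
    <p>Schedule backups nightly.</p>
    <!-- In the body: a point reference to this exact location -->
    <p><indexterm>retention policy</indexterm>Keep backups for 30 days.</p>
  </body>
</topic>
```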
Index locators
An <indexterm> element binds the content of the element, typically a term, to a specific location in a document.
The nesting of <indexterm> elements and the presence of <index-see> elements determine whether locators are rendered in generated indexes:
- An <indexterm> element that does not contain child <indexterm> elements (or an <index-see> element) contributes a locator to the generated index entry.
- An <indexterm> element that contains child <indexterm> elements contributes to the hierarchy of the multilevel index entry that is generated. Only a leaf <indexterm> element contributes a locator to the generated index entry. A leaf <indexterm> element is an <indexterm> element that does not contain any other <indexterm> elements.
- If an <indexterm> element also contains one or more <index-see> elements, no locator is included in the generated index entry.
- If an <indexterm> element also contains one or more <index-see-also> elements, the <indexterm> element contributes a locator to the generated index entry, and the <index-see-also> element provides only a redirection.
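The locator rules above can be sketched with four hypothetical entries. The terms are invented, and the comments restate the rules rather than describe any particular processor's output.

```xml
<keywords>
  <!-- Leaf element without redirection: contributes a locator for "cats" -->
  <indexterm>cats</indexterm>

  <!-- "dogs" only supplies hierarchy; the leaf "feeding" contributes the locator -->
  <indexterm>dogs
    <indexterm>feeding</indexterm>
  </indexterm>

  <!-- Contains <index-see>: no locator, only "felines, see cats" -->
  <indexterm>felines<index-see>cats</index-see></indexterm>

  <!-- Contains <index-see-also>: a locator for "pets" plus "see also cats" -->
  <indexterm>pets<index-see-also>cats</index-see-also></indexterm>
</keywords>
```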
Index redirection
The <index-see> and <index-see-also> elements enable redirection to other index entries within a generated index.
The <index-see> element contains text for an index entry that the reader should use instead of the current one, whereas the <index-see-also> element contains text for an index entry that the reader should use in addition to the current one.
Index ranges
Authors can use the @start and @end attributes on a pair of <indexterm> elements to index an extended discussion. The generated index entry reflects the span between the two <indexterm> elements.
The start of an index range is indicated by an <indexterm> element with a @start attribute. This is called a start-of-range element.
The end of an index range is indicated by an <indexterm> element with an @end attribute with a value that matches the @start attribute on the start element. This is called an end-of-range element. An end-of-range element should contain no content or nested elements.
The start-of-range and end-of-range elements must be leaf <indexterm> elements. If part of a multilevel index entry, the start-of-range and end-of-range elements must be at the same level of the hierarchy.
The location of the <indexterm> elements determines how the range is defined:
- Topic body
- The start-of-range and end-of-range elements are in the body of the same DITA topic. The range is defined as between two point references in the DITA topic. If an end-of-range element does not exist within the same topic body, the start-of-range element is treated as a point reference rather than as the start of a range.
- Topic prolog
- The start-of-range and end-of-range elements are in the prolog of the same DITA topic. The range is defined as being between the title of the DITA topic and the end of the last nested topic. If an end-of-range element does not exist within the topic prolog, the start-of-range element is treated as a point reference rather than as the start of a range.
- DITA map
- The start-of-range and end-of-range elements are contained within topic references in the same DITA map. If an end-of-range element does not exist within the same map, the start-of-range element is treated as a point reference rather than as the start of a range.
Processors are expected to handle index ranges as follows:
- Match @start and @end attributes by a character-by-character comparison with all characters significant and no case folding occurring.
- Ignore @start and @end attributes if they occur on an <indexterm> element that has child <indexterm> elements.
- Handle an end-of-range <indexterm> element that is nested within one or more <indexterm> elements. The end-of-range <indexterm> element should have no content of its own; if it contains content, that content is ignored.
- When index ranges with the same identifier overlap, the effective range is determined by matching the earliest start-of-range element from the set of overlapping ranges with the latest end-of-range element from the set of overlapping ranges.
- An unmatched start-of-range element is treated as a simple <indexterm> element.
- Ignore unmatched end-of-range <indexterm> elements.
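A brief sketch of two of these cases, overlapping ranges and case-sensitive matching, follows. The identifiers and paragraph text are hypothetical, and the comments describe what a processor that follows the rules above would be expected to do.

```xml
<body>
  <!-- Overlapping ranges that share the identifier "backup": the effective
       range runs from the first start element to the last end element. -->
  <p><indexterm start="backup">backups</indexterm>First discussion begins.</p>
  <p><indexterm start="backup">backups</indexterm>Second discussion begins.</p>
  <p><indexterm end="backup"/>First discussion ends.</p>
  <p><indexterm end="backup"/>Second discussion ends.</p>

  <!-- "Restore" and "restore" differ in case, so they do not match.
       The start element is treated as a simple <indexterm> element,
       and the unmatched end element is ignored. -->
  <p><indexterm start="Restore">restoring data</indexterm>Restore steps begin.</p>
  <p><indexterm end="restore"/>Restore steps end.</p>
</body>
```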
Index sorting
The combination of an <indexterm> and a <sort-as> element specifies a sort phrase under which an index entry is grouped or sorted.
This gives an author the flexibility to sort an index entry in an index differently from how its text normally would be grouped or sorted. The common use for this scenario is to disregard insignificant leading text, such as punctuation or words like "the" or "a". For example, the author might want <data> to be sorted under the letter D rather than the left angle bracket (<). An author might want to include such an entry under both the punctuation heading and the letter D, in which case there can be two index entries differentiated only by the sort-as value.
Certain languages have special sort order needs. For example, Japanese index entries might be written partially or wholly in kanji, but need to be sorted in phonetic order according to their hiragana/katakana rendition. There is no reliable automated way to map written to phonetic text: for kanji text, there can be multiple phonetic possibilities depending on the context. The only way to correctly sort Japanese index entries is to keep the phonetic counterparts with the written forms. The phonetic text would be presented as the sort-as value for indexing purposes.
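Both sorting scenarios can be sketched in markup. The placement of <sort-as> after the entry text and the sample Japanese term are assumptions made for illustration.

```xml
<keywords>
  <!-- Entry text is "<data>", but it is sorted under the letter D -->
  <indexterm>&lt;data&gt;<sort-as>data</sort-as></indexterm>

  <!-- The same entry without <sort-as>, so it can also appear
       under the punctuation heading -->
  <indexterm>&lt;data&gt;</indexterm>

  <!-- Kanji entry text with its hiragana reading as the sort phrase -->
  <indexterm>東京<sort-as>とうきょう</sort-as></indexterm>
</keywords>
```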
Examples of indexing
This section is non-normative.
This section contains examples and scenarios that illustrate the use and processing of indexing elements.
Example: Index range defined in a single topic
This section is non-normative.
In this scenario, an index range is defined directly in the body of a topic.
In the following code sample, the index range begins at the start of the second paragraph and continues to the beginning of the last paragraph.
<topic id="accounting">
  <title>Accounting regulations</title>
  <body>
    <p>Be ethical in your accounting.</p>
    <p><indexterm start="acctrules">rules</indexterm>Remember to do all of the following: ...</p>
    <!-- ...pages worth of rules... -->
    <p><indexterm end="acctrules"/>Failure to comply will get you audited.</p>
  </body>
  <!-- Potential sub-topics -->
</topic>
Example: Index range defined in a topic prolog
This section is non-normative.
In this scenario, an index range is defined in the topic prolog. Ranges defined in a prolog cover subtopics, including those nested based on a map.
Specifying an index range in a topic prolog is useful for defining an index range that contains a topic and its children.
Consider the following DITA map, which contains topics about a small company's operating procedures. The map contains a topic about accounting (acct.dita), which has two child topics: procedures.dita and forms.dita.
<map>
  <title>Company procedures</title>
  <topicref href="acct.dita">
    <topicref href="procedures.dita"/>
    <topicref href="forms.dita"/>
  </topicref>
  <!-- ... -->
</map>
The information developer wants an index entry that will span acct.dita and its children. They use the following markup in acct.dita:
<topic id="accounting-at-acme">
  <title>Accounting at Acme</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm start="acct">accounting</indexterm>
        <indexterm end="acct"/>
      </keywords>
    </metadata>
  </prolog>
  <!-- ... -->
</topic>
This markup specifies that the index range begins with the start of the topic title, and the end of the range is the end of the forms.dita topic. The index range includes the "Accounting at Acme" topic and its two child topics.
Example: Index range defined in a map
This section is non-normative.
In this scenario, an index range is defined in the DITA map. Ranges defined in a DITA map can span topics.
Consider the following DITA map:
<map>
  <title>Food available in the Acme cafeteria</title>
  <!-- ... -->
  <topicref href="apples.dita">
    <topicmeta>
      <keywords>
        <indexterm start="acme-fruit">fruit</indexterm>
      </keywords>
    </topicmeta>
  </topicref>
  <topicref href="oranges.dita"/>
  <topicref href="pineapples.dita">
    <topicmeta>
      <keywords>
        <indexterm end="acme-fruit"/>
      </keywords>
    </topicmeta>
  </topicref>
  <!-- ... -->
</map>
The index range begins with the start of the first topic title in apples.dita, and it continues until the end of the last element in pineapples.dita.