Incident Management Playbook

A fully structured DITA publication demonstrating all three core topic types: concept, task, and reference. Each section shows the XML source alongside its rendered output, so you can see how structured authoring produces clean, consistent, reusable documentation at scale.

DITA 1.3 · Concept + Task + Reference · Side-by-Side View · Audience: Engineers, Ops Teams, Incident Responders

What you are looking at

DITA (Darwin Information Typing Architecture) is an XML-based standard for structured content authoring. Content is written in typed, self-contained topics, each serving a single purpose: explaining a concept, walking through a task, or providing reference data. A ditamap assembles those topics into a publication without duplicating content. The result is documentation that can be reused, repurposed, and published to multiple formats from a single source.

ditamap — publication structure
<!-- incident-management-playbook.ditamap -->
<map>
  <title>Incident Management Playbook</title>

  <topicref href="concept-what-is-an-incident.dita" type="concept"/>
  <topicref href="task-respond-to-incident.dita"   type="task"/>
  <topicref href="ref-escalation-matrix.dita"     type="reference"/>

</map>
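To illustrate reuse without duplication, here is a minimal sketch of a second map that pulls two of the same topic files into a shorter deliverable. The filename `oncall-quick-reference.ditamap` and its title are hypothetical and not part of the sample. The DOCTYPE line uses the standard OASIS public identifier for maps, which the listing above omits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- oncall-quick-reference.ditamap (hypothetical second deliverable) -->
<map>
  <title>On-Call Quick Reference</title>

  <!-- Same topic files as the playbook; nothing is copied -->
  <topicref href="task-respond-to-incident.dita" type="task">
    <!-- Nesting places the matrix under the task in the output hierarchy -->
    <topicref href="ref-escalation-matrix.dita" type="reference"/>
  </topicref>

</map>
```

Under this arrangement, an edit to `ref-escalation-matrix.dita` would reach both publications the next time they are built.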
concept

What Is an Incident

The concept topic establishes shared definitions. It answers "what is this?" before any procedure begins. In DITA, concept topics use <conbody> for free-form explanation and often include tables, lists, or diagrams that define terms or categories.

XML source — concept-what-is-an-incident.dita
<concept id="concept-incident-definition">

  <title>What Is an Incident</title>

  <shortdesc>
    An incident is any unplanned disruption or degradation
    of a service that affects end users or system stability.
  </shortdesc>

  <conbody>

    <p>
      Incidents differ from service requests and problems.
      A service request is a planned interaction. A problem
      is the underlying cause of one or more incidents.
      An incident is the immediate, user-visible symptom.
    </p>

    <p>
      All incidents are assigned a severity level at intake.
      Severity determines response time, escalation path,
      and communication requirements.
    </p>

    <table>
      <title>Severity Definitions</title>
      <tgroup cols="3">
        <colspec colname="level"/>
        <colspec colname="label"/>
        <colspec colname="definition"/>
        <thead>
          <row>
            <entry>Level</entry>
            <entry>Label</entry>
            <entry>Definition</entry>
          </row>
        </thead>
        <tbody>
          <row>
            <entry>SEV-1</entry>
            <entry>Critical</entry>
            <entry>Full outage. All users affected.</entry>
          </row>
          <row>
            <entry>SEV-2</entry>
            <entry>Major</entry>
            <entry>Core function impaired. Workaround unavailable.</entry>
          </row>
          <row>
            <entry>SEV-3</entry>
            <entry>Moderate</entry>
            <entry>Partial degradation. Workaround available.</entry>
          </row>
          <row>
            <entry>SEV-4</entry>
            <entry>Low</entry>
            <entry>Minor issue. No user impact at present.</entry>
          </row>
        </tbody>
      </tgroup>
    </table>

  </conbody>
</concept>
Rendered output

What Is an Incident

An incident is any unplanned disruption or degradation of a service that affects end users or system stability.

Incidents differ from service requests and problems. A service request is a planned interaction. A problem is the underlying cause of one or more incidents. An incident is the immediate, user-visible symptom.

All incidents are assigned a severity level at intake. Severity determines response time, escalation path, and communication requirements.

Severity Definitions

| Level | Label | Definition |
|---|---|---|
| SEV-1 | Critical | Full outage. All users affected. |
| SEV-2 | Major | Core function impaired. Workaround unavailable. |
| SEV-3 | Moderate | Partial degradation. Workaround available. |
| SEV-4 | Low | Minor issue. No user impact at present. |
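The distinction between incident, service request, and problem in the topic's first paragraph could also be marked up as a definition list, one of the list structures a <conbody> accepts. The following is a sketch only. The wording of each definition is taken from that paragraph, and the choice of <dl> over <p> is an assumption about authoring preference, not part of the sample.

```xml
<!-- Alternative markup for the first <p> inside <conbody> (illustrative) -->
<dl>
  <dlentry>
    <dt>Incident</dt>
    <dd>The immediate, user-visible symptom.</dd>
  </dlentry>
  <dlentry>
    <dt>Service request</dt>
    <dd>A planned interaction.</dd>
  </dlentry>
  <dlentry>
    <dt>Problem</dt>
    <dd>The underlying cause of one or more incidents.</dd>
  </dlentry>
</dl>
```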
task

Respond to an Incident

Task topics are the most procedurally strict topic type in DITA. The strict task model treats <prereq>, <steps>, and <result> as optional but fixes the order in which they appear. This playbook uses all three, and that structure supports completeness: every step is discrete, every prerequisite is stated, and the expected outcome is declared before a responder begins.

XML source — task-respond-to-incident.dita
<task id="task-incident-response">

  <title>Respond to an Incident</title>

  <shortdesc>
    Use this procedure when an incident is detected
    or reported. Complete all steps in sequence.
  </shortdesc>

  <taskbody>

    <prereq>
      <p>
        You must have responder access in the incident
        management platform and be on-call for the current
        rotation before proceeding.
      </p>
    </prereq>

    <steps>

      <step>
        <cmd>Confirm the alert is a genuine incident,
        not a monitoring false positive.</cmd>
        <info>Check system dashboards before declaring.</info>
      </step>

      <step>
        <cmd>Create an incident record and assign
        a severity level based on the concept topic.</cmd>
      </step>

      <step>
        <cmd>Notify the on-call engineer and open
        a dedicated incident channel.</cmd>
        <info>
          Channel naming convention:
          <codeph>inc-YYYYMMDD-[sev]</codeph>
        </info>
      </step>

      <step>
        <cmd>Escalate using the escalation matrix
        if the incident is SEV-1 or SEV-2.</cmd>
      </step>

      <step>
        <cmd>Post status updates every 15 minutes
        until the incident is resolved or downgraded.</cmd>
      </step>

      <step>
        <cmd>Close the incident record and schedule
        a post-incident review within 48 hours.</cmd>
      </step>

    </steps>

    <result>
      <p>
        The incident is resolved, documented, and owners
        are aligned on next steps for root cause analysis.
      </p>
    </result>

  </taskbody>
</task>
Rendered output

Respond to an Incident

Use this procedure when an incident is detected or reported. Complete all steps in sequence.

Prerequisites
You must have responder access in the incident management platform and be on-call for the current rotation before proceeding.
  1. Confirm the alert is a genuine incident, not a monitoring false positive.
    Check system dashboards before declaring.
  2. Create an incident record and assign a severity level based on the concept topic.
  3. Notify the on-call engineer and open a dedicated incident channel.
    Channel naming convention: inc-YYYYMMDD-[sev]
  4. Escalate using the escalation matrix if the incident is SEV-1 or SEV-2.
  5. Post status updates every 15 minutes until the incident is resolved or downgraded.
  6. Close the incident record and schedule a post-incident review within 48 hours.
Result: The incident is resolved, documented, and owners are aligned on next steps for root cause analysis.
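Steps 2 and 4 refer to the other two topics by description only. Below is a minimal sketch of how those references could become navigable links, assuming all three files sit in the same directory, as the map's href values suggest. The <xref> elements and the link text are additions for illustration and are not part of the sample source.

```xml
<!-- Steps 2 and 4 rewritten with cross-references (illustrative) -->
<step>
  <cmd>Create an incident record and assign a severity level
  using the definitions in
  <xref href="concept-what-is-an-incident.dita" type="concept">What
  Is an Incident</xref>.</cmd>
</step>

<step>
  <cmd>Escalate using the
  <xref href="ref-escalation-matrix.dita" type="reference">escalation
  matrix</xref> if the incident is SEV-1 or SEV-2.</cmd>
</step>
```

Because each link targets a file rather than copied text, a change to the matrix would not require touching the task.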
reference

Escalation Matrix

Reference topics in DITA hold lookup material, not narrative prose. They are not read sequentially; a responder consults them under pressure to confirm a specific fact, threshold, or contact. The <refbody> element can contain structures such as <properties>, <simpletable>, or <table>, all suited to scanning rather than linear reading.

XML source — ref-escalation-matrix.dita
<reference id="ref-escalation-matrix">

  <title>Escalation Matrix</title>

  <shortdesc>
    Role assignments, response time targets, and
    escalation paths by severity level.
  </shortdesc>

  <refbody>

    <simpletable>
      <sthead>
        <stentry>Severity</stentry>
        <stentry>First Responder</stentry>
        <stentry>Escalate To</stentry>
        <stentry>Response Target</stentry>
        <stentry>Comms Required</stentry>
      </sthead>
      <strow>
        <stentry>SEV-1</stentry>
        <stentry>On-call Engineer</stentry>
        <stentry>Eng Director + VP Ops</stentry>
        <stentry>15 minutes</stentry>
        <stentry>Status page + exec brief</stentry>
      </strow>
      <strow>
        <stentry>SEV-2</stentry>
        <stentry>On-call Engineer</stentry>
        <stentry>Eng Manager</stentry>
        <stentry>30 minutes</stentry>
        <stentry>Status page update</stentry>
      </strow>
      <strow>
        <stentry>SEV-3</stentry>
        <stentry>On-call Engineer</stentry>
        <stentry>Team Lead</stentry>
        <stentry>2 hours</stentry>
        <stentry>Internal Slack only</stentry>
      </strow>
      <strow>
        <stentry>SEV-4</stentry>
        <stentry>On-call Engineer</stentry>
        <stentry>None required</stentry>
        <stentry>Next business day</stentry>
        <stentry>Ticket update only</stentry>
      </strow>
    </simpletable>

  </refbody>
</reference>
Rendered output

Escalation Matrix

Role assignments, response time targets, and escalation paths by severity level.

| Severity | First Responder | Escalate To | Response Target | Comms Required |
|---|---|---|---|---|
| SEV-1 | On-call Engineer | Eng Director + VP Ops | 15 minutes | Status page + exec brief |
| SEV-2 | On-call Engineer | Eng Manager | 30 minutes | Status page update |
| SEV-3 | On-call Engineer | Team Lead | 2 hours | Internal Slack only |
| SEV-4 | On-call Engineer | None required | Next business day | Ticket update only |
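The introduction to this topic also mentions <properties>, which the sample itself does not use. As a rough sketch, the SEV-1 row of the matrix could be expressed as a two-column properties table inside <refbody>. The values come from that row; the two heading labels are assumptions.

```xml
<!-- SEV-1 row as a <properties> table (illustrative alternative) -->
<properties>
  <prophead>
    <proptypehd>Attribute</proptypehd>
    <propvaluehd>SEV-1 value</propvaluehd>
  </prophead>
  <property>
    <proptype>First Responder</proptype>
    <propvalue>On-call Engineer</propvalue>
  </property>
  <property>
    <proptype>Escalate To</proptype>
    <propvalue>Eng Director + VP Ops</propvalue>
  </property>
  <property>
    <proptype>Response Target</proptype>
    <propvalue>15 minutes</propvalue>
  </property>
  <property>
    <proptype>Comms Required</proptype>
    <propvalue>Status page + exec brief</propvalue>
  </property>
</properties>
```

A per-severity layout like this may suit a single-incident view, while the <simpletable> form is better for comparing levels side by side.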