Incident Management Playbook
An effective incident management process helps teams detect, assess, respond to, communicate about, and recover from service disruptions with speed and discipline. This playbook provides a structured operating model for incident definition, severity assignment, role clarity, escalation logic, operational response, and post-incident learning.
Purpose and Scope
Incident management exists to reduce harm during service disruption. The priority is not perfect diagnosis at the start. The priority is coordinated response, impact containment, timely communication, and restoration of service.
This playbook applies to incidents that affect system availability, performance, security posture, operational continuity, customer experience, or business-critical internal workflows. It is intended for engineers, operations teams, support partners, and incident responders who need a common language and a repeatable response model.
Incident Definition
An incident is an unplanned interruption, degradation, or material risk event affecting a service, system, platform, or operational capability. Not every issue is an incident. The distinction matters because incident processes introduce elevated coordination, communication, and escalation expectations.
| Term | Meaning | Operational Use |
|---|---|---|
| Incident | An active disruption, degradation, or serious risk requiring coordinated response. | Triggers formal response workflow. |
| Problem | The underlying cause or recurring issue behind one or more incidents. | Investigated after stabilization or restoration. |
| Request | A normal ask for service, access, enhancement, or support. | Handled through standard intake, not incident response. |
| Alert | A monitoring signal suggesting something may be wrong. | May lead to investigation or incident declaration. |
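As a rough sketch, the table above can be expressed as a triage function. The field names and priority order below are illustrative assumptions, not a reference to any real tooling; the one grounded rule is that an active disruption always takes precedence:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    """A raw event arriving from monitoring, support, or a user.
    Field names are illustrative, not part of any real system."""
    active_disruption: bool   # service currently degraded, down, or at serious risk
    recurring_cause: bool     # known underlying cause behind one or more incidents
    service_ask: bool         # a normal request for access, enhancement, or support

def triage(signal: Signal) -> str:
    """Map a signal to the playbook's four terms.
    Order matters: an active disruption always wins."""
    if signal.active_disruption:
        return "incident"     # triggers the formal response workflow
    if signal.recurring_cause:
        return "problem"      # investigated after stabilization or restoration
    if signal.service_ask:
        return "request"      # standard intake, not incident response
    return "alert"            # may lead to investigation or declaration
```

A monitoring alert that turns out to describe an active disruption would re-enter this function as an incident, which mirrors the "may lead to incident declaration" note in the table.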
Severity Levels
Severity should reflect business impact, operational risk, scope, and urgency. Teams should avoid inflating severity based only on visibility or anxiety. Severity exists to guide response behavior.
| Severity | Description | Typical Characteristics | Expected Response |
|---|---|---|---|
| SEV 1 | Critical service outage or material business disruption. | Widespread customer or enterprise impact, major outage, major security concern, or inability to operate critical services. | Immediate incident bridge, executive visibility as needed, continuous coordination until stabilized. |
| SEV 2 | High-impact degradation or partial outage. | Major functionality impaired, significant user impact, important deadlines threatened, workaround limited or unstable. | Fast coordinated response, active communications, formal ownership and escalation monitoring. |
| SEV 3 | Moderate issue with contained impact. | Localized degradation, limited user group affected, workaround available, no enterprise-wide disruption. | Managed response during operational hours, owner assigned, escalation if conditions worsen. |
| SEV 4 | Low-impact issue or early signal requiring observation. | Minor impairment, minimal user disruption, low risk, often suitable for normal queue handling. | Track, assess, and route through standard support or follow-up processes. |
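The severity ladder can be sketched as a small decision function. The two inputs below (a coarse scope label and workaround availability) are illustrative proxies for the "typical characteristics" column; real severity calls weigh more factors than this:

```python
def assign_severity(scope: str, workaround_available: bool) -> str:
    """Sketch of the severity table as a decision ladder.

    scope is an assumed label: "enterprise", "major", "localized", or "minor".
    """
    if scope == "enterprise":
        return "SEV 1"  # critical outage or material business disruption
    if scope == "major":
        return "SEV 2"  # high-impact degradation or partial outage
    if scope == "localized":
        # A localized issue with no viable workaround behaves like SEV 2,
        # matching the "no viable workaround" escalation trigger.
        return "SEV 3" if workaround_available else "SEV 2"
    return "SEV 4"      # minor impairment, normal queue handling
```

The one deliberate wrinkle is the localized-without-workaround case: severity reflects impact and risk, so losing the workaround raises the response posture even though the scope is unchanged.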
Roles and Responsibilities
| Role | Primary Responsibility | What Good Looks Like |
|---|---|---|
| Incident Commander | Directs the overall response and keeps the team aligned. | Maintains focus, assigns actions, manages pace, and prevents confusion or duplicate effort. |
| Technical Lead | Guides technical diagnosis and restoration planning. | Coordinates engineering work, validates hypotheses, and drives practical remediation steps. |
| Communications Lead | Owns status updates and stakeholder messaging. | Provides timely, accurate updates without speculation or unnecessary technical noise. |
| Scribe | Captures timeline, decisions, actions, and material facts. | Maintains a clean record for handoffs, review, and post-incident analysis. |
| Resolver Teams | Execute investigation, containment, rollback, recovery, and validation tasks. | Act on assigned work quickly and report progress clearly. |
| Escalation Stakeholders | Provide authority, coordination, approvals, or business context when needed. | Remove blockers, support decisions, and avoid disrupting active responders with unnecessary noise. |
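A simple staffing check can make the role table operational. The set of "core" roles below is an assumption drawn from the table; resolver teams and escalation stakeholders join as the incident demands rather than being staffed up front:

```python
# Core roles that should be filled at declaration (illustrative set).
REQUIRED_ROLES = {
    "incident_commander",
    "technical_lead",
    "communications_lead",
    "scribe",
}

def missing_roles(assigned: dict) -> set:
    """Return core roles not yet staffed.

    assigned maps role name -> person; an empty string counts as unfilled.
    """
    filled = {role for role, person in assigned.items() if person}
    return REQUIRED_ROLES - filled
```

In practice one person may hold two roles early on (commander and scribe, for example); the check only ensures no core responsibility is left unowned.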
Escalation Logic
Escalation should be tied to impact, uncertainty, and risk, not simply elapsed time. A team should escalate when the incident becomes broader, riskier, less understood, or harder to restore than initially believed.
| Escalation Trigger | What It Means | Likely Action |
|---|---|---|
| Impact grows | More users, systems, or business processes are affected than first understood. | Increase severity and add responder groups. |
| No clear owner | The issue spans boundaries or ownership is disputed. | Incident commander assigns interim ownership and escalates to operational leadership if needed. |
| No viable workaround | Users cannot continue work safely or effectively while restoration is in progress. | Raise urgency and prioritize containment or rollback decisions. |
| Recovery path fails | Initial remediation steps do not stabilize the service. | Expand technical support and reassess hypotheses quickly. |
| Security or compliance concern appears | The incident may involve data risk, unauthorized activity, or regulated exposure. | Engage security, legal, compliance, or other required governance partners immediately. |
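The trigger table translates directly into a check that returns the reasons to escalate, if any. The boolean inputs are illustrative simplifications of the conditions in the table:

```python
def escalation_reasons(impact_grew: bool,
                       owner_clear: bool,
                       workaround_viable: bool,
                       recovery_on_track: bool,
                       security_concern: bool) -> list:
    """Evaluate the escalation-trigger table.

    Returns the list of triggered reasons; an empty list means
    the current response posture is still appropriate.
    """
    reasons = []
    if impact_grew:
        reasons.append("impact grows")
    if not owner_clear:
        reasons.append("no clear owner")
    if not workaround_viable:
        reasons.append("no viable workaround")
    if not recovery_on_track:
        reasons.append("recovery path fails")
    if security_concern:
        reasons.append("security or compliance concern")
    return reasons
```

Note that elapsed time is deliberately absent from the inputs: per the section above, escalation keys off impact, uncertainty, and risk, not the clock.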
Operational Workflow
Detect and assess
Identify the disruption through monitoring, support reports, automated alerts, or internal observation. Confirm whether the event meets the threshold for incident declaration.
Declare the incident
Assign an initial severity, identify the incident commander, create the incident record, and open the response channel or bridge. Early declaration is better than late informal handling when impact is real.
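A declaration can be captured as a record that bundles the four actions above. The channel-naming convention below is purely illustrative; substitute whatever convention your tooling uses:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IncidentRecord:
    title: str
    severity: str       # e.g. "SEV 1"
    commander: str      # incident commander assigned at declaration
    declared_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    channel: str = ""   # response channel or bridge, opened at declaration

def declare(title: str, severity: str, commander: str) -> IncidentRecord:
    """Create the incident record and derive a response-channel name.
    The "#inc-<sev>-<title>" pattern is an assumption, not a standard."""
    record = IncidentRecord(title, severity, commander)
    sev = severity.lower().replace(" ", "")
    slug = title[:20].replace(" ", "-").lower()
    record.channel = f"#inc-{sev}-{slug}"
    return record
```

Creating the record at declaration time, rather than after stabilization, gives the scribe a timeline anchor and makes the "early declaration" principle concrete.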
Contain impact
Focus first on reducing harm. This may include failover, rollback, traffic control, feature disablement, manual workarounds, or isolation of affected components.
Coordinate technical response
Assign technical workstreams, validate assumptions, and keep the response organized. Avoid both failure modes: many people acting without structure, and a few people carrying the response alone.
Communicate status
Provide regular updates to responders and stakeholders. Communications should state what is known, what is being done, current impact, and when the next update will occur.
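The four elements every update should carry can be enforced with a simple template. This is a sketch, not a prescribed format; the field labels are assumptions:

```python
def status_update(known: str, doing: str, impact: str,
                  next_update_minutes: int) -> str:
    """Render a status update containing the four required elements:
    what is known, what is being done, current impact, and the
    time of the next update."""
    return (
        f"KNOWN: {known}\n"
        f"DOING: {doing}\n"
        f"IMPACT: {impact}\n"
        f"NEXT UPDATE: in {next_update_minutes} minutes"
    )
```

Forcing an explicit "next update" line is the point of the template: it keeps stakeholders on a rhythm instead of leaving them to ping responders for news.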
Restore and validate
Recover the affected service and verify that restoration is real, stable, and sufficient. Do not close the incident simply because the immediate symptoms have quieted.
Close and learn
Record the timeline, disposition, root cause direction, follow-up actions, and ownership for any post-incident remediation or problem management work.
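The closure checklist above can be modeled as a record plus a gate: an incident is ready to close only when every follow-up action has an owner. The field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ClosureRecord:
    timeline: list              # ordered (timestamp, event) entries from the scribe
    disposition: str            # e.g. "restored", "mitigated"
    root_cause_direction: str   # best current hypothesis, not a final RCA
    follow_ups: list = field(default_factory=list)  # (action, owner) pairs

def ready_to_close(rec: ClosureRecord) -> bool:
    """Close only when the timeline exists and every follow-up
    has a named owner; an empty owner string blocks closure."""
    return bool(rec.timeline) and all(owner for _, owner in rec.follow_ups)
```

The ownership gate is what feeds problem management: an unowned follow-up at closure is how the same incident recurs.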
Communication Expectations
Incident communication should be factual, disciplined, and timed. Responders need clarity, not noise. Stakeholders need visibility, not speculation.
| Audience | What They Need | Communication Standard |
|---|---|---|
| Responders | Live status, decisions, assignments, blockers, and technical findings. | Real-time coordination in the active incident channel. |
| Operations Leadership | Impact, severity, mitigation posture, and major risks. | Concise updates at agreed intervals or major changes. |
| Business Stakeholders | Service impact, user effect, workaround status, and restoration estimate if known. | Plain language updates without unnecessary technical detail. |
| External Audiences | Confirmed facts, impact boundaries, and next steps if customer-facing communication is required. | Coordinated, reviewed, and approved messaging only. |
Playbook Principles
- Declare early when business impact is real.
- Assign clear roles instead of relying on informal coordination.
- Contain harm before chasing perfect diagnosis.
- Use severity to shape response behavior, not to dramatize the event.
- Communicate on a rhythm so stakeholders are not left guessing.
- Capture timelines and decisions while the incident is active.
- Follow every material incident with remediation and learning.