
An Introduction to The Incident Management Process

Updated: Feb 2

What is an Incident Management Process?


In the context of an IT help desk, incident management refers to identifying, analysing, resolving, and preventing IT issues ('incidents') that impact the availability and reliability of IT services.




An IT incident can be any event that causes disruption or degradation of the normal functioning of an IT system, application, or infrastructure.


The incident management process involves the IT help desk team logging and prioritising incidents based on their impact and urgency, diagnosing and resolving incidents according to predefined procedures, and ensuring they are fully documented and reported.


The aim is to minimise the impact of incidents on business operations and restore regular service as quickly as possible.


If you don't use the term "incident management," just think of "ticket management".


Why do we have Incident Management?


If you don't have a straightforward process, how can you implement a tool to automate, communicate, or evaluate it?


Here are some other textbook-style reasons:

  • To minimise the impact of incidents on the business and its customers.

  • To restore regular service operation as quickly as possible.

  • To prevent incidents from recurring.

  • To continuously improve the incident management process.

  • To communicate effectively with stakeholders during and after an incident.

  • To comply with relevant industry standards and regulations.


The Incident Management Process


Here's a video overview of the Incident Management Process.



Incident Management Process Steps



Incident Management Process Diagram

1) Record Incident

When an incident is identified, a comprehensive record is generated within the Incident Management system. This record acts as a dynamic logbook that will be updated throughout the life of the incident. It documents the initial issue, every action taken, who undertook it, and when it happened. This ensures traceability and accountability for each phase of resolving the incident.
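As a rough illustration of the "dynamic logbook" idea, the sketch below models an incident record with an append-only action log. The class names, field names, and sample values (`IncidentRecord`, `LogEntry`, `INC-0001`, and so on) are hypothetical and not taken from any particular help desk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogEntry:
    """One action taken on an incident: what was done, by whom, and when."""
    action: str
    actor: str
    timestamp: datetime


@dataclass
class IncidentRecord:
    """Hypothetical incident record that keeps an append-only action log."""
    incident_id: str
    description: str
    reported_by: str
    log: list[LogEntry] = field(default_factory=list)

    def add_entry(self, action: str, actor: str) -> LogEntry:
        """Append a timestamped entry; earlier entries are never edited."""
        entry = LogEntry(action, actor, datetime.now(timezone.utc))
        self.log.append(entry)
        return entry


# Sample values are made up for illustration.
record = IncidentRecord("INC-0001", "Email client will not open", "j.smith")
record.add_entry("Incident logged", "help.desk.1")
record.add_entry("Restarted mail service", "help.desk.1")

for entry in record.log:
    print(entry.timestamp.isoformat(), entry.actor, entry.action)
```

Because every entry carries an actor and a timestamp, the log answers the "who did what, and when" questions that traceability depends on.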


2) Classification & Assessment

After the initial recording, experts review the incident details to classify it. The classification could range from hardware issues to software bugs or user errors. A priority level is also assigned based on factors like impact and urgency. This step is crucial for allocating resources effectively and can also serve a secondary role in trend analysis. Detecting patterns in incidents can lead to proactive measures in the future.
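One common way to turn impact and urgency into a priority level is a simple lookup matrix. The sketch below assumes three levels for each factor and five priority levels (P1 to P5); those levels and the mapping are illustrative assumptions, so substitute your own organisation's definitions.

```python
# Illustrative 3x3 matrix: (impact, urgency) -> priority, where P1 is the
# highest priority. The levels and the mapping are assumptions.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P5",
}


def assign_priority(impact: str, urgency: str) -> str:
    """Look up the priority for a given impact and urgency (case-insensitive)."""
    key = (impact.lower(), urgency.lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(f"Unknown impact/urgency combination: {key}")
    return PRIORITY_MATRIX[key]


print(assign_priority("High", "Medium"))  # P2
```

Keeping the mapping in one table means everyone on the help desk assigns the same priority to the same combination, which also makes later trend analysis more reliable.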


3) Investigation & Recovery

The Help Desk is then tasked with a brief investigation into the issue. During this time, staff may refer to an existing knowledge base for potential solutions or fixes. Depending on the complexity, they may resolve the issue themselves or escalate it to a specialised support team. Time is of the essence here, as a speedy recovery minimises downtime and impact.


4) Contact the Customer with a Resolution

Once the problem is resolved or a workaround is found, the user or customer is contacted to confirm the solution's effectiveness. Their acceptance is crucial; if they are satisfied, the incident record is updated to indicate a successful resolution.


5) Update Knowledge Base

This step is crucial for organisational learning. If the incident led to a new solution or workaround, this information is documented in the knowledge base. By doing this, the organisation equips itself better for future incidents, enabling quicker resolutions and reducing time spent on investigations.


6) Close Incident

Finally, the incident is formally closed once the user accepts the resolution. At this point, a final classification of the cause is added to the record, such as whether it was due to a user error, a recent change in systems, a software fault, etc. This closure process ensures that all actions are documented and provides valuable data for reviewing the effectiveness of the incident management process.
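The six steps can be read as a small state machine, which is roughly how many ticketing tools enforce them. The sketch below is one possible encoding: the status names, the rule that a rejected fix returns to investigation, and the "other" closure cause are all assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    RECORDED = "recorded"
    CLASSIFIED = "classified"
    INVESTIGATING = "investigating"
    AWAITING_CONFIRMATION = "awaiting customer confirmation"
    CLOSED = "closed"


# Allowed moves between statuses. Sending an incident back to investigation
# when the customer rejects the fix is an assumption, not stated in the text.
ALLOWED_TRANSITIONS = {
    Status.RECORDED: {Status.CLASSIFIED},
    Status.CLASSIFIED: {Status.INVESTIGATING},
    Status.INVESTIGATING: {Status.AWAITING_CONFIRMATION},
    Status.AWAITING_CONFIRMATION: {Status.INVESTIGATING, Status.CLOSED},
    Status.CLOSED: set(),
}

# Final cause classifications; "other" is an assumed catch-all.
CLOSURE_CAUSES = {"user error", "recent change", "software fault", "other"}


def advance(current: Status, target: Status) -> Status:
    """Move to the target status, rejecting moves that skip a step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


def close_incident(current: Status, cause: str) -> tuple[Status, str]:
    """Close an incident, requiring a final cause classification."""
    if cause not in CLOSURE_CAUSES:
        raise ValueError(f"Unknown closure cause: {cause}")
    return advance(current, Status.CLOSED), cause


status = Status.RECORDED
for step in (Status.CLASSIFIED, Status.INVESTIGATING, Status.AWAITING_CONFIRMATION):
    status = advance(status, step)
status, cause = close_incident(status, "software fault")
print(status.value, "-", cause)
```

Requiring a cause at closure is what makes the closed records usable later when reviewing how well the process is working.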


Incident Management Roles & Responsibilities


Help Desk Staff

  • Incident Identification and Logging: Responsible for recognising and documenting incidents as users report them.

  • Incident Categorisation: Classifying incidents based on their impact and urgency.

  • Data Capture for Analysis: Gathering necessary information that will aid in diagnosing the issue.

  • Customer Updates: Providing timely updates to customers upon request.

  • Incident Escalation: Escalating incidents to the appropriate technical teams or the Major Incident Manager.


Help Desk Manager / Team Leader

  • Process Management: Overseeing the entire incident process from start to finish.

  • Response Coordination: Coordinating the collective response to incidents among various teams.

  • Resource Allocation and Task Prioritisation: Assigning human and technical resources while setting task priorities.

  • Progress Monitoring: Keeping track of incident resolution progress and updating stakeholders accordingly.

  • Procedure Adherence: Ensuring incidents are logged, categorised, and resolved per established protocols.

  • Post-Incident Reviews: Conducting reviews after incident resolution to identify and implement improvements.

  • Metrics and Trend Analysis: Reporting on key performance indicators and analysing incident trends for future preventive measures.


Technical Support Staff

  • Collaboration: Working with other technical units or third-party suppliers to facilitate incident resolution.

  • Implementation of Fixes or Workarounds: Taking necessary actions to restore affected services, whether fixes or workarounds.

  • System Updates: Keeping the incident management system updated with the incident resolution status and progress.

  • Managerial Communication: Offering the Help Desk Manager updates regarding the incident's status, impact, and estimated resolution time.

  • Post-Incident Review Participation: Engaging in reviews after the incident has been resolved to identify areas for improvement and execute corrective actions.


Incident Management RACI Matrix

| Task / Activity | Help Desk Staff | Help Desk Team Leader / Manager | Technical Support Staff |
|---|---|---|---|
| Incident identification & logging | R | A | I |
| Incident categorisation | R | A | I |
| Data capture for analysis | R | A | I |
| Customer updates | R | A | I |
| Incident escalation | R | A | C |
| Process management | I | A | I |
| Response coordination | I | A | R |
| Resource allocation and task prioritisation | I | A | R |
| Progress monitoring | I | A | R |
| Procedure adherence | R | A | C |
| Post-incident reviews | C | A | R |
| Metrics & trend analysis | I | A | C |
| Implementation of fixes or workarounds | I | C | R |
| System updates | I | C | R |
| Managerial communication | I | A | R |
| Post-incident review participation | C | A | R |

Key:

  • R (Responsible): The person who performs an activity or does the work.

  • A (Accountable): The person who is ultimately accountable and has the final authority on the task.

  • C (Consulted): The person who must be consulted before a decision or action is taken.

  • I (Informed): The person who must be informed after a decision or action is taken.
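If you keep the matrix in a script or spreadsheet export, a small lookup makes it easy to answer questions such as "who is responsible for escalation?". The sketch below transcribes only a few rows from the matrix above; the names `ROLES`, `RACI`, and `roles_with` are hypothetical.

```python
# Column order used in the matrix.
ROLES = (
    "Help Desk Staff",
    "Help Desk Team Leader / Manager",
    "Technical Support Staff",
)

# A few rows transcribed from the matrix, in the same column order as ROLES.
RACI = {
    "Incident identification & logging": ("R", "A", "I"),
    "Incident escalation": ("R", "A", "C"),
    "Post-incident reviews": ("C", "A", "R"),
    "Managerial communication": ("I", "A", "R"),
}


def roles_with(activity: str, code: str) -> list[str]:
    """Return the roles that hold the given RACI code for an activity."""
    return [role for role, held in zip(ROLES, RACI[activity]) if held == code]


print(roles_with("Incident escalation", "R"))
print(roles_with("Post-incident reviews", "C"))
```

The same structure could be extended to the full matrix, or used to list everything a single role is accountable for.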


The Incident Management Maturity Model

The following maturity model helps you consider where your help desk sits on the maturity scale and provides some guidance and ideas for improvement areas.


Level 1: Ad-hoc

  • No formal incident management process is in place.

  • Reactive response to incidents.

  • Reliance on individual efforts and experience.

Level 2: Basic

  • Basic documentation of incident management procedures.

  • Inconsistent use of tools and processes.

  • Limited incident prioritisation and categorisation.

  • Escalation paths are not clearly defined.

Level 3: Structured

  • Well-defined incident management procedures.

  • Clear roles and responsibilities.

  • Standardised prioritisation, categorisation, and escalation.

  • Improved collaboration and communication.

Level 4: Managed

  • Proactive incident management approach.

  • Continuous improvement processes in place.

  • Regular reviews and audits of incident management.

  • Established performance metrics and KPIs.

  • Focus on compliance and consistency.

Level 5: Optimised

  • Fully integrated and optimised incident management.

  • Advanced analytics and automation.

  • Incident anticipation and prevention.

  • Continuous improvement is a core value.

  • Alignment with IT and business goals.


Taking it up a level

If you want to drive the maturity of your incident process, there are two main steps you can take:


1) Implementing an Incident Management Policy.


This step is optional.


It may have value depending on the type of organisation you are in (for example, pharmaceutical, financial or otherwise regulated). If you feel it conveys important information to various parties and has value, go ahead. If you think it is bureaucratic and has little value, skip it.


The real benefit is that it consolidates everything under a single roof: all processes, including Major Incidents, roles & responsibilities, and any specific expectations or guidance.


Have a look and see if you think it adds value. If not, maybe it's something to consider as the maturity of the teams improves.


2) Developing a Major Incident Process.

When there is a major outage, you need a major incident process. Check out the following guidance on creating a major incident process.

The Major Incident Process




About the author

Hi, I'm Alan, and I have been working within the IT sector for over 30 years.

For the last 15 years, I've focused on IT Governance, Information Security, Projects and Service Management across various styles of organisations and markets.

I hold a degree in Information Systems, an ITIL Expert certificate, and PRINCE2 Practitioner and CISMP (Information Security Management) certifications.
