June 7, 2021
Learn to improve your organization’s incident management with this three-part framework: Prepare, Respond, Review.
In this post, based on the white paper A Framework for Incident Response, Assessment, and Learning, by Shaaron A. Alvares, Josh Atwell, Jason Cox, Erica Morrison, Scott Prugh, and Randy Shoup, we present a fresh incident management framework to help you improve your overall organizational response to incidents.
In the white paper, this framework is broken down into a taxonomy of dysfunctions and patterns to help you greatly improve your incident response and posture.
Incidents and outages are an existential threat to businesses that build, operate, and consume technology services. Businesses and customers rely heavily on these critical systems. When they fail, credibility with customers can be irreparably harmed, putting both business reputation and revenue at stake.
Your teams are already responding to incidents, but how well are they doing it? How are they adjusting as the technology landscape changes? Could they do better?
With this framework for incident management, we’ll help you point the north star toward the ideal state and change the narrative about incidents from one of blame to one of learning over the long term.
We’ll also provide real-world, right-sized patterns and examples that you can use for incremental improvement, changing behavior with a view to long-term investment. Drawing on examples from some of the top practitioners and companies, these pragmatic, tangible practices address a complicated topic that is hard to cover well.
The traditional ITIL-based incident-management framework gave companies a structured way of categorizing, handling, and resolving incidents. This framework, along with adjacent processes such as problem management, has become the reference model for organizations to deal with the realities of handling incidents.
However, software systems in today’s enterprises are composed of hundreds of different systems and technologies that interact in surprising ways. As complexity has increased, the ITIL framework has not evolved to deal with the messy reality.
As such, the traditional way of thinking about incidents and dealing with them has become operational debt and can prevent organizations from evolving. There is also a dearth of practical, accessible, hands-on experience about how leading companies deal with the realities of incident management and response in this complex world.
Dysfunctions with traditional incident management include:
Incidents cannot be prevented. But we can greatly reduce the frequency, duration, and impact of incidents on both our customers and our employees who operate these systems.
The benefits of improving incident management and response are substantial: reduced impact on customers, improved customer confidence in the business, reduced stress on teams and employees, and increased revenue.
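To make frequency, duration, and impact concrete, here is a minimal Python sketch of how a team might summarize them from its own incident records. It is an illustration under stated assumptions, not something taken from the white paper: the `Incident` record, its field names, the `summarize` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    started: datetime
    resolved: datetime
    customers_affected: int

    @property
    def duration(self) -> timedelta:
        return self.resolved - self.started


def summarize(incidents: list[Incident], period_days: int) -> dict:
    """Summarize frequency, mean duration, and impact over a period."""
    count = len(incidents)
    total = sum((i.duration for i in incidents), timedelta())
    return {
        # Frequency: incidents per week across the reporting period.
        "incidents_per_week": count / (period_days / 7),
        # Duration: mean minutes from start to resolution (0.0 if no incidents).
        "mean_minutes_to_resolve": (total / count).total_seconds() / 60 if count else 0.0,
        # Impact: total customers affected, one of many possible impact measures.
        "customers_affected": sum(i.customers_affected for i in incidents),
    }


# Illustrative data only, not taken from the white paper.
incidents = [
    Incident(datetime(2021, 5, 3, 9, 0), datetime(2021, 5, 3, 9, 45), 120),
    Incident(datetime(2021, 5, 17, 14, 0), datetime(2021, 5, 17, 15, 15), 40),
]
summary = summarize(incidents, period_days=28)
print(summary)
```

Tracking these three numbers over successive periods gives a rough baseline for judging whether changes to incident practice are helping. Which impact measure fits best (customers affected, revenue, employee on-call load) will depend on your environment.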
Overarching principles for improvement include:
The incident problem space is very large, and our goal is to break it down, remove the mythology, and create a framework that can be evolved over time with more depth and breadth as our industry learns more.
We propose this incident management framework:
Prepare, Respond, Review.
The figure below outlines the Prepare, Respond, Review incident response pattern cycle, as well as the common patterns within the pre-incident (prepare), incident response (respond), and post-incident (review) phases.
In the full white paper, the authors dive into each of the patterns below in more detail, providing solutions in each area.
Before diving into the patterns, we find it is essential for organizations to measure how well their teams are currently doing at incident management.
Below is an incident response assessment: a collection of probing questions that you and your team can answer to assess your current incident-response preparedness. Take these questions to your team to see how well you are doing and where there is room for improvement.
It should be very clear that the requirements for effective incident response are broad and nuanced. In the full white paper, we highlight key patterns that can be reviewed and used to assess the effectiveness of your incident-response plan.
We acknowledge that every environment has its own priorities and constraints, but as with any good architecture, there is typically a high degree of consistency across organizations. Key outcomes around quickly identifying and resolving incidents are universal. These patterns were developed to reflect those requirements and to present some emerging patterns that have yielded strong results for high-performing teams.
The desired state for incident response should encompass a few key characteristics.
Please review the patterns in the full white paper for the most detailed explanation of this incident response framework.