June 28, 2021
This post has been adapted from Nora Jones’ 2021 DevOps Enterprise Summit Virtual – Europe presentation. You can view the full presentation here.
We’ve all had incidents. They’re unexpected. They’re stressful. And sometimes in management, there are inevitable questions that creep up. What can we do to prevent this from ever happening again? What caused this? Why did this take so long to fix?
In the organizations I’ve worked in, and in the research my team and I have done in this space, I’ve heard the following responses to the question of why incident reviews are important:
“I’m honestly not sure.”
“Management wants us to.”
“It gives the engineer space to vent.”
“I think people would be mad if we didn’t.”
“We have obligations to customers.”
“We have tracking purposes.”
“We want to see if we’re getting better.”
“We want to have the answers to the board’s questions.”
I think we all know that some form of post-incident review is important, but we don’t all agree on why it’s important. We want to make efforts to improve, and we want to show that we’re improving, but we’re spinning our wheels in a lot of ways. We’re not actually making efforts to improve the post-incident reviews themselves; we’re making efforts to try to stop incidents.
But without making efforts to improve the incident reviews themselves, we’re not actually going to improve on incidents at any level.
The good news is incident analysis can be trained and aided, but it has to be trained and aided to be improved upon.
At the DevOps Enterprise Summit, John Allspaw has talked about how the metrics we are tracking today, like MTTR and MTTD and number of incidents, are actually shallow metrics. I get why we’re tracking those things: it’s an emotional release, and it’s something that can make us feel better. But he posed an open question and challenge to the audience.
He said, “Where are the people in this tracking? And where are you?”
We haven’t changed much as an industry in this regard. Gathering useful data about incidents does not come for free. You need time and space to determine it.
I’m going to talk to you about why giving this time and space to your engineers, and your organizations, to improve post-incident reviews can actually work in your favor. It can give you that ROI you’re looking for and level up your entire organization.
I’m going to tell you about this through multiple stories that I’ve experienced myself, and show you new paths on how you can do this in ways that are not disruptive to your business, as well as next steps for you to embark on.
Spoiler alert: sometimes a thorough analysis or incident review actually reveals things that we’re not ready to see, hear, or change. So as leaders, we have to be open to hearing some of these things.
There’s a famous equation in a book called Seeing What Others Don’t by Gary Klein. Gary Klein is a cognitive psychologist who studies experts and expertise in organizations. The equation he came up with describes performance improvement: it’s the combination of error reduction + insight generation. You can’t have one without the other.
Yet we focus as an industry way too much on the error reduction piece and not on the insight generation piece. But we’re not actually going to improve the performance of our organizations if we’re only focusing on the error reduction piece. And I get it, that is an easy thing to measure. As software engineers, we’re taught to look for technical errors and things like them. We’re not so much taught to generate insights. We’re not so much taught to disseminate insights. And we don’t get celebrated for it.
That’s something that we can do as leaders: we can actually celebrate the insights that folks in our organization generate and disseminate, and the training materials they produce.
Next are three different stories about the value incident analysis brought about in different organizations. These are based on true events I have witnessed or been a part of, but names and details have been changed.
Next: Story #1: Netflix…
Nora Jones has been on the front lines as a software engineer and as a manager, and she now runs her own organization, Jeli. In 2017, she keynoted at AWS re:Invent to an audience of around 50,000 people about the benefits of chaos engineering (purposefully injecting failure in production) and her experiences implementing it at Jet.com, which is now part of Walmart, and at Netflix. Most recently, she started her own company, Jeli, based on the value she saw a good post-incident review adding to the whole business, as well as the barrier to entry she saw in getting folks to work on that. She started an online community called Learning From Incidents in Software. This community is made up of over 300 people in the software industry sharing their experiences with incidents and incident reviews.
This post is based on her 2021 presentation at DevOps Enterprise Summit Virtual – Europe, which you can watch for free in the IT Revolution Video Library.