Inspire, develop, and guide a winning organization.
Create visible workflows to achieve well-architected software.
Understand and use meaningful data to measure success.
Integrate and automate quality, security, and compliance into daily work.
Understand the unique values and behaviors of a successful organization.
Explore our extensive library of experience reports.
An on-demand learning experience from the people who brought you The Phoenix Project, Team Topologies, Accelerate, and more.
Learn how making work visible, value stream management, and flow metrics can affect change in your organization.
Clarify team interactions for fast flow using simple sense-making approaches and tools.
Multiple award-winning CTO, researcher, and bestselling author Gene Kim hosts enterprise technology and business leaders.
In the first part of this two-part episode of The Idealcast, Gene Kim speaks with Dr. Ron Westrum, Emeritus Professor of Sociology at Eastern Michigan University.
In the first episode of Season 2 of The Idealcast, Gene Kim speaks with Admiral John Richardson, who served as Chief of Naval Operations for four years.
Weekly discussion around “Deming’s Journey to Profound Knowledge” with author John Willis.
VIRTUAL — Helping leaders succeed and organizations thrive (formerly DevOps Enterprise Summit).
Venue: Fontainebleau — Helping leaders succeed and organizations thrive (formerly DevOps Enterprise Summit).
DevOps best practices, case studies, organizational change, ways of working, and the latest thinking affecting business and technology leadership.
Is slowify a real word?
Could right fit help talent discover more meaning and satisfaction at work and help companies find lost productivity?
The values and philosophies that frame the processes, procedures, and practices of DevOps.
This post presents the four key metrics to measure software delivery performance.
December 9, 2021
This case study has been excerpted from the second edition of The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, John Willis and Nicole Forsgren, PhD.
After successfully improving releases between 2012 and 2015, CSG further evolved their organizational structure to improve their day-to-day operational stance. At the DevOps Enterprise Summit in 2016, Scott Prugh, Chief Architect and VP Software Engineering at the time, spoke about a dramatic organizational transformation that combined disparate development and operations teams into cross-functional build/run teams.
Prugh described the start of this journey:
We had made dramatic improvements in our release processes and release quality but continued to have escalations and conflicts with our operations team members. The development teams felt confident about their code quality and continued to push releases faster and more frequently. On the other hand, our ops teams complained about production outages and the rapid changes breaking the environment. To battle these opposing forces, our change and program management teams ramped up their processes to improve coordination and to attempt to control the chaos. Sadly, this did little to improve production quality, our operational teams’ experience, or the relationship between the development and operations teams.
We had made dramatic improvements in our release processes and release quality but continued to have escalations and conflicts with our operations team members. The development teams felt confident about their code quality and continued to push releases faster and more frequently.
On the other hand, our ops teams complained about production outages and the rapid changes breaking the environment. To battle these opposing forces, our change and program management teams ramped up their processes to improve coordination and to attempt to control the chaos. Sadly, this did little to improve production quality, our operational teams’ experience, or the relationship between the development and operations teams.
Figure: How Structure Influences Behavior and Quality (Image courtesy of Scott Prugh)
To understand more of what was going on, the team dug into the incident data, which revealed some surprising and alarming trends:
Prugh further observed, “We had basically improved our development stance significantly but had done little to improve the production operations environment. We got the exact result we had optimized for: great code quality and poor operations quality.”
In order to find a solution, Prugh asked the following questions:
Figure: From Siloed Approach to Cross-Functional Teams (Image courtesy of Scott Prugh)
At the same time Prugh was making these discoveries, the customer escalations of production issues had repeatedly been raised to the executive leadership. CSG’s customers were irate, and the executives asked what could be done to improve CSG’s operational stance.
After several passes at the problem, Prugh suggested creating “Service Delivery Teams” that build and run the software. Basically, he suggested bringing together Dev and Ops onto one team.
At first, the proposal was viewed as polarizing. But after representing previous successes with shared operations teams, Prugh further argued that bringing Dev and Ops together would create a win-win for both teams by:
Figure: Conventional vs. Cross-Functional Structure (Image courtesy of Scott Prugh)
The next steps involved re-creating a new team structure of combined development and operations teams and leaders. Managers and leaders of the new teams were selected from the current pool, and team members were re-recruited onto the new cross-functional teams. After the changes, development managers and leaders got real experience in running the software they had created.
It was a shocking experience. The leaders realized that creating build/run teams was only the first step in a very long journey. Erica Morrison, VP Software Engineering, recalls:
As I got more involved with the Network Load Balancer team, I quickly started to feel like I was in The Phoenix Project. While I had seen many parallels to the book in previous work experiences, it was nothing like this. There was invisible work/work in multiple systems: one system for stories, another for incidents, another for CRQs, another for new requests. And TONS of email. And some stuff wasn’t in any system. My brain was exploding, trying to track it all. The cognitive load from managing all the work was huge. It was also impossible to follow up with teams and stakeholders. Basically whoever screamed the loudest went to the top of the queue. Almost every item was a priority #1 due to the lack of a coordinated system to track and prioritize work. We also realized that a ton of technical debt had accumulated, which prevented many critical vendor upgrades, leaving us on outdated hardware, software, and OS. There was also a lack of standards. When we did have them, they were not universally applied or rolled out to production. Critical people bottlenecks were also prolific, creating an unsustainable work environment. Finally, all changes went through a traditional CAB [change advisory board] process, which created a massive bottleneck to get things approved. Additionally, there was little automation supporting the change process, making every change manual, not traceable, and very high risk.
As I got more involved with the Network Load Balancer team, I quickly started to feel like I was in The Phoenix Project. While I had seen many parallels to the book in previous work experiences, it was nothing like this. There was invisible work/work in multiple systems: one system for stories, another for incidents, another for CRQs, another for new requests. And TONS of email. And some stuff wasn’t in any system. My brain was exploding, trying to track it all.
The cognitive load from managing all the work was huge. It was also impossible to follow up with teams and stakeholders. Basically whoever screamed the loudest went to the top of the queue. Almost every item was a priority #1 due to the lack of a coordinated system to track and prioritize work.
We also realized that a ton of technical debt had accumulated, which prevented many critical vendor upgrades, leaving us on outdated hardware, software, and OS. There was also a lack of standards. When we did have them, they were not universally applied or rolled out to production.
Critical people bottlenecks were also prolific, creating an unsustainable work environment.
Finally, all changes went through a traditional CAB [change advisory board] process, which created a massive bottleneck to get things approved. Additionally, there was little automation supporting the change process, making every change manual, not traceable, and very high risk.
To address these issues, the CSG team took a multi-pronged approach. First, they created a bias for action and culture change by applying the learnings from John Shook’s Model of Change: “Change Behavior to Change Culture.” The leadership team understood that in order to change the culture they had to change behavior, which would then affect values and attitudes and result in eventual culture change.
Next, the team brought in developers to supplement the operational engineers and demonstrate what would be possible with great automation and engineering applied to key operational problems. Automation was added to traffic reporting and device reporting. Jenkins was used to orchestrate and automate basic jobs that were being done by hand. Telemetry and monitoring to CSG’s common platform (StatHub) were added. And finally, deployments were automated to remove errors and support rollback.
The team then invested in getting all the config in code and version control. This included CI practices as well as lower environments that could test and practice deployments to devices that would not impact production. The new processes and tools allowed easy peer review since all code went through a pipeline on the way to production.
Finally, the team invested in bringing all work into a single backlog. This included automation to pull tickets from many systems into one common system that the team could collaborate in and thus prioritize the work.
Erica Morrison recalls her final learnings:
We worked really hard in this journey to bring some of the best practices we know from Development to the Ops world. There have been many things in this journey that went well, but there were a lot of misses and a lot of surprises. Probably our biggest surprise is how hard Ops really is. It’s one thing to read about it but quite another to experience it first-hand. Also, change management is scary and not visible to developers. As a development team, we had no details about the change process and changes going in. We now deal with change on a daily basis. Change can be overwhelming and consume much of what a team does each day. We also reaffirmed that change is one of the key intersection points of opposing goals between Dev and Ops. Developers want their changes to make it to production as soon as possible. But when Ops is responsible for putting that change in and dealing with the fallout, it creates a natural reaction to want to go slower. We now understand that getting Dev and Ops working together to both design and implement the change creates a win-win that improves both speed and stability.
We worked really hard in this journey to bring some of the best practices we know from Development to the Ops world. There have been many things in this journey that went well, but there were a lot of misses and a lot of surprises. Probably our biggest surprise is how hard Ops really is. It’s one thing to read about it but quite another to experience it first-hand.
Also, change management is scary and not visible to developers. As a development team, we had no details about the change process and changes going in. We now deal with change on a daily basis. Change can be overwhelming and consume much of what a team does each day.
We also reaffirmed that change is one of the key intersection points of opposing goals between Dev and Ops. Developers want their changes to make it to production as soon as possible. But when Ops is responsible for putting that change in and dealing with the fallout, it creates a natural reaction to want to go slower. We now understand that getting Dev and Ops working together to both design and implement the change creates a win-win that improves both speed and stability.
Trusted by technology leaders worldwide. Since publishing The Phoenix Project in 2013, and launching DevOps Enterprise Summit in 2014, we’ve been assembling guidance from industry experts and top practitioners.
In their upcoming book, Unbundling the Enterprise: APIs, Optionality, and the Science of Happy…
Welcome to the final installment of IT Revolution’s series based on the book Investments…
As a business leader, you know that artificial intelligence (AI) is no longer just…
Welcome to the twelfth installment of IT Revolution’s series based on the book Investments…