The Construction Of “The Phoenix Project”: Using The Downward Spiral To Better Sell DevOps
In this blog post, I will describe why we modeled “The Phoenix Project” so closely on Dr. Eliyahu Goldratt’s seminal book “The Goal: A Process of Ongoing Improvement,” how we used Dr. Goldratt’s Logical Thinking Processes to create a succinct description of the IT downward spiral that almost every organization is affected by, and then I’ll describe the resulting structure of “The Phoenix Project.”
One of the best validations of the way we characterized the downward IT spiral in the book is the number of people leaving Amazon comments similar to these:
- “I find myself relating to the characters in The Phoenix Project and as others have commented – I’ve probably met most of them over the course of my career.”
- “If you have ever worked in any aspect of IT, DevOps, or Infosec you will definitely be able to relate to situations in this book.”
- “There’s not a character in The Phoenix Project that I don’t identify with myself or someone I know in real life… not to mention the problems faced and overcome by those characters.”
(I’m also delighted that, at the time of this writing, Amazon shows 102 5-star reviews! Thank you, all!)
I’ll show one PowerPoint slide that describes the downward spiral, and how to use it to help sell the value of DevOps to others. (Of course, this technique is equally applicable to all the other concepts in the book, including the top-down view of risk and infosec, kanbans, elevation of preventive work, two-week improvement cycles, etc.)
In a future post, I will discuss the steps in the Logical Thinking Processes, how we created the characters which model many of the IT archetypes and roles, and the various software tools I used during the writing process.
(The downward IT spiral is also the basis of the talk I put together for DevOpsDays London called “How To Better Sell DevOps” (slideshare link), which I did at the urging of Jonathan Thorpe (@jonathan_thorpe’s blog) and John Clapham (@JohnC_Bristol) over breakfast.)
Why A Novel? Storytelling Is The Most Effective Means Of Creating A Shared Understanding
For many challenges in life, whether it’s to get a project approved, bring someone around to your point of view, plan a family vacation, earn someone’s trust, or ask someone out on a date, our goal is to persuade and to get someone else’s mirror neurons to fire.
Studies have repeatedly shown that the most effective mechanism for persuasion is storytelling. I remember reading somewhere this is because the human brain is structured to hear stories. Joseph Campbell’s concept of “The Hero’s Journey” shows how epic tales and legends all share similar patterns, and Kurt Vonnegut has his unforgettable “Simple Shapes Of Stories” concept (my favorite being this 5 minute video where he shows the graph of famous stories).
You find the Hero’s Journey pattern everywhere: it’s the core of the solution-selling pattern, good copywriting, and even the best TED talks. They describe the problem, show its significance, and describe how the problem is solved and the value that will create.
Storytelling becomes very important when the problem we’re trying to point out is not fully recognized, or when the solution being proposed flies in the face of common wisdom. It’s been my thesis that the problem that motivates DevOps is not only the most important problem facing IT, but is also the most important business problem that must be solved. However, it’s a business problem that is not widely recognized or even believed, and even the believers (myself included) often have difficulty localizing or precisely verbalizing it.
When I first read “The Goal” in 1998, it completely blew my mind. Even though I’ve never worked in a manufacturing plant, let alone managed one, it was amazing to me to see the world through Alex Rogo’s eyes as he had to fix his cost and quality issues in 90 days or see his plant shut down. I knew that the lessons being taught in the book were important and relevant to my professional career.
It is widely cited that “The Goal” influenced an entire generation of professional plant managers around the world. This was documented in an appendix in the third revised edition of “The Goal.” However, I believe the best description of this phenomenon is on Disc 6 of Dr. Goldratt’s audio set “Beyond The Goal,” where he described the letters he received. The letters would almost always say something like, “You have obviously been hiding in our factory, because you’ve described my life [as a plant manager] exactly…”
I mentioned in a previous blog post how we had been preparing to write “The Phoenix Project” for nearly ten years. We wanted to write “The Goal” for the modern IT context, because we kept seeing the same problems over and over. We believed that everyone needed to see that when IT fails, the business fails, and that if all the various IT stakeholders (e.g., Dev, IT Ops, Infosec), as well as “The Business” and even Audit, could work together, the business could win and win big.
The Structure Of “The Goal”
It’s difficult to overstate how much we consciously mirrored the structure of “The Goal.” Having studied that book for over a decade, here’s how I deconstruct it:
- Part 1: vivid description of the problem (170 pages)
  - Verbalization of predominant beliefs (e.g., cost accounting, needs for “efficiency”, large batch sizes, etc.)
  - Depiction of how these beliefs and resulting practices pre-ordain failure (e.g., orders never on time, customers never receive what they need, plant is unprofitable, layoffs, plant is in danger of being shut down, Alex on verge of being fired, etc.)
- Part 2: iterative solving of problems in 90 days (the rest of the book)
  - The search for the constraint (e.g., heat treat ovens and NCX-10)
  - Breakthrough 1: Creating red/green tags for parts heading towards the constraints (subordinate)
  - Breakthrough 2: Ensuring continuous three shifts of operations of the constraint (exploit)
  - Breakthrough 3: Find outsourcers/vendors to expand constraint capacity (elevate)
  - Breakthrough 4: Reduce batch sizes by 50% to alleviate WIP at non-constraints (implementing drum-buffer-rope pattern)
  - Breakthrough 5: Finding additional sales demand when plant capacity exceeds market demand (constraint moves outside of plant)
(Thanks to Dr. James Holt of Washington State University for his two graduate courses on Constraints Management and all of his “kitchen table coaching,” as well as to Dr. Scott R. Schultz at Mercer University for his excellent synopsis of the book.)
The IT Downward Spiral As The First 170 Pages Of “The Phoenix Project”
Dr. Goldratt stated in “Beyond The Goal” that “not even one description or hint of what the solution to Alex’s problems were even mentioned until page 170.” We took that concept very literally, and baked that structure into “The Phoenix Project.”
The first 170 pages of “The Phoenix Project” are the narrative form of over fourteen years of research into the downward spiral that IT and the business undergo when Development, IT Operations and Infosec don’t work together to achieve the global goals of the organization, and when Infosec is not properly subordinated into the flow of work.
For years, the selling tool for DevOps was the one PowerPoint slide shown below, which describes the downward IT spiral that occurs when Dev and IT Operations don’t work together (it is slide 11 in this presentation).
Why was this slide so effective? Because you can walk someone through the descriptions of the Ops issues, and then the Dev issues, and then ask, “does this problem resonate with you at all?”
And the response is almost always, “Holy crap. That’s us.” Which inevitably leads to the question, “How can you help us escape this downward spiral?” (Someone once said to me, “Before I can trust someone, I first need to know they care [about what I care about].”)
Why? Because this phenomenon occurs in every IT organization. It is an inevitable consequence of the core, chronic conflict in IT: in order to help the business win, every IT organization must respond to urgent business needs (i.e., make changes more quickly), while providing reliable, stable and secure IT service (i.e., make changes more slowly, if ever).
(I include this table in text form at the end of the blog post.)
Describing The Downward Spirals As A Mechanism To Better Sell DevOps
Downward spirals are bad because they are positively reinforcing feedback loops. In other words, left unchecked, the problems get worse. There are several downward spirals in the slide:
- fragile artifacts become more fragile
- technical debt grows
- date-driven application projects focus only on features, sacrificing non-functional requirements, which results in more fragile artifacts in production
- application deployments take longer, become more turbulent, and continually get worse
- IT Ops is mired in firefighting, and therefore cannot do preventive work or new projects
- long feature delivery cycle times result in more political decision making, meaning more focus on features (vs. non-functional requirements)
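To make the “positively reinforcing feedback loop” idea concrete, here is a toy sketch of just one of the spirals above: fragility drives firefighting, firefighting crowds out preventive work, and the lack of preventive work lets fragility grow. Everything in it is an assumption made for illustration (the function name `simulate_spiral`, the parameters and their values); it is not a model from the slide or the book.

```python
def simulate_spiral(periods, fragility=0.2, shortcut_rate=0.25,
                    preventive_share=0.0, fix_rate=0.5):
    """Toy model of one downward spiral (all numbers are made up).

    fragility        -- fraction of IT Ops capacity lost to firefighting (0..1)
    shortcut_rate    -- how fast fragility compounds each period
    preventive_share -- share of the remaining capacity spent on preventive work
    fix_rate         -- how much fragility one unit of preventive work removes
    """
    history = []
    for _ in range(periods):
        # Whatever is not consumed by firefighting is available for other work.
        free_capacity = 1.0 - fragility
        preventive_work = free_capacity * preventive_share
        # Reinforcing loop: the more fragile things are, the faster they degrade.
        fragility = fragility + shortcut_rate * fragility - fix_rate * preventive_work
        # Keep the value a valid fraction.
        fragility = max(0.0, min(1.0, fragility))
        history.append(fragility)
    return history


# Left unchecked (no preventive work), fragility rises every period.
unchecked = simulate_spiral(5, preventive_share=0.0)
# With half of the free capacity spent on preventive work, it falls instead.
checked = simulate_spiral(5, preventive_share=0.5)
print(unchecked)
print(checked)
```

The only point of the sketch is the shape of the curve: with no preventive work each period is worse than the last, while protecting even part of the capacity for preventive work reverses the direction.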
But the most important point is the business implications: when Dev and IT Operations miss their commitments, the company misses the commitment it made to the outside world. This affects not only the CIO, but also their boss, and their boss’s boss.
And that’s why this one slide has been so effective: it describes the local Dev and IT Operations problems, but also shows how they almost preordain failure of the global goals, which the CEO and the board need to care about.
(After all, 95% of all capital projects have some IT reliance, and 50% of all capital spending is technology-related. As Chris Little (@BMC_DevOps) said so pithily, “Every business, regardless of what business they think they’re in, is an IT company.”)
The slide doesn’t show the problems of Infosec, who get aced out of the game by the Dev / IT Operations tribal warfare. But Infosec contributes to this core, chronic conflict, too (I’ll explain this concept later in the post):
- Infosec feeds endless amounts of production vulnerabilities (sometimes unneeded) to IT Operations
- Patches to fragile production systems cause outages
- IT Operations becomes less likely to deploy patches
- Development uses up all the time in the schedule, leaving no time for security testing
- Infosec project reviews jeopardize due date and cost targets
- Only Infosec work performed is audit and compliance related (lowest nutritional value)
- Infosec allows unnecessary and unneeded audit remediation work to be performed by Development and IT Operations
- Infosec continually becomes less and less integrated into daily work of Development and IT Operations
- Infosec “find to fix” time gets worse and worse
- Infosec increasingly unable to help the organization achieve its goals
When walking people through these points, the goal is to get the other person (regardless of their role) to say, “Holy cow. You understand my problems and these problems are important to me.”
And in the ideal, their next question will be, “How can you help me?”
Resulting Structure For “The Phoenix Project” (Warning: some spoilers enclosed)
As described earlier, we knew we had 170 pages to describe the problem. But, what is the scope of the problem being described?
I’ve found that one of the most useful constructs for describing business goals is the COSO Enterprise Risk Management Cube (or as we practitioners call it, “the COSO Cube”) by the COSO Commission. Here’s my favorite, which is version 2. (Yes, even cubes have versions.)
The top of the COSO Cube describes the four objectives that every organization has in order to achieve its global goals: strategy, accurate financial reporting, compliance with laws and regulations, and operations.
- Strategy: as embodied by the Phoenix Project, which the entire future of the company depends upon, requires that IT be a core competency.
- Accurate financial reporting: as embodied by the third-year repeat audit findings around the IT general controls, but also the loss of accounts payable and inventory management systems, which prevents closing the financial books at the end of the quarter. Even the payroll failure resulted in inaccurate payroll numbers, which led to financial reporting errors.
- Compliance with laws and regulations (e.g., SOX-404, PCI DSS): failing the SOX-404 audit results in an adverse footnote by the external auditor in the SEC 10-K statement, and there are grave implications for failing the PCI DSS assessment.
- Operations (e.g., IT application and project delivery, IT operations, information security): the Phoenix Project was $20 million over budget and three years late, and when it finally was deployed into production, everyone would have been better off if it hadn’t been. Furthermore, virtually all of the critical business systems that run daily operations require that IT services be running correctly, which they often weren’t.
(I’m laughing as I write this, because the situation that poor Bill must face in “The Phoenix Project” really is the perfect storm, as it shows all four of the COSO Cube internal control objectives being jeopardized. But, the Amazon reviews show that almost every company is at risk of being like Parts Unlimited. The difference is only in degree.)
So, the structure for Part 1 of “The Phoenix Project” became:
- Part 1: vivid description of the problem (170 pages)
  - Verbalization of observations of current reality (e.g., fragile artifacts, production failures, too many audit findings, technical debt, IT production issues resulting in jeopardizing all the COSO internal control objectives)
  - Depiction of how predominant practices pre-ordain failure (e.g., IT Operations measured mostly on uptime and availability, Development and Product Management measured primarily on features and time to market, which trump IT Operations concerns, Infosec measured on compliance and injecting work into the IT system)
Similarly, we constructed the remaining portions of “The Phoenix Project” to be a series of eight breakthroughs as Bill and team figured out how to identify, exploit, subordinate and elevate the constraint:
- Breakthrough 1: discovering too much WIP in IT Operations and gobsmacking amounts of reliance on Brent for both project and recovery work
- Breakthrough 2: elevating preventive work to prevent unplanned work (especially for Brent) and making work visible
- Breakthrough 3: throttling the flow of work into IT Operations by the project freeze (subordinate)
- Breakthrough 4: identifying and constraining the flows of work to Brent (it was genuinely surprising to find out after the fact how much this resembles the green/red tag pattern used in “The Goal”)
- Breakthrough 5: documenting and defining standardized work, and managing handoffs to reduce time spent in queue (through use of kanbans), and further reducing reliance on Brent
- Breakthrough 6: correctly identifying reliance on IT systems to hit Dick’s corporate objectives (via Gartner RVM model) and correctly scoping audit and infosec (via GAIT and GAIT-R)
- Breakthrough 7: building DevOps flow of work to better design in non-functional requirements in Dev, reduce batch sizes and enable single-piece flow through Development and IT Operations and better enable operational resilience (replicating the work described by Jez Humble, David Farley, Paul Hammond and John Allspaw)
- Breakthrough 8: when they discover the constraint moves outside the organization, they bring back those resources in-house (replicating one of my favorite Theory of Constraints plays: when there’s idle plant capacity, it’s often cheaper to fabricate parts in-house than to pay a supplier. Lovely.)
In a future blog post, I’ll describe how we used Dr. Goldratt’s Logical Thinking Processes to create the downward IT spiral slide, and how we used the Current Reality Tree to create the book outline.
Please let me know what you think!