January 19, 2026
Studies show 20-55% productivity gains from AI code generation. Organizations are mandating AI adoption. Cursor went from $1M to $500M in revenue in less than two years. The promise is clear: AI will revolutionize software development.
But here’s what’s actually happening: Developers are churning out code at unprecedented speeds, and their delivery pipelines are collapsing under the weight. Code reviews are backing up. Testing is overwhelmed. Production incidents are increasing. The very improvements that were supposed to accelerate delivery are creating bottlenecks downstream.
According to the recent paper “Revenge of QA” published in the Fall 2025 Enterprise Technology Leadership Journal, AI isn’t revealing new problems—it’s exposing decades of process debt we’ve been carrying all along.
On the surface, it seems curious that a technology capable of generating so much code so quickly would lead to only 20% productivity gains instead of 10x.
The authors believe this is because code creation isn’t the bottleneck and hasn’t been for decades.
The current constraint in the delivery cycle is usually somewhere downstream of the coding effort: code review, integration, system test, or some other verification or validation step. Not only is generating more code unlikely to lead to greater throughput in the system, but all that unshippable inventory could throw the entire system out of balance, increasing delays.
This insight draws from Eliyahu Goldratt’s Theory of Constraints, illustrated through two characters from management literature:
Herbie, from The Goal, is the slowest scout on a hiking trip who keeps falling behind. No matter how fast the other scouts hike, the group can only move as fast as Herbie.
Brent, from The Phoenix Project, is the brilliant senior technologist with his fingers in everything. Nothing can move forward without Brent. He’s everywhere and nowhere, a constraint disguised as competence.
Both illustrate a fundamental principle: The capacity of the system equals the capacity of the constraint.
If verification and validation are your constraints, generating more code faster does nothing to improve overall productivity. Worse, Lean management tells us that inventory is a liability, not an asset. Unshipped code incurs carrying costs—branch management, feature-flagged dormant code, continual rebasing and retesting, additional complexity, and additional risk.
In short: Generating more code doesn’t speed up overall delivery unless code generation was actually the bottleneck. It might actually increase the cost of delivery.
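To make that arithmetic concrete, here is a toy model (illustrative numbers only, not from the paper) of a pipeline in which coding capacity triples while review and test stay fixed: system throughput doesn't move, and inventory piles up in front of the constraint.

```python
# Toy model of a delivery pipeline. Capacities are changes per week and are
# purely illustrative; the point is that throughput tracks the constraint.

def weekly_flow(capacities: dict) -> None:
    throughput = min(capacities.values())
    constraint = min(capacities, key=capacities.get)
    inflow = None
    for stage, capacity in capacities.items():
        if inflow is not None and inflow > capacity:
            print(f"  queue ahead of {stage} grows by {inflow - capacity}/week")
        inflow = capacity if inflow is None else min(inflow, capacity)
    print(f"  constraint: {constraint}, system throughput: {throughput}/week\n")

print("Before AI (coding roughly balanced with downstream):")
weekly_flow({"code": 12, "review": 10, "test": 10, "deploy": 20})

print("After AI (coding capacity triples, downstream unchanged):")
weekly_flow({"code": 36, "review": 10, "test": 10, "deploy": 20})
```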
The more the authors talked with people using AI in actual production work, the clearer a pattern emerged: The sheer volume of generated code is overwhelming, and most organizations are ill-equipped to handle the increase.
The resulting pressure forces organizations to adapt, but the easiest adaptations address volume by taking shortcuts. As one developer said, “If you’re going through sixty pages of manual code review of a pull request, at some point, you start skimming.”
These shortcuts increase risk. Inevitably, that leads to more late-breaking surprises and production issues, which create more rework and interruptions, which exacerbate the problem further.
Seasoned developers complain that generative AI writes code that looks plausibly correct but is overly complex, poorly designed, or contains security flaws. Some have even witnessed AI disabling tests to force builds to pass.
When speed undermines quality, quality undermines speed. It’s a vicious cycle that many organizations are only now beginning to recognize.
The solution isn’t to focus solely on optimizing throughput downstream of coding. Here’s why:
First: No matter how much you optimize verification and validation steps, AI can still pump out code fast enough that the volume will overwhelm any attempt to achieve quality via inspection.
Second: Attempts to improve quality solely by improving testing ironically have a tendency to lead to worse quality overall. This isn’t new—Elisabeth Hendrickson documented this pattern back in 2001 in her paper “Better Testing – Worse Quality?”
The problem is systemic. Organizations that invested heavily in quality gates, approval processes, and manual testing over the years built impressive-looking quality assurance machinery. But that machinery was designed for a different era—when code generation was slow and deliberate.
AI exposes the fundamental flaw: Quality gates at the end of the process can’t compensate for quality problems baked in from the beginning.
This is the process debt that’s now coming due. Decades of treating quality as something you inspect in rather than build in. Decades of separating “developers who write code” from “QA people who test it.” Decades of optimizing for local efficiency (faster coding!) without considering system-level throughput.
There’s a reason the paper is titled “Revenge of QA.” Not because QA teams are seeking retribution, but because the AI era is vindicating principles that quality engineers have been advocating for decades.
Good QA people are customer-centric risk thinkers. They engage with the entire system in all its glory, all its horror, and all its messy reality. They see the whole system, analyze its risks, and ask deep questions. Of all the roles involved in delivering software, QA people tend to be the ones who see the big picture.
If your organization dismantled its independent QA team years ago (as many did during the DevOps transformation), the authors aren’t suggesting you recreate it. Rather, everyone involved in delivering software needs to become a quality engineer.
This means weaving the quality engineering mindset into every phase of the work rather than bolting it on at the end.
In the AI era, the software development lifecycle is collapsing into three phases that cycle rapidly: ideate, deliver, and operate.
If your organization struggles with defining work in terms of outputs rather than outcomes—churning out feature after feature, delivering solutions in search of problems—then accelerating delivery will exacerbate the situation.
The shift required: Think in prompts, not product requirement documents. When you work with generative AI, you give it a prompt. When you get what you want, you move on. When you don’t, you refine the prompt. You’re unlikely to ask for everything all at once because AI becomes overwhelmed and produces generic, useless content.
Take the same approach to ideation: Provide enough context to ground the conversation, establish overall intent, provide specific, narrowly-scoped prompts for what you want to see, and plan to iterate until you get the outcome you want.
Think in bets, not requirements. Annie Duke, author of Thinking in Bets, points out that every decision is a bet. The more bets you make, the more lessons you can learn. Rebecca Murphey from Netflix explains: “You can rapidly prototype something using vibe coding, take it to a customer, customer says that’s stupid, I don’t want it. Now you never spent an engineer’s time on building it.”
Use AI to align on intent and avoid doing useless work entirely—not just to do work faster.
Design for observability from the start. It’s not enough to design a new payment flow. You need to build the instrumentation that gives you visibility into whether it’s actually improving payment completion rates.
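As a sketch of what instrumenting "from the start" can look like, the snippet below counts payment attempts and completions with the prometheus_client library so the completion rate can be graphed and alerted on. The metric names, labels, and the process_payment() stub are hypothetical examples, not anything prescribed by the paper.

```python
# Sketch: instrument a payment flow so completion rate is observable from day one.
# Metric names, labels, and the process_payment() stub are hypothetical.
from prometheus_client import Counter

PAYMENT_ATTEMPTS = Counter(
    "payment_attempts_total", "Payment flows started", ["flow_version"])
PAYMENT_COMPLETIONS = Counter(
    "payment_completions_total", "Payment flows completed", ["flow_version"])

def process_payment(order) -> bool:
    """Placeholder for the real payment logic."""
    return True

def checkout(order, flow_version: str = "v2") -> bool:
    PAYMENT_ATTEMPTS.labels(flow_version=flow_version).inc()
    completed = process_payment(order)
    if completed:
        PAYMENT_COMPLETIONS.labels(flow_version=flow_version).inc()
    return completed

# Completion rate = rate(payment_completions_total) / rate(payment_attempts_total),
# sliced by flow_version to see whether the new flow actually improves it.
```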
This is where process debt becomes most visible: organizations with weak engineering practices discover that AI amplifies their dysfunction.
Assemble the Ensemble: In the early days of Agile, combining an experienced developer with an experienced QA engineer created synergistic results. The QA engineer’s instinct to explore, validate, and boundary test was orthogonal to the developer’s skills. In combination, they created robust code rapidly.
Now there’s an opportunity to include AI in cross-disciplinary ensembles. AI brings encyclopedic knowledge of syntax and patterns, but not the wisdom or judgment to apply them appropriately. In a cross-functional ensemble—product, design, security, QA, operations—people can evaluate AI’s contribution in real time.
Take Tiny Safe Steps: Vibe coding sounds fluid, but AI is a fickle genie. It can undo all the good it has already done in seconds. The disciplined practices developers cultivated over decades—test-driven development, keeping tests green, running tests locally with every change, working in tiny steps with frequent commits—are more relevant now than ever before.
Without fully automated tests that run locally in just seconds, you can’t know if a change has unintended consequences. When you’re generating one or two orders of magnitude more code in the same time, the inability to detect side effects becomes a critical risk.
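One inexpensive way to hold the "tiny steps, tests always green, frequent commits" line is to script it. This is a minimal sketch, assuming pytest and git are on your PATH; adapt the test command to whatever your fast local suite actually is.

```python
# Sketch of a "tiny safe step": commit only when the fast local suite is green.
# Assumes pytest and git are installed and on PATH.
import subprocess
import sys

def tests_green() -> bool:
    # Run the fast local suite; a non-zero exit code means a failing test.
    return subprocess.run(["pytest", "-q"]).returncode == 0

def safe_commit(message: str) -> None:
    if not tests_green():
        sys.exit("Tests are red. Fix or revert before committing.")
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

if __name__ == "__main__":
    safe_commit(sys.argv[1] if len(sys.argv) > 1 else "tiny step")
```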
Experiment and Throw Away: The cheaper it is to generate code, the more disposable it becomes. This isn’t a license for sloppy work, but an invitation to cultivate an experimental mindset. Some programmers direct AI to attempt multiple parallel experiments, then select the best outcome. Others try one experiment at a time but are quick to throw away generated code rather than preserve it.
The key shift: Optimize for learning with AI as a partner rather than optimizing for the creation of a permanent and perfect asset from the beginning.
AI Does More Than Generate Code: Given a local Git repository, you can ask AI about the history of a codebase, attributes like security and performance, and risks in the code. AI can be a thinking partner in deciding how to design features, fit them into existing architecture, or where to pay down technical debt first.
Try this prompt: “Given this code base and its git history, what areas of the code present the greatest risk?”
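To ground a prompt like that in evidence rather than guesswork, you can hand the model real history. The sketch below uses plain `git log --numstat` to rank files by churn; the output is context you might paste into the prompt, and the AI client itself is deliberately left out since that part depends on your tooling.

```python
# Sketch: summarize per-file churn from git history as context for a risk prompt.
# Uses only `git log --numstat`; assumes it runs inside a git repository.
import subprocess
from collections import Counter

def churn_by_file(max_commits: int = 500) -> Counter:
    log = subprocess.run(
        ["git", "log", f"-{max_commits}", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = Counter()
    for line in log.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added, deleted, path = parts
            churn[path] += int(added) + int(deleted)
    return churn

if __name__ == "__main__":
    # Paste this table into the prompt above so the model reasons about
    # hotspots that actually exist instead of inventing them.
    for path, lines_changed in churn_by_file().most_common(15):
        print(f"{lines_changed:6d}  {path}")
```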
Anecdotally, some developers find that generating code is actually the least valuable contribution AI makes to their process.
The goal of operations is to sustain the value of what’s been delivered. But increasingly, operating isn’t the end of the road—it’s the beginning of the next cycle.
Use Real-World Telemetry for Continuous Validation: By combining logs, metrics, and traces, you can validate that the system works within expected parameters or locate and debug performance bottlenecks when it doesn’t. This isn’t just for operations teams—it’s quality assurance data for the entire development process.
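One way to treat that telemetry as quality assurance data is to codify "expected parameters" as executable checks. The sketch below is hypothetical: query_metric() is a stand-in for whatever your metrics backend exposes (Prometheus, Datadog, or similar), and the metric names and thresholds are examples.

```python
# Sketch: express "works within expected parameters" as checks over telemetry.
# query_metric() is a hypothetical stand-in for your metrics backend; the
# stubbed values and thresholds below are illustrative only.

def query_metric(name: str, window: str = "15m") -> float:
    stub = {
        "checkout.error_rate": 0.004,      # fraction of failed checkouts
        "checkout.p95_latency_s": 0.52,    # seconds
        "payments.completion_rate": 0.97,  # fraction completed
    }
    return stub[name]  # replace with a real query against your backend

CHECKS = {
    "checkout.error_rate": lambda v: v < 0.01,
    "checkout.p95_latency_s": lambda v: v < 0.8,
    "payments.completion_rate": lambda v: v > 0.95,
}

def within_expected_parameters() -> bool:
    failures = []
    for metric, ok in CHECKS.items():
        value = query_metric(metric)
        if not ok(value):
            failures.append(f"{metric}={value}")
    if failures:
        print("Out of expected parameters:", ", ".join(failures))
    return not failures

print("healthy" if within_expected_parameters() else "investigate")
```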
Treat Observability as a QA Tool: Historically, observability was an afterthought in software development. The modern practice brings it into delivery itself as part of the quality assurance toolkit, providing earlier insight into holistic quality and ensuring the system is built to supply the insight required for successful operations.
Automate Canary Releases: Deploy to a subset of users first, rolling out piecemeal. Combined with high observability, this allows you to identify trends and behavior out of sync with the previous release, detecting problems early before they impact all users.
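A minimal sketch of piecemeal rollout, with illustrative numbers: route a stable slice of users to the canary by hashing the user ID, then ramp the percentage only while the canary's telemetry tracks the baseline.

```python
# Sketch: deterministic canary routing. A stable slice of users exercises the
# new release while everyone else stays on the baseline for comparison.
import hashlib

def in_canary(user_id: str, percent: int = 5) -> bool:
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

def release_for(user_id: str) -> str:
    return "canary" if in_canary(user_id, percent=5) else "baseline"

# Ramp by raising `percent` (5 -> 25 -> 100) only while canary error rates and
# latency stay in line with the baseline's telemetry.
print(release_for("user-1234"))
```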
Make Operations Data Available to AI Agents: What’s new for the age of AI is that platform engineering isn’t just for human operators—it’s also for AI agents. Software teams with AI agents will be able to use production information the same way humans do, but taking in more information and processing it faster.
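In the small, "make operations data available to agents" might look like the sketch below: package a recent telemetry window into a structured summary an agent can pull into its context. The field names and the collect_window() stub are hypothetical illustrations, not an interface from the paper.

```python
# Sketch: expose a machine-readable operations summary an AI agent can pull
# into its context. Field names and collect_window() are hypothetical stubs.
import json
from datetime import datetime, timezone

def collect_window(minutes: int = 60) -> dict:
    # Stand-in for queries against your logs/metrics/traces platform.
    return {
        "deploys": 3,
        "error_rate": 0.004,
        "p95_latency_ms": 410,
        "active_alerts": ["payment-retries-elevated"],
    }

def ops_context_for_agent(minutes: int = 60) -> str:
    summary = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "window_minutes": minutes,
        **collect_window(minutes),
    }
    # The same facts a human reads on a dashboard, handed to the agent as JSON.
    return json.dumps(summary, indent=2)

print(ops_context_for_agent())
```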
Companies that lack platforms capable of providing this information even to human operators will fall orders of magnitude further behind, vulnerable to any competitor who enables true AI-driven cycles.
Organizations that already operate at DORA elite level—deploying to production multiple times per day, with change lead times measured in hours, low change failure rates, and mean recovery times under an hour—are better poised to take advantage.
AI can generate code faster than any human can thoroughly test it, but elite organizations already have testing baked into their development process. AI introduces new risks, but these organizations can already detect problems and recover quickly.
By contrast, organizations that still struggle with tension between speed and quality will find they cannot get promised value from AI. Generating more code doesn’t accelerate outcomes when that code is stuck behind quality gates or when it’s shoved through prematurely, resulting in production outages and missed SLOs and SLAs.
This is the process debt coming due. Organizations that invested in manual quality gates, heavyweight approval processes, and inspection-based quality are discovering their processes can’t scale to AI-generated code volumes.
The tooling we have today is the worst it’s ever going to be. Although we’re not yet at a point where AI can deliver on the full ideate-deliver-operate cycle, the authors believe it will.
When it does, the software development lifecycle will collapse to an even simpler cycle: Ideate. Validate. Repeat.
Agentic AI will eventually handle details between ideating and validating the same way compilers handle translating expressive programming languages to machine code. When AI reaches that capability, it will be possible to ship production-quality software through the whole loop without a context shift.
This doesn’t mean the end of software developers, testers, DevOps engineers, or any other technical role. Those skills are still needed because understanding what AI is doing is necessary to put the right guardrails in place.
A quality engineering mindset that takes a holistic perspective on delivering value and managing risk becomes increasingly critical. Ideating isn’t just about coming up with new features—it’s about the entire experience, including all the quality attributes: usability, security, performance, reliability, resilience.
AI code generation is performing an unintended public service: It’s exposing decades of process debt that organizations have been carrying.
The debt includes treating quality as something to inspect in rather than build in, separating the people who write code from the people who test it, relying on manual quality gates and heavyweight approvals, and optimizing for local coding speed over system-level throughput.
Organizations can no longer defer payment on this debt. AI’s code generation speed makes the dysfunction impossible to ignore.
The good news: The practices that address this debt aren’t new. They’re the disciplined engineering practices that quality engineers, DevOps practitioners, and Agile advocates have been championing for years: incremental delivery, fast feedback, automated testing, continuous integration, observability, and organizational learning.
If some of this advice seems familiar, it’s because the core ideas center on adopting disciplined practices around incremental delivery, fast feedback, and organizational learning. It’s evergreen advice. Those practices are a solid foundation that will enable your organization to take full advantage of AI. They put the good vibes in vibe coding.
The revenge of QA isn’t retribution—it’s vindication. The principles quality engineers have been advocating for decades are now non-negotiable for success in the AI era.
Pay down the process debt, or watch your AI productivity gains evaporate in the bottleneck you’ve been ignoring for years.
This blog post is based on “Revenge of QA: The Age of AI is the Age of Verification & Validation” by Elisabeth Hendrickson, Jeffrey Fredrick, Joseph Enochs, John Willis, and Kamran Kazempour, published in the Enterprise Technology Leadership Journal Fall 2025.
Managing Editor at IT Revolution working on publishing books and guidance papers for the modern business leader. I also oversee the production of the IT Revolution blog, combining the best of responsible, human-centered content with the assistance of AI tools.