February 2, 2026
It’s 2:06 a.m. Jessie’s phone buzzes with that distinct pattern she’d set for production alerts. In her decade of on-call rotations, she’d learned that nothing good happened at 2 in the morning. The alert message confirms her fears: unusual network traffic, suspicious IP addresses. The new AI-generated payment processing code, deployed just hours earlier, is failing spectacularly.
This is the opening of “No Vibe Coding While I’m On Call,” a fictional but painfully realistic story by Cornelia Davis, Nathen Harvey, Gene Kim, Matt Ring, and Jessie Young published in the Fall 2025 Enterprise Technology Leadership Journal. Through a series of production incidents at a fictional company called Meowseratti (a luxury pet stroller company expanding into insurance), the authors illustrate what happens when organizations embrace AI code generation without understanding the risks.
The story isn’t just entertainment—it’s a warning wrapped in narrative. Every incident in the story mirrors real patterns the authors have witnessed in organizations rushing to adopt “vibe coding” practices.
Meowseratti announces its expansion into pet insurance, and executives want to move fast. Really fast. The pressure is on: “Every day we wait is another day of missed revenue.”
Steve, an enthusiastic engineer, proposes using AI to generate a basic customer portal overnight. “We use AI to generate a basic customer portal tonight. Just enough to collect credit cards and issue policies. We can add the fancy features later.”
Jessie, the skeptical on-call engineer, raises the obvious concern: “Tell me we’re at least going to test these AI-generated systems before we start taking people’s money.”
But the momentum is unstoppable. Steve’s response captures the dangerous optimism: “Come on, we’re just collecting emails and credit cards. AI’s gotten really good at this basic stuff. I say we deploy as soon as we have something working—we can always roll back if there’s a problem.”
That phrase, “we can always roll back,” becomes the recurring theme as incident after incident proves that rollback isn’t a strategy; it’s an admission of failure.
Jessie is having dinner with friends when her phone starts buzzing with that dreaded pattern: buzz-buzz-pause-buzz. The universal rhythm of systems falling apart.
SEV-1 alerts. Production API errors spiking. Payment flows failing. The logs point to paymentProcessor.ts—a critical file that handles bringing money into the company. The culprit? A commit titled “Fix linting issues across codebase” that touched 103 files.
The change looked innocent: a simple refactor from let to const, transforming five lines of if-else logic into a clean ternary. Green tests across the board. But buried in an error trace was the smoking gun: TypeError: Assignment to constant variable.
The AI had optimized locally without understanding the global context. The status variable it converted to a const was reassigned later, under specific conditions, in three different places throughout the payment flow. The AI was optimizing for code elegance, not for the system’s actual behavior.
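The article doesn’t reproduce the diff, but the failure pattern is easy to sketch. Everything in this TypeScript fragment is a hypothetical reconstruction (the processPayment signature, the manual-review branch, and the transpile-only build are all assumptions), meant only to show how a local cleanup can break a distant code path:

```typescript
interface Order { id: string; amountCents: number; }
interface GatewayResponse { requiresManualReview: boolean; }
declare function chargeCard(order: Order): Promise<GatewayResponse>;

// After the AI's "linting fix": let became const and the if/else became a
// ternary. In the lines the model was looking at, status is never reassigned,
// so the change looks perfectly safe in isolation.
async function processPayment(order: Order, isRetry: boolean): Promise<string> {
  const status = isRetry ? "retrying" : "pending";

  const gatewayResponse = await chargeCard(order);

  // Much further down, in a branch the model never examined, the flow still
  // mutates the variable. tsc reports "Cannot assign to 'status' because it is
  // a constant", but a transpile-only build (esbuild or Babel without type
  // checking) ships it anyway, and the engine throws at runtime:
  // TypeError: Assignment to constant variable.
  if (gatewayResponse.requiresManualReview) {
    status = "on_hold";
  }

  return status;
}
```

The point of the sketch is that nothing in the immediate vicinity of the edit hints at the problem; only the whole payment flow does.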
The post-incident review revealed the deeper problem. When they asked AI to analyze its own changes, it immediately identified them as risky—especially for payment processing code. It pointed out every place where the const conversion could conflict with existing code patterns.
“So the LLM made the changes, even though it knew it was risky?” someone asked.
“It didn’t know when it was making the changes because we didn’t ask it to look,” Jessie explained. “This is what happens when we treat AI like a magic ‘fix everything’ button instead of a sophisticated but limited tool.”
The lesson: Break work into smaller, verifiable chunks. No more “complete refactor” requests. Each task should have clear, measurable completion criteria. And most critically—use AI to analyze its own work before accepting changes.
Jessie’s at a concert, finally taking a break after the week she’d had, when her phone buzzes. Response times are climbing. The new feature deployed earlier that day—one where they’d been “experimenting with AI-assisted development”—is failing spectacularly.
The dashboard shows a cascade of 500 errors. Users aren’t getting expected results. But here’s the kicker: all the new tests are passing with flying colors.
The commit history shows a string of merges dominated by tests: hundreds of new assertions, all green. The coverage metrics look impressive. But test coverage isn’t the same as meaningful testing.
Rolling back the changes took priority. After stabilizing, Jessie realized the uncomfortable truth: They’d automated their way into a false sense of security. AI had generated tests that checked what the code did, not what the code should do.
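The generated tests aren’t shown in the story, but the anti-pattern usually looks something like this hypothetical Jest example (the quote calculator, the surcharge rule, and both tests are invented for illustration):

```typescript
// quoteCalculator.test.ts (hypothetical)
import { calculateMonthlyPremium } from "./quoteCalculator";

// An implementation-derived test: the expected value was taken from whatever
// the code currently returns. It locks in today's behavior, bug included
// (here, senior pets are quoted a premium of zero).
test("returns the expected premium for a senior pet", () => {
  expect(calculateMonthlyPremium({ species: "cat", ageYears: 12 })).toBe(0);
});

// A requirement-derived test starts from the spec ("senior pets pay a 40%
// surcharge on the base premium") and fails until the code actually does that.
test("senior pets pay the documented 40% surcharge", () => {
  const base = calculateMonthlyPremium({ species: "cat", ageYears: 3 });
  const senior = calculateMonthlyPremium({ species: "cat", ageYears: 12 });
  expect(senior).toBeCloseTo(base * 1.4);
});
```

The first test can never fail, no matter how wrong the premium is; the second one can, which is the whole point of having it.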
The lesson: Always start with a human-reviewed test plan. Get AI to enumerate ALL test cases first, explicitly requesting edge cases and error conditions. Most importantly—review test coverage against actual requirements, not metrics. High test coverage does not equal meaningful test coverage.
6:45 a.m. Jessie’s phone buzzes. P1 alert: High latency and errors in the EU region. European users are staring at empty feeds or endless loading spinners.
The documentation looks perfect—too perfect. Clean markdown formatting, clear dependency listings, and right there in the troubleshooting section: “To disable Dynamic Content Prioritization, set the feature flag ENABLE_DYNAMIC_CONTENT_PRIORITIZATION to false in the central configuration service.”
Simple enough. Except when Jessie searches the feature flag console, the flag doesn’t exist. She tries creating it manually. Nothing changes. Maybe it’s an environment variable? Still nothing.
Twenty precious minutes wasted before she looks at the actual code. The new prioritization logic is right there in the constructor, being called directly in getUserFeed(). No feature flag. No environment variable. No configuration at all.
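The story doesn’t include the code itself, but the shape of the mismatch is easy to picture. In this hypothetical sketch (the class, method, and prioritizer names are invented), the runbook promises a kill switch the implementation never consults:

```typescript
interface FeedItem { id: string; score: number; }

class DynamicContentPrioritizer {
  rank(items: FeedItem[]): FeedItem[] {
    return [...items].sort((a, b) => b.score - a.score);
  }
}

// The AI-written runbook said: "set ENABLE_DYNAMIC_CONTENT_PRIORITIZATION to
// false in the central configuration service." The shipped code, however,
// wires the prioritizer in unconditionally, so there is nothing to turn off.
class FeedService {
  private readonly prioritizer = new DynamicContentPrioritizer();

  async getUserFeed(userId: string): Promise<FeedItem[]> {
    const items = await this.fetchRecentItems(userId);
    // No feature flag, no environment variable, no config lookup.
    return this.prioritizer.rank(items);
  }

  private async fetchRecentItems(userId: string): Promise<FeedItem[]> {
    // Placeholder for the real data access.
    return [];
  }
}
```

Nothing here fails a test; the gap only shows up when someone needs the switch during an incident.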
AI had written documentation for a safety mechanism that didn’t exist.
It had pattern-matched their usual deployment practices and confidently documented them without checking the actual implementation. The documentation described what should have been built, not what was actually built.
The lesson: All feature documentation requires code verification. Implement a documentation verification pipeline. Cross-reference with actual code. Document the verification checklist itself.
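As an illustrative sketch of what such a pipeline step could look like (the paths, the flag naming convention, and the script itself are assumptions, not something from the original story), a small Node check can fail the build whenever a runbook mentions a flag that never appears in the source tree:

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Recursively collect files with a given extension.
function collectFiles(dir: string, ext: string, out: string[] = []): string[] {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    if (entry.isDirectory()) collectFiles(full, ext, out);
    else if (entry.name.endsWith(ext)) out.push(full);
  }
  return out;
}

// Hypothetical convention: feature flags are SCREAMING_SNAKE_CASE and start
// with ENABLE_, like the one in the runbook above.
const flagPattern = /\bENABLE_[A-Z0-9_]+\b/g;

const documentedFlags = new Set(
  collectFiles("docs", ".md").flatMap(
    (file) => readFileSync(file, "utf8").match(flagPattern) ?? []
  )
);
const flagsInCode = new Set(
  collectFiles("src", ".ts").flatMap(
    (file) => readFileSync(file, "utf8").match(flagPattern) ?? []
  )
);

// Fail the build if the docs promise a switch the code never references.
const phantomFlags = [...documentedFlags].filter((flag) => !flagsInCode.has(flag));
if (phantomFlags.length > 0) {
  console.error(`Documented but never implemented: ${phantomFlags.join(", ")}`);
  process.exit(1);
}
```

Run as part of CI, a check like this turns “the docs describe a mechanism that doesn’t exist” from a 6:45 a.m. discovery into a failed pull request.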
Thread pool exhaustion during a planned database maintenance window. Order processing failures. Twenty-eight minutes of degraded service affecting 1,200 transactions.
But according to their architecture, a maintenance window on the analytics database should not have affected order processing at all. The system had been specifically designed to handle database maintenance through message queues and asynchronous processing.
The investigation revealed the problem: AI had “modernized” their code. What had been a simple fire-and-forget asynchronous call was now a synchronous database write. Each change looked good in isolation, but together they were eroding the system design.
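The article summarizes the change rather than showing it; a hypothetical before-and-after sketch (the queue, database client, and function names are invented) captures how the resilience quietly disappears:

```typescript
interface Order { id: string; }

declare const analyticsQueue: {
  publish(topic: string, payload: unknown): Promise<void>;
};
declare const analyticsDb: {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
};

// Before: fire-and-forget. The order path enqueues an event and moves on; if
// the analytics database is down for maintenance, the message simply waits in
// the queue until a consumer can process it.
function recordOrderPlacedBefore(order: Order): void {
  void analyticsQueue.publish("order.placed", { orderId: order.id });
}

// After the "modernization": a direct, awaited write to the analytics database
// on the critical path. The code looks cleaner, but every order now holds a
// connection until analytics responds, and during the maintenance window those
// calls pile up until the thread pool is exhausted.
async function recordOrderPlacedAfter(order: Order): Promise<void> {
  await analyticsDb.insert("order_events", {
    orderId: order.id,
    placedAt: new Date().toISOString(),
  });
}
```

Reviewed one pull request at a time, the second version reads like an improvement; only against the documented architecture does it register as a regression.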
When they asked AI to analyze its own code changes, it immediately identified the exact resilience patterns it had broken, complete with warnings about synchronous database calls in critical paths.
“So AI knows better,” the executive observes. “It knows what it shouldn’t do, but does it anyway unless explicitly told not to?”
Exactly. AI optimizes for what we ask in the moment. They asked it to modernize code, so it used newer APIs and delivered clean, synchronous code. They didn’t ask it to preserve resilience patterns, so it didn’t.
The lesson: AI can make architectural decisions one PR at a time. Each change looks good in isolation, but together they fundamentally alter system behavior. Establish mandatory architectural review for AI-generated changes. Document system resilience patterns explicitly and reference them in prompts.
After months of incidents—each one punctuated by Jessie’s weary refrain “No vibe coding like that while I’m on call!”—the team finally builds something better.
The transformation isn’t about abandoning AI. It’s about building guardrails: pre-flight checklists for AI-assisted development, an AI development framework, and the tools to support them.
By the story’s end, Jessie is sipping chamomile tea at 9 p.m. during her on-call rotation. Her pager is silent. The dashboard shows smoothly humming services. Green lights blink serenely across the board.
The transformation took months of painful lessons, but the result is clear: AI hasn’t replaced developer judgment; it has augmented it.
The fictional story illustrates real patterns organizations are experiencing, whether they’re racing to adopt AI code generation or trying to do it safely.
The bottom line: In DevOps, “You build it, you run it.” But what happens when GenAI is doing most of the building? You need to run it even more carefully, with even better observability, even faster feedback loops, and even more disciplined engineering practices.
The “vibe coding” chaos can give way to structured, disciplined, AI-assisted engineering. But only if you build the guardrails before you need them—not after your on-call engineer has been paged at 2 a.m. for the fifth time in a week.
As Jessie finally learns to sleep peacefully during on-call weeks, she has one last thought: “This is how we use AI responsibly.”
And for organizations racing to adopt AI code generation? That’s the lesson worth learning before you page your on-call engineer at 2 a.m.
This blog post is based on “‘No Vibe Coding While I’m On Call’: A Story of Generative AI and Its Impact on the SDLC” by Cornelia Davis, Nathen Harvey, Gene Kim, Matt Ring, and Jessie Young, published in the Enterprise Technology Leadership Journal Fall 2025.