
February 2, 2026

“No Vibe Coding While I’m On Call”: What Happens When AI Writes Your Production Code

By Leah Brown

It’s 2:06 a.m. Jessie’s phone buzzes with that distinct pattern she’d set for production alerts. In a decade of on-call rotations, she’s learned that nothing good happens at 2 in the morning. The alert message confirms her fears: unusual network traffic, suspicious IP addresses. The new AI-generated payment processing code—deployed just hours earlier—is failing spectacularly.

This is the opening of “No Vibe Coding While I’m On Call,” a fictional but painfully realistic story by Cornelia Davis, Nathen Harvey, Gene Kim, Matt Ring, and Jessie Young published in the Fall 2025 Enterprise Technology Leadership Journal. Through a series of production incidents at a fictional company called Meowseratti (a luxury pet stroller company expanding into insurance), the authors illustrate what happens when organizations embrace AI code generation without understanding the risks.

The story isn’t just entertainment—it’s a warning wrapped in narrative. Every incident in the story mirrors real patterns the authors have witnessed in organizations rushing to adopt “vibe coding” practices.

The Setup: When Speed Trumps Caution

Meowseratti announces its expansion into pet insurance, and executives want to move fast. Really fast. The pressure is on: “Every day we wait is another day of missed revenue.”

Steve, an enthusiastic engineer, proposes using AI to generate a basic customer portal overnight. “We use AI to generate a basic customer portal tonight. Just enough to collect credit cards and issue policies. We can add the fancy features later.”

Jessie, the skeptical on-call engineer, raises the obvious concern: “Tell me we’re at least going to test these AI-generated systems before we start taking people’s money.”

But the momentum is unstoppable. Steve’s response captures the dangerous optimism: “Come on, we’re just collecting emails and credit cards. AI’s gotten really good at this basic stuff. I say we deploy as soon as we have something working—we can always roll back if there’s a problem.”

That phrase—“we can always roll back”—becomes the recurring theme as incident after incident proves that rollback isn’t a strategy; it’s an admission of failure.

Incident #1: The Linting Disaster

Jessie is having dinner with friends when her phone starts buzzing with that dreaded pattern: buzz-buzz-pause-buzz. The universal rhythm of systems falling apart.

SEV-1 alerts. Production API errors spiking. Payment flows failing. The logs point to paymentProcessor.ts—a critical file that handles bringing money into the company. The culprit? A commit titled “Fix linting issues across codebase” that touched 103 files.

The change looked innocent: a simple refactor from let to const, transforming five lines of if-else logic into a clean ternary. Green tests across the board. But buried in an error trace was the smoking gun: SyntaxError: Assignment to constant variable.

AI had optimized locally without understanding the global context. The refactored status variable got reassigned later under specific conditions, in three different places throughout the payment flow. AI was optimizing for code elegance without understanding the system’s actual behavior.
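To make the failure mode concrete, here’s a minimal TypeScript sketch of the pattern the story describes. The names (status, processPayment, the string values) are illustrative, not taken from the actual paymentProcessor.ts:

```typescript
// Illustrative sketch only: names are stand-ins, not the real payment code.

// Before the "linting" commit: `status` is declared with `let` because later
// steps in the payment flow legitimately reassign it.
function processPaymentBefore(isRetry: boolean): string {
  let status: string;
  if (isRetry) {
    status = "retrying";
  } else {
    status = "new";
  }
  // ...much later, in a different branch of the flow:
  status = "settled";
  return status;
}

// After the AI refactor: the ternary is cleaner locally, but the reassignment
// still exists elsewhere in the flow, and with `const` it is now illegal.
function processPaymentAfter(isRetry: boolean): string {
  const status = isRetry ? "retrying" : "new";
  // status = "settled"; // <-- the surviving reassignment that blew up in production
  return status;
}
```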

The post-incident review revealed the deeper problem. When they asked AI to analyze its own changes, it immediately identified them as risky—especially for payment processing code. It pointed out every place where the const conversion could conflict with existing code patterns.

“So the LLM made the changes, even though it knew it was risky?” someone asked.

“It didn’t know when it was making the changes because we didn’t ask it to look,” Jessie explained. “This is what happens when we treat AI like a magic ‘fix everything’ button instead of a sophisticated but limited tool.”

The lesson: Break work into smaller, verifiable chunks. No more “complete refactor” requests. Each task should have clear, measurable completion criteria. And most critically—use AI to analyze its own work before accepting changes.

Incident #2: The Test Coverage Illusion

Jessie’s at a concert, finally taking a break after the week she’d had, when her phone buzzes. Response times are climbing. The new feature deployed earlier that day—one where they’d been “experimenting with AI-assisted development”—is failing spectacularly.

The dashboard shows a cascade of 500 errors. Users aren’t getting expected results. But here’s the kicker: all the new tests are passing with flying colors.

The commit history shows a string of merges dominated by tests. Hundreds of new assertions, all green. The test coverage metrics look impressive. But test coverage isn’t the same as meaningful testing.

Rolling back the changes took priority. After stabilizing, Jessie realized the uncomfortable truth: They’d automated their way into a false sense of security. AI had generated tests that checked what the code did, not what the code should do.
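Here’s a hedged illustration of that gap, with an invented Meowseratti-style premium rule and Vitest-style tests (none of the names or numbers come from the story):

```typescript
import { expect, test } from "vitest";

// Hypothetical premium calculation with a real bug: the surcharge for older
// pets is never applied. The rule and numbers are invented for this sketch.
function calculatePremium(basePrice: number, petAge: number): number {
  return basePrice; // bug: pets over 10 should carry a 20% surcharge
}

// The kind of test AI tends to generate: derived from the implementation, so
// it locks the bug in. It passes, coverage climbs, and the dashboard stays green.
test("calculatePremium returns the base price", () => {
  expect(calculatePremium(100, 12)).toBe(100);
});

// A requirement-driven test, written from the human-reviewed test plan.
// This one fails, which is exactly the signal the team was missing.
test("premiums for pets over 10 years include a 20% surcharge", () => {
  expect(calculatePremium(100, 12)).toBe(120);
});
```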

The lesson: Always start with a human-reviewed test plan. Get AI to enumerate ALL test cases first, explicitly requesting edge cases and error conditions. Most importantly—review test coverage against actual requirements, not metrics. High test coverage does not equal meaningful test coverage.

Incident #3: Documentation That Documents Features That Don’t Exist

6:45 a.m. Jessie’s phone buzzes. P1 alert: High latency and errors in the EU region. European users are staring at empty feeds or endless loading spinners.

The documentation looks perfect—too perfect. Clean markdown formatting, clear dependency listings, and right there in the troubleshooting section: “To disable Dynamic Content Prioritization, set the feature flag ENABLE_DYNAMIC_CONTENT_PRIORITIZATION to false in the central configuration service.”

Simple enough. Except when Jessie searches the feature flag console, the flag doesn’t exist. She tries creating it manually. Nothing changes. Maybe it’s an environment variable? Still nothing.

Twenty precious minutes wasted before she looks at the actual code. The new prioritization logic is right there in the constructor, being called directly in getUserFeed(). No feature flag. No environment variable. No configuration at all.

AI had written documentation for a safety mechanism that didn’t exist.

It had pattern-matched their usual deployment practices and confidently documented them without checking the actual implementation. The documentation described what should have been built, not what was actually built.

The lesson: All feature documentation requires code verification. Implement a documentation verification pipeline. Cross-reference with actual code. Document the verification checklist itself.
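One possible shape for that pipeline: a small check that cross-references flag names mentioned in the docs against the source tree. The ./docs and ./src paths and the ENABLE_* naming convention below are assumptions for illustration, not details from the story:

```typescript
// A minimal sketch of a documentation-verification check.
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

const FLAG_PATTERN = /\bENABLE_[A-Z0-9_]+\b/g;

// Recursively collect every flag-like token mentioned under a directory.
function collectFlags(dir: string, flags = new Set<string>()): Set<string> {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    if (statSync(path).isDirectory()) {
      collectFlags(path, flags);
    } else {
      for (const match of readFileSync(path, "utf8").match(FLAG_PATTERN) ?? []) {
        flags.add(match);
      }
    }
  }
  return flags;
}

// Fail the build if the docs mention a flag that no source file references.
const documented = collectFlags("./docs");
const implemented = collectFlags("./src");
const phantomFlags = Array.from(documented).filter((flag) => !implemented.has(flag));

if (phantomFlags.length > 0) {
  console.error("Documented flags with no implementation:", phantomFlags);
  process.exit(1);
}
```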

Incident #4: The Architectural Drift

Thread pool exhaustion during a planned database maintenance window. Order processing failures. Twenty-eight minutes of degraded service affecting 1,200 transactions.

But according to their architecture, a maintenance window on the analytics database should not have affected order processing at all. The system had been specifically designed to handle database maintenance through message queues and asynchronous processing.

The investigation revealed the problem: AI had “modernized” their code. What had been a simple fire-and-forget asynchronous call was now a synchronous database write. Each change looked good in isolation, but together they were eroding the system design.
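A hedged sketch of what that drift can look like in TypeScript; the queue and database interfaces are placeholders, not the actual Meowseratti services:

```typescript
interface AnalyticsEvent {
  orderId: string;
  recordedAt: Date;
}

interface EventQueue {
  publish(event: AnalyticsEvent): Promise<void>;
}

interface AnalyticsDb {
  insert(table: string, row: AnalyticsEvent): Promise<void>;
}

// Original design: order processing only enqueues the event. If the analytics
// database is down for maintenance, orders keep flowing.
function recordAnalyticsOriginal(queue: EventQueue, event: AnalyticsEvent): void {
  // Fire and forget: deliberately not awaited on the order's critical path.
  void queue.publish(event).catch((err) => console.warn("analytics enqueue failed", err));
}

// After "modernization": a clean, awaited write straight to the analytics
// database. Each order now blocks on, and holds a connection to, a system
// that was designed to be safe to take offline.
async function recordAnalyticsModernized(db: AnalyticsDb, event: AnalyticsEvent): Promise<void> {
  await db.insert("analytics_events", event);
}
```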

When they asked AI to analyze its own code changes, it immediately identified the exact resilience patterns it had broken, complete with warnings about synchronous database calls in critical paths.

“So AI knows better,” an executive observes. “It knows what it shouldn’t do, but does it anyway unless explicitly told not to?”

Exactly. AI optimizes for what it’s asked in the moment. The team asked it to modernize the code, so it used newer APIs and delivered clean, synchronous code. They didn’t ask it to preserve resilience patterns, so it didn’t.

The lesson: AI can make architectural decisions one PR at a time. Each change looks good in isolation, but together they fundamentally alter system behavior. Establish mandatory architectural review for AI-generated changes. Document system resilience patterns explicitly and reference them in prompts.

The Path Forward: From Chaos to Discipline

After months of incidents—each one punctuated by Jessie’s weary refrain “No vibe coding like that while I’m on call!”—the team finally builds something better.

The transformation isn’t about abandoning AI. It’s about building guardrails:

Pre-Flight Checklists for AI-Assisted Development:

  • Small, verifiable tasks with explicit operational requirements
  • Architectural oversight to ensure AI suggestions align with resilience patterns
  • AI-on-AI analysis: use AI to critique its own work before accepting changes
  • Mandatory security, scalability, and error handling specifications upfront

The AI Development Framework:

  • Detailed specifications drafted collaboratively with AI, including operational requirements
  • “Prompt libraries” with proven patterns that preserve system architecture
  • Automated checks that flag when AI-generated code violates architectural principles
  • Real-time cross-functional ensemble reviews where product, security, QA, and operations evaluate AI contributions together

The Tools:

  • Fast, automated, local tests that run in seconds
  • Observability built into every feature from day one
  • Canary releases with automated rollback based on observable metrics (a sketch follows this list)
  • Operations data made available to AI agents for analysis
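As a rough illustration of those last two bullets, here’s a minimal sketch of metric-driven canary rollback. The threshold, window, and the fetchErrorRate/rollback/promote hooks are hypothetical stand-ins for real observability and deployment tooling:

```typescript
const ERROR_RATE_THRESHOLD = 0.02;        // roll back if >2% of canary requests fail
const OBSERVATION_WINDOW_MS = 5 * 60_000; // watch the canary for five minutes
const CHECK_INTERVAL_MS = 30_000;         // sample the metric every 30 seconds

async function watchCanary(
  fetchErrorRate: () => Promise<number>,
  rollback: () => Promise<void>,
  promote: () => Promise<void>,
): Promise<void> {
  const deadline = Date.now() + OBSERVATION_WINDOW_MS;
  while (Date.now() < deadline) {
    const errorRate = await fetchErrorRate();
    if (errorRate > ERROR_RATE_THRESHOLD) {
      // Metrics make the call; nobody gets paged at 2 a.m. to make it manually.
      await rollback();
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, CHECK_INTERVAL_MS));
  }
  // The canary held steady for the full window: promote it to all traffic.
  await promote();
}
```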

By the story’s end, Jessie is sipping chamomile tea at 9 p.m. during her on-call rotation. Her pager is silent. The dashboard shows smoothly humming services. Green lights blink serenely across the board.

The transformation took months of painful lessons, but the result is clear: AI hasn’t replaced developer judgment; it has augmented it.

What This Means for You

The fictional story illustrates real patterns organizations are experiencing:

If you’re rushing to adopt AI code generation:

  • Your first incident won’t be your last
  • “We can always roll back” isn’t a strategy
  • AI optimizes locally without understanding your system globally
  • Test coverage metrics lie—quality matters more than quantity
  • AI will confidently document features that don’t exist
  • Small architectural changes compound into system-level erosion

If you’re trying to do it safely:

  • Break every AI request into small, verifiable chunks (maximum 5 test cases per prompt, clear completion criteria)
  • Use AI to analyze its own work before accepting changes
  • Verify documentation against actual code implementation
  • Establish architectural review for changes that affect system behavior
  • Build observability into features from day one, not as an afterthought
  • Create ensemble reviews where multiple disciplines evaluate AI contributions in real time

The bottom line: In DevOps, “You build it, you run it.” But what happens when GenAI is doing most of the building? You need to run it even more carefully, with even better observability, even faster feedback loops, and even more disciplined engineering practices.

The “vibe coding” chaos can give way to structured, disciplined, AI-assisted engineering. But only if you build the guardrails before you need them—not after your on-call engineer has been paged at 2 a.m. for the fifth time in a week.

As Jessie finally learns to sleep peacefully during on-call weeks, she has one last thought: “This is how we use AI responsibly.”

And for organizations racing to adopt AI code generation? That’s the lesson worth learning before your on-call engineer gets paged at 2 a.m.


This blog post is based on “‘No Vibe Coding While I’m On Call’: A Story of Generative AI and Its Impact on the SDLC” by Cornelia Davis, Nathen Harvey, Gene Kim, Matt Ring, and Jessie Young, published in the Enterprise Technology Leadership Journal Fall 2025.

About the Author

Leah Brown

Managing Editor at IT Revolution working on publishing books and guidance papers for the modern business leader. I also oversee the production of the IT Revolution blog, combining the best of responsible, human-centered content with the assistance of AI tools.
