September 29, 2025
The following is an excerpt from the forthcoming book Vibe Coding: Building Production-Grade Software With GenAI, Chat, Agents, and Beyond by Gene Kim and Steve Yegge.
The vibe coding loop looks similar to the traditional developer loop. But when you’re coding with AI, every step becomes critical. As you’ll see soon, you can’t fall asleep at the wheel. If you do, you’ll soon wind up with frustrating and expensive rework, a theme we continue to explore throughout Part 2 of the book. Let’s talk about the vibe coding loop. Here’s what it can look like:
By the way, once you’re somewhat experienced with this vibe coding loop, there is one more critical step to add: notice the repetitive, manual steps (the toil) in your own loop and automate them.
Automating this toil will not only make you faster but will speed up your ability to experiment and innovate. We’ll talk more about the unexpectedly high benefits of this later in the book. (Hint: It’s the O in FAAFO). And if, at any time, you’re typing a lot or manually searching through data structures, stop and ask yourself: “Could I ask AI to help with this?” The answer is usually yes, and you’ll be faster and have more fun.
For the past fifteen years, Gene has been taking screenshots whenever he finds something interesting in podcasts or YouTube videos, hoping to revisit those moments eventually, maybe to write about them someday or to research an interesting fact further. In practice, he rarely used them. It was too tedious to search through the screenshots, locate the original content, and find the exact quote he needed. The juice didn’t seem worth the squeeze. Optimistically, he held out hope that it might be worth it someday, and he kept taking screenshots. For fifteen years! We mentioned this story briefly in the Preface, but now we’ll show the details of how Gene was able to vibe code his way to success.
In our first vibe coding pairing session together, we set out to build something that could create video excerpts (clips) of YouTube videos directly from Gene’s screenshots. He would be able to dig up a picture and, with the click of a button, post that excerpt from the video. His new tool would also use the video transcript to add overlaid captions (subtitles) onto the clips.
We used ffmpeg, a super-powerful command-line tool that can process, convert, and manipulate video and audio files in almost any format. It’s notorious for having extremely complex command-line options and syntax, which makes the operations difficult to write and almost impossible to read afterward. With this complexity in mind, we were going to find out if AI could come to the rescue.
In the following sections, we’ll walk you through how Gene went through the vibe coding loop multiple times, using a chat assistant to build what he wanted. We recorded the forty-seven minutes it took for him to build it.
First, Gene explained to Steve what he was trying to build. He needed a tool to automate the process of creating a “highlights reel” from his extensive collection of video highlights, which were video screenshots he had taken on his phone. Before starting our session, he had converted those screenshots into the following data: the YouTube channel and video, as well as the start and end times of the video clip he wanted generated. He also had movie files and transcripts of those YouTube videos.
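To make that concrete, here is a hedged illustration of what one screenshot-derived highlight entry might look like; the field names are our assumptions, not the actual schema Gene used.

;; Hypothetical example of one highlight entry derived from a screenshot.
;; Field names are illustrative assumptions, not the book's actual schema.
{:channel   "Example YouTube Channel"
 :video-id  "abc123XYZ"
 :start-sec 105
 :end-sec   152}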
He aimed to create captioned video .mp4 files, with the transcript converted into subtitles that showed up in the video frame, so he could share them on social media. Gene felt his thousands of screenshots were a treasure trove of the wisdom of others, of interesting research material, and of miscellaneous topics that people would be interested in. This tool would finally let him start sharing that accumulated wisdom.
Given the objective, Gene now needed to decompose his problem into tasks that he could implement with AI. He came up with the following tasks, which could be implemented and validated using AI:
- Extract the specified segment of the source video file.
- Extract the portions of the transcript that fall within the highlight’s start and end times.
- Convert those transcript portions into a caption file.
- Overlay the captions onto the video clip.
For this project, Gene chose to use Claude via the Sourcegraph AI assistant inside his IntelliJ IDE, though any assistant (and any model) would have worked. This session occurred before autonomous agents, so he was vibe coding using regular chat, a skill that remains useful today with agents because some problems will always be best solved with chat.
Gene’s vibe coding loop looked like this: He would type his prompt in the assistant window. AI would generate some code in the chat. Gene would copy and paste that answer into his editor, or in some cases, smart-apply it directly into the code with a button click. Ask, answer, integrate, over and over. And it worked! Boy, did it ever. As we shall see.
Gene’s first task was to extract a segment of the source video file. Here was his starting prompt:
Given an excerpt beginning and end (in seconds), give me the ffmpeg command to extract that portion of the video. Go ahead and shell out and put that into a file /tmp/output.mp4.
A short prompt, but it got the job done. No need to look up any ffmpeg documentation, no need to learn the command-line arguments, no need to learn time unit conventions. AI handled all the details. Within minutes, Gene and Steve had working code that could extract video clips. He opened the video file, and it looked great. Given the simple nature of this task, Gene decided tests were not needed. He was convinced that we could rely on ffmpeg working correctly, so we moved on to the next task. (You decide whether that was a good decision.)
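To show roughly what that step looks like in code, here is a minimal sketch of a Clojure function that shells out to ffmpeg; the function name and argument handling are ours, not what AI actually generated in the session.

(ns clipper.extract
  (:require [clojure.java.shell :as shell]))

;; Minimal sketch: copy the segment between start-sec and end-sec from the
;; source video into /tmp/output.mp4 by shelling out to ffmpeg.
(defn extract-clip!
  [video-path start-sec end-sec]
  (shell/sh "ffmpeg" "-y"
            "-i" video-path
            "-ss" (str start-sec)
            "-to" (str end-sec)
            "-c" "copy"
            "/tmp/output.mp4"))

One caveat worth knowing: -c copy cuts on keyframes, so the clip boundaries can be slightly imprecise; dropping -c copy and re-encoding is the usual fix when exact timing matters.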
Next, Gene moved on to processing the transcript data. Given the start and end time of the highlight, he needed to extract the relevant transcript portions. Here was the prompt he used:
Here’s the video transcript (it’s a JSON array of objects). Write a function that, given a list of start and end ranges, extracts all the relevant entries in the transcript.
AI generated the function, which Gene copied into his Clojure code base. Although it ran correctly, this was a nontrivial function, so we needed test cases. This function computed intersections of time ranges in the transcript and seemed to have lots of places where the code might go wrong.
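As a rough illustration of the kind of function involved, here is our own sketch, not the generated code; it assumes transcript entries are maps with :start, :end, and :text, with times in seconds.

;; Sketch: keep every transcript entry that overlaps at least one
;; [range-start range-end] pair. The entry shape is an assumption.
(defn overlaps?
  [{:keys [start end]} [range-start range-end]]
  (and (< start range-end)
       (> end range-start)))

(defn entries-in-ranges
  [transcript ranges]
  (filter (fn [entry] (some #(overlaps? entry %) ranges))
          transcript))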
Gene gave our AI assistant another prompt: “Write some tests.” It generated several interesting test cases, exercising the different ways that time ranges might overlap. And indeed, one test case failed.
This was a genuine teachable moment for both of us. Our AI assistant was sure that the failed case was due to an off-by-one error in the code. But we discovered the code itself was correct; it was the generated test cases that were wrong. So much for tests that “look good.”
This reminded us that AI is not always reliable. We had to stay vigilant and verify its answers—especially because AI almost always sounds confident and correct and explains why it’s correct in lengthy detail. In this case, it was right when it generated the initial code but completely wrong in guessing why the tests were failing.
We soon had a tested function, which, given a list of transcript start/end ranges, would correctly extract the text for that part of the transcript. So far, so good.
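To give a flavor of the overlap cases such tests exercise, here is a hedged sketch using clojure.test against the entries-in-ranges sketch above; these cases are our illustrations, not the ones AI generated.

(require '[clojure.test :refer [deftest is]])

(deftest overlap-cases
  (let [transcript [{:start 0 :end 5 :text "a"}
                    {:start 5 :end 10 :text "b"}
                    {:start 10 :end 15 :text "c"}]]
    ;; range fully inside a single entry
    (is (= ["b"] (map :text (entries-in-ranges transcript [[6 7]]))))
    ;; range spanning two entries
    (is (= ["a" "b"] (map :text (entries-in-ranges transcript [[4 6]]))))
    ;; a zero-length range on a boundary matches nothing
    (is (empty? (entries-in-ranges transcript [[5 5]])))))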
Finally, we needed to add captions. This meant taking the transcript file and inserting it as captions that could be seen in the video frames. This was a large enough task that we decomposed it into the following subtasks:
First, we asked ChatGPT what caption formats ffmpeg supports. (Answer: SRT and ASS formats, which neither Gene nor Steve knew about before. And now we do!)
Gene then asked ChatGPT, “Give examples of SRT and ASS transcript files.” Gene chose the SRT transcript format because it had fewer fields and looked simpler to implement. Again, there is no need to become an SRT file format specialist. We then asked ChatGPT to generate the SRT file from the transcript segments.
Gene wrote this prompt:
Write a function to transform my list of transcript entries (a JSON array) into an SRT file.
Our AI assistant generated the code to do it, and it chose a great function name (which is sometimes more difficult than writing the function). Finally, we needed the subtitle text to be placed into the video frames. We learned that ffmpeg calls these “captions.”
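Before moving on, here is a hedged sketch of what an SRT-generating function can look like (our illustration, not the generated code). An SRT file is just numbered blocks, each with an HH:MM:SS,mmm --> HH:MM:SS,mmm time range followed by the caption text.

(require '[clojure.string :as str])

;; Sketch: render transcript entries (:start/:end in seconds, :text) as SRT.
(defn sec->srt-time [s]
  (let [ms     (long (* 1000 s))
        h      (quot ms 3600000)
        m      (quot (mod ms 3600000) 60000)
        sec    (quot (mod ms 60000) 1000)
        millis (mod ms 1000)]
    (format "%02d:%02d:%02d,%03d" h m sec millis)))

(defn entries->srt [entries]
  (->> entries
       (map-indexed
        (fn [i {:keys [start end text]}]
          (str (inc i) "\n"
               (sec->srt-time start) " --> " (sec->srt-time end) "\n"
               text "\n")))
       (str/join "\n")))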
Modify the ffmpeg command to generate captions, using the specified SRT caption file.
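Here is a hedged guess at the shape of the resulting code (ours, not the session's); burning subtitles into the frames uses ffmpeg's subtitles filter, which requires re-encoding, so the earlier -c copy goes away.

(require '[clojure.java.shell :as shell])

;; Sketch: extract the clip and burn the SRT captions into the video frames.
(defn extract-captioned-clip!
  [video-path srt-path start-sec end-sec]
  (shell/sh "ffmpeg" "-y"
            "-i" video-path
            "-ss" (str start-sec)
            "-to" (str end-sec)
            "-vf" (str "subtitles=" srt-path)
            "/tmp/output.mp4"))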
If you watch the session recording, you can hear Gene gasp the moment he opens the video and sees the video excerpt with overlaid captions. We had not been vibe coding for long, barely over half an hour. And we hadn’t written many prompts. On the recording, Gene declared, “This is freaking incredible,” plus lots of expletives we had to censor out.
In a total of forty-seven minutes of pair programming using vibe coding techniques with chat, Gene had built a working video clip generator that achieved his goal: captioned video excerpts, generated from his screenshot data, ready to share on social media.
Not bad for an hour’s work. It turned into an hour because, upon closer inspection, Gene and Steve noticed that two lines of captions were being displayed, and there was something wrong with the caption timing. They spent a few minutes trying to fix it, and then Gene promised to work on it that evening.
The next day, after Gene got his code working, he texted Steve: “Holy cow, I got this running! I had so much fun generating and posting excerpts, extracting every quote I found inspiring.” Steve had not expected that Gene—who is not a professional programmer—would have accomplished this in under an hour. Gene had finally created a way to plunder his fifteen-year-old treasure trove.
What’s better is that it turns out the video Gene was using for testing the code was a talk by Dr. Erik Meijer (whom you may recall from Part 1). When Gene posted a twelve-part series of his favorite quotes from that talk on social media, Dr. Meijer responded: “This looks amazing. Thanks for doing this. It helps grasp the talk even faster than just watching at 2x speed.”
Gene’s tweet got nearly a quarter million views. Clearly others were finding his treasure trove and excerpt format valuable. This is the kind of impact vibe coding can unlock.
Okay, if you’re super experienced, Gene’s programming feat might sound mundane. It’s mostly new code in a small code base, and the final product was smaller than what some professional developers might commit multiple times a day. Some of you could have written this whole program in a quarter of the time it took us pairing with vibe coding.
That’s fair. But it’s also not the point. The takeaway here is not “Oh ho, ha ha. AIs will never replace real programmers.” The point is that we were able to build it at all. The program never would have been written the old way, but Gene did it in under an hour (fast) with AI.
For Gene, this was a life-changing experience. Gene achieved FAAFO. He had considered this so far out of reach that he had never bothered trying (ambitious). After creating this program, he used it several times a week because it unlocked the value of thousands of interesting moments he had captured while listening to podcasts. Best of all, it was fun, and it set in motion the creation of tons of other utilities, some of which he uses multiple times daily.
Here are some other takeaways from this early vibe coding session:
We did this little test in September 2024 (almost prehistoric AI times). Given all the advances in coding agents, we know we could complete this project today in a fraction of the time. A coding agent could doubtless have solved this problem in a couple of minutes. As AI improves, it will be able to handle larger and larger tasks. It’s possible that Gene’s video excerpting program could have been implemented in one shot—if not today, sometime in the future. But like when giving tasks to humans, the larger the task you have AI take on, the more that can go wrong.
The relevant skill is no longer code generation (i.e., typing out code by hand), but being able to articulate your goals clearly and create good specifications that AI can implement. Because of this, the principles here continue to apply to larger projects as AI’s capabilities scale up.
In the Preface, Gene mentioned that he had his first inkling of how powerful chat programming could be as early as February 2024. While we’re talking about chat programming, here is a slightly expanded explanation of what happened.
For the non-iOS YouTube screenshots, he could ask the new ChatGPT-4 vision model to extract the current playback time displayed in the video player controls (e.g., “1:45”). But screenshots from the iOS YouTube app were different. They only showed a red progress bar with no visible time stamp. Without that timing information, he couldn’t automatically determine where in the video to create his excerpts.
On a whim, Gene typed into ChatGPT: “Here’s a YouTube screenshot. There’s a red progress bar under the video player window. Write a Clojure function that analyzes the image. March up the left side of the image to find the red progress bar.” The AI-generated code used Java 2D graphics libraries—ImageIO, BufferedImage, Color classes—which Gene had never used before. Gene hadn’t used bitmap functions since writing Microsoft C++ code in 1995. When the function correctly identified the progress bar on row 798 of the image on the first try, Gene sat slack-jawed.
Next, he extended the solution. “On that row, march right until you see a non-red pixel,” he prompted, and AI delivered code that calculated the exact playback percentage from the progress bar’s position. What would have taken him days of studying graphics APIs—if he’d attempted it at all—was working in under an hour. This code transformed thousands of iOS screenshots from unusable artifacts into valuable time stamps.
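For the curious, here is a hedged sketch of that kind of pixel-marching code in Clojure using the same Java classes; the function names, the scan direction, and the "red" thresholds are our assumptions, not the code AI produced.

(import '[javax.imageio ImageIO]
        '[java.awt Color]
        '[java.io File])

;; Sketch: find the row containing the red progress bar by scanning the left
;; edge, then march right along that row until the pixel is no longer red;
;; the ratio of that column to the image width approximates the playback
;; percentage.
(defn red-pixel? [img x y]
  (let [c (Color. (.getRGB img x y))]
    (and (> (.getRed c) 180)
         (< (.getGreen c) 80)
         (< (.getBlue c) 80))))

(defn playback-fraction [path]
  (let [img    (ImageIO/read (File. path))
        width  (.getWidth img)
        height (.getHeight img)
        bar-y  (first (filter #(red-pixel? img 0 %) (range height)))]
    (when bar-y
      (let [bar-end (or (first (remove #(red-pixel? img % bar-y) (range width)))
                        width)]
        (/ (double bar-end) width)))))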
That’s what changed Gene’s life in 2024 and set the stage for his exciting adventure with Steve a year and a half later. Truly, FAAFO.
Gene’s video excerpting tool shows the vibe coding loop in action. By breaking down a complex task, collaborating with AI through conversation, and iteratively building a solution, Gene accomplished in under an hour what might never have happened otherwise.
But, as valuable as this chat-based approach proved to be, it only scratches the surface of what’s possible with vibe coding. Later in the book, we’ll examine the prompts that Gene used and show what made them effective.
Before we do that, we’ll look at what we can do with autonomous, agentic coding assistants, or “coding agents,” and how they alter the vibe coding loop.
Stay tuned for more exclusive excerpts from the upcoming book Vibe Coding: Building Production-Grade Software With GenAI, Chat, Agents, and Beyond by Gene Kim and Steve Yegge on this blog or by signing up for the IT Revolution newsletter.
Gene Kim has been studying high-performing technology organizations since 1999. He was the founder and CTO of Tripwire, Inc., an enterprise security software company, where he served for 13 years. His books have sold over 1 million copies—he is the WSJ bestselling author of Wiring the Winning Organization, The Unicorn Project, and co-author of The Phoenix Project, The DevOps Handbook, and the Shingo Publication Award-winning Accelerate. Since 2014, he has been the organizer of DevOps Enterprise Summit (now Enterprise Technology Leadership Summit), studying the technology transformations of large, complex organizations.
Steve Yegge is an American computer programmer and blogger known for writing about programming languages, productivity, and software culture for two decades. He has spent over thirty years in the industry, split evenly between dev and leadership roles, including nineteen years combined at Google and Amazon. Steve has written over a million lines of production code in a dozen languages, has helped build and launch many large production systems at big tech companies, has led multiple teams of up to 150 people, and has spent much of his career relentlessly focused on making himself and other developers faster and better. He is currently an Engineer at Sourcegraph working on AI coding assistants.