March 17, 2025
What do you do when you have a critical book deadline and need to use a tool you wrote that hasn’t worked in two years? It doesn’t deploy anymore because of some obscure error at startup in Google Cloud Run. And you haven’t touched the code in two years and don’t remember how any of this works.
Oh, and by the way, the entire data pipeline that made it so useful stopped working two years ago when Twitter limited access to their API and Zapier deprecated their integration.
It seems like you’d be completely out of luck, right?
That’s the situation I found myself in last week. But I’ve routinely been building impossible things thanks to GenAI. In fact, I’m working on a book with Steve Yegge on how everyone can achieve the impossible! (The book title we’ve finally converged upon: “The Vibe Coding Handbook: How To Engineer Production-Grade Software With GenAI, Chat, Agents, and Beyond”)
In this blog post, I’ll share how I used Claude Code two weeks ago over two days, and resurrected a critical tool that I needed, fixed bugs that have bothered me for four years (!!), and oh and by the way, I created a new Twitter data pipeline that I’ve been dreaming of for two years.
It was an exhilarating experience using Claude Code, and it cost $80 in tokens (and 6-7 hours of work).
Steve started using Claude Code the same day I did. He told me the next day, “I literally couldn’t sleep. I was tossing and turning all night long, because all I wanted to do was get up and use it to fix things. I’ve got all this pent up anger at 30 years of bugs that have been piling up in this game I wrote and now I’ve got this machine that can just fix them.”
In a tweet, he wrote, “I’ve been using Claude Code for a couple of days, and it has been absolutely ruthless in chewing through legacy bugs in my gnarly old code base. It’s like a wood chipper fueled by dollars. It can power through shockingly impressive tasks, using nothing but chat.”
That feeling resonated with me. As Dr. Erik Meijer says, using Claude Code is addictive – you put in money, it fixes your code, it builds features, and you’re powerfully motivated to give it more money to do more of it!
In this post, I’ll describe why solving this problem was so important, an experience report using Claude Code, and some lessons learned.
(PS: for Clojure fans out there: so many of the posts I’ve written were about using languages I don’t normally use: Python for data analysis, JavaScript for building a Google Docs Add-On. This time, it’s all about my normal weapon of choice: Clojure and ClojureScript.)
The program that I needed to resurrect was a Trello card management tool, which was at the center of my daily workflow for nearly six years. It was a program I built for myself to manage my TODOs, but also to manage notes for writing books.
For many things, Trello serves as my TODO list and research fact file. Most of these Trello cards are from liked tweets that came in from Zapier. Other Trello cards were generated from starred Gmails, starred articles in Feedly (my RSS reader), and some other sources, all via Zapier. I had Trello boards for my TODOs, and major projects, such as conference planning, book projects, things to research, and so forth. (It was an evolution of the David Allen Getting Things Done method I’ve been using for decades.)
I first wrote this app in Clojure and ClojureScript in 2016, and rewrote it in 2019 using the Fulcro and Fulcro RAD frameworks. The primary goal was to make it super-easy to move cards between boards and lists. It has a ton of keyboard accelerators (like vim), easy ways to make new lists, a “most recently used” list of move targets which I could repeat with just a click, and all sorts of affordances that I made to make life easier for myself. And most importantly, it made it easy to deal with super-wide boards, something that is quite cumbersome in the Trello interface.
I was shocked today to discover that since 2016, I have processed almost 12,000 Trello cards with this app. Nearly 6,000 cards archived, and over 5,000 cards moved — which means I had read the cards, maybe pondered them, maybe even written about them, or just archived them. To me, this speaks volumes about how much I used this tool.
(In the screenshot below, you can see the boards or lists on the left, and the Trello cards on the right, with the list of all cards on the bottom.)
I pulled some stats from git (actually, Claude Code did). It’s 13K lines of code. And since 2019, I’ve added/modified/deleted 30K lines of code across 300 commits, adding various things I needed over the years. This shows how much functionality I added to help make my life easier.
Here was the data pipeline pre-2023, which was populating cards into Trello.
All that came to a crashing end when Twitter turned off most of their API in 2023, unless you paid $200/month. Zapier deprecated their Twitter integration, which turned off my biggest data source to my Trello boards. That’s when the system I had been using for a decade started to fall apart, if not cease entirely.
I mostly stopped using this tool except for the rare cases when I had to do mass Trello card operations — I hadn’t pushed any code updates in two years.
But with “The Vibe Coding Handbook” book deadline looming, and countless valuable AI coding examples appearing on Twitter daily, I needed this system operational again. I needed to get the hundreds (or thousands?) of liked Twitter posts out of my Twitter account, and come up with a way to get them on a continual basis.
I also needed to figure out how to get my Trello app deploying again — as I mentioned, it hadn’t been deployed in 2 years. New deployments resulted in an uncaught exception and a mysterious error. There were also a bunch of bugs I wanted to fix, but I didn’t remember much (or anything) about the code.
One year ago, I probably would have looked at this and given up even before starting. Just imagining all the problems I could have encountered, I would have immediately concluded that the “juice wasn’t worth the squeeze.”
But then I remembered how many things I’ve built in the last year that I could never have imagined doing before GenAI. Heck, I had even written a Google Docs Add-On a month ago! Encouraged by these successes, I started mentally priming myself for tackling this.
Below, I show the new data pipeline I created over two days. In a future post, I’ll describe how I created it (despite every GitHub repo and gist describing how to do this disappearing).
In this post, I’ll focus on how Claude helped me fix the Trello card management app.
I had signed up for Claude Code shortly after it was released, but was waitlisted. But when Steve texted me saying that he had started using Claude Code, I was like, wut? How?
That’s when I discovered that Anthropic had just enabled anyone to start using it. I got it installed and started using it. Here are instructions on how.
Claude Code is very different from the coding assistants you see in IDEs. You start it from a shell by typing “claude,” and it runs right there in the terminal window as an interactive, curses-style application. You can use the up/down arrows to get history, and there are live elements at the bottom of the screen.
But it’s a revelation — instead of focusing on the UI, the Anthropic team seemed to have focused on what really matters: stripping away all unnecessary affordances to explore how an LLM can solve problems with and for you.
These are some notes I took at the end of my first day:
In the beginning, Claude Code doesn’t have permission to do anything — I found myself giving it more permissions to do things on its own. I was happy to give it access to sed, grep, and head. Of course, it asks to make modifications to source code files.
I was more reluctant to let it actually run the program itself (clj -M:m-run xxx), but I decided to let it on Day 2. It had the benefit of enabling it to detect compile-time errors on its own, which was awesome.
Things got really weird when I let it do git add and git commit by itself about one hour into my first session. I feel like this is where it went totally off the rails.
I had given it a small task, and as a side effect of running the program, it realized it couldn’t start an unrelated web server that was in a different namespace, due to a port collision. Then it started making changes to the web server. I was like, “Whoa, whoa, whoa. Stop, stop, stop. This is way beyond what I wanted you to do.”
I learned from Steve that at any time, you can always hit ESC, and it will immediately stop what it’s doing and give control back to you. It’s far more immediate than I originally thought — you hit ESC, and you instantly have control again. (It must “kill -9” whatever active Claude Code tasks are operating.)
This dynamic makes it very pleasant and confidence-inspiring. You give Claude Code tasks, probably leaf nodes in your task tree, that are reasonably bounded and defined — it grinds away and it shows you code diffs, you look at them, and just hit Enter when you think “looks good to me!”
And far from turning your brain off, which is what people associate with “vibe coding,” your brain stays constantly engaged — you need it to make sure it’s “coloring inside the lines.”
Claude Code has the notion of “context remaining” —I’m guessing that this is the context window of everything it has discovered about your codebase, your objectives and tasks. Eventually, you’ll see a notification at the bottom like “10% context remaining. /compact your session.” (They recently renamed the message. It now reads “Context left until /compact: 22%” — I think this means it will automatically compact your session for you when exhausted.)
(Steve also mentioned in a conversation that this means the entirety of this context is likely being sent to the Anthropic API each interaction, which drives up your costs. So there may be real benefits by compacting more frequently: faster inference times and cheaper.)
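To make that cost intuition concrete, here is a back-of-the-envelope sketch. It assumes, as Steve speculated (this is not documented behavior), that the full context is resent on every interaction. The token counts are made-up illustrative numbers, and "compaction" is idealized as resetting the context to zero, which real compaction would not do.

```clojure
;; Hypothetical number: every turn adds 2,000 tokens of new context.
(def tokens-added-per-turn 2000)

(defn total-tokens-sent
  "Total input tokens sent over n turns, assuming the whole context is
   resent each turn. If compact-every is given, the context is reset to
   zero after every that-many turns (an idealized /compact)."
  ([n] (total-tokens-sent n nil))
  ([n compact-every]
   (loop [turn 1, context 0, total 0]
     (if (> turn n)
       total
       (let [context (+ context tokens-added-per-turn)
             total   (+ total context)
             context (if (and compact-every
                              (zero? (mod turn compact-every)))
                       0
                       context)]
         (recur (inc turn) context total))))))

(total-tokens-sent 10)    ; => 110000
(total-tokens-sent 20)    ; => 420000 (twice the turns, roughly 4x the tokens)
(total-tokens-sent 20 10) ; => 220000 (compacting halfway through)
```

Under these assumptions, the tokens sent grow roughly with the square of the session length, which is why compacting earlier could plausibly save money.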
One time, I made the terrible mistake of typing /clear instead of /compact, which was absolutely awful. Maybe I was too tired after working on trying to deobfuscate the Twitter DOM. But when I realized what I had done, I had an “OMG, rm -rf /” level of horror. And indeed, I had to explain what I was trying to do again, what methods we’d already tried and failed, etc. It started going down rabbit holes from hours ago. After a long day, it was just too difficult to reestablish all the context again. I quit for the day.
(Note to Claude Code team: because context is so important, can “/clear” require explicit confirmation, and maybe even be renamed “/newsession”. And how do you “save game”? I’d love to have a “/savecontext” command, that outputs the entire context to a .txt file.)
Ah, from my notes that first day: after I accidentally cleared the context, it started creating new endpoints on the web server that it shouldn’t even be touching, brought in a whole bunch of libraries that I didn’t want, and then wanted to commit it into git.
I was like, “No, no, no.” I called it quits for the day.
End of day cost: $12. Happily spent!
After a good night’s sleep, I started another coding session. To my great delight, I got enough of a Twitter-to-Trello card pipeline going, and managed to get 1,200 Trello cards created from my liked tweets into MySQL.
It was time to switch to Trello card processing production mode — I needed to start reading each one of these cards, and disposition them: i.e., move them to the most relevant list, or create a new list. To make life a little easier, I managed to create a quick categorizer, using Google Gemini to look at the tweet text, and categorize it, based on the existing board lists.
After processing several hundred cards, I was very irked to run into an annoying bug that has been aggravating me for four years: every once in a while, when I moved a card, it would go to the wrong list. And when I made a new list, the new list wouldn’t show up in the dropdown list right away. And when I clicked on a list in the move history, the card would get moved to the wrong place.
I hadn’t touched this code in two years, and I know that I’ve tried to fix this bug off and on several times, and gave up each time. (I knew that it had to do with inconsistently using Clojure key names in maps. I always run into this type of problem in larger programs, which typically tempts me to consider more strongly typed languages. More on this later.)
Looking through the logs, here are some of the prompts I used in Claude Code to zero in on the problem, starting from “total mystery” to “put in more logging” to “here are some hypotheses”:
And then the most decisive prompt, where I finally wrote down all the cases:
In less than 30 minutes, I managed to get this modal dialog box running correctly. I think all the edge failure cases were now hammered out. The dropdown box works; the tab, enter, and escape keys finally work as expected; and the history of moves above the dropdown box works!
As Claude explained: “There was namespace mismatch between code (:idList) and data (:trello-card/idList).” The problem was that the misnamed keys were littered around the code, but Claude Code had found enough of the mistakes to fix the errors I was encountering.
It was such an exhilarating feeling, using my Trello list manager, fixing problems one-by-one, making it easier and easier to do what I needed to get done. Here are some of the other prompts I wrote that day:
By the end of that day, I had a bunch of cards for Steve that he could integrate into his sections!
One problem stands out in my mind as remarkable. When selecting an item in the dropdown box, sometimes it would get sent to the wrong list. Claude Code convinced me that it was due to a delay between two stateful managed React components.
I don’t have very much experience with front-end programming, and I barely know what this means. But this definitely reminded me of issues I’ve had with managed React elements. My first encounter with this was keeping the cursor in the right place in text input boxes, which apparently continues to be a problem that people face in React — here’s a 2020 issue.
Claude recommended changing the Semantic UI React controls, to which my response was, “Whoa, whoa. Not sure if I’m up for that. I just want to fix my one little problem.” I suggested using a global variable instead to work around this issue, and I was delighted when it actually worked!
But a couple of times, I was reminded why vigilance is so important while using Claude Code (and other “agentic” coding assistants): earlier, when trying to troubleshoot this issue, Claude Code was convinced that the move target list was being corrupted somehow. It then started reloading this list from the server.
I was willing to entertain that concept (despite being deeply skeptical). But I started getting 3-5 second keyboard latency when typing in the search box. Claude Code then started adding debouncing and caching. And then tried to debug debouncing.
I pulled the plug on this, and reverted back to my last git commit, and asked Claude Code to go down a different line of problem solving.
Someone described this situation as “waking up to an AI-generated room of horrors” — this is definitely something I can now relate to. As the engineer, sometimes you go where the LLM suggests. Other times, you use your experience and judgement to say, “Nope.”
One last story: I had mentioned that I could no longer deploy this code to Google Cloud Run. It was running on my laptop just fine, but it was failing with this error in Google Cloud Run: “Received RST_STREAM: Protocol error” when fetching secrets from Google Cloud Secrets Manager.
Claude Code wasn’t able to fix this — I resorted to using OpenAI Deep Research to figure this one out. It turns out that this error was because I converted my Google Secrets Manager from the Google Java client library to using REST calls, and for some reason, I was getting this error because I was using HTTP/2 by default.
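For readers hitting something similar, here is a minimal sketch of pinning a client to HTTP/1.1. It assumes the REST calls go through the JDK's built-in java.net.http.HttpClient (Java 11+), which prefers HTTP/2 unless told otherwise. The post doesn't say which HTTP client was actually in use or what the exact fix was, so treat this as one possible approach. The helper name http1-client is made up for illustration.

```clojure
(import '(java.net.http HttpClient HttpClient$Version))

;; Assumption: the JDK HttpClient is the client in play. Other Clojure
;; HTTP libraries have their own ways of selecting the protocol version.
(defn http1-client
  "Builds an HttpClient that speaks HTTP/1.1 only, instead of the
   default behavior of preferring HTTP/2."
  []
  (-> (HttpClient/newBuilder)
      (.version HttpClient$Version/HTTP_1_1)
      (.build)))

;; The built client reports the version it was pinned to.
(.version ^HttpClient (http1-client))
```

No request is sent here; the sketch only shows where the protocol version gets chosen.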
(Why does this HTTP call work in some of my programs, but not others? I have no idea.)
But on the way to fixing the issue, I asked Claude Code to comment all my calls to my Secrets Manager library — which it did… very thoroughly.
I was happily copying and pasting the new logging messages into Claude Code to isolate what was generating the error. But when I looked into the source file of my callers, I was shocked to find the code now had deeply nested try/catches — like six or seven levels deep, when they used to be two or three.
It was “fine,” but I would really like to clean this up. Someday. But for now, I have other fish to fry. (Some might say I just added to my technical debt — but it’s just a bunch of overly nested error handling. Big deal.)
But it definitely had shades of those “waking up to AI-generated horror” feelings.
End of day cost: $30. Happily spent!
I asked Claude Code to provide recommendations for completely eliminating these categories of errors, to put me on a path to a cleaner code base. We identified two areas for improvement that I’m eager to jump into, after book deadlines are over:
I was deeply involved in the UI and needed to find a way to extract that logic from the UI layer for better testing and maintenance.
The bugs I encountered were mainly due to namespace mismatches in my code. Some places used :idList while others used :trello-card/idList. This inconsistent usage resulted in nil values appearing where they shouldn’t.
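Here is a small illustration of why this class of bug is so quiet in Clojure. The card data and the move-request function are invented for this example. Only the two key names, :idList and :trello-card/idList, come from the actual bug.

```clojure
;; A card as it might arrive from the data layer, with namespaced keys.
;; The IDs are placeholders.
(def card {:trello-card/id     "card-123"
           :trello-card/idList "list-456"})

(:trello-card/idList card) ; => "list-456"
(:idList card)             ; => nil, with no exception and no warning

;; Hypothetical function that uses the un-namespaced keys by mistake.
;; The nils flow onward as if they were real IDs.
(defn move-request [card target-list-id]
  {:card-id (:id card)      ; wrong key, so nil
   :from    (:idList card)  ; wrong key, so nil
   :to      target-list-id})

(move-request card "list-789")
;; => {:card-id nil, :from nil, :to "list-789"}
```

Nothing fails at the point of the mistake. The nil only shows up later, as a card landing in the wrong place.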
Claude Code recommended creating accessor functions like get-card-id and get-list-id that would handle all possible variations:
(defn get-card-id [card]
  ;; creates a more forgiving way to get values from map
  (or (:id card)
      (:trello-card/id card)
      (:card-id card)
      (get-in card [:trello-card/raw :id])))
I liked this because I often forget the conventions in programs that I’ve worked on for a long time. Implementing this in functions is a great way to enforce consistency.
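The post names get-list-id but doesn't show it, so here is a guess at what it could look like, modeled on get-card-id. Only :idList and :trello-card/idList are key names from my code as described above. The :trello-card/raw fallback and the sample cards are assumptions for illustration.

```clojure
;; Assumed shape, mirroring get-card-id. Not the actual implementation.
(defn get-list-id [card]
  (or (:idList card)
      (:trello-card/idList card)
      (get-in card [:trello-card/raw :idList])))

;; Made-up cards, one per key convention, plus an empty one.
(def cards
  [{:idList "list-1"}
   {:trello-card/idList "list-2"}
   {:trello-card/raw {:idList "list-3"}}
   {}])

(mapv get-list-id cards)
;; => ["list-1" "list-2" "list-3" nil]
```

One trade-off worth noting: accessors like this tolerate the inconsistency rather than remove it, so they are a stopgap on the way to a single key convention.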
Claude Code also suggested adding data validation to catch issues early:
(defn validate-move-operation
  [{:keys [card-id list-id target-list-id] :as move-op}]
  (cond ; this is like a case statement
    (nil? card-id)        {:valid false :error "Missing card ID"}
    (nil? target-list-id) {:valid false :error "Missing target list ID"}
    :else                 {:valid true :data move-op}))
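As a sketch of how a caller might use this, the snippet below repeats the validator so it runs on its own, then wraps it in a hypothetical move-card! function. Both move-card! and its send-move! argument are invented here. send-move! stands in for whatever actually performs the move.

```clojure
(defn validate-move-operation
  [{:keys [card-id target-list-id] :as move-op}]
  (cond
    (nil? card-id)        {:valid false :error "Missing card ID"}
    (nil? target-list-id) {:valid false :error "Missing target list ID"}
    :else                 {:valid true :data move-op}))

;; Hypothetical caller: refuse to do anything if validation fails,
;; so a nil ID is caught here instead of moving a card to the wrong list.
(defn move-card! [send-move! move-op]
  (let [{:keys [valid error data]} (validate-move-operation move-op)]
    (if valid
      (send-move! data)
      (throw (ex-info error {:move-op move-op})))))

(validate-move-operation {:card-id "c1" :target-list-id "l2"})
;; => {:valid true, :data {:card-id "c1", :target-list-id "l2"}}

(validate-move-operation {:card-id "c1"})
;; => {:valid false, :error "Missing target list ID"}
```

Failing loudly at the boundary is the point: the earlier bugs were hard to find because the nil traveled a long way before anything looked wrong.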
For the Move modal dialog box, Claude Code recommended refactoring the components to make them more testable:
The implementation priority would be:
For now, I have a functioning system that helps me process my research cards, which is what I needed to meet my book deadline. But these improvements would transform my “works for me” tool into something more maintainable for the long term.
Many people associate “vibe coding” with turning your brain off, and doing what the LLM says. I’m hoping that this experience report shows that what I did on those two days was the opposite of that: I was using AI as another engineering tool to achieve what I wanted.
That’s the point of the book that Steve and I are writing, The Vibe Coding Handbook: How To Engineer Production-Grade Software With GenAI, Chat, Agents, and Beyond. There are times you want to prototype something and throw it away, there are times you want to solve a problem for yourself (and only you bear the consequences when something goes wrong), and there are times when you are supporting mission-critical services that support many, many people.
There is a way to use vibe coding and engineering tactics for each scenario. So, happy vibe coding!
PS: For those of you interested in Claude Code, here’s a 1.25 hour video that Steve Yegge and I recorded last Friday — Steve used it to convert a 20 year old, 2500 line Ruby admin script to Kotlin/Gradle.
It was super impressive — this was something we tried two months ago using more traditional CHOP/vibe coding methods, but we gave up after 90 minutes. In a little over an hour, Claude Code converted over some of the most challenging tasks: the MySQL and Redis admin commands, logging into the sandbox environment. Wow! (Link: https://www.youtube.com/watch?v=HtqxI53h7zM)
My writeup on that experience coming next!
In The Vibe Coding Handbook, we describe a spectrum of modalities of coding with GenAI. The first is “pairing,” where you are working with the AI to achieve a goal — it really is like pair programming with another person, if that person was at the same time like a “summer intern that also believes in conspiracy theories” (as coined by Simon Willison) and the world’s best architect.
The second modality is “delegating,” (which I think many will associate with “agentic coding”), where you ask the AI to do something and it does so without any human interaction.
You delegate when you’re sure the task novelty is not terribly high, when the AI has demonstrated success with these types of tasks in the past, when the person (and the AI) has the appropriate level of skill, and when the consequences should something go wrong are not terribly high…
These dimensions dictate the frequency of reporting and feedback you need. (If these sound familiar, these are the dimensions that Dr. Andy Grove defined in his book, “High Output Management,” when he described the frequency of reporting needed for people!)
What is interesting to me is the conditions when you can get away with “extreme delegation” — when you give who you are delegating to a tremendous amount of latitude. This likely means the ability to execute something with little oversight, with little and infrequent communication, and a great deal of trust.
One domain that has studied this in tremendous detail is military command, especially naval command. In the movie “Master and Commander: The Far Side of the World,” it’s pretty astonishing that the entirety of Captain Jack Aubrey’s orders from the British Admiralty were, “Find the [French ship] Acheron, sink her or take her as a prize.” (It is based on a series of novels by Patrick O’Brian.)
That order, supposedly given around 1805 during the height of the Napoleonic Wars, reflected the reality of communications at the time: there was no ability to communicate with the Admiralty until they got back to port, maybe over a year later. It is evidence of the trust and discretion given to naval commanders. This was very much in the tradition of the Admiralty of the period, and captures this era of high autonomy and extreme delegation.
There was another example of this that struck me last year, while watching with my family the latest Hulu incarnation of James Clavell’s Shōgun. The part that really got my attention was not a central character: the Portuguese Black Ship, captained by Captain Ferreira —inspired by the real-life “Kurofune” of the late 1500s—carried astonishingly valuable cargo between Macau and Nagasaki. These were the ships that took advantage of the Portuguese trade monopoly throughout Asia.
Thanks to repeated chat sessions with ChatGPT, I was shocked to learn that this was one of the most valuable cargo ships in the world. Each voyage transported silk, porcelain, spices, gold, and silver worth approximately $300-500 million in today’s USD—potentially approaching $1 billion when accounting for the enormous profit margins these luxury goods commanded in Japan.
To put this into perspective, the largest VLCC oil supertankers carry oil valued at “only” $160-300 million.
But what is astonishing is how much more authority Captain Ferreira has than even these VLCC supertanker captains.
Last night, I asked ChatGPT to speculate on what Captain Ferreira’s orders were. It responded:
Captain Ferreira’s orders in Shōgun were likely short and strategic, similar in style to Captain Jack Aubrey’s orders in Master and Commander. Given that no further communication with the Viceroy of Goa or Lisbon was possible, Ferreira would have operated under standing orders issued before departure from Macau or Goa.
“Deliver the cargo to Nagasaki. Ensure the Portuguese monopoly remains unchallenged. Neutralize any Dutch or English threats. Protect the Jesuit mission. Avoid war with the daimyōs unless absolutely necessary.”
I think I would have been quick to dismiss this as fanciful — but apparently, US naval commanders going into war zones often have similarly short orders, along the lines of: don’t start a war, don’t lose your ship, and don’t get your sailors killed.
Below is some AI-generated guidance on when full delegation is okay versus active pairing versus consulting the LLM as an expert.
From my experience with Claude Code, here are warning signs that you’ve given it too much autonomy:
Gene Kim has been studying high-performing technology organizations since 1999. He was the founder and CTO of Tripwire, Inc., an enterprise security software company, where he served for 13 years. His books have sold over 1 million copies—he is the WSJ bestselling author of Wiring the Winning Organization, The Unicorn Project, and co-author of The Phoenix Project, The DevOps Handbook, and the Shingo Publication Award-winning Accelerate. Since 2014, he has been the organizer of DevOps Enterprise Summit (now Enterprise Technology Leadership Summit), studying the technology transformations of large, complex organizations.