November 13, 2023
In this post, I interpret another one of my favorite case studies of transformations through the lens of Slowify, Simplify, and Amplify: the 2011 Operation InVersion at LinkedIn, led by Kevin Scott, now the CTO of Microsoft.
(In previous posts, I did something similar with the Nordstrom DevOps transformation, the Google leaked memo “We Have No Moat, and Neither Does OpenAI,” and the letter from the Cloudflare CEO on their forty-hour outage.)
In 2011, LinkedIn engineers (and customers) suffered every two weeks as they deployed changes to Leo, one of their core systems. After their IPO, in an astonishingly courageous move, then VP of Engineering Kevin Scott launched Operation InVersion, halting all feature work for two months to overhaul the site, resulting in amazing outcomes.
On a personal note, when I first learned about Operation InVersion, I was gobsmacked. I think it was in 2009 that Hal Pomeranz would watch LinkedIn on Friday nights at 9pm PT, when search would stop working. They promised they’d be back up within a couple of hours, and on Twitter, we’d make bets on when they’d actually be back up.
On some weekends, LinkedIn search wouldn’t return until late Saturday night.
Then one day (it must have been in 2011), I remember LinkedIn looking very different: the page elements all rendered independently, and suddenly, an ever-increasing number of new features started showing up.
This was made possible by Operation InVersion, as were LinkedIn’s stunning growth and its later acquisition by Microsoft in 2016 for $26 billion.
To briefly reprise, in our upcoming book Wiring the Winning Organization, Dr. Steve Spear and I present a very simple and parsimonious theory of performance, based on three mechanisms: Slowification, Simplification, and Amplification.
Operation InVersion serves as a powerful example of all three mechanisms. Most notable is how they Simplified using modularization, which enables independence of action. This is when we take systems that are highly entwined and integrated, and decompose them into many smaller ones, which can then be changed independently of each other.
In our book, among the 23 case studies are the great Amazon re-architecture of 2001 and the massive $5 billion investment that IBM made in the 1960s to create the System/360 (that’s $20 billion in today’s dollars!).
What Amazon, the IBM System/360, and LinkedIn’s Operation InVersion have in common is the huge economic value created by an architecture that allows teams to work in parallel. Reduced design-time coupling means teams can develop independently, and reduced run-time coupling means that services have a dramatically smaller blast radius, preventing global chaos and disruption.
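The difference in blast radius can be made concrete with a small dependency-graph model. The sketch below is illustrative only: the pages and services are hypothetical (not LinkedIn’s real topology), and it treats a defect anywhere inside a monolith as a failure of the whole application. The blast radius of a failure is then every component that transitively depends on the failed one.

```python
"""Sketch: blast radius of a failure under tight vs. loose run-time coupling.

All page and service names are hypothetical illustrations, not LinkedIn's
actual architecture.
"""
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return every component that transitively depends on `failed`
    (including `failed` itself), i.e. everything the failure can take down."""
    # Invert the graph: for each component, record who depends on it.
    dependents: dict[str, set[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first walk over reverse dependencies.
    affected = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Monolith: every page is served by one application, so all share its fate.
monolith = {
    "profile_page": {"leo"},
    "search_page": {"leo"},
    "feed_page": {"leo"},
    "leo": set(),
}

# Modular: each page depends only on the services it actually uses.
modular = {
    "profile_page": {"profile_service"},
    "search_page": {"search_service"},
    "feed_page": {"feed_service", "profile_service"},
    "profile_service": set(),
    "search_service": set(),
    "feed_service": set(),
}

print(sorted(blast_radius(monolith, "leo")))
# ['feed_page', 'leo', 'profile_page', 'search_page']
print(sorted(blast_radius(modular, "search_service")))
# ['search_page', 'search_service']
```

In the monolith, a failure reaches every page; in the modular layout, a failing search service takes down only the search page. This toy model ignores timeouts, retries, and shared infrastructure, which real systems would also have to account for.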
Below is an abbreviated version of the LinkedIn case study from The DevOps Handbook, 2nd Edition, followed by specific examples of the three mechanisms of Slowify, Simplify, and Amplify at work.
LinkedIn’s Operation InVersion is a case study that illustrates the need to pay down technical debt… Six months after their successful IPO in 2011, LinkedIn continued to struggle with problematic deployments that became so painful that they launched Operation InVersion, where they stopped all feature development for two months in order to overhaul their computing environments, deployments, and architecture.
LinkedIn was created in 2003 to help users “connect to your network for better job opportunities”… By November 2015, LinkedIn had over 350 million members, who generated tens of thousands of requests per second, resulting in millions of queries per second on the LinkedIn back-end systems.
From the beginning, LinkedIn primarily ran on their homegrown Leo application, a monolithic Java application that served every page through servlets and managed JDBC connections to various back-end Oracle databases. However, to keep up with growing traffic… by 2010, most new development was occurring in new services, with nearly one hundred services running outside of Leo…
The problem was that Leo was only being deployed once every two weeks… Josh Clemm, a senior engineering manager at LinkedIn, explained, “Leo was often going down in production; it was difficult to troubleshoot and recover, and difficult to release new code. It was clear we needed to ‘kill Leo’ and break it up into many small functional and stateless services.”
[This was because] by fall 2011, late nights were no longer a rite of passage or a bonding activity, because the problems had become intolerable. Some of LinkedIn’s top engineers, including Kevin Scott, who had joined as VP of Engineering three months before their initial public offering, decided to completely stop engineering work on new features and dedicate the whole department to fixing the site’s core infrastructure. Scott launched Operation InVersion as a way to “inject the beginnings of a cultural manifesto into his team’s engineering culture…”
Kevin Scott described one downside… “You go public, have all the world looking at you, and then we tell management that we’re not going to deliver anything new while all of engineering works on this [InVersion] project for the next two months. It was a scary thing.”
[In 2013, journalist Ashlee Vance of Bloomberg] described the massively positive results of Operation InVersion:
“…Instead of waiting weeks for their new features to make their way onto LinkedIn’s main site, engineers could develop a new service, have a series of automated systems examine the code for any bugs and issues the service might have interacting with existing features, and launch it right to the live LinkedIn site… LinkedIn’s engineering corps [now] performs major upgrades to the site three times a day.”
As Josh Clemm described in his article on scaling at LinkedIn, “[Operation InVersion] was successful in enabling the engineering agility we need to build the scalable new products we have today. [In] 2010, we already had over 150 separate services. [By 2015], we have over 750 services.”
Kevin Scott stated, “Your job as an engineer and your purpose as a technology team is to help your company win… Your job is to figure out what it is that your company, your business, your marketplace, your competitive environment needs. Apply that to your engineering team in order for your company to win.”
By allowing LinkedIn to pay down nearly a decade of technical debt, Operation InVersion enabled stability and safety while setting the next stage of growth for the company… By finding and fixing problems as part of our daily work, we manage our technical debt so that we avoid these “near-death” experiences.
This case study is a good example of paying off technical debt, creating a stable and safe environment as a result. The burdens of daily workarounds were lifted and the team was able to once again focus on delivering new features to delight their customers.
Now that we have reviewed this case study from The DevOps Handbook, 2nd Edition, let’s see how LinkedIn moved from the danger zone to the winning zone using the three mechanisms of slowification, simplification, and amplification:
LinkedIn paused all new feature development for two months to focus on overhauling their computing environments, deployments, and architecture. This is an example of pulling back problem-solving from fast-paced operations to more deliberate planning and practice.
The entire engineering organization focused on improving tooling and deployment, infrastructure, and developer productivity. This is an example of deliberate investment in the tools that engineers use in their daily work.
One effort addressed the problems with their monolithic Java application, Leo: it frequently went down in production, and it was difficult to troubleshoot, recover, and release new code.
LinkedIn’s monolithic Leo application was broken down into many smaller, functional, and stateless services. This made the system more manageable and easier to troubleshoot. Before, Leo was a single application that served every page and managed connections to various backend databases.
After the InVersion effort, LinkedIn’s engineering corps was able to perform major upgrades to the site three times a day. This allowed for faster deployment of new features and improvements. Previously, Leo was only being deployed once every two weeks.
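To put the change in deployment cadence in perspective, here is the back-of-the-envelope ratio of the two rates. It assumes upgrades went out on every calendar day; the source does not say whether weekends were included.

```latex
\frac{3~\text{deploys/day}}{1~\text{deploy} / 14~\text{days}} = 3 \times 14 = 42
```

That is roughly 42 times as many production changes in the same two-week window. Counting only the ten working days in that window, it would be 3 × 10 = 30 times.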
In 2010, LinkedIn had over 150 separate services. By 2015, after the InVersion effort, the number of services had grown to over 750. By shifting to a more modular architecture, LinkedIn was able to scale developer productivity as the number of developers grew.
Another result was fewer problematic deployments. Before, the team was often working late into the night to fix problems caused by the addition of new features.
Each service was developed and deployed independently, reducing dependencies and making the system more resilient. Previously, a problem in one part of the Leo application could potentially affect the entire system.
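The run-time isolation described above is also what lets page elements render independently. Below is a minimal sketch, assuming a page is composed from hypothetical fragment-fetching functions that stand in for remote calls to separate services (these are not LinkedIn’s actual services or APIs). A failure in one fragment is contained to a placeholder instead of failing the whole page.

```python
"""Sketch: composing a page from independent services so that one failing
fragment degrades gracefully instead of taking down the whole page.

fetch_profile, fetch_search, and fetch_feed are hypothetical stand-ins for
remote service calls.
"""
from typing import Callable


def fetch_profile() -> str:
    return "<div>Profile</div>"


def fetch_search() -> str:
    # Simulate a service that is currently down.
    raise ConnectionError("search service unavailable")


def fetch_feed() -> str:
    return "<div>Feed</div>"


def render_page(fragments: dict[str, Callable[[], str]]) -> str:
    """Render each fragment independently; a failure replaces only that
    fragment with a placeholder rather than failing the whole page."""
    parts = []
    for name, fetch in fragments.items():
        try:
            parts.append(fetch())
        except Exception:
            # Deliberately broad: any error in one fragment is contained here.
            parts.append(f"<div>{name} is temporarily unavailable</div>")
    return "\n".join(parts)


page = render_page({
    "profile": fetch_profile,
    "search": fetch_search,
    "feed": fetch_feed,
})
print(page)
```

In a monolith where every page element is produced by the same process, the equivalent of `fetch_search` raising would more often surface as an error for the entire page.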
The problems with Leo were so significant that the site kept going down in production, and it was difficult to troubleshoot, recover, and release new code. These issues were amplified and escalated until fixing them became the sole focus of the entire engineering organization for two months (through Slowification). This prevented an already large problem from becoming even larger.
LinkedIn developed a suite of software and tools to automatically examine new code for bugs and issues before it was launched to the live site. This reduced the risk of introducing bugs and sped up the development process. Before, engineers had to manually check for potential issues, or worse, discover them during deployment or in production.
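The idea of automatically examining code before it reaches the live site can be sketched as a simple deployment gate. This is a hypothetical illustration, not LinkedIn’s actual tooling: the `Change` fields and the three checks are assumptions chosen to mirror the kinds of checks described (bugs, and issues interacting with existing features).

```python
"""Sketch: an automated gate that runs every check on a change and only
allows it to be deployed if all checks pass.

The Change fields and check functions are hypothetical illustrations.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    """A proposed change to one service (hypothetical fields)."""
    service: str
    unit_tests_passed: bool
    integration_tests_passed: bool  # does it still work with existing services?
    static_analysis_issues: int     # number of findings from a code scanner


# A check inspects a change and returns (passed, check_name).
Check = Callable[[Change], tuple[bool, str]]


def unit_tests(change: Change) -> tuple[bool, str]:
    return change.unit_tests_passed, "unit tests"


def integration_tests(change: Change) -> tuple[bool, str]:
    return change.integration_tests_passed, "integration tests"


def static_analysis(change: Change) -> tuple[bool, str]:
    return change.static_analysis_issues == 0, "static analysis"


def gate(change: Change, checks: list[Check]) -> tuple[bool, list[str]]:
    """Run every check (not just until the first failure) so engineers see
    all problems at once; the change is deployable only if none failed."""
    failures = []
    for check in checks:
        passed, name = check(change)
        if not passed:
            failures.append(name)
    return not failures, failures


CHECKS: list[Check] = [unit_tests, integration_tests, static_analysis]

good = Change("profile_service", True, True, 0)
bad = Change("search_service", True, False, 3)

print(gate(good, CHECKS))  # (True, [])
print(gate(bad, CHECKS))   # (False, ['integration tests', 'static analysis'])
```

Running all checks rather than stopping at the first failure is a design choice: it reports every problem in one pass, which shortens the feedback loop for the engineer who submitted the change.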
The results of Operation InVersion were massively positive: LinkedIn’s engineering corps could now perform major upgrades to the site three times a day. This is an example of effective generation, transmission, reception, action, and confirmation of corrective actions.
Gene Kim is a best-selling author whose books have sold over 1 million copies. He authored the widely acclaimed book "The Unicorn Project," which became a Wall Street Journal bestseller. Additionally, he co-authored several other influential works, including "The Phoenix Project," "The DevOps Handbook," and the award-winning "Accelerate," which received the prestigious Shingo Publication Award. His latest book, “Wiring the Winning Organization,” co-authored with Dr. Steven Spear, was released in November 2023.