November 13, 2023

Case Study: LinkedIn’s 2011 Operation InVersion through the Lens of Slowify, Simplify, and Amplify

By Gene Kim

In this post, I interpret another one of my favorite case studies of transformations through the lens of Slowify, Simplify, and Amplify: the 2011 Operation InVersion at LinkedIn, led by Kevin Scott, now the CTO of Microsoft.

(In previous posts, I did something similar with the Nordstrom DevOps transformation, the Google leaked memo “We Have No Moat, and Neither Does OpenAI,” and the letter from Cloudflare’s CEO on their forty-hour outage.)

In 2011, LinkedIn engineers (and customers) suffered every two weeks as they deployed changes to Leo, one of their core systems. After their IPO, in an astonishingly courageous move, then VP of Engineering Kevin Scott launched Operation InVersion, halting all feature work for two months to overhaul the site, resulting in amazing outcomes.

On a personal note, when I first learned about Operation InVersion, I was gobsmacked. I think it was in 2009 that Hal Pomeranz would watch LinkedIn on Friday nights at 9 p.m. PT, when search would stop working. LinkedIn promised it would be back up within a couple of hours, and on Twitter, we’d make bets on when that would actually happen.

On some weekends, LinkedIn search wouldn’t return until late Saturday night.

Then one day, it must have been in 2011, I remember LinkedIn looked very different — the page elements all rendered independently, and suddenly, an ever-increasing number of new features started showing up.

This was made possible by Operation InVersion, as were LinkedIn’s stunning growth and its later acquisition by Microsoft in 2016 for $26 billion.

To briefly reprise, in our upcoming book Wiring the Winning Organization, Dr. Steve Spear and I present a very simple and parsimonious theory of performance, based on these three mechanisms:

  • Slowify (i.e., “slow down to speed up”) to make it easier and more forgiving to solve problems.
  • Simplify (i.e., partition problems in time and space) to split apart large problems to make them easier to solve, most likely in parallel.
  • Amplify (among other things, weak signals of failure) to make it obvious that problems need to be solved and that they were successfully resolved.

Operation InVersion serves as a powerful example of all three mechanisms. Most notable is how they Simplified using modularization, which enables independence of action. This is when we take systems that are highly entwined and integrated, and decompose them into many smaller ones, which can then be changed independently of each other.

In our book, among the 23 case studies are the great Amazon re-architecture of 2001 and the massive $5 billion investment that IBM made in the 1960s to create the System/360 (that’s $20 billion in today’s dollars!).

What Amazon, the IBM System/360, and LinkedIn’s Operation InVersion have in common is the huge economic value that is created when we build an architecture that allows teams to work in parallel. Reduced design-time coupling means teams can develop independently, and reduced run-time coupling means that services have a dramatically smaller blast radius, preventing global chaos and disruption.

Below is an abbreviated version of the LinkedIn case study from The DevOps Handbook, 2nd Edition, followed by specific examples of the three mechanisms of Slowify, Simplify, and Amplify at work.

Abridged LinkedIn Case Study From DevOps Handbook

LinkedIn’s Operation InVersion is a case study that illustrates the need to pay down technical debt… Six months after their successful IPO in 2011, LinkedIn continued to struggle with problematic deployments that became so painful that they launched Operation InVersion, where they stopped all feature development for two months in order to overhaul their computing environments, deployments, and architecture.

LinkedIn was created in 2003 to help users “connect to your network for better job opportunities.”… By November 2015, LinkedIn had over 350 million members, who generated tens of thousands of requests per second, resulting in millions of queries per second on the LinkedIn back-end systems.

From the beginning, LinkedIn primarily ran on their homegrown Leo application, a monolithic Java application that served every page through servlets and managed JDBC connections to various back-end Oracle databases. However, to keep up with growing traffic… by 2010, most new development was occurring in new services, with nearly one hundred services running outside of Leo…

The problem was that Leo was only being deployed once every two weeks… Josh Clemm, a senior engineering manager at LinkedIn, explained, “Leo was often going down in production; it was difficult to troubleshoot and recover, and difficult to release new code. It was clear we needed to ‘kill Leo’ and break it up into many small functional and stateless services.”

[This was because] by fall 2011, late nights were no longer a rite of passage or a bonding activity, because the problems had become intolerable. Some of LinkedIn’s top engineers, including Kevin Scott, who had joined as VP of Engineering three months before their initial public offering, decided to completely stop engineering work on new features and dedicate the whole department to fixing the site’s core infrastructure. Scott launched Operation InVersion as a way to “inject the beginnings of a cultural manifesto into his team’s engineering culture…”

Kevin Scott described one downside… “You go public, have all the world looking at you, and then we tell management that we’re not going to deliver anything new while all of engineering works on this [InVersion] project for the next two months. It was a scary thing.”

[In 2013, journalist Ashlee Vance of Bloomberg] described the massively positive results of Operation InVersion:

“…Instead of waiting weeks for their new features to make their way onto LinkedIn’s main site, engineers could develop a new service, have a series of automated systems examine the code for any bugs and issues the service might have interacting with existing features, and launch it right to the live LinkedIn site… LinkedIn’s engineering corps [now] performs major upgrades to the site three times a day.”

As Josh Clemm described in his article on scaling at LinkedIn, “[Operation InVersion] was successful in enabling the engineering agility we need to build the scalable new products we have today… [In] 2010, we already had over 150 separate services. [By 2015], we have over 750 services.”

Kevin Scott stated, “Your job as an engineer and your purpose as a technology team is to help your company win… Your job is to figure out what it is that your company, your business, your marketplace, your competitive environment needs. Apply that to your engineering team in order for your company to win.”

By allowing LinkedIn to pay down nearly a decade of technical debt, Operation InVersion enabled stability and safety while setting the next stage of growth for the company… By finding and fixing problems as part of our daily work, we manage our technical debt so that we avoid these “near-death” experiences.

This case study is a good example of paying off technical debt, creating a stable and safe environment as a result. The burdens of daily workarounds were lifted and the team was able to once again focus on delivering new features to delight their customers.

LinkedIn’s Transformation Viewed through the Three Mechanisms of Wiring the Winning Organization

Now that we have reviewed this case study from The DevOps Handbook, 2nd Edition, let’s see how LinkedIn moved from the danger zone to the winning zone using the three mechanisms of slowification, simplification, and amplification:

Slowification

LinkedIn paused all new feature development for two months to focus on overhauling their computing environments, deployments, and architecture. This is an example of pulling back problem-solving from fast-paced operations to more deliberate planning and practice.

The entire engineering organization focused on improving tooling and deployment, infrastructure, and developer productivity. This is an example of deliberate investment in the tools that engineers use in their daily work.

One effort addressed their monolithic Java application, Leo, which often went down in production and was difficult to troubleshoot, recover, and release new code to.

Simplification

LinkedIn’s monolithic Leo application was broken down into many smaller, functional, and stateless services. This made the system more manageable and easier to troubleshoot. Before, Leo was a single application that served every page and managed connections to various backend databases.

After the InVersion effort, LinkedIn’s engineering corps was able to perform major upgrades to the site three times a day. This allowed for faster deployment of new features and improvements. Previously, Leo was only being deployed once every two weeks.
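
To put the two cadences quoted above side by side, here is a back-of-the-envelope calculation in Python. It takes the case study's figures at face value (one Leo deployment every two weeks, three major upgrades a day) and assumes, purely for illustration, that upgrades happened every calendar day.

```python
# Back-of-the-envelope comparison of the two release cadences quoted above.
# Assumption (not from the case study): upgrades happen every calendar day.

DAYS_PER_CYCLE = 14          # Leo: one deployment every two weeks
DEPLOYS_PER_DAY_AFTER = 3    # after InVersion: three major upgrades a day

deploys_per_cycle_before = 1
deploys_per_cycle_after = DEPLOYS_PER_DAY_AFTER * DAYS_PER_CYCLE

speedup = deploys_per_cycle_after / deploys_per_cycle_before

print(f"Deployments per two weeks: {deploys_per_cycle_before} before, "
      f"{deploys_per_cycle_after} after")
print(f"Roughly {speedup:.0f}x more frequent")
```

Under that assumption, the same two-week window that once held a single Leo release holds about 42 upgrades.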

In 2010, LinkedIn had over 150 separate services. By 2015, after the InVersion effort, that number had grown to over 750. By shifting towards a more modular architecture, they were able to scale developer productivity with the number of developers.

Another result was fewer problematic deployments. Before, the team was often working late into the night to fix problems caused by the addition of new features.

Each service was developed and deployed independently, reducing dependencies and making the system more resilient. Previously, a problem in one part of the Leo application could potentially affect the entire system.
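
The difference in blast radius can be illustrated with a toy sketch in Python. The component names (`profile`, `search`, `feed`) and both render functions are hypothetical and are not LinkedIn's actual services or code; the point is only to contrast a tightly coupled page, where one failure takes everything down, with a loosely coupled one, where the failure stays local.

```python
# Illustrative sketch only: the component names and functions below are
# hypothetical, not LinkedIn's actual services.

def profile():
    return "profile ok"

def search():
    # Simulate one component failing at run time.
    raise RuntimeError("search backend down")

def feed():
    return "feed ok"

COMPONENTS = {"profile": profile, "search": search, "feed": feed}

def render_monolith():
    """Tightly coupled: any failing component fails the whole page."""
    return {name: fn() for name, fn in COMPONENTS.items()}

def render_modular():
    """Loosely coupled: each component succeeds or fails on its own."""
    page = {}
    for name, fn in COMPONENTS.items():
        try:
            page[name] = fn()
        except Exception as exc:
            page[name] = f"unavailable ({exc})"
    return page

try:
    render_monolith()
    monolith_result = "page rendered"
except RuntimeError:
    monolith_result = "whole page failed"

modular_result = render_modular()

print(monolith_result)
print(modular_result)
```

In the modular version, the broken search component is reported as unavailable while the profile and feed still render, which is the kind of containment the paragraph above describes.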

Amplification

The problems with Leo were severe: it often went down in production and was difficult to troubleshoot, recover, and release new code to. These issues were amplified and escalated until fixing them became the sole focus of the entire engineering organization for two months (through Slowification). This prevented an already large problem from becoming even larger.

LinkedIn developed a suite of software and tools to automatically examine new code for bugs and issues before it was launched to the live site. This reduced the risk of introducing bugs and sped up the development process. Before, engineers had to manually check for potential issues, or worse, discover them during deployment or in production.
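
The case study does not describe that tooling in detail, so the following is only a minimal sketch of the general idea of an automated pre-launch gate. Every name here (`Change`, `check_unit_tests`, `check_compatibility`, `gate`) is a hypothetical stand-in, and real checks would run test suites and integration probes rather than read flags.

```python
# Minimal sketch of an automated pre-launch gate. All names are hypothetical;
# the case study does not describe LinkedIn's actual tooling.
from dataclasses import dataclass

@dataclass
class Change:
    name: str
    unit_tests_pass: bool
    breaks_existing_features: bool

def check_unit_tests(change):
    # Returns (passed, label) so failures can be reported by name.
    return change.unit_tests_pass, "unit tests"

def check_compatibility(change):
    return not change.breaks_existing_features, "compatibility with existing features"

CHECKS = [check_unit_tests, check_compatibility]

def gate(change):
    """Run every check; a change may launch only if none of them fail."""
    failures = []
    for check in CHECKS:
        passed, label = check(change)
        if not passed:
            failures.append(label)
    return len(failures) == 0, failures

ok_change = Change("new-service", unit_tests_pass=True, breaks_existing_features=False)
bad_change = Change("risky-change", unit_tests_pass=True, breaks_existing_features=True)

print(gate(ok_change))
print(gate(bad_change))
```

A gate like this surfaces the failure signal before launch, with the name of the failed check attached, instead of leaving it to be discovered during deployment or in production.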

The results of Operation InVersion were massively positive: LinkedIn’s engineering corps could now perform major upgrades to the site three times a day. This is an example of effective generation, transmission, reception, action, and confirmation of corrective actions.

About the Author

Gene Kim

Gene Kim has been studying high-performing technology organizations since 1999. He was the founder and CTO of Tripwire, Inc., an enterprise security software company, where he served for 13 years. His books have sold over 1 million copies. He is the WSJ bestselling author of Wiring the Winning Organization, The Unicorn Project, and co-author of The Phoenix Project, The DevOps Handbook, and the Shingo Publication Award-winning Accelerate. Since 2014, he has been the organizer of DevOps Enterprise Summit (now Enterprise Technology Leadership Summit), studying the technology transformations of large, complex organizations.
