Skip to content

September 29, 2016

CASE STUDY: The Birth of Automated Testing at Google in 2005

By Gene Kim

The DevOps Handbook (download the free excerpt) is now available. This is one of over 40 case studies you will find in the book.

google case study

The inside story on the problems the GWS team at Google needed to solve in 2005 that resulted in one of the most admired cultures of automated testing around.

By implementing the right principles and patterns, Development and QA are using production-like environments in their daily work, and we are successfully integrating and running our code into a production-like environment for every feature that is accepted, with all changes checked into version control.

However, we are likely to get undesired outcomes if we find and fix errors in a separate test phase, executed by a separate QA department only after all development has been completed. And, if testing is only performed a few times a year, developers learn about their mistakes months after they introduced the change that caused the error. By then, the link between cause and effect has likely faded, solving the problem requires firefighting and archaeology, and, worst of all, our ability to learn from the mistake and integrate it into our future work is significantly diminished.

Automated testing addresses another significant and unsettling problem. Gary Gruver observes that “without automated testing, the more code we write, the more time and money is required to test our code—in most cases, this is a totally unscalable business model for any technology organization.”

Although Google now undoubtedly exemplifies a culture that values automated testing at scale, this wasn’t always the case.

In 2005, when Mike Bland joined the organization, deploying to was often extremely problematic, especially for the Google Web Server (GWS) team.

As Bland explains:

“The GWS team had gotten into a position in the mid 2000s where it was extremely difficult to make changes to the web server, a C++ application that handled all requests to Google’s home page and many other Google web pages. As important and prominent as was, being on the GWS team was not a glamorous assignment—it was often the dumping ground for all the different teams who were creating various search functionality, all of whom were developing code independently of each other. They had problems such as builds and tests taking too long, code being put into production without being tested, and teams checking in large, infrequent changes that conflicted with those from other teams.”

The consequences of this were large—search results could have errors or become unacceptably slow, affecting thousands of search queries on

The potential result was not only loss of revenue, but customer trust.

Bland describes how it affected developers deploying changes, “Fear became the mind-killer. Fear stopped new team members from changing things because they didn’t understand the system. But fear also stopped experienced people from changing things because they understood it all too well.” 

Bland was part of the group that was determined to solve this problem.

(Bland explained further that at Google, one of the consequences of having so many talented developers was that it created “imposter syndrome,” a term coined by psychologists to informally describe people who are unable to internalize their accomplishments. Wikipedia states that “despite external evidence of their competence, those exhibiting the syndrome remain convinced that they are frauds and do not deserve the success they have achieved. Proof of success is dismissed as luck, timing, or as a result of deceiving others into thinking they are more intelligent and competent than they believe themselves to be.”)

GWS team lead Bharat Mediratta believed automated testing would help.

As Bland describes, “They created a hard line: no changes would be accepted into GWS without accompanying automated tests. They set up a continuous build and religiously kept it passing. They set up test coverage monitoring and ensured that their level of test coverage went up over time. They wrote up policy and testing guides, and insisted that contributors both inside and outside the team follow them.”

The results were startling.

As Bland notes:

“GWS quickly became one of the most productive teams in the company, integrating large numbers of changes from different teams every week while maintaining a rapid release schedule. New team members were able to make productive contributions to this complex system quickly, thanks to good test coverage and code health. Ultimately, their radical policy enabled the home page to quickly expand its capabilities and thrive in an amazingly fast-moving and competitive technology landscape.”

But GWS was still a relatively small team in a large and growing company.

The team wanted to expand these practices across the entire organization. Thus, the Testing Grouplet was born.

They were an informal group of engineers who wanted to elevate automated testing practices across the entire organization. Over the next five years, they helped replicate this culture of automated testing across all of Google.

(The story of how they did this is presented in Part 5: over the next three years, they created training programs, pushed the famous Testing on the Toilet newsletter [which they posted in the bathrooms], the Test Certified roadmap and certification program, and led multiple “fix-it” days [i.e., improvement blitzes], which helped teams improve their automated testing processes so they could replicate the amazing outcomes that the GWS team was able to achieve).

Now when any Google developer commits code, it is automatically run against a suite of hundreds of thousands of automated tests.

If the code passes, it is automatically merged into trunk, ready to be deployed into production. Many Google properties build hourly or daily, then pick which builds to release; others adopt a continuous “Push on Green” delivery philosophy.

The stakes are higher than ever—a single code deployment error at Google can take down every property, all at the same time (such as a global infrastructure change or when a defect is introduced into a core library that every property depends upon).

Eran Messeri, an engineer in the Google Developer Infrastructure group, notes:

“Large failures happen occasionally. You’ll get a ton of instant messages and engineers knocking on your door. [When the deployment pipeline is broken,] we need to fix it right away, because developers can no longer commit code. Consequently, we want to make it very easy to roll back.”

What enables this system to work at Google is engineering professionalism and a high-trust culture that assumes everyone wants to do a good job, as well as the ability to detect and correct issues quickly.

Messeri explains:

“There are no hard policies at Google, such as, ‘If you break production for more than ten projects, you have an SLA to fix the issue within ten minutes.’ Instead, there is mutual respect between teams and an implicit agreement that everyone does whatever it takes to keep the deployment pipeline running. We all know that one day, I’ll break your project by accident; the next day, you may break mine.”

What Mike Bland and the Testing Grouplet team achieved has made Google one of the most productive technology organizations in the world.

By 2013, automated testing and continuous integration at Google enabled over four thousand small teams to work together and stay productive, all simultaneously developing, integrating, testing, and deploying their code into production.

All their code is in a single, shared repository, made up of billions of files, all being continuously built and integrated, with 50% of their code being changed each month. Some other impressive statistics on their performance include:

  • 40,000 code commits/day
  • 50,000 builds/day (on weekdays, this may exceed 90,000)
  • 120,000 automated test suites
  • 75 million test cases run daily
  • 100+ engineers working on the test engineering, continuous integration, and release engineering tooling to increase developer productivity (making up 0.5% of the R&D workforce)

Note from Gene: The story that Mike Bland told about the genesis of automated testing at Google in 2005 was so amazing that I asked him to tell this story at DevOps Enterprise Summit 2015. You can find his video here:

Endnotes and citations:

- About The Authors
Avatar photo

Gene Kim

Gene Kim is a Wall Street Journal bestselling author, researcher, and multiple award-winning CTO. He has been studying high-performing technology organizations since 1999 and was the founder and CTO of Tripwire for 13 years. He is the author of six books, The Unicorn Project (2019), and co-author of the Shingo Publication Award winning Accelerate (2018), The DevOps Handbook (2016), and The Phoenix Project (2013). Since 2014, he has been the founder and organizer of DevOps Enterprise Summit, studying the technology transformations of large, complex organizations.

Follow Gene on Social Media

No comments found

Leave a Comment

Your email address will not be published.

Jump to Section

    More Like This

    The Role of the Software Architect in Agile Medical Device Development
    By Summary by IT Revolution

    In a recent presentation at the 2024 Enterprise Technology Leadership Summit Virtual Europe, Tom…

    Calculating the ROI of Flow Engineering
    By Steve Pereira , Andrew Davis

    This post is adapted from the book Flow Engineering: From Value Stream Mapping to Effective…

    What to Expect at Enterprise Technology Leadership Summit Las Vegas 2024

    Holy cow, Enterprise Technology Leadership Summit Las Vegas is happening in August, which is…

    Transforming Telenet’s Operating Model: Insights from a Multi-Year Journey
    By Summary by IT Revolution

    In a recent presentation at the 2024 Enterprise Technology Leadership Summit Virtual Europe, Barbara…