
February 25, 2021

Measure Software Delivery Performance with Four Key Metrics

By Nicole Forsgren, Gene Kim, Jez Humble

This post has been adapted from Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, PhD, Jez Humble, and Gene Kim.

There are many frameworks and methodologies that aim to improve the way we build software products and services. We wanted to discover what works and what doesn’t in a scientific way, starting with a definition of what “good” means in this context. This post presents the four key metrics to measure software delivery performance.

MEASURING SOFTWARE DELIVERY PERFORMANCE

Measuring software delivery performance is hard—in part because, unlike manufacturing, the inventory is invisible. Furthermore, the way we break down work is relatively arbitrary, and the design and delivery activities—particularly in the Agile software development paradigm—happen simultaneously. Indeed, it’s expected that we will change and evolve our design based on what we learn by trying to implement it. So our first step must be to define a valid, reliable measure of software delivery performance.

A successful measure of software delivery performance should have two key characteristics.

  • First, it should focus on a global outcome to ensure teams aren’t pitted against each other. The classic example is rewarding developers for throughput and operations for stability: this is a key contributor to the “wall of confusion” in which development throws poor quality code over the wall to operations, and operations puts in place painful change management processes as a way to inhibit change.
  • Second, our measure should focus on outcomes, not output: it shouldn’t reward people for putting in large amounts of busywork that doesn’t actually help achieve organizational goals.

THE FOUR KEY METRICS 

When measuring software delivery performance, we settled on four key metrics seen in high-performing technology organizations:

  1. delivery lead time
  2. deployment frequency
  3. mean time to restore service
  4. change fail rate

1. Delivery Lead Time

The elevation of lead time as a metric is a key element of Lean theory. Lead time is the time it takes to go from a customer making a request to the request being satisfied.

However, in the context of product development, where we aim to satisfy multiple customers in ways they may not anticipate, there are two parts to lead time: the time it takes to design and validate a product or feature, and the time to deliver the feature to customers. In the design part of the lead time, it’s often unclear when to start the clock, and often there is high variability. 

The delivery part of the lead time, by contrast, is the time it takes for work to be implemented, tested, and delivered. It is easier to measure and has lower variability. The table below shows the distinction between these two domains.

| Product Design and Development | Product Delivery (Build, Testing, Deployment) |
| --- | --- |
| Create new products and services that solve customer problems using hypothesis-driven delivery, modern UX, design thinking. | Enable fast flow from development to production and reliable releases by standardizing work, and reducing variability and batch sizes. |
| Feature design and implementation may require work that has never been performed before. | Integration, test, and deployment must be performed continuously as quickly as possible. |
| Estimates are highly uncertain. | Cycle times should be well-known and predictable. |
| Outcomes are highly variable. | Outcomes should have low variability. |

Shorter product delivery lead times are better since they enable faster feedback on what we are building and allow us to course correct more rapidly.

Short lead times are also important when there is a defect or outage and we need to deliver a fix rapidly and with high confidence.
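As a rough illustration, delivery lead time can be computed from two timestamps per change. The sketch below assumes the clock starts when a change is committed and stops when it is running in production; that boundary, the function names, and the sample data are all assumptions made for the example, not a prescribed implementation. It reports the median rather than the mean so that one slow change does not dominate the result.

```python
from datetime import datetime, timedelta
from statistics import median


def delivery_lead_times(changes):
    """Return one lead time per change: deployed_at minus committed_at."""
    return [deployed - committed for committed, deployed in changes]


def median_lead_time(changes):
    """Median delivery lead time across all changes, as a timedelta."""
    seconds = [lt.total_seconds() for lt in delivery_lead_times(changes)]
    return timedelta(seconds=median(seconds))


# Hypothetical (committed_at, deployed_at) pairs for three changes.
changes = [
    (datetime(2021, 2, 1, 9, 0), datetime(2021, 2, 1, 15, 0)),   # 6 hours
    (datetime(2021, 2, 2, 10, 0), datetime(2021, 2, 3, 10, 0)),  # 24 hours
    (datetime(2021, 2, 3, 8, 0), datetime(2021, 2, 3, 10, 0)),   # 2 hours
]

print(median_lead_time(changes))  # 6:00:00
```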

2. Deployment Frequency

The second metric to consider is batch size. Reducing batch size is another central element of the Lean paradigm—indeed, it was one of the keys to the success of the Toyota production system. Reducing batch sizes reduces cycle times and variability in flow, accelerates feedback, reduces risk and overhead, improves efficiency, increases motivation and urgency, and reduces costs and schedule growth.

However, in software, batch size is hard to measure and communicate across contexts as there is no visible inventory. Therefore, we settled on deployment frequency as a proxy for batch size since it is easy to measure and typically has low variability.

By “deployment” we mean a software deployment to production or to an app store. A release (the changes that get deployed) will typically consist of multiple version control commits, unless the organization has achieved a single-piece flow where each commit can be released to production (a practice known as continuous deployment). 
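One simple way to track deployment frequency is to count production deployments inside an observation window and normalize to a weekly rate. This is a minimal sketch: the function name, the choice of a per-week rate, and the sample dates are assumptions for illustration only.

```python
from datetime import date


def deployments_per_week(deploy_dates, window_start, window_end):
    """Average number of deployments per week within an inclusive date window."""
    window_days = (window_end - window_start).days + 1
    if window_days <= 0:
        raise ValueError("window_end must not be before window_start")
    # Only count deployments that fall inside the observation window.
    in_window = sum(1 for d in deploy_dates if window_start <= d <= window_end)
    return in_window / (window_days / 7)


# Hypothetical production deployment dates; the last one is outside the window.
deploys = [
    date(2021, 2, 1), date(2021, 2, 3), date(2021, 2, 4),
    date(2021, 2, 8), date(2021, 2, 10), date(2021, 2, 12),
    date(2021, 2, 20),
]

rate = deployments_per_week(deploys, date(2021, 2, 1), date(2021, 2, 14))
print(rate)  # 3.0
```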

3. Mean Time to Restore

Traditionally, reliability is measured as time between failures. However, in modern software products and services, which are rapidly changing complex systems, failure is inevitable, so the key question becomes: How quickly can service be restored? We measure this as how long it generally takes to restore service for the primary application or service a team works on when a service incident (e.g., unplanned outage, service impairment) occurs.
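Given a log of incidents, time to restore can be averaged directly. The sketch below assumes each incident is recorded as a pair of timestamps, when service was impaired and when it was restored; the record shape, function name, and sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Mean duration between incident start and service restoration."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (restored - started for started, restored in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical (started_at, restored_at) pairs for three incidents.
incidents = [
    (datetime(2021, 2, 1, 9, 0), datetime(2021, 2, 1, 9, 30)),    # 30 minutes
    (datetime(2021, 2, 5, 14, 0), datetime(2021, 2, 5, 15, 30)),  # 90 minutes
    (datetime(2021, 2, 9, 22, 0), datetime(2021, 2, 9, 23, 0)),   # 60 minutes
]

print(mean_time_to_restore(incidents))  # 1:00:00
```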

4. Change Fail Rate

Finally, a key metric when making changes to systems is what percentage of changes to production (including, for example, software releases and infrastructure configuration changes) fail.

In the context of Lean, this is the same as percent complete and accurate for the product delivery process, and is a key quality metric. Look at what percentage of changes for the primary application or service you work on either result in degraded service or subsequently require remediation (e.g., lead to service impairment or outage, or require a hotfix, a rollback, a fix-forward, or a patch).
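Change fail rate reduces to a simple ratio: failed changes divided by all changes to production. The following sketch assumes each change is tagged with two flags that mirror the definition above; the `Change` type, its field names, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A change to production (hypothetical record shape)."""
    degraded_service: bool = False      # led to impairment or outage
    required_remediation: bool = False  # needed a hotfix, rollback, fix-forward, or patch


def change_fail_rate(changes):
    """Fraction of changes that degraded service or required remediation."""
    if not changes:
        raise ValueError("at least one change is required")
    failed = sum(
        1 for c in changes if c.degraded_service or c.required_remediation
    )
    return failed / len(changes)


# 20 hypothetical changes, 3 of which count as failures.
changes = [Change() for _ in range(17)] + [
    Change(degraded_service=True),
    Change(required_remediation=True),
    Change(degraded_service=True, required_remediation=True),
]

rate = change_fail_rate(changes)
print(f"{rate:.0%}")  # 15%
```

A change that both degrades service and needs remediation is counted once, since the metric is a percentage of changes rather than a count of failure events.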

To learn more about the four key metrics of high-performing technology organizations and how to use them in your organization, continue reading in Accelerate: The Science of Lean Software and DevOps and the State of DevOps Reports.

 

About The Authors

Nicole Forsgren

Nicole Forsgren, PhD, is Partner at Microsoft Research. She is author of the Shingo Publication Award-winning book Accelerate, and is best known as lead investigator on the largest DevOps studies to date. She has been a successful entrepreneur (with an exit to Google), professor, performance engineer, and sysadmin. Her work has been published in several peer-reviewed journals.


Gene Kim

Award-winning CTO, researcher, and author.


Jez Humble

Jez Humble is the coauthor of Accelerate and The DevOps Handbook.
