This post has been adapted from Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, PhD, Jez Humble, and Gene Kim.
There are many frameworks and methodologies that aim to improve the way we build software products and services. We wanted to discover what works and what doesn’t in a scientific way, starting with a definition of what “good” means in this context. This post presents the four key metrics to measure software delivery performance.
MEASURING SOFTWARE DELIVERY PERFORMANCE
Measuring software delivery performance is hard, in part because, unlike in manufacturing, the inventory is invisible. Furthermore, the way we break down work is relatively arbitrary, and the design and delivery activities—particularly in the Agile software development paradigm—happen simultaneously. Indeed, it’s expected that we will change and evolve our design based on what we learn by trying to implement it. So our first step must be to define a valid, reliable measure of software delivery performance.
A successful measure of software delivery performance should have two key characteristics.
- First, it should focus on a global outcome to ensure teams aren’t pitted against each other. The classic example is rewarding developers for throughput and operations for stability: this is a key contributor to the “wall of confusion,” in which development throws poor-quality code over the wall to operations, and operations puts in place painful change management processes to inhibit change.
- Second, our measure should focus on outcomes, not output: it shouldn’t reward people for putting in large amounts of busywork that doesn’t actually help achieve organizational goals.
THE FOUR KEY METRICS
When measuring software delivery performance, we settled on four key metrics seen in high-performing technology organizations:
- delivery lead time
- deployment frequency
- mean time to restore service
- change fail rate
1. Delivery Lead Time
The elevation of lead time as a metric is a key element of Lean theory. Lead time is the time it takes to go from a customer making a request to the request being satisfied.
However, in the context of product development, where we aim to satisfy multiple customers in ways they may not anticipate, there are two parts to lead time: the time it takes to design and validate a product or feature, and the time to deliver the feature to customers. In the design part of the lead time, it’s often unclear when to start the clock, and often there is high variability.
However, the delivery part of the lead time—the time it takes for work to be implemented, tested, and delivered—is easier to measure and has a lower variability. The table below shows the distinction between these two domains.
| Product Design and Development | Product Delivery (Build, Testing, Deployment) |
| --- | --- |
| Create new products and services that solve customer problems using hypothesis-driven delivery, modern UX, and design thinking. | Enable fast flow from development to production and reliable releases by standardizing work and reducing variability and batch sizes. |
| Feature design and implementation may require work that has never been performed before. | Integration, test, and deployment must be performed continuously, as quickly as possible. |
| Estimates are highly uncertain. | Cycle times should be well-known and predictable. |
| Outcomes are highly variable. | Outcomes should have low variability. |
Shorter product delivery lead times are better, since they enable faster feedback on what we are building and allow us to course-correct more rapidly.
Short lead times are also important when there is a defect or outage and we need to deliver a fix rapidly and with high confidence.
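To make this concrete, delivery lead time for a change can be computed as the interval between the commit timestamp and the production deployment timestamp. Below is a minimal Python sketch (not from the book); the `changes` records and their field names are hypothetical placeholders for whatever your version control and deployment tooling actually records.

```python
from datetime import datetime
from statistics import median

# Hypothetical data: one record per deployed change, with the time the code
# was committed and the time it was running in production.
changes = [
    {"committed_at": datetime(2018, 3, 1, 9, 0), "deployed_at": datetime(2018, 3, 1, 15, 30)},
    {"committed_at": datetime(2018, 3, 2, 11, 0), "deployed_at": datetime(2018, 3, 5, 10, 0)},
    {"committed_at": datetime(2018, 3, 6, 16, 0), "deployed_at": datetime(2018, 3, 7, 9, 15)},
]

# Delivery lead time per change: commit -> deployed to production.
lead_times = [c["deployed_at"] - c["committed_at"] for c in changes]

# The median is a useful summary here, since lead time distributions are
# typically skewed by a few slow changes.
print("Median delivery lead time:", median(lead_times))
```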
2. Deployment Frequency
The second metric to consider is batch size. Reducing batch size is another central element of the Lean paradigm—indeed, it was one of the keys to the success of the Toyota production system. Reducing batch sizes reduces cycle times and variability in flow, accelerates feedback, reduces risk and overhead, improves efficiency, increases motivation and urgency, and reduces costs and schedule growth.
However, in software, batch size is hard to measure and communicate across contexts as there is no visible inventory. Therefore, we settled on deployment frequency as a proxy for batch size since it is easy to measure and typically has low variability.
By “deployment” we mean a software deployment to production or to an app store. A release (the changes that get deployed) will typically consist of multiple version control commits, unless the organization has achieved single-piece flow, where each commit can be released to production (a practice known as continuous deployment).

Note that deployment frequency could instead be measured as mean time between deployments, so that smaller is better for each of the four metrics.
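As a rough sketch of how this can be measured, assuming you have a log of production deployment timestamps (the dates below are made up), both deployment frequency and the mean time between deployments can be derived directly from it:

```python
from datetime import date

# Hypothetical log of production deployments over roughly four weeks.
deploy_dates = sorted(date(2018, 3, d) for d in (1, 2, 5, 7, 8, 12, 14, 16, 20, 22, 26))

window_days = (deploy_dates[-1] - deploy_dates[0]).days + 1
per_week = len(deploy_dates) / (window_days / 7)
print(f"{len(deploy_dates)} deployments in {window_days} days (~{per_week:.1f}/week)")

# The equivalent "smaller is better" framing: mean time between deployments.
gaps = [(b - a).days for a, b in zip(deploy_dates, deploy_dates[1:])]
print(f"Mean time between deployments: {sum(gaps) / len(gaps):.1f} days")
```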
3. Mean Time to Restore
Traditionally, reliability is measured as time between failures. However, in modern software products and services, which are rapidly changing complex systems, failure is inevitable, so the key question becomes: how quickly can service be restored? We therefore measured how long it generally takes to restore service for the primary application or service a team works on when a service incident (e.g., an unplanned outage or service impairment) occurs.
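As an illustration (again a sketch, not the book’s methodology), mean time to restore can be computed from incident records that capture when an incident began and when service was restored; the records below are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (incident started, service restored).
incidents = [
    (datetime(2018, 3, 3, 14, 0), datetime(2018, 3, 3, 14, 45)),
    (datetime(2018, 3, 10, 2, 30), datetime(2018, 3, 10, 6, 0)),
    (datetime(2018, 3, 21, 9, 15), datetime(2018, 3, 21, 9, 40)),
]

restore_times = [restored - started for started, restored in incidents]
mttr = sum(restore_times, timedelta()) / len(restore_times)
print("Mean time to restore:", mttr)
```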
4. Change Fail Rate
Finally, a key metric when making changes to systems is what percentage of changes to production (including, for example, software releases and infrastructure configuration changes) fail.
In the context of Lean, this is the same as percent complete and accurate for the product delivery process, and it is a key quality metric. Look at what percentage of changes to the primary application or service you work on either result in degraded service or subsequently require remediation (e.g., lead to a service impairment or outage, or require a hotfix, rollback, fix-forward, or patch).
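As a formula, this is simply failed changes divided by total changes. A minimal sketch, assuming each production change is recorded with a flag marking whether it required remediation (the data is hypothetical):

```python
# Hypothetical change log: True marks a change that degraded service or
# required remediation (hotfix, rollback, fix-forward, or patch).
change_outcomes = [False, False, True, False, False, False, True, False]

change_fail_rate = sum(change_outcomes) / len(change_outcomes)
print(f"Change fail rate: {change_fail_rate:.0%}")  # 2 of 8 changes -> 25%
```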
To learn more about the four key metrics of high-performing technology organizations and how to use them in your organization, continue reading in Accelerate: The Science of Lean Software and DevOps and the State of DevOps Reports.