This post has been adapted from Accelerate: The Science of Lean Software and DevOps by Nicole Forsgren, PhD, Jez Humble, and Gene Kim.
There are many frameworks and methodologies that aim to improve the way we build software products and services. We wanted to discover what works and what doesn’t in a scientific way, starting with a definition of what “good” means in this context. This post presents the four key metrics to measure software delivery performance.
MEASURING SOFTWARE DELIVERY PERFORMANCE
Measuring software delivery performance is hard, in part because, unlike in manufacturing, the inventory is invisible. Furthermore, the way we break down work is relatively arbitrary, and the design and delivery activities—particularly in the Agile software development paradigm—happen simultaneously. Indeed, it’s expected that we will change and evolve our design based on what we learn by trying to implement it. So our first step must be to define a valid, reliable measure of software delivery performance.
A successful measure of software delivery performance should have two key characteristics.
- First, it should focus on a global outcome to ensure teams aren’t pitted against each other. The classic example is rewarding developers for throughput and operations for stability: this is a key contributor to the “wall of confusion,” in which development throws poor-quality code over the wall to operations, and operations puts in place painful change management processes to inhibit change.
- Second, our measure should focus on outcomes, not output: it shouldn’t reward people for putting in large amounts of busywork that doesn’t actually help achieve organizational goals.
THE FOUR KEY METRICS
When measuring software delivery performance, we settled on four key metrics seen in high-performing technology organizations:
- delivery lead time
- deployment frequency
- mean time to restore service
- change fail rate
1. Delivery Lead Time
The elevation of lead time as a metric is a key element of Lean theory. Lead time is the time it takes to go from a customer making a request to the request being satisfied.
However, in the context of product development, where we aim to satisfy multiple customers in ways they may not anticipate, there are two parts to lead time: the time it takes to design and validate a product or feature, and the time to deliver the feature to customers. In the design part of the lead time, it’s often unclear when to start the clock, and often there is high variability.
However, the delivery part of the lead time—the time it takes for work to be implemented, tested, and delivered—is easier to measure and has a lower variability. The table below shows the distinction between these two domains.
| Product Design and Development | Product Delivery (Build, Testing, Deployment) |
| --- | --- |
| Create new products and services that solve customer problems using hypothesis-driven delivery, modern UX, and design thinking. | Enable fast flow from development to production and reliable releases by standardizing work and reducing variability and batch sizes. |
| Feature design and implementation may require work that has never been performed before. | Integration, test, and deployment must be performed continuously, as quickly as possible. |
| Estimates are highly uncertain. | Cycle times should be well-known and predictable. |
| Outcomes are highly variable. | Outcomes should have low variability. |
Shorter product delivery lead times are better, since they enable faster feedback on what we are building and allow us to course-correct more rapidly.
Short lead times are also important when there is a defect or outage and we need to deliver a fix rapidly and with high confidence.
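To make this concrete, delivery lead time for a change can be computed as the interval between the commit timestamp and the production deployment timestamp. Below is a minimal Python sketch (not from the book); the `changes` records and their field names are hypothetical placeholders for whatever your version control and deployment tooling actually records.

```python
from datetime import datetime
from statistics import median

# Hypothetical data: one record per deployed change, with the time the code
# was committed and the time it was running in production.
changes = [
    {"committed_at": datetime(2018, 3, 1, 9, 0), "deployed_at": datetime(2018, 3, 1, 15, 30)},
    {"committed_at": datetime(2018, 3, 2, 11, 0), "deployed_at": datetime(2018, 3, 5, 10, 0)},
    {"committed_at": datetime(2018, 3, 6, 16, 0), "deployed_at": datetime(2018, 3, 7, 9, 15)},
]

# Delivery lead time per change: commit -> deployed to production.
lead_times = [c["deployed_at"] - c["committed_at"] for c in changes]

# The median is a useful summary here, since lead time distributions are
# typically skewed by a few slow changes.
print("Median delivery lead time:", median(lead_times))
```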
2. Deployment Frequency
The second metric to consider is batch size. Reducing batch size is another central element of the Lean paradigm—indeed, it was one of the keys to the success of the Toyota production system. Reducing batch sizes reduces cycle times and variability in flow, accelerates feedback, reduces risk and overhead, improves efficiency, increases motivation and urgency, and reduces costs and schedule growth.
However, in software, batch size is hard to measure and communicate across contexts as there is no visible inventory. Therefore, we settled on deployment frequency as a proxy for batch size since it is easy to measure and typically has low variability.
By “deployment” we mean a software deployment to production or to an app store. A release (the changes that get deployed) will typically consist of multiple version control commits, unless the organization has achieved single-piece flow, where each commit can be released to production (a practice known as continuous deployment).

Note that deployment frequency could instead be measured as mean time between deployments, so that smaller is better for each of the four metrics.
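As a rough sketch of how this can be measured, assuming you have a log of production deployment timestamps (the dates below are made up), both deployment frequency and the mean time between deployments can be derived directly from it:

```python
from datetime import date

# Hypothetical log of production deployments over roughly four weeks.
deploy_dates = sorted(date(2018, 3, d) for d in (1, 2, 5, 7, 8, 12, 14, 16, 20, 22, 26))

window_days = (deploy_dates[-1] - deploy_dates[0]).days + 1
per_week = len(deploy_dates) / (window_days / 7)
print(f"{len(deploy_dates)} deployments in {window_days} days (~{per_week:.1f}/week)")

# The equivalent "smaller is better" framing: mean time between deployments.
gaps = [(b - a).days for a, b in zip(deploy_dates, deploy_dates[1:])]
print(f"Mean time between deployments: {sum(gaps) / len(gaps):.1f} days")
```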
3. Mean Time to Restore
Traditionally, reliability is measured as time between failures. However, in modern software products and services, which are rapidly changing complex systems, failure is inevitable, so the key question becomes: how quickly can service be restored? We therefore measured how long it generally takes to restore service for the primary application or service a team works on when a service incident (e.g., an unplanned outage or service impairment) occurs.
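As an illustration (again a sketch, not the book’s methodology), mean time to restore can be computed from incident records that capture when an incident began and when service was restored; the records below are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (incident started, service restored).
incidents = [
    (datetime(2018, 3, 3, 14, 0), datetime(2018, 3, 3, 14, 45)),
    (datetime(2018, 3, 10, 2, 30), datetime(2018, 3, 10, 6, 0)),
    (datetime(2018, 3, 21, 9, 15), datetime(2018, 3, 21, 9, 40)),
]

restore_times = [restored - started for started, restored in incidents]
mttr = sum(restore_times, timedelta()) / len(restore_times)
print("Mean time to restore:", mttr)
```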
4. Change Fail Rate
Finally, a key metric when making changes to systems is what percentage of changes to production (including, for example, software releases and infrastructure configuration changes) fail.
In the context of Lean, this is the same as percent complete and accurate for the product delivery process, and it is a key quality metric. Look at what percentage of changes to the primary application or service you work on either result in degraded service or subsequently require remediation (e.g., lead to a service impairment or outage, or require a hotfix, rollback, fix-forward, or patch).
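As a formula, this is simply failed changes divided by total changes. A minimal sketch, assuming each production change is recorded with a flag marking whether it required remediation (the data is hypothetical):

```python
# Hypothetical change log: True marks a change that degraded service or
# required remediation (hotfix, rollback, fix-forward, or patch).
change_outcomes = [False, False, True, False, False, False, True, False]

change_fail_rate = sum(change_outcomes) / len(change_outcomes)
print(f"Change fail rate: {change_fail_rate:.0%}")  # 2 of 8 changes -> 25%
```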
To learn more about the four key metrics of high-performing technology organizations and how to use them in your organization, continue reading in Accelerate: The Science of Lean Software and DevOps and the State of DevOps Reports.