This post was adapted from the Measuring Software Quality white paper, written by Cornelia Davis, Stephen Magill, Rosalind Radcliffe and James Wickett.
In today’s digital economy, where software is central to the business, the overall quality of that software is more important than ever before. Together with a solid market need and business plan, top-quality software leads to customer satisfaction, revenue, and profitability, and the best designs can even allow an organization to more easily enter new markets.
On the other hand, even with the most solid business plan, poor-quality software can be one of the fastest roads to failure.
Given the great importance of software quality, leaders cannot simply hope for the best. Just as businesses measure market trends, sales pipelines, inventories, fulfillment, and more, they must also measure the quality of their software.
Current State of Measuring Code Quality
As an industry, we have been attempting to assess software quality for quite some time. Today’s continuous delivery pipelines almost always include steps that run static-code analyses. Project managers habitually track and make decisions based on lines of code.
Whether or not a team is practicing test-driven development, the value of test coverage is well understood. But we suggest that these relatively common practices at best provide only the weakest indication of the overall picture of the quality of software and at worst are misleading, giving only the illusion of quality.
There are a host of “ilities” that software development teams strive for, such as reliability, maintainability, scalability, agility, and serviceability, and it’s not difficult to draw a connection between these “ilities” and business outcomes.
We know from the State of DevOps Report published by DORA that high-performing organizations have lower lead times and increased frequency of software deployments.
Clearly, such results are directly related to agility and even scalability, particularly as they relate to team structures: autonomous teams can bring new ideas to market far faster than those that must navigate complex bureaucracies. A lower mean time to recovery (MTTR), in turn, reflects maintainability.
And there is ample evidence that confirms the importance of secure software, with deficiencies in this area having catastrophic effects on consumer confidence and the business’s bottom line.
We know that we are striving for these things: reliability, agility, security, etc., but how do we know we have achieved them? There are several challenges.
Challenges to Understanding and Measuring Code Quality
Some of these things are difficult to measure.
How will we know when we have maintainable code? Any software developer charged with taking over an existing codebase will tell you that the mere existence of documentation does not necessarily make their job any easier, and its value may, in fact, be inversely proportional to how voluminous it is.
Some of these things might be measurable, but the data may not be available in a timeframe that allows for it to easily drive improvement.
For example, measuring the number of software outages gives an indication of reliability; however, assessing whether particular changes to the code move the needle in a positive direction will not be possible until said software has been running in production for quite some time.
Still other outcomes may be influenced by several factors requiring an aggregation of different measures.
For instance, agility is influenced by software architecture (Do you have a monolith or microservices?) as well as organizational structures (Do you have autonomous, two-pizza teams responsible for their components, or do you depend heavily on ticket-based processes?).
Measurable Leading Indicators
We suggest there is a set of measurable leading indicators for these desirable outcomes. That is, improvements in the leading indicators reflect improvements in the outcomes. We have categorized these leading indicators into two buckets:
- Measures against the code: These include some familiar attributes, such as results coming from static-code analysis tools, but we add to this list with some less widely understood elements, such as the use of feature flags.
- Measures of software development and operations processes: For example, how long do integration tests take, and how often do you run them? Do you do any type of progressive delivery—A/B testing, canary deployments, etc.?
We will also point out where we feel common measures are misleading.
A Framework for Measuring Code Quality
Software quality is not a destination; it is a journey. It is essential that you approach it through a practice of continual improvement. We suggest a framework that includes at least the following six elements.
1) Run Experiments
Feedback loops have been established as an essential tenet in many areas of software development, and they should be applied to your strategy for managing your software quality.
Choose outcomes you would like to improve, form hypotheses about leading indicators that could enable gains in these outcomes, gather data, and assess whether your actions are leading to improvements.
This is where you can assess agility as a combination of software architecture, release practices, team structures, and so on. You should also correlate those measures that take longer to gather (number of outages over a month) with those that are easier to attain (use of feature flags).
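As a minimal sketch of that correlation step, the Python below computes a Pearson coefficient between a quickly gathered indicator and a slower outcome. The function name, the monthly figures, and the choice of Pearson correlation are all illustrative assumptions rather than something the white paper prescribes, and a strong correlation on so few data points is only a hint for the next experiment, not proof of cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, one entry per month (invented for illustration only):
# share of changes shipped behind a feature flag, and outages in that month.
flag_usage = [0.10, 0.25, 0.40, 0.55, 0.70, 0.80]
outages = [9, 8, 6, 5, 3, 2]

r = pearson(flag_usage, outages)
print(f"correlation between flag usage and outages: {r:.2f}")
```

A value near -1 here would support the hypothesis that more feature-flag use goes along with fewer outages; a value near 0 would suggest looking for a different leading indicator.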
2) Establish Targets
Because the leading indicators are by definition measurable, defining unambiguous targets is not only possible but essential. Getting integration tests to run in under an hour, for example, is something everyone on a team can understand and apply efforts directly toward. Improvements can be clearly seen and celebrated.
3) Establish Guardrails
Sometimes improvement in one area can come at the expense of another area. For example, tests might be simplified in a manner that negatively impacts test coverage in order to get integration tests to run in under one hour.
To guard against this, it is useful to explicitly set targets for what should remain unchanged during an improvement period. A better integration-testing effort could be phrased, “get integration tests to run in under an hour without decreasing integration-test coverage.”
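One way to make a target and its guardrail explicit is to evaluate them together, so that a faster test suite only counts as success if coverage has held. The sketch below assumes a hypothetical `Snapshot` record and `evaluate` function; the field names and sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One measurement of the integration-test suite (hypothetical fields)."""
    integration_test_minutes: float
    integration_coverage_pct: float


def evaluate(baseline, current, target_minutes=60.0, coverage_tolerance_pct=0.0):
    """Check the speed target and the coverage guardrail together."""
    # Target: the suite must run in under the agreed number of minutes.
    target_met = current.integration_test_minutes < target_minutes
    # Guardrail: coverage must not drop below the baseline (minus any tolerance).
    guardrail_held = (
        current.integration_coverage_pct
        >= baseline.integration_coverage_pct - coverage_tolerance_pct
    )
    return {
        "target_met": target_met,
        "guardrail_held": guardrail_held,
        "success": target_met and guardrail_held,
    }


baseline = Snapshot(integration_test_minutes=95.0, integration_coverage_pct=72.0)
faster_but_thinner = Snapshot(integration_test_minutes=55.0, integration_coverage_pct=64.0)
faster_and_intact = Snapshot(integration_test_minutes=55.0, integration_coverage_pct=72.5)

print(evaluate(baseline, faster_but_thinner))
print(evaluate(baseline, faster_and_intact))
```

In this sketch the first candidate meets the one-hour target but fails the guardrail, which is exactly the trade-off the guardrail is meant to surface.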
4) Update Goals and Metrics
Your software-quality improvement initiative should be constantly evolving.
For established measures, targets should be reevaluated regularly, or broken into a series of gradual improvements. Using the previous example of getting integration tests to run in under an hour, the first target may simply be a 25% improvement in speed.
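The arithmetic behind such gradual targets can be sketched as follows. This assumes that "a 25% improvement in speed" means a 25% reduction in run time at each step, and it uses a hypothetical baseline of 160 minutes; the function name and parameters are illustrative.

```python
def stepped_targets(baseline_minutes, final_target_minutes, step_reduction=0.25):
    """Return intermediate run-time targets, each cutting the previous by a fixed share.

    The last step is clamped so it never overshoots the final target.
    """
    if not 0 < step_reduction < 1:
        raise ValueError("step_reduction must be between 0 and 1")
    if baseline_minutes <= 0 or final_target_minutes <= 0:
        raise ValueError("durations must be positive")
    targets = []
    current = baseline_minutes
    while current > final_target_minutes:
        current = max(current * (1 - step_reduction), final_target_minutes)
        targets.append(round(current, 1))
    return targets


# Hypothetical baseline: the suite currently takes 160 minutes.
print(stepped_targets(160, 60))  # prints [120.0, 90.0, 67.5, 60.0]
```

Each intermediate value gives the team a nearer milestone to reach and celebrate before the one-hour goal comes into view.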
These practices contribute to several of the leading indicators discussed earlier. Of course, the metrics themselves should also shift over time. Start simple, add to them gradually, and create new aggregates, all in response to what you learn through your experiments.
5) Get Clean/Stay Clean
At the outset, you may face a situation that requires a great deal of remediation, with a focus on certain metrics (e.g., the leading indicators for reliability) and a significant investment of time. And while improvements in quality will reduce the emergency nature of your quality initiative, we caution against adopting a mindset that teams can achieve a “done” state.
Once clean, you must continue measuring your quality with the goal of remaining clean. Team responsibilities may change (e.g., a tiger team may be dissolved as developers assume full responsibility for maintaining quality), and processes may be adapted (e.g., code reviews may be done by a broader set of individuals, not only the most senior engineers); however, measures should remain in place and be regularly audited.
6) Don’t Forget the User
Customer satisfaction is the ultimate goal and should regularly be assessed against the measures you believe will lead to it. It is easy to get caught up in “metrics for metrics’ sake,” or metrics that are interesting from a technical perspective (lines of code covered by tests, production events captured, etc.), but the only metrics that truly matter are those that, when improved, also result in a measurable improvement in customer outcomes.
Continue with Test Metrics
Code can be measured in a variety of ways. It can be analyzed statically to look for bugs, compute structural complexity metrics, or examine dependencies. It can be executed to collect performance metrics, look for crashes, or compute test coverage. It can be monitored in production to collect information on usage, failures, and responsiveness.
Each of these metrics is important and, in line with our guidance to establish guardrails and targets, there is good reason to include multiple metrics. In the full white paper on Measuring Software Quality, we describe metrics that are currently in use and motivate the need for additional, higher-level metrics.