In today’s digital economy, where software is central to the business, the overall quality of that software is more important than ever before. Together with a solid market need and business plan, top quality software leads to customer satisfaction, revenue, and profitability, and the best designs can even allow an organization to more easily enter new markets.
On the other hand, even with the most solid business plan, poor quality software can be one of the fastest roads to failure. Given the great importance of software quality, leaders cannot simply hope for the best. Just as businesses measure market trends, sales pipelines, inventories, fulfillment, and more, they must also measure the quality of their software.
As an industry, we have been attempting to assess software quality for quite some time. Today’s continuous delivery pipelines almost always include steps that run static-code analyses. Project managers habitually track and make decisions based on lines of code. Whether or not a team is practicing test-driven development, the value of test coverage is well understood. But we suggest that these relatively common practices at best provide only the weakest indication of the overall picture of the quality of software and at worst are misleading, giving only the illusion of quality.
There are a host of “ilities” that software development teams strive for, such as reliability, maintainability, scalability, agility, and serviceability, and it’s not difficult to draw a connection between these “ilities” and business outcomes. We know from the State of DevOps Report published by DORA that high-performing organizations have lower lead times and increased frequency of software deployments.
Clearly, such results are directly related to agility and even scalability, particularly as it relates to team structures—autonomous teams can bring new ideas to market far faster than those that must navigate complex bureaucracies. Lowering mean time to recovery (MTTR) reflects maintainability. And there is ample evidence that confirms the importance of secure software, with deficiencies in this area having catastrophic effects on consumer confidence and the business’s bottom line.
We know that we are striving for these things: reliability, agility, security, etc., but how do we know we have achieved them? There are several challenges.
Some of these things are difficult to measure. How will we know when we have maintainable code? Any software developer charged with taking over an existing codebase will tell you that the mere existence of documentation does not necessarily make their job any easier, and its value may, in fact, be inversely proportional to how voluminous it is.
Some of these things might be measurable, but the data may not be available in a timeframe that allows for it to easily drive improvement. For example, measuring the number of software outages gives
an indication of reliability; however, assessing whether particular changes to the code move the needle in a positive direction will not be possible until said software has been running in production for quite some time.
Still other outcomes may be influenced by several factors requiring an aggregation of different measures. For instance, agility is influenced by software architecture (Do you have a monolith or microservices?) as well as organizational structures (Do you have autonomous, two-pizza teams responsible for their components, or do you depend heavily on ticket-based processes?).
In this paper, we suggest that there are a set of measurable leading indicators for these desirable outcomes. That is, improvements in the leading indicators are reflective of improvements in the outcomes. We have categorized these leading indicators into two different buckets:
- Measures against the code: These include some familiar attributes, such as results coming from static-code analysis tools, but we add to this list with some less widely understood elements, such as the use of feature flags.
- Measures of software development and operations processes: For example, how long do integration tests take, and how often do you run them? Do you do any type of progressive delivery—A/B testing, canary deployments, etc.?
In addition, we will also point out when we feel common measures are misleading.