
October 3, 2022

Novel Security Vulnerabilities: The Ongoing Threat

By IT Revolution

This post has been adapted from the 2022 DevOps Enterprise Forum guidance paper Responding to Novel Security Vulnerabilities by Randy Shoup, Tapabrata Pal, Michael Nygard, Chris Hill, and Dominica DeGrandis.


Despite large and increasing investments in IT security, enterprises are still ill-prepared to respond to novel security vulnerabilities like Log4Shell. Companies tend to invest in tools and processes optimized for “fighting the last war.” That is, we naturally create defenses against known vulnerabilities and attacks we experienced in the past. The last several years have shown that new classes of vulnerability continue to be discovered, including:

  • CPU cache and branch prediction side-channel attacks
  • supply-chain compromise by attackers
  • components modified by their authors with political motivations
  • widely used packages subverted by new maintainers or naive additions

These vulnerabilities challenge our organizations in several ways. First, the scope and impact of novel vulnerabilities may be unclear or difficult to determine. Our configuration management information may be incomplete, outdated, or fragmented across operational and development support systems. Each class of vulnerability requires dynamic reteaming involving different parts of the organization. For example, Heartbleed required close collaboration between security staff, OS administration, and operations. Log4Shell, on the other hand, needed the involvement of internal development teams, platform teams, and external vendors.

Second, novel vulnerabilities require large-scale redeployment of infrastructure and application software. Organizations that have not completely automated their operating system, application, or container deployment will struggle with the scale of (re)deployment required. Organizations that review changes manually (via change approval boards or other human reviews) will struggle with the volume of changes needed. In effect, these vulnerabilities act as a denial-of-service attack on the systems of software change management itself.

Third, the scale of the threats makes our response critically urgent. Each vulnerability can hugely affect our customers, brand, and business. When faced with one of these vulnerabilities, our teams must work overtime to mitigate and remediate the threats. To the organization, this is an opportunity cost. Efforts that could go into economically productive work must instead go into work that, at best, keeps us in the same competitive and economic position. To our people, it means lost evenings, weekends, and holidays. Responding to these vulnerabilities takes a heavy emotional and psychological toll on our employees and their families.

We have every reason to expect that these will not be the last new vulnerability types to be uncovered, and cybersecurity insurance costs are likely to rise accordingly. Procurement questions will increasingly cover your upgrade life cycle as security vulnerabilities continue to appear. As they do, the cost asymmetry between attacker and defender continues to worsen. The rigorous processes that protect us from existing, well-known vulnerabilities are necessary but insufficient. We also need to become better at responding to novel attacks and vulnerabilities by creating adaptive capacity within our organizations.

The Vulnerability Life Cycle

In our experience, organizations largely follow the same sequence of events when responding to a vulnerability. But first, let’s define the term. A vulnerability is broader than a single security incident. It affects many (potentially all) systems simultaneously but is not necessarily the site of an active intrusion. A vulnerability may also take a long time to remediate.

The Log4Shell vulnerability was like this. Some instances of that vulnerability were exploited, and these became incidents. The Log4Shell vulnerability itself was much broader, comprising both the incidents and the potential for incidents. The vulnerability life cycle consists of a series of actions and reactions (as illustrated in Figure 1), each of which is detailed below.

Unknown

This early part of the life cycle is when we begin to notice signs of a potential vulnerability. In the case of Log4Shell, this stage involved social media posts describing the vulnerability. This stage is characterized by uncertainty about the nature of the vulnerability and whether an organization needs to take action. At some point, the signal rises beyond a threshold where some people in the organization begin to look seriously at the vulnerability and their exposure to it. This leads to the next phase: the detection of the vulnerability.

Detection

The detection event should trigger triage and assessment of the vulnerability. It can also trigger reactions of shock and denial among the staff, the initial stages of the Kübler-Ross Change Curve, which we will discuss in detail later. After the vulnerability is detected, the organization must begin the action of assessing the risk.

Assessment

Next, the organization determines whether and to what degree the vulnerability affects them. The unknowns have shifted from external to internal: How many of our systems might be affected? Which ones are affected? As we will see later, this stage can be protracted if individuals in the company disagree about the degree of risk. Part of the hesitancy is based on cost—it can be expensive to actively respond to a threat. But once the assessment is complete, the next step is declaring what, if any, action the organization will take.
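The two assessment questions (how many systems, and which ones) amount to a query against whatever component inventory the organization has. The following is a minimal sketch, assuming a flat inventory of (system, package, version) records. The `Component` type, the system names, and the version bounds passed in are all hypothetical illustrations; in practice the affected range comes from the published advisory, and real version strings need a proper version parser.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    system: str   # hypothetical system name
    package: str  # dependency name as recorded in the inventory
    version: str  # dotted numeric version, e.g. "2.14.1"


def parse_version(text: str) -> tuple[int, ...]:
    # Handles plain dotted numeric versions only (an assumption of this sketch).
    return tuple(int(part) for part in text.split("."))


def affected_systems(
    inventory: list[Component], package: str, first_bad: str, first_fixed: str
) -> dict[str, list[str]]:
    """Map each system to the affected versions of `package` it deploys.

    A version is treated as affected when first_bad <= version < first_fixed.
    """
    low, high = parse_version(first_bad), parse_version(first_fixed)
    hits: dict[str, list[str]] = {}
    for component in inventory:
        if component.package != package:
            continue
        if low <= parse_version(component.version) < high:
            hits.setdefault(component.system, []).append(component.version)
    return hits


# Hypothetical inventory data, for illustration only.
INVENTORY = [
    Component("billing-api", "log4j-core", "2.14.1"),
    Component("billing-api", "jackson-databind", "2.13.0"),
    Component("search-indexer", "log4j-core", "2.17.1"),
    Component("legacy-reports", "log4j-core", "2.8.2"),
]

# Illustrative bounds; take the real range from the advisory.
hits = affected_systems(INVENTORY, "log4j-core", "2.0", "2.15.0")
for system, versions in sorted(hits.items()):
    print(system, versions)
```

A query like this is only as good as the inventory behind it. If configuration data is incomplete or fragmented, the result understates exposure, which is one reason this stage can drag on.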

Declaration

Initiating the formal declaration event often requires a high level of decision-making authority. Reaching the appropriate authority with actionable information can also be a source of delay. Further, some individuals’ emotional reactions of denial and frustration can also cause hesitancy and delay (more on this when we discuss the Kübler-Ross Change Curve). From declaration, the organization must now implement an active response.

Active Response

Active response includes the early actions to triage the vulnerability. Triage includes assessing the scope of vulnerable systems and products as well as their impact on customers. Triage leads to prioritized steps to first mitigate and then remediate the vulnerability. The active response phase involves more people in the organization. As each cohort of people becomes involved, they also go through the unproductive stages of shock, denial, frustration, and depression (part of the Kübler-Ross Change Curve). Meanwhile, they’re being asked to get up to speed on the situation quickly and catch up to those who are already in the know. Be warned: a company might experience communication and collaboration challenges when part of the organization is working on mitigation while another part is still looking for reasons their system should not be affected (a denial reaction).
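Triage as described above, weighing scope against customer impact to produce prioritized steps, can be sketched as a simple ordering. This is a hedged illustration, not a prescribed method: the `AffectedSystem` fields, the 1-to-3 impact scale, and the rule that internet-facing systems come first are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AffectedSystem:
    name: str
    internet_facing: bool    # assumed proxy for exposure
    customer_impact: int     # hypothetical scale: 1 (low) to 3 (high)
    mitigated: bool = False  # True once a temporary workaround is in place


def triage_order(systems: list[AffectedSystem]) -> list[AffectedSystem]:
    """Return unmitigated systems, most urgent first.

    Ordering (an assumption of this sketch): internet-facing systems first,
    then higher customer impact, then name for a stable tie-break.
    """
    pending = [s for s in systems if not s.mitigated]
    return sorted(
        pending,
        key=lambda s: (not s.internet_facing, -s.customer_impact, s.name),
    )


# Hypothetical systems, for illustration only.
SYSTEMS = [
    AffectedSystem("batch-reports", internet_facing=False, customer_impact=1),
    AffectedSystem("checkout", internet_facing=True, customer_impact=3),
    AffectedSystem("partner-portal", internet_facing=True, customer_impact=2),
    AffectedSystem("intranet-wiki", internet_facing=False, customer_impact=2,
                   mitigated=True),
]

queue = [s.name for s in triage_order(SYSTEMS)]
print(queue)
```

Writing the ordering rule down, even one this crude, gives newly involved teams something concrete to argue with rather than reopening whether their system is affected at all.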

Mitigation and Remediation

An organization’s active response typically leads to one of two results (or a combination): mitigation or remediation of the vulnerability. Mitigation is temporary. It usually involves workarounds to prevent active exploitation of the vulnerability until the vulnerability itself is fully addressed. Remediation is the permanent solution to remove the vulnerability completely. We’ll detail the activities that take place during the mitigation and remediation phase later in this paper.

Retrospective

Many organizations’ response stops when remediation is completed, but this is a lost opportunity. The highest-performing and most secure organizations treat large-scale vulnerabilities as unplanned investments in learning about security. Retrospective techniques such as blameless post-mortems can help organizations examine what went well during the event and what went poorly, and look for opportunities to improve for the next vulnerability.

As John Allspaw says, “Incidents are unplanned investments, and they are also opportunities. Your challenge is to maximize the ROI on the sunk cost. To do that, the organization has to invest in really exploring and understanding these events, and share that understanding broadly and over time.” When mitigation and remediation are complete, do not skip over this necessary retrospective phase.

Reflection and Learning

Related to the retrospective is the action of reflecting and learning from an incident or vulnerability. There are many ways to reflect and learn. Consider the following questions:

  • How was the vulnerability introduced?
  • How might it have been contained?
  • How might damage have been controlled?
  • Was the response appropriate?
  • Was the initial triage accurate? If not, why not?
  • Did the deployment pipelines help or hinder remediation?

As in any retrospective, nothing is gained by scapegoating developers who made decisions a decade ago (as was often the case for Log4Shell).

Adaptation

After the retrospective and reflection and learning phases, it’s time to actually implement adaptation actions. The vulnerability might be closed, but it is still fresh in the participants’ minds. This is the ideal time to improve tooling, processes, system data, design and testing practices, and deployment pipelines. Only once adaptation measures have been acted upon should an organization consider a vulnerability or incident “complete.”
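For teams that track vulnerabilities in a ticketing or tracking tool, the phases described above can be modeled as an ordered enumeration. This is only a sketch under a simplifying assumption: it treats the life cycle as strictly linear, whereas the text says organizations "largely" follow the sequence. The names `Phase`, `next_phase`, and `is_complete` are hypothetical.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Phase(Enum):
    UNKNOWN = 1
    DETECTION = 2
    ASSESSMENT = 3
    DECLARATION = 4
    ACTIVE_RESPONSE = 5
    MITIGATION_AND_REMEDIATION = 6
    RETROSPECTIVE = 7
    REFLECTION_AND_LEARNING = 8
    ADAPTATION = 9


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the last one."""
    if current is Phase.ADAPTATION:
        return None
    return Phase(current.value + 1)


def is_complete(finished: set[Phase]) -> bool:
    """A vulnerability counts as complete only once adaptation is done."""
    return Phase.ADAPTATION in finished


# Remediation alone does not close out the vulnerability.
done = {p for p in Phase if p.value <= Phase.MITIGATION_AND_REMEDIATION.value}
print(is_complete(done))
print(next_phase(Phase.MITIGATION_AND_REMEDIATION))
```

Encoding the completion rule this way makes it harder to close a ticket at remediation and quietly skip the retrospective, learning, and adaptation phases.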

Log4Shell Vulnerability Examples

Different companies handled the Log4Shell vulnerability in different ways. For such a sweeping issue, it’s no surprise that some organizations dealt with Log4Shell neatly while others floundered. In the full paper, we examine various paths that some (anonymized) organizations could have taken in addressing Log4Shell.

In our next post, we’ll explore lessons learned along the way.

