October 4, 2022
This post has been adapted from the 2022 DevOps Enterprise Forum guidance paper Responding to Novel Security Vulnerabilities by Randy Shoup, Tapabrata Pal, Michael Nygard, Chris Hill, and Dominica DeGrandis.
We explored the ongoing threat of novel vulnerabilities in our first post. But different companies handled the Log4Shell vulnerability in different ways. For such a sweeping issue, it’s no surprise that some organizations dealt with Log4Shell neatly while others floundered. In this section, we’ll examine various paths some (anonymized) organizations could have taken in addressing Log4Shell.
Detection
12/9/21, evening: A junior Java developer notices a tweet about the vulnerability while winding down from a hectic day. They forward the tweet to their team lead immediately. The lead texts a screenshot to the development manager and sends an email to the Information Security Officer (ISO) assigned to the business unit.
12/10/21, morning: The ISO forwards the email to the internal Red Team. Someone from the Red Team receives the email and immediately starts reviewing a document called “Daily Threat Review” that is provided by a third-party security firm. The report mentions the Log4j vulnerability but does not include a CVE (Common Vulnerabilities and Exposures) score. The Red Team member escalates to leadership immediately. A conference call is scheduled for 10:30 a.m. with the third-party vendor. In that call, the vendor advises the cybersecurity leadership (VP+, minus CISO) that a zero-day vulnerability was announced that morning.
Declaration
12/10/21, 11:00 a.m.: The CISO is notified and informs the CIO immediately, who sends an email to all business unit CIOs at lunch with a call to action. In the afternoon, the cybersecurity team schedules an emergency meeting with all senior technical leaders from all business units. In the meeting, the technical leaders create a rough action plan:
Create a new web application firewall (WAF) rule immediately.
Report Log4j use across all applications.
Create plan to fix external-facing applications immediately.
Create plan to fix internal-facing applications next.
Mitigation
12/10/21, night: Cybersecurity team deploys WAF rule change in production.
Remediation
12/11/21 to 12/12/21 (weekend): A team member creates an internal Slack channel to keep developers and leaders connected as the cybersecurity team compiles a report of impacted applications. People share all kinds of blog posts, news reports, tweets, LinkedIn posts, and more from social media about the yet-unnamed vulnerability. The cyber platform team finds a way to extract reports of a security scanning tool that was in use for scanning external-facing applications only. No one knows clearly how to report on affected internal-facing applications. People write custom scripts to “crawl” internal Git repositories and look for Log4j dependency entries in POM or Gradle files. Various development teams start upgrading Log4j to 2.15.0. A handful of applications are found to use very old versions of Log4j. Team members express a lot of confusion around what needs to be fixed.
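The custom "crawl" scripts described above might look something like the following. This is a minimal sketch, not what any of these teams actually ran: it assumes the repositories are already checked out under a local directory, and the names `find_log4j_versions` and `scan_checkout` are hypothetical. It only catches `log4j-core` declared directly in a `pom.xml` or Gradle build file, so transitive dependencies and bundled JARs would be missed.

```python
import re
from pathlib import Path

# Build files this sketch knows how to inspect (an assumption, not a complete list).
BUILD_FILES = ("pom.xml", "build.gradle", "build.gradle.kts")

# Maven: <artifactId>log4j-core</artifactId> followed directly by <version>...</version>.
POM_PATTERN = re.compile(
    r"<artifactId>\s*log4j-core\s*</artifactId>\s*<version>\s*([^<\s]+)\s*</version>"
)
# Gradle: coordinates such as 'org.apache.logging.log4j:log4j-core:2.14.1'.
GRADLE_PATTERN = re.compile(r"org\.apache\.logging\.log4j:log4j-core:([\w.\-]+)")


def find_log4j_versions(text, filename):
    """Return the log4j-core version strings declared in one build file's text."""
    if filename == "pom.xml":
        return POM_PATTERN.findall(text)
    if filename in ("build.gradle", "build.gradle.kts"):
        return GRADLE_PATTERN.findall(text)
    return []


def scan_checkout(root):
    """Walk a directory of checked-out repositories and list (path, version) pairs."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.name in BUILD_FILES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for version in find_log4j_versions(text, path.name):
                findings.append((str(path), version))
    return findings
```

A version managed through a Maven property shows up as the placeholder text (for example `${log4j.version}`) rather than a number, which is one reason reports built this way tend to need manual follow-up.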
12/13/21, morning: Many vendors start reporting their impact. Some acknowledge that they are affected and will be releasing a patch for their product, while some deny any issue related to Log4j. Some vendors say nothing. Dev teams for internal-facing applications work through code changes and the build, test, and deploy process. Fortunately, the CI/CD (continuous integration/continuous delivery) pipelines are functioning properly, which speeds things up. The company’s emergency change process allows devs to skip long-running SAST (static application security testing) and functional and performance tests.
12/13/21, afternoon: The cybersecurity team notices newly published versions of Log4j: 2.16.0 and 2.12.2. The Slack channel is overwhelmed as people voice opinions about staying on 2.15.0 or upgrading to 2.16.0. In the evening, the CISO decides to upgrade to 2.16.0. However, there’s still no way to identify affected internal-facing applications.
12/14/21: People report back to the Slack channel about their progress in fixing the affected apps. The cybersecurity team creates virtual war rooms to manage, report, and track all activities. Someone reports in the Slack channel that 2.16.0 has a new vulnerability event: CVE-2021-45046. The cybersecurity team calls up external security consultants for advice. External consultants advise that the new CVE has a low score and can be ignored. Many associates cancel their holiday vacations. There is still no scalable solution to identify internal applications. Someone runs an internal Git repo crawler that brings down the Git platform, impacting ongoing fixes for external applications. About half of applications are fixed in production.
12/17/21: While many developers are already exhausted by the rush to fix their applications, CVE-2021-45046 (the low-score CVE from 12/14) gets a new score of 9.0 (critical). This incites another round of emergency meetings, emails, and Slack notifications. Teams that are in the process of deploying new applications with Log4j 2.16.0 hesitate: social media posts indicate yet another version of Log4j will be released soon. In the meantime, a developer repurposes a custom, homegrown scanner to scan for the 2.14.1 Log4j library in internal applications, but poor performance of that scanner hinders much progress.
12/18/21: Log4j 2.17.0 comes out. Developers experience the same confusion, meetings, reports, agony, and frustrations. Teams whose applications already had fixes in production go back to work on them. Those with fixes in the release pipeline pause and go back to upgrading again.
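The target version moved three times in this timeline (2.15.0, then 2.16.0, then 2.17.0). One way to make that churn cheaper is to keep the required minimum as a single parameter and re-run the same triage each time it changes. The sketch below is illustrative only: `parse_version`, `needs_upgrade`, and `triage` are hypothetical names, it handles plain numeric versions, and it does not model separate maintenance lines such as 2.12.2.

```python
def parse_version(version):
    """Turn '2.14.1' into (2, 14, 1), padding short versions such as '2.17' with zeros."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def needs_upgrade(found, minimum):
    """True when the found version is older than the currently required minimum."""
    return parse_version(found) < parse_version(minimum)


def triage(findings, minimum):
    """Split (application, version) pairs into those to fix and those already current."""
    to_fix, current = [], []
    for app, version in findings:
        if needs_upgrade(version, minimum):
            to_fix.append((app, version))
        else:
            current.append((app, version))
    return to_fix, current
```

When a new release arrives, only the `minimum` argument changes; applications that were current under the old target reappear in the `to_fix` list automatically.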
12/23/21: The first report of impacted internal applications is published. Developers find a lot of false positives—applications that appear to be vulnerable at first but are ultimately determined to be safe. Many internal applications that have not been touched for a long time are found to use older versions of Log4j that reached their end of life years ago and are now in need of further investigation. The cybersecurity team announces new deadlines: (a) All external-facing applications must upgrade by 12/31/21. (b) All internal-facing applications must upgrade by 1/31/2022.
1/4/22: It’s the first official working day of 2022—but many developers worked through the holidays and the weekend. Most external-facing applications have been upgraded. Many internal-facing application teams are struggling. Some internal applications have been fixed but not all. People are burnt out. Many associates are on vacation after working through the holidays. High demands on CI/CD platforms bring out platform instability. Many vendors are still working on patches.
12/9/21, evening: An Info Security engineer sees the same tweet early in the evening. The engineer immediately contacts the Info Security Platform team that manages their software composition analysis (SCA) solution, which continuously monitors the company’s code repository Git activities and triggers alerts for new vulnerabilities. The platform support team sends an inquiry via email to the SCA tools vendor, copying the Info Security Platform team senior management.
12/10/21, early morning: The SCA tools vendor notifies Info Security that a new Log4j vulnerability will cause mass alerts from their tool in most Java-based application source code repositories.
12/10/21, morning working hours: The internal Slack channel for Info Security support starts receiving queries from development teams. Developers start seeing alerts (GitHub issues) in their repositories that describe the vulnerability as well as instructions on how to fix it. The InfoSec Platform team uses the SCA solution’s reporting capability to generate a list of all impacted repositories as a baseline. The team notifies InfoSec and the CI/CD Platform Engineering Team’s senior leadership.
12/10/21, before lunch: InfoSec deploys a new WAF rule to production.
12/10/21, noon: InfoSec sends automated communications via email and Slack messages to owners of the impacted repositories. External-facing applications have the first priority in releasing the fix. Developers start fixing code and trigger the automated CI/CD pipeline.
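The automated, prioritized communication described above could be sketched roughly as follows. Everything here is an assumption for illustration: the `ImpactedRepo` record, its fields, and the message wording are hypothetical, and a real SCA tool would supply its own report format. The sketch only shows the ordering rule, with external-facing repositories notified first.

```python
from dataclasses import dataclass


@dataclass
class ImpactedRepo:
    # Hypothetical record built from an SCA report; field names are assumptions.
    name: str
    owner_email: str
    external_facing: bool


def notification_order(repos):
    """Order repositories so external-facing ones come first, then by name."""
    return sorted(repos, key=lambda repo: (not repo.external_facing, repo.name))


def build_message(repo, fixed_version):
    """Compose a plain-text notice for the owner of one impacted repository."""
    if repo.external_facing:
        priority = "Priority 1 (external-facing)"
    else:
        priority = "Priority 2 (internal-facing)"
    return (
        f"To: {repo.owner_email}\n"
        f"{priority}: upgrade Log4j in {repo.name} to {fixed_version}."
    )
```

Sending the resulting messages by email or Slack is left out on purpose, since that part depends entirely on the organization's own tooling.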
12/11/21 to 12/12/21: Developers upgrade all external-facing applications in production.
12/13/21: Developers begin the work of fixing internal-facing applications and deploying code to production via an automated CI/CD pipeline. InfoSec generates a new report on the SCA tool to confirm all external-facing applications are fixed. A few applications did not complete full production deployment and need follow-up, but developers quickly tackle these issues. InfoSec receives an urgent alert from the SCA tool vendor about a new Log4j version, 2.16.0. This frustrates developers and InfoSec engineers. However, due to good automation, alerting, and reporting, developers go back to upgrading again.
12/18/21: Another version of Log4j, version 2.17.0, comes out. Team members repeat their previous actions with alacrity.
This example went a lot better than the previous one, didn’t it? By December 19th, most applications (internal and external facing) have been fixed, most of them for the third time, just in time for holiday vacations. A handful of internal applications (mostly nonproduction batch, custom homegrown development tools, etc.) are fixed in January. Some developers delay their vacations a few days, but overall, there is no major impact on personal time. That said, some vendors still lag in sending fixes, and patches continue over the holidays and into January 2022.
Let’s look at one more example scenario, one that (we hope) no other organization will ever experience.
12/10/2021, morning: A normal working day. The Enterprise Threat Monitor team receives Log4j vulnerability news from various feeds that they subscribe to. They escalate to the Enterprise Risk office by sending a note to the director of that organization.
12/10/2021, noon: The Director of Enterprise Risk schedules an urgent meeting with the Enterprise IT Asset Management team. The Enterprise Security team and Enterprise Audit Team are also invited, along with a few senior leaders from a few “DevOps Organizations.”
12/10/2021, afternoon: The Enterprise Risk office opens a new Risk in their Risk Management system and assigns one of the Vice Presidents of Enterprise Security as the owner. The Risk describes the Log4j vulnerability and states that all Java applications are exposed to it.
Mitigation and Remediation
12/13/2021, morning: Some developers hear “rumbles” from their friends in other companies about a new vulnerability. The Enterprise Asset Management team uses their configuration management database (CMDB) to create a list of applications that have Java listed in the “technology used” column in the CMDB. The list is exported into a spreadsheet. The spreadsheet also captures owner information, such as the owner of the application, the application architect, and production support for the application. The team uploads the spreadsheet to an internal shared drive. The Enterprise Risk Management team sends out an email to all the application owners and attaches the link to the aforementioned spreadsheet. A Project Manager from a DevOps team is assigned to track progress on “closing” the Risk and schedules daily meetings for the rest of 2021 for all impacted owners to report on progress.
12/14/2021, morning: The first daily meeting occurs and has more than one hundred virtual attendees. The meeting becomes chaotic. Many application owners say that their applications do not use Java technologies and are not impacted. Many say they don’t know why they were called to the meeting or what they’re supposed to do. Many complain that they’re in the middle of pre-holiday releases and cannot stop their already-planned work.
12/14/2021, noon: An urgent meeting occurs with the Project Manager, the Director of Enterprise Risk, the Director of Enterprise Audit, some development VPs, and the Enterprise Security VP. The Director of Enterprise Audit shows concern that the spreadsheet created from CMDB has incorrect data—if some data is wrong, it’s likely the spreadsheet did not capture all valid data. The Development VPs complain about a lack of clear direction on what needs to be fixed and how. The Enterprise Risk Director declares that the Risk has to be closed by the end of the year. There is disagreement on all fronts. The project status is Red on day 1—four days after the Enterprise Threat Monitor team was notified of the vulnerability.
12/15/2021, morning: The second daily meeting. This meeting also has over one hundred attendees. Some senior developers attend the meeting too. The meeting starts with the Project Manager reiterating that the project status is Red and talking about the Risk opened and the end-of-year deadline to close the Risk. One senior developer jumps in and says that she has documented the fix process, which she can share with all present. The Director of Enterprise Security urges that the process be reviewed and approved by the Security Team before it is shared among all developers.
1/31/2022: Nearly eight weeks later and a month after the end-of-year deadline, it is still unclear as to what needs to be fixed. Some legacy application source code may have been archived during SCM tool migration by mistake, but those applications still run as is. The Risk is still not closed. Enterprise Audit has opened issues against CMDB data, listing many discrepancies.
—
Next in our series, we’ll look at several organizational response lessons learned from these examples.