Part 2 of 4: Where to Start with DevOps Series
Last week, I highlighted how to identify a value stream to which you can begin to apply DevOps principles and patterns in the post “Selecting Which Value Stream to Start With.”
Our next step in our DevOps transformation is to gain a sufficient understanding of how value is delivered to the customer, by evaluating what work is performed, by whom, and what steps we can take to improve flow.
Let us begin by identifying the teams supporting our value stream.
In value streams of any complexity, no one person knows all the work that must be performed in order to create value for the customer—especially since the required work must be performed by many different teams, often far removed from each other on the organization charts, geographically, or by incentives.
As a result, after we select a service for our DevOps initiative, we must identify all the members of the value stream who are responsible for working together to create value for the customers being served.
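In general, this includes:
- Product owner: the internal voice of the business that defines the next set of functionality in the service
- Development: the team responsible for developing application functionality in the service
- QA: the team responsible for ensuring that feedback loops exist so the service functions as desired
- Operations: the team responsible for maintaining the production environment and helping ensure that required service levels are met
- Infosec: the team responsible for securing systems and data
- Release managers: the people responsible for managing and coordinating the production deployment and release processes
- Technology executives or value stream manager: the person responsible for ensuring that the value stream meets or exceeds customer and organizational requirements from start to finish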
Once we identify our value stream members, our next step is to gain a concrete understanding of how work is performed and to document it in the form of a value stream map.
In our value stream, work likely begins with the product owner, in the form of a customer request or the formulation of a business hypothesis.
Some time later, this work is accepted by Development, where features are implemented in code and checked in to our version control repository. Builds are then integrated, tested in a production-like environment, and finally deployed into production, where they (ideally) create value for our customer.
In many traditional organizations, this value stream will consist of hundreds, if not thousands, of steps, requiring work from hundreds of people.
Because documenting any value stream map likely requires multiple days, we may conduct a multi-day workshop, where we assemble all the key constituents and remove them from the distractions of their daily work.
Our goal is not to document every step and associated minutiae, but to sufficiently understand the areas in our value stream that are jeopardizing our goals of fast flow, short lead times, and reliable customer outcomes. Ideally, we have assembled those people with the authority to change their portion of the value stream.
Using the full breadth of knowledge brought by the teams engaged in the value stream, we should focus our investigation and scrutiny on either:
A) The places where work must wait weeks or even months, such as getting production-like environments, change approval processes, or security review processes
B) The places where significant rework is generated or received
Our first pass at documenting our value stream should consist only of high-level process blocks.
Typically, even for complex value streams, groups can create a diagram with 5 to 15 process blocks within a few hours. Each process block should include the lead time and process time for a work item to be processed, as well as the percent complete and accurate (%C/A), as measured by the downstream consumers of the output.
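To make these per-block measurements concrete, here is a minimal Python sketch that rolls up a high-level value stream map. It assumes each block records lead time and process time in hours and %C/A as a fraction between 0 and 1. The summary metrics it computes (total lead time, total process time, activity ratio, and rolled %C/A, the product of every block's %C/A) are common value stream mapping conventions rather than something this post prescribes, and the block names and numbers are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ProcessBlock:
    name: str
    lead_time_hours: float        # elapsed time from when work arrives until it is handed off
    process_time_hours: float     # hands-on time actually spent on the work item
    pct_complete_accurate: float  # %C/A reported by the downstream consumer (0.0 to 1.0)


def summarize(blocks):
    """Roll up a high-level value stream map into a few summary metrics."""
    total_lead = sum(b.lead_time_hours for b in blocks)
    total_process = sum(b.process_time_hours for b in blocks)
    return {
        "total_lead_time_hours": total_lead,
        "total_process_time_hours": total_process,
        # Share of elapsed time in which someone is actually working on the item.
        "activity_ratio": total_process / total_lead,
        # Roughly, the share of items that pass every block without rework
        # (this treats the blocks as independent).
        "rolled_pct_complete_accurate": prod(b.pct_complete_accurate for b in blocks),
        # Block where work waits longest (lead time minus process time).
        "longest_wait": max(
            blocks, key=lambda b: b.lead_time_hours - b.process_time_hours
        ).name,
    }


# Hypothetical four-block map; all numbers are illustrative only.
value_stream = [
    ProcessBlock("Product owner request", 80, 4, 0.60),
    ProcessBlock("Development", 120, 40, 0.80),
    ProcessBlock("Test in production-like environment", 160, 16, 0.90),
    ProcessBlock("Deploy to production", 40, 2, 0.95),
]
summary = summarize(value_stream)
print(summary)
```

With these made-up numbers the sketch reports a 400-hour lead time containing only 62 hours of actual work (an activity ratio of about 15%) and a rolled %C/A of about 41%, and it flags the test block as the place where work waits longest, which is exactly the kind of area the post suggests scrutinizing first.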
Once we identify the metric we want to improve (for example, a process block with a long lead time or a low %C/A), we should perform the next level of observations and measurements to better understand the problem.
Next, we construct an idealized, future value stream map, which serves as a target condition to achieve by some date (usually 3 to 12 months out).
Leadership helps define this future state and then guides and enables the team to brainstorm hypotheses and countermeasures to achieve the desired improvement to that state, perform experiments to test those hypotheses, and interpret the results to determine whether the hypotheses were correct. The teams keep repeating and iterating, using any new learnings to inform the next experiments.
With these metrics in place, the next step in our DevOps transformation will be to create a dedicated transformation team.
One of the inherent challenges with initiatives such as DevOps transformations is that they are inevitably in conflict with ongoing business operations.
Part of this is a natural outcome of how successful businesses evolve. An organization that has been successful for any extended period of time (years, decades, or even centuries) has created mechanisms to perpetuate the practices that made it successful, such as product development, order administration, and supply chain operations.
While this is good for preserving the status quo, we often need to change how we work to adapt to changing conditions in the marketplace. Doing this requires disruption and innovation, which puts us at odds with the groups currently responsible for daily operations and with the internal bureaucracies, who will almost always win.
So how do we move forward?
Based on their research, Dr. Vijay Govindarajan and Dr. Chris Trimble, both faculty members of Dartmouth College’s Tuck School of Business, assert that organizations need to create a dedicated transformation team that is able to operate outside of the rest of the organization that is responsible for daily operations (they call these the “dedicated team” and the “performance engine,” respectively).
First and foremost, we will hold this dedicated team accountable for achieving a clearly defined, measurable, system-level result (e.g., reduce the deployment lead time from “code committed into version control to successfully running in production” by 50%).
In order to execute such an initiative, we do the following:
- Assign members of the dedicated team to be solely allocated to the DevOps transformation efforts (as opposed to “maintain all your current responsibilities, but spend 20% of your time on this new DevOps thing.”)
- Select team members who are generalists, with skills across a wide variety of domains.
- Select team members who have longstanding and mutually respectful relationships with the rest of the organization.
- Create a separate physical space for the dedicated team, if possible, to maximize communication flow within the team and to create some isolation from the rest of the organization.
If possible, we will also free the transformation team from many of the rules and policies that restrict the rest of the organization.
After all, established processes are a form of institutional memory—we need the dedicated team to create the new processes and learnings required to generate our desired outcomes, creating new institutional memory.
Creating a dedicated team is good not only for the team but also for the performance engine. By creating a separate team, we give it the space to experiment with new practices while protecting the rest of the organization from the potential disruptions and distractions those experiments bring.
Next, we will agree on a shared goal for the organization to move towards.
One of the most important parts of any improvement initiative is to define a measurable goal with a clearly defined deadline, between 6 months and 2 years in the future.
It should require considerable effort but still be achievable, and it should create obvious value for the organization as a whole and for our customers.
These goals and the time frame should be agreed upon by the executives and known to everyone in the organization.
We also want to limit the number of these types of initiatives going on simultaneously to prevent us from overly taxing the organizational change management capacity of leaders and the organization.
Examples of improvement goals might include:
- Reduce the percentage of the budget spent on product support and unplanned work by 50%.
- Ensure lead time from code check-in to production release is one week or less for 95% of changes.
- Ensure releases can always be performed during normal business hours with zero downtime.
- Integrate all the required information security controls into the deployment pipeline to satisfy all compliance requirements.
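A goal like the lead-time target above is easy to check automatically once check-in and release timestamps are recorded. The following minimal Python sketch assumes we can collect a (checked_in_at, released_at) pair for each change; the function name and the data are hypothetical, and it simply counts the fraction of changes released within one week rather than computing a formal percentile.

```python
from datetime import datetime, timedelta

ONE_WEEK = timedelta(days=7)


def lead_time_goal_met(changes, target=ONE_WEEK, required_fraction=0.95):
    """Return (goal_met, fraction_within_target) for (checked_in_at, released_at) pairs."""
    lead_times = [released - checked_in for checked_in, released in changes]
    if not lead_times:
        # No releases means no evidence that the goal is being met.
        return False, 0.0
    within = sum(1 for lead_time in lead_times if lead_time <= target)
    fraction = within / len(lead_times)
    return fraction >= required_fraction, fraction


# Hypothetical history: 19 changes released in two days, one that took ten days.
start = datetime(2024, 1, 1, 9, 0)
history = [(start, start + timedelta(days=2))] * 19 + [(start, start + timedelta(days=10))]
met, fraction = lead_time_goal_met(history)
print(met, fraction)  # True 0.95
```

Counting changes against a fixed threshold keeps the check transparent and easy to explain on a dashboard; a team that prefers percentiles could report the 95th-percentile lead time instead, as long as everyone agrees on how it is calculated.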
Once the high-level goal is made clear, teams should decide on a regular cadence to drive the improvement work. Like product development work, we want transformation work to be done in an iterative, incremental manner.
A typical iteration will be in the range of 2 to 4 weeks. For each iteration, the teams should agree on a small set of goals that generate value and make some progress toward the long-term goal.
At the end of each iteration, teams should review their progress and set new goals for the next iteration.
Lastly, in order to be able to know if we are making progress toward our goal, it’s essential that everyone in the organization knows the current state of work.
There are many ways to make the current state visible, but what’s most important is that the information we display is up to date, and that we constantly revise what we measure to make sure it’s helping us understand progress toward our current target conditions.
With our goal in place, the next step will be for the organization to keep its improvement planning horizons short, just as if we were in a startup doing product or customer development.
Our initiative should strive to generate measurable improvements or actionable data within weeks (or in the worst case, months).
By keeping our planning horizons and iteration intervals short, we achieve the following:
- Flexibility and the ability to re-prioritize and replan quickly
- A shorter delay between work expended and improvement realized, which strengthens our feedback loop and makes it more likely to reinforce desired behaviors; when improvement initiatives are successful, they encourage more investment
- Faster learning generated from the first iteration, meaning faster integration of our learnings into the next iteration
- Reduction in activation energy to get improvements
- Quicker realization of improvements that make meaningful differences in our daily work
- Less risk that our project is killed before we can generate any demonstrable outcomes
The final responsibility of the dedicated transformation team will be to reserve 20% of cycles for non-functional requirements and for reducing technical debt.
A problem common to any process improvement effort is how to properly prioritize it—after all, organizations that need it most are those that have the least amount of time to spend on improvement. This is especially true in technology organizations because of technical debt.
Organizations struggling with financial debt that only make interest payments, and never reduce the loan principal, may eventually find themselves unable to service even the interest payments.
Organizations that don’t pay down technical debt can find themselves so burdened with daily workarounds for problems left unfixed that they can no longer complete any new work.
In other words, they are now only making the interest payment on their technical debt.
We will actively manage this technical debt by ensuring that we invest at least 20% of all Development and Operations cycles in refactoring, automation work, architecture, and non-functional requirements (NFRs, sometimes referred to as the “ilities”), such as maintainability, manageability, scalability, reliability, testability, deployability, and security.
By dedicating 20% of our cycles so that Dev and Ops can create lasting countermeasures to the problems we encounter in our daily work, we ensure that technical debt doesn’t impede our ability to quickly and safely develop and operate our services in production. Relieving workers of the added pressure of technical debt can also reduce burnout.
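One way to make the 20% reservation concrete is to enforce it when filling each iteration from a prioritized backlog. The minimal Python sketch below assumes work is sized in story points and that each backlog item is flagged as either feature work or NFR/technical-debt work; the `WorkItem` type, the `plan_iteration` function, and the sample backlog are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    points: int
    is_nfr_or_debt: bool  # refactoring, automation, architecture, or other NFR work


def plan_iteration(backlog, capacity, reserve_percent=20):
    """Fill an iteration from a priority-ordered backlog.

    At least reserve_percent of capacity (rounded up) is held back for
    NFR / technical-debt items, and feature work may never use that reserve.
    NFR items may also spill into general capacity once the reserve is spent.
    """
    reserved = -(-capacity * reserve_percent // 100)  # integer ceiling division
    reserve_left = reserved
    general_left = capacity - reserved
    plan = []
    for item in backlog:
        if item.is_nfr_or_debt and item.points <= reserve_left:
            reserve_left -= item.points
        elif item.points <= general_left:
            general_left -= item.points
        else:
            continue  # does not fit this iteration; stays in the backlog
        plan.append(item)
    return plan


# Hypothetical backlog, already sorted by priority.
backlog = [
    WorkItem("Feature A", 13, False),
    WorkItem("Feature B", 8, False),
    WorkItem("Automate deployment pipeline", 5, True),
    WorkItem("Feature C", 5, False),
    WorkItem("Refactor billing module", 3, True),
]
plan = plan_iteration(backlog, capacity=30)
print([item.title for item in plan])
```

If too few NFR items are ready, this sketch leaves part of the reserve unused rather than handing it back to feature work; whether to do that is a policy choice each team would make for itself.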
With our dedicated team in place, we can use tools to reinforce desired behavior.
As Christopher Little, a software executive and one of the earliest chroniclers of DevOps, observed, “Anthropologists describe tools as a cultural artifact. Any discussion of culture after the invention of fire must also be about tools.”
Similarly, in the DevOps value stream, we use tools to reinforce our culture and accelerate desired behavior changes.
One goal is that our tooling reinforces that Development and Operations not only have shared goals, but have a common backlog of work, ideally stored in a common work system and using a shared vocabulary, so that work can be prioritized globally.
By doing this, Development and Operations may end up creating a shared work queue, instead of each silo using a different one (e.g., Development uses JIRA while Operations uses ServiceNow).
A significant benefit of this is that when production incidents are shown in the same work systems as development work, it will be obvious when ongoing incidents should halt other work, especially when we have a kanban board.
Another benefit of having Development and Operations using a shared tool is a unified backlog, where everyone prioritizes improvement projects from a global perspective, selecting work that has the highest value to the organization or most reduces technical debt.
As we identify technical debt, we add it to our prioritized backlog if we can’t address it immediately. For issues that remain unaddressed, we can use our “20% time for non-functional requirements” to fix the top items from our backlog.
An amazing dynamic is created when we have a mechanism that allows any team member to quickly help other team members, or even people outside their team—the time required to get information or needed work can go from days to minutes.
In addition, because everything is being recorded, we may not need to ask someone else for help in the future—we simply search for it.
With these practices in place, we can enable dedicated transformation teams to rapidly iterate and experiment to improve performance. We can also make sure that we allocate a sufficient amount of time for improvement, fixing known problems and architectural issues, including our non-functional requirements.
In our next post we will look at how the organization or our teams can affect how we perform our work.