The following is an excerpt from a presentation by Mark Schwartz, Enterprise Strategist at AWS, titled “Napoleon, DevOps, and Delivering Business Value.”
You can watch the video of the presentation, which was originally delivered at the 2018 DevOps Enterprise Summit in Las Vegas.
I am an enterprise strategist at AWS, which means I’m part of a small team of ex-CIOs and CTOs of large enterprises. Our job is to talk with our customers (other large enterprises) who are in the midst of a transformation, moving to the Cloud, adopting DevOps, and so on, and who are hitting impediments of some sort.
Usually the impediments are non-technical: cultural change, organizational change, financial issues, or bureaucracy they have to overcome. We try to give them strategies based on our own experience and on what we’ve seen other customers do.
Before I joined AWS
I was the CIO at US Citizenship and Immigration Services (USCIS), which is part of the Department of Homeland Security. Before that, I was CIO of a company in San Francisco and CEO of a software company in Oakland. What happened, I think, is that one day I was reading a newspaper article about all of the IT problems at Homeland Security, and, being the problem solver that I am, my immediate reaction was, “I can fix that. Let me at it. We can do this.”
Somehow the stars aligned and I wound up at USCIS. About a year ago it suddenly hit me: we were done. Everything was fixed. The government was running like clockwork, and it was probably time to move on to the next opportunity, so I wound up at AWS.
Along the way, I published a couple of books and I am just finishing off my third book, called War and Peace in IT. As the title would suggest, it’s a sequel to Tolstoy’s novel and I’m trying to bring out his critical messages on DevOps, which I thought required a sequel to really articulate.
A Story from War and Peace
This will set the stage for what I want to say about some of the impediments that large enterprises face when they’re adopting DevOps.
In the novel, Napoleon’s invasion of Russia is one of the major plotlines. Napoleon is a great military leader, and as his army advances into Russia, the Russians keep withdrawing. His army pushes in deeper and the Russians withdraw a little more. Finally comes the big climactic scene: the Battle of Borodino, which goes something like this…
The day before the battle, you have Napoleon surveying the ground with his direct reports, giving instructions, and setting the scene for the battle. Of course, each of the generals thinks they know better than Napoleon so they’re mostly not listening to him, and they decide to do some other things that don’t fit together very well.
When the battle starts the next day, Napoleon withdraws to the redoubt at Shevardino, a fort his army had taken, about a mile from the fighting, and issues his commands from there. Messengers ride up from the battle to tell him what’s going on; he makes a decision and gives an order, and the messengers ride back to the front with it.
Napoleon can’t actually see what’s going on in the battle, for a variety of reasons. He has to wait for the messengers to come, and by the time they reach him, the news they bring is completely outdated. In one very memorable moment, they ride up and say, “We’ve taken the bridge at Borodino. Do you order the army to cross it?” Napoleon says, “Yes, have them cross it and form into ranks on the other side.”
What he doesn’t know is that the Russians have not only retaken the bridge but burned it; there is no bridge anymore. All of this happened the moment the messenger left to tell Napoleon. Tolstoy’s point is that Napoleon has nothing to do with what’s actually happening on the battlefield. He thinks he’s issuing orders, but none of them means anything.
What is Tolstoy trying to tell us about DevOps?
The point here is that Napoleon’s decision cycle is very long, because he has to wait for messengers to ride to him and back, while the action itself has a very short cycle time. It’s moving very quickly. There’s a kind of impedance mismatch: there’s no way you can control something that moves quickly if your decision cycle is that long.
If you look at DevOps in an enterprise, you have business situations full of complexity, uncertainty, and rapid change. And you have DevOps, a technical process that works religiously to shorten lead times, embedded in a business process that does nothing of the sort.
DevOps more or less takes us from the point where we write and commit code to the point where it is deployed and in users’ hands. Monitoring and telemetry kick in and prove that it’s actually working, but essentially DevOps covers the span from code commit to use in production.
The process for making business changes, however, is a much bigger thing. Generally, it starts with realizing that you have business or mission needs. Then you need to figure out what they are, group them into a big project or program, build a business case for that program, and pass it through an investment management process, a governance process, a steering committee, and so on. Then you find the resources who are going to work on it. Eventually you get to the point where code is written and deployed. Downstream, there’s a whole set of further steps needed to realize the value of the changes you’ve made.
What you have, essentially, is a very long lead time from concept to deployment (or, in some cases, from concept to cash). This is Napoleon on his hill. He’s making decisions about what needs to happen and what’s going to be valuable, and by the time they’re implemented, everything has changed.
No matter how much you speed up the part that DevOps covers, it won’t make much difference for big project work. It helps when you’re in maintenance mode with rapidly churning requirements, but in a big enterprise you’re often dealing with large capital investments that have to go through this drawn-out process before you can start writing code.
This is the area that really interests me. What can we do about all of this?
To take full advantage of what DevOps can give us, we need to shorten the entire lead time. That’s what will let us maintain our nimbleness, our agility, our innovation, and our ability to run experiments and make decisions based on the results. My interest, then, is in that big, fuzzy front end and what we can do about it.
The first realization I had was that the way we set up this big process is designed for a world that’s predictable or at least reasonably so. Yet, we happen to be in an environment, as Napoleon was, that’s filled with uncertainty, complexity, and rapid change.
The mental model you use to make decisions, and investment decisions in particular, is very different when things are reasonably predictable than when they are changing rapidly or are uncertain.
The old scheme was to put together a business case, which involved projecting revenues and costs based on the new IT system you’re building. Everyone understands that you can’t project your exact revenues and costs, but the assumption was that if you put a stake in the ground and you put a number there, you’re going to be somewhere close.
But what if your projection were so uncertain that a 60% confidence interval would have to span plus or minus 200% of your estimate? In other words, “This is going to make us $10 million of revenue… plus or minus $20 million.” The more extreme your uncertainty, the less meaningful a business case is.
In the environment we’re in, there is so much uncertainty that projecting revenues and costs, and making decisions based on those projections, is not the right mental model. When you make decisions based on a business case, it’s really important that you execute the project exactly as you planned it. If the business case says this is going to make us $10 million a year and cost $1 million to build, then when you execute that project you’d better spend $1 million and you’d better get back $10 million. Execution according to plan is highly valued.
In a world of uncertainty, however, where you know that things are not going to go as planned, it’s a bad idea to pretend that they are. That’s willfully misleading yourself and making bad decisions.
Therefore, we have an oversight process that is based on assumptions about the old world and doesn’t quite fit the new one. If you think about why we have all of those processes, I would submit that there are really two reasons:
- The first is to reduce the risk of the actual execution.
- The second is to make the best decisions about where to put your resources.
You have an opportunity cost if you invest in ‘this project’ rather than ‘that project,’ and you’d better have a sense that the return from ‘this project’ is bigger than from ‘that one.’ We set up these processes of building business cases, reviewing them, and trying to align them with strategic objectives in order to mitigate risk and allocate capital to the appropriate investments.
In Napoleon’s world, deciding in advance exactly how you were going to conduct the battle was not very effective: it didn’t mitigate risk, and it didn’t help you deploy your resources the right way during the battle.
An environment of uncertainty, where you know unexpected events are going to happen, sounds at first like a highly risky one. The traditional way of thinking about risk would certainly call it that.
But it only sounds risky if you’re still thinking in terms of executing exactly on a business case. The fact that unexpected things are going to happen is not necessarily a hazard; it could be an opportunity as well. When something unexpected happens, either it destroys you or it makes you better. Whether the unexpected leads to something good or something bad depends on your ability to seize an opportunity out of that change. In other words, it depends on your agility, your speed, your flexibility, and your inventiveness.
The best way to manage risk in this kind of environment is to make sure that when something unexpected happens, we can turn it into an opportunity: that we have the agility and the inventiveness to turn it into something positive.
In a sense, in today’s environment, risk is the same as not being agile. The agility I’m talking about spans the entire value stream: noticing the opportunity, figuring out what you’re going to do, justifying it, applying capital and other resources, and doing all of that quickly. How can we rework this long value stream in a way that promotes speed, agility, flexibility, and innovation or inventiveness? I can think of three basic models, which I’ll walk through with you. These aren’t the only three models; they’re just the ones that come to mind when I think about it.
- Product Model
- Budget Model
- Objective Model
The Product Model
The product model shortens decision-making cycles by decentralizing decisions into a product team.
This is very much the way that AWS works, for example.
At AWS, we offer 125 or so different services. A team is responsible for each of those services, and the team maintains its own roadmap. The roadmap is influenced by central input, but the team controls it: it works with customers to figure out what they need and then decides what to do about those needs. In fact, 90-95% of our product roadmap comes directly from customer requests, so it’s a process that’s optimized for that.
Central concerns do have some influence, though. For example, there’s an imperative to all of the service teams to reduce their costs and pass the savings on to customers. Each service team interprets that however it needs to and figures out how to reduce its own costs, but the high-level objective is passed down to every team. That’s what a product model might look like.
The Budget Model
I think we’ve had a myth in the IT community for a while that there are two kinds of costs in an IT budget: ‘keeping the lights on’ costs and innovation costs. And I think all of us say that innovation is where we want to spend our money, and maintenance is not.
But I disagree. I think a lot of what goes by the name of ‘keeping the lights on’ is actually innovation work. It’s what’s advancing the business; it’s what’s changing the IT systems to keep them consistent with what the business needs.
You don’t have to maintain software the way you maintain a car. You maintain a car to keep it functioning the way it did when you bought it. Software keeps doing exactly what it’s always done; you don’t have to put money into it to make it keep doing that. The problem is, you never want software to keep doing what it did when you bought it. You want to keep changing it as your business changes. To a large extent, maintenance spending amounts to remaking the decision that this is the right software for you, and changing it as you need to.
I say all this as background to the budget model, because maintenance is usually funded this way: we put a certain amount of money into keeping the lights on and then figure out how to spend it. Innovation money, by contrast, is usually a large capital expense that has to go through a governance process.
Why not treat everything we do as ongoing maintenance of our IT assets? Sometimes that involves building or buying new systems; sometimes it involves changing an existing system. More often, it’s refactoring existing systems, perhaps with a strangler pattern, breaking off pieces and reworking them. It doesn’t really matter which of these it is, and it doesn’t matter to business users how you deliver the capabilities, whether from an existing application or a new one. In fact, building on an existing application is a very effective way to spend your money, because you’ve already got something there.
Either way, you’ve got a big legacy IT estate that you’re constantly changing to keep up with what the business needs, and you fund that through a budget that is passed down through the organizational structure. Whoever the money reaches controls how it’s spent within their budget. So you don’t need long cycles of asking, “Is it okay if we add this feature, and this one?” Generally, you don’t have to go to a steering committee for that.
The budget model is potentially a way to speed up your time to decisions around your DevOps initiative.
The Objective Model
The objective model I think is really interesting. This is something we tried out at USCIS that was really effective, and since then I’m hearing about more organizations using it.
I’ll give you an example of how it’s used because it’s easier to see that way.
At USCIS, one of the systems we were in charge of is E-Verify, the application employers can use to make sure their employees are eligible to work in the United States. We anticipated that at some point, for political reasons, it was going to have to scale up like crazy. At some point, Congress is going to say all employers need to use it. Right now only a very small set of employers do.
We knew we were going to have to scale it like crazy, but we realized that it wouldn’t scale. The reason wasn’t technical: it’s in the Cloud now, and the Cloud gives us elasticity. The problem is the human part of the system.
When we started this, E-Verify could handle 98.6% of cases in an automated way. That’s great, except that for the other 1.4% of cases, a human being has to look at them, and if we suddenly got a lot more cases, we wouldn’t have enough people to do that. So one objective was to raise that 98.6% to a higher number.
The second objective concerned the people themselves: a person adjudicating these cases could handle about 70 cases a day, and we decided we needed that number to go way up. We needed to develop software, or do something else, that would let them adjudicate more cases every day.
Third, when companies signed up to use E-Verify, we had a shopping cart abandonment problem. Only about 40% of them made it through the whole sign-up process; the rest got stuck somewhere. We wanted that number to get closer to 100%.
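The first two objectives attack the same bottleneck from two sides. As a rough illustration (a minimal sketch, not anything USCIS is described as using), the number of adjudicators needed each day is the manual share of the case volume divided by each person’s daily throughput. The 98.6% automation rate and 70 cases per day come from the talk; the daily volume of 100,000 cases and the improved figures (99.3%, 140 cases per day) are hypothetical, chosen only to show the shape of the arithmetic, and `adjudicators_needed` is an illustrative helper name.

```python
import math


def adjudicators_needed(cases_per_day: int, automation_rate: float,
                        cases_per_adjudicator_per_day: float) -> int:
    """Adjudicators needed to clear one day's manually reviewed cases.

    Cases the automated check cannot resolve go to a person, and each person
    clears a fixed number per day. The result is rounded up, since a
    fraction of a person still means one more person.
    """
    if not 0.0 <= automation_rate <= 1.0:
        raise ValueError("automation_rate must be between 0 and 1")
    if cases_per_adjudicator_per_day <= 0:
        raise ValueError("throughput must be positive")
    # Cases are whole units; rounding also absorbs floating-point error
    # (1 - 0.986 is not exactly 0.014 in binary floating point).
    manual_cases = round(cases_per_day * (1.0 - automation_rate))
    return math.ceil(manual_cases / cases_per_adjudicator_per_day)


if __name__ == "__main__":
    daily_cases = 100_000  # hypothetical volume, not a USCIS figure
    scenarios = [
        (0.986, 70),   # starting point from the talk
        (0.993, 70),   # hypothetical: half as many cases need a person
        (0.986, 140),  # hypothetical: each person clears twice as many
        (0.993, 140),  # both hypothetical improvements together
    ]
    for automation, throughput in scenarios:
        print(automation, throughput,
              adjudicators_needed(daily_cases, automation, throughput))
    # Last column: 20, 10, 10 and 5 adjudicators respectively.
```

Halving the manual share and doubling per-person throughput each halve the headcount, and together they quarter it, which is why progress on either objective lets the human part of the system keep up with growth in volume.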
Altogether, we realized that we had five big business objectives in this project. In the traditional way of making investment decisions, you would translate those objectives into a bunch of requirements. Then you would put all those requirements into a bundle, build a business case for it and then try to execute those requirements.
This is a little strange if you think about it, because adding those requirements actually adds risk to the project. If you just take those objectives and try to accomplish them, that’s one thing. But once you’ve turned the objectives into requirements, you’re asserting that those requirements will have the impact you want, and you’ve added the risk that they are not the right ones. What you really want is to accomplish the objectives. By adding requirements, you’ve also tied the hands of the innovative people you want thinking up good solutions.
What we chose to do in this case was, instead of creating requirements, to pass the objectives directly to teams. We created a cross-functional team, which might sound familiar from DevOps. It included developers, operations and infrastructure people, testers, and security people, and it also included business operations people: truly a cross-functional team.
It had business skills and technical skills. Then we said, “It’s 70 cases a day; make it go up,” and that was the only instruction we gave. We didn’t say, “Here’s a bunch of requirements; execute them.” The team then owned the objective of raising those 70 cases a day, and they could pursue it in whatever way they could figure out would make that number increase.
In fact, we told them specifically: if you can do this without writing any software, just by changing business processes, go right ahead. If you want to write some software, do that. Whatever is going to make that number go up is fine; all we care about is the number.
And since you’ve got the Cloud, DevOps tooling, and the process all set up, you should be able to start producing functionality tomorrow. In two weeks, you’ll tell us what the number is. Every two weeks after that, we’ll review it and see what you’ve gotten it up to, and each time we’ll remake the decision about whether to keep investing based on what we see.
After a month, we could see that the number of cases per day was going up and we said to the oversight body that was responsible for the investment, ‘look, the number is going up. We think we should continue investing in this objective. What do you think?’ And they said, yeah, sounds good.
With the shopping cart abandonment problem (remember, only about 40% of companies were completing sign-up), the completion rate went up and then started to plateau. In our biweekly discussions with the team that owned that objective, we asked what they were doing to make the number go up. What they told us made a lot of sense, but the number just wasn’t budging. So we went back to the investment committee and said, ‘Here’s what we’re seeing. It did this. We’re trying everything, and it’s not helping. We suggest that we stop investing in this objective and put the money into something else.’
They thought that was pretty strange, because no project ever gives money back. But the point of this process was that, because we were constantly reviewing progress, the central body still had control in every relevant sense. In fact, it had more control than if we had started with a big set of requirements, because it could continually remake the decision about whether the spending was going well and divert resources if it wasn’t.
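The biweekly review can be made a little more explicit. The talk describes the continue-or-stop call as a judgment made by the team and the investment committee, not a formula; the sketch below assumes a hypothetical rule, “keep funding an objective while its metric has risen by at least some minimum over the last few reviews,” to show one way a team might surface the plateau signal. The names `Objective`, `recommend`, `window`, and `min_gain`, and the example readings, are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Objective:
    """A business objective tracked by a single metric (hypothetical model)."""
    name: str
    # Metric value recorded at each biweekly review, oldest first.
    readings: list[float] = field(default_factory=list)


def recommend(obj: Objective, window: int = 3, min_gain: float = 1.0) -> str:
    """Return 'continue' or 'stop' for the next funding decision.

    Compares the latest reading with the one `window` reviews earlier.
    If the metric rose by at least `min_gain` over that span, keep investing;
    otherwise treat the objective as plateaued. With too little history to
    judge, keep investing.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(obj.readings) <= window:
        return "continue"
    gain = obj.readings[-1] - obj.readings[-1 - window]
    return "continue" if gain >= min_gain else "stop"


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    throughput = Objective("cases adjudicated per person per day",
                           [70, 78, 85, 93, 101])
    signup = Objective("sign-up completion rate (%)",
                       [40, 52, 58, 59, 59.5, 59.5])
    print(throughput.name, recommend(throughput, window=3, min_gain=5))  # continue
    print(signup.name, recommend(signup, window=3, min_gain=2))          # stop
```

Because each objective is measured in its own units, `min_gain` is set per objective. The output is an input to the committee’s decision, not a substitute for it; the team’s explanation of what it has tried still matters, as it did in the sign-up case.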
The project as a whole continued. It was planned to run about four years; after about two and a half years, the team said, “We’re done. We’ve accomplished these five objectives to the extent they can be accomplished, so let’s return the rest of the money.”
This I think is the perfect model for the Age of Napoleon, where you have all this complexity and uncertainty.
By decentralizing decision-making to the teams while agreeing on objectives that could be used for central control, everybody was aligned and all of the controls around the investment were in place. The teams could be innovative, test hypotheses, and keep investing in the things that were working. Their accountability was to accomplish the objectives, which is what the business case was built on in the first place.
These three models are all ways to stop this big long investment decision cycle
They can make it much shorter by decentralizing control over decisions, so that decisions don’t have to travel to Napoleon a mile away and back again. Yet the central authority still has control in every meaningful sense and can manage the risk.
To recap, the three models were:
- The product model, in which a product team owns its own roadmap but is influenced by the center.
- The budget model, in which funds are allocated through a hierarchy until they reach a team, which now has the funds (or, in most cases, really a number of teams: the production capacity) and decides how to use them. The central authority can then reallocate funds and make changes based on how well the money is being spent.
- The objective model, in which what cascades from the center is just an objective, and the team is free to do whatever it can find that will accomplish that objective. Essentially, the requirements can vary, which is really what we want in an agile world.
These aren’t the only possible models, but I wanted to put them out there as ways to think about the problem. The problem remains: we have to shrink this long cycle if we’re going to take full advantage of the short DevOps cycle time and the flexibility we get from it.
Going back to Napoleon, I think the important thing to realize is shrinking cycle times is not about doing things faster.
His army is on the field; the battle is going to take as long as it takes. It’s not about speed, it’s about the quality of decisions. Napoleon can’t make good decisions from where he is, because the cycle time of his decisions is out of sync with the cycle time of what’s actually happening.
I think when you put DevOps into an enterprise, it’s the same concern.
It’s not just about how quickly you can get things to market (although that’s important). It’s also about how your central organization can keep good control and make good decisions while the action moves very fast, both in the business context and in the DevOps process.
You need to make this cycle time short so that you can make really good decisions and lead your army to the success that Napoleon, in the end, did not achieve against Russia.
That I think is what Tolstoy has to teach us about DevOps.