The following is an excerpt from a presentation by Cornelia Davis, Senior Director for Technology at Pivotal, titled “DevOps Who Does What.”
You can watch the video of the presentation, which was originally delivered at the 2017 DevOps Enterprise Summit in London.
Throughout the years, I’ve had the great opportunity of working with very, very large enterprises across all verticals. My background is as a technologist, I’m a computer scientist, and initially I spent a lot of time talking tech at the whiteboard. But then I realized that there was so much more that needed to change, which is why I’m sharing now about the organizational changes that can support our technology needs.
So my first question to you is, is this your reality today?
We have different business silos across the organization and different individuals that are coming from those silos. When we have a new idea for a product, we kick off a project and individuals go into the project to do some work.
The first individuals from the first silos come in, and they generate their artifact. Then what do they do? They throw it over the wall to the next step. If you look at this slide below you’ll notice once they’re done they leave the project.
If for some reason we have to go backwards, we have to figure out how to get them back into the project. And so it goes through each silo. We all recognize that this is a slow and challenging process. If it only moved linearly, it might be okay. But we all know that this process goes backwards and forwards, and even in circles!
But that’s not even the biggest problem of these things.
The biggest problem is that each one of these organizations is incentivized differently.
My favorite examples are App Dev and QA, so let’s look at these.
Application Development is almost always incentivized by ‘Did you release the features that you promised on time, and ideally on budget?’ And, if you released the features on time you get a ‘Way to go, you achieved your goals.’
Then it moves over to QA.
What is QA incentivized on? Well, they’re responsible for quality. So they are generally incentivized by the number of bugs that they have found and fixed.
Now, let’s look at these things in combination. What happens when the application development process starts to fall a little bit behind? Developers start working late into the evenings. They work on weekends, they start working very unsustainable hours, and what happens? Quality suffers, but they hit their features on time.
Well, when they throw that over the wall to QA, what’s gonna happen now?
QA is going to find more bugs. Way to go! So we’ve got locally optimized metrics that do not create a globally optimized solution. That’s a big problem.
Well, the answer is really simple…
The answer is balanced teams!
What we’re going to do is we’re going to center things around a product and the product team is incentivized to deliver value to a customer, to deliver value to some constituency.
For example if I’m in an e-commerce scenario:
I have a product team that is really about the best experience around showing product images, recommendations, soliciting reviews, or it could be some back office product that is enabling your suppliers. These are all the different product teams.
There’s been a lot of research, and a lot of discussion, and a lot of proof points that product teams are really the way to go.
But what if we don’t have product teams? What if we have different roles within the SDLC, how do you create product teams of these different disciplines to come together into a product?
Well, we’re going to try and put things through the sorting hat.
If you have been living in a cave for the last 10 years, and you don’t know what the sorting hat is, this comes from Harry Potter. When new students come to Hogwarts School of Witchcraft and Wizardry on their first day, each one of them places the hat on their head and they get sorted into one of four houses, and that’s the house that they live in for the next seven years.
So we’re going to take those roles and we’re going to sort them into houses. But the question then is, what are the houses that we’re going to sort into? So let’s take a little bit of a tangential ride over to the side and think about a couple of houses. (I’m going to end up with four houses in the end, but I want to start with two.)
The left part of this slide you’ve all seen for the last several years.
That is where we were maybe 15 years ago. IT was responsible for the entire stack from the hardware all the way up through the application.
Then VMware came along and virtualized infrastructure.
Then a whole host of people made infrastructure as a service available, with Amazon Web Services of course being kind of the behemoth of that.
That made it so that we could just get machines, EC2 machines for example, and then we could stand up everything that we needed on those machines. Getting machines was easy.
Then in the last five years or so, we’ve taken that abstraction up another level and we’ve created application platforms where we have individuals who can be building applications, and the only thing that they need to worry about is their application code.
What’s important about that application platform is that it generates a new set of abstractions. Those abstractions are at a higher level. They are fundamentally the application, or maybe some services that are there in support of that application. It allows us to not do things like implement security by creating firewall rules at machine boundaries, but instead allows us to implement security at the application boundary.
This new abstraction is one of the key things that’s happened in platforms over the last five years. It’s given us something really interesting and really important. It’s allowed us to define two different teams. And it’s defined a contract between those teams that allows these teams to operate autonomously.
When we hear about all of the different goals of an enterprise, they all talk about needing to bring software solutions to market more quickly, and more frequently. So agility and team autonomy are incredibly important. We’re always looking for those boundaries where we can create more autonomy.
So now the application team…
The team that’s going to create the next mobile app or the next web app or even some analytics app for example, can focus on building that application, and they don’t need to worry about even the middleware that sits below it.
They’re responsible for creating the artifact. They’re also responsible for configuring the production environment, deploying to production. They are doing Dev and Ops. It’s not necessarily the same person, but it is the same team. They’re deploying to production, they’re monitoring. When they notice that they need more capacity, they’re scaling so that they can achieve better performance. They deploy new versions when they need to.
It’s entirely up to them.
Now there’s another product team, and that is the platform team.
So that’s the team that’s providing the platform, and notice that they’re doing exactly the same things.
They are deploying the platform, they’re configuring it, they are monitoring it, they are upgrading it when they need more capacity, or upgrading it to the next version. They’re doing the same things but they have their own products that they’re working on. So the product orientation is really key.
This separation gives us the first two houses that we’re going to sort into: the App team and the platform team.
Now let’s take all of these roles that come from traditional organizations and start sorting them. So here are our two houses, the App team and the platform team.
We’re going to do this piece by piece and I’ll explain the steps as we go along.
We’re going to start with the purple bubble there. Before I sort them, notice that this Middleware and App Dev team is actually taking care of both the middleware and the application development.
In retrospect, having worked in this new world for the last five years, I find this kind of counterintuitive, because why would somebody who’s creating an application (i.e., using the middleware) be in the same group as the middleware itself? To a large extent, it’s because in the past middleware required a great deal of expertise. You had to know a lot about the middleware to be able to effectively program against it. That’s something that we’re trying to move away from. We’re having more agile middleware platforms and so on.
Notice what happens here.
We’ve got middleware and we’ve got App Dev, and we break those apart. We put the middleware engineers inside of the platform team. They’re part of that team providing the capabilities that the App team can then use. Then we take kind of a full stack application development team, and put them up in the App team. We’ve got front end, and we’ve got back end. All of those individuals are there.
That one’s pretty straightforward.
The next one that’s also pretty straightforward is we’re going to pull some of the folks out of the infrastructure team, the folks responsible for building out the servers and the networks.
You might have noticed I put virtualized infrastructure and platform together in one team. Many of our customers actually keep those separate, but in this case it really wasn’t important to make that separation. You could split the platform team into two separate teams as well. The thing that I would caution you about is that you then need to make sure you have a very crisp contract between the platform team and the infrastructure team.
I’ll be honest with you, that’s a little bit harder to find at the moment, so that’s part of the reason I’ve put them together.
Again, server build out, network build out, they are part of the platform team providing the view of the infrastructure up to the App team.
The next one that we’ll talk about here are what I like to call the control functions.
There is information security for example, and change control. Why did I move them at the same time? Change control was usually coming out of the infrastructure team, and information security coming out of the chief security office. I moved them at the same time because they share a common characteristic. They are functions that today can stop a deployment. They are functions that, on every release into production, need to give their blessing.
We’ve seen when it comes to the very end, and we find problems in information security or any other types of security, it can actually stop things. There’s a great huge ball of things that we need to check off.
These functions here, information security and change control, should engage with your teams that are providing the platforms and the automation around the deployments to ensure that their concerns are satisfied. Their concerns are not wrong. It’s just that the way we’ve been solving them is in need of transformation.
All right, next. Let’s talk about Ops.
I have talked to countless organizations where operations is in the infrastructure group, and they’re part of the run, plan, build, etc. They run everything. They run the platform, they run the infrastructure, they run the middleware, they run the applications.
What we could be talking about here is really DevOps. Let’s make operations part of the product teams. Again, it doesn’t have to be the same exact individual. It has to be the team though. Instead of having one operations group, let’s put operations capabilities into each of the product teams so that the people who are experts in operating the platform product can operate the platform product, and we empower the application teams. We give them the right abstractions so that they can do their own operations.
That doesn’t mean that they have to learn the entire stack down to the infrastructure. For goodness sake, no!
We don’t all become experts at everything, but we give them the tools and the empowerment to do their own operations. We take a function that was one function, and we split it out over the different product teams.
The next one is capacity planning.
I was working with a very large automotive manufacturer in the United States, and I was talking with somebody from their ops team. I was poking at these roles, trying to understand exactly what theirs looked like, and I said, “Who’s responsible for capacity planning?” And, I kid you not, the individual from this organization pulled up the IT manual and said, “See, it says right here, we’re responsible for capacity planning.” It was that rigid. There was one group that was responsible for capacity planning across this entire spectrum.
That’s pretty normal. So what happens?
Well, the capacity planning process goes something like this. Really early on, well before production, we have to come up with some estimate of how much capacity you’re going to need, and you know what? We’re lousy at that. It’s impossible to come up with a really good prediction of what the capacity is that we’re going to need.
Since we know that we’re lousy at it, the worst thing that could happen is that we underestimate. So we overestimate. We end up over-provisioning, and we have resources that are underutilized.
The answer here is to put capacity planning in both of the places. Now, it’s not as easy as that. It comes back to the contract that’s sitting between the platform team and the application team. You can now, for example, have the App team doing their capacity planning and doing the scaling. Capacity planning is not just an estimation function now. Capacity planning is really capacity management. If I need more, I get more.
But how do I keep the application teams from exhausting the resources that are in the platform? Well, we do that with contracts. Simple things like quotas.
Even if you’re using GCP, or Azure, or EC2, or any of the AWS capabilities, you have quotas. Yes, it’s very simple to get more, but you have that contract with AWS that says here’s the amount of capacity that I need. AWS, or whoever your platform team is, is going to use those quotas to estimate the actual capacity that they need to provide from the platform. So it’s important to come up with that contract, and then each of the teams comes up with the processes that they’re going to use both to provide enough capacity to their consumers and to estimate their capacity needs going down.
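As a rough illustration of quotas acting as the contract, here is a minimal Python sketch. It is not taken from the talk or from any real platform's API: the `Platform` class, its method names, the team names, and the capacity units are all hypothetical. The App team scales itself within its quota, and the platform team sums the quotas to bound the capacity it has to provide.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical platform that enforces a per-team quota contract."""

    physical_capacity: int  # units the platform team has actually provisioned
    quotas: dict = field(default_factory=dict)  # team name -> agreed quota
    usage: dict = field(default_factory=dict)   # team name -> units in use

    def grant_quota(self, team: str, units: int) -> None:
        """Record the contract: this team may use up to `units`."""
        self.quotas[team] = units
        self.usage.setdefault(team, 0)

    def scale(self, team: str, delta: int) -> bool:
        """An App team scales itself; refused if it would break the quota."""
        new_usage = self.usage[team] + delta
        if new_usage < 0 or new_usage > self.quotas[team]:
            return False
        self.usage[team] = new_usage
        return True

    def committed(self) -> int:
        """Upper bound on demand the platform team must plan for."""
        return sum(self.quotas.values())

    def headroom(self) -> int:
        """Provisioned capacity not yet promised to any team."""
        return self.physical_capacity - self.committed()


platform = Platform(physical_capacity=100)
platform.grant_quota("storefront", 40)
platform.grant_quota("reviews", 30)

platform.scale("storefront", 25)  # accepted: 25 is within the quota of 40
platform.scale("storefront", 20)  # refused: 45 would exceed the quota of 40
```

In this sketch the App team never asks anyone for permission to scale, and the platform team never needs to know what the application does. The only thing they share is the quota number, which is the point of the contract.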
The next ones, you’ll notice, I pulled from the data team. Now I will confess to you right now (again, I’ve been working on Cloud Foundry for just about the last five years) that we as an industry have made a lot of progress on breaking things up and figuring out how to reorganize groups when it comes to application capacity and compute, but we haven’t done as well on the data side.
For example, in most cases we’re seeing organizations creating microservices-based architectures where, if you peek behind the covers just a little bit, you notice they’re all tied to the same very large monolithic database. From an organizational perspective, we’ve seen very little movement on the way that the data team is organized, and they’re the ones that are responsible for providing any kind of database capacity into the organization.
I sometimes like to say this is the group that you go to and they say, “Hi, Oracle is the answer. What’s the question?” We want to break that apart as well. Whether these terms are the exact right terms or not, you’ll notice that I moved the DBA, and I’m considering the DBA to be the individual who’s responsible for providing the database servers and the database clusters, for providing that capacity. They belong as a part of the platform team.
Now, the platform team can in fact be subdivided into smaller two-pizza teams, so you might still have a team that specializes in providing relational database capacity, another team that specializes in providing compute capacity, another one that specializes in providing graph database capacity and so on. But we have that team that’s responsible for providing those services as a part of the platform substrate.
Then what we want to do is give the application teams control over their databases and their schemas. Let them evolve their schemas, let them version those schemas, let them figure out how they can have multiple schemas running in parallel, all of those types of patterns. We want to break up data into the right groups as well.
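The talk does not name a specific pattern here, so the following Python sketch is only one possible reading of “multiple schemas running in parallel.” It assumes each record carries the schema version it was written with and is upgraded when it is read. The record fields, `UPGRADES`, and `read` are all hypothetical names chosen for illustration.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(record: dict) -> dict:
    """Hypothetical change: v2 splits 'name' into first and last name."""
    first, _, last = record["name"].partition(" ")
    return {"schema_version": 2, "first_name": first, "last_name": last}


# Maps a schema version to the function that lifts it one version up.
UPGRADES = {1: upgrade_v1_to_v2}


def read(record: dict) -> dict:
    """Upgrade a stored record step by step to the current schema."""
    record = dict(record)  # never mutate the stored copy
    while record["schema_version"] < CURRENT_VERSION:
        record = UPGRADES[record["schema_version"]](record)
    return record


old = {"schema_version": 1, "name": "Ada Lovelace"}
new = {"schema_version": 2, "first_name": "Grace", "last_name": "Hopper"}

# Both versions coexist in storage; readers always see the current shape.
customers = [read(old), read(new)]
```

The design choice this illustrates is that the team that owns the application also owns the upgrade functions, so it can evolve its schema on its own release cadence without a ticket to a central data group.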
Next I want to talk a little bit about product teams needing product managers.
I’m going to bring another organization into the picture here, and that’s the business. What we’ve had in the past, you’ll notice there under enterprise architecture, are business analysts. Business analysts have generally been in the business of taking the requirements from the business, and translating them into something that can launch the rest of the IT process.
What I want to do here is I want to take your business analyst and not make them THE product manager. I want to pair them with somebody from the business, because if you don’t pair them with somebody from the business, then you’re still throwing things over the wall. Remember the picture at the very beginning: I’m still starting with the business who’s throwing things over the wall, and so you’re still going to have that conflict, that tension, that finger pointing that happens when we do scope creep, etc. Make them part of the product management team, though, and they’re now responsible for the scope themselves. There are no longer change orders.
Now, of course it’s not just the application, the consumer facing application team, that needs product managers. The platform team needs product managers as well, and what we’ve found is really helpful is that the folks from your enterprise architecture group are really good candidates for becoming the product managers, or maybe pairing with somebody from the infrastructure teams to be the product managers for the platform team.
You’ll also notice that I’ve put enterprise architecture as a part of the platform team, but I’ve left them over there in their bubble as well.
Now, I’ve got two things left.
I’ve got some enterprise architecture roles, and I’ve got enterprise application roles.
The third house that I’m going to add is your enterprise applications house.
So DCTM, you’re probably wondering what that is.
Documentum is an enterprise application that was built 30 years ago. It’s one of those monolithic applications, built on top of kind of a three-tier architecture. It’s got that big old Oracle database, or a SQL Server database, at the bottom. Very resilient storage systems. It’s got a big, thick tier in the middle, and it started out with the desktop client application, and this was pre-web of course.
You all have these types of enterprise applications in your organization, and we have to continually deal with them. The first thing that I’ll tell you is that I want you to start thinking about your Documentum team, or your enterprise application team, as product teams.
Now, they are not going to be able to move with as much agility as some of those other organizations. However, there are people like Rosalind Radcliffe who are applying DevOps principles to the mainframe. The lessons that you may learn from her are the lessons that you should be applying here. You can do product management, you can do DevOps in these settings as well.
These systems, however, will tend to move, particularly while you’re still making the DevOps transformation, at a pace that is not quite the same cadence as the daily or multi-daily releases that are happening in the App teams.
What we want to do now is create multiple product teams, multiple application teams across the top that are leveraging both the new platform as well as connecting into the enterprise systems. I call that the legacy service team here.
This is the team that is generating the interface that’s going to mediate between the application teams on the left-hand side, and the enterprise system on the right. Notice that it’s a product team just like any of the other product teams with a product manager, capacity planning, all of those types of things.
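To make the idea of a mediating interface a little more concrete, here is a small Python sketch of what a legacy service team might own. Everything in it is an assumption for illustration: `LegacySystem`, `FakeLegacySystem`, `DocumentService`, and the field names are hypothetical and do not reflect any real Documentum API.

```python
from typing import Protocol


class LegacySystem(Protocol):
    """Whatever client the enterprise system exposes (hypothetical)."""

    def fetch_raw(self, object_id: str) -> dict: ...


class FakeLegacySystem:
    """Stand-in for the enterprise system so the sketch can run."""

    def fetch_raw(self, object_id: str) -> dict:
        return {"OBJ_ID": object_id, "OBJ_NAME": "Invoice 42", "CONTENT_TYPE": "pdf"}


class DocumentService:
    """The product the legacy service team offers to the App teams.

    App teams depend on this stable shape, not on the legacy field names,
    so the enterprise system can change at its own, slower cadence.
    """

    def __init__(self, legacy: LegacySystem) -> None:
        self._legacy = legacy

    def get_document(self, doc_id: str) -> dict:
        raw = self._legacy.fetch_raw(doc_id)
        return {
            "id": raw["OBJ_ID"],
            "title": raw["OBJ_NAME"],
            "format": raw["CONTENT_TYPE"],
        }


service = DocumentService(FakeLegacySystem())
document = service.get_document("doc-001")
```

The translation layer is deliberately thin. Its value is organizational: one team owns the mapping, versions it, and operates it like any other product.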
Coming down to the final couple of roles here…
Let’s talk a little bit more about enterprise architecture. I was at a conference a couple of years ago, and I was having breakfast with a number of individuals that I didn’t know. There was somebody from QVC. There were two individuals from that organization there. One of the individuals was telling a story where he said, “Last year when I was at the conference, I was part of one of these teams over here. This year I’m in enterprise architecture. Last year I was policed, this year I am the police.” I thought, “Remind me not to go work there.”
What we want to move away from is this notion of enterprise architecture being the ivory tower. This has been one of the most successful things that I’ve seen with the organizations that I’ve been working with. By and large the enterprise architects love this transformation.
Here’s what we’re doing: I’m going to introduce a new house. This last house I’m adding is what I’m calling the ‘Enablement house.’ It says: take some of those functions that are in enterprise architecture and, instead of making them an ivory tower, follow the practices, have them in the role of enabling teams. One of the best ways that you can have them in an enabling role is actually to make them part of the team.
You’ll notice here that I didn’t remove them from the enablement organization. These are individuals that can stay as a part of a matrixed organization like enterprise architecture, but they actually spend part of their time pairing in the teams. They become team members. They’re measured on that team’s success. It’s important, though, that they still have this broad view across the different projects, because that’s where you start to see opportunities for reuse.
One final thing that I want to add into this picture is that enterprise architecture is not the only organization that I’m suggesting stays together as an organization and is then paired into the product teams. Other organizations like that would be things like information security.
That is the final sorting that I want to do. Now, let the wizarding begin.