DevOps Culture (Part 1)
After the first US-based DevOpsDays in Mountain View in 2010, Damon Edwards (@damonedwards) and I coined the acronym CAMS, which stands for Culture, Automation, Measurement and Sharing. Jez Humble (@jezhumble) later added an L, standing for Lean, to form CALMS. In this post I want to start with an introduction and overview of what culture might look like in the DevOps movement and identify some patterns.
Patrick Debois (@patrickdebois), godfather of the DevOps movement, always says DevOps is a human problem. I think most of the DevOpserati would certainly agree. Damon takes it one step further and claims it is a management problem. To illustrate the human and management nature of this problem, I’d like to use the example of a fictional organization that has the greatest development and operations teams in the world. We’ll call it “Banana Corporation.”
The Banana dev team makes some of the greatest software products known to mankind. They can take any whiteboard idea and transform it into a clean artifact in one day, every time. Meanwhile, Banana Corp also has a superhuman class of operational engineers who can take any artifact created by the dev team and bullet-proof it in production in under 24 hours, every time. Sounds perfect, right?
There is, however, one small problem. The dev team is in Singapore and the operations team is in LA. Banana Corp decided years ago to create an elaborate notification system to communicate all processes between different regions. Over the years the system became very convoluted, but no one could ever find enough time to fix it or replace it. Enter Betsy, the kind-hearted super administrator.
Betsy was hired as the first employee of Banana Corporation and has been only a year away from retirement for over 10 years now. She receives all inbound requests for the LA group, has been battling some health problems, and is extremely overloaded with other administrative work due to recent cutbacks. Stay with me here, and let’s just say for purposes of this story that a notification of a developed artifact takes on average 8 days to travel between the two teams (10 days total cycle time).
Recently, Banana Corporation’s CEO happened to read an article in an airline magazine about how one of his competitors cut their software delivery cycle time down to 3 days from 10 days. He immediately calls his dev and ops managers individually to scream at them, wanting to know why their cycle time is 10 days on average. He demands at least a 50% reduction or heads will roll. The Dev manager brings in one of the finest and most expensive consultants to review the development processes. The Ops manager also brings in a high-priced consultant to review their operations processes. Both teams improve their respective individual processes by 50% and are proud to report their findings to Mr Banana III.
By now you should be able to guess what improvement he sees. Yep, a 10% decrease in cycle time (total cycle time is now 9 days). How could this be, when both dev and ops decreased their respective process times by 50% each? The 8 days each artifact spends waiting in the notification system never changed. Patrick Debois was right: Betsy is the human problem. Furthermore, Damon Edwards was correct that Banana Corporation had a management problem, due to the head Banana in charge.
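To make the Banana Corp arithmetic concrete, here is a minimal Python sketch. The stage names and the helper functions are my own hypothetical labels, and the last scenario (halving the handoff wait instead) is an illustration that is not part of the story. The durations are the ones given above: one day of dev, eight days of waiting, one day of ops.

```python
# Stage durations in days, taken from the Banana Corp story.
# The dictionary keys are hypothetical labels chosen for this sketch.
before = {"dev": 1.0, "handoff_wait": 8.0, "ops": 1.0}


def total_cycle_time(stages):
    """Add up every stage, including the time spent waiting between teams."""
    return sum(stages.values())


def percent_reduction(old_total, new_total):
    """Reduction of new_total relative to old_total, in percent."""
    return (old_total - new_total) * 100 / old_total


# What the consultants delivered: dev and ops each halve their own stage,
# while the handoff wait is left untouched.
after_local = {**before, "dev": 0.5, "ops": 0.5}

# Illustrative alternative (an assumption, not from the story):
# leave dev and ops alone and halve only the handoff wait.
after_handoff = {**before, "handoff_wait": 4.0}

baseline = total_cycle_time(before)
for label, stages in [("local 50% cuts", after_local),
                      ("halved handoff", after_handoff)]:
    total = total_cycle_time(stages)
    print(f"{label}: {total} days, "
          f"{percent_reduction(baseline, total):.0f}% faster")
```

Under these assumptions the two local 50% improvements shave one day off ten, while touching only the wait would shave off four. The point is not the exact numbers but that the largest stage in the cycle sat outside both teams.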
“You can’t directly change culture. But you can change behavior, and behavior becomes culture” – Lloyd Taylor, VP Infrastructure, Ngmoco
There is also a great story from General Motors where one of the stations on a production line was showing a 5-minute average MTBF and a 1-minute MTTR. Before they started thinking about constraints and what I would call “Lean Thinking,” they never thought much of those numbers. However, after some bottleneck investigations they found that this station was responsible for stuffing the fluffy fiber in the roof interiors. Turns out every 5 minutes they would run out of fiber and have to walk around a temporary office to retrieve more materials. The office was supposed to be moved out of the way years ago but was caught up in some old, ignored project. Basically this simple improvement (getting rid of the office) reduced the overall cycle time of building a car by 12%.
I also have a more recent personal example to share with you. I had lunch a few weeks ago with the CIO of a Manhattan-based Fortune 500 company, in which I tried to help him with his DevOps questions. I asked him if they had ever tried to use the Lean Value Stream Mapping process to understand their overall cycle times. He proceeded to give me a tongue-lashing about how he had been studying Lean Sigma for over 1000 years and they had tried every trick in the book. I mentioned a few other DevOps hacks, but we seemed to be going nowhere fast. However, near the end of the conversation he said something about his operations team that gave me a clue to ask one last question. I asked him, “When you did the Value Stream Mapping, did you do it across engineering and operations?” After about 15 seconds of dead silence he sheepishly answered, “Shit, we never thought about that.”
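For a rough sense of what a 5 minute MTBF and a 1 minute MTTR mean for a single station, here is a small Python sketch using the standard steady-state availability formula, MTBF / (MTBF + MTTR). The function names and the 480-minute shift length are assumptions for illustration. The 12% figure from the story cannot be derived from these two numbers alone, so the sketch does not attempt it.

```python
def availability(mtbf_minutes, mttr_minutes):
    """Steady-state availability: the fraction of time the station is running."""
    return mtbf_minutes / (mtbf_minutes + mttr_minutes)


def stoppages_per_shift(mtbf_minutes, mttr_minutes, shift_minutes=480):
    """Approximate number of run/repair cycles in one shift.

    The 480-minute (8 hour) shift is an assumed value, not from the story.
    """
    return shift_minutes / (mtbf_minutes + mttr_minutes)


# Figures quoted for the roof-interior station.
station_availability = availability(5, 1)
print(f"availability: {station_availability:.1%}")
print(f"stoppages in an 8 hour shift: {stoppages_per_shift(5, 1):.0f}")
```

With these inputs the station is stopped for roughly one minute in every six, which is easy to overlook when each individual stoppage looks short.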
Behavior:
1. a: the manner of conducting oneself
b: anything that an organism does involving action and response to stimulation
c: the response of an individual, group, or species to its environment
2: the way in which someone behaves; also: an instance of such behavior
3: the way in which something functions or operates
Do you have Insane or Kaizen Habits?
The definition of insanity, in a line often attributed to Albert Einstein, is doing the same thing over and over again and expecting different results. I can’t tell you how many times over the years I have heard the sentence “We can’t improve this system because we don’t have control over it.” I have been in companies where engineering and operations only meet at the board level. It’s the classic “You can’t fight city hall” syndrome.
About 60 years ago Taiichi Ohno, Shigeo Shingo and Eiji Toyoda decided to “fight city hall” by creating the prototype at Toyota for what is now considered Lean. I’ll save the Lean discussion for another post but I would like to talk about Kaizen.
Kaizen means improvement in Japanese. A Kaizen culture is described as implementing behaviors that continuously show improvements. Day two is better than day one, and so on and so forth. If Mr Banana really wanted an improvement, he would have looked at the whole process and would have tried to get his managers to identify their insane habits and focus on a Kaizen culture.
Aha to the Ka-Ching
In DevOps we use a caricature of Mr Banana’s situation that we call the “Aha to the Ka-Ching.” Lee Thompson calls Banana Corp’s 8 day gap “The Wall of Confusion,” as depicted in the cartoon figure below.
The span from the little dude’s lightbulb to the cash register’s ka-ching sound is what we classically call cycle time. A lean-thinking manager would look outside the known processes and try to identify the wall as waste. In DevOps we try to smash those kinds of walls, and of course more often than not those walls are hidden behind people. Bringing in a fancy database tuning company to improve your database performance is typically a no-brainer, and companies that are not using Chef, Puppet or CFEngine for automation typically incur inefficiencies. However, none of those solutions will trump bad culture boundaries. In some cases they can make things even worse.
Bring In Jonah and the Robots
Faster, better, and more secure does not necessarily solve all problems. Solutions that do not improve an organization’s goals are not improvements. In Dr. Eliyahu M. Goldratt’s “The Goal” there is an interaction between two characters named Alex and Jonah. Alex is a plant manager who is traveling to Dallas to speak at a conference and tell his story of how he has improved his plant efficiencies by 36% by implementing new plant floor robots. Alex just happens to meet an old physics professor named Jonah in an airport lounge. Alex confides in Jonah that his plant is probably going to be closed in three months if they can’t start making money. Jonah asks him some simple questions and they wind up talking about the robots. Jonah, in a Socratic dialogue, suggests that maybe the robots aren’t helping. Alex thinks that Jonah is crazy and that he just doesn’t understand plant floor manufacturing. However, later in the story Alex realizes that Jonah was right and that the robots’ increase in efficiency actually made other things on the plant floor worse. Robots on plant floors changed manufacturing forever, just as Chef, Puppet and CFEngine are changing IT infrastructure as we speak. They do not, by themselves, always improve an organization’s goal.
A question that often comes up in DevOps culture discussions is “Can behavior patterns really be changed without buy-in from leadership?” The easy answer is no. While there are some hacks that can be successful, more often than not, without buy-in from leadership, failure is imminent.
There are three types of bad leadership:
- The really nice guys. The ones who you love to hang out with but who don’t have a clue about how bad it is.
- The leaders who know how bad it is but have an incentive not to change. One of my favorite scenes in Martin Scorsese’s “The Last Temptation of Christ” is where Jesus and Pontius Pilate are debating. Pontius Pilate says to Jesus something to the effect of “I don’t give a rip if you are here to change everyone to love or change everyone to hate, doesn’t matter. I don’t want either, I like it the way it is right now.”
- The leader who, for some unknown or rational reason, doesn’t seem to give a shit.
A CEO who ignores emails is going to foster a culture of ignored emails as a norm. Manager to manager feuds are almost impossible to hack. The best advice I can give in fixing dysfunctional leadership is what is called “coaching up.”
A few startups ago I learned a very important lesson, one that took me years to learn. You can teach your CEO new tricks. It’s not easy, but if you can create a transactional pull system where your boss learns how to ask you important questions instead of you giving out unsolicited advice, you are on a great path. Personally, if I can’t accomplish this, I usually take the easy way out. I leave the company.
Michael Cote (@cote) of Dell steals a line from the movie Glengarry Glen Ross with a spin on the “ABCs”: “always be coding.” I like to say “always be doing.” Actions speak louder than words. The best advice you can give someone is to show them by doing it yourself.
Another great leadership hack is what I call the “Do now and ask for forgiveness later” approach. This is a matter of taking things into your own hands. Fix a problem, be ready to suffer the consequences if you fail, and hope you succeed, in which case no apologies will be necessary.
The greatest “Do now and ask later” story is the “Blue Shirt Nation” story behind Best Buy. A few years back there were a couple of Best Buy marketing field support guys whose jobs were to visit Best Buy stores and gather information from the blue shirt sales reps. One of these guys had a clever idea to spend more time with his family and get off the road. He went outside the corporate infrastructure (IT) and bought the domain “blueshirtnation.com”. He spent around a hundred bucks of his own money and installed Drupal, then sent accounts to all of the blue shirt sales people at the local Best Buy stores.
At first this solved his single travel issue, but before long amazing conversations started happening among the blue shirt folk. Guys in Idaho were debating stereo speaker effectiveness with other blue shirt guys in NYC. At one point HR realized that they might be able to drive a 401(k) campaign through the Blue Shirt Nation portal. It was a huge success. HR, for the first time in many years, actually increased 401(k) registrations by over 60 percent by having employees post videos on the Blue Shirt Nation portal. Before long it became a common practice for Best Buy’s board to ask questions like “Has anyone checked this out on the Blue Shirt Nation portal yet?”
The core of Best Buy’s cultural improvement was born in a hidden group of 18 to 25 year olds in cities like Des Moines and Albuquerque, all over America. In fact, this just happened to be the same demographic they were selling to. I once saw a video with one of the original marketing guys who created BSN. He was asked what would have happened if he had asked corporate IT for that portal. You can guess the answer.