Episode 4: (Dispatch from the Scenius)
Elisabeth Hendrickson's DevOps Enterprise Summit Presentations
Guest: Elisabeth Hendrickson
In the second installment of the Idealcast’s Dispatch From the Scenius series, Gene Kim explores Elisabeth Hendrickson’s 2015 and 2014 DevOps Enterprise Summit presentations. Listen as Gene breaks down Hendrickson’s experience and learnings, all to help you find fundamental principles you can apply immediately to keep your feedback cycles healthy and happy.
In this episode, Elisabeth, an experienced QA engineer, shares her realization that the better she got at her job, the worse she made things for the organization as a whole. Thus began her journey to uncover the relationship between testing and quality, which has led her to a reality of increasingly tight feedback loops.
“…the better you get at feedback in your organizations, not only are you able to release software faster, you’re creating a learning organization. And that ultimately is how you retain, you remain competitive in this rapidly changing world.”
Elisabeth Hendrickson is a leader in software engineering. She most recently served as VP R&D for Pivotal Software, Inc. A lifelong learner, she has spent time in every facet of software development, from project management to design for companies ranging from small start-ups to multinational software vendors. She has helped organizations build software in a more efficient way and pioneered a new way to think about achieving quality outcomes and how that hinges on fast and effective feedback loops. Her book, Explore It!: Reduce Risk and Increase Confidence with Exploratory Testing, was released in 2013 and explores technical excellence and mastery, and creating effective feedback loops for everyone. She spoke at the DevOps Enterprise Summit in 2014, 2015, and 2018, and received the Gordon Pask Award from the Agile Alliance in 2010.
Gene Kim is a Wall Street Journal bestselling author, researcher, and multiple award-winning CTO. He has been studying high-performing technology organizations since 1999 and was the founder and CTO of Tripwire for 13 years. He is the author of six books, including The Unicorn Project (2019), and co-author of the Shingo Publication Award winning Accelerate (2018), The DevOps Handbook (2016), and The Phoenix Project (2013). Since 2014, he has been the founder and organizer of DevOps Enterprise Summit, studying the technology transformations of large, complex organizations.
In 2007, ComputerWorld added Gene to the “40 Innovative IT People to Watch Under the Age of 40” list, and he was named a Computer Science Outstanding Alumnus by Purdue University for achievement and leadership in the profession.
He lives in Portland, OR, with his wife and family.
Gene Kim (00:00):
This episode is brought to you by IT Revolution, whose mission is to help technology leaders succeed through publishing and events.
Gene Kim (00:10):
You're listening to the Idealcast with Gene Kim, brought to you by IT Revolution.
Gene Kim (00:22):
If you haven't listened to episode three, where I talked to Elisabeth Hendrickson, go listen to it now. If you have listened to it, here are the talks that I promised you. This is the entirety of Elisabeth's 2015 DevOps Enterprise Summit Presentation. I inserted about 10 minutes from her 2014 presentation because she goes into such amazing detail about some phenomenal technical practices that I know you'll enjoy as much as I did. Join me as we listen to these presentations. And like last time, I'll be breaking in periodically adding my own running commentary on points that I found particularly impactful, both when I originally saw her talk, and now listening to it five and six years later. Here's Elisabeth.
Elisabeth Hendrickson (01:02):
So this seems like a good time to mention a couple of things. First, this talk would not have existed without Gene, because despite the fact that this has the same title as last year's talk, it's actually not. You should think of this as the sequel. And this talk ties together the stuff that I've been thinking and writing about for 15 years, and I wouldn't have made those connections. I wouldn't have connected the dots if Gene hadn't pointed out to me that, "Hey, you've been writing about this stuff for 15 years." I wouldn't have thought about it that way.
Elisabeth Hendrickson (01:35):
The second thing that I should point out is that, yes, my handle is testobsessed. You can find me on Twitter as testobsessed. You can find me on GitHub as testobsessed. Pretty much anywhere where there's a public handle, I am testobsessed. And people think that this is because I am a QA person, and that I self-identify as a tester. I don't. Instead, well, you'll see, because it's kind of part of the talk.
Elisabeth Hendrickson (02:02):
So this is a talk with digressions. My first digression is that I would like to take you back in time to 1999 Silicon Valley, the first dotcom bubble that we referred to as the dotcom bubble, because it was the first time there was dotcom. So it wasn't the first bubble, but it was the first dotcom bubble-like thing. Anybody around in Silicon Valley in that time?
Elisabeth Hendrickson (02:26):
Okay. It was a crazy time. It was an insane time. And during that time, I worked for a small startup as many of us did. I had worked for startups before, and in the past, when I had worked for startups, there was an emphasis on do things cheap. It was an amazing realization the day that I realized that at my little startup, my leadership would pay almost anything if we could ship one day sooner. So this is a completely different dynamic. I'm no longer optimizing for cost, I'm now optimizing for time.
Elisabeth Hendrickson (03:01):
However, remember this is 1999. The term DevOps certainly didn't exist. And in fact, when I proposed something remarkably like continuous integration, the VP laughed me out of his office. So this was a completely different time, and cycle times were long and painful.
Elisabeth Hendrickson (03:18):
During this time, I participated in a working group called Steamer. This is where that paper that you saw the front page of came from. In fact, Sam Guckenheimer, who's sitting in the front row, I think participated, if not in Steamer then in LAWST, the Los Altos Workshop on Software Testing, two working groups that Cem Kaner had started.
Elisabeth Hendrickson (03:37):
At Steamer three, in the fall of 2000, we considered the question, what should the ideal ratio between testers and developers be? This was a hot topic at the time. Microsoft had set the standard at about one to one. And so that was a much discussed ratio. And my leadership in any company that I was with always wanted to know what the ratio should be, because that was how they wanted to do their staffing planning.
Elisabeth Hendrickson (04:05):
Now at this working group, the format is there's a bunch of people, somewhere between 12 and 17 people who are all sitting around and telling each other war stories. And then in a particularly stylized facilitative approach, we pick apart each other's war stories and we really drill into the details. So in this particular working group, the facilitator said, "All right, I would like to give you all two stickies. And on one sticky, I want you to write down the number, the ratio for the project that was the very best project you ever worked on, the ratio of testers to developers. And on the second sticky, I want you to write down the ratio for the very worst project that you were ever a part of." And then he gathered up the stickies and we put them on the walls. And on the best, there was no real discernible pattern that was terribly interesting. But on the worst, they were all really high. The worst project had the most testers.
Elisabeth Hendrickson (05:07):
Now, we don't want to confuse correlation and causation. That would be the fundamental attribution error. We did not immediately want to assume that the testers caused the failure of these projects, but there was certainly something very interesting there. And frankly, that stuck so hard in my brain, that it led to Cem Kaner and Jennifer Smith-Brock and me writing the paper. By the way, the punchline in the paper is ratios are irrelevant, so don't bother using them, terrible staffing model, use critical thinking. That was the punchline of the paper.
Elisabeth Hendrickson (05:43):
Now, time passed and I was still at this little startup. The little startup, I still have nightmares about that place. It was the kind of place that just leaves you kind of a little bit nuts for a while. So time passed, pressure had increased at the startup, and a funny thing happened. I realized that our quality was getting worse. And this seemed odd because I was originally brought in to help improve quality. And I had done a lot of work to build up the test team there. And I came to realize something really horrifying. This was the worst day in my professional career to date. I realized that the better I got at my job running a quality group, the worse I did for my company. That's pretty depressing, right? That's pretty bad.
Elisabeth Hendrickson (06:32):
So let me explain how that works. This is a system of effects diagram. And if we look over on the top perceived quality, we started with a problem-
Gene Kim (06:42):
Gene here. First off, it's so wonderful to hear Elisabeth tell again, that story of the Steamer round table that she talked about in the last episode. I can't get enough of that story. Also, when Elisabeth mentioned the system of effects diagram, I laughed because I think so many of us in the DevOps community did something similar. 15 years ago, Kevin Behr, George Spafford and I built similar diagrams trying to understand how and why operations and security tended to work very poorly together. More recently, Damon Edwards created a similar diagram showing how outages often would cause operations to do things that actually made future outages worse and more frequent.
Gene Kim (07:22):
System of effects diagrams, sometimes called causal loop diagrams, are one of the most famous tools in the system dynamics toolkit. They're a powerful tool to uncover why systems behave in unexpected ways, and they often lead to very powerful insights. I will say from personal experience, that these diagrams are very fun to make, but when you have pages and pages of them, and you're trying to walk people through them, people will often lose patience. Don't do that. I will put the diagram that Elisabeth showed in the show notes, as well as where to find more information on system dynamics.
Elisabeth Hendrickson (07:54):
-perceived quality was too low. Management at that point, gets to make a choice. And that's what the little flaggy things mean. And if you look on the right hand side, it says QA team test effort. So one choice that management can make, you've got low quality, we'll throw a bunch of QA people at it. That'll fix it. Remember this was 2000-ish, 1999-2000. It was totally the accepted norm. That's how you fix your quality problem. You throw a bunch of testers at it.
Elisabeth Hendrickson (08:23):
So we tested away. We found a bunch of bugs. We were doing our jobs. This is good. We fixed a bunch of bugs. Now, theoretically, that means that we're shipping fewer bugs, we have fewer unknown bugs. And the bugs that we choose to ship are a known quantity. We've made a rational decision about those. So theoretically, if we follow along with the system of effects diagram, our perceived quality should go up. Sounds like a great theory, right?
Elisabeth Hendrickson (08:49):
Anybody at a company currently operating under this theory? Oh good. Because if you were, I have really bad news for you. There's another effect within the system, and this was one that I hadn't even thought about until this moment when I realized that our quality is getting worse. And I'm putting that together with the Steamer round table from the previous year. And I realized that up until I got there, the developers had felt responsible for doing all of the testing because the test effort had been so weak. And the more they trusted me, the less testing they did. So no wonder we were finding more bugs. It was a target-rich environment. So we thought we were doing great, because we're testing, we're finding all these bugs, we're fixing all these bugs. And what we didn't realize was our very existence increased the number of bugs that there were to find. This is pretty depressing, right? Okay. The talk gets better from here.
Elisabeth Hendrickson (09:52):
Okay. All right. So that was the flashback. That was the digression. Now let's talk about feedback because this talk is about care and feeding of your feedback cycles. Feedback is actually a super simple concept. You do a thing and then you observe and you see the effect of your actions. So feedback is the information that you get by observing the effect of your actions. Make sense, right? Okay.
Elisabeth Hendrickson (10:20):
We have lots of fancy ways of describing this. We've got the Deming Cycle, plan, do, check, act. We've got the OODA loop, observe, orient, decide, act. All of these are simply feedback cycles that boil down to do a thing, see what happened. We have more recently the Lean Startup method of build, measure, learn. They're all still just feedback loops.
Elisabeth Hendrickson (10:43):
There are various types of feedback that we might care about as a programmer. Did I do what I intended to do? When I check it in and CI kicks off, did my changes violate any expectations within the code? Exploratory testing is looking for unintended consequences. At Pivotal, we do a whole lot of exploratory testing. In fact, we do so much of it that we have a whole position called Explorer. Did I get the feature that I asked for? This would be the acceptance feedback loop. Stakeholder feedback, are we headed in the right direction? And ultimately the users or the customers, are we producing something of value?
Elisabeth Hendrickson (11:23):
And as we think about all these different types of feedback, they take different time, different amounts of time. And so I think of this in my head, kind of like a series of concentric circles. The very fastest loops are at the developer's workstation, where they're running local tests. And if you're practicing test driven development, that's a whole feedback loop in itself because you set an expectation, you set an intention, you express that intention as an expectation in an executable test, one little itty bitty, tiny test. You then execute that test, see that it fails, write the code that makes it pass, watch it pass, and then refactor the code. And that's one loop. And it should be relatively tight, just minutes.
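As a rough illustration of the test-first loop described above (this is not code from the talk), here is one small turn of it in Python. The `slugify` function and its expected behavior are hypothetical, chosen only to show a single tiny expectation written as an executable test before the code that satisfies it.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test (an assumption, not from the talk).

    In the loop described above, this body is written only after the
    test below has been run and seen to fail, and is then refactored
    while the test stays green.
    """
    return "-".join(title.lower().split())


def test_spaces_become_hyphens() -> None:
    # Step 1: express the intention as one small executable expectation.
    # Step 2: run it and watch it fail (slugify does not exist yet).
    # Step 3: write just enough code to make it pass, then refactor.
    assert slugify("Care and Feeding") == "care-and-feeding"
```

Run with a test runner such as pytest, the whole loop for a test this size should take minutes at most, which is the point: the expectation, the failure, and the fix all stay in the developer's head at once.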
Elisabeth Hendrickson (12:05):
I've had, by the way, QA professionals ask me, "How do I insert myself in that cycle?" I said, "Pair with the developer." Because that's the only way you insert yourself in that cycle. They're not going to ask you to write a test and then ping pong it back to them over 50 feet of cube walls.
Elisabeth Hendrickson (12:19):
Okay. So as you think about the feedback cycle times, you'll note that there's this notion of latency. So let's cast our minds back to that day when I realized that everything that I did was actually making things worse for the company. We were following a pretty traditional model. There was some market analysis, so we had product managers who would write these big MRDs, Market Requirements Documents. Then the developers would go off and design some stuff and then they would implement some stuff. And then they would hand it over to QA to stabilize. This was the traditional model at the time. I realize nobody actually does this anymore.
Elisabeth Hendrickson (12:58):
So as we're analyzing, the thing is, we're speculating. We're speculating that we actually identified the right problem to solve. We're speculating that we got the requirements right, that we understand the actual constraints that we need to solve to. Then we designed some stuff, it looks great on paper. Awesome. PowerPoint architecture always works. Then we go off and we implement. We're still speculating because even though we're learning a lot about the extent to which the PowerPoint architecture didn't actually work, what we're not learning is whether or not the implementation matches the expectations. So then we get into test and we start to learn a little bit and ultimately the rubber hits the road when we ship and customers tell us whether or not what we shipped was crap.
Elisabeth Hendrickson (13:41):
The area under that curve is risk. This is why it didn't work. Worse, when I realized that I was doing my company a disservice, not only was I participating in a system that increased risk, I was also driving requirements sideways through the organization because with every single bug that we filed, we were making an assertion about the actual requirements. Only those assertions were based more on opinion than on extensive market research, because we were QA, we didn't go do market research. So this is why this doesn't work.
Elisabeth Hendrickson (14:16):
I think of this kind of like Schrodinger's cat. So this is a thought experiment in quantum physics that says, all right, hypothetically, you stick a furry critter in a box. The box is sealed, the box is not clear like this, it's opaque, you can't see in the box, you have no idea whether the animal is alive or dead. This is all theoretical. It was a thought experiment that Schrodinger wrote to Einstein as they were debating effects of quantum mechanics. So then you hook this up to a valve, the valve is controlled by something that's got a 50/50 shot of going off or not, decaying radioactive atom, and if it goes off, then the cat's dead because the poison gas is released. And if it doesn't, then the cat's fine.
Elisabeth Hendrickson (15:01):
And the punchline to the Schrodinger cat thing is the idea that the cat is neither alive nor dead until you open the box. And in that moment when you open the box, the probability wave collapses, and you know whether the cat's alive or dead. This is pretty much what we were doing with releases. So we had the Schrodinger release. We had no idea if it was alive or dead, until we actually opened up the box, talked to the customers and found out how much they hated us. There was one morning at that startup when I came in to work and I swear I could hear the screams of pain coming from support. It was sad. It was very sad.
Elisabeth Hendrickson (15:37):
All right, so theoretically agile is going to eliminate all of this risk because we do this in iterations and so, although we may have a little bit of speculation building up, it never actually gets to build up completely. So theoretically this is eliminating all of our risk. That only works if we're actually doing agile. If we're doing fragile, which is where you have iterations but you don't actually finish everything in the iteration, you know, you're going to save that performance testing until later. You're going to save that full system and testing because it takes a long time. You're going to save that to the end, but everything else is tested, so it'll be fine, right? That's speculation, which means that we've got speculation buildup, which means that over time, doesn't that curve look awfully familiar?
Elisabeth Hendrickson (16:21):
That's why fragile and waterfall have the same exact characteristic with respect to risk. Only you've actually taken away all the controls that made waterfall kind of work sort of maybe, so now you don't have the controls in place at all and so it's probably actually worse. But, this is not supposed to be a depressing talk, so I'm going to move on.
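The "area under the curve is risk" argument can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not anything shown in the talk: it assumes unverified work grows by one unit per day and drops to zero whenever real feedback arrives, and it models "fragile" as a hypothetical fraction of each day's work (`deferred_fraction`) that is not verified until the very end.

```python
def sawtooth_area(total_days: int, days_between_feedback: int) -> float:
    """Area under a sawtooth of unverified work.

    Assumption: speculation grows one unit per day and resets to zero
    each time feedback arrives, so each span contributes a triangle.
    """
    if total_days <= 0 or days_between_feedback <= 0:
        raise ValueError("both arguments must be positive")
    area = 0.0
    remaining = total_days
    while remaining > 0:
        span = min(days_between_feedback, remaining)
        area += span * span / 2  # triangle with base = height = span
        remaining -= span
    return area


def risk_area(total_days: int, days_between_feedback: int,
              deferred_fraction: float = 0.0) -> float:
    """Toy risk measure: verified-per-iteration work plus deferred work.

    deferred_fraction is the (hypothetical) share of work whose testing
    is saved for the end, such as performance or full-system testing.
    """
    if not 0.0 <= deferred_fraction <= 1.0:
        raise ValueError("deferred_fraction must be between 0 and 1")
    checked = (1.0 - deferred_fraction) * sawtooth_area(
        total_days, days_between_feedback)
    deferred = deferred_fraction * sawtooth_area(total_days, total_days)
    return checked + deferred


waterfall = risk_area(120, 120)                      # one feedback point at the end
agile = risk_area(120, 10)                           # everything finished each iteration
fragile = risk_area(120, 10, deferred_fraction=0.3)  # 30% saved for later
```

Under these assumptions the iterated project carries a small fraction of the waterfall area, and deferring even part of the testing pulls the total back toward the waterfall figure, because the deferred share accumulates over the whole project just as it did before.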
Elisabeth Hendrickson (16:43):
I really wish that I could have found a way to make this a picture of a squirrel because then I could make this digression about squirrels. However, instead I want to make this digression about fruit flies. Anybody know why you use fruit flies in scientific experiments?
Short life cycle.
Elisabeth Hendrickson (16:58):
Eeny bitty, teeny tiny, short lifespan, right. You get lots and lots of generations. Super awesome. You can have school kids do genetic experiments, yay. So, feedback latency, eeny bitty, teeny tiny life cycle. You get more turns of the crank. Ultimately, that's what I learned out of all of this, is how critical it is that the feedback cycle, that the feedback latency not be so large. And at that startup back in 1999, our feedback latency was excruciatingly large. We were proud to get our regression cycle down from six weeks to two weeks. These days I'm complaining at my development team, that it takes hours upon hours to do the regressions. But they still finish in like, you know, total runtime of something like eight hours. And if we had had an eight hour fully automated regression cycle in 1999, I would have thought I was amazing. And that the team was amazing.
Gene Kim (18:05):
Gene here. I think it is so interesting what Elisabeth just said. In 1999 she was so happy when they reduced test time from six weeks to two weeks and yet when she was presenting in 2015, she found it so frustrating that the test cycles were taking eight hours. It just shows that the best are always getting better and that the high performers are always accelerating away from the herd. Now we are going to go on a little detour into Elisabeth's 2014 presentation, where she talks about the phenomenal technical practices when she was managing Pivotal's Cloud Foundry project. She describes several technical practices that I think are just fascinating.
Elisabeth Hendrickson (18:43):
All right, so this is our team room. Creating visibility around all of these feedback cycles is very important to us. If you look above the word visibility, you'll see a bunch of monitors. On the far right hand side of your screen you'll see something that looks a little bit Christmassy. We've got some red and some green going on there. We use an open source build display thing, utility called Checkman. One of our engineers wrote it. It's open source, you can totally use it, it allows us to compact a whole lot of builds onto a single page.
Elisabeth Hendrickson (19:16):
We have hundreds of builds at this point across all the teams. And so these are separated out, each team gets a monitor for their builds. There are some monitors back there that are really, really hard to see, but they're showing the Datadog metrics from our acceptance environment, our development test environment, and then on the other side of the wall where you can't see it, there's our production environment. So we are literally surrounded by information as we're working.
Elisabeth Hendrickson (19:47):
So, how do you figure out what are sufficient feedback loops? And here's a simple recipe that I would recommend that you try. Now, this is a recipe, it's not a prescription. As a recipe you're going to have to decide what proportions of ingredients are right for your particular environment, but it's really checking, exploring, and then releasing. So you want to make sure that you have automated checks for all of the expectations in your system. Code level expectations, system level expectations, integration level expectations. Any part of the system that's responsible for a thing, you want automated checks that it does that thing.
Elisabeth Hendrickson (20:22):
Then you want to explore to discover risks. So you've got all of those checks going all the time. Oh, by the way, and you stop the build on red, right? So the build goes red, it doesn't stay red for six weeks while it gets punted between three different teams: your fault, no, your fault, no, your fault. We get that back to green as a top priority. And then even if you cannot release for whatever reason, actually going through all of the motions for everything except putting it into the hands of the customer, that rehearsing of the releases is crucial to making sure the actual release goes very, very smoothly. And as you're doing this, you want to be tightening those feedback loops.
Elisabeth Hendrickson (21:00):
So when I joined Cloud Foundry in fall of 2012, the process that was in place then was very different from the process that we have ended up with now. At the time, there was Gerrit for automating that code review, but the Pivotal way is to pair on everything. So there's always two sets of eyes on every line of code that gets committed. And we prefer that to using Gerrit or a code review mechanism. I'm not saying you should prefer that, I'm saying we do. Because I realize that there's a debate out there and that it's a religious issue for some people, and I'm not trying to offend your religion, but one of the things about the way that Gerrit was implemented in this environment was that one engineer once told me that she spent an entire week just trying to get a simple, single change into the code base.
Elisabeth Hendrickson (21:49):
This is a highly qualified, presumably highly paid engineer who had the frustrating and soul crushing experience of not being able to get a simple, single change into the code base because the Gerrit process was you submit your thing and then it has to get plus two'd and the only people who have the plus one ability were the very senior people. The very senior people didn't care about her particular productivity, didn't particularly care about her fix, had their own set of things that they were responsible for and would take days to even bother to look at her particular check-in.
Elisabeth Hendrickson (22:27):
Now, during those days, of course, other people's check-ins were getting merged. And so her days looked like: get the latest stuff, merge, rebase, test, and then resubmit. That was a week of her time. So one of the first things that we did was to get those kinds of wait states out of the system. We dismantled, and I'm really sorry if I'm offending your particular religion, but we did totally dismantle the Gerrit check-in process and instituted a rule: you can get a code review, but we're not going to manage that through Gerrit. You can either get a code review with somebody or you can pair. And then that quickly turned into: you can pair, that's your option. You can pair to get code into the system and still have it code reviewed.
Elisabeth Hendrickson (23:16):
We had, at the time, there was a QA process that involved a QA team that was offshore. Still part of the company but they were somewhere else, many, many, many time zones away. They were really good and I could tell they were really good because they would run the tests and then report back, not only with the bugs, but having isolated the bug down to the line of code that was the problem, but they were not empowered to fix it. So they would be lobbying for weeks at a time. "Dear engineering team, this set of tests failed, here's the problem area. Could you please prioritize the fix so that we could get back to green?" And the engineering team didn't have direct visibility into it, it wasn't in their face, they didn't care, nobody was listening to the QA team. So it was too far away. This is why we moved to teams are responsible for their own quality and there is nobody else who is going to test this for you.
Gene Kim (24:09):
Gene here. I love these two stories so much because they deal so much with structure. The first story is of a junior engineer who must get plus one code reviews from distant senior engineers who don't view it as very important that they get feedback to that junior engineer quickly. The second is of the remote QA team, who were not viewed as important enough to have their issues responded to quickly. I think both of these stories show so well how we must create lines in a way so that all components in the system, and all people in the system, can get what they need naturally and quickly.
Elisabeth Hendrickson (24:41):
So, our process: we work in small pieces. Stories take a few days to implement; absolute worst, like a week's worth of stuff. We tend to split stories when they get that big. We found and squashed all of the wait states in the process. The end result was that we were able to go from a given set of changes going through the system took like weeks, to a given set of changes going through the system took like days. And we're now at the point where we can do it in hours.
Elisabeth Hendrickson (25:10):
Parallelize all the things. We put a lot of energy into being able to parallelize our deployment as well as parallelizing our tests, both. Taking the time to remove duplicate tests. The test suite that the QA engineers that were offshore, that they had written and that they were running, it wasn't a bad test suite, but it took a lot of hours to run and a lot of the tests actually tested the same thing. They didn't look like it on the surface, it takes work to go through and curate, but taking out all of those duplicate tests paid off in spades. Our acceptance test suite now runs in 10 minutes.
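The talk does not say how the duplicate tests were found, only that it took curation. One possible aid, sketched below purely as an assumption, is to group tests by the set of code locations they exercise: tests with identical coverage are candidates for a human to review, not proof of duplication. The test names and `file:line` strings are hypothetical sample data.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Mapping


def duplicate_candidates(
    coverage: Mapping[str, Iterable[str]],
) -> List[List[str]]:
    """Group test names whose covered locations are exactly the same.

    `coverage` maps a test name to the locations it executed. Where that
    data comes from (a coverage tool, tracing, etc.) is left open here.
    """
    by_signature: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for test_name, covered in coverage.items():
        # frozenset ignores ordering and repeats, so only the set matters.
        by_signature[frozenset(covered)].append(test_name)
    return [sorted(names) for names in by_signature.values() if len(names) > 1]


# Hypothetical sample data for illustration only.
sample = {
    "test_push_app": ["api.py:10", "api.py:11"],
    "test_push_app_again": ["api.py:11", "api.py:10"],
    "test_delete_app": ["api.py:40"],
}
groups = duplicate_candidates(sample)
```

Identical coverage only says two tests walk the same code; they may still assert different things. That is why the output is a list of candidates for the curation work described above rather than a list of tests to delete.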
Elisabeth Hendrickson (25:46):
And then finally, part of what we did there was to take tests that were being done at the system level and they really reflected responsibilities that were at the unit level. And so, somebody somewhere coined the term, the Tetris game of testing, where you're pushing it down to the lowest level. If you've ever played Tetris, you want to get your puzzle pieces down to the bottom level because otherwise it fills up your screen and then you lose the game. The Tetris game of testing or the Tetris principle of testing is to drive your tests down to the lowest possible level so that they will run as fast as possible.
Elisabeth Hendrickson (26:19):
Okay. And now, here's where I'm going to let you in on the secret sauce of this entire conference. I don't know if anybody has shared the secret with you yet, but the thing about feedback is that if you look at learning models, learning models involve: do a thing, see what happens, integrate that new knowledge, get a new idea, do another thing. It's a feedback cycle, right? It's kind of like plan, do, check, act. Except it's do, observe, explore. Or Kolb's learning cycle, which is concrete experience, reflective observation (notice what happened, notice how you feel about what happened), abstract conceptualization (consider something new), and then active experimentation.
Elisabeth Hendrickson (27:03):
This learning cycle is just another kind of feedback cycle, which means the better you get at feedback in your organizations, not only are you able to release software faster, you're creating a learning organization. And that ultimately is how you retain, you remain competitive in this rapidly changing world. That's my big ta-da.
Gene Kim (27:24):
Holy cow. Kolb's learning cycle. How did I miss that in all the times I've watched this talk? As you may recall, one of the things I set out to achieve in my quest of which this podcast is a part of, is to understand why organizations behave in the way they do. And what we really mean when we say that we have a learning organization. Holy cow, I totally missed this reference to Kolb's learning cycle. I'm definitely going to be reading about that tonight. Thank you again, Elisabeth. All right. Let's go back to Elisabeth's 2015 presentation and pick up where she started talking about how teams branched and merged.
Elisabeth Hendrickson (28:03):
Then there's this merge feature branch-like thing, then more unit tests, then more system test. And this picture here is actually a little bit simpler than it was. I didn't go into all of the nooks and crannies of the process.
Elisabeth Hendrickson (28:15):
We went from that and it took like days to weeks to get a change in and then get feedback on it. Real, honest to goodness feedback. By the way, there is a difference between feedback and opinion in the way that I'm using it. An opinion is merely speculation. And remember what happens when you have speculation buildup? Big risk curve thing, right? Whereas here, when I'm talking about real feedback, it's empirical evidence, because empirical evidence trumps speculation every single time.
Elisabeth Hendrickson (28:47):
After we simplified, the way we did code review is, because we were doing a very strict form of extreme programming, we paired on all the things, so pairing is a living code review all the time. We're pairing, we're running the local tests before we check in. We check in, and when we check in, we check in directly to master. We weren't doing feature branches at all even.
Elisabeth Hendrickson (29:08):
And then, CI picked it up and ran all of the tests, not just the ones that could run quickly on your local machine. And the end result was that even when the system grew to a sufficient level of complexity that the system tests took a couple hours, still it's two hours instead of a week. That's tremendously driving down the time in the feedback cycle.
Elisabeth Hendrickson (29:31):
More recently, in another part of Pivotal, we are shifting to tighten up feedback cycles in a way that frankly, I've been a little surprised by what I've learned.
Elisabeth Hendrickson (29:43):
We had very long-lived team branches on one of our other products, by which I mean like two months. A team goes off and develops away for a couple of months. This was their history. This is how they used to work. And it felt great at the time, right? Like we get to ignore all those other people. We're so productive. We're writing code. We control our destiny. We're in our branch.
Elisabeth Hendrickson (30:06):
And then, it comes time to merge. You all know what happens then, right? Everything comes to a screeching halt for like four weeks while we try to get everything slammed together and make things get back to green.
Elisabeth Hendrickson (30:18):
This was not actually nearly as effective as it felt at the time, but it feels so good for just that, those few weeks when you have the illusion of progress. And so instead, we're moving to short-lived feature branches. And the end result of that is that we're getting these changes in much faster, which means we're able to get real feedback on those changes. Now, we've tightened up those feedback cycles and ultimately again, remember the fruit fly. Now, we've got much shorter-lived branches. We get more turns of the crank. We get more opportunities to fix things. We get feedback faster.
Gene Kim (30:55):
This story is so great. I told you in the last episode about the catastrophe that occurred when our CruiseControl build server went down and our code merge times went from taking one week to taking over six weeks. But that was over 15 years ago.
Gene Kim (31:10):
I've had some far more recent experiences that reinforce [inaudible 00:31:13] lesson of needing to merge code frequently. I work on a lot of solo coding projects where I'm the only person working in the code base. Of course, it's always in my favorite programming language, Clojure. I'm often working on numerous features at a time and I've found that I can't even merge changes with myself. There have been numerous instances where it was actually easier for me to retype my changes in from scratch than figure out how to resolve a merge conflict. This would invariably happen on feature branches where I forgot to merge my changes weeks, or maybe even months, before.
Gene Kim (31:46):
To me, it just shows that it's an utter illusion that you can have tens or even hundreds of developers working on long-lived feature branches and merge those code changes together without causing utter chaos and disruption. In fact, one of my favorite scenes in The Unicorn Project was exactly this: Maxine, watching in total horror as the teams try to merge over 300 changes that have accumulated over a quarter, resulting in three days of carnage. And indeed, in the State of DevOps research, trunk-based development was one of the top predictors of performance. Back to Elisabeth.
Elisabeth Hendrickson (32:20):
One thing that I have had to learn about though, is that fast feedback doesn't do you any good if it's a completely polluted feedback stream. False alarms, false failures, cause people to lose faith in the system. And that ultimately can lead to a series of dysfunctions, including the pattern of, "Oh, it's red again, just kick off the build, I'm sure it's fine. Let's ship anyway." You ever seen that? Sad, right? Okay. Distortions, remember that feedback is not the same as opinion.
Elisabeth Hendrickson (32:52):
I was actually distorting, my team was at that little startup, was actually distorting the feedback stream with a fair amount of opinion. And then finally, entirely broken feedback loops, like if you don't have a solid feedback loop from support, this is something that we're also working on at Pivotal to make sure that we've got that feedback loop so tight that we get that feedback from our customers, not just when there's an escalation, but also to start to mine the incredible gift of information that is in every single support case. Getting that back into the system so that we can make sure that we're acting on that and steering, using that, making sure that your feedback streams are not polluted, super critical.
Elisabeth Hendrickson (33:40):
Another digression, this time, I can't say squirrel, but I can say hamster. Those are little hamster erasers. They're super cute. Once upon a time, before I joined Pivotal, I ran a little consulting company. I was running Quality Tree Software, Incorporated when I wrote that little paper that Kim saw. Or sorry, that Gene showed.
Elisabeth Hendrickson (34:02):
And at the time my job was get on airplanes and go help people. And I would run this simulation to help them as they were transitioning to agile called Word Count. And in it, it becomes a microcosm for an agile transformation. I set up the room at the beginning of the day with four areas, one for the testers, one for the developers, one for the product managers, and then a funny little area for the computer, which is played by human actors, executing written instructions that are written by the developers.
Elisabeth Hendrickson (34:29):
And so, this is this little simulation. They are not allowed to talk in the beginning of the simulation. They can only communicate through the interoffice mail courier, who is another person in the org, in the simulation who walks around with little envelopes and passes messages between the tables. As you can imagine, the effect of running this simulation, it tends to resemble the company that brought me in as a consultant as they run it. And there's much laughter because in fact, it resembled reality for a lot of those companies where their developers and their testers and their product managers didn't actually talk except through email.
Elisabeth Hendrickson (35:07):
As I would run this simulation, in the first round, they have to follow these rules. In the second round, they get to make up their own rules and then we would do four rounds and there would be plenty of opportunity to reflect and adapt.
Elisabeth Hendrickson (35:19):
One of the first adaptations that a lot of the... I ran this 150 times. One of the adaptations that most groups would adopt very, very early on is first of all, to do away with the interoffice mail courier, they were almost always fired in round two.
Elisabeth Hendrickson (35:35):
I only had one company that continued to employ their interoffice mail courier throughout the entire four rounds. They had tremendous difficulty making money in this, it's fake money, but in this simulation. The other thing that would frequently happen is they would realize the power of visibility. The people in the simulation would start to put all the artifacts up on a board where everybody could see everything all at once. And then, you'd start to see the division between the teams melt. And then, you'd start to see them really crank and able to turn out value very, very quickly.
Elisabeth Hendrickson (36:12):
We see something similar. At Pivotal, in the parts of the organization, Pivotal, you have to understand is the world's biggest startup. We're 1700 people. We were only two-and-a-half years old. Imagine forming a company of 1700 people with only two-and-a-half years of shared history. It's pretty amazing how well we've done.
Elisabeth Hendrickson (36:33):
And one of the things that we did on the Cloud Foundry team, they created this thing called Concourse. And what you're seeing here is Concourse. It's a CI system, it's open source and it gives you a tremendous view of your entire pipeline. Because the software that we work on in our products, it's all enterprise software, massively complex pipelines. Just being able to visualize our pipelines and see red in a different way than just having a Jenkins build show it to us is tremendously powerful. We're adopting this elsewhere within the organization.
Elisabeth Hendrickson (37:09):
The other kinds of things that we're getting visibility on are all of our escalations for all of our data products, which are now in a single dashboard. And by having them all in one place, it's prompting conversations that nobody had ever had before around, wait a minute, what exactly does constitute an escalation? What is and is not? And so just by making things visible, we're having better, higher quality conversations that are improving our ability to serve our customers.
Elisabeth Hendrickson (37:42):
We start to get to the takeaways portion of this talk. One of the things that I've learned is that without extreme effort, you're going to suffer from feedback entropy. I view this as anything that's causing feedback to break down. One thing is that test suites and build times therefore are going to grow unbounded.
Elisabeth Hendrickson (38:02):
As one of our engineers says, "You do realize that test suites are an append only thing, right? Nobody ever goes back and takes tests out." And that means that the test time left unchecked, it will just continue to grow unbounded and you'll end up with thousands and thousands of tests and nobody actually knows what value those tests have. But dang, they take 12 hours to run every single time and we've got a policy that says we have to run and have them green before we can ship.
Elisabeth Hendrickson (38:31):
Left unchecked, test suites and build times will therefore grow unbounded. The feedback will become polluted naturally. It's something you have to pay a lot of attention to. Opinion will start to creep in, information will start to get dropped. If you don't take care of those tests and the build and all of the CI infrastructure, you'll have false failures. And it's too tempting. It's so tempting to just say, "Oh, it's fine. Just run it again. It'll be fine."
Elisabeth Hendrickson (39:03):
Because it actually takes real effort and dedication and commitment to go in and say, "Wait, why did this thing fail? Okay, it shouldn't have failed for that reason, I'm going to go fix it right this time so that it stays fixed." That takes the willingness to set aside whatever else might've felt like a high priority, to treat fixing the infrastructure as a high priority. And that's a level of discipline that a lot of organizations don't have, and I can appreciate how difficult it is, because every single day I end up making a decision like that. No matter what else was important from a product-facing standpoint, if we don't have clean CI runs and we don't have clean infrastructure, we can't trust our feedback.
Gene Kim (39:43):
Gene here, I love what Elisabeth just shared about the need to invest in and groom those automated test suites. One of my favorite statistics is from Google: in 2013 they had 15,000 engineers, they were committing 5,500 code commits per day, and they were running 75 million test cases daily. And the question is, why did they do that? And there's a famous quote from Aaron [Masery 00:40:07], he said, "It is only through automated tests that can transform fear into boredom." But it's such an astounding figure because those tests are not free. So I'm going to have to write the tests, you have to run the tests, which consumes power, generates heat, which drives up costs.
Gene Kim (40:22):
So those 75 million test cases represent the enormous commitment that Google has made to provide fast feedback to developers so that they can be productive and be able to deliver high quality code to their customers. So what is also very, very amazing is that in 2017, Google had 30,000 developers, they were committing 45,000 commits per day, and they were running 150 million test cases per day. So just another example of how the best are getting better and that the best are always accelerating away from the herd. Back to Elisabeth.
Elisabeth Hendrickson (40:56):
And ultimately, it is so tempting to seek the illusion of speed over real progress. The temptation to say, "I know how we're going to go really fast, we're going to ignore everything else. We're going to ignore everybody else. We're just going to work on our team branch, because that way we don't have to worry about these painful merges. We're just going to get our feature done." It's not actually delivering value until it's in the shipping product, but it's so hard to remember that when it's feeling so good to write code. So it takes such effort, fighting that feedback entropy takes an enormous amount of energy and dedication and cannot be underestimated. One of the other things I've learned is that meetings are easy, by the way I broke the rules, I don't have five takeaways, I have six.
Elisabeth Hendrickson (41:47):
Meetings are easy and getting real work done is hard. So at Pivotal, if you want to get anything done, it has to end up in a team's backlog. And early on in my tenure there, we would have these meetings about how we're going to tighten the feedback cycles or improve the test suites or whatever. And the meetings felt awesome because I'm working with incredibly smart people who are all incredibly test obsessed, and they care deeply about the feedback cycles, they care deeply about CI, and we'd have these fantastic meetings and we'd whiteboard out all the things, and then nothing would happen. And I came to realize that if you want something to happen, it has to end up in a backlog, and so the action items to come out of that meeting, instead of just being action items, because nobody ever did the action items, they would just... We made lists of action items, it's not like we were just talking at each other.
Elisabeth Hendrickson (42:35):
The action item had to be, "All right, who's going to put this item in, in which backlog?" Because then ultimately it becomes a discussion with a team and the product manager to figure out how to prioritize that work, but if it doesn't end up in a backlog, it's not going to happen. So, it's so easy to have a great meeting and so hard to make something actually come of that. Within your organization structure, whatever that is, you have to figure out how do we make sure that we've made the commitment to actually do the work.
Elisabeth Hendrickson (43:04):
On both Cloud Foundry and on the data teams, we've now built toolsmith teams. And one of the lessons that I've learned is that the tools that these teams are building, they are foundational. So there was a wonderful blog post about, "Let 1,000 flowers bloom, and then rip out 999," from [gigamonkeys 00:43:24] who is at Twitter and I forget his real name and I'm really sorry, and if you're in the audience, I'm going to be incredibly embarrassed about that. But, oh my goodness, what a wonderful, wonderful blog post about their engineering effectiveness organization and how they thought about engineering effectiveness and the joy of coding. We had a similar challenge and we have a similar process for making sure that we have tools that make a developer's life easy. And what we figured out is that the team that makes those tools had better be some of the best developers in the organization. And it's so tempting to put the new grads on that team, to say, "That's an easy on-ramp into the organization." It's so tempting to say, "It's just a script. It's just Python," right? That's so tempting, but the fact is that these are foundational. This is the foundation on which the rest of your software is built. And if that foundation is shaky, then that means that the software you're building is going to be even shakier. Visibility turns out to be more important than I ever even imagined, and just getting visibility into things can cause change. Figuring out where to draw the lines in the system, this requires just a teeny bit of explanation. So, if you're writing a class you have to figure out, "Is this one method or two? Do I extract this out into a private method that's a helper method to this thing?" Even if you've never written code, you can appreciate how this might be an interesting challenge. How do I draw the lines there?
Elisabeth Hendrickson (44:56):
Then there's, if you're doing architecture, at an architectural level, "Is this one module or two? What should the responsibilities for this module be? Should this module take responsibility for this other stuff?" That's at an architectural level. At an organizational level, "Should this be one team or two teams? Should we have centralized QA or should we have all of our QA people be part of our development teams? Or should we just not have anybody with that title whatsoever?" These are all decisions in which we decide where in our organization, our system, our code to draw the lines. And it turns out that one of the hardest things about designing anything is figuring out where to draw those lines. And by experimenting on where the lines should be, you can find new perspectives and also transformational ways to shift your organization. And that's one of the things that we've been experimenting with, is where to draw the lines.
Elisabeth Hendrickson (45:56):
Thus Cornelia, who's sitting in the second row over here, introduced me to someone today as, "Yes, she's the person who came onto Cloud Foundry as the Director of Quality Engineering and promptly got rid of all of QA," which pretty much sums it up. There was a little bit more there to the story that I'll skip for now, but that's an example of drawing the lines. And then finally, I always knew that branching was a little bit painful, but oh, branching hurts productivity far, far more than I ever thought. So those are my takeaways, that's the end of what I came to say. Almost. Sorry, forgot two more things. How to care for your feedback cycles: one is to make sure that they're tight, two is to make sure that they're clean, thus the rubber ducky.
Elisabeth Hendrickson (46:37):
And then finally the super secret sauce is that the Kolb learning cycle is just another example of a feedback cycle. So as you're improving your ability within your organization to turn the crank faster, to have those better feedback cycles, ultimately you're becoming a learning organization. And as Andrew Clay Shafer, who also works for Pivotal and is known as Little Idea on Twitter, says, "If you're not becoming a learning organization, your company will be beat by somebody who is." So, there's no failure, there's only learning, and that's what I came to say.
Gene Kim (47:11):
I hope you enjoyed Elisabeth's talks as much as I enjoyed re-listening to them. In the next episode of the Idealcast, I interview a mentor of mine who I've mentioned so many times already in this podcast, Dr. Steven Spear, author of The High-Velocity Edge, a senior lecturer at the MIT Sloan School of Management, and another person who has profoundly shaped my thinking over the last seven years. This episode is brought to you by the 2020 DevOps Enterprise Summit, London, which will be a virtual conference due to the global pandemic. For seven years we've created the best learning experience for technology leaders, whether they're experience reports from large complex organizations, talks from the experts we need, or through the peer interactions that you'll only find at DevOps Enterprise.
Gene Kim (47:59):
At the time of this recording, I'm busy prerecording all of the exciting speakers for the conference. I'm so excited at the amazing speakers we've got lined up for you. Some of the exciting experience reports include executives from Adidas, Swiss Re, Nationwide Building Society, Maersk, CSG, Siemens, and so many more. Also speaking is Coats, which manufactures fibers and threads, and which was founded in the year 1755. I'm also so excited that we have speaking Geoffrey Moore, who you heard in episode one of this podcast, teaching us about Zone to Win; David Silverman, a coauthor of the book Team of Teams; Dr. Carlota Perez, who you've heard me quote so often in the last few years; and John Allspaw, who helped form the DevOps movement. I'm super excited about the high learning and networking event that we've created for you, which I'm hoping will be an incredibly valuable and fun way to learn, so different than the endless video conference calls we've all been stuck in for weeks. To register, go to events.itrevolution.com.