The Surprising Implications of Architecting for Generality
With Michael Nygard
On This Episode
On this continuation of Gene Kim’s interview with Michael Nygard, Senior Vice President, Travel Solutions Platform Development Enterprise Architecture, for Sabre, they discuss his reflections on Admiral Rickover’s work with the US Naval Reactor Core and how it may or may not resonate with the principles we hold so near and dear in the DevOps community. They also tease apart the learnings from the architecture of the Toyota Production System and their ability to drive down the cost of change.
They also discuss how we can tell when there are genuinely too many “musical notes” and when those extra notes allow for better and simpler systems that are easier to build and maintain and can even make other systems around them simpler too. And they discuss how so many of the lessons and sensibilities came from working with Rich Hickey, the creator of the Clojure programming language.
About the Guest
Michael Nygard strives to raise the bar and ease the pain for developers around the world. He shares his passion and energy for improvement with everyone he meets, sometimes even with their permission. Living with systems in production taught Michael about the importance of operations and writing production-ready software. Highly-available, highly-scalable commerce systems are his forte.
Michael has written and co-authored several books, including 97 Things Every Software Architect Should Know and the bestseller Release It!, a book about building software that survives the real world. He is a highly sought speaker who addresses developers, architects, and technology leaders around the world.
Michael is currently Senior Vice President, Travel Solutions Platform Development Enterprise Architecture, for Sabre, the company reimagining the business of travel.
You’ll Learn About
- Building great architecture for generality.
- Admiral Rickover’s work with the Naval Nuclear Reactor Core.
- Architecture as an organizing logic and means of software construction.
- Toyota Production System’s ability to drive down the cost of change through architecture.
- Clojure programming language.
- Cynefin framework
- Failure Is Not an Option: Mission Control from Mercury to Apollo 13 and Beyond by Gene Kranz
- “Why software development is an engineering discipline,” presentation by Glenn Vanderburg at O’Reilly Software Architecture Conference
- “10+ Deploys Per Day: Dev and Ops Cooperation,” presentation by John Allspaw
- “Architecture Without an End State,” presentation by Michael T. Nygard at YOW! 2012
- “Spec-ulation Keynote,” presentation by Rich Hickey
- re-frame (re-frame is the magnificent UI framework which both Mike and I love using and hold in the highest regard — by no means should the “too many notes” comment be construed that re-frame has too many notes!)
- “Fabulous Fortunes, Fewer Failures, and Faster Fixes from Functional Fundamentals,” presentation by Scott Havens at DevOps Enterprise Summit Las Vegas, 2019
- “Clojure for Java Programmers Part 1,” presentation by Rich Hickey at NYC Java Study Group
- Simple Made Easy presentation by Rich Hickey at Strange Loop 2011
- Love Letter To Clojure (Part 1) by Gene Kim
- The Idealcast, Episode 5: The Pursuit of Perfection: Dominant Architectures, Structure, and Dynamics: A Conversation With Dr. Steve Spear
- LambdaCast podcast hosted by David Koontz
Gene Kim (00:00:00): This episode is brought to you by IT Revolution, whose mission is to help technology leaders succeed through publishing and events. You're listening to the Idealcast with Gene Kim brought to you by IT Revolution. The last two episodes were with Mike Nygard, senior vice president of Enterprise Architecture & Platform Development at Sabre, and whose work I so genuinely admire. That first episode was an interview I did with him. The last episode was Mike's 2016 DevOps Enterprise Summit Presentation, where he talks about maneuverability and how to get a team of teams working towards a common objective. If you haven't listened to those yet, I'd recommend you listen to those first because this is a continuation of that first interview.
Gene Kim (00:00:56): Today, we discuss his reflections on Admiral Rickover's work with the US naval reactor core and how it may or may not resonate with the principles we hold so near and dear in the DevOps community. We talk about and tease apart the learnings from something I recently learned from Dr. Steven Spear about the architecture of the Toyota Production System and their ability to drive down the cost of change. We talk more about the characteristics of great software architectures. Specifically, I asked him to help me understand further the amazing example he gave in that first interview.
Gene Kim (00:01:32): How can we tell when there are genuinely too many musical notes to quote a phrase from the movie Amadeus or when those extra notes allow for better and simpler systems that are easier to build and maintain and can even make other systems around them simpler too? And how so many of the lessons and sensibilities came from working with Rich Hickey, the creator of the Clojure programming language. As with every one of these episodes, I've listened to it many times because I was so dazzled by the insights. And several passages I had to listen to many more times so I could convince myself that I actually understood what Mike was saying. Okay, let's jump in. We start as Mike and I discuss his reflections on the episode I did with Steve Spear on Admiral Rickover and the US naval reactor core.
Michael Nygard (00:02:30): I did listen to the first of your two episodes with Steven Spear. So I understand a bit more about what you were saying with structure and dynamics. I'm really enjoying it. I want everyone in my company to listen to that. Particularly the idea of emitting signals to allow coordinated action without requiring micromanaging every detail. That's really good. Then also you started talking about team of teams and I'm like, "This is exactly the situation we're in." What was it? Like 72 hours from sighting to-
Gene Kim (00:03:06): [crosstalk 00:03:06]-
Michael Nygard (00:03:06): Yeah. That doesn't work. I also picked up several books about Admiral Rickover. I had been aware of who he was and that he did remarkable things, but I didn't know anything about him or the specifics of how he did it. I'm finding that pretty interesting. In some ways it's counter to the idea of DevOps because Rickover wanted everything solved in advance.
Michael Nygard (00:03:34): There's a story in one of the books about him having a bundle of envelopes with a rubber band around it. And when somebody came to him and described a problem that one of the nuclear subs was having in the Bering Sea or someplace like that, he went to his desk, went to one specific compartment in his desk, pulled out one specific envelope and gave it to his subordinate and said, "Tell them this." And it was four words written in there that solved the problem.
Michael Nygard (00:04:06): Rickover had worked out that this problem could occur and he'd worked out what the correct solution was years in advance. That's pretty different from the sort of test and learn and trial approach that we tend to take. But there is a commonality in that he didn't allow any defects or workarounds to persist. Things had to be fixed.
Gene Kim (00:04:33): Okay, Gene here. I was so thrilled to hear Mike talk about his reflections on Admiral Rickover and I'm going to jump in to state more clearly what I could not during the interview. After the interview, we talked about how we were both grappling with the extent to which the values that Rickover espoused are consistent with what we believe in the DevOps community. There is a 1962 memo from Rickover that Steve Spear showed me. It's a pretty remarkable memo that I'm going to read to you.
Gene Kim (00:05:03): The context is people in The Naval Reactor Organization or NR, granting waivers to their contractors from NR rules, which again, embodied the best understanding of the system as a whole. And it reads, from time to time, I note evidence that NR representatives at field offices, such as a shipyard or laboratory, do not fully understand their primary mission. It is amazing to me how representatives new to these positions uniformly get themselves into the frame of mind, where they conceive of themselves as intermediaries between NR and the contractor.
Gene Kim (00:05:38): That is, that their job is to judge who is right, NR or the contractor, and then make the decision on their own. In many cases, not even notifying NR. In this way, the NR representative then becomes in effect NR's boss. All NR representatives are of course, encouraged to state their views to me at any time, but it is not their job to assume my responsibility. Another and more serious mistake arises when the NR representative decides what he should or should not report to me. Frequently, he decides not to report things to me because he feels he can handle the matter better himself or he is afraid that by notifying me of the situation, which is his job, I will take ignorant, improper action and upset the applecart.
Gene Kim (00:06:28): Nearly all NR representatives have had inadequate experience to handle the important and complex tasks they face. I do not expect them to be able to make wise decisions on all matters by themselves. Under some circumstances, it is better to have no NR representative at all because I would not then be lulled into thinking the NR interests are being taken care of. Please bear in mind always that you are the NR representative. That you are to carry out the policies of NR. That you are not to judge NR or to represent the contractor to NR. To achieve the status of a true NR representative requires the acquisition of godlike qualities, but you can try. Signed H.G. Rickover.
Gene Kim (00:07:15): Holy cow. It's an amazing memo to me in so many ways: the tone of the memo, the incredulity he has that anyone would take an action that by fierce logical argumentation puts the contractor's goals over the NR goals, or that the NR representative's judgment would be placed over the hard-earned collective wisdom of NR. At times it seems, as Mike says, contradictory to the principles that we love so much in DevOps, but it's difficult to argue with if you want to make the best decisions. Because each decision is informed by all the knowledge of the outcomes of all the decisions made by other people in the organization, which are codified by rules, we want anything that could improve those rules put back into the rules. Not corrected or waived away at the edges.
Gene Kim (00:08:01): To be more specific, I was also feeling conflicted as Mike was. On the one hand, I think it's easy to call the Rickover approach only applicable to the domains of the simple and complicated. Those are the domains of rules and best practice. So of course, I'm referring to the Cynefin framework by Dave Snowden, where he describes four domains: the obvious domain, formerly known as simple, complicated, complex, and chaotic. The obvious and complicated are the domains where rules and practice can be used.
Gene Kim (00:08:33): On the other hand, complex and chaotic are where simple cause and effect rules don't apply, and they usually require a different mode of problem solving. But I don't think anyone can call the creation of a system that has allowed nearly 20,000 hours of safe and accident-free nuclear operations in dynamic, sometimes near-wartime conditions merely complicated. It is clearly in the complex domain and to call it anything else would do a grave injustice to that achievement.
Gene Kim (00:09:01): I recently read Gene Kranz's book, Failure Is Not an Option, about his experiences as a mission controller for NASA during the Mercury, Gemini and Apollo programs. And you could definitely see a similar philosophy at work there too. One thing that caught my attention was their continual insistence on resolving all the "funnies," as in, "That's funny, why did that happen?" For example, anytime there was an unexpected instrument reading or a fault from the computer or telemetry that wasn't there, at the end of the shift they had to resolve all those funnies. They either had to explain it and resolve it or they would assign the funny to the next shift for them to try to explain.
Gene Kim (00:09:41): Across Mercury, Gemini and Apollo, they forced themselves to reconcile their imperfect understanding of the system and make it better. They exposed their ignorance of the system through drilling and simulation as a way to challenge their assumptions. Without a doubt, Apollo was definitely in the complex domain and especially in situations like Apollo 13, it was definitely in a chaotic domain. Reading the book, it's so clear that Gene Kranz viewed as supremely important the three-ring binders that every mission controller carried around, which were full of their procedures.
Gene Kim (00:10:14): And yet reading the book, I kept thinking, "Gosh, that sounds like rules and best practices only applicable for the obvious and complicated domains." But Kranz's goal was to make sure that as many of those problems were thought through ahead of time, especially around anomalies and what could have caused them, understanding the faults often revealed dependencies that were unknown and would trigger generating solutions uncovering even more dependencies about what would be required to implement that solution. By the way, I also learned that Apollo 9 was actually a Hail Mary to beat the Russians to the moon. Lunar orbit wasn't actually planned until Apollo 11.
Gene Kim (00:10:53): And in the actual Apollo 11 mission, the simulations team were the unsung heroes. They exposed blind spots and key decisions that were happening too early, each time resulting in all lives lost during the landing. The result of those simulations was always a crash rewriting of those procedures, which was critical to enabling a successful lunar landing. Going back to Rickover, I don't think Rickover is saying that he always knew best, but he absolutely believed that the system knew best. I think Rickover and Apollo shared many of the same principles. In fact, I learned Kranz held very dearly the notion that mission controllers knew best, more than the astronauts and certainly more than manufacturers.
Gene Kim (00:11:33): In fact, in space, the edge can't always know best. Those are the astronauts who in an emergency are often under enormous physical strain, overwhelmed with information, sometimes disoriented or almost passed out, unable to make sense of their environment. In fact, in the book and the awesome movie, The Martian, the stranded astronaut Mark Watney had to solve problems all the time. And his return to earth was enabled by being able to tap into all the collective intelligence and resources back on earth, which helped him overcome all the challenges required to survive on the planet for over a year and eventually figure out how to return back to earth.
Gene Kim (00:12:11): I think that's what Rickover and NASA were all about: empower the edge with the full support of the core. Which means yes, fix the problem at the edge, but make sure that any solutions are brought back to the core. Follow the process, because it is the best way we know how to do things. And if there's a better way than the process, improve the process. In short, I think the principles that Rickover held so near and dear are very much applicable to even the DevOps space. In fact, Mike had a thought about that. Back to the interview.
Michael Nygard (00:12:42): We did that in the form of the system automation, right? We have that same kind of belief that you should take the wisdom and codify it in tests and scripts and automation. In some sense, Rickover had an advantage though because the laws of nuclear physics don't change, whereas in the software world we change our laws of physics every few years.
Gene Kim (00:13:08): I'm wondering if it's no surprise that those kinds of strict rules apply kind of to the build, test, and deploy, right? Those more mechanical things where the infrastructure really is more in our control, right? Versus the adversary, which is probably not as easily codified and you can't enforce the rules there. Right? I mean, that seems like that would lead to disaster.
Michael Nygard (00:13:35): Yeah. I also think Glenn Vanderburg did this great talk on what is software engineering really. He determined that the part of what we do that is most akin to engineering is actually the build phase, the construction, the validation and the creation of the artifacts. And if you think about it in those terms, the 688 class of nuclear submarine all had the same reactor, right? So you could write down rules for what to do with this reactor.
Michael Nygard (00:14:12): Every system we build is different because of the competitive nature of our business. No two companies have exactly the same system. You sort of have to rediscover or reinvent the rules for this company and this company and this company. But the part that is the same is you regard each deployment as a new construction of the same class of system all built at the same shipyard, all built, right?
Gene Kim (00:14:39): Right.
Michael Nygard (00:14:40): Ideally deterministically. But then what that means is the procedures that work for your particular nuclear reactor may not be appropriate for someone else's. In fact, they might be actually dangerous. Which is why we sometimes have such a hard time picking up somebody else's methodology or somebody else's deployment tool and just dropping it in, because it doesn't fit our reactor.
Gene Kim (00:15:07): Because of the environmental factors in which the reactor exists. Oh, that's super interesting. And just to maybe even concur with that further, right? The biggest aha moment for me in The DevOps Handbook was really the bifurcation of the creative act of design and development, right? Where lead times are measured in weeks, months or quarters, versus build, test and deploy, where they should be minutes or hours. The dividing line is the point of code commit into version control.
Gene Kim (00:15:38): I thought that was really great because I mean, even what you just said is that the part most akin to engineering is that build, test, and deploy phase, where one can imagine a Rickover-like adherence to the rules. But then I think we're also saying that in the design and development phase, it's hard to believe that that same philosophy will lead to good outcomes.
Michael Nygard (00:16:03): I think that's true. I think Glenn would agree with you as well.
Gene Kim (00:16:07): Gene here again. Wow. I think this is so interesting. And as a little aside, this is why I'm finding these interviews in this podcast to be so illuminating. This quest to find a more parsimonious set of principles to explain the world around us, to explain the most amount of observable phenomena is just so dazzling. And that's exactly the feeling I had when talking to Mike. Let's go back to the interview where you will hear me tell Mike a story that Steve Spear had only told me the week before. Part of me wanted to present a cleaned-up version of the story to you, but I decided to keep the original so you could hear Mike's reaction to the story because it so much mirrors the incredulity I had when I first heard it from Steve.
Gene Kim (00:16:50): I want to share with you this other thing that Spear told me last week, when I did the verbal equivalent of tripping and falling flat on my face. It was so riveting. So my question was about how this notion of structure shows up in Toyota plants. I asked, who in the Toyota plants is creating that organizing logic of how the plant runs that results in these amazing dynamics? And he goes, "Nobody." His theory is that in software, maybe it's because you do it every two years, so there's this discipline. But in manufacturing plants, one is stood up every 15 years, right? So nobody really has... It's not really in anyone's job description, not the chief engineer, not anybody.
Gene Kim (00:17:29): And I find this a little bit preposterous, but then he told me the story. He said in the mid-90s, he went to visit a Toyota plant with his mentor, Kent Bowen at Harvard Business School, and a VP of manufacturing at a Big Three plant. One of the things that they were showing off at Toyota was the fact that they did 60 line-side store changes per day. I didn't know what that was, but at every work center there are basically the racks where you store all the inputs, right? So they were changing that 60 times a day, and the VP of manufacturing from the US auto manufacturer said, "That's crap. That's bullshit." I asked him, [inaudible 00:18:09], "What does that mean? That it's a bad idea, that it's absurd, it's crazy?"
Michael Nygard (00:18:17): It's impossible.
Gene Kim (00:18:19): It's impossible, exactly. Right. Disbelief. And he said, "We tried six and it shut down our plant for three days." And so it evoked kind of... I think it's how many people reacted when we heard the 10 Deploys a Day at Flickr, right? The Allspaw/Hammond presentation.
Michael Nygard (00:18:41): My first thought was this is a clickbait title. There's no way. [crosstalk 00:18:46] like some phony definition of deploy.
Gene Kim (00:18:51): That's bullshit, right? I tried one.
Michael Nygard (00:18:53): Yeah. We were trying to do three a year and it was shutting us down.
Gene Kim (00:18:58): Gene here, brief break in. I just wanted to make sure that you caught that reference. I was referring to the famous 2009 presentation by John Allspaw and Paul Hammond about how they were doing 10 deploys a day, every day at Flickr. Mike summarized how so many of us reacted when we heard about that presentation, which was primarily disbelief. And even if they were telling the truth, it just seemed preposterous because it seems so dangerous and reckless and maybe even immoral. I love how Mike suspected that they were even being a little bit disingenuous of how they were using the word deployment, just like the VP of manufacturing from that Big Three auto plant.
Gene Kim (00:19:41): Alright. Back to the story. So, I was asking what's the difference between a system where you tried six and it blows everything out versus one where you can do 60? He said it's because the pieces are decoupled from each other. Imagine in the Big Three plant, there's a central MRP planning system that says, "Here's the production control. Here's the routings. Here's whatever." And everything's so coupled together that when you try to change six things, you get something wrong and the whole system falls apart. Whereas the Toyota plant is driven primarily through kanban cards. A kanban card is an envelope with three pieces of information on it: here's who I am, here's what parts I need, here's where I need them from, the parts and quantity.
Gene Kim (00:20:29): Basically, no one actually needs to know, except for the originator and where the parts need to go. We just hand the materials handler the envelope and they'll be able to find it. So if you and I both have work centers and you and I trade jobs, all we've got to do is write it down on a kanban card, right? And the parts will eventually find us. When I told that to Jeffrey Fredrick, he said, "Oh, that's information hiding." I recognize that because it allows things to get done without having to tell the central planning system every detail, which impedes the ability to change things.
Michael Nygard (00:21:04): What's interesting about that to me is the same debate plays out over and over again in different contexts. In the microservices design world, there's this argument between orchestration and choreography. Now, I'm not fond of these two terms because with choreography, there is still a choreographer who decides where everyone goes. But the way it's meant in microservices is that there is not a central controller telling everyone what to do; it is that the services themselves know how to react and who they call. So you have this localized knowledge and you don't require the global sort of controlling mind.
Michael Nygard (00:21:51): I see it inside the design of software as well, right? I've certainly seen software designs where everything worked perfectly, but if you changed one piece, you had to change the whole thing, right? Versus other designs where it's more built out of composition and you can change things pretty freely and they only have local effects. And you don't require someone editing locally to have global knowledge of the whole system in order to be safe. So this idea comes up over and over again, but then how do you have... But who is the person that says, "We're going to build a system that doesn't require global knowledge?" Isn't that sort of a paradox? Like you have to have someone in that global position to say, "We're not going to require global knowledge?"
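[Editor's illustration, not part of the recording.] The orchestration versus choreography distinction Mike describes above can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the `EventBus` class, the event names, and the three "services" are assumptions, and a real system would use a message broker rather than in-process calls. The point is only where the knowledge of "what happens next" lives.

```python
from typing import Callable, Dict, List


# Orchestration: one central function knows every step and the order of steps.
def orchestrate_order(order_id: str, log: List[str]) -> None:
    log.append(f"reserve stock for {order_id}")
    log.append(f"charge payment for {order_id}")
    log.append(f"ship {order_id}")


# Choreography: each service knows only the event it reacts to
# and the event it announces afterwards.
class EventBus:
    """A toy in-process publish/subscribe bus (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, event: str, handler: Callable[[str], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def publish(self, event: str, order_id: str) -> None:
        # Events with no subscribers are simply ignored.
        for handler in self._handlers.get(event, []):
            handler(order_id)


def build_choreography(log: List[str]) -> EventBus:
    bus = EventBus()

    def inventory(order_id: str) -> None:
        log.append(f"reserve stock for {order_id}")
        bus.publish("stock_reserved", order_id)

    def payments(order_id: str) -> None:
        log.append(f"charge payment for {order_id}")
        bus.publish("payment_charged", order_id)

    def shipping(order_id: str) -> None:
        log.append(f"ship {order_id}")

    bus.subscribe("order_placed", inventory)
    bus.subscribe("stock_reserved", payments)
    bus.subscribe("payment_charged", shipping)
    return bus


central_log: List[str] = []
orchestrate_order("A1", central_log)

local_log: List[str] = []
build_choreography(local_log).publish("order_placed", "A1")
```

Both versions do the same work in the same order. The difference shows up when something changes: adding, say, a notification step to the choreographed version is one more `subscribe` call made by the new service, while the orchestrated version requires editing the central function that knows everything, much like the central planning system in the Big Three plant story.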
Gene Kim (00:22:42): Steve Spear said something to me that it was kind of equally stunning. He said, "I've had the blessing to be able to study the Toyota Production System for 30 years, that the miracle that is Toyota and people still think it's about manufacturing." Also he's saying that the miracle... And I'm going to use miracle [inaudible 00:23:01], right? Is to your point, right? Who decided that the kanban card, they keep these pieces decoupled from each other and not impose a higher-level order on it? And-
Michael Nygard (00:23:13): And isn't there almost a seductive nature to the idea that if you want optimization, you need a global view and you need to optimize everything from the top? Again, it seems sort of obvious that that's what you should do, doesn't it?
Gene Kim (00:23:26): Yeah, totally. I think what Spear is asserting is that... I think what he's saying or certainly what the implications are is that that was never actually decided; it was a synchronization problem they were trying to solve. In other words, how do you get parts from A to B, which resulted in the deployment of kanban cards, which has this other property of keeping things decentralized? But I'm dazzled by the implications and the benefits that that causes. Are you finding that also pretty freaking amazing?
Michael Nygard (00:24:03): It is amazing. I think it takes a special type of mind and a special personality to do that because it requires somebody who can think of simple rules that generate complex behavior, which is not common. It also requires somebody who doesn't desire to be in control day-to-day, which, let's say, is not the most common attribute of managers.
Gene Kim (00:24:34): Gene here again. Let's see if I can describe why I think the story is so important. In the typical Big Three automotive plant in the 1990s, everything was tightly coupled together in a centralized system. So when you try to do say six line-side store changes in a given day, it was too easy to miss something. Suddenly parts weren't where they needed to be and now you can't ship completed cars at the end of the production line. And that is what ends up shutting the plant production down. It would take them three days to resume production.
Gene Kim (00:25:09): What this says is that the cost of change was too high. In other words, there are genuine changes they may want to make, but can't because the potential consequences were too grave, causing too much chaos and disruption. So therefore, the organization is unable to do the things they need to do. This is very much like the team of teams story, where the enemy leader might've been sighted, but the US forces were unable to respond quickly enough to capture them. So contrast this to the Toyota plant, where they were doing 60 line-side store changes per day, presumably quickly, easily and fearlessly.
Gene Kim (00:25:46): So the incremental benefit of each one of those changes might be small, but it allowed them to constantly experiment, to tweak and tune to improve the standard work. Which over a longer period of time allowed them to continually set the world standard. And this goes to one of the themes emerging about the role of architecture to ensure that the cost of change will continue to be low enough so that everything that needs to get done can be done easily, safely, quickly, and fearlessly both now and in the future.
Gene Kim (00:26:17): Okay, let's go back to the interview where I started to ask Mike more about the concrete characteristics of great architecture. You may recall from my first interview that he gave an example of a business process that defined not only the payment methods that customers could use, but which payment methods were accepted in a certain country. He described the first option where we can solve the problem by putting more logic into the same place where the payment methods are defined. He gave a second option where we create a second service that would enable country managers to define which payment methods are accepted.
Gene Kim (00:26:51): Then he presented the exciting alternative of adding a third service, which might seem more complicated, but is actually easier and simpler to maintain in the long-term. Mike had this amazing comment, he said that most people react to that third option just like the court musicians did in the movie Amadeus, which was, "I don't like it. It has too many notes." So, before we go into that payment method example, I wanted to get a better understanding of how you can actually tell whether something has too many notes or maybe you don't have enough notes. Let's hear Mike Nygard talk about this.
Gene Kim (00:27:26): By the way, when he mentions Rich Hickey, Rich Hickey is the inventor of the Clojure programming language, which he and I both love so much. Here we go. You mentioned the notion of some might think too many notes, which I love, but I also... That reaction is very familiar to me. So I remember over the last 20 years when I would pick up a certain software library or try to use a certain API, my reaction was to recoil from it. I just want to do a simple thing, like send a log event or draw a rectangle on the screen. And I'm looking at 12 parameters that I have to fill out, of which I don't even know what they are. I'm like, "What is a graphics context?" Or, "What is this thing that I need to pass it?"
Gene Kim (00:28:11): And my reaction is, "Oh." It was disgust, like too many notes. At that time, I built an application on top of the re-frame architecture in Clojure, which I love. I think it magnificently decomposes the system so that the pieces can be kept apart. But I remember my first reaction was, "Holy cow! What are all these notes? I don't know what an effect is or a coeffect or an interceptor." So I have that emotional reaction of like, "Why are there so many notes?" Can you help me understand how one develops that sensitivity, that sensibility to understand when notes are useful and when there are too few notes? Maybe start with that example again with the e-commerce payment processing. Can you help me understand that better?
Michael Nygard (00:29:02): Yeah. Well, one way to do it is to work with Rich Hickey for several years, which I had the privilege to do. Which is a way of saying, one way to develop that sensibility or taste is to work with somebody who already has it. That shoulder to shoulder learning is always kind of the best.
Gene Kim (00:29:24): Can I interrupt you with one quick question? Have you had that as well where you look at something and you're like, "Holy cow! That's a lot of notes to have to get my [crosstalk 00:29:32]-"
Michael Nygard (00:29:32): I absolutely can. In fact, my first reaction with re-frame was, "I just want a database field on the screen." One of the things that I had the opportunity to do was work in Objective-C and a little bit in Smalltalk. They have a very interesting approach to one of these things. If there is a method that takes 12 parameters, there will also be one that takes 11 of those 12, one that takes 10 of the 12 and all the way down to the simplest possible thing. So if you wanted to just draw a rectangle, the simplest method would take four points, right? X, Y one, X, Y two. So in a sense, the parameters... Smalltalk and Objective-C used named parameters. The set of parameters was a little bit open. You could provide anywhere from the minimum up to the maximum with some optionality in there.
Michael Nygard (00:30:27): This is one of the things that I learned from Rich, is making your parameters an open set provides a lot of benefits. So if you say I take exactly these six parameters and the use case changes... Say we're talking about distributed systems where you can't easily refactor across the boundary. Now, if I need a version with the seventh parameter, I either have to change everybody all at once and deploy everything all at once. Or I have to add another API method that takes the new seventh parameter.
Michael Nygard (00:31:01): Well, if I just take a map and I don't enforce the parameters sort of at the boundary, but at just one step beyond the boundaries, sort of validating that I've got a payload I can operate with, well, then I can add a seventh parameter quite easily and I can start looking for it. And if no one is sending it to me, okay, I just behave in the old way. If I start to receive it, great, I can use it and do the new thing. So there's this idea that expansion is safe when you're using open sets. This is one way to get around the problem of proliferation of things that are almost the same, but not quite the same.
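To make the open-set idea concrete, here is a minimal Python sketch. The function name `charge`, the required keys and the optional `idempotency_key` field are all hypothetical, chosen only for illustration. The handler takes a plain map, validates just the keys it cannot operate without, and starts using a new optional key only when a caller happens to send it.

```python
# Hypothetical minimum payload this handler needs in order to operate.
REQUIRED_KEYS = {"amount", "currency"}


def charge(request: dict) -> dict:
    """Handle a request passed as an open map of parameters."""
    # Validate only what we need; unknown extra keys are tolerated.
    missing = REQUIRED_KEYS - set(request)
    if missing:
        raise ValueError(f"cannot operate without: {sorted(missing)}")

    result = {"charged": request["amount"], "currency": request["currency"]}

    # A later, optional parameter: old callers never send it and get the
    # old behavior; new callers send it and get the new behavior.
    if "idempotency_key" in request:
        result["idempotency_key"] = request["idempotency_key"]
    return result


old_caller = charge({"amount": 100, "currency": "USD"})
new_caller = charge(
    {"amount": 100, "currency": "USD", "idempotency_key": "abc", "note": "extra"}
)
```

Adding the optional key required no second API method and no coordinated deployment of callers, which is the sense in which expansion is safe here.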
Michael Nygard (00:31:44): I also have a few sort of rules of thumb that I apply. In one of my talks, Architecture Without an End State, I talk about this rule that says augment upstream and contextualize downstream. What that really is referring to is upstream and downstream in terms of data flowing through your system. So data in this case may be requests from users, it may be feeds from outside, but you receive data in kind of a basic form. And what some systems try to do is immediately reject some of that data. So filter out entities that we think don't fit our schema. They try to decompose it into a relational format where we're fixing the cardinality of relationships. So, part-whole relationships are one-to-one or one-to-many and changing from a one-to-one relationship to a one-to-many is hugely disruptive, right?
Michael Nygard (00:32:47): So almost the first thing you do is you take this data coming in and you say, "Whatever fits into my schema is real and anything that doesn't fit my schema doesn't exist." That's already contextualizing. You're throwing away information. What I prefer to do is take in all the data and say, "This data is real and somebody somewhere downstream might be able to work with it." This is part of my war on required attributes for example. Maybe I don't have all the attributes needed to put an item on the online storefront and sell it and ship it and deliver it, but maybe I have enough to show it to the marketing people who are going to slot it into a category and start making it useful, right? Then as the additional attributes are available, then we can use them.
Michael Nygard (00:33:41): So the context about what I can do with those entities really is determined by downstream systems, not the upstream. What the upstream can do is mix in additional information by joining to other sources, by applying inferences and adding fields. You go through this expansion phase upstream and then as you propagate downstream, different systems get to apply their policy about what they can do with it or what they should do based on the attributes they see.
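A small Python sketch of "augment upstream, contextualize downstream" follows. The field names (`sku`, `name`, `price`, `weight_kg`, `category`) and the two downstream policies are assumptions made up for the example. The upstream step only adds information; each downstream consumer decides for itself whether an item has enough attributes for its purpose.

```python
def augment(item: dict, category_by_sku: dict) -> dict:
    """Upstream: mix in extra information, never drop or reject anything."""
    enriched = dict(item)  # copy so every incoming attribute survives
    sku = item.get("sku")
    if sku in category_by_sku:
        enriched["category"] = category_by_sku[sku]
    return enriched


# Downstream policies: each consumer applies its own notion of "usable".
def visible_to_marketing(item: dict) -> bool:
    return {"sku", "name"}.issubset(item)


def sellable_on_storefront(item: dict) -> bool:
    return {"sku", "name", "price", "weight_kg"}.issubset(item)


incoming = {"sku": "A1", "name": "Travel mug", "supplier_note": "fragile"}
item = augment(incoming, {"A1": "drinkware"})
```

Here `item` is already useful to marketing even though it cannot yet be sold, and the unrecognized `supplier_note` field is carried along for anyone downstream who can use it.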
Gene Kim (00:34:14): I was wiping tears from my eyes. I had a sort of visceral reaction when you were talking about this, about coercing data. When I take data in, I am often guilty of changing data to fit my parochial needs in that function, and I destroy data and make it not available to someone else. I mean, that's-
Michael Nygard (00:34:37): Then don't you regret it later on when you need it-
Gene Kim (00:34:39): Yes.
Michael Nygard (00:34:39): ... or you want to do something else?
Gene Kim (00:34:42): Right. The notion is that really what should be happening is you can add to it, but you really should not remove things so that other people can use it later. Yeah, I love that and that Rich Hickey notion of you can [inaudible 00:34:58], but you can't-
Michael Nygard (00:35:00): Take away. Yeah.
Gene Kim (00:35:00): ... destroy. Take away here, right? That suddenly seemed even more important. So what other parts are there to the sensibility of too many notes versus too few? For example, I'm intrigued by your re-frame experience as well, right? I think we both have a tremendous amount of admiration for it, but that feeling of all you want to do is put something from the database on the screen and there are four component pieces that need to be understood to even write your first event handler. What distinguishes that from the too many notes problem?
Michael Nygard (00:35:36): I'm hesitant to say anything that would be critical of re-frame because I think they've done a great job. The documentation is some of the best I've seen. It's very explicit about everything. I want to separate the getting started experience from the day two experience. So, we've all had situations where the initial on-ramp is pretty tough, but then the rewards are high, right?
Gene Kim (00:36:03): That's Clojure in my case.
Michael Nygard (00:36:05): We can build better on-ramps, right? You can create templating tools. You can create project generators that give you stuff. You could imagine adding some macros in your dev workspace that would create for you the pieces that re-frame needs. Then you only need to sort of unpack them and care about them once you have to make variations. In terms of the too many notes, one of the other kind of recurring patterns I see is this difference between the archetype and the instantiations. I'm trying to be careful about terminology because I see this pattern happening in a few places.
Michael Nygard (00:36:47): I've been in, say, Java code bases, where there are a high number of classes which only ever have one instance. And they may have interfaces where the interface is a one-to-one match with the implementation and it's only instantiated one time. In those systems, you get a proliferation of classes. If you look really hard at the behavior, you'll start to see that there's a lot of behavior being repeated across the classes. A lot of the interfaces will look like near duplicates of each other, but not quite.
Michael Nygard (00:37:22): I've worked in other Java code bases, but more commonly Smalltalk code bases, where classes are instantiated so many times that it would be extraordinarily rare to find a class that only has one instance. So because a thing is reused, it becomes reusable and the cognitive overhead is way less. I only have to understand the class one time, whereas in the former type of code base, I have to understand each of these sort of megalithic God classes independently. The same exact thing happens with services in a microservices environment. Most microservices environments have only one instance of any given service. So one code base, you can think of that as the class or the archetype, one instantiation and everyone uses that one instantiation.
Michael Nygard (00:38:20): Well, that means I have to understand how to interact with that service and the other service and every other service independently just like those mega-classes. Whereas if I can find ways to generalize the components, I can reuse the components. One of my favorite examples of these is with Kafka. There are these Kafka connectors. If I need to take a topic, receive all the messages, flatten by a key and make it persistent so I have a materialized view of the latest of that key for the whole topic, I don't need to write a new component. I instantiate an off-the-shelf component with some parameters, some configuration that says what topic, what's the key field, what database, what table does it go into.
Michael Nygard (00:39:11): If I have a lot of instances of those little Kafka connectors, it doesn't really add that much cognitive overhead to try to understand each connector. What I need to understand then is how is data flowing through the system? So I'm operating at a higher level because of the simplicity of the underlying components. That notion of simplicity goes along with generality. This is another one of my ongoing arguments: I contend that making something more general almost always means making it simpler, not making it more complex. You don't achieve generality by adding every special case possible. You achieve generality by removing all the special cases.
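The "one archetype, many instantiations" pattern can be sketched in plain Python. This is only a stand-in for the idea and does not use the Kafka Connect API: the class name `LatestByKey`, its fields, and the sample topics are hypothetical. One general component is written once and then instantiated several times with different configuration.

```python
from dataclasses import dataclass, field


@dataclass
class LatestByKey:
    """Keeps a materialized view of the latest message per key."""
    topic: str       # which stream this instance reads (configuration only)
    key_field: str   # which message field identifies the entity
    table: dict = field(default_factory=dict)  # stands in for a database table

    def consume(self, message: dict) -> None:
        # Later messages for the same key overwrite earlier ones.
        self.table[message[self.key_field]] = message


# Two instances of the same component, differing only in configuration.
prices = LatestByKey(topic="prices", key_field="sku")
flights = LatestByKey(topic="flight-status", key_field="flight_no")

for msg in [{"sku": "A1", "price": 10}, {"sku": "A1", "price": 12}]:
    prices.consume(msg)
flights.consume({"flight_no": "XY123", "status": "boarding"})
```

Once the component is understood, each new instance costs very little to reason about; what remains to understand is how data flows between the instances.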
Gene Kim (00:40:01): We are so much looking forward to the DevOps Enterprise Summit, Vegas Virtual, which will now be held on October 13th to the 15th. As always, the goal of the programming committee is to bring you the best experience reports and to outprogram all our previous events. This year we expect to deliver on that promise again. I am so excited about the speaker lineup we have for you partly because they are among the most senior technology and business leaders that have spoken at this conference showing you how important the work of this community is.
Gene Kim (00:40:31): Maya Leibman, the CIO of American Airlines, presented at our annual forum in April and we were fascinated by the perspectives that she shared with us. I'm so excited that she will be co-presenting with our long-time friend [Bras Clinton 00:40:44] about the American Airlines journey. Since 2014, we've all been dazzled by the CSG journey as told by Scott Prugh and Erica Morrison. I am so thrilled that this year Scott Prugh will be co-presenting with his boss, Ken Kennedy, executive vice president and president of CSG, the largest provider of customer care billing and order management in the US.
Gene Kim (00:41:06): Ken and Scott will be sharing their story on the interplay between business and technology leadership and how it resulted in their amazing accomplishments over the years. This is just the beginning. Stay tuned for more exciting announcements about our amazing speaker lineup. This will undoubtedly be the best DevOps Enterprise Summit Program we've ever put together. You can find more information at events.itrevolution.com/virtual. Keep going because that's a heck of a claim to make.
Michael Nygard (00:41:37): Okay. So this is another Clojure example. Suppose I want to find the length of a list, and imagine that we didn't already have length built in as a function, I would reduce over the list applying a plus operator, plus one to my accumulator for each item in the list. So you're already writing the code in your head I can tell. You know exactly what that would look like. It's a one-liner. Now imagine I say, I want a function that can only find the length of lists of prime integers. You have to add code to make that work, right? The more specific thing requires more code.
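The example is given in Clojure terms, but the same point can be sketched in Python. The function names below are made up for illustration. The general `length` is a one-line reduce, while the version restricted to lists of prime integers needs strictly more code, because it has to check its inputs before it can count them.

```python
from functools import reduce


def length(items):
    """General: count anything, ignoring what the elements are."""
    return reduce(lambda acc, _item: acc + 1, items, 0)


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def length_of_primes(items):
    """Specific: only lists of prime integers are accepted."""
    for x in items:
        if not (isinstance(x, int) and is_prime(x)):
            raise TypeError(f"not a prime integer: {x!r}")
    return length(items)


general = length(["a", None, 3.5])   # works on any elements
specific = length_of_primes([2, 3, 5, 7])
```

The specific function is built by adding checks on top of the general one, not the other way around.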
Michael Nygard (00:42:23): Now if I want something that finds the length of a list of names, I have to add code to make sure that my list is only full of strings. If we take the same idea into the strongly typed world, the more specific your type signature is, the less general your functionality is. So you have to add more cases to cover more territory. If I have a function that goes from list of ints to ints, there's basically just a handful of ways to write that. If I have something that goes from list of A to int, I can feed it many more things and the code is going to be simpler. Because the implementer is able to make fewer assumptions about the parameters it receives.
Michael Nygard (00:43:08): If I have list of int to int, I might be multiplying the ints together, I might be summing them, right? There's no guarantee that I'm actually counting them. If I have list of A to int, the receiver doesn't know what they can do with A and so they're constrained to basically, "What can you do with it?" You can count it and then you can do something crazy like divide by two or negate the count or maybe just always return zero. But it is more general and it's going to be simpler because there are fewer operations being done on the parameters coming in.
Gene Kim (00:43:43): Holy cow! This is not where I expected Mike to go, but he just gave us a pretty precise and also a very startling definition of how to know whether code is simpler or more complex. I not only had to listen to this portion of the interview several times to make sure I understood what he was saying, but I also had to re-listen to a LambdaCast podcast that I heard last summer, which I was dazzled by, but didn't actually fully understand until today. But thanks to Mike, I think I understand now and it's pretty amazing what Mike is claiming.
Gene Kim (00:44:19): Let's rewind and listen to what Mike just said. The greater the number of special cases and logic I allow into my function, the less general it is. And the fewer number of specific cases and logic I allow into my function, the more general it is. Okay? I guess both of those make sense. In other words, if you want to write general code, avoid logic and special cases. I think that's helpful. He then went on to say the more general the type signature of my function is, the fewer operations that can be performed on them. Conversely, the more specific the type signatures are, there are a greater number of operations that can be performed upon them and the less general the functionality is.
Gene Kim (00:45:07): Okay. This will take a little bit of explaining. I'm going to put a link in the show notes to that entire episode of LambdaCast, which is on this very topic, which is hosted by the very brilliant David Koontz. David Koontz says with every increase in what you know about the types, you have less certainty about what the function can do. If you know nothing about the types, you actually know everything about what it does. The following only applies to pure, statically typed functional programming languages like F#, ML and Haskell. But it's still an astonishing proof point.
Gene Kim (00:45:43): I apologize if this is getting too abstract, but this is what category theory, the mathematics that all of functional programming is based upon says about this topic. If you have a function that accepts type T and returns type T, you already know exactly what the function does. The only thing that a function can do is return exactly what you gave it because if you don't know what type T is, you can't make a new one. Therefore the only valid value it can return is what you gave it. In other words, in the scenario where you know nothing about the type, you know already everything about what it does.
Gene Kim (00:46:21): Let's now consider a situation where you know everything about the type, you now know nothing about what the function does. Here's the proof. Suppose you have a function that accepts type int and returns type int, you now have an infinite number of values that the function can return. It can be a constant: one, two, three, and so forth. It could be negative. You could add one to the input, add two to the input. Basically you have an infinite number of values that it could return and so you really have no idea of what it actually does from looking at the inputs.
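The contrast between the two signatures can be mirrored in Python type hints, with a large caveat: Python does not enforce these hints at runtime, and a Python function could still inspect its argument or raise, so the argument only holds strictly in pure, statically typed languages. Treat this as an illustration of the reasoning, not a proof. All names here are made up for the example.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def identity(x: T) -> T:
    # Knowing nothing about T, the only value of type T we have is x itself.
    return x


# int -> int says much less: all of these share the same signature.
int_to_int: List[Callable[[int], int]] = [
    lambda n: 1,        # a constant
    lambda n: -n,       # negate the input
    lambda n: n + 1,    # add one to the input
    lambda n: n * n,    # square the input
]

results = [f(3) for f in int_to_int]
```

The more the signature reveals about the type, the more implementations fit it, and the less the signature alone says about behavior.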
Gene Kim (00:46:52): Again, to repeat the astonishing claim that Mike makes, if you make something more general, it has to be simpler. When something is more general, it will have fewer lines of code and it will even eliminate the possibility of having specific cases in your code because you don't even know what you're operating on. So to write things that are simpler and more general, we eliminate as many specifics from our code as possible. I got to tell you, wow, that is a pretty big idea. Okay, let's keep going.
Michael Nygard (00:47:24): Let me use another concrete example. I don't have a mathematical proof on this, but I have a lot of examples. But this one is actually a debate that I've had inside my company. I was being provocative and it triggered a lively exchange of ideas. We often need to find the location of things on earth. So we're in the travel industry. It's useful to know in which city the airport called ORD exists because sometimes people care about going to the city rather than the airport and so we need to know that.
Michael Nygard (00:48:06): Well, we can write a service that will take an airport code and return you the lat/long of the airport code, right? Now in order to write that service, somebody has to feed me with the data about airport codes, what they are. And either the same source feeds me the coordinates of those airport codes, or maybe I get them all as one delivery. Well, when I receive a request, what's the first thing I'm going to do in such a service? I'm probably going to look to see if you've given me a real airport code or not.
Michael Nygard (00:48:47): So I'm adding code to validate that the parameters are legit for the type. Yeah? Then I'm going to go make a query to find out where it is and maybe I'm going to do a radius query with lat/long to find nearby points of interest. And I'm going to return you a place or a set of places. Let us now suppose that I also need to locate hotels, should I write another service to locate hotels?
Gene Kim (00:49:22): My gut feeling is probably not. That seems like a concretization that is not necessary since you're already doing location points of interest.
Michael Nygard (00:49:33): Except that I only accept airport codes in my specific API. So now maybe I need to add a special case or another API function that accepts a hotel identifier. Now in addition to hotels, maybe we want to add theme parks or cruise ship terminals, various other points of interest. My service is growing new APIs, but fundamentally, all it's trying to do is map a name to a location or a set of locations. So what I should really do is take away all the special cases. Imagine the Google search page if you had to tell Google whether you were searching for a phone number or a zip code or the name of a restaurant or the name of a book, or the name of an author of a book, or the name of a movie adapted from a book.
Gene Kim (00:50:34): Imagine the drop-down box, right? That you would have to, right? Before you hit enter.
Michael Nygard (00:50:39): Imagine I had a service that could translate a name into a location or a set of locations. Now, I have the choice where I can make an instance that only deals with airports and I can make an instance that only deals with hotels, but that choice is in the dataset that I loaded up with, not in the implementation of that service. The service is more general. I can choose to run one global one that handles all named locations for everything or I can choose to have many deployments that are composed into different workflows and have operationally independent availability. But I have more options because I've got a more general thing at the core.
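A minimal Python sketch of that general service is shown here. The class name `Locator`, the hotel name and the coordinates are illustrative assumptions, not real reference data. The implementation only maps a name to a set of locations; whether an instance "is about" airports or hotels is decided entirely by the dataset it is loaded with.

```python
class Locator:
    """Maps a name to zero or more (lat, long) locations."""

    def __init__(self, places: dict):
        # Normalize keys once so lookups are case-insensitive.
        self._places = {name.lower(): locs for name, locs in places.items()}

    def locate(self, name: str) -> list:
        # No special cases per kind of place: unknown names yield no locations.
        return self._places.get(name.lower(), [])


# Illustrative datasets; the coordinates are approximate placeholders.
airport_data = {"ORD": [(41.97, -87.91)]}
hotel_data = {"Example Hotel Chicago": [(41.88, -87.63)]}

airports = Locator(airport_data)                    # instance for airports only
hotels = Locator(hotel_data)                        # instance for hotels only
everything = Locator({**airport_data, **hotel_data})  # one global instance
```

The same code supports one global deployment or several operationally independent ones, which is where the extra options come from.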
Gene Kim (00:51:24): Awesome. And so what was the strongest argument for the other case? What was the opposing argument?
Michael Nygard (00:51:33): The opposing argument was that any given caller certainly only cared about their type of data. In other words, if you're looking for a flight, it's of no use for me to give you back hotels in Chicago, which is true. What that tells me is we need to augment the data that we're passing in with some context.
Gene Kim (00:51:56): Right. So this is a feeling you have, is you call an API and you get back a whole bunch of stuff you don't care about and you're mystified by why it's being given to you?
Michael Nygard (00:52:05): Right. So imagine that my parameter is Chicago and I get back restaurants-
Gene Kim (00:52:12): Gas stations. That's [crosstalk 00:52:14]-
Michael Nygard (00:52:13): .. gas stations, hotels and the O'Hare higher rental center and so on. But what that really means is I have some implicit assumptions about what I'm interested in that I didn't tell you about. So, one of two things can happen. Either, I contextualize those results by saying, "Oh, I'm going to filter for airports," which means your data needs to contain some kind of classifier or identification. Or I need to tell you to only give me airports. But we're making that implicit assumption, explicit in the data, which allows us to simplify and generalize the service on the other end.
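Both ways of making the caller's assumption explicit can be sketched in Python. The records, the `kind` classifier field and the `kinds` parameter are hypothetical names chosen for the example. Either the caller filters the results using the classifier carried in the data, or the caller passes the kinds it wants as part of the request.

```python
# Illustrative records; "kind" is the classifier carried in the data.
PLACES = [
    {"city": "Chicago", "kind": "airport", "label": "ORD"},
    {"city": "Chicago", "kind": "hotel", "label": "Example Hotel"},
    {"city": "Chicago", "kind": "gas_station", "label": "Example Fuel"},
]


def find_places(city: str, kinds=None) -> list:
    """General lookup; `kinds` is optional context supplied by the caller."""
    matches = [p for p in PLACES if p["city"] == city]
    if kinds is not None:
        matches = [p for p in matches if p["kind"] in kinds]
    return matches


# Option 1: the caller contextualizes the full result itself.
all_results = find_places("Chicago")
airports_client_side = [p for p in all_results if p["kind"] == "airport"]

# Option 2: the caller states its interest explicitly in the request.
airports_server_side = find_places("Chicago", kinds={"airport"})
```

In both cases the service itself stays general; the notion of "I only care about airports" lives in the data, not in a dedicated airport API.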
Gene Kim (00:52:59): That's a super interesting fact. I mean, so I think maybe one of the conclusions is that the right response to that feeling you have when you make an API call and you get this huge [inaudible 00:53:10] thing of like stuff you don't care about is: don't overreact. And maybe that's okay, right? It didn't hurt you, right? That's actually a signal that maybe points to something that's very generalizable, not just for you, but for every other potential caller.
Michael Nygard (00:53:25): Yeah. And if you have no way to-
Gene Kim (00:53:28): Not to be offended by it.
Michael Nygard (00:53:29): Don't be offended, yeah. I previously used the example about Stripe accepting payments where you simply identify the item that is being purchased rather than having to supply them the entire catalog. This is another example of making something both simpler and more general at the same time because they no longer have to do catalog look-ups and deal with item not found or item is in the wrong seller or any of that stuff. That would be huge complexity on Stripe's end that not only do I not care about it as a consumer of their services, but it would actually be harmful and frustrating if I had to deal with that hidden coupling that there's an implicit item catalog behind the scenes.
Gene Kim (00:54:20): Wow. I thought that was so cool. So, just in case if you didn't get that the first time around, let me repeat what Mike just said. Imagine that you have a service that takes as an input an airport code and generates as an output a list of items of interest around it such as other airports, hotels, restaurants, and so forth. He presented two options of implementing this. Option A, you create a separate service for each type of area of interest: one for gas stations, one for hotels, one for cruise lines, et cetera.
Gene Kim (00:54:56): Option B, you create one service that handles every type of area of interest. Using his reasoning, you should choose option B because it is the more general solution as measured by it handling fewer numbers of specific cases. I think I'm definitely starting to understand far better how Mike views the world. So let's go back to that payment processing example that he gave in the previous episode. Again, we have option A, you put all the logic into a central group who defines not only the payment methods accepted, but they would also be responsible for ensuring that each payment method is actually accepted in every country.
Gene Kim (00:55:37): Option B, you create a separate service that would allow every country manager to define which payments are accepted in each country. Then the middle ground of option C, you create a third component, which would find the intersection of the two. Option C seems so unlikely because it adds a third component. I now finally ask Mike to explain why option C is the preferred solution.
Michael Nygard (00:56:05): I'd really like to talk about case three the most because I've used this word implicit a couple of times and implicit information is kind of the worst kind of coupling. It's the part that's hardest to change because if there's something that's an implicit assumption on the receiving side of a call, they probably assume there's only one instance of a thing, right? Only one item catalog, only one list of payment providers. It's very rare to see that I can tell you which list to use.
Michael Nygard (00:56:43): This has come up a couple of times in different contexts, but it's also one of the things that I learned from working with Rich Hickey, is take whatever is implicit and sort of ambient or floating in the environment and make it explicit. Make it an argument that you pass along. And oftentimes you'll find the receiving side might not need anything more than the arguments you're giving it. So you can get rid of entire databases, you can get rid of data feeds to populate those databases, reconciliation jobs, because the receiving service just doesn't need it.
Gene Kim (00:57:20): Keep going, right? I mean, it's funny you mentioned that, right? My reaction when you say that is, "Oh gosh, more fields." But then I think about what you said about the example of that 12-argument API that has an 11-field version, all the way down to four, right?
Michael Nygard (00:57:34): I need to distinguish between two different types of parameters. I'm going to start with the microscale and I'll illustrate this by contrast. In something like a Ruby on Rails app, you've got this fabulous framework called Active Record, which allows you to get an entity back from the database, manipulate it, save it to the database. And you don't need to know the SQL behind it, you can just work with the object. And in most cases you don't even have to worry about the database because there's just a configuration at startup time that says what database am I connected to.
Michael Nygard (00:58:14): This works great until you need to use two different databases. Because the database is just kind of a global parameter, there's one. All the Active Record methods assume the database. By contrast, if you were working in say a Clojure system, whether you're working with a SQL database or Datomic, the much more common practice is to have functions that receive the database connection or the database value as an argument. And this way, those functions work with whatever database you choose to pass in.
Michael Nygard (00:58:55): So it's now up to your application at a higher level to say, "Do I have one? Do I have five? Do I have 10 databases?" The lower level functions are no longer coupled to that implicit or ambient notion of the database. Now that's not an optional parameter, right? Those functions require a database connection to work. So we can't really elide that parameter and you do need to pass it along. The example I gave about the Smalltalk methods with a large number of arguments, the optional ones were modifiers that would give you special behavior or added control, but they weren't the... They were optional parameters, they weren't the required ones.
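The same practice carries over to other languages. Here is a minimal Python sketch using the standard library `sqlite3` module with in-memory databases; the table and function names are made up for the example. Each function receives the connection as a required argument, so nothing in it assumes there is only one database.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an independent in-memory database with one small table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT)")
    return conn


def add_item(conn: sqlite3.Connection, name: str) -> None:
    # The connection is an explicit, required argument, not ambient state.
    conn.execute("INSERT INTO items (name) VALUES (?)", (name,))


def count_items(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


# The application decides how many databases exist; the functions don't care.
us_db, eu_db = make_db(), make_db()
add_item(us_db, "widget")
add_item(eu_db, "gadget")
add_item(eu_db, "gizmo")
```

Going from one database to two required no change to `add_item` or `count_items`, only to the higher-level code that chooses which connection to pass.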
Michael Nygard (00:59:41): At the macroscale, we have a similar thing with services. If we're making something explicit that... I'm calling you, if we're making something explicit in the call that you have to have then I must provide it. What that's doing in a way though is making clear in our API specifications and in our contract, exactly what you need to operate. Whereas before you have some hidden requirements, which may or may not be fulfilled and may or may not be applicable to the use case I'm trying to invoke. I have to know more about how you work in order to invoke you to know if my call is likely to succeed. If it's all explicit in the arguments, then I only need to look at the contract. I don't need to know anything beyond that API specification.
Gene Kim (01:00:34): What about that scenario that option C in which payments do I accept, what did you exactly react to that led you to say, "No, we actually do need this third piece?"
Michael Nygard (01:00:47): It's all about change. Ultimately, almost everything about architecture is how do we enable change at a system scale? If we have the centralized case where there's a master that understands what every payment provider is in every geography or country we're going to have a lot of churn on that, right? We're going to constantly need people to update that. And unless you've provided it with a super good API for allowing changes to be added from lots of different places, you may be dealing with code changes almost on a daily basis. Now we can deploy it. That's no problem. The problem is the attention and the backlog and the queuing time to get that change into that shared master component.
Gene Kim (01:01:36): So this is like the VP of manufacturing from the Big Three auto manufacturer, right? It's a centralized control, one person needs to know all the information, right? Then everything is reliant upon changes there. Okay, got it.
Michael Nygard (01:01:48): What we'd really like is for the business unit in each country to make their own deals with payment providers that operate in that country. Or if we've got transnational payment providers, maybe we can make the deal globally for efficiency, but we want that flexibility. We want local adaptation for culture, for example. Not everyone views PayPal the same way around the world. Not everyone uses WeChat to pay for things around the world, right? So we want the people with the local context to be able to contextualize to make those deals, to set up the capabilities and then sort of inform the global system rather than having the need for coordinated change on both sides of this interface.
Gene Kim (01:02:39): And then just to kind of argue against that one... By the way, my reaction was like, "Oh, transnational payments? Oh, no." [inaudible 01:02:47] a little bit to that, right? [inaudible 01:02:48].
Michael Nygard (01:02:49): Maybe a bunch of listeners broke out in hives just now.
Gene Kim (01:02:52): Right. [inaudible 01:02:54] finally kind of startling to hear additional complexity that certainly wouldn't have shown up in my first version.
Michael Nygard (01:03:03): Well, so the challenge with option two is that it's not only the payment providers that are in question because we particularly operate as a marketplace with two sides. We have to think about both the seller and the receiver. And both of them may have something to say about what payments will be acceptable. We have to be able to process it in the currency in the region and it has to be something that's acceptable to the supplier of services who will be receiving that payment.
Michael Nygard (01:03:40): So when you're trying to do that kind of matching, somebody somewhere has to take two sets and find the intersection of those two sets. And the essence of my third option is let's do that intersection late by having one side provide its set in the request data rather than having it all preconfigured and predefined. Just provide it in the request data and then when it finally reaches the end point, that's when you do the intersection.
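The late intersection can be sketched in a few lines of Python. The function name, the request field `offered_methods` and the payment method names are assumptions made for illustration. One side carries its set of payment methods in the request; the intersection with the other side's set happens only at the endpoint.

```python
def usable_methods(request: dict, accepted_by_receiver: set) -> list:
    """Intersect the methods offered in the request with those accepted here."""
    offered = set(request["offered_methods"])  # travels with the request
    return sorted(offered & accepted_by_receiver)


# The offered set can differ from one request to the next without any
# configuration or code change at the endpoint.
request_from_one_market = {"offered_methods": ["card", "paypal", "bank_transfer"]}
request_from_another = {"offered_methods": ["card", "wechat_pay"]}

supplier_accepts = {"card", "wechat_pay"}

first = usable_methods(request_from_one_market, supplier_accepts)
second = usable_methods(request_from_another, supplier_accepts)
```

Because nothing about the offered set is preconfigured at the endpoint, a country team can start offering a new method simply by sending it.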
Gene Kim (01:04:12): And there's something so gloriously right about that third option, but I'll be honest. The red flags didn't go off as you were describing that, what could go wrong if you have all that logic happening in the option number two? What's so hard about having that matching happen in that service?
Michael Nygard (01:04:30): I'm going to make an assertion that the best granularity for data is request level or transaction level: an instance of a business process. And so if we could, we would pass all of our data within the business process. I mean, all of it. Because then it can change from one request to the next. So all of our rules, all of our policies could change from one request to the next without requiring code changes. All of our, I don't know, catalog and item data, all of our approval levels, what have you, everything.
Michael Nygard (01:05:11): Now, of course I'm sort of postulating an impossible universe, right? Because we know that we can't carry all that data with every request. The size of the request payload would be ridiculous, which means every piece of data we're storing in advance to make decisions is a performance optimization. My assertion about that then is because that's a performance optimization, it generates complexity as with any kind of a cache. So you can think of a lot of our databases as caches where we are providing a key like an item ID or a carrier code or something along those lines. And we've got cached business rules or policy data or something along those lines.
Michael Nygard (01:05:56): Well, every cache needs refresh mechanisms, update mechanisms. You need to monitor your success rate and so on. It adds complexity in the name of performance because we can't carry all that payload data around. I've really come to regard a lot of our store databases as cached or materialized views on top of events that we use to accelerate decisions during business processes. If we could make everything fully explicit in the payloads, our systems would be enormously simpler. You would only look at data, make decisions about data and emit more data. All of our services would be pure functions.
Gene Kim (01:06:44): Tell us what else. That was my gasp of shock, but [inaudible 01:06:50]. I was wondering if that's what you're suggesting, ideally that what makes that solution better is you are carrying around basically every factor you need in order to make a decision in a pure way with nothing implicit, nothing hidden?
Michael Nygard (01:07:04): Right. We approximate that in some of our systems by this notion of imperative shell, functional core, right? So when you receive a request, you go, you look up everything you need to know, you attach that all to your context, pass it down into the functional core and you get back a value that says, "All right, here's the HTTP response to deliver. Here's some messages to emit. Here are some changes to apply to the database on the way out." But what you've got inside of there is a pure function.
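The "imperative shell, functional core" shape described here can be sketched in a few lines of Python. This is an illustrative sketch only, under invented assumptions: the order-placing scenario, the `Decision` type, and the `decide_order` and `handle_request` functions are hypothetical, and the "database" is a plain dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Value returned by the core: what the shell should do on the way out."""
    http_status: int
    messages: tuple
    db_changes: tuple


def decide_order(context):
    """Functional core: a pure function of the context it is handed."""
    item = context["item"]
    quantity = context["quantity"]
    if quantity > item["stock"]:
        return Decision(409, (), ())
    return Decision(
        201,
        (("order_placed", item["id"], quantity),),
        (("set_stock", item["id"], item["stock"] - quantity),),
    )


def handle_request(request, db, outbox):
    """Imperative shell: look things up, call the core, apply the effects."""
    context = {"item": db[request["item_id"]], "quantity": request["quantity"]}
    decision = decide_order(context)
    for _, item_id, stock in decision.db_changes:
        db[item_id] = {**db[item_id], "stock": stock}
    outbox.extend(decision.messages)
    return decision.http_status
```

All reads and writes happen in `handle_request`; `decide_order` can be tested with nothing but dictionaries, with no database or message broker present.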
Michael Nygard (01:07:36): Well, a lot of what I'm trying to do in macroscale architecture is extend that idea and say, how can we further apply functional concepts like pass values, not references, be explicit, not implicit? How can we apply those concepts at the level of services in an enterprise scale? One of the amazing things that happens is you automatically get the ability to adapt to certain kinds of changes with plurality. If I no longer have an implicit item catalog, I can pass you a bunch of items, right? That you've never seen before and you can operate on them. That makes both sides of the interaction simpler and more general.
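To make "pass values, not references" concrete, here is a small hedged Python sketch contrasting the two styles. The `CATALOG` dict, the item fields, and both function names are hypothetical, invented for this example.

```python
# Implicit, shared catalog (hypothetical): callers pass only item IDs.
CATALOG = {"sku-1": 1000}


def total_by_reference(item_ids):
    """Depends on hidden state: raises KeyError for items not in CATALOG."""
    return sum(CATALOG[item_id] for item_id in item_ids)


def total_by_value(items):
    """Depends only on its argument: works for items it has never seen."""
    return sum(item["price_cents"] for item in items)


new_items = [
    {"id": "sku-1", "price_cents": 1000},
    {"id": "new-9", "price_cents": 250},  # not present in any catalog
]
total = total_by_value(new_items)
```

The second function needs no catalog refresh or lookup service to handle a brand-new item, which is one way to read the claim that both sides of the interaction become simpler and more general.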
Gene Kim (01:08:19): Okay. You could actually hear me gasp a couple of times as Mike was talking because I started to wonder if he was actually going to make the claim he did. The critical part of what makes option C better is that it makes both components in options A and B more general and simpler. And that option C could be done as a pure function. That term pure function comes from the functional programming domain. Pure functions are the notion that functions must be referentially transparent. In other words, for any given set of inputs, you will always get the same outputs. This can only be true if there's nothing implicit, no global variables, no back-end data stores it's querying.
Gene Kim (01:09:01): In fact, what often makes a function impure is that it uses the current system time, which of course will be different every time the function is called. Instead, time must be passed in as an input. So when you do this, you do end up with systems that are dramatically simpler, not only to implement but also to test, because you can test them without any of the other system components being present. For those of you who saw Scott Havens present at DevOps Enterprise Summit in 2019 on the work that he did at Walmart and Jet.com, this is exactly what he built to handle the entire supply chain systems for Walmart. I'll put a link to that talk in the show notes, but I'm happy to say that I've already interviewed him for a future episode of The Idealcast. He will talk at length about this exact topic. I think this is all so amazing. I am so happy that I finally understand why Mike's third option is obviously the best option.
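The point about system time can be shown with a short Python sketch. The offer-expiry scenario and the function names are hypothetical, chosen only to illustrate passing time in as an input.

```python
from datetime import datetime, timezone


def is_offer_valid(expires_at, now):
    """Pure: the same (expires_at, now) pair always gives the same answer."""
    return now < expires_at


def is_offer_valid_now(expires_at):
    """Impure edge: reads the clock once, then delegates to the pure function."""
    return is_offer_valid(expires_at, datetime.now(timezone.utc))


expires = datetime(2021, 1, 1, tzinfo=timezone.utc)
before = datetime(2020, 12, 31, tzinfo=timezone.utc)
after = datetime(2021, 1, 2, tzinfo=timezone.utc)
```

Tests can exercise `is_offer_valid` with fixed timestamps such as `before` and `after`, with no need to fake or freeze the system clock.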
Gene Kim (01:10:00): Okay, back to the interview. By the way, when you hear me say something that sounds potentially disparaging about monorepos, it's absolutely not meant that way. I love monorepos and I am in awe of how Google has used them for almost all their internet facing properties. So it's funny that you brought up Rich Hickey because the question that I was dying to ask you is what is it about Rich Hickey and Clojure? One of the things that... I got a chance to talk with him at the last Clojure/conj, one of the last conferences I went to before the lockdown. And something that just clicked for me was that he seems to be viscerally aware of coupling. A couple observations.
Gene Kim (01:10:47): One is he seems to detest unnecessary coupling and he seems to be aware of it at a level that most of us, me included, cannot see. To the point where his sensibilities almost seem alien. I remember him reacting to the notion of a monorepo and being disgusted by it. I think it's because it's tied to a CI system, so you're not able to work on two separate components without deploying. So you can't really work on two things that have two things in progress and have them interact with each other, which I think is an amazing observation that I certainly never objected to.
Gene Kim (01:11:27): But now that you mentioned it I do recognize how many workarounds I've had where I was like, "I just want to work on two pieces without committing both of them and deploying both of them." His notion of classes being coupled to each other was actually one of my big aha moments in his JavaOne presentation. I mean, so what is it... Could you validate that sensibility and why are those couplings bad?
Michael Nygard (01:11:54): I think you've described those two characteristics of Rich pretty accurately. One of the things I learned from him was how to spot coupling that I had previously not seen. The things that appeared to be atomic to me, he regarded as compounds that could be decomposed. I'll give you an example. When we talk about OO programming, Clojure gives you the characteristics of OO, but they're all à la carte, whereas a class couples together a protocol and an implementation and some state. In Clojure you can separate all three of those and handle them however you like. You have the option to compose them together however you like.
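The separation described here is a Clojure feature, but a loose analogy can be sketched in Python, assuming `functools.singledispatch` stands in for a protocol and frozen dataclasses stand in for plain data. This is not Clojure's mechanism and not code from the interview; the shapes and the `area` function are hypothetical, and mutable state management is left out entirely.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


# Data: plain immutable values with no behavior attached.
@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rect:
    width: float
    height: float


# "Protocol": a generic function declared independently of any data type.
@singledispatch
def area(shape):
    raise TypeError(f"area is not implemented for {type(shape).__name__}")


# Implementations: registered separately, and could live in another module.
@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2


@area.register
def _(shape: Rect):
    return shape.width * shape.height
```

A new operation can be added without touching `Circle` or `Rect`, and a new data type can adopt `area` without inheriting from anything, which is the kind of optional composition being described.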
Michael Nygard (01:12:45): I had this discussion with Rich many times about actors and whether it made sense to include actors into Clojure. And maybe after the third time I finally got what he was saying. An actor is a compound. It is behavior plus state plus an inbox that somebody is managing. You have exactly one inbox, you don't have the choice of multiple inboxes. You have exactly one outbox, you don't have multiple ones. So an actor has already made some decisions about bringing together constructs. Everywhere that I just said "and," Rich would take those apart and supply each of them independently. So you have channels, you have ways of managing state, you have ways of managing behavior.
Michael Nygard (01:13:30): Then if he provides those atomic components, you have the option to compose them together, but you're not obliged to compose them together. And so splitting things and splitting and splitting and splitting is totally appropriate for a language designer. Rich has incredible sensibilities about that. I'm constantly impressed. He has a strong aesthetic sense that goes along with it. And there is such a thing as taste and one language designer's tastes may be more in line with yours. I think his taste for splitting things down into tiny pieces, tiny orthogonal, composable pieces gives consumers of his language tons and tons of options.
Michael Nygard (01:14:25): It's funny that you mentioned the monorepo idea though, because we're actually moving towards a monorepo inside my company precisely because we want to couple some things together that have been independent in the past. So yes, those higher-level constructs pre-make decisions for you or predecide things for you. And sometimes we choose that deliberately. It's when you get it by accident or without reflection that the coupling is really a problem.
Gene Kim (01:15:02): Gene here. Okay. If you're getting a little bit lost because you don't know the Clojure programming language, I'd recommend you watch an amazing Rich Hickey talk called Simple Made Easy that he gave at the Strange Loop conference in 2011, where he talks about coupling. This is where I learned about the term complected. It shows up so prominently in The Unicorn Project. Those concepts from Rich Hickey are at the heart of the first ideal, the whole notion of locality and simplicity, the desire to keep components of the system from being complected together. In Rich Hickey's talk, he talks about splitting apart the notion of identity and state and interfaces and time and namespaces and functions, data structures, and all the benefits afforded by doing so. And when Mike Nygard mentions actors, that comes from the actor construct popularized by the Erlang programming language, used for concurrent programming.
Gene Kim (01:15:58): The second thing that I wanted to mention is that coupling is neither good nor bad. As Mike was saying, it's only when it is accidental or when there are too many implicit assumptions that it can hurt you. So when you go to a restaurant, typically you want a meal, not all of the ingredients put into a sack and left for you to assemble. And often that's the right thing to do for our customer. But we've all been in a situation where we don't want the entire meal or we don't want the entire piece of furniture, we just want one bolt or one screw. And we shouldn't have to order a whole new bookshelf just to get that one screw. All right. Back to the interview. So what do you think those sensibilities of breaking things down into these small, orthogonal pieces? What [inaudible 01:16:46] is that generalizable or reinforced kind of your sensibilities for thinking about macrosystems?
Michael Nygard (01:16:52): I would say I'm continuing to explore how those ideas work at the macroscale. We have this challenge of metaphors when we talk about service-based infrastructures or service-oriented architectures or whatever acronym you like to apply. We try to say it's a collection of objects that are distributed, except you don't want to make too many calls because there's a lot of overhead, right? And sometimes the object is just not there when you try to talk to it. Oh, well, so it's not that much like objects, right? We don't enforce type signatures on calls, you can make any kind of call you like and it could respond with a catalog of Weird Al music, if it feels like it. You don't have a byte-level syntactic enforcement like you do with objects. Actually, the more you look at it, it's not really very much like objects.
Michael Nygard (01:17:54): Okay. So we'll let... Maybe it's like actors. You pursue that path and you're... No, it's not really very much like actors either. Eventually you start to realize, now these service-based architectures are really their own thing. They have their own properties, their own characteristics. We need to think of design techniques that work for these. And it's still... I mean, even though we're 20 years into SOA or more, maybe 25 years into SOA, and we're at least 15 years into the Guerilla SOA or REST style, I think it's still relatively early days to see what evolves and survives change the best.
Michael Nygard (01:18:42): We had stories from Uber a few years ago about how they had more services than engineers, which probably meant they had some orphaned services that no longer had anyone who knew what they were or what they did or how to deploy them. Well, now we see stories from Uber about now that they've stopped their hypergrowth scaling and sort of flattened off on their employment curve. Now they're kind of pulling back from that and saying, "Well, we're going to take collections of services and put them behind a facade that represents a higher-level aggregated behavior." I totally get that. That's a very sensible pattern.
Michael Nygard (01:19:20): At one point in one context it seemed like rapid proliferation of services was the right way to survive change and evolve. Now we're thinking actually that may allow coupling of types that we don't like that inhibit other kinds of change and evolution. So we're still trying to figure out what it is that's going to allow us to survive and persist with these architectures. My explorations on applying the principles of functional design are part of that. There are some pieces I'm very certain about. There are some pieces I'm pretty sure will work and there are some that are hypotheses.
Michael Nygard (01:20:04): I'll give you one example that I'm very sure about. I designed a service at one point that I called a perpetual string service. I was with a company that had a problem about T's and C's. They needed to make sure that when a user came to the site, they had agreed to the latest terms and conditions. This was a SaaS eCommerce company. They had a table with all of the shop IDs and, sorry, the date that the T's and C's had been agreed to, but they weren't keeping the old versions of T's and C's. They were being overwritten. And so you couldn't actually go back and say, "What's the difference between what I agreed to before and what I agreed to now?" Or if you got into some kind of an arbitration situation, you had the date they agreed to it, but you had to go to paper to figure out what the text was.
Michael Nygard (01:21:07): So to me, coming from the functional world, I said, "Well, the problem is you're treating something that should be immutable like it's mutable." What you should store is a reference to a perpetual record of the T's and C's that they agreed to. And when you modify your T's and C's what you're actually doing is making a new contract, not modifying the old one, right? So keep the old one around, make a new one and when they agree to the new one, update the reference to point to that. Well, it turns out there are a lot of cases in a company that call for the ability to store an arbitrary string of text that is immutable, content-addressable, and that I can rely on fetching forever. There are a lot of use cases for that. So by making the service as simple as saying, "I'm going to put a bunch of text to you and you're going to give me back a URL. The contract is-"
Gene Kim (01:22:02): Oh, my gosh.
Michael Nygard (01:22:02): "... I can always use that URL to get back the original text." Very, very simple, right? You could write that in an afternoon, enormously useful in a lot of different situations. But by the way, now I would just use Google Cloud Storage or Amazon [inaudible 01:22:22]. Those effectively are the content-addressable immutable storage I was looking for.
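A minimal in-memory sketch of such a perpetual string service might look like the following Python. It is an assumption-laden illustration, not the service Mike built: the class name, the `/strings/` URL prefix, and the choice of SHA-256 as the content address are all hypothetical.

```python
import hashlib


class PerpetualStringStore:
    """In-memory sketch: put text, get back a URL that always returns that text."""

    def __init__(self):
        self._blobs = {}

    def put(self, text):
        # The key is derived from the content, so the same text always maps
        # to the same URL and different text can never overwrite it.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self._blobs.setdefault(key, text)
        return f"/strings/{key}"  # hypothetical URL scheme

    def get(self, url):
        return self._blobs[url.rsplit("/", 1)[-1]]


store = PerpetualStringStore()
v1_url = store.put("Terms and conditions, version 1")
v2_url = store.put("Terms and conditions, version 2")

# Each shop stores a reference to the exact text it agreed to.
agreements = {"shop-42": v1_url}
agreements["shop-42"] = v2_url  # agreeing to new terms updates the reference only
```

After the reference moves to `v2_url`, `store.get(v1_url)` still returns the original text, so the earlier agreement remains recoverable.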
Gene Kim (01:22:27): That's astonishing. That's really freaking awesome. I love that story because Mike is highlighting how important the concept of immutability is, another core concept from functional programming. It's the notion that when you create a variable, you can never change it. You must create a new one. Rich Hickey contrasts this with what he calls place-oriented programming, the notion that in the olden days, we had to care about memory. So that's why we had to reuse memory. That's why we had to use pointers and memory addresses for variables. In Clojure and most functional programming languages, immutable data structures are the norm. I have found that so much complexity in the applications that I've written disappears when you use them. Entire categories of errors no longer happen.
Gene Kim (01:23:15): I think that concept is familiar to many of us, but then he made the same claim for databases that in the olden days we had to preserve space in databases so we would routinely overwrite values in our databases. I mean, after all, what else would you do? Rich Hickey created the Datomic database, where you can't overwrite values, you have to merely supersede them. It means that the database only grows and never shrinks. I love Mike's answer to the one thing that he does know about macrosystems at scale, which is that it will take advantage of immutability.
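The idea of superseding values rather than overwriting them can be sketched with an append-only log in Python. This is a toy illustration of the general idea only; it is not Datomic's data model or API, and the class, method names, and example values are hypothetical.

```python
class AppendOnlyLog:
    """Toy append-only store: facts are added, never updated or deleted."""

    def __init__(self):
        self._facts = []  # each fact is (tx, entity, attribute, value)

    def assert_fact(self, entity, attribute, value):
        """Record a new fact and return its transaction number."""
        tx = len(self._facts) + 1
        self._facts.append((tx, entity, attribute, value))
        return tx

    def value_as_of(self, entity, attribute, tx):
        """Return the latest value recorded at or before transaction tx."""
        result = None
        for fact_tx, e, a, v in self._facts:
            if fact_tx <= tx and e == entity and a == attribute:
                result = v
        return result


log = AppendOnlyLog()
t1 = log.assert_fact("shop-42", "terms_version", "v1")
t2 = log.assert_fact("shop-42", "terms_version", "v2")
```

The second fact supersedes the first for current reads, yet asking for the value as of `t1` still returns `"v1"`, so history is never lost.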
Gene Kim (01:23:51): So I've been dazzled, every interaction that we have. I learned so much and my eyes are all teary from laughing so hard, but there's something that is bothering me. You talked about the architect's elevator, about going from the boardroom to the boiler room, and you just gave me a new example of the boiler room, like this immutable string service. So it seems like it could be almost trivialized as like all about the bits and bytes, why does Mike Nygard care about strings? And yet it's definitely a board level issue, right? About how do you know what terms and conditions someone actually signed up for four years ago? That's in that class action lawsuit.
Gene Kim (01:24:29): So on a scale of one to 10, to what extent do you think the most senior leaders are armed with the knowledge of structures that is required to win in the marketplace in terms of the supporting architecture, how to organize teams, how to create these enabling services that could be easily laughed at? On a scale of one to 10, one is no concern, every leader... The top leadership knows is properly supported. 10 is grave existential concern about in most organizations, leadership is not armed with that level of knowledge or sensibilities.
Michael Nygard (01:25:11): I'm probably at a nine. I think that there are exceedingly rare companies where executive leadership and board level leadership has an understanding of these issues. I've been fortunate to work in some companies where it was true and I've certainly seen the effects in companies where it was not true. The idea that managing a large enterprise is all about looking at the balance sheet and optimizing your labor cost and outsourcing non-core, et cetera, I think it's a harmful idea. Because you don't have that profound understanding of the system that Deming said was necessary to make changes. If you view your company as a system, you need to profoundly understand it. That understanding is hard to achieve, it's time-consuming, it rarely comes from outside the company. I think it's a combination of perhaps luck when it occurs or it's a combination of... Or it's exceptionally good recruiting and team building by the very top executives.
Gene Kim (01:26:26): Mike, this is so fun. I feel like really important. I mean, I think these are... He said, I think are not well understood. And when I say not well understood, certainly not well understood by me. I think are really important that everyone needs to understand better. I mean, with just a tremendous amount of gratitude, thanks for your time.
Michael Nygard (01:26:45): I enjoy talking to you enormously. It's fun every time.
Gene Kim (01:26:54): Wow. That was such a cool interview and one of the most challenging to fully process, comprehend and explain, but I think the topics that Mike covered in this interview and the last one are so important for any organization aspiring to win in the age of software and data. I'm so grateful that I got to learn so many of these sensibilities we talked about today by watching as many of Rich Hickey's talks as I could find. And by programming in his language Clojure, which forces you to program according to his sensibilities. If any of these things interest you and you love programming, I recommend you try Clojure out. In the show notes, I'll include a link to my blog post, My Love Letter To Clojure, which has a list of my favorite aha moments and how it's reintroduced the joy of coding back into my life.
Gene Kim (01:27:44): In the next episode of the Idealcast, I'll be interviewing David Silverman, CEO and founder of CrossLead and co-author of the amazing book Team of Teams, which has been a topic of conversation in every episode we've done. I'll also have on Jessica Reif, who is director of research and development for CrossLead, where she leads their education efforts, which have been delivered to over 20,000 leaders. I'm so delighted that, like so many of us, Jessica Reif comes from a software background. This is an amazing interview where we learn about the story behind Team of Teams and the lessons that leaders must learn from it. See you then.