Gene Kim (00:00:00):
Welcome to another episode of The Idealcast. I'm your host, Gene Kim. I'm so delighted that in this episode, I interviewed Dr. Gail Murphy, who is one of my favorite researchers in all things related to modularity, software architecture, information hiding, and developer productivity. This episode is made possible with the support from VMware Tanzu. Free your apps. Simplify your ops. Head to tanzu.vmware.com to learn more.
Gene Kim (00:00:27):
You're listening to The Idealcast with Gene Kim, brought to you by IT Revolution. Dr. Gail Murphy is Professor of Computer Science and VP of Research and Innovation at the University of British Columbia. I was introduced to her work by a good friend of mine, Dr. Mik Kersten, author of the book Project to Product. She was Dr. Kersten's PhD advisor, and they went on to co-found Tasktop.
Gene Kim (00:00:56):
I've loved reading the research papers she's written over the decades and her dazzling vision of what developer productivity should look like, and she has a vast set of collaborators who have made such an incredible contribution to our field. Both she and Dr. Kersten have studied deeply what developer productivity means, uncovering insights into what factors inhibit developers from achieving their full creative problem-solving potential. So many of these themes show up in The Unicorn Project, which tells the story of how even Maxine, one of the best developers at Parts Unlimited, when exiled to the Phoenix Project can suddenly do nothing by herself because of the horrible architecture that deprives everyone of their full problem-solving potential. Maxine goes on to help lead the rebellion that overthrows all the powerful orthodoxies that were enforcing this terrible status quo.
Gene Kim (00:01:47):
I had recorded this interview with Dr. Murphy back in April, and I'm so glad that I had waited until after the episode I just did with Dr. Steven Spear on physical supply chains, because, holy cow, Dr. Murphy and I discuss and explore so many similar topics that were explored in the previous episode, but in the context of modularity and how so many miracles that we take for granted in the software supply chains are made possible by that modularity. There are so many genuine aha moments in this interview, especially to what extent open-source software provides some very surprising insights on how one can achieve modularity and information hiding and the full benefits that are created as a result.
Gene Kim (00:02:28):
So in this interview, we will learn why defining software developer productivity remains elusive, especially when using the conventional definitions of productivity; how developers talk about what factors make them feel productive; what the value of modularity is and how one can achieve it; how the many ways we can decompose even small systems can have surprising outcomes; how open-source software is a triumph of information hiding and how it has created a massively interdependent set of libraries, but also enables incredible co-evolution, which is only made possible by modularity; how we've exceeded and fallen short of the 1980s dream of software being like Lego blocks, where we can quickly create software by assembling modules together; what we've learned from the infamous left-pad and mimemagic incidents in the last two years; why and how in some very specific areas, the entire software industry has standardized on a set of modules, whereas in other areas, we continually seem to go in the opposite direction; a summary of some of the relevant work of Dr. Carliss Baldwin, who has so significantly influenced the work of not just Dr. Gail Murphy, but also two other people you've probably heard of, Dr. Mik Kersten and Dr. Steven Spear; and how software development is a subset of knowledge work and what the implications of that startling insight are.
Gene Kim (00:03:52):
So, Dr. Murphy, I am so honored that I am able to interview you. I've introduced you in my words. Can you introduce yourself in your own words and describe what you've been working on these days?
Dr. Gail Murphy (00:04:03):
Hi, Gene. I'm so glad to be here with you today, having a chance to have a conversation. I'm Gail Murphy. I'm a professor of computer science at the University of British Columbia, where I also currently serve as Vice President, Research and Innovation. I'm really excited about the work that we're doing, trying to identify where design occurs within software development. So we're interested in locating that within natural texts and then being able to take advantage of it to help developers make decisions better.
Gene Kim (00:04:32):
Can you just say a little bit more about that? What's so hard about determining where design is done? I mean, it's in a Word doc, right?
Dr. Gail Murphy (00:04:39):
Well, we all like to think that people very carefully record all of the design decisions that they're making for given systems, whether it's in a Word doc or in a Wiki or in some other form. But the reality is a lot of the design decisions are captured in interactions the developers have. Whether they're on the issue system, whether they're in a pull request, whether they're in a code review discussion, that's where a lot of design decisions get discussed and talked about, and they get hidden amongst all the other stuff that goes on in those systems, whether it's the actual tasks that are recorded or bots that are putting in various timestamped events. So it gets really hard for developers to find those really important nuggets of design within all of that information that is facing them every day.
Gene Kim (00:05:27):
Yeah, and because we're going to be talking about those types of things later in this interview, could you just maybe briefly describe the spectrum of what types of decisions are being made? I imagine some are very small, but some of them can be very large with significant consequences.
Dr. Gail Murphy (00:05:40):
Well, as people do software development, it's really interesting the scope of those decisions, as you referred to. Some of them are as small as choosing a data type if you're programming in source code that requires you to do that. Other times, you might be making a decision about the library to use, which could actually have fairly long-term consequences. Sometimes you're actually trying to design how the system is going to go together, which is going to change the trajectory of what you're going to be able to easily create in the future as part of that system or where bugs might actually be occurring because of decisions that you made. So some are really small. Some are really big. But they can all have long-term effects.
Gene Kim (00:06:21):
Maybe we can sort of zoom in on sort of that larger, more consequential end of the spectrum. How do we divide up the system? What are the interfaces between them? What are the roles and responsibilities of those various parts of the component system? I mean, am I understanding correctly those are all the things that kind of go into those type of design decisions?
Dr. Gail Murphy (00:06:40):
Yeah, well, those decisions range from what are going to be the major components? What are they going to do? What are going to be microservices? What are going to be closely coupled components? What are aspects of non-functional requirements that are going to be cross-cutting everything that I write as part of the system? They all kind of get tangled up together as we work on software development, but it's also what makes software development really fascinating to study and to try to make better.
Gene Kim (00:07:09):
Right. It's startling. Some of those very consequential decisions you're saying actually aren't always in a conference room with a whiteboard that you take a picture of and then turn into a Visio diagram. Instead, some of those are actually coming up ambiently all the time, perhaps never captured or memorialized in that way. Am I understanding that correctly?
Dr. Gail Murphy (00:07:32):
Well, at least in the development that we tend to study in our research group, and often we are looking at open source systems, there isn't necessarily an opportunity to really stop and do that kind of design work, often with others. We often don't have a good way of capturing it in a way that makes sense to other people in the future. So even if you capture the whiteboard, I don't know about others, but sometimes I go back to my whiteboard drawings, and they don't make a heck of a lot of sense anymore. So we don't always have good ways of really doing that memorialization, as much as we might want to.
Gene Kim (00:08:07):
Now, on a scale of one to ten, how important and consequential are those decisions in terms of success of important projects and so forth? One is, "Eh, it's about as inconsequential as anything I can think of. It's cosmetic." Ten is, "Oh no, this is actually one of the core things you need to get right and keep in mind to support all aspects of daily work."
Dr. Gail Murphy (00:08:28):
So what I find really interesting about software is that malleability that it has and, in a way, the consequences of a decision depend on how easy it is to undo and redo that decision. So if you think about programming in Java and maybe you make a decision about certain parameters to a method, well, even that can be relatively easy to undo if you're able to use a refactoring tool to do that. On the other hand, if you're in a system where perhaps you don't have that kind of support, even that decision of what are the parameters to the function or the method or whatever you're working with could actually be pretty consequential, because it might take a lot of work and be really hard to undo all the places that that decision has an impact and an effect. So clearly, if you're making big, consequential decisions about system decomposition at the largest level, those are probably going to be the ten, the highest. But depending on what system you're working with, even smaller decisions might have a lot of consequence, might introduce a lot of technical debt, might be really hard to change.
Dr. Gail Murphy (00:09:34):
It's one of the things that I think makes software really, really difficult, because we don't have a complete spectrum of kinds of changes, kinds of consequences, ramifications of impact that make it easy for someone to start learning which decisions they're making are going to have long-term consequences and which are going to be easy. When you watch programmers that are really adept, maybe they're the high performers, they have an intuitive sense of that. They carry around models of the system in their head that are usually pretty accurate, and they're able to make those trade-offs as they make various decisions. They might say, "Well, this is going to be really easy to change. So I'll just try this. I'll go this path for a while. I know that I can back out of it." When you look at developers that maybe don't have as much experience, every decision can be difficult and can be very halting, because they don't know how easy it is to back out of that decision.
Gene Kim (00:10:31):
I had an opportunity to spend an afternoon with Rich Hickey. So he's the creator of the Clojure programming language and has influenced so much of my thinking. He has many gifts, but I think one of them is a sensitivity to coupling. One of the biggest surprises in this interaction that I had with him was his disdain for exactly what you talked about, the ability to instantly refactor using a right click in an IDE, and suddenly you change every interface easily. This deeply offended him. In fact, I'm looking forward to actually asking him this directly, but it seems to me that his sense was that it was too easy to make these large-scale changes to systems. There are other consequences that go beyond just to what extent does a developer tool automate the refactoring? To what extent does that resonate with you?
Dr. Gail Murphy (00:11:20):
Well, it's an interesting question, because with a PhD student at the University of Bergen, Anna Eilertsen, we've been looking at what are the uses of refactoring tools as part of software changes? So a lot of the study of refactoring tools has been very, very focused on the refactoring tools themselves. Can they actually accurately make large-scale changes of a certain kind through a code base? There's been great engineering to allow a lot of different kinds of refactorings to be supported in that way. But when software developers are actually making real changes and trying to think about if I want to remove this parameter and I'm going to use this refactoring tool, what's going to happen, they start to struggle. They start to worry about the consequences, and even if they see the preview, they might not understand the ramifications of actually making that change. If they go through and they actually invoke the tool, sometimes they're surprised at the resulting code.
Dr. Gail Murphy (00:12:17):
So you do have to be able to think really far in advance sometimes if you're using some of these refactoring tools. On the other hand, having something that can rename a variable or a method and do it efficiently and effectively and correctly, that's a real win. So for a lot of these kinds of tools, they can have incredible accuracy, incredible benefits, but we haven't done enough study of them, of how they're actually used as part of sort of daily work, I think, to understand where developers can make the trade-off of wanting to understand how much of the code will be changed.
Dr. Gail Murphy (00:12:54):
So what Anna's doing is taking a real look at can you change those kinds of tools? Can you allow people to step through the changes in different ways? Can you allow them to control the change and maybe only use part of the refactoring? In a way, what that allows them to do is keep their mental model of the code more in line with what the code is so that the tool is helping them deal with the accidental complexity of making the change. But the essential complexity, going back to how Fred Brooks has talked about software, still remains more in their control.
Gene Kim (00:13:27):
Maybe just to heighten our paranoia, just a note that this actually might be relevant to us, can you give us sort of a concrete example where kind of a mechanical refactoring or a refactoring that might look mechanical might lead to grave surprises that would give us pause or panic?
Dr. Gail Murphy (00:13:45):
I don't think you have to go very far. So even if you consider the removing a parameter case that I've been referring to, because I think it's an easy one to kind of visualize in your brain, if you have a code base where your tests are done in a certain way and you want to keep your code running as you make a particular kind of change, and maybe that change involves removing the parameter, if you invoke the refactoring tool at the wrong point, you might actually change the nature of your test. So you're no longer able to test the functionality that you thought you were able to test, because you've removed that parameter. Maybe it's a Boolean switch that you're relying upon. So you end up changing the way that you're developing, where you're expecting to be able to keep your tests running. They might not actually be doing what you're expecting anymore. So your code base is not one unified mass. Your code base is often, "I have my functional code. I have my test code. Maybe I want to treat those things separately."
Dr. Gail Murphy (00:14:47):
But tools don't know that difference, so they can treat it as one thing.
Dr. Gail Murphy (00:14:51):
Let's say I have a piece of code that takes as a parameter a Boolean, and in the interior of that method, there is a default assumption that the Boolean will be true, but it could be false, right? So in order to test the false path, you actually have to trigger the change in the Boolean to be false, right? So now you've removed the parameter. You still have tests that have now been modified by the refactoring tool to no longer test both true and false. So you've reduced the number of test paths.
Dr. Gail Murphy (00:15:34):
So you still have the tests. If you inspected the tests, right, like you're just looking at the names of tests, you'd think that you might be testing some functionality that you no longer are actually able to invoke.
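[Editor's sketch: the Boolean-parameter scenario described above can be made concrete with a small Python example. The function names, the tax rule, and the test names are all invented for illustration and do not come from the conversation or from any real refactoring tool. The sketch only assumes that a "remove parameter" refactoring drops the argument at every call site, including those inside tests.]

```python
# Hypothetical illustration; function and test names are invented for this sketch.

def price_before(cents, include_tax=True):
    """Original method: the Boolean selects between two code paths."""
    if include_tax:
        return cents + cents // 10  # default path: add 10% tax
    return cents                    # path only reachable with include_tax=False


def price_after(cents):
    """Same method after 'remove parameter': only the default path remains."""
    return cents + cents // 10


# Test suite before the refactoring: two names, two different paths.
tests_before = {
    "test_price_with_tax": lambda: price_before(1000, True),
    "test_price_without_tax": lambda: price_before(1000, False),
}

# Test suite after the argument was dropped at every call site.
# The names are unchanged, but both tests now make the identical call.
tests_after = {
    "test_price_with_tax": lambda: price_after(1000),
    "test_price_without_tax": lambda: price_after(1000),
}


def distinct_results(tests):
    """Count how many different results a suite produces."""
    return len({run() for run in tests.values()})


print(distinct_results(tests_before))  # 2
print(distinct_results(tests_after))   # 1
```

[Reading only the test names, the second suite still appears to cover the "without tax" path, which is the mismatch between test names and exercised paths that Dr. Murphy describes.]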
Gene Kim (00:15:44):
Ah, okay, great. By the way, this reminds me of when Jez Humble finally explained to me what a semantic merge conflict was. It was kind of a similar discussion. It was like that horrifying moment when I realized that it may have happened to me, and I actually don't know if it did or not, which is actually a terrible, unsettling feeling. I would be remiss if I didn't ask you about something that you did with Mik Kersten lately. So on a recent podcast that you did with him, you had this wonderful discussion about how defining what developer productivity actually is defies easy explanation. Can you describe why that is and why you found it useful to focus, at least in the short term, on what productivity isn't?
Dr. Gail Murphy (00:16:34):
It's really interesting when you think about productivity in software development, because it seems like it should be so simple, because we produce so many items and things and things that we could count. So if you go back to the 1970s, 1980s, a lot of productivity was based in lines of code or in function points, and there's lots of good reasons why neither of those kinds of measures are actually useful for productivity, because we know you need a different number of lines of code to express the same kind of functionality or value, depending on what kind of programming language you're in. We know that with function points, it can be hard to actually figure out how to define your software in terms of similar-sized function points to allow you to think about productivity in that way.
Dr. Gail Murphy (00:17:21):
So with Thomas Fritz, André Meyer, and Tom Zimmermann, we've done a bunch of work trying to think about how developers perceive productivity, and we've asked them more recently, "What would you do to measure your own productivity?" They don't have a common measure. Some people want to measure in terms of Git commits, some people in terms of issues resolved. There's a whole number of different approaches developers might want to use. What's interesting about all of that is you start to realize there's no real way to start easily defining what productivity is. We know from Mik's work and others that value is kind of important. So it's not just how much work we're doing as part of developing something. It's about what comes out the other side. It's even hard to figure out when the value is actually value to your customers sometimes. So you might be producing what you think is value from the organization's point of view, but it might not actually be value to your customers.
Dr. Gail Murphy (00:18:15):
So somehow we have a really hard time getting a handle on the output of our system and the input of our system, both of which are necessary to think about productivity in classical terms. What's interesting when you really start to delve into it is maybe what we should be looking at is what is unproductive work? So where are we introducing technical debt into our system where we weren't expecting to? Where were we developing functionality that truly didn't have value? Maybe if we had a better idea of when we were taking those steps that were actually leading us to unproductive moments, maybe we would actually in the end produce more, and better, which is in the end what an organization presumably wants to do.
Gene Kim (00:19:00):
I also hear you saying that the reason why we might want to focus on what is unproductive, that it's actually more visible. At least we can see that. Is that properly interpreting what you are saying?
Dr. Gail Murphy (00:19:11):
Well, we do know that we can assess when we're putting more defects into our system, because we get reports about those. Our developer teams often have a pretty good idea when the system architecture isn't what they want it to be and they've introduced technical debt. So you're right. Maybe it's easier to actually identify when things are going badly than when things are going really well.
Gene Kim (00:19:35):
If you're not enabling your developers to be productive, then you're slowing down your business. Developers need to focus on getting their code working in production, not spending time on packaging or infrastructure. VMware Tanzu is a DevSecOps platform that provides a superior developer experience to accelerate modern application delivery. Developers can run serverless containers on Kubernetes with ease. They can discover enterprise-wide APIs and access a self-service catalog of open source building blocks, such as application components, databases, and runtimes. They can apply modern data practices for real-time applications. What's more, they can build secure containers automatically and maintain visibility of applications in production to troubleshoot and improve their apps. Learn more at tanzu.vmware.com.
Gene Kim (00:20:26):
It's so interesting. So when you talk about architecture, I think for me that was one of the biggest aha moments in my journey, especially in the State of DevOps research. That was the cross-population study that looked at 38,000 responses over six years, trying to understand what high performance looks like and what behaviors lead to high performance. Specifically, we divided it up into technical practices, cultural norms, and then later architectural practices. The big surprise was to what extent architecture was, in fact, one of the top predictors of performance. It was defined in our instrument as: to what extent can teams make large-scale changes to their parts of the system without permission from anyone else outside of the team? To what extent can they complete their work without a lot of fine-grained communication and coordination with people outside the team? To what extent can they deploy and release their service on demand, independent of services it depends upon? To what extent can they do their own testing on demand without the use of a scarce integrated test environment?
Gene Kim (00:21:25):
So those also probably lead to the ability to do deployments during normal business hours with negligible downtime. So yeah, to me, that was so startling, because the way I was trained during my days at Tripwire was that it was always safe to ignore architects, especially chief architects, because by reputation, they were the people who the only thing they did was generate Visio diagrams and PowerPoint slides that they would email to everybody and then go back to the ivory tower, not to be seen for another year. So that was just a polite way of saying they didn't impact how daily work was performed, when what this finding clearly shows is that architecture is one of the top factors that impacts the daily work of development.
Gene Kim (00:22:02):
It was actually you and Dr. Kersten who pointed me to the work of Dr. Carliss Baldwin about the value of modularity, and she talked about how if you can create an architecture that enables options, it can create an order of magnitude more value, because it allows for fast and frequent experimentation. Her work cites 25 times more value creation, which was "enough to blow entire industries apart." So can you talk about your view of what modularity is? What is architecture, and why is it so important?
Dr. Gail Murphy (00:22:36):
Well, I think I've always been really influenced not only by Dr. Baldwin's work, but also Dr. Parnas's work early on in software development. His papers on the criteria to be used in decomposing systems are classics in the software engineering research literature, and they really speak to the fact that those decisions that you make about what a module is really encode what you're able to change in isolation and where you are going to have to have those communication factors. Despite that, we continue even to this day to really struggle with modularity in software. We know what we want it to be, and we can do a really good job breaking apart a system that we want to build into pieces. Then we find out those pieces really do have to communicate, or we find out there's a non-functional requirement, like logging, that has to exist within every component of our system, and we suddenly get these orthogonal interactions between components and modules that we actually have to deal with and to work with.
Dr. Gail Murphy (00:23:39):
So for me, architecture is really about thinking about the constraints that the system has to work within, the functionality it has to produce, and how it has to produce those with certain nonfunctional requirements and to start breaking things down in a way that allows teams to go off and work as independently as possible, but where they also know when they have to communicate and who they need to communicate with. If we have a vision and are able to record, think about those decisions we need to make, we can start to make those trade-offs of where do we need to create more modularity, because we know we need different kinds of changes in the future where we want to interchange different kinds of modules, and where might we be willing to live for a while, at least, with something that's a bit more messy and not quite as modularized, because we're working under time constraints and we have to get the system out?
Dr. Gail Murphy (00:24:36):
So it's about being able to retrospect on that in a way, introspect on it and be able to think about the modules and how they're coming together in a way that would allow you to make decisions such as trade-offs of where you want to be able to do great changes in the system and where you don't in a way that's actually thoughtful and not one in which we often just fall into, because we didn't realize that we wanted to change the system in certain ways or didn't realize that we had to meet certain nonfunctional requirements.
Gene Kim (00:25:07):
So where does good architecture come from? So when you talk about these virtues that we want, and certainly I think we can all think of exemplars that we've experienced or studied where people were able to get these incredible capabilities to work independently with isolation, is there an easy answer or some common factors that lead to good architectures, and where do they come from?
Dr. Gail Murphy (00:25:29):
I think if you could answer where good architectures come from, you could solve a lot of the world's software development woes. I think there's certainly architectural styles and architectural patterns that have been developed. So if you're able to match your problem to those styles or patterns, you can start to get some advantage of the work that people have done before. I think the challenge is that often the architecture we think that we're building isn't what we're actually building, because it's usually not a sole endeavor. You might have a few people, a small team kind of designing what the architecture should be for the system. That doesn't mean that all the people that are helping you to build that are actually recognizing what the architecture should be or communicating when the architecture isn't going to work out in practice.
Dr. Gail Murphy (00:26:18):
So some of the work we've done in the past in our research group has looked at how do you match the perceived architecture with the actual architecture, and how do you sort of keep things on track? Because it's really easy with software to allow interactions, couplings, calls between those modules to just start appearing. They often start appearing for really, really good reasons, because the original architects didn't actually realize that for performance, maybe these two modules have to communicate in a different kind of way. But over time, as we allow that to happen, we get more and more of a drift, and architectural erosion between the desired architecture and the code is a really well-known problem. We still haven't developed really good systematic approaches in common use to do that check of perception against reality and make sure that we're actually enacting even the architecture we intend, if we know what architecture we want in the first place.
Gene Kim (00:27:21):
Can you concretely describe some of those symptoms? What does this kind of architectural erosion look like, where the design starts to drift further away from what is actually used?
Dr. Gail Murphy (00:27:33):
Well, the symptoms usually look like you get a lot more calls between those modules than you ever expected. So perhaps you have a module A that is only supposed to make calls to a module B, but over time, you actually realize that module B is making calls back to module A, because there's some sort of information passing that has to happen that was perceived to be able to be done in one way, but in fact you started to use some asynchronous approach and you actually have calls going back the other way.
Dr. Gail Murphy (00:28:06):
If you don't actually realize that you've started to make that bi-directional coupling happen between those modules, you've really changed how they can be used together. So if you think about the Lego block building ideal of software, where you have your little modules and you stack your blocks very independently, suddenly you're bringing substructures in. When you start to pick up one block, many blocks might come with it. So our nice little design of maybe a Lego block system we thought we were going to build, suddenly you can't even make it stand or do what you want anymore, because there's all of these hidden dependencies that you had no idea you were getting into when you decided to depend on a particular thing or use a particular module.
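[Editor's sketch: the module A and module B symptom can be checked mechanically by comparing the calls the architects intended with the calls observed in the code. This is a minimal Python illustration under stated assumptions, not the research group's actual technique; the module names and the call data are hypothetical.]

```python
# Hypothetical data: each pair is (calling module, called module).
intended_calls = {("A", "B")}               # the design: A may call B
observed_calls = {("A", "B"), ("B", "A")}   # what the code actually does


def unexpected_calls(intended, observed):
    """Calls present in the code that the intended architecture does not allow."""
    return sorted(observed - intended)


def bidirectional_pairs(observed):
    """Module pairs that call each other in both directions."""
    pairs = set()
    for caller, callee in observed:
        if caller != callee and (callee, caller) in observed:
            pairs.add(tuple(sorted((caller, callee))))
    return sorted(pairs)


print(unexpected_calls(intended_calls, observed_calls))  # [('B', 'A')]
print(bidirectional_pairs(observed_calls))               # [('A', 'B')]
```

[In practice the observed calls would have to be extracted from the code base by some analysis step, which this sketch leaves out.]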
Gene Kim (00:28:52):
The reason I'm laughing, I was watching a ... It was actually this Rich Hickey talk that sort of made me see something that I didn't actually see as a problem before. Maybe I just sort of accepted it as a fact of life that then struck me with horror. Let me see if I can reconstruct that aha moment. So I think the talk was in 2006. He was giving a talk at the JavaOne conference, and he's describing what happened in CORBA and RPC calls, where in order to talk to another system, you have to change both sides. If you repeat that enough, then suddenly you can't change anything. Every endpoint that you have, you have to change all of those. His analogy was now you are like a puppet being totally controlled, but externally, right? Basically it's just deprived you of any freedom whatsoever. Does that resonate with you at all as an example of maybe an extreme case of how this can occur?
Dr. Gail Murphy (00:29:47):
Yeah, absolutely. I mean, there's all those times, too, where you might just decide you're building a system and you need to depend on a library maybe for plotting. So you make a dependence on a plotting library, and suddenly the plotting library needs something else. Suddenly, you need yet another library, and along the way, you end up with some collision that you need two different versions of the same library-
Dr. Gail Murphy (00:30:10):
... from different dependency streams, and suddenly you made a really bad decision early on.
Dr. Gail Murphy (00:30:15):
Getting out of it is really, really difficult.
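[Editor's sketch: the plotting-library scenario, where two dependency streams end up requiring different versions of the same library, can be illustrated with a small Python walk over a dependency graph. All package names and version numbers below are hypothetical, and the sketch simplifies real package managers by treating each package name as a single node and each requirement as an exact version.]

```python
# Hypothetical declarations: package -> list of (dependency, required version).
requirements = {
    "app": [("plotlib", "1.0"), ("reportlib", "3.2")],
    "plotlib": [("imagelib", "2.1")],
    "imagelib": [("utillib", "1.4")],
    "reportlib": [("utillib", "2.0")],
    "utillib": [],
}


def collect_versions(root, requirements):
    """Walk all transitive dependencies and record every version requested."""
    needed = {}
    visited = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package in visited:
            continue
        visited.add(package)
        for dependency, version in requirements.get(package, []):
            needed.setdefault(dependency, set()).add(version)
            stack.append(dependency)
    return needed


def conflicts(needed):
    """Libraries that are requested in more than one version."""
    return {name: sorted(versions)
            for name, versions in needed.items() if len(versions) > 1}


print(conflicts(collect_versions("app", requirements)))
# {'utillib': ['1.4', '2.0']}
```

[The application only chose a plotting library and a reporting library directly; the collision on the third library appears two and three levels down, which is why the early decision is hard to see and hard to back out of.]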
Gene Kim (00:30:17):
By the way, this actually hurts. In Clojure, which runs on the JVM, I was using something for a plotting library, and I learned when I was trying to put it in a container that it is inextricably tied to the Swing library.
Gene Kim (00:30:31):
So a decision I made three and a half years ago, four years ago now is profoundly unsuited for what I would now want to do. So I suspect that this is a very concrete example of exactly what you're talking about.
Dr. Gail Murphy (00:30:46):
Yeah, no, that's exactly the kind of example that I was thinking about.
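The collision Dr. Murphy describes, where two dependency streams each pull in a different version of the same library, can be illustrated with a small sketch. The package names, the versions, and the `find_version_conflicts` helper below are all hypothetical, and the graph is deliberately simplified (one fixed dependency list per package); real package managers resolve version ranges and are considerably more involved.

```python
from collections import defaultdict

# Hypothetical dependency graph: package -> list of (dependency, required version).
DEPENDENCIES = {
    "my-app": [("plotting", "2.0"), ("reports", "1.4")],
    "plotting": [("gui-toolkit", "1.0")],
    "reports": [("gui-toolkit", "2.0")],
    "gui-toolkit": [],
}


def find_version_conflicts(root, graph):
    """Walk the graph from `root` and report dependencies required at more than one version."""
    required = defaultdict(set)   # dependency name -> set of versions requested
    seen = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package in seen:
            continue
        seen.add(package)
        for dependency, version in graph.get(package, []):
            required[dependency].add(version)
            stack.append(dependency)
    return {name: sorted(versions) for name, versions in required.items() if len(versions) > 1}


print(find_version_conflicts("my-app", DEPENDENCIES))
# {'gui-toolkit': ['1.0', '2.0']}
```

Neither direct dependency of `my-app` looks problematic on its own; the conflict only appears two levels down, which is why an early choice can be hard to back out of later.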
Gene Kim (00:30:51):
Right. That's awesome. Just to set some context: when we last talked, I told you about how a concept that I learned about many decades ago, information hiding, kind of came to the fore and suddenly seems important to the journey I'm on right now. So specifically, that was a story about how a mentor of mine, Dr. Steven Spear, back in the nineties was in Japan with his mentor, Dr. Kent Bowen, and they were visiting a Toyota plant. So they saw many amazing things, but one of the amazing things that they saw was how they were doing 16 line-side store changes every day, and that the VP of manufacturing from a Big Three US auto manufacturer reacted with incredulity or even disbelief, saying, "That's crap. We tried six line-side stores in a day, and we ended up shutting the plant down, because all the parts were not where they were supposed to be, and we couldn't do final assembly. We couldn't ship cars for three days."
Gene Kim (00:31:48):
At the core of that story is the Kanban card, where each one of these line-side store changes boils down to an envelope with a to and a from and what you need, and you could change the to and the from seamlessly, easily, without telling other parts of the system. So there was no risk of causing chaos and disruption, which is exactly what that VP of manufacturing from the US plant was describing, where you miss something in the MRP system, you risk actually taking a whole plant down. A friend of mine said, "Oh, that's information hiding." To me, that was just such a startling insight, because I thought of information hiding really very much in the software domain. I never thought something like a Kanban card could be a manifestation of that.
Gene Kim (00:32:34):
So you then pointed me to the work of Dr. David Parnas, which was actually also unsettling to read, because I think what he is trying to convey is that information hiding is more difficult than it looks even for a small system. So can you describe why that is? Is there an easy answer for that? Maybe react to the statement, because I guess the reason why I found this so unsettling is that the job of any leader is to define the system-level goals, start identifying roles and responsibilities, and start decomposing the problem so that teams can work more independently or [inaudible 00:33:15] can be worked in smaller pieces. It seems like if we can't even do that right repeatedly, then we really have a big problem. What is your reaction to that?
Dr. Gail Murphy (00:33:24):
Well, I do think we have a big problem to be able to solve. David Parnas in his 1972 paper talks about the KWIC (Key Word in Context) system as a really small system to show the impact of information hiding, so showing how different architectures allowed different kinds of changes, and the one with information hiding allows the most kinds of changes independently. That's a super small example.
Gene Kim (00:33:49):
Dr. Gail Murphy (00:33:49):
With a super small example, you can keep the constraints. You can keep what you need to build almost in your brain as one item and start making those trade-offs and thinking through them.
Dr. Gail Murphy (00:34:03):
And I think as we build bigger and bigger systems, we lose that ability. We can no longer think about it as one thing. So we have no ability to ever think about just the system environment and start breaking it down because there's so many interactions there. So by definition, almost, we can never get to that place where we can hold it all in our brain and start to do the decomposition in a way that we remember all of the system constraints at once. And so we're going to have leakage in a sense in how we start subdividing it. At the same time that we're doing that, we know that we need to build on the shoulders of others to be able to build these complex systems. So we start to rely on libraries, we start to rely on components other people have made. And as we do that, we start to rely on these supply chains, which are these stacked Lego blocks to some extent. And at some point we don't even know what's in all of our Lego stack that we're building on.
Dr. Gail Murphy (00:35:02):
And in a way that is a triumph of information hiding, that we can rely on an interface, get a huge amount of functionality, and use that to help build our system. But we've also seen how it is also our downfall. Refer to the example a few years ago of the removal of left-pad from the Node system when we saw things crumble. And people didn't have any idea that they were dependent upon something so far down the supply chain. So I think we have a way of fooling ourselves in software because it's so easy to stack, and so easy to start using it. We forget that the overall complexity of what we're building is so complex and we have to manage so many parts. We just haven't yet developed the engineering principles to be able to do that in a totally systematic way for big systems.
Gene Kim (00:36:06):
Okay. Gene here. I'm going to break in with some clarification and reactions. So number one, the famous left-pad incident occurred in March 2016, when an 11-line library was unpublished, breaking thousands of other projects. There are so many curious things about this incident, but for me, the biggest curiosity is that left-pad even exists at all. As I mentioned, it's 11 lines of code and it adds spaces to the left of a string, something that can be done with the famous sprintf() in C. That it solved a very real problem that developers had is evidenced by the 2.5 million downloads that happened in the previous month. It was taken down because some lawyers claimed that another one of the author's libraries infringed upon a registered brand. So in response, he unpublished not only that module, but also the left-pad module, making it unavailable for everyone who used it.
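To give a sense of how small this piece of functionality is, here is a minimal Python sketch of left-padding. It is not the original JavaScript module; the `left_pad` name and signature are assumptions chosen for illustration, and the body simply delegates to Python's built-in `str.rjust`.

```python
def left_pad(text: str, width: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is at least `width` characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    # str.rjust returns the string unchanged when it is already wide enough.
    return text.rjust(width, fill)


print(left_pad("42", 5, "0"))   # 00042
print(left_pad("abcdef", 3))    # abcdef
```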
Gene Kim (00:37:02):
Gene Kim (00:38:16):
Okay. Enough on left-pad, let's go to number two. There was an even more significant problem that happened earlier this year. In March 2021, Bastien Nocera, maintainer of the shared-mime-info library, informed Daniel Mendler, maintainer of the Ruby library mimemagic, which incorporates it, that he was shipping it under an incompatible software license. So MIME was originally created to extend the format of email messages beyond ASCII text characters, and it also allowed the attachment of audio, video, images, and application programs. So if you remember when email started magically allowing attachments 20 to 30 years ago, it is because of MIME. And MIME is now used in not just email, but in the Hypertext Transfer Protocol, or HTTP, for the World Wide Web.
Gene Kim (00:39:04):
So MIME is also used to negotiate what types of content can go between the web server and the web client. Specific MIME types include text/plain, image/jpeg, audio/mp3, and so forth. According to the amazing publication, The Register, "The shared-mime-info library is licensed under the GPLv2 license. And mimemagic was listed as an MIT license project." They quote Bastien Nocera, "Using a GPL file as a source makes your whole codebase a derived work, making it all GPL, in other words, it's a viral license. So I think it's pretty important that this problem gets corrected before somebody uses it in a pure MIT codebase or a closed-source application." So Mendler, the author of the mimemagic library, thanked Nocera for letting him know, shifted his library to GPLv2, and withdrew the prior version from RubyGems.
Gene Kim (00:39:58):
This then broke the wildly popular Ruby on Rails framework, which used the mimemagic dependency, just 1 of 172 other packages, which touches 544,000 other software repositories. The article continues, Paul Berg, an open-source licensing consultant, is quoted as saying, "It does cause a major issue for Rails. Rails is widely used under the MIT license, which is a permissive license. Since so many applications are authored using Rails under the assumption that those applications are not copyleft under the GPL, it is likely that a great many of those apps would not be complying with the terms of the GPL since they were not deployed with those terms in mind." So to cut to the end of the story, it took two days for the Rails team to switch out the mimemagic library for the Marcel library, which is released under the more permissive Apache license. So I'll put a link to a bunch of resources and timelines for both the left-pad and the Ruby on Rails/mimemagic incidents as well.
Gene Kim (00:40:58):
Before we leave this topic, let me pause just for a moment and explain why this was such a problem. Open-source software comes in many forms, especially when it comes to licensing. Quoting from the Wikipedia entry for the GPL: "Some licenses are copyleft and viral, such as the GPL or GNU General Public License, which says that not only is the software being licensed copyleft and must remain free, but all software that uses it and any derivative work must also be distributed under the same or an equivalent form of license. This is in distinction to the permissive software licenses, of which the BSD and MIT licenses are widely used examples. Prominent free software programs licensed under the GPL include the Linux kernel and the GNU Compiler Collection, also known as GCC." So when you depend on a library that has a copyleft license, you must also change your license to something similar. And that's why it's called viral. And this has been found to be enforceable by law.
Gene Kim (00:41:56):
I'll put a link to a Stack Exchange article that gives a lot of evidence for this claim. "The result is that open-source licenses and the GPL have been recognized as effective and enforceable licensing tools in many jurisdictions around the world. However, it might not be possible to enforce these licenses in all individual cases." So this answers the question of why open-source maintainers and so many commercial enterprises care so much about open-source licensing. A dependency may obligate you to make your code open-source as well, whether you know it or not. And it's also why, as part of the due diligence process in any software acquisition, a universal practice is to scan the organization's source code to make sure it complies with any open-source licenses, which gets us to number three.
Gene Kim (00:42:40):
Dr. Murphy mentioned how automatic refactoring tools might have significant consequences to code that are not obvious, which reminded me of the semantic merge conflict. Martin Fowler writes about semantic conflicts in the context of automated text merging. He writes, "This is all very well, but it only solves the textual conflicts and does not help with semantic conflicts. By a semantic conflict, I mean a situation where say Jez and I make changes, which can be safely merged on a textual level, but cause the program to behave differently." He writes, "Suppose I have a method that should be renamed to say, calculateBill. With modern refactoring tools, this is trivial. You just press Shift+F6, type the new name, and then the tool changes all the callers. The problem appears, however, if Jez adds more calls to this method on his feature branch. When the two get merged, the textual merge will work fine, but the program will not run the same way." When Jez Humble explained this to me, the same Jez that Martin Fowler references in that article, my feeling was one of horror.
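The rename scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `bill` stands in for the method's old name (the quote does not give it), `checkout` and `refund` are hypothetical callers, and the file below represents the result after both branches have merged without any textual conflict.

```python
# Branch A: a refactoring tool renamed `bill` to `calculate_bill`
# and updated every caller that existed at the time.
def calculate_bill(prices):
    return sum(prices)


def checkout(prices):
    # Existing caller, rewritten automatically by the refactoring tool.
    return calculate_bill(prices)


# Branch B: written in parallel, adds a new caller that still uses the old name.
def refund(prices):
    return -bill(prices)  # `bill` no longer exists after the merge


# The merged file loads without complaint; the problem only shows up at run time.
print(checkout([5, 10]))  # 15
try:
    refund([5, 10])
except NameError as err:
    print("semantic conflict:", err)
```

Because the two branches touched different lines, a line-based merge has nothing to flag. In a compiled language the build would likely catch this particular case; in a dynamic language it can surface only when the new code path actually runs.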
Gene Kim (00:43:45):
In fact, there is a scene in the Unicorn Project that tried to recapture that feeling of horror. One of the reviewers of the Unicorn Project wrote about that scene, "Holy cow, that happens so much in my life with frameworks that were based on XML files, which reordered themselves automatically all the time, forcing us to ignore this file in our source code control systems all the time." Which is terrible, because this is actually something that you want to track because the consequences of change are so high, which gets us to number four. At the beginning of the interview, Dr. Murphy talked about her work studying how developers characterized when they felt productive. I'm quoting from an interview she did with Dr. Mik Kersten, where she talked about the work of Dr. André Meyer. Actually this was for his PhD dissertation. It had the title, Fostering Software Developer Productivity through Awareness Increase and Goal-Setting.
Gene Kim (00:44:42):
Among the things that he did was a survey sent out to developers, which had 413 respondents. They asked developers about what activities led to feeling productive and unproductive and, to quote Dr. Murphy, "The actual coding felt often very productive, but meetings start to show this tension between the productivity of the individual and the productivity of the team. Meetings came up as the second most productive activity overall, and the top most unproductive activity. So you can see this caused some challenges for individuals because they really don't want to go to meetings if they feel they're not productive. But on the other hand, if it's the right meeting with the right agenda, with clear goals, then they see it as something being very useful to them." She also talks about how in that dissertation, there are questions like how do you measure your own productivity?
Gene Kim (00:45:31):
Again, quoting Dr. Murphy, "Up at the top of the list were typically number of work items. The time I spend on them, the time I spend on code reviews, the time I spend writing code. Way down at the bottom of the list was the number of code elements I changed; that was not seen as being very productive or a measure of productivity. The number of lines of code was the second lowest. The number of emails written was third lowest." I think that interview just does a wonderful job of describing why what makes developers productive is so elusive, which gets us to number five, open-source as a triumph of information hiding and modularity.
Gene Kim (00:46:08):
I love this quote, "In its extreme, modularity promises that we can interchange underlying components. The word fungible comes to mind, defined as able to replace or be replaced by another identical item; mutually interchangeable. The fact that the Ruby mimemagic library could be replaced by the Marcel library in a matter of days, as opposed to weeks or months, seems to suggest that the promise of interchangeability and modularity is being delivered. In other words, even though they didn't have an identical interface, they were close enough so that they could be swapped out."
Gene Kim (00:46:49):
I love how Dr. Murphy points out that open-source modules can be deeply interdependent, but still co-evolve independently. I think this is a remarkable characteristic and evidence of modularity within the open-source community. Okay. Later in this interview, I'm going to break in with an almost 30-minute extravaganza on the aforementioned Dr. Carliss Baldwin, who has been so influential to not only Dr. Gail Murphy, but also Dr. Mik Kersten and Dr. Steven Spear. In the meantime, let's go back to the interview.
Gene Kim (00:47:23):
Among many of the startling things you said, the one that just leapt out at me is that the open-source ecosystem is a triumph of information hiding. Can you just say a little bit more about that? Because that seems very important. It says that we might not have an idea of what it looks like when we do it well.
Dr. Gail Murphy (00:47:39):
Well, we did some studies of open-source ecosystems a few years ago. And we looked at both the Ruby ecosystem and the Java ecosystem as they existed on GitHub for a fairly large number of projects. And what was interesting is if you start disentangling what is connected to other components, it becomes quite a big ball of wax. There's so many interdependencies that people have made between different projects. And that can only happen if information hiding is actually working to some extent, because an individual who starts a project on GitHub starts to make dependencies to other projects, and somehow they co-evolve. They are both making changes, they both continue along over time, relying upon each other. That seems like kind of a triumph of information hiding.
Dr. Gail Murphy (00:48:34):
We also took a look at the communication patterns between those projects over time. And what was really interesting is that we would see communication patterns that we expected. So say a project A depends on a project B. Not surprisingly, project A might file a bug in project B as they find some functionality that they desire that doesn't exist in project B. What we didn't expect is that there were cases where project B, so the depended-upon project, would actually make requests or suggestions into project A, saying, "We see that you're using us. And by the way, we're about to change the interface. You might care about this change." So that was really interesting. You don't see a huge number of cases of that, but seeing any actually really surprised us. Because it was a place where people realized that there was a dependency happening. And they felt strongly enough about keeping the using relationship that they were willing to help the other project continue on forward, keeping that dependency alive.
Gene Kim (00:49:39):
Yeah, this is so interesting. We never got to publish this, but hopefully this will be a focus. So Dr. Stephen Magill, who I think you've met, and I worked together on a project to analyze the Maven Java open-source ecosystem. And we got an opportunity to work with Sonatype, and we got to scan basically all the components out of Maven Central. It was so cool. Over a quarter million projects, tens of millions of versions of those components, maybe hundreds of millions. We got this glimpse into what you talked about. And one of the kinds of things that we want to explore further was the migration patterns within the dependency graph. There were some where they cared so much about their callers. They create very stable interfaces. They're careful not to ever have breaking changes. And when you take a look at the dependency migration behavior, there's kind of an even distribution: sometimes it's n + 1, sometimes n + 10, but you know, you can see that people can clearly migrate ahead.
Gene Kim (00:50:34):
And then there's things like maybe the Spring Framework, where you have version four, and then they get stuck. Only a certain number make it to five. And they can maybe migrate to five, and then to six or whatever. And so it's possible to get stuck because of those breaking changes. I think one of the sensibilities within the Clojure community, and it is one of the core values, is stable APIs, no breaking changes. I think there's been only a handful of breaking changes in the Clojure programming language. And backwards compatibility is so highly valued. And so in those libraries, you see more of the behaviors of the first kind, where you can just very easily, fearlessly migrate between versions, knowing that it's not going to break you. I suspect that those are some data points that validate certain behaviors that you saw in that research.
Dr. Gail Murphy (00:51:29):
Yeah, absolutely. And I think an area that still needs a lot of study is to understand, what does innovation look like in these different ecosystems? So if you allow more breaking changes, do you see more innovation within particular projects that they move forward instead of there being replications and changes? Or if you ensure that you can allow the stable interfaces, does that allow those innovative changes? Because I don't think that we have a very good handle on that. And so in some ecosystems, we see the cloning. Well, that was a great way to do that, but now we need to clone it. So what does that do to the upstream people, right? How do they then cope with these changes that occur? And we know that there's some big examples out there with back to the Hudson days and things like that. But we don't actually know very much about what that all looks like and how people can take up changes or not.
Gene Kim (00:52:22):
And sorry, when you said Hudson, was that the Hudson that turned to Jenkins?
Dr. Gail Murphy (00:52:27):
Gene Kim (00:52:27):
Oh, okay. Right. And by the way, just to get very concrete, I think that in the guidance from Rich Hickey and the Clojure community, he's been very clear: when it comes to APIs, you can add to them, but you can't take anything away. Once you expose functionality, you've made a solemn promise that you won't change it. And so there's the notion of accretion versus breakage. I think those are the words that he used to sort of distinguish those two behaviors. And that's so interesting. Upon your instructions, I went and looked at that 1972 paper, and it was dazzling. The kind of toy example he gives is to decompose a system where basically it takes words, transforms them, sorts them, and outputs them. And he describes how you can draw lines in two ways, and each results in some trade-offs. And one of the things that you said was that, if you use an architecture that is amenable to solving that type of problem, you're better off. Can you sort of confirm?
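The distinction between accretion and breakage can be shown with a small sketch. The `area_v1`, `area_v2`, and `area_v3` functions below are hypothetical and written in Python rather than Clojure; they are meant only to illustrate the idea of adding to an API without taking anything away, not to reproduce any guidance verbatim.

```python
def area_v1(width, height):
    """Original published API."""
    return width * height


def area_v2(width, height, *, scale=1):
    """Accretion: adds an optional keyword argument; every v1 call still works."""
    return width * height * scale


def area_v3(size):
    """Breakage: changes the required arguments; existing callers fail."""
    width, height = size
    return width * height


def existing_caller(area):
    # A call written against v1, long before v2 or v3 existed.
    return area(3, 4)


print(existing_caller(area_v1))   # 12
print(existing_caller(area_v2))   # 12
try:
    existing_caller(area_v3)
except TypeError as err:
    print("broken caller:", err)
```

Callers written against the first version keep working with the second because the change only adds an option with a default. The third version offers the same capability but forces every existing caller to change.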
Dr. Gail Murphy (00:53:26):
Yeah. So architectures are going to give you different trade-offs to different problems, right? So if you want to make certain kinds of changes, then a given architecture may allow you to make those kinds of changes fairly seamlessly. And what [inaudible 00:53:44] showed was that information hiding allowed you to make a certain number of kinds of changes in a very isolated way. And if you used a different kind of architecture, you might have other good characteristics, maybe better performance. But your ability to modify the system in the future to provide different kinds of functionality was going to require much more communication between different pieces of the system.
Gene Kim (00:54:06):
I thought that was a very sobering paper to read. And then, as I had mentioned to you, you actually sent me down a couple of hours of rabbit hole, reading the other papers that built upon that work. But one of the things that did occur to me afterwards is that the specific problem that he chose for his 1972 paper reminded me of functional programming, where we have sort of the map, filter, reduce pattern. In some ways, if we use kind of an architecture like that to solve this problem, it is supremely well-suited for solving the problem in a very easy way, as long as you don't change map, filter, reduce. Is that a fair interpretation of how tools for certain problem domains have been created to make certain classes of problems very, very easy?
Dr. Gail Murphy (00:54:47):
I think so. I think engineering is about understanding the patterns that we can see, and, in traditional engineering, using scientific principles to then take us forward. We don't have quite as strong scientific principles as thermodynamics to guide us in software engineering. But we do have a growing number of cases where we might know that a visitor pattern allows us to make certain kinds of changes in the future, or maybe an architectural style of pipes and filters. We know that we're going to be able to change maybe various filters out much more easily if we use that kind of architecture than if we take a blackboard style architecture to the same kind of problem. The hard part is really forecasting what you're going to need to change. And so in some systems maybe it's easy to say, I know I'm going to have to do X, Y and Z. And so therefore I'll choose an architecture like this, because I know that's the path I'm on.
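As a rough illustration of the pipes-and-filters style applied to the word-shifting problem discussed above, here is a minimal Python sketch. It is not Parnas's own decomposition from the 1972 paper; the `circular_shifts`, `alphabetize`, and `kwic` names are assumptions, and each stage is a plain function that only sees the output of the stage before it.

```python
from typing import Iterable, Iterator, List


def circular_shifts(lines: Iterable[str]) -> Iterator[str]:
    """Filter 1: emit every rotation of the words in each input line."""
    for line in lines:
        words = line.split()
        for i in range(len(words)):
            yield " ".join(words[i:] + words[:i])


def alphabetize(shifts: Iterable[str]) -> List[str]:
    """Filter 2: sort the shifted lines, ignoring case."""
    return sorted(shifts, key=str.lower)


def kwic(lines: Iterable[str]) -> List[str]:
    """Pipeline: the output of one filter is the only input to the next."""
    return alphabetize(circular_shifts(lines))


for entry in kwic(["Pipes and Filters", "Information Hiding"]):
    print(entry)
# and Filters Pipes
# Filters Pipes and
# Hiding Information
# Information Hiding
# Pipes and Filters
```

Swapping the sort order only touches `alphabetize`, which is the kind of change this style makes easy. A change to how lines are represented between stages would ripple through every filter, which is the forecasting problem Dr. Murphy describes.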
Dr. Gail Murphy (00:55:46):
And then you can sit down and start with good design principles and probably be on a really good path. For other systems, if you don't have that really good notion of what the system is going to have to do, it's going to be a lot harder. I'm just totally blown away by watching the Mars Rover and thinking about what it's like to build that software when you know you're only going to make certain kinds of changes, and you know that it's going to have to work in that environment. And it's just amazing to see what those teams have been able to build and produce, which allows them to operate something at a distance for so long and to even enable some changes as they go along that way.
Gene Kim (00:56:30):
While Kubernetes can streamline container delivery, it can be complex to deploy and operate itself. In fact, a top barrier to Kubernetes adoption in the enterprise is lack of experience and expertise. VMware Tanzu is a DevSecOps platform that provides centralized lifecycle and policy management for Kubernetes clusters across multiple teams and clouds. It addresses day one and day two operations burdens with a complete, easy to upgrade Kubernetes runtime with pre-integrated and validated components. This means you can run the same Kubernetes across the data center, public cloud, and Edge, as well as use it with your existing data center tools and workflows. The best part though, is that you can consistently manage policy and security of your entire Kubernetes estate from a centralized management plane just as any conforming cluster across hybrid and multicloud. Learn more at tanzu.vmware.com.
Gene Kim (00:57:25):
So if you can go back to that job of the leader, whether it's in software development or any context. I had made the claim that so much of the job of the leader is to design the system-level goals. And then I think what comes off of most people's tongues is the notion of roles and responsibilities. We're going to define kind of these roles and the responsibilities that go along with them. But Dr. Steven Spear mentioned something that I found quite startling, which is, what's often not described are the relationships between them. In other words, the interfaces between those components. Do you find in software development that that often is overlooked or not scrutinized carefully enough, and that leads to undesired outcomes?
Dr. Gail Murphy (00:58:11):
One of the interesting things about having taught software engineering is you would often have a desire to convey to your students certain principles, and you would hope you could go out and find materials that would tell you how something was done in industry and done well, so that you could convey that to your students. And how to design a good API is one of those items where you don't actually find as much information out there as you would hope to. There are some really good references, but by and large, it's very difficult to sit down and explain to someone, here's what you would want to do to design a really good interface. You know that, as you have mentioned in the Clojure examples, maybe you want to have it be resilient to breaking changes in the future. So maybe you want to make sure you at least have a minimum number of pieces of information that are passed with the interface, so that whatever function you're building underneath has enough information to do something interesting.
Dr. Gail Murphy (00:59:16):
On the other hand, you don't want to make it too broad or too wide of an interface because we know that that's probably going to make it hard for people to use. And it's going to make it hard to keep that module resilient and working for all of those different paths that might occur from all of that information. So it's a place where I don't actually think we have enough software engineering principles that we can teach people how to build good interfaces. We don't have really good metrics to be able to apply to a piece of code and say, is this a good interface? Is that a good interface? So when teaching software engineering in the past, I have done some exercises where we've given some functionality to students, broken them into teams and say, go and create an interface for this little piece of functionality. And then we have them vote on which one they would actually like to use.
Dr. Gail Murphy (01:00:03):
And it's really fascinating to then listen to the discussion. Like why would you vote to use this one, and not that one? And sometimes it's about the breadth of the interface. Because students will often create an interface that has many, many parameters, because they want that module to do many, many things. And other times they'll make it very, very narrow, but then it doesn't do enough. And so there's these sweet spots in design that only seem to come from experience of maybe having used bad interfaces and good interfaces, and starting to pick up some ideas. Obviously there are great books out there, like what Josh Bloch has written about Java interfaces. But they still don't give you all those principles that you need when you're just sitting down to really feel confident that what you're designing is indeed the right kind of interface. And so how do we allow software development to happen in a way that allows you to propose what an interface might be, get some feedback and then be able to refine it?
Dr. Gail Murphy (01:01:01):
That's something that we're not very good at yet because those interfaces are kind of those solid impact places right between software. And once we've written them down, they're kind of written down, and it gets really hard to start to change them. So it's a place where we know we need the rigidity of the interface to get all the advantages of modularization. But in being rigid, it makes it very difficult to have a space where those modules can evolve over time and actually change with the software environment that it's working with. And so maybe that's why we see so many replications of software packages that have different functionality, and you kind of want half of one and half of another, but you can't do that. And so I think that's a place where there's still a lot of research work to do both in software engineering and programming languages to understand how we make some of that have enough rigidity, but not too much.
Gene Kim (01:01:59):
Yeah. Again, the reason I'm laughing is I feel seen, or maybe this sounds very weird, I feel caught red-handed in doing something I shouldn't. In fact, there's this one library I forked, just the Google Pub/Sub library, and they were passing it unencrypted, and I just needed it in the clear. And so it was just easier for me to fork the repository. And I hope and pray that I didn't overload the interface, that I didn't break it. I'm hoping that I merely added to it like Rich Hickey would advise, but I'm not a hundred percent sure. So I know that's a problem, because I'm now on my own island of my own making. And it's going to take an unnatural act for me to get back onto the mainland. Does it confirm, that's what you're referring to?
Dr. Gail Murphy (01:02:55):
Yeah, absolutely. I mean, if you think about where software development was even 20 years ago, maybe, we didn't have so many things to build upon, so we had to recreate a lot of software from scratch that we might need in a particular environment. And that was kind of nice because we could control the whole playground. And when you control the playground, you can do really interesting things. And if you fast forward to now, we have so much more to build on. You can really do so much more software development by just wiring parts together than you could 20 years ago. But it's really hard to project. What's it going to be like in 2040? Are we going to have forked all of these projects so much that there's just so much noise out there and so many components that you still end up kind of recreating or forking because you just can't find the right one, even though you know it must be out there?
Dr. Gail Murphy (01:03:49):
And so I think we're heading into this really interesting place where our ability to create things is maybe faster and easier than our ability to figure out how to make fundamental components really, really good, and really solid, and really robust, and to all just keep using them and evolving them. So we do have a tendency to evolve projects to a certain point, and then we decide there's a better way to do it. And we've got to toss that aside and keep going. But there still might be 10,000 projects depending on that whole thing. And so we like to talk about legacy systems of the deep, dark past, the COBOL systems of the past. It's hard to know what we're going to really be looking at 20 years from now. How many legacy components are still going to be hanging around and have to hang around? And how much of our overall work within the community and the ecosystem is really creating the same kind of functionality again and again, just a little bit better, a little bit different, doing a little bit of a different thing?
Dr. Gail Murphy (01:04:51):
It would be really nice if we could harness our ability to evolve things more together, collaboratively. In a way that allows maybe less software overall to be written, but the software that's written does more things for more people.
Gene Kim (01:05:11):
As you're talking about this, what comes to mind is BLAS, the Basic Linear Algebra Subprograms. I think some parts are still written in Fortran, but it has evolved over what, 50, 60 years? And all the matrix multiplication routines that are used in some of the most widely used libraries are still based upon it. It seems like that's an example where everyone has rallied and invested their efforts into a common code base to make it the absolute fastest math library out there. Is that an example of this paragon of collective effort, where everyone's focusing on making one thing great, versus a million different variations that all might be mediocre?
Dr. Gail Murphy (01:05:50):
I think that's a great example. And when we have software that ties really closely to concepts in a domain, like BLAS sort of does very closely to matrix multiplication and the various operations that are needed, we seem to be able to do a good job of that. When we're in domains that are maybe, I don't want to say broader, because BLAS is a hugely broad domain. But let's go back to an example of a plotting library. You would think that we would only have a small handful of plotting libraries these days. But we don't, we have a huge number. And it's always hard to choose which one you're going to use. So why is that? Why is it that we're not starting to have more fundamental packages in some areas? Maybe it's because we're very creative people in software, and we always want to create a different kind of flavor.
Dr. Gail Murphy (01:06:42):
But maybe it's places where there could be more common efforts to making one thing that's really, really good. And that's kind of fundamental and base, and allow the energy and effort that is needed to build on those, to build systems that are really meeting needs that our overall user communities have.
Gene Kim (01:07:02):
And do you think maybe one of the reasons why BLAS has an unfair advantage in this characteristic is because the interfaces to mathematics are so fixed? And that the set of mathematical operations you can do on a matrix is somewhat constrained, versus graphing, where I want a certain scatter plot, or curve fitting, or circumplex diagrams, and people's appetites are potentially infinite in variety?
Dr. Gail Murphy (01:07:33):
I think that's a big part of it. The domain is kind of very well-defined. So if you're sitting down to write a piece of functionality for BLAS, you're kind of working in this well-defined space. And maybe going back to thinking about information hiding, you can almost sort of understand what the constraints are. We know what the input should basically be. We know what the output should basically be. And as soon as we get into a graphing library, as you say, there's all these variations that people want, and we don't have principles to say 80% of that variation is going to be-
Dr. Gail Murphy (01:08:03):
... supposed to say 80% of that variation is going to be really helpful to 90% of the community, and 20% is only going to be worthwhile to 1% or some fractions like that. And so how do we get to the place where we can understand what are the major pieces of the interface that are going to serve many versus where there's a few specialized pieces of functionality that are needed? And we can do it in a way that are kind of almost bolt-ons to libraries, right? Like here's my core library. Now bolt on this piece. We have some examples of that for sure, that exist especially in the open source ecosystem. But it's not easy to just go find them and teach people that this is how they should build things.
Gene Kim (01:08:45):
Do you have any examples of some really great modular pieces where there is this buffet that you can pick and choose from, where they bring with it a sensibility of good modularity? Maybe Visual Studio Code extensions, right? Anything with maybe a module system, right? Maybe that's where ...
Dr. Gail Murphy (01:09:04):
But I'm not sure that's a fair example. If there's already a modular system, it was built with that [crosstalk 01:09:10] modularity in mind. The question would be whether the module system has that characteristic.
Gene Kim (01:09:16):
If you think of any examples, send them my way.
Dr. Gail Murphy (01:09:19):
Gene Kim (01:09:19):
I'll add them. This is great. So we talked about some of the specific characteristics of architecture and what happens if you have good architecture and bad ones. Here's another very recent aha moment I've had. And I think for me, it's one of the more important aha moments in terms of really trying to understand why organizations work the way they do both in the ideal and not ideal. And for me, it comes down to the communication paths within an organization.
Gene Kim (01:09:52):
So I think there's two populations. One is an organization where if you look at the communication path, it's dominated by vast escalations up and down the org chart where you have to go up eight levels and down eight levels in order for two engineers to actually talk to each other and work on a problem together. So whenever you go up and down the org chart, that tends to be a slower mode of communication, has probably less accuracy of the information, less detail, and it's actually ... and also slower, right? So to get two managers to talk, it might take a week. To get two VPs to talk, it might take a month.
Gene Kim (01:10:28):
But then there's this other dynamic of communications, which is what happens when you have two members within the same team talking, or maybe two teams able to talk to each other because those interfaces have already been identified. And so that nature of the person being able to tap someone on the shoulder and ask for help, or to be able to have two engineers on both sides of an interface be able to talk to each other and potentially renegotiate that interface. There, there's a very rich information flow. So it's higher granularity, higher accuracy, as well as better frequency and better speed, to use control theory language.
Gene Kim (01:11:05):
Can you react to that, that architecture goes a long way in dictating, at least within a software development organization, what kind of communication flows there will be? One is just politely acknowledge it and let's move, ten is, oh no, that is actually one of the core attributes of what architecture enables.
Dr. Gail Murphy (01:11:24):
I think I'd probably react with like a three, because so much of what we've looked at in my research group has been more in open source, where you don't necessarily have a defined organizational structure. And so what dictates the ability to communicate quickly or appropriately is how clear it is that there is some dependency or some interaction that's needed. And so that happens really well if it's baked into code and you know that there's an owner of a piece of code you depended on, let's say, and you can do the tap on the shoulder that you refer to.
Dr. Gail Murphy (01:12:08):
Some of the work that we've done is to try, in that very asynchronous, very distributed collaboration environment, to surface those dependencies that aren't so evident as they are in code. So one of the systems that one of my former PhD students created was called Hipikat, and Hipikat means eyes wide open in the Wolof African language. And the idea with Hipikat was to try to replicate a group memory for software development that's asynchronous and distributed.
Dr. Gail Murphy (01:12:43):
If you're working within a building with other software developers on the same kind of problem, similar to what you were just describing, you'd often meet at a water cooler or a coffee machine, or you could do the tap on the shoulder or walk down the hall and you could have a conversation to say, "I don't understand why the system looks this way. Why does it?" But when you're working in these very distributed, asynchronous environments, kind of like what's happening through the COVID-19 pandemic for everybody, you start to lose that ability to know who to go to and how to reach them potentially.
Dr. Gail Murphy (01:13:18):
So what this Hipikat system did is it tried to integrate into indices the links between all the kinds of information in a development project. So what issues were related to what commits, what commits were related to what code versions, what code versions were talked about on certain development mailing lists, what developers seem to know each other through the mailing list. And then Hipikat acted as a recommendation system. So you could, for instance, query the system with an exception trace that happened as you were running the code, and it would go through all these links and say, "Oh, this is tied to this issue. And this issue is tied to a previous solution that's similar in the code base. Why don't you look at this similar solution?"
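As an illustration of the link-following idea described above, here is a minimal sketch in Python. It is not how the system Dr. Murphy describes was implemented: the artifact names, the `LINKS` table, and the `recommend` function are all hypothetical, and a plain breadth-first walk stands in for whatever matching and ranking the real system used.

```python
from collections import deque

# Hypothetical artifact links. A real system would mine these from the issue
# tracker, version control, and mailing-list archives.
LINKS = {
    "trace:NullPointerException in Parser.parse": ["issue:101"],
    "issue:101": ["commit:abc123", "thread:parser-crash"],
    "commit:abc123": ["file:Parser.java"],
    "thread:parser-crash": ["developer:alice"],
    "file:Parser.java": [],
    "developer:alice": [],
}


def recommend(start: str, links: dict[str, list[str]], max_hops: int = 3) -> list[tuple[str, int]]:
    """Breadth-first walk returning reachable artifacts with their hop distance."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append((neighbor, hops + 1))
                queue.append((neighbor, hops + 1))
    return found


if __name__ == "__main__":
    for artifact, hops in recommend("trace:NullPointerException in Parser.parse", LINKS):
        print(hops, artifact)
```

Starting from the exception trace, the walk reaches the issue in one hop, the commit and mailing-list thread in two, and the file and the developer in three, which is the kind of multi-link path that is hard to follow by hand.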
Dr. Gail Murphy (01:14:02):
So it was kind of surfacing more than one level links that I think you're referring to in that organizational structure, and starting to get at the multi-link situations where you might not have a direct being on the same team or being part of the same organizational structural piece of the hierarchy, but how do you start to allow people to know who they should reach out to and where the information might exist?
Dr. Gail Murphy (01:14:33):
So that's why my reaction is kind of in between. It's sort of like, I think there's ways that we can use technology to surface some of those places where we should have to talk and where we can enable the tap on the shoulder without having to embed it in the organizational structures that are happening to build the actual software.
Gene Kim (01:14:53):
And just to confirm, you can certainly ... I'm just wanting to confirm that you've seen organizations that don't have that, where essentially middle managers have to route these kinds of "I need to talk to somebody about this module," and then they have to manually find a path to, "Oh, you need to talk to Alice or Bob in that team." That's kind of the non-automated case [crosstalk 01:15:16]-
Dr. Gail Murphy (01:15:15):
I think it's the non-automated case but I think often it can also happen that the question never gets asked, because you just assume at the bottom part of your chain that it's unanswerable.
Gene Kim (01:15:25):
Unanswerable [crosstalk 01:15:26]-
Dr. Gail Murphy (01:15:25):
And so you just make your own decisions and move forward.
Gene Kim (01:15:28):
Right. So I'm going to cope and I'll do whatever I can. I'll just work around it by working on my side of the interface and ... because I know it's hopeless to ever reach the other side. Oh that's even worse, Dr. Murphy.
Dr. Gail Murphy (01:15:40):
It is even worse, yeah. But we also know that software does tend to reflect the organizational structure that's used to create it. So from that point of view, I would answer your question with more the eight or the nine. That's often the case. What's interesting is that so much of the software we rely upon in the world doesn't come from those organizational structures that are defined a priori anymore. It's emergent structure.
Dr. Gail Murphy (01:16:05):
If you think about open source systems, certainly some are built by organizations, but there's also successful open source systems that are collections of individuals coming together to build software. And when they come together to build software, I don't think anybody's usually sitting down at the beginning and saying, "Let's draw the organizational chart and then let's develop the software." That organizational chart of who knows who emerges. And there's great visualizations out there through a number of systems that can visualize Git commit logs and who knows who based on how the software is structured. And they're fascinating to play forward as animations because you can see different collections of people coming together. And so that's an interesting aspect of open source.
Gene Kim (01:16:54):
Is Gource one of those examples?
Dr. Gail Murphy (01:16:56):
Gene Kim (01:16:58):
And so I think, if my interpretation of things that Mik said is right, the Mylyn project, when young, not-yet-Dr. Kersten was working on Mylyn, that was actually something that they did very well, which was to create an architecture to allow other contributors to be very, very productive so that specifically young Mik wasn't a bottleneck. Somehow, through hard work, he was able to create an architecture and the communication paths so that he specifically removed himself from having all changes funnel through him. Is that an example of kind of what you're talking about?
Dr. Gail Murphy (01:17:31):
It is an example. And I think there were two really important steps that Mik took in making Mylyn successful in that way. One was to break down the system into different components that allowed people to get into a piece of the functionality and make meaningful changes because it wasn't all coupled together. But secondly, he did a huge amount of work to allow that ecosystem to understand the software and work with it, by answering questions on email lists and engaging people.
Dr. Gail Murphy (01:18:03):
And so it was both taking care of the architecture, but also taking care of the communication paths that were happening as a result of that architecture, the actual human to human part. So it was making sure that there was the technical capabilities for people to come in and build a piece, but also the social capability to understand how they could contribute that piece and that enabled them to become active contributors. So it was both the social and the technical, and making sure that those two things were in line, which we know from various other researchers' work like Jim Herbsleb at Carnegie Mellon and Marcelo Cataldo, who looked at socio-technical congruence, and the need to make sure that those two graphs of how people work together and how the software is connected, that there's good alignment between those two things.
Gene Kim (01:18:58):
What a coincidence. I happen to have that paper up right now. Socio-technical congruence. And what are the important lessons in that paper for you, that you think every leader should understand and have some good answers for?
Dr. Gail Murphy (01:19:15):
Well, I think part of what a good leader should think about is that you can have the best technical architecture in the world and if you have not enabled the individuals who are working with that architecture to communicate appropriately in line with that architecture, it's not going to succeed, that you really do have to have alignment between those two things to really be successful. It's not enough to just decompose the system and carve it off to people. You have to enable the right people to communicate, to make sure that everything can evolve quickly, and that you can meet the system complexity and constraints that you're trying to achieve.
Gene Kim (01:19:59):
[inaudible 01:19:59] feedback from the various constituents which comprise that architecture.
Dr. Gail Murphy (01:20:05):
Gene Kim (01:20:06):
I love that paper by the way. I was actually studying it before this call.
Gene Kim (01:20:11):
Okay, Gene here. A couple of clarifications and elaborations. One, I had asked Dr. Murphy about the famous 1972 paper by Dr. David Parnas called On the Criteria To Be Used in Decomposing Systems into Modules. This is an amazing paper and considered to be seminal in the thinking on architectures. What's so neat about this paper is that it explores ways to decompose a simple system. He called it the KWIC index, K-W-I-C. The input to the system is simply lines of text. Any line can be circularly shifted by repeatedly removing the first word and adding it to the end of the line. And then the output lists all the circular shifts of all the lines in alphabetical order. And so he acknowledges that it is a toy project. But treating it as if it were a [inaudible 01:21:00] project, we still get some very surprising insights.
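To make the exercise concrete, here is a minimal sketch of the KWIC index as described above, written in Python. The function names are illustrative, and sorting case-insensitively is an assumption, since the description only says "alphabetical order."

```python
def circular_shifts(line: str) -> list[str]:
    """Return every circular shift of a line, starting with the line itself."""
    words = line.split()
    # Shift i moves the first i words to the end of the line.
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def kwic_index(lines: list[str]) -> list[str]:
    """List all circular shifts of all lines in alphabetical order."""
    shifts = [shift for line in lines for shift in circular_shifts(line)]
    # Case-insensitive ordering is an assumption of this sketch.
    return sorted(shifts, key=str.lower)


if __name__ == "__main__":
    for entry in kwic_index(["Design Rules", "The Unicorn Project"]):
        print(entry)
```

This version is a single pass with no module boundaries at all, which is exactly why the interesting question in the paper is not how to write it but how to split it up.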
Gene Kim (01:21:03):
So I'm quoting from a fantastic analysis of this paper that Adrian Colyer did. I'll put a link to him and the post in the show notes. From the very first sentence of the abstract, you will find some shared goals with modern development, quote, "This paper discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time." Colyer writes, "Parnas sets out three expected benefits of modular programming. We can look at it through the lens of microservices too. One, development time should be shortened because the separate groups can work on each module or a microservice with little need for communication. Two, product flexibility should be improved. It was hoped that it would be possible to make quite drastic changes or improvements in one module or a microservice without changing others. And three, comprehensibility, it was hoped that the system could be studied one module or microservice at a time with the result that the whole system could be better designed because it was better understood."
Gene Kim (01:22:04):
So Parnas decomposes the system in two ways. The first decomposition splits out an input module, a circular shifter, an alphabetizer (a sorter), an output module, which creates a nicely formatted output, and a master control module, which sequences the other four. Parnas writes, "This is a modularization in the sense meant by well-defined interfaces. Each one is small enough and simple enough to be thoroughly understood and well programmed. Experiments on a small scale indicate that this is approximately the decomposition which would be proposed by most programmers for the task specified."
Gene Kim (01:22:39):
And then he comes up with a second decomposition with one additional module. Parnas writes, "There are a number of design decisions which are questionable and likely to change under many circumstances. It is by looking at the changes such as these, that we can see the differences between the two modularizations." So now I'm quoting from Colyer's analysis. "In the first decomposition, many changes, for example, the decision to have all lines stored in memory, may require changes to every module. But with the second decomposition, many more potential changes are confined to a single module. Furthermore, in the first decomposition, the interfaces between modules are fairly complex formats and represent design decisions that cannot be taken lightly."
Gene Kim (01:23:20):
Quote, "The development of those formats will be a major part of the module development and that part must be a joint effort amongst several development groups. In the second decomposition, the interfaces are simpler and more abstract leading to faster independent development of modules."
Gene Kim (01:23:37):
So Colyer notes that one of the key lessons from the paper is this. "We have tried to demonstrate by these examples, that it is almost always incorrect to begin decomposition of a system into modules on the basis of a flow chart. Instead, we propose that one begins with a list of difficult design decisions, which are likely to change. Each module is then designed to hide such a decision from the others. Since in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing."
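Here is a small sketch, in Python, of what hiding a design decision can look like for the line-storage choice mentioned above. It is an illustration under stated assumptions, not Parnas's actual module design: the `LineStore` interface, the two storage classes, and `sorted_shifts` are all hypothetical names. The point is that the shifting and sorting code only talks to the interface, so the storage decision can change without touching it.

```python
from typing import Protocol


class LineStore(Protocol):
    """Interface that hides how lines are stored (the decision likely to change)."""

    def add_line(self, text: str) -> None: ...
    def line_count(self) -> int: ...
    def word_count(self, line_no: int) -> int: ...
    def word(self, line_no: int, word_no: int) -> str: ...


class ListLineStore:
    """Keeps each line in memory as its own list of words."""

    def __init__(self) -> None:
        self._lines: list[list[str]] = []

    def add_line(self, text: str) -> None:
        self._lines.append(text.split())

    def line_count(self) -> int:
        return len(self._lines)

    def word_count(self, line_no: int) -> int:
        return len(self._lines[line_no])

    def word(self, line_no: int, word_no: int) -> str:
        return self._lines[line_no][word_no]


class PackedLineStore:
    """Stores all words in one flat list plus a start offset per line."""

    def __init__(self) -> None:
        self._words: list[str] = []
        self._starts: list[int] = []  # index of each line's first word

    def add_line(self, text: str) -> None:
        self._starts.append(len(self._words))
        self._words.extend(text.split())

    def line_count(self) -> int:
        return len(self._starts)

    def _end(self, line_no: int) -> int:
        if line_no + 1 < len(self._starts):
            return self._starts[line_no + 1]
        return len(self._words)

    def word_count(self, line_no: int) -> int:
        return self._end(line_no) - self._starts[line_no]

    def word(self, line_no: int, word_no: int) -> str:
        if not 0 <= word_no < self.word_count(line_no):
            raise IndexError(word_no)
        return self._words[self._starts[line_no] + word_no]


def sorted_shifts(store: LineStore) -> list[str]:
    """Circular shifter plus alphabetizer, written only against the interface."""
    shifts = []
    for line_no in range(store.line_count()):
        n = store.word_count(line_no)
        for start in range(n):
            words = [store.word(line_no, (start + k) % n) for k in range(n)]
            shifts.append(" ".join(words))
    return sorted(shifts, key=str.lower)
```

Swapping `ListLineStore` for `PackedLineStore` changes nothing in `sorted_shifts`. In a flow-chart decomposition, where every step reads a shared storage format directly, the same swap would ripple through every module.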
Gene Kim (01:24:06):
Now this is really a great analysis from Adrian Colyer. If you are interested in this, you will love Eric Normand's podcast episode, where he reads a significant chunk of this 1972 paper. It's 45 minutes and goes into even greater depth on sections of this famous paper. By the way, I love Eric Normand's Thoughts on Functional Programming podcast. Recently he's been reading the speeches of everyone who wins the ACM award, going all the way back to the late 1960s. I'd highly recommend it.
Gene Kim (01:24:36):
And by the way, in one of those spooky coincidences, while I was studying this paper, Dr. Alistair Cockburn, one of the original agile signatories, tweeted out this, "Dr. Parnas says a good programmer could do this KWIC library exercise in a week or two, so 40 to 80 hours. How long does this take you? Start the timer the instant you start doodling and sketching. Don't cheat. I would like to see the times please." Several people posted solutions stating that it had taken them a couple of hours. Dr. Cockburn responded, "What's really interesting is that we do have a full order of magnitude in programming speed over a 40 to 50 year period, which is pretty nice." So that is evidence that these engineering patterns allow us to solve certain types of problems much more easily and quickly.
Gene Kim (01:25:25):
Which gets us to number two. One of the key insights of the Parnas paper is how we should decompose systems based on what we expect to change, focusing on what will change most frequently and design the system so that we can hide those changes from other modules. And this reminds me of Wardley mapping. So lots of things have been attributed to Wardley maps, but for me, one of the best uses of Wardley maps is to be deliberate about which parts of the system we are going to change for competitive advantage. In other words, these parts of the system should almost certainly remain proprietary and bespoke. We would not expect these parts to come from a vendor because these parts will be a core competency, something that we do not want to be held hostage to by a vendor. This is in contrast for those other parts of the system, which we can consider undifferentiated heavy lifting, which the customer does not value. And for these parts of the system, we should use the commoditized capability from a vendor.
Gene Kim (01:26:23):
Which gets us to number three. Dr. Murphy talks about how we don't want module interfaces to be so rigid that the architecture prevents us from getting what we need to get done done. This came up in the second Dr. Ron Westrum interview in the context of the Falcon Missile Program. Despite the Falcon Missile Program being 10 times larger than the Sidewinder Program, it ended up being canceled because it couldn't achieve the missile program goals. And one of the major reasons cited was that the architecture was defined too early in the process. So you had all these great engineers trapped in their modules or functional areas, unable to do collaborative problem solving with other groups. In this case, the architecture constrained the engineers, preventing the unleashing of their full creativity and problem solving potential.
Gene Kim (01:27:07):
In contrast, the Sidewinder Project had far more fluid boundaries, allowing them to experiment, prototype, and improve. And in the end that led them to develop the most successful missile program in history. And so this now has a name, which is socio-technical congruence. One of the seminal papers in the space is a 2008 paper called Socio-Technical Congruence: A Framework for Assessing the Impact of Technical and Work Dependencies on Software Development Productivity by Dr. Marcelo Cataldo, Dr. James Herbsleb, and Dr. Kathleen Carley from Bosch, Carnegie Mellon, and Carnegie Mellon, respectively.
Gene Kim (01:27:46):
I'll just read from the abstract. "The identification and management of work dependencies is a fundamental challenge in software development. This paper argues that modularization, the traditional technique intended to reduce interdependencies among components of a system, has serious limitations in the context of software development. We build upon the idea of congruence proposed in our prior work, to examine the relationship between the structure of technical and work dependencies and the impact of dependencies on software development productivity. Our empirical evaluation of the congruence framework showed that when developers' coordination patterns are congruent with their coordination needs, the resolution time of modification requests was significantly reduced. Furthermore, our analysis highlights the importance of identifying the right set of technical dependencies that drive the coordination requirements among software developers."
Gene Kim (01:28:31):
I'm just going to quote a few lines from the paper. This is section three on socio-technical congruence. "Product development endeavors involve two fundamental elements, a technical and a social component. The technical properties of the product to be developed, the processes, the tasks, and the technology employed in the development effort constitute the technical component. The second element consists of the organization and the individuals involved in the development process, their attitudes and behaviors. In other words, a product development project can be thought of as a socio-technical system where the two components, the technical and the social elements need to be aligned in order to have a successful project."
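To give a feel for how congruence can be turned into a number, here is a minimal Python sketch in the spirit of the paper's framework. As I understand it, coordination requirements are derived by combining who is assigned to which task with which tasks depend on each other, and congruence is the share of those required pairs that actually coordinate. The function names and the example data are hypothetical, and ignoring a person's pairing with themselves is an assumption of this sketch rather than something taken from the paper.

```python
Matrix = list[list[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def coordination_requirements(assignments: Matrix, dependencies: Matrix) -> Matrix:
    """People-by-people matrix: who needs to coordinate, given who works on what.

    assignments is people x tasks, dependencies is tasks x tasks.
    """
    return matmul(matmul(assignments, dependencies), transpose(assignments))


def congruence(required: Matrix, actual: Matrix) -> float:
    """Fraction of required coordination pairs that actually coordinate."""
    needed = met = 0
    for i, row in enumerate(required):
        for j, value in enumerate(row):
            if i == j or value == 0:
                continue  # skipping self-pairs is an assumption of this sketch
            needed += 1
            if actual[i][j] > 0:
                met += 1
    return met / needed if needed else 1.0


if __name__ == "__main__":
    # Hypothetical data: Alice owns task 0, Bob owns tasks 1 and 2, Carol owns task 3.
    assignments = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
    # Task 0 depends on task 1 and task 2 depends on task 3 (stored symmetrically).
    dependencies = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    # Only Alice and Bob actually talk to each other.
    actual = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    required = coordination_requirements(assignments, dependencies)
    print(required, congruence(required, actual))
```

In this made-up example Bob needs to coordinate with both Alice and Carol, but only the Alice and Bob conversation is happening, so half of the required coordination is missing.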
Gene Kim (01:29:12):
So anyone interested in how to get dev and ops to work better will probably resonate with their characterization of the problem. And I do love the language of how would we better address the socio part of the socio-technical system? Number four, Dr. Murphy talked about the ergonomics of APIs. This is a fascinating topic, which I discussed at length in my first and second episodes with Mike Nygard. And we also talked at length about information hiding.
Gene Kim (01:29:38):
Number five. I had mentioned BLAS, the Basic Linear Algebra Subprograms library. Well, actually, it's a set of libraries refined since 1979 to do extremely fast mathematical operations. If you've done any sort of AI or machine learning or matrix multiplication in libraries, you are likely using the BLAS libraries. I'm going to put a link to David Littman's talk that he gave at the [inaudible 01:30:03] conference in 2019, which educated me on BLAS and truly inspired awe and admiration. I'm just picking a couple of quotes here. "This library has been hand tuned for over decades to run fast. It targets almost every major CPU and GPU." Basically, if you want to do matrix multiplies fast, you have to use BLAS. I love that this is one of those domains where an entire industry has focused their entire collective efforts in pursuit of creating the fastest ways to do mathematics. And it's also a great example of modularity, because almost every programming language has been able to leverage it.
Gene Kim (01:30:40):
Which gets us to number six. Dr. Gail Murphy talked about visualizing source code commits and activities, and confirmed that she was talking about the Gource library. I will put a link in the show notes to some phenomenal Gource visualizations, including a visualization of the scores of people making commits to the Python ecosystem, to the Python language repository from 1990 to 2012. It shows a dizzying and ever increasing rate of changes being made in the repo across 22 years.
Gene Kim (01:31:09):
And lastly, that gets us to number seven. Let's talk about the work of Dr. Carliss Baldwin, which has come up numerous times, not just in this interview, but in this podcast series. Dr. Carliss Baldwin, who's now at the Harvard Business School. So any discussion of modularity would be incomplete without talking about Dr. Baldwin's vast work. Over the last two years, I've been taking pages and pages of notes from her books and lectures, and to be honest, it's been quite intimidating on how to explain it all. It's such a big topic because her work encompasses so many fields, but I can't think of a better segue than what Dr. Murphy just said about open source being a triumph of information hiding and modularity.
Gene Kim (01:31:50):
So let's start by describing who Dr. Baldwin is. She is a very interesting figure. She has influenced tremendously, not just the work of Dr. Murphy, but two other people who I also deeply admire that anyone listening to this podcast will recognize. One is Dr. Mik Kersten. He heard her talk at Xerox PARC in the mid 2000s, which made him want to pursue researching the intersection of software modularity and business models. In fact, so much of his work in value stream management is that intersection of modularity, software architecture, technology, business, and optionality. And astonishingly, her work influenced another person whose name you might recognize, Dr. Steven Spear. Her book, Design Rules, published in 2000, helped inform the language that Dr. Spear used in his 2009 Harvard Business Review paper Decoding the DNA of the Toyota Production System, as well as his doctoral dissertation.
Gene Kim (01:32:49):
So as an undergraduate, Dr. Carliss Baldwin studied finance under Dr. Robert Merton, who was on the faculty of the MIT Sloan School of Management. So Dr. Merton won the Nobel Prize in economics in 1997, along with Dr. Myron Scholes, for their work in determining the value of derivatives. So this is the model that provides a conceptual framework for valuing options, such as calls or puts, and is referred to as the Black-Scholes-Merton model, or sometimes just the Black-Scholes model.
Gene Kim (01:33:20):
So I mentioned the Design Rules book published in 2000, which she co-authored with Dr. Kim Clark, who was at the time the Dean of the Harvard Business School. I think the best way to describe the book is just to read the cover. Quote, "We live in a dynamic economic and commercial world, surrounded by objects of remarkable complexity and power. In many industries, changes in products and technologies have brought with them new kinds of firms and forms of organizations. We are discovering new ways of structuring work, of bringing buyers and sellers together, and of creating and using market information. Although our fast moving economy often seems to be outside of our influence or control, human beings create the things that create the market forces. Devices, software programs, production processes, contracts, firms, and markets are all the fruit of purposeful action. They are designed."
Gene Kim (01:34:16):
Okay, that's some sort of fancy, highfalutin finance talk, but let's go into the second paragraph, which is a lot more concrete. "Using the computer industry as an example, Dr. Carliss Baldwin and Dr. Kim Clark develop a powerful theory of design and industrial evolution. They argue that the industry has experienced previously unimaginable levels of innovation and growth because it embraced the concept of modularity, building complex products from smaller subsystems that can be designed independently, yet function together as a whole. Modularity freed designers to experiment with different approaches as long as they obeyed the established design rules. Drawing upon the literatures of industrial organization, real options, and computer architecture, the authors provide insight into the forces of change that drive today's economy."
Gene Kim (01:35:08):
So I'm going to try to force myself to percolate the top five insights that Dr. Carliss Baldwin has generated that are relevant to the conversations thus far. Number one, she describes two types of design spaces. The extremes might be the auto industry, which she called manageable designs, versus the computer industry, which she called unmanageable designs. She asks, "What makes computer design so unmanageable? And that was the question that Dr. Kim Clark and I set out to answer in 1987." It took them 13 years to research this and put it into their book.
Gene Kim (01:35:47):
To get more insights, I listened to Dr. Baldwin's 2015 acceptance speech for the Distinguished Scholar of the Technology and Innovation Management Division of the Academy of Management. I'll put a link to that talk, as well as a link to a bunch of notes I tweeted out, in the show notes. She says that, "We are certain that the first large-scale modular system was the IBM 360 driven by customer demand. Moore's law allowed for frequent hardware upgrades, but cognitive limits slowed down the ability to do software rewrites."
Gene Kim (01:36:20):
And in a PowerPoint presentation, she cites Frederick Brooks and The Mythical Man-Month. "The IBM System/360 operating system were already at the limits of what you could design in software. So the IBM solution is to separate the hardware and software, and that structure informs strategy from 1964 to 1968." So the unintended consequence of this design decision is that plug compatible peripherals start to show up in 1969, and there are hundreds of competitors by 1980. She says, "Burroughs, Honeywell, Unisys, all fled into niches where they could set up their own barriers to entry. Competitor peripherals emerged often staffed by ex-IBM employees." So IBM sought to sue employees who left to compete against them. But she said the horse had already left the barn. Disk drives were at the forefront of these unbundled components, but that was just the beginning.
Gene Kim (01:37:15):
She then shows a set of graphs that I also tweeted out. And I remember this from her Design Rules book. She shows the market cap of companies over the decades within the computer ecosystem. And she shows that the majority of it was dominated initially by IBM, but then shifted to these competitors, to the point where the majority of revenue being generated by the IBM System/360 ecosystem was being generated by the peripheral vendors as opposed to IBM.
Gene Kim (01:37:45):
This is a theme that shows up over and over in business and finance. You may hear Dr. Steven Spear or Dr. Clay Christensen, who was his mentor, talk about how value is captured at the point of integration. The point here is that IBM initially owned all the value creation in the System/360 space, because it was the point of integration. However, by making things modular, by enabling hundreds of firms to make System/360 plug compatible components, the customer became the point of integration and disintermediated IBM, and the value being created by the ecosystem went from one that was dominated by IBM to one that was then spread out to IBM and the hundreds of peripheral manufacturers.
Gene Kim (01:38:27):
In that lecture, Dr. Baldwin says, "And this was the end of a Chandlerian industry, which blew apart into its component industries and IBM was never the same." She further states that Andy Grove, CEO of Intel, in the 1990s took advantage of this continuing shift from vertical integration to horizontal shifts. And that trend resulted in other giants in the marketplace, not just Intel, but Microsoft and Cisco. Maybe an alternate model is the Apple ecosystem, where only Apple is the integrator. There are no alternative operating systems. There are very few peripherals like in the IBM PC. And this might explain why Apple can charge so much more for their products and there's very little ability to disintermediate Apple, at least for the current laptops, iPhones, iPads, and so forth.
Gene Kim (01:39:16):
So to wrap this section up, all of these incredible dynamics that we in the technology industry have seen in our careers were made possible by modularity. In other words, the reason why the computer industry is so dynamic and so unlike the automotive industry, where the points of integration are the likes of Toyota, General Motors, Ford, Volkswagen, and so forth, is that in the computer industry we have these rapidly shifting boundaries where the point of value creation will shift wildly from decade to decade, creating fortunes and ruin for those organizations.
Gene Kim (01:39:51):
Okay, so we talked about, one, who is Dr. Carliss Baldwin. Two is the specific case study of the IBM System/360 and how modularity blew the entire industry apart. So let's get to number three, which is how Dr. Baldwin defines modularity. To answer that question, let's go back to her TIM Distinguished Scholar acceptance speech. She says that, "Design rules is how modularity is created by people," and that chapters one through nine cover the specific technique of how people create modularity, and chapter 10 is all about how modularity creates options, that it evolves through decentralized trial and error, and it can blow industries apart. She says, "Modularity is created after you understand the true dependencies between the component parts of the system." And she cites a book called Design Structure Matrices by Dr. Steve Eppinger, which she said was an aha moment because it concretely described what modularity is and allowed her to show her finance colleagues, quote, "that modularity actually exists." She says that her aha moment was that the design structure matrices show the dependencies between different component parts of the system, and that crossing the boundaries takes time and, worse, creates potential cycles.
Gene Kim (01:41:06):
So what is a design structure matrix? So in her presentations, you'll often see these grids. They look like large numerical matrices. In fact, I actually think these are adjacency matrices that are a representation of a dependency graph. So you have all of the nodes in the graph going across and down. And then on the diagonal, each one of those will be invalid because those are the relationships from a specific node to itself.
Gene Kim (01:41:34):
And so there are some properties of a design structure matrix you can identify. So you can find system elements that have no inputs from any other node in the graph. You look for empty columns. You can find system elements that deliver no information to anyone else, and you do that by looking for empty rows. And so when Dr. Baldwin talks about these dangerous loops, you can find them by path searching. And so when she talks about the danger of loops, that seems to resonate with me.
Gene Kim (01:42:03):
I think the notion is that if you have step A depending on B, depending on step C, on to, say, G, H, I, and I depends on A, people in the system may not even realize that they are in a loop. So the danger is if you are in a node in one of these loops, you may see a piece of work come across and then see it come across again but not even recognize that it's old work because the feedback cycle to traverse the loop is so long.
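To make the row, column, and loop checks just described a little more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from Dr. Baldwin's talk: the five-element system, the names A through E, and the orientation of the matrix (a mark in row i, column j is taken to mean that element i delivers information to element j, which is the reading under which empty columns mean "no inputs" and empty rows mean "no outputs"). The loop search is an ordinary depth-first search, one possible form of the path searching mentioned above.

```python
# Hypothetical five-element system, for illustration only.
# DSM[i][j] == 1 is assumed to mean: element i delivers information to element j.
NAMES = ["A", "B", "C", "D", "E"]
DSM = [
    # A  B  C  D  E
    [0, 1, 0, 0, 0],  # A -> B
    [0, 0, 1, 0, 0],  # B -> C
    [0, 1, 0, 1, 0],  # C -> B (closes a loop) and C -> D
    [0, 0, 0, 0, 0],  # D delivers nothing
    [1, 0, 0, 0, 0],  # E -> A
]


def no_inputs(dsm, names):
    """Elements whose column is empty (ignoring the diagonal)."""
    n = len(dsm)
    return [names[j] for j in range(n)
            if all(dsm[i][j] == 0 for i in range(n) if i != j)]


def no_outputs(dsm, names):
    """Elements whose row is empty (ignoring the diagonal)."""
    n = len(dsm)
    return [names[i] for i in range(n)
            if all(dsm[i][j] == 0 for j in range(n) if j != i)]


def find_loop(dsm, names):
    """Depth-first search; returns one loop as a list of names, or [] if none."""
    n = len(dsm)
    unvisited, in_progress, done = 0, 1, 2
    state = [unvisited] * n
    path = []

    def visit(u):
        state[u] = in_progress
        path.append(u)
        for v in range(n):
            if u == v or dsm[u][v] == 0:
                continue
            if state[v] == in_progress:
                # v is already on the current path, so the path from v to u is a loop.
                return path[path.index(v):]
            if state[v] == unvisited:
                found = visit(v)
                if found:
                    return found
        path.pop()
        state[u] = done
        return []

    for start in range(n):
        if state[start] == unvisited:
            loop = visit(start)
            if loop:
                return [names[i] for i in loop]
    return []


print("no inputs: ", no_inputs(DSM, NAMES))
print("no outputs:", no_outputs(DSM, NAMES))
print("one loop:  ", find_loop(DSM, NAMES))
```

In this made-up system, E receives nothing, D delivers nothing, and B and C feed each other. A loop through nine steps would be found the same way; it would just be much harder for the people inside it to see.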
Gene Kim (01:42:31):
The other observation I will make is that the design structure matrix is binary so it really represents the presence of an edge between node A and node B. And as a thought experiment, I think in the worst case, every box is checked, so that means everyone has an input from everyone else and has an output to everyone else. It's hard to imagine being able to make good decisions quickly in that scenario.
Gene Kim (01:42:55):
The other extreme would be like an assembly line, so A connects only to B, B connects only to C, C to D, et cetera, and that would be clearly visible in the design structure matrix because it would be represented as a diagonal parallel to the identity line. So in these design structure matrices that Dr. Baldwin shows, you not only have the adjacencies in the matrix, overlaid onto the edges are these red squares showing the module boundaries. So I suppose in the worst case, you have every node connected to every other node and there are no squares. So everything is connected to everything else.
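The two extremes in this thought experiment can be sketched in a few lines of Python. This is only an illustration under assumptions that are not from the talk: the helper names are hypothetical, the system size of five is arbitrary, and a mark in row i, column j is again assumed to mean that element i feeds element j.

```python
def assembly_line(n):
    """A feeds only B, B feeds only C, and so on: marks sit just off the diagonal."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]


def fully_connected(n):
    """Worst case: every off-diagonal box is checked."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def density(dsm):
    """Fraction of off-diagonal boxes that are checked."""
    n = len(dsm)
    marks = sum(dsm[i][j] for i in range(n) for j in range(n) if i != j)
    return marks / (n * (n - 1))


def render(dsm):
    """Text picture of the grid: 'x' for a dependency, '.' for none, '\\' on the diagonal."""
    n = len(dsm)
    return "\n".join(
        " ".join("\\" if i == j else ("x" if dsm[i][j] else ".") for j in range(n))
        for i in range(n)
    )


line = assembly_line(5)
mesh = fully_connected(5)
print(render(line), "\ndensity:", density(line))
print(render(mesh), "\ndensity:", density(mesh))
```

The assembly line shows up as a single stripe parallel to the diagonal, with only four of the twenty possible boxes checked. The fully connected grid has all twenty checked, which is the "everything is connected to everything else" case.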
Gene Kim (01:43:32):
The far better case is that you will have fewer node dependencies and more of these red squares representing the hard boundaries between modules. So I'm sure more is not always better. [inaudible 01:43:47], you could have every node be its own module, and that's not so good. You want few enough modules to enable the system to actually be understood, and enough modules so that they can actually work independently of other modules.
Gene Kim (01:44:02):
She says creating modularity requires awareness, experience and learning to know where the dependencies even are. It's driven by the desire to not relive those horrendous conditions where decisions require resolving dependencies. When we create modules, we create these binding rules so that designers won't have to talk to each other outside of a certain scope and no longer have to resolve the dependencies because the architect has eliminated that need. And this doesn't happen by accident. The outcome is that modules are tightly dependent within the module, but independent from other modules. So that means activity within the module can be hidden, an honored principle called information hiding in computer science.
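One rough way to check the "tightly dependent within, independent across" property is to count how many marks in the matrix fall inside a module's square and how many fall outside every square. The Python sketch below does that for a made-up six-element system split into two modules. The matrix, the module assignment, and the function name are all hypothetical, and this simple count is not the metric used in Dr. Baldwin's research.

```python
# Hypothetical system: elements 0-2 form one module, elements 3-5 another.
# DSM[i][j] == 1 is assumed to mean: element i delivers information to element j.
DSM = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],  # the mark in column 3 crosses the module boundary
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
MODULES = [[0, 1, 2], [3, 4, 5]]


def count_dependencies(dsm, modules):
    """Return (inside, across): marks within a module's square vs. outside all squares."""
    module_of = {element: m for m, members in enumerate(modules) for element in members}
    inside = across = 0
    n = len(dsm)
    for i in range(n):
        for j in range(n):
            if i == j or dsm[i][j] == 0:
                continue
            if module_of[i] == module_of[j]:
                inside += 1
            else:
                across += 1
    return inside, across


inside, across = count_dependencies(DSM, MODULES)
print(f"inside modules: {inside}, across module boundaries: {across}")
```

Here nine dependencies stay inside the squares and only one crosses a boundary, so the two teams would need to agree on just that one interface. If the same matrix were cut into six single-element modules, every dependency would cross a boundary, which is the "more is not always better" point.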
Gene Kim (01:44:45):
She goes on to say, "And because modules are never perfect we may need system integrators at the end often to deal with these hidden dependencies that were not known at the time." She says, "Modularity reduces cognitive complexity." This is something that comes up so much when you talk about Team Topologies, that fantastic book by Matthew Skelton and Manuel Pais. So modularity reduces cognitive complexity, allows for work to happen in parallel on modules and enables systems that are tolerant of uncertainty. And when you tell an economist that something is tolerant to uncertainty, they will hear the word options. And in a loosely coupled architecture, there are tons of options, and that optionality can be analyzed using financial tools. Modularity enables you to perform many experiments, which is option rich and is good for value creation.
Gene Kim (01:45:34):
She shows another graph where the X axis is the number of modules and the Y axis is the number of experiments you can perform. And she makes the claim that you can create 25 times more value, 20 times more value that customers will actually pay for and she says that will pay for a lot of architects and experimenters. This kind of value proposition is unstoppable. When you create 25 times more value, the entire economy will rearrange itself in all kinds of different ways around you. Old companies will disappear. New ones will appear funded by rapacious venture capitalists.
Gene Kim (01:46:07):
So then she steps back and says when she was working with Dr. Robert Merton, she said she would consider herself lucky if she could find 1.75X value. She said that she found evidence that modularity and the optionality it enables created 25 times more value. And this is what she told Dr. Kim Clark in 1992, "This is what we have to explore because it's an unstoppable economic force." And she suspected that they would see radical rearrangement of economic relationships as a result of the value it created. These options aren't controlled by a central party. They will evolve by decentralized trial and error. Lots of modules and experiments don't need to be centrally coordinated, which was anathema to the Harvard Business School in the early 1990s.
Gene Kim (01:46:51):
So to summarize number three, Dr. Baldwin gave us some very concrete characteristics of what constitutes a module, which includes the ability for teams inside of a module to have high bandwidth communications and collaboration, and the ability for people outside of a module to be incurious about what is happening inside the module, and that allows for lots of experiments. And, I'm interjecting here, it does it because it lowers the cost of change and it enables optionality, and the value created by optionality is enough to blow industries apart.
Gene Kim (01:47:26):
So let's get to the fourth insight, which is around modularity and open source software, which she has also studied. So in this talk, she talks about how bad architecture, these [Gergy 01:47:36] modules are very common in software, and that she has data showing that open source communities are better at maintaining modular architectures than for-profit companies. She cites some papers, one of which includes Dr. Alan MacCormack, and she says, "Why? Because our methods of organization enforce modularity. In the commercial context, you have to be like Jeff Bezos, who, in 2002, abolished back doors between teams or they would be fired." So this is the famous Jeff Bezos memo where he said that the only way that teams could communicate with each other is through versioned APIs, which has been a topic many times in this podcast. So she's making a very direct linkage between the architecture that was created by Jeff Bezos in 2002 and modularity. And she says, "So therefore you have less technical debt because it's more modular, and technical debt can kill companies." That's what happened to Research in Motion/BlackBerry.
Gene Kim (01:48:32):
And then she shows this astonishing diagram of three systems, the kernel of Linux, the kernel of the Darwin operating system at Apple, which is the foundation of macOS and iOS, and then OpenSolaris at Sun, now Oracle. So these are three design structure matrices. When you look at the size of the squares, she observes that the squares are smaller and more numerous in the Linux core, with some score of 7%. Apple Darwin is next best at 16%, and the one with the worst score is OpenSolaris with 25%. She says, "If you want to change something in OpenSolaris, you can easily walk into a portion of code that's cyclic, that has an interdependent set of files, and you may not be able to leave." Supposedly few extremely experienced people dare to make changes there. So in other words, it's so dangerous to make changes there that only very few people are qualified to or brave enough to do so.
Gene Kim (01:49:34):
She shows another graph where she makes the claim that open source software has a smaller core than proprietary systems. She states, "The correlation is very strong. Open source software has better modularity."
Gene Kim (01:49:46):
Before we leave this section, I'll bring up one last point that she makes at the end of the lecture. She said, "I previously thought that modularity is how a system is split up, and complementarity," she refers to the Milgrom and Roberts type, whatever that is, "is how components interact to create value." She says these are two separate dimensions of value and they often get conflated. Modularity answers the question of whether they can be broken apart and complementarity is whether you need all those pieces to have a valuable artifact. She gives an example, "A mug has three components, a handle, top and bottom, and it has a complementarity value of 1.0." In other words, you need all the pieces. An iPhone teardown shows hundreds of modules, but you need the whole thing to be valuable. So M equals hundreds and complementarity is 1.0. However, a living room also has hundreds of components, but you can swap out the furniture. So M equals hundreds, but the complementarity is nearly zero.
Gene Kim (01:50:44):
She talks about platforms in a different context, in a similar but not exactly the same way that we talk about platforms and technology as part of this phenomenon, that we can mix and match what we want. She says, "In the space of technology, we're converging into one large interoperable system. We expect all our digital artifacts to work together. They're modular all the way up and down. And often they're made up of platforms or in many cases where you can see something built entirely on top of something else." Okay, I feel really great about this. I think I managed to summarize at least some of the key insights that I got out of at least some portions of Dr. Baldwin's vast work, which was so influential to people like Dr. Murphy, Dr. Mik Kersten and Dr. Steven Spear. So that was four, let me just add a fifth for the sake of completeness. I want to define real options. According to Investopedia, "A real option gives a firm's management the right, but not the obligation, to undertake certain business opportunities or investments. Real options can include the decision to expand, defer or wait, or to abandon the project entirely. Real options refer to projects involving tangible assets versus financial instruments. Real options have economic value, which financial analysts and corporate managers use to inform their decisions. Using real options value analysis, managers can estimate the opportunity cost of continuing or abandoning a project and make better decisions accordingly. It is important to note that real options do not refer to a financial derivative instrument, such as a call or put options contract, which give holders the right to buy or sell an underlying asset. Instead, real options are opportunities that a business may or may not take advantage of."
Gene Kim (01:52:29):
So in the previous episode with Dr. Steven Spear on supply chains, we talked about the three primary finance theories. First is net present value of money. In other words, it is almost always better to be paid now than later. Second is option theory, which says it is always better to defer or delay a decision until we have more information. And third is portfolio diversification. If we don't have more information to make a better decision, we can always diversify our risks, making sure that all our eggs are not in one basket. So a real option fits somewhere in that second one, where you have the right but not the obligation to undertake certain business opportunities or investments. It includes the decision to expand, as well as defer or wait or abandon a project entirely.
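A small worked example may help connect net present value to the option to defer. The Python sketch below compares committing to a project today against waiting a year to learn whether the outcome will be good or bad, and then investing only if it is worth it. Every number in it (the cost of 100, the payoffs of 180 and 60, the 50/50 odds, the 10% discount rate) is made up for illustration, as are the function names. It is a toy two-outcome model, not the Black-Scholes formula and not anything taken from Dr. Baldwin's analysis.

```python
RATE = 0.10                            # assumed annual discount rate
COST = 100.0                           # assumed cost of undertaking the project
PAYOFF = {"good": 180.0, "bad": 60.0}  # assumed payoff one year after investing
PROB = {"good": 0.5, "bad": 0.5}       # assumed odds of each outcome


def npv(cash_flows, rate):
    """Net present value of cash flows, where cash_flows[t] arrives t years from now."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def value_commit_now():
    """Pay the cost today and receive the expected payoff in one year."""
    expected_payoff = sum(PROB[s] * PAYOFF[s] for s in PAYOFF)
    return npv([-COST, expected_payoff], RATE)


def value_with_option_to_defer():
    """Wait one year, see which outcome holds, and invest only if it is worth it."""
    total = 0.0
    for s in PAYOFF:
        # If we invest in year 1, the payoff arrives in year 2.
        invest = npv([0.0, -COST, PAYOFF[s]], RATE)
        # The right but not the obligation: walk away when investing loses money.
        total += PROB[s] * max(invest, 0.0)
    return total


print(f"commit now:      {value_commit_now():.2f}")
print(f"option to defer: {value_with_option_to_defer():.2f}")
```

With these particular numbers, committing now is worth about 9, while holding the option to wait is worth about 29, because the bad outcome can simply be skipped. Waiting does cost something, since the payoff arrives a year later and is discounted more, so different numbers could tip the comparison the other way.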
Gene Kim (01:53:17):
All right, before we go back to the interview and conclude this massive Dr. Baldwin extravaganza, I'll mention one more thing that came up in a conversation with Dr. Mik Kersten and Dr. Steven Spear. We were talking about the $1 billion aforementioned rearchitecture at Amazon. Dr. Baldwin referred to the 2002 memo where Jeff Bezos declared that teams can only interact with each other through APIs. Both Dr. Kersten and Dr. Spear agree that that $1 billion rearchitecture project was essentially buying a $1 billion basket of call options. So Amazon had gotten into a situation where even easy things became impossible because everything was so entangled with everything else. What that $1 billion rearchitecture represented was buying a basket of call options so they could say yes to all these projects, like, say, Amazon Music, Amazon Prime, recommendations projects, so that they could actually be completed. It bought them the right to undertake those business opportunities or investments. I'll be honest, I don't entirely understand this, but it's something that I definitely seek to understand better.
As your Kubernetes footprint grows to multiple data centers and clouds, how do you effectively run your apps at a global scale? VMware Tanzu is a DevSecOps platform that enables you to secure, connect and observe distributed applications across your multi-cloud enterprise. With Tanzu Service Mesh, get consistent operations and security across the full application transaction, end users to services to data, on any platform or cloud. Meet your service level objectives with automated failovers and scaling, and simplify life cycle management of all service meshes alongside your clusters. Then get full visibility into the health and performance of workloads and clusters across clouds with Tanzu Observability. That's enterprise-grade observability and analytics at massive scale with granular controls. What's more, you can roll out monitoring as a service to all your DevOps teams, including developers and SREs across the enterprise. Learn more at tanzu.vmware.com.
Gene Kim (01:55:29):
So yet another startling thing that I've heard you say is that software development is a subset of a larger category of knowledge work. And when you said that, I guess it seems sort of obvious, but then you said that what they both have in common is, one, that it's an activity primarily of making decisions. I find that to be genuinely one of the most startling things I've heard in years. So can you talk about what it means when you say this work is primarily about making decisions? What does that mean to you?
Dr. Gail Murphy (01:56:09):
Well, when I think about software development, if you even take just the act of coding, which is one small piece of software development, even if you have that interface defined, and we've been talking a lot about interfaces, and you start to think about how you're going to actually write the code that provides the functionality promised by that interface and do it in a robust way, likely a way that's probably secure and a whole bunch of other non-functional requirements, you're making decision after decision after decision. You're making decisions about exactly the variables that you're using to express the code so that other people can understand it later, you're making decisions about how you express that code through various constructs and loops, you might be relying on other people's components. So every decision is having an impact on your output.
Dr. Gail Murphy (01:57:08):
What's always fascinated me is how can we make development much more of a what-if scenario situation? So if I make this decision about a variable name, is that going to impact me in the future? Maybe the variable name isn't going to be so bad, but maybe the way that you're starting to create a different looping structure, deciding on an algorithm to use, that's actually going to have ramifications down the line. You're probably not going to be able to predict what those ramifications are, and that wouldn't be so bad if you could undo things, it's just it's so hard to undo things. That every decision can feel kind of weighty if you really think about it. So you can't allow yourself to think about it. You just make progress and start to make decision upon decision upon decision. But at the end, the sum of those decisions puts you in some concrete of what that piece of software is going to look like.
Dr. Gail Murphy (01:58:08):
And to go back to Carliss Baldwin's work, what kind of option value does it have in the future? Now, when we think about that at the level of writing a function, maybe it's not so bad. But then you start to think about, well, what about at the level of a module? Okay, now there's a lot more choices. What about the level of modules that interact through an exception-handling mechanism? Now there's even more decisions. So it's compounding decisions time and time again in a place where we often don't provide people with enough information to make a good decision. We might not know what are other decisions that are happening that might impact this one because we don't have an ability to understand all the information that's being bombarded at us as a software developer within a big system.
Dr. Gail Murphy (01:58:56):
I think this is very much similar to what happens in other kinds of knowledge work where you're passing different kinds of information between people in an organization, maybe between sales and customer success, but it's still information-passing that's happening and decisions that are being made.
Dr. Gail Murphy (01:59:14):
So if you look at software development, it's always been a great environment to study because you have very, very structural information that you can analyze, can do static analysis on code, you can execute the code and learn dynamic information. So you have all of this very rich information environment that you can use to help inform decision-making.
Dr. Gail Murphy (01:59:37):
If we think about the sales versus customer success example, you can imagine coming to that problem area and also trying to help with various recommendation tools of information to look at, but now the information is much more unstructured. It's natural language. There's more ambiguity of meaning. So what can we learn from the software development case where we have very, very structural information also tied to unstructured, natural language documentation, issues, comments that we can play with, but it's still very contained and domain specific and we can get a handle on it? How can we learn from that environment to then go to these more far off knowledge working environments to use the same kinds of techniques to see if we can also improve work in those areas?
Dr. Gail Murphy (02:00:29):
So that's where I see some of the parallels is some of the techniques that we develop in the much more safe world of software might actually apply in other areas of knowledge work. But we have a long way to go to understand software before we can even start to promote some of those areas to these much more unstructured situations.
Gene Kim (02:00:49):
And by safe, I'm presuming you actually mean safe in that there's a structure around it as opposed to the non-functional safety critical-
Dr. Gail Murphy (02:00:58):
Yeah. No, I totally mean the fact that when we have words that have meanings, for instance, in the software [inaudible 02:01:05], we can figure out what they mean. When you're working with knowledge workers, it's a lot harder to even figure out what the words that they're using mean.
Gene Kim (02:01:12):
You had also mentioned that often we're making decisions with not enough information. The notion of optionality is that whenever you can defer a decision because you don't have enough information, that's usually better, but often we're forced to make decisions with the information we have, and sometimes we can be missing some very critical information. You brought up an example of how we have this ideal of being able to work within our component, completely isolated from the rest of the world. But I think you gave an example of flight control software where it might be very helpful to know that you are in a flight control system and that therefore you inherit some non-functional requirements that may not be explicitly stated, but it is very important for you to know that. Am I recalling that correctly?
Dr. Gail Murphy (02:01:57):
Gene Kim (02:01:58):
We're talking so much about the work of Dr. Carliss Baldwin. One of the papers or things that she's talked about in the past is that she studied the modularity in three operating systems. I think one was Darwin at Apple, the other one was SunOS at Sun and the other one was Linux, and she made this claim that open source systems are better at making the systems modular. To me, that was actually somewhat surprising and yet that seems to be consistent with some of these themes that you've been talking about. There are some emergent properties about the way open source systems are created that lead to better architectures. Are the two related? Does one cause the other?
Dr. Gail Murphy (02:02:35):
Well, I think we can say one does not cause the other because we know that there are open source systems that have very bad architectures. So I think we can do proof by example.
Gene Kim (02:02:45):
What is it about the open source software systems that do lead to these emergent properties that you've talked about?
Dr. Gail Murphy (02:02:53):
Well, I think in open source software there's survival of the fittest going on. It's Darwinian evolution in action because the systems that we look to for having good architectures are ones that have evolved, and they've evolved over time because people want to continue contributing because they must see value in doing so, and there are usually users of that software as a result. That's what motivates people to keep contributing, which allows those to almost become the species that prosper and continue forward. We know there are lots of open source systems that do not survive, perhaps because they don't have these characteristics. We don't have a huge number of studies yet that really are able to pinpoint what is going to be a successful system and what is not going to be a successful system. We know that there are rules of onboarding that can help systems become successful and can incorporate contributors more easily, and so maybe they will be more long lived. Probably under there is a good architecture too that allows people to contribute. So I think it really is a Darwinian system where the examples we promote are the ones that have survived.
Gene Kim (02:04:09):
By the way, that resonates deeply with me because as we just talked about, the cost of cloning a repo is zero, I can do it with a button click or two, and so that means there's actually very little cost in forking, replicating. So I think that increases the Darwinistic pressures. It enables wide variation.
Dr. Gail Murphy (02:04:31):
Gene Kim (02:04:32):
Very interesting. By the way, just because I think you might be interested, one of the findings that we put out in the State of the Software Supply Chain report that I did with Dr. Stephen Magill and Sonatype was, we were specifically looking at one measure of effectiveness: how quickly could they remediate security vulnerabilities. And there were essentially four findings. One was that the mean time to remediate a security vulnerability is very much correlated with the release frequency. So in other words, if you don't release very frequently, and that's very difficult, it's probably going to be very difficult to specifically put out a security patch.
Gene Kim (02:05:08):
The second one was about the number of dependencies an open source project had... We thought that if you have more dependencies, it would be more difficult to change and patch, and it turns out that was not true. The more dependencies a project had, the lower their mean time to remediate security vulnerabilities. What we found was that in general, there were actually more engineers committing on a monthly basis. And so I don't know which way the causality goes: is it that more developers mean more dependencies, or that more dependencies mean more developers? But it was an intriguing finding.
Gene Kim (02:05:45):
The third one was that if you looked at the popularity of a component, how popular it was as measured by how many Maven Central downloads there were, or how many stars or [inaudible 02:05:55] there were on GitHub, there was absolutely no correlation with how good they are at remediating security vulnerabilities. So kind of problematic because that's typically how I go shopping for components-
Dr. Gail Murphy (02:06:06):
Well, exactly. I mean, I love that state of the supply chain report because it gives you so many interesting insights into the ecosystem of open source and how things evolve. I think it's a super useful report.
Gene Kim (02:06:18):
It was so much fun to do. We want to focus now on that notion of what are signals that an open source project might broadcast out that say, "Hey, look, I am a great component to use because I'm not going to break you." What are the signals?
Gene Kim (02:06:34):
I have so much respect for your body of work and how much you've contributed to understanding what factors need to be present for developers to be truly creative and productive. Could you just paint a vision of what, in your ideal, the daily work of a software developer might look like? If you could maybe wave a magic wand, what reality would you spring into being around that developer?
Dr. Gail Murphy (02:07:00):
Yeah, if I focused just on sort of, again, the coding activity, the sort of much more detailed design and coding. I think my ideal would be to enable developers to be much more fluid with the software that they're developing in terms of recording the decisions that they're making in a very unintrusive way so that as the software is being written we would actually understand why they had chosen to express the [inaudible 02:07:31] in a particular way. And with that kind of decision tree that would be created in the background, that they would be able to easily unwind back to parts of the tree and say, "Actually, I want to go back and change this decision higher up in my decision tree, but then replay forward everything you can when I make that change in the decision so that I'm back where I was without having to go through and actually recreate that entire branch." So that they could be much more introspective about the software that they're writing.
Dr. Gail Murphy (02:08:06):
And then if we had this record of the design decisions, it would become a resource for others who need to come and look at that software in the future to make changes to it, to understand why it is, and to maybe understand even automatically be able to reason about what the software could and couldn't do. So I think if we could get to that point where we could make that kind of tree in the background and then allow manipulation, that would be really exciting.
Gene Kim (02:08:34):
Do you have any active research projects exploring that domain?
Dr. Gail Murphy (02:08:39):
Well, we're trying to do work right now in understanding where the design decisions have been implicitly talked about and so I see that as a piece of that vision. If we could start to understand that for a particular component, performance was really important because when people were discussing in a code review about the component, that is what they were talking about and that somehow we're able to automatically extract what kind of performance was important. That would be a way to then, in the future, surface that information that might be useful to the next person that tries to make a change to the component.
Dr. Gail Murphy (02:09:17):
We've gotten to the point where we can automatically identify, using a machine learning technique, where design is being talked about in pull requests, so we've got one step towards that and we're continuing to try to make some steps towards being able to actually extract what that information is and put it in a way that is then able to be manipulated in an automatic way.
Gene Kim (02:09:42):
That sounds awesome. Again, I'm sort of laughing just because I feel like I've been caught red-handed and that feeling of like, "Oh God, what have I done?" I wish I could go back in time and not do what I just did.
Gene Kim (02:09:54):
So how can people reach you and what do you want people to reach you out about? In other words, what sort of help might you be looking for?
Dr. Gail Murphy (02:10:02):
Great. You can reach me at [email protected], is a great email address to reach me at. We're super interested right now in the differences between how developers individually perceive their productivity and how teams perceive their productivity, so love for people to get in touch if they're interested in that topic. I think there's places where we can make overall organizational productivity better by understanding those differences where a developer may choose to interact with a teammate because it helps the team and where they might not is one example. And anybody who is interested in these design kinds of issues, happy to get in touch and see if there's a way to collaborate.
Gene Kim (02:10:43):
Wonderful. Dr. Murphy, thank you so much for sharing your insights. This has been such a fun interview and I have learned so much, so thank you again.
Dr. Gail Murphy (02:10:53):
Well, thanks a lot, Gene. It's a lot of fun always to talk to you.
Gene Kim (02:11:01):
Thank you so much for listening. This was such a rewarding episode to put together for so many reasons. Please join me next time when I will be speaking with Scott Havens, who gave one of my favorite talks at DevOps Enterprise, which was in 2019, on the work that he did as Director of Software Engineering at Walmart Labs, where he was responsible for rebuilding the inventory management systems that powered Walmart using functional programming principles, supporting over a half trillion dollars of annual revenue and 2.3 million employees worldwide.
Gene Kim (02:11:32):
The Idealcast is produced by IT Revolution, where our goal is to help technology leaders succeed and their organizations win through books, events, podcasts, and research. This episode is made possible with the support from VMware Tanzu. Free your apps. Simplify your ops. Head to tanzu.vmware.com to learn more.