November 30, 2021

How (and Why) to Design With Conway’s Law in Mind

By Gene Kim, Jez Humble, John Willis, Patrick Debois

Over the past two posts in this series, we have discussed the necessary steps to start your DevOps transformation.

We covered the three key components to consider in choosing a starting place in this post: Selecting Which Value Stream to Start With

We covered how value is delivered to the customer and how to improve flow in this post: Understand the Work in Our Value Stream and Improving Flow

This week, based on the newly updated and expanded second edition of The DevOps Handbook, we are learning how and why to design with Conway’s law in mind.

Our goals in this post will be:

  • Understanding Conway’s Law and its impact on the performance of our value stream
  • Evaluating our organizational archetypes
  • Developing the habits and capabilities in people and the workforce as a means of facilitating these structures

Understanding Conway’s Law and Its Impact on the Performance of Our Value Stream

Conway’s Law has a tremendous impact on the performance of our value stream.

To illustrate this, let me share a story.

In 1968, Dr. Conway was performing a famous experiment.

Together with a contract research organization of eight people, he was commissioned to produce a COBOL and an ALGOL compiler. During the experiment, he observed, “After some initial estimates of difficulty and time, five people were assigned to the COBOL job and three to the ALGOL job. The resulting COBOL compiler ran in five phases, the ALGOL compiler ran in three.”

These observations led to what is now known as Conway’s Law, which states:

“Organizations which design systems…are constrained to produce designs which are copies of the communication structures of these organizations… The larger an organization is, the less flexibility it has and the more pronounced the phenomenon.”

In other words, how we organize our teams has a powerful effect on the software we produce, as well as our resulting architectural and production outcomes.

In order to get fast flow of work from Development into Operations, with high quality and great customer outcomes, we must organize our teams so that Conway’s Law works to our advantage.

We begin this process by evaluating the organizational archetypes.

Evaluating Our Organizational Archetypes

In the field of decision sciences, there are three primary types of organizational structures that inform how we design our DevOps value streams with Conway’s Law in mind: functional, matrix, and market.

They are defined by Dr. Roberto Fernandez as follows:

Functional-oriented organizations

Optimize for expertise, division of labor, or reducing cost.

These organizations centralize expertise, which helps enable career growth and skill development, and often have tall hierarchical organizational structures. This has been the prevailing method of organization for Operations (i.e., server admins, network admins, database admins, and so forth are all organized into separate groups).

Market-oriented organizations

Optimize for responding quickly to customer needs.

These organizations tend to be flat, composed of multiple, cross-functional disciplines (e.g., marketing, engineering, etc.), which often leads to redundancies across the organization. This is how many prominent organizations adopting DevOps operate. In extreme examples, such as at Amazon or Netflix, each service team is simultaneously responsible for feature delivery and service support.

Matrix-oriented organizations

Attempt to combine functional and market orientation.

However, as many who work in or manage matrix organizations observe, matrix organizations often result in complicated organizational structures, such as individual contributors reporting to two managers or more, and sometimes achieving neither of the goals of functional or market orientation.

The Issues with Functional Orientation

In traditional IT Operations organizations, we often use functional orientation to organize our teams by their specialties.

However, several problems can occur when we rely too heavily on functional orientation (“optimizing for cost”).

For example, when we put the database administrators in one group, the network administrators in another, the dedicated server administrators in a third, and so forth, one of the most visible consequences is long lead times. This is especially true for complex activities like large deployments, where we must open up tickets with multiple groups and coordinate work handoffs, resulting in our work waiting in long queues at every step.

In addition to these long queues and long lead times, this situation results in poor handoffs, large amounts of re-work, quality issues, bottlenecks, and delays.
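To make the cost of handoffs concrete, here is a minimal sketch in Python. The team names, queue times, and work times are hypothetical numbers chosen only for illustration, not data from the book. The point it tries to show is that when each functional group adds its own queue, waiting time can dominate the total lead time.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One handoff in a deployment: a ticket waits in a queue, then gets worked."""
    team: str
    queue_hours: float  # hypothetical time the ticket waits before work starts
    work_hours: float   # hypothetical hands-on time once work begins


def lead_time(steps: list[Step]) -> float:
    """Total elapsed time: every step contributes its wait plus its work."""
    return sum(s.queue_hours + s.work_hours for s in steps)


def touch_time(steps: list[Step]) -> float:
    """Time during which someone is actually working on the request."""
    return sum(s.work_hours for s in steps)


# Hypothetical deployment that needs tickets with three functional groups.
deployment = [
    Step("database admins", queue_hours=40, work_hours=2),
    Step("network admins", queue_hours=24, work_hours=1),
    Step("server admins", queue_hours=32, work_hours=3),
]

total = lead_time(deployment)
worked = touch_time(deployment)
print(f"lead time: {total} h, touch time: {worked} h")
print(f"share of lead time spent waiting: {(total - worked) / total:.0%}")
```

With these made-up numbers, only a small fraction of the elapsed time is hands-on work. Removing a handoff removes its queue as well as its work, which is why embedding skills in the team or offering self-service platforms can shorten lead times so sharply.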

This gridlock impedes the achievement of important organizational goals, which often far outweigh the desire to reduce costs.

Similarly, functional orientation can also be found with centralized QA and Infosec functions, which may have worked fine (or at least, well enough) when performing less frequent software releases.

However, as we increase the number of Development teams and their deployment and release frequencies, most functionally-oriented organizations will have difficulty keeping up and delivering satisfactory outcomes, especially when their work is being performed manually.

Enabling Market Orientation

Therefore, broadly speaking, to achieve DevOps outcomes we need to reduce the effects of functional orientation (“optimizing for cost”) and enable market orientation (“optimizing for speed”).

This means having many small teams working safely and independently, quickly delivering value to the customer.

Taken to the extreme, market-oriented teams are responsible not only for feature development, but also for testing, securing, deploying, and supporting their service in production, from idea conception to retirement.

These teams are designed to be cross-functional and independent—able to design and run user experiments, build and deliver new features, deploy and run their service in production, and fix any defects without manual dependencies on other teams, thus enabling them to move faster.

This model has been adopted by Amazon and Netflix and is touted by Amazon as one of the primary reasons behind their ability to move fast even as they grow.

To achieve market orientation, we won’t do a large, top-down reorganization, which often creates large amounts of disruption, fear, and paralysis. Instead, we will embed the functional engineers and skills (e.g., Ops, QA, Infosec) into each service team, or provide their capabilities to teams through automated self-service platforms that provide production-like environments, initiate automated tests, or perform deployments.

This enables each service team to independently deliver value to the customer without having to open tickets with other groups, such as IT Operations, QA, or Infosec.

However, having just recommended market-oriented teams, it is worth pointing out that it is possible to create effective, high-velocity organizations with functional orientation.

A Hybrid Orientation

Cross-functional and market-oriented teams are one way to achieve fast flow and reliability, but they are not the only path. We can also achieve our desired DevOps outcomes through functional orientation, as long as everyone in the value stream views customer and organizational outcomes as a shared goal, regardless of where they reside in the organization.

In fact, many of the most admired DevOps organizations retain functional orientation of Operations, including Etsy, Google, and GitHub.

What these organizations have in common is a high-trust culture that enables all departments to work together effectively, where all work is transparently prioritized and there is sufficient slack in the system to allow high-priority work to be completed quickly.

Now that we’ve evaluated the archetypes of your organization, we will look at developing the habits and capabilities in people and the workforce as a means of facilitating these structures.

Developing the Right Habits and Capabilities in Your Team

For any of these structures to work, testing, operations, and security need to be, first and foremost, everyone’s job, every day.

Testing, Operations, and Security Are Everyone’s Job

In high-performing organizations, everyone within the team shares a common goal—quality, availability, and security aren’t the responsibility of individual departments, but are a part of everyone’s job, every day.

This means that the most urgent problem of the day may be working on or deploying a customer feature or fixing a Severity 1 production incident.

Alternatively, the day may require reviewing a fellow engineer’s change, applying emergency security patches to production servers, or making improvements so that fellow engineers are more productive.

Preventing Siloization

Secondly, we need to enable every team member to be a generalist.

In extreme cases of a functionally-oriented Operations organization, we have departments of specialists, such as network administrators, storage administrators, and so forth.

When departments over-specialize, it causes siloization, which Dr. Steven Spear describes as departments operating more like “sovereign states.”

Any complex operational activity then requires multiple handoffs and queues between the different areas of the infrastructure, leading to longer lead times (e.g., because every network change must be made by someone in the networking department).

Because we rely upon an ever increasing number of technologies, we must have engineers who have specialized and achieved mastery in the technology areas we need. However, we don’t want to create specialists who are “frozen in time,” only understanding and able to contribute to that one area of the value stream.

One countermeasure is to enable and encourage every team member to be a generalist.

We do this by providing opportunities for engineers to learn all the skills necessary to build and run the systems they are responsible for, and regularly rotating people through different roles.

The term full stack engineer is now commonly used (sometimes as a rich source of parody) to describe generalists who are familiar with, or at least have a general level of understanding of, the entire application stack (e.g., application code, databases, operating systems, networking, cloud).

When we value people merely for their existing skills or performance in their current role rather than for their ability to acquire and deploy new skills, we (often inadvertently) reinforce what Dr. Carol Dweck describes as the fixed mindset, where people view their intelligence and abilities as static “givens” that can’t be changed in meaningful ways.

Instead, we want to encourage learning, help people overcome learning anxiety, help ensure that people have relevant skills and a defined career road map, and so forth. By doing this, we help foster a growth mindset in our engineers—after all, a learning organization requires people who are willing to learn.  

Next, we’ll look at how the way we fund our teams also affects our outcomes.

How Funding and Team Size Affect Outcomes

One way to enable high-performing outcomes is to create stable service teams with ongoing funding to execute their own strategy and roadmap of initiatives. These teams have the dedicated engineers needed to deliver on concrete commitments made to internal and external customers, such as features, stories, and tasks.

Contrast this to the more traditional model where Development and Test teams are assigned to a “project” and then reassigned to another project as soon as the project is completed and funding runs out.

This leads to all sorts of undesired outcomes, including developers being unable to see the long-term consequences of decisions they make (a form of feedback) and a funding model that only values and pays for the earliest stages of the software life cycle, which, tragically, is also the least expensive part for successful products or services.

Our goal with a product-based funding model is to value the achievement of organizational and customer outcomes, such as revenue, customer lifetime value, or customer adoption rate, ideally with the minimum of output (e.g., amount of effort or time, lines of code).

Contrast this to how projects are typically measured, such as whether they were completed within the promised budget, time, and scope.

Finally, by creating loosely-coupled architectures and designing team boundaries to enable developer productivity and safety, we can improve deployment outcomes.

When we have a tightly coupled architecture, small changes can result in large scale failures.

As a result, anyone working in one part of the system must constantly coordinate with anyone else working in another part of the system they may affect, including navigating complex and bureaucratic change management processes.

In contrast, having architecture that is loosely coupled means that services can update in production independently, without having to update other services.
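One way to picture the difference is to model services as a graph and ask which services must be released together when one of them changes. The sketch below is illustrative only: the service names and the `must_redeploy_with` mapping are hypothetical, and real coupling also shows up in shared databases, schemas, and release processes that this toy model ignores.

```python
from collections import deque


def coordinated_release_set(must_redeploy_with: dict[str, set[str]], changed: str) -> set[str]:
    """Return every service that has to ship together with `changed`.

    `must_redeploy_with[a]` lists services that cannot keep running unchanged
    when `a` is updated (a hypothetical stand-in for tight coupling).
    """
    seen = {changed}
    queue = deque([changed])
    # Breadth-first walk: coupling is transitive, so follow every forced update.
    while queue:
        current = queue.popleft()
        for neighbor in must_redeploy_with.get(current, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical tightly coupled system: a change to "cart" ripples outward.
tightly_coupled = {
    "cart": {"checkout", "inventory"},
    "checkout": {"payments"},
    "inventory": {"shipping"},
}

# Hypothetical loosely coupled system: stable interfaces, no forced co-releases.
loosely_coupled = {"cart": set(), "checkout": set(), "inventory": set()}

print(sorted(coordinated_release_set(tightly_coupled, "cart")))
print(sorted(coordinated_release_set(loosely_coupled, "cart")))
```

In the tightly coupled example, a single change to `cart` pulls four other services, and the teams that own them, into the same release. In the loosely coupled example, the owning team can deploy on its own.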

Randy Shoup, former Engineering Director for Google App Engine, observed that “organizations with these types of service-oriented architectures, such as Google and Amazon, have incredible flexibility and scalability. These organizations have tens of thousands of developers where small teams can still be incredibly productive.”

Designing Team Boundaries in Accordance with Conway’s Law

One way to keep team sizes small is to design our team boundaries in accordance with Conway’s Law.

As organizations grow, one of the largest challenges is maintaining effective communication and coordination between people and teams.

All too often, when people and teams reside on a different floor, in a different building, or in a different time zone, creating and maintaining a shared understanding and mutual trust becomes more difficult, impeding effective collaboration. Collaboration is also impeded when the primary communication mechanisms are work tickets and change requests, or worse, when teams are separated by contractual boundaries, such as when work is performed by an outsourced team.

Conway’s Law helps us design our team boundaries in the context of desired communication patterns, but it also encourages us to keep our team sizes small, reducing the amount of inter-team communication and encouraging us to keep the scope of each team’s domain small and bounded.

As part of its transformation initiative away from a monolithic code base in 2002, Amazon used the two-pizza rule to keep team sizes small—a team only as large as can be fed with two pizzas—usually about five to ten people.

This limit on size has four important effects:

  1.  It ensures the team has a clear, shared understanding of the system they are working on. As teams get larger, the amount of communication required for everybody to know what’s going on scales in a combinatorial fashion.
  2.  It limits the growth rate of the product or service being worked on. By limiting the size of the team, we limit the rate at which their system can evolve. This also helps to ensure the team maintains a shared understanding of the system.
  3.  It decentralizes power and enables autonomy. Each two-pizza team (2PT) is as autonomous as possible. The team’s lead, working with the executive team, decides on the key business metric that the team is responsible for, known as the fitness function, which becomes the overall evaluation criteria for the team’s experiments. The team is then able to act autonomously to maximize that metric.
  4.  Leading a 2PT is a way for employees to gain some leadership experience in an environment where failure does not have catastrophic consequences.

An essential element of Amazon’s strategy was the link between the organizational structure of a 2PT and the architectural approach of a service-oriented architecture.
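The first effect in the list above can be made concrete with a small calculation. Assuming, as a simplification, that every team member may need to talk to every other member, a team of n people has n(n-1)/2 possible pairwise communication paths. The team sizes below are illustrative and not taken from the book.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people in a team: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


for size in (5, 10, 20, 50):
    paths = communication_paths(size)
    print(f"{size:>3} people -> {paths:>4} possible pairwise paths")
```

Under this assumption, growing a team from 5 to 10 people raises the number of pairwise paths from 10 to 45, and a team of 50 has 1,225. Counting possible subgroups rather than pairs would grow faster still, which is one reason to keep teams at two-pizza size.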

Amazon CTO Werner Vogels explained the advantages of this structure to Larry Dignan of Baseline in 2005. Dignan writes:

“Small teams are fast…and don’t get bogged down in so-called administrivia….Each group assigned to a particular business is completely responsible for it….The team scopes the fix, designs it, builds it, implements it and monitors its ongoing use. This way, technology programmers and architects get direct feedback from the business people who use their code or applications—in regular meetings and informal conversations.”

With these pieces in place, we can see how architecture and organizational design can dramatically improve our outcomes.

Done incorrectly, Conway’s Law will ensure that the organization creates poor outcomes, preventing safety and agility.

Done well, the organization enables developers to safely and independently develop, test, and deploy value to the customer.

Where to Start with DevOps Series

  1. Selecting Which Value Stream to Start With
  2. Understand the Work in Our Value Stream and Improving Flow
  3. How to Design with Conway’s Law in Mind
  4. How to Integrate Operations Into the Daily Work of Development

About The Authors

Gene Kim

Gene Kim has been studying high-performing technology organizations since 1999. He was the founder and CTO of Tripwire, Inc., an enterprise security software company, where he served for 13 years. His books have sold over 1 million copies—he is the WSJ bestselling author of Wiring the Winning Organization, The Unicorn Project, and co-author of The Phoenix Project, The DevOps Handbook, and the Shingo Publication Award-winning Accelerate. Since 2014, he has been the organizer of DevOps Enterprise Summit (now Enterprise Technology Leadership Summit), studying the technology transformations of large, complex organizations.

Jez Humble

Jez Humble is the coauthor of Accelerate and The DevOps Handbook

John Willis

John Willis has worked in the IT management industry for more than 35 years and is a prolific author, including "Deming's Journey to Profound Knowledge" and "The DevOps Handbook." He is researching DevOps, DevSecOps, IT risk, modern governance, and audit compliance. Previously he was an Evangelist at Docker Inc., VP of Solutions for Socketplane (sold to Docker) and Enstratius (sold to Dell), and VP of Training & Services at Opscode where he formalized the training, evangelism, and professional services functions at the firm. Willis also founded Gulf Breeze Software, an award winning IBM business partner, which specializes in deploying Tivoli technology for the enterprise. Willis has authored six IBM Redbooks for IBM on enterprise systems management and was the founder and chief architect at Chain Bridge Systems.

Patrick Debois

In 2009, he coined the word devops by organizing the first devopsdays event. He has organized conferences all over the world to collect and spread new ideas. As a pioneer, he is always on the lookout for new ideas to implement and explore. He is currently in the media sector, where he is guiding broadcasters through the transition to entering into a dialogue with their audience as a closed feedback loop.
