What is Architecture, and Why It Matters

Europe 2022

Author · The High Velocity Edge: How Market Leaders Leverage Operational Excellence to Beat the Competition

We studied organizations that had the best project due date performance in Development, the best stability and reliability in Operations, and they also had the best posture of security and compliance.

We wanted to understand how these organizations made their “good to great” transformation, so that other organizations could replicate their outcomes.

There have been many surprises on this 20-year journey. But by far, the biggest surprise was how it brought me into the middle of the DevOps movement. The last time any industry was disrupted the way ours is being disrupted today was probably manufacturing in the 1960s, when it was transformed through Lean and the Toyota Production System.

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

Hello, I am so delighted that you're here, and I hope that you're having a fantastic conference.

For the next 30 minutes or so, my mentor, Dr. Steven Spear, and I will be talking about architecture. I think this will be a broader interpretation of architecture than you might have heard of before, and we'll be talking about why structure and architecture so massively impact the dynamics and performance of a system. I'm hoping that this will potentially elevate and further illuminate the impact of structure and architecture in your own work.

As you've likely heard, Steve and I have been working on this book for nearly two years, and we're hoping that it will come out next year. Hoping? It will come out next year, and so we do talks like this to further clarify our own thinking.

Let me first introduce Steve. Without a doubt, one of the most impactful learning moments for me was taking a workshop at MIT in 2014, taught by Steven Spear, which is why I went to the class. I cannot tell you how much he's influenced my own thinking.

He is famous for many things, but he is probably most famous for writing one of the most downloaded Harvard Business Review papers of all time in 1999, called "Decoding the DNA of the Toyota Production System." This was based in part on his doctoral dissertation at the Harvard Business School, and in support of that, he worked on the manufacturing plant floor of a tier-one Toyota supplier for six months.

Since then, he's extended his work beyond just high-repetition manufacturing work to engine design at Pratt & Whitney, to the building of the safety culture at Alcoa, and how we can make truly safe healthcare systems for everyone. For the last decade, he's been part of a U.S. Navy initiative to create high-velocity learning across all aspects of that enterprise.

For the last two years, we've been talking two to three times a week or more, trying to see if we can codify what we've both observed in our careers about that amazing and magical dynamic that is created that can fully unleash human creativity and problem-solving in almost every domain. Earlier this year, we presented on aspects of fast and slow integrated problem-solving, the four characteristics of great structures, and this time we'll be presenting on more aspects of great architectures and structures. Steve, I'm so delighted that you're here to teach us about architecture.

Dr. Steve Spear

Gene, thanks very much. And as far as the book coming out, remember, that's early next year, not just any time next year.

On the topic of architecture, let me start with a reference. Back in January, I attended a conference, a symposium, with my wife Miriam, who's actually an architect. Someone opened up with a quote from Winston Churchill, who said, "First we design our buildings, and then our buildings design us." What he meant by that is that in the moment of doing the drawing, doing the building, doing the construction, we think we have control over the building, and by extension (this was a symposium about urban design) we think we have control over the layout of the streets and the placement of buildings along those streets. But then once all that stuff is in place, it determines how we behave. We act on them, and then once we're done acting, they act on us.

I thought that was so telling because you and I have spent so much time talking about the architecture of technical products, and the way in which we architect them then determines how we behave around them, towards them, and with each other. The same is true of how we architect our organizations in terms of the flows of information, the possibilities, and the pathways for collaboration. We design that social circuitry, but once it's designed, it designs us back in terms of how we act and how we behave. This metaphor of architecture is actually, I think, more literal than metaphorical, and I hope to elaborate on it a little bit right now.

In terms of background, you made reference to some of the work I've done, but really I've now spent 25 to 30 years trying to explain anomalous outcomes. Those anomalous outcomes take the form of 2x, 200x, and 2,000x. What I mean by that is that back in the '70s, and certainly by the 1980s, people were observing that in the automobile industry there was Toyota, which had productivity double the world standard and levels of quality somewhere between hundreds and thousands of times better than anybody else.

A few years after the first work came out about Toyota and its manufacturing systems, people at the University of Michigan did studies of its design systems and found the same ratios: on any given day, Toyota was producing twice the number of new models in half the time, with much higher manufacturable quality and much higher product quality down the road. As people started looking across these environments, no matter where you looked in industry, you saw these ratios of double productivity, hundreds to thousands of times better quality, and huge differences in workplace safety, which we documented about Alcoa and which I wrote about in The High Velocity Edge.

What we kept finding is no matter where we looked — healthcare, social services — you could literally double the output of an organization, increase dramatically its quality, reduce cost, increase affordability, increase accessibility, and so on. Planes, trains, automobiles, tech, biotech, pharma, healthcare, education, social services, military: in every vertical, every sector, and across every phase of value creation, from upstream discovery through development, design, production, delivery, after-sales, and service, we found these crazy ratios of 2, 200, and 2,000x. It was every place about everything. It begs the question: where does that come from?

Something we've explored previously, but is always worth repeating, is that most everyone, when they start a venture, starts at a very low level of competency and capability. Toyota, in 1958, when they started coming into the U.S. market, were arguably the worst automaker in the world. Their productivity was one-eighth the world standard. Their product quality was horrendous. There was no product variety. Within 20 years, they had transitioned from worst to first, with the highest quality, the highest productivity, et cetera.

When we look at this as a journey from positions of very low capability and competence to very high capability and competence, what we're really trying to do is manage in such a way as to have very high-speed, or high-velocity, learning dynamics going on. Anywhere you look across all those verticals and phases, some are much better at creating the conditions in which people can give fuller expression to their individual potential to be creative and have that individual potential and individual expression harmonized and integrated towards common purpose.

That ties us back to architecture. If we design things and then the things we design influence back how we behave, and in this case we're concerned about the things we design and how they influence us in terms of our behavior and our potential to be creative individually and collectively towards common purpose, what are the two primary things we design?

One is the processes by which your work, my work, Ann's work, and Erin's work gets harmonized and integrated towards common purpose: the enterprise processes. When we think about that, we want structure that allows better expression, and dynamics that allow better expression of our individual and integrated creativity.

We've talked in the past about the importance of processes that are simplified, so they're less confusing, less distracting, and pull less on our attention; and that have more standards (temporary standards, but standards nevertheless) as a capture of our best understanding of how to succeed. That creates less initial confusion about how to get started. Couple that with the dynamics of stabilization: if there is difficulty, confusion, or aberration, it is contained in time, so we're distracted and aggravated for a shorter period, and contained in space, so my aggravation, disruption, and confusion don't spill over to Erin, Ann, and you. The architecture of our enterprise processes matters a ton in terms of our ability to bring our intellect onto the technical problems in front of us.

The next thing is those technical products, and the question of how we can design them so that it's easier to bring our intellectual horsepower onto their design, improvement, maintenance, use, development, and operations. We can start thinking about a contrast between systems that are fragile and don't lend themselves to easy development or operation, and systems that are more resilient, more agile, and overall higher performing.

What are the qualities of an object that are unattractive for allowing us fuller expression of our intellectual potential in its design and operation? First, make an integrated black box technology: everything connected to everything else in convoluted different ways. The inevitable consequence is that no matter what change we want to make, no matter how small, we have to coordinate that change with every other part of the system and everyone else responsible for every other part. We can't be flexible. We're fragile because if there's a disruption locally, it becomes a disruption systemically. We lose agility, resilience, and eventually relevancy because the object we've engineered can't be changed to keep up with changing circumstance. That issue of highly integrated, intertwined design carries over from the technical object to the organization doing the design and operations.

What are the qualities of a nicely architected object? It is modular, and not only modular, but nested. In a modular design, we can make a local change without having to coordinate it with everything else. We can make another local change without coordinating it with everything else. If the modules really are nicely modular, we can change the overall architecture, layout, and configuration without changing the pieces. Designing the technical object to be modular and nested gives us huge opportunity to be agile, flexible, resilient, responsive, and otherwise maintain the relevance of that technical system societally.
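As a rough illustration of the modularity described above, here is a minimal Python sketch. Every name in it (`TaxCalculator`, `FlatTax`, `TieredTax`, `checkout_total`) and the tax rates are hypothetical and chosen only for the example; nothing here comes from the talk. It shows one module being replaced without any change to the code that uses it, because the caller depends only on a small interface.

```python
from typing import Protocol


class TaxCalculator(Protocol):
    """Hypothetical interface that the rest of the system depends on."""

    def tax_for(self, amount_cents: int) -> int: ...


class FlatTax:
    """One module implementation: a flat 10% rate (illustrative figure)."""

    def tax_for(self, amount_cents: int) -> int:
        return amount_cents * 10 // 100


class TieredTax:
    """A replacement module: 5% on the first 10,000 cents, 15% above that."""

    def tax_for(self, amount_cents: int) -> int:
        base = min(amount_cents, 10_000)
        rest = max(amount_cents - 10_000, 0)
        return base * 5 // 100 + rest * 15 // 100


def checkout_total(amount_cents: int, tax: TaxCalculator) -> int:
    """Caller knows only the interface, so swapping modules needs no change here."""
    return amount_cents + tax.tax_for(amount_cents)


print(checkout_total(30_000, FlatTax()))    # 33000
print(checkout_total(30_000, TieredTax()))  # 33500
```

The local change (a new tax rule) stays inside one module; `checkout_total` and anything built on it are untouched, which is the "no coordination with everything else" property in miniature.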

The same logic carries over to the architecting of enterprise processes. We have a similar set of choices between highly black-box, intertwined, convoluted processes versus modular and nested processes. In some organizations, no matter what you want to do, you have to coordinate your changes, actions, and experiments with everybody else. You have to get everyone aligned, in agreement, and synchronized all at once. The impact is that people are persistently confused or flummoxed, and they can't change, adapt, be agile, or be resilient. The organization loses societal relevancy because it can't be high-performing and productive.

The alternative is to design enterprise processes with the same mindset: creating things that are more nested, modular, aligned, simpler, standardized, stabilized, and synchronized. When workflows are simplified, standardized, and stabilized, local issues don't become systemic issues, and systemic issues don't necessarily become local issues. Then people have enormous opportunity to bring their intellect and creativity locally focused, without being distracted by concerns about how they're going to be disrupted by, or disruptive to, the larger enterprise. We create organizations and enterprises with agility, resilience, responsiveness, flexibility, and so on. Not only can they establish a very high level of social relevancy, they can continue to maintain it regardless of how the environment around them changes.

Back to where we started: why is architecture important? Because how we design things and how we design ourselves is initially under our control. But once that's locked in, the architecture, the configuration of the object — what's connected to what, in what way — determines how we can act on the object. For the system, enterprise, and processes into which we're embedded, who's connected to whom, in what way, and in what form determines how we can act. What we want to do is act in ways that allow us to be more creative and more productive individually and collectively. The way to do that is architectures that are more modular, more nested, and more apt and receptive to our creative intent. Over to you.

Gene Kim

What excites me so much is that Steve has shown us this problem and the direction of a solution, and I think the language of software does so much to help illuminate and give concrete examples of this. Put another way, some systems constrain or even extinguish entirely the creativity and full problem-solving potential of everyone within the system, versus those that Steve mentioned that fully unleash the creativity and problem-solving of everyone in those systems.

You will recognize one very famous example in DevOps history: the Amazon API example. Dr. Werner Vogels described in this ACM Queue article in 2006 how amazon.com started 10 years before as a monolithic application running on a web server, talking to a database on the back end, and it was called Obidos. You might remember the URL having Obidos in it that far back. That application held all the business logic, display logic, and functionality that allowed for recommendations, Listmania, reviews, et cetera.

He said there were all these characteristics you want in a good software environment that over time could not be done anymore. The pieces could not evolve independently, and so increased the need to coordinate, communicate, schedule, and prioritize together, because there was no isolation and, as a result, no ownership. In other words, one small piece of the system could cause global chaos and disruption.

This is what led to the famous $1 billion API rearchitecture of the entire Amazon e-commerce system. There is a very famous memo by Steve Yegge describing how this was put into place. By his account, Jeff Bezos sent out a mandate saying all teams will henceforth expose their data and functionality through service interfaces (in other words, APIs), and teams can communicate only through those interfaces. No other form of interprocess communication is allowed. It doesn't matter what technology you use (HTTP, CORBA, PubSub); Bezos doesn't care. Those service interfaces, without exception, must be designed from the ground up to be externalizable, and anyone who doesn't do this will be fired. The seventh and final point was, "Thank you and have a good day." Steve Yegge wrote that number seven is obviously a joke because Bezos obviously doesn't care whether you have a good day or not.
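The rule that teams communicate only through service interfaces can be sketched in a few lines of Python. This is an illustrative assumption, not Amazon's implementation: `InventoryService`, `OrderService`, the SKU, and the in-process method calls (standing in for network API calls) are all hypothetical.

```python
class InventoryService:
    """Hypothetical service owned by one team; its storage is private."""

    def __init__(self) -> None:
        # Internal data store, never shared directly with other teams.
        self._stock = {"book-123": 4}

    def available(self, sku: str) -> int:
        """Public interface: the only way other teams read stock levels."""
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, quantity: int) -> bool:
        """Public interface: the only way other teams change stock levels."""
        if quantity <= 0 or self.available(sku) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


class OrderService:
    """Hypothetical consumer owned by a different team."""

    def __init__(self, inventory: InventoryService) -> None:
        self._inventory = inventory

    def place_order(self, sku: str, quantity: int) -> str:
        # Talks to inventory only through its interface, never its data store.
        return "confirmed" if self._inventory.reserve(sku, quantity) else "rejected"


inventory = InventoryService()
orders = OrderService(inventory)
print(orders.place_order("book-123", 3))  # confirmed
print(orders.place_order("book-123", 3))  # rejected
print(inventory.available("book-123"))    # 1
```

Because `OrderService` never reaches into `_stock`, the inventory team could change how stock is stored without coordinating with the orders team, as long as `available` and `reserve` keep their behavior.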

Who enforces this? There's a famous story that Amazon CIO Rick Dalzell, a former U.S. Army Ranger, was put in charge of creating these hard partitions between teams. Visually depicted, the before state of the Amazon e-commerce systems was one where it was very difficult to get anything done without touching other pieces of the system. This is the nature of technical debt: architectures can disappear as the lines between modules blur, eventually reaching a state where no piece can act independently. Every time you want to make a change or change a cable, you have to touch potentially every other cable, and if you make a mistake, everything goes down. The after state of the API rearchitecture at Amazon probably looked more like a state where you can regain independence and make changes independently without the risk of global disruptive impact. Teams can work, develop, test, and deploy value to customers independent of each other. This caused orders-of-magnitude improvements in productivity.

Architecture is elusive. Something might look like it has all the great characteristics, but what you don't see at the bottom is an incredibly convoluted pile of spaghetti where it is impossible to pull or manipulate one cable without potentially touching everything else.

What was the result of that amazing investment Amazon made? By 1999 they were doing thousands of deployments per year, but by 2001 they had ground to a state where they could do only tens of deployments a year because the risk of deployments was so catastrophic. By 2011, Jon Jenkins shocked the world by describing how they were doing 15,000 deployments per day. By 2015, Ken Exner, director of dev productivity at Amazon, said they were doing 136,000 deployments a day. This shows how investing in architecture can truly unleash productivity and fully enable the creative problem-solving potential of tens of thousands of engineers.
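To put the daily figures quoted above on a more intuitive scale, the short calculation below converts them to per-second rates. It assumes, purely for illustration, that deployments are spread evenly over a 24-hour day; the only inputs are the two numbers from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

deploys_per_day_2011 = 15_000   # figure quoted in the talk
deploys_per_day_2015 = 136_000  # figure quoted in the talk

# Assumption: deployments are evenly spread across the whole day.
seconds_between_deploys_2011 = SECONDS_PER_DAY / deploys_per_day_2011
deploys_per_second_2015 = deploys_per_day_2015 / SECONDS_PER_DAY
growth_2011_to_2015 = deploys_per_day_2015 / deploys_per_day_2011

print(f"2011: one deployment every {seconds_between_deploys_2011:.2f} s")  # 5.76 s
print(f"2015: {deploys_per_second_2015:.2f} deployments per second")       # 1.57
print(f"2011 to 2015: {growth_2011_to_2015:.1f}x")                         # 9.1x
```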

This is what we found in the State of DevOps research. One of the biggest aha moments in that study for me is that architecture is one of the top predictors of performance, as measured by the extent to which we can make large-scale changes to our parts of the system without permission from anyone outside our team; the extent to which we can do our work without a lot of fine-grained communication and coordination with people outside the team; and whether we can deploy and release our service on demand, independent of services we depend upon. Can we do testing on demand without a scarce integrated test environment? Because everything is modular, through information hiding, we can localize effects. If all those things are true, we should be able to do deployments during normal business hours with negligible downtime. Only then can we get to tens, hundreds, or even hundreds of thousands of deployments a day, where teams work independently, deploying value to customers without being coupled to the rest of the enterprise.
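One way to picture "testing on demand without a scarce integrated test environment" is a team testing its own code against an in-memory stand-in for a dependency it does not own. The Python sketch below is a hedged illustration only: `PaymentGateway`, `FakePaymentGateway`, and `checkout` are hypothetical names, not part of the State of DevOps research or any real library.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical published interface of a dependency owned by another team."""

    def charge(self, customer_id: str, amount_cents: int) -> bool: ...


class FakePaymentGateway:
    """In-memory stand-in for tests; records calls instead of charging anyone."""

    def __init__(self, approve: bool = True) -> None:
        self.approve = approve
        self.calls: list[tuple[str, int]] = []

    def charge(self, customer_id: str, amount_cents: int) -> bool:
        self.calls.append((customer_id, amount_cents))
        return self.approve


def checkout(customer_id: str, amount_cents: int, gateway: PaymentGateway) -> str:
    """Our team's code: depends on the interface, not on the real payment system."""
    if amount_cents <= 0:
        return "invalid"
    return "paid" if gateway.charge(customer_id, amount_cents) else "declined"


def test_checkout() -> None:
    approving = FakePaymentGateway(approve=True)
    assert checkout("c-1", 500, approving) == "paid"
    assert approving.calls == [("c-1", 500)]

    declining = FakePaymentGateway(approve=False)
    assert checkout("c-1", 500, declining) == "declined"

    untouched = FakePaymentGateway()
    assert checkout("c-1", 0, untouched) == "invalid"
    assert untouched.calls == []  # invalid amounts never reach the gateway


test_checkout()
print("all checks passed")
```

The test runs in milliseconds on a laptop and needs nothing from the other team, which is possible only because the dependency is hidden behind a narrow interface.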

In software, we have a language for this already: Conway's Law. There is an isomorphic link between the communication paths of an organization and the software architecture they work within. This is based on Dr. Melvin Conway's famous 1968 experiment, where a compiler was built in two ways: the group organized into three teams built a three-pass compiler, and the group organized into five teams built a five-pass compiler. There is an inextricable link between how the team is organized and the software they create.

In the military community, they have words for this already: unity of command and/or unity of effort, ideally both. That is made most effective when you have decentralized execution. You have an architecture that allows people at the edges to work independently of each other. This is the story of Team of Teams, where they pushed decision-making out to the edges, which took the time from sighting to capture from never to 45 minutes, and where a 22-year-old drone operator could set in motion a series of events that led to the capture of an enemy terrorist leader.

That begs the question: if these structures are so important in predicting organizational outcomes and performance, where do they come from? For this, we can go to what Dr. Westrum taught us last year: this is what leaders are accountable and responsible for. The most beautiful phrasing is Jack Rabinow's rule number 23 of leadership that Dr. Westrum told us about: "If you have a dope at the top, you will have, or soon will have, dopes all the way down." That has explanatory power because it explains the best working experiences I've had, which were fully reinforced by who was at the top, and the worst working experiences I've had, which were also fully enabled by who was at the top.

This is where he introduced the term of the sociotechnical maestro (actually the technical maestro, but Steve and I broadened this to say this is really the sociotechnical system that the maestro creates). The characteristics of the maestro are high energy, high standards, great in the large, great in the small, able to ask good questions, able to know when they're being lied to, and a love of walking the floor. So many leaders in the DevOps and enterprise community absolutely have these characteristics.

Really, structure is a phenomenal predictor of performance. What I find so exciting about the work Steve and I are doing is that if we open the aperture and look back at pairwise comparisons over the last 150 years, where great organizations soundly defeat non-great organizations, we will see these characteristics. Even more interesting are pairwise comparisons of before and after: for example, the Fremont manufacturing plant run by General Motors as the before case, and the same plant with the same people run under the Toyota Production System in the NUMMI joint venture as the after case. They went from worst to first. There is more evidence that we can isolate the variables that cause great performance: literally the management system, the structure, and architecture.

I'm hoping you have learned, one, a broader interpretation of architecture than you might have heard before; two, why we believe so strongly that structure and architecture so massively impact the dynamics and performance of a system; and three, that this elevates or further illuminates the impact of structure and architecture in your own work. Stay tuned. We will have a draft of the book available for DevOps Enterprise Live later this year in October. We are fully committed, and it'll be out sometime next year.

Closing Exchange

Gene Kim: Steve, I want to thank you so much for this incredible opportunity to work with you on this problem and to do things just like this.

Dr. Steve Spear: Gene, this has been fantastic. Thank you very much. I think we're really getting towards giving people meaningful, useful, practical answers to a persistent problem of how to work together successfully.

Gene Kim: Thank you.