Coordination Costs and Rewiring Organizations to Win With AI

Connect Feb 2025

Scott Prugh's impact on the community is recognized, especially regarding DevOps principles. Understanding coordination costs is vital for organizations, with AI playing a key role in process re-engineering. Challenges like organizational forces can hinder progress, affecting project timelines. The integration of AI in workflows enhances client analysis and legacy system understanding, while cautioning against over-reliance on AI. Embracing AI with strong DevOps practices is essential for managing these costs effectively.

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

So everyone in this community almost certainly knows the amazing work of Scott Prugh. He and I met in February 2013, literally less than a month after The Phoenix Project came out. He was describing to me how he was just in the printing manufacturing plant with his IT operations team studying the work-release processes in the manufacturing plant.

I can't overstate just how much I've learned from him over the last decade-plus: his journey showing how DevOps principles can be used to transform even a flagship customer-care platform that was written in the 1970s that ran on a mainframe. He was one of the most quoted people in The DevOps Handbook. And again, just his architectural sensibilities, I've learned so much from.

What's really exciting to me is Scott is going to be sharing his mental model of leveraging GenAI to reduce coordination costs. And he posits that the main value of this technology is going to be lowering, or maybe even eliminating, certain categories of coordination costs, just as DevOps did a decade ago.

Scott, I'm so happy you're here. Over to you.

Scott Prugh

Hi folks. It's great to be here, Gene. Thanks for that intro. Can you see my screen and me okay?

Gene: I can see you fine. I can hear you fine, and I can see your slides fine.

Awesome. Thank you. It's been such a pleasure to be part of this community for over a decade now. I've really enjoyed meeting all the people, the intellectual curiosity, and really helping myself and my organization learn, but also other organizations in the industry.

I'm going to talk today about coordination costs. It's been a theme the last couple years, and I've taken it through a couple areas. Also, really, I'll carry on with supercharging the learnings of DevOps and how we apply AI to these coordination costs.

The first thing I'm going to hit on is Pennsylvania Imposter Syndrome. There's no way that this talk is going to be as good as Nathan's, Steve Yegge's, Fernando's, John's. And every day you're bombarded by so much that's coming out with AI. By the time this talk is over, there'll be four more announcements. It's impossible to keep up.

That being said, I encourage folks to really use their FOMO to the fullest, and use AI every day to change both what you're doing and how you work, but also how your organizations are working. It is so, so powerful.

In this talk, I really divide AI into two areas. The concept of product engineering, which I won't talk much about today, is how you embed AI in your product to help your customers. The other is AI process engineering, which is really the focus: how do you use AI to re-engineer your processes? One thing to build on the quote is: AI won't take your job, but someone re-engineering their processes with AI will. So learn how to use it and learn how to re-engineer your processes for both you and your organization.

Some problems that we see, and this covers a lot of areas in DevOps but also we just continue to see it today, are that teams struggle to make progress. They have to fight through organizational forces to get things into production. Escalations become the norm to get things done. You've got to go up and over and contact someone in an organization to get work through. You rework things a lot. Quality's a problem. You have meetings, then you have more meetings, and then you have a re-meet about the meetings that you had to actually understand problems.

Project managers seem to outnumber workers. You keep adding organizers to actually organize and coordinate the work. And of course, executives now are throwing around platitudes that AI will solve really all these problems: just blindly apply it, and it will help solve the problems in your organization.

I'm going to take you through the journey here around the physics of coordination costs. We'll touch on that, the golden rule of dependencies, the three layers of the organization and how we think about rewiring, and then really how DevOps solved the problem. Then we'll get into how AI can reduce those coordination costs, the concept of AI process engineering and enabling properties. Then I'll talk a little bit about AI being a silver bullet and the cultural change that you need to think about.

Here is the high-level view of the physics of coordination costs. There are three formulas. There's the concept of wait time, percent busy over percent idle, that The Phoenix Project introduced. There's coordination risk, which basically says that each dependency you have in a process cuts your odds of arriving on time in half, so with N dependencies the odds are one over two to the N. If you have one dependency to coordinate, your odds of arriving on time are 50%. And then also the concept of knowledge degradation, which we saw in Mary and Tom Poppendieck's work, and John Smart highlighted it also, which is: as you have more and more handoffs, tacit knowledge degrades in the organization.

These coordination costs basically make it unlikely that you will arrive on time when you want with quality. You want to get things through fast. These costs defeat your ability to do that in organizations at scale.

The golden rule from this, since we have exponential equations here, is when you remove a dependency, you double your odds of arriving on time. When we tackle challenges in an organization, we really look at that: how do we get rid of the dependencies and allow independence of work in organizations?
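As a rough illustration of the first two formulas and the golden rule, here is a minimal Python sketch. The function names are hypothetical, and the model is only the one stated in the talk: wait time as percent busy over percent idle, and on-time odds of one over two to the N. Treating zero dependencies as 100% odds is an extrapolation of that formula, not a claim from the talk.

```python
def wait_time_ratio(percent_busy: float) -> float:
    """Wait time expressed as percent busy divided by percent idle."""
    if not 0 <= percent_busy < 100:
        raise ValueError("percent_busy must be at least 0 and below 100")
    return percent_busy / (100 - percent_busy)


def on_time_odds(dependencies: int) -> float:
    """Odds of arriving on time with N dependencies, modeled as 1 / 2**N."""
    if dependencies < 0:
        raise ValueError("dependencies must be non-negative")
    return 1 / 2 ** dependencies


if __name__ == "__main__":
    # Wait time climbs steeply as a shared resource gets busier.
    for busy in (50, 90, 95):
        print(f"{busy}% busy -> wait-time ratio {wait_time_ratio(busy):.0f}")

    # Golden rule: removing one dependency doubles the on-time odds.
    for n in range(4, 0, -1):
        gain = on_time_odds(n - 1) / on_time_odds(n)
        print(f"{n} -> {n - 1} dependencies: odds improve {gain:.0f}x")
```

Under this model, the doubling holds at every step, which is why removing dependencies matters more than speeding up any one of them.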

The reason this happens is there are three properties. One is called contention, where you have conflict over a shared resource. Coupling, which is interdependence between resources and people. And then the concept of coherence, which is the quality of forming a unified and consistent whole. You really are battling those three dimensions in both organizations and software architecture. We need to address those to be able to move quickly with quality.

Now we get to the three layers. This is highlighted in the excellent book Wiring the Winning Organization by Gene Kim and Steve Spear, which says we have three layers. The lowest layer is the technical object and the people doing work on it. Layer two is the tools and instrumentation that people use to do that work. And then layer three at the top level, very important, is the social circuitry and how you work in an organization. Those are things like the organizational architecture, the process architecture, how information flows, and the behavioral norms of your organization.

I'll give you an example of what I call layer Band-Aids, or misattribution of applying the wrong fix to an organization. If we have a developer all the way down at layer one working on the code, and we look at that developer and we look at a value-stream flow which is 281 days to get functionality to market, what we want to do is make folks more efficient. The easy thing to think about applying is: hey, developers are slow. Let's give them AI. If we make them faster and more efficient, what's the best result we can hope for?

Well, if we can improve the developer work by 50%, which means we get 16 days of improvement, we've only improved the overall time by 6%. I really call this: Copilot may help you, but it won't save you, because you have a problem at a higher level in the organization. It needs to be fixed. This is a misattribution of a system problem and applying what folks would often call a quick fix: hey, give everyone AI and they'll produce more code, be more effective. But this hardly improves the overall system problem, and that's what we need to get at.
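The arithmetic behind that 6% can be sketched in a few lines of Python. The 281-day value stream and the 16 days saved come from the talk; the 32-day developer step is an inference (a 50% improvement yielding 16 days), and the function name is hypothetical.

```python
def overall_improvement(total_days: float, step_days: float, step_gain: float):
    """Return (days saved, fraction of total lead time saved) when one
    step of the value stream is made faster by the fraction step_gain."""
    saved = step_days * step_gain
    return saved, saved / total_days


if __name__ == "__main__":
    # 32 developer days is inferred from the talk: 50% faster saves 16 days.
    saved, fraction = overall_improvement(total_days=281, step_days=32, step_gain=0.5)
    print(f"Days saved: {saved:.0f}")
    print(f"Overall lead-time improvement: {fraction:.1%}")
```

Even an unrealistic 100% gain on that step would save only 32 of 281 days, which is the point of the misattribution argument: the remaining time sits in the layers above the developer.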

How did DevOps solve it? There were a lot of things that occurred in DevOps, but I break it into four things. One: the technical practices. We invested in things like CI/CD to make work simpler, to move through the system faster, at speed and quality. We did things like improve work practices. We started measuring our work. We used work visibility in smaller batches, and we got feedback into the system. That was very important too.

We did platform enablement, also known as platform engineering, which lowered overall cognitive load, created self-service at scale, allowed folks to parallelize their work and have independence of action to move things into production. And the final, really important one, because this is a true layer-three change, is changing team structures and basically creating build-and-run responsibility. That dramatically lowered the coordination costs.

We have a picture of that. You went from an old organizational structure, which was very role-based, to teams that were modular, and we optimized for flow and knowledge efficiency on those teams. That greatly lowered those coordination costs by applying these techniques.

Before I get into the AI pieces, I'll hit on something that other folks talked about today: integrating AI is software engineering, so don't forget about that. To be great at this, you need to be great at DevOps. Think about going faster but not being able to do that safely. This is so important. Don't shortcut CI in your AI initiatives, and don't forget about those important and vital engineering fundamentals. Don't put the AI cart before the CI horse. I see lots of folks running off to solve things with AI without thinking about that first.

Now for the process engineering pieces with AI. The flow and speed of work in most organizations is dominated by wait time and coordination costs. These costs occur when you wait on others for capability, i.e. special skills they have, knowledge, or tasks for them to complete. So we need to turn the knobs with AI to improve those things.

If we use AI to rewire the org to reduce these costs, we can have an outsized benefit. We can get easily 10x, if not more, improvements in lead time and quality. This will only get better as the AI capabilities get better and people get better about using AI to improve their organizations.

What are these enablement properties? I break them into this. There is knowledge, which is using AI to improve knowledge assimilation, using it to democratize the knowledge and really hone the accuracy of conversations. Humans are not great at being incredibly accurate with conversations, and it's very difficult to have the same conversation over and over again and draw accuracy from it. AI is very good at drawing that accuracy out.

Capability, which is really the skills that you have. You can improve your capability and desire to do things you wait on others for. This is like provisioning infrastructure, or even doing research, or drawing a UX wireframe.

Capacity: it just helps you do things faster. Things that took you longer to do can be a lot faster with AI.

The concept of linearization and parallelism: you can improve work sequence and do more work in parallel. AI can be in multiple places at once. People cannot, and that creates independence of action.

And then the concept of incrementalism and optionality. You can use AI to experiment, work incrementally, and exploit optionality quickly at a very low cost.

Now getting into the example pieces: how do we move those dependencies and double our odds? How do we rewire our org and the processes with the organizational architecture, the system architecture, and the process architecture to improve what we do? We'll go through three examples of how to do this.

The first example is around client requirements and analysis. In this case, we need to collect client requirements, and subsequent analysis is very time consuming. It's inaccurate and requires many handoffs between many groups.

I start here by saying: what do we take? We take 500-plus pages of these documents that we have from clients, these transcriptions that we get from calls, and we feed that into AI. Then we prompt it and say: pretend you're a cloud migration expert, and I want a summary for a compelling presentation highlighting the current-state risks, future-state suggested approach, business justification, timeline, and approximate cost to execute a migration to public cloud. Provide me a summary of the data-center renewal dates so I understand those when I talk to the customer. Provide me a summary of DR and backup issues. Provide me a summary of the data analytics landscape. Generate me a document from this conversation that I can use as an outline to create a final deck.

Taking some 500-plus pages, in literally minutes, I can consolidate that down into material that I can use in preparation for a presentation. We've used this technique many times, and it's incredibly powerful to enhance the knowledge capabilities and your capacity and capability to work faster. Then, from that, you take those 18 pages and produce your presentation, and your feedback loop in this analysis is incredibly improved.

What does this look like? Before: lots of meetings across eight weeks. Afterwards: you get it down to a couple meetings over two weeks, 100% transcription. If you look at the bottom here, this is what's really important: removing those dependencies is key. In the previous before example, we have about three dependencies at a minimum, if not more. That means there's only a 12% chance that we'll actually arrive on time with no delay. The cost is some 90 hours of work to get there, and it takes you 10 weeks. It's just so much wait time and coordination.

After, you really drop down the number of dependencies. You're working with AI, and it enables independence of action. You can have folks transcribe calls and provide the documentation, and engineers and architects can then come in and use AI to do the analysis. That improves the odds of arriving on time by 4x, gives a 7x improvement in cost, and a 5x improvement in lead time.
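To connect these multipliers back to the physics, here is a small Python sketch. The before figures (three dependencies, 90 hours, 10 weeks), the two-week after figure, and the 4x and 7x multipliers are from the talk; everything computed from them, including the number of dependencies removed and the after-cost in hours, is derived under the one-over-two-to-the-N model and is not stated by the speaker. The function names are hypothetical.

```python
import math


def on_time_odds(dependencies: float) -> float:
    """Odds of arriving on time with N dependencies: (1/2)**N."""
    return 0.5 ** dependencies


def dependencies_removed(odds_improvement: float) -> float:
    """Each removed dependency doubles the odds, so an improvement of X
    in on-time odds corresponds to log2(X) dependencies removed."""
    return math.log2(odds_improvement)


if __name__ == "__main__":
    before_dependencies, before_hours, before_weeks = 3, 90, 10
    after_weeks = 2

    print(f"Before: {on_time_odds(before_dependencies):.1%} chance of arriving on time")
    removed = dependencies_removed(4)  # the 4x improvement quoted in the talk
    print(f"A 4x improvement implies about {removed:.0f} dependencies removed")
    print(f"After: {on_time_odds(before_dependencies - removed):.0%} chance")
    print(f"Lead time: {before_weeks / after_weeks:.0f}x faster")
    print(f"A 7x cost improvement implies roughly {before_hours / 7:.0f} hours")
```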

The next is legacy-system analysis, really looking at legacy systems. The problem is you want to put new integrations in the system, and traditional approaches are a lot of meetings and heavy coordination costs with key stakeholders. In this case, we use AI's capability to consume code. It's really great at that. It loves code.

You push data definitions from your database and other code in. Then you say: can you draw me a Mermaid diagram of the API calls on this service and show how the code path calls the database? It follows the code through, it shows you where it calls the database, and now you have a much better understanding of how that works.
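For readers who have not seen Mermaid, the sketch below shows roughly what such output looks like by generating a flowchart from a list of call edges in Python. The endpoint, function, and stored-procedure names are hypothetical, and in the talk the AI produces the diagram directly from the code; this only illustrates the shape of the result.

```python
def to_mermaid(edges):
    """Render (caller, callee) pairs as a top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for caller, callee in edges:
        lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


# Hypothetical code path: API endpoint -> service function -> stored procedure.
edges = [
    ("get_order_endpoint", "load_order"),
    ("load_order", "sp_GetOrderById"),
    ("create_order_endpoint", "save_order"),
    ("save_order", "sp_InsertOrder"),
]

if __name__ == "__main__":
    print(to_mermaid(edges))
```

The printed text can be pasted into any Mermaid renderer to get the diagram.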

How about create me a table with each function name and the database call? It creates this table of three columns: give it the repository name, function name, and the stored proc. It outputs that in a format that you can import in Excel and create a pivot table. Finally: let's generate a professional report which analyzes the set of APIs and their functionality and all the database calls made. Now I get a report which I can leverage with a client to help them understand what their software is really doing.
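A minimal sketch of that three-column output and the pivot step, in Python instead of Excel. The rows are invented sample data, and the column names simply mirror the ones mentioned above; the real table would come from the AI's analysis of the actual repositories.

```python
import csv
import io
from collections import Counter

# Hypothetical rows: (repository, function, stored procedure).
rows = [
    ("orders-service", "get_order", "sp_GetOrderById"),
    ("orders-service", "list_orders", "sp_ListOrders"),
    ("billing-service", "get_invoice", "sp_GetOrderById"),
]


def to_csv(rows) -> str:
    """Serialize the table as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["repository", "function", "stored_proc"])
    writer.writerows(rows)
    return buffer.getvalue()


def calls_per_stored_proc(rows) -> Counter:
    """Pivot-style count of how many functions call each stored procedure."""
    return Counter(proc for _, _, proc in rows)


if __name__ == "__main__":
    print(to_csv(rows))
    for proc, count in calls_per_stored_proc(rows).most_common():
        print(f"{proc}: called by {count} function(s)")
```

A count like this quickly surfaces the stored procedures that several services share, which are usually the riskiest places to add a new integration.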

In this case, again, several meetings over many weeks. Many, many conversations that are messy. I heard that today from Stefan and Peter, so I put this on there. People's conversations are not accurate. They describe things in different ways, and it's very hard to rationalize that.

In this case, at the bottom, we cut the dependencies in half. It's basically a 2x reduction in dependencies. AI again allows independence of action with the knowledge consumption. We reduce the risk by 4x, reduce the cost by 10x, and finally the lead time by 5x. Some things here: remember, AI is always available. It loves code. It types way faster than you do. And the AI code analysis is vastly superior to humans.

The final thing is the one that everyone views as the holy grail: how do I produce a full end-to-end system? To be honest, AI is not there yet, but there are a lot of powerful things that you can do with it. I can build basically an end-to-end proof of concept by leveraging AI in a similar way. I can take a whole bunch of schemas from a current database or a database I've mocked up. I can get some integrations, the WSDLs and OpenAPI Swagger, and throw that into AI.

I can ask AI: I've given you some sample database schemas of our current system and API documentation of three systems we need to integrate with. Generate me a solution in my language of choice that maps our orders to payments and then sends customer notifications via email and SMS. Add a set of tests that mock the three integrations and return test data. Generate me a Dockerfile and some OpenTofu to configure this solution on ECS. Those are all things that I had to wait on other people to do and are greatly improved now by leveraging AI.
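To make the "tests that mock the three integrations" request concrete, here is a minimal Python sketch using the standard library's `unittest.mock`. Python is an assumption (the talk says only "my language of choice"), as is reading the three integrations as payments, email, and SMS. The function `process_order`, the client method names, and the order fields are all hypothetical.

```python
from unittest.mock import Mock


def process_order(order, payments, email, sms):
    """Map an order to a payment, then notify the customer by email and SMS."""
    payment = payments.charge(order["id"], order["amount"])
    message = f"Order {order['id']} payment {payment['status']}"
    email.send(order["email"], message)
    sms.send(order["phone"], message)
    return payment


def test_process_order():
    # Mock the three integrations so no real system is called.
    payments, email, sms = Mock(), Mock(), Mock()
    payments.charge.return_value = {"status": "approved"}  # canned test data

    order = {"id": "A-1", "amount": 42.0,
             "email": "pat@example.com", "phone": "+15550100"}
    result = process_order(order, payments, email, sms)

    assert result["status"] == "approved"
    payments.charge.assert_called_once_with("A-1", 42.0)
    email.send.assert_called_once_with("pat@example.com", "Order A-1 payment approved")
    sms.send.assert_called_once_with("+15550100", "Order A-1 payment approved")


if __name__ == "__main__":
    test_process_order()
    print("ok")
```

Passing the clients in as arguments is what keeps the proof of concept testable without any of the three real systems, which fits the throwaway, low-cost nature of the exercise.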

This final one creates team modularity with AI, where now some of these roles on the team actually can leverage AI and individuals can work much faster. I get a 4x reduction in dependencies, a risk reduction by 8x, a cost reduction by 10x. Basically, I can do this in 48 hours. And then a lead-time reduction by almost 6x. It's about two weeks to actually get to a working POC here. The cost is very low, so I can actually throw this away without falling into the sunk-cost fallacy.

A couple silver-bullet comments here. Beware of the snake oil. Folks are throwing around: we don't need devs, just AI. You've heard companies going around saying: let's get rid of devs. We don't need them anymore. That's just not true. We're going to need them. They're going to become more and more talented and more powerful with AI, and giving just the devs AI is not your silver bullet.

We need to understand the power. AI makes experts dramatically stronger. Technical maestros and architects really get boosted now. Folks who have deep experience building complicated systems now get tools that make their work more powerful. People great at AI will win the job wars. AI loves code. It types way faster than you. Look hard at CHOP, chat-oriented programming. Steve Yegge talked about this. There's a book coming from him and Gene. Start looking hard at the agents. They're going to help you a lot.

Final cautions and thoughts: don't remove your critical thinking. It's really easy to fall into the trap where you try to get AI to do everything for you and you don't exercise your critical thinking. AI moves the cognitive load in weird ways. When you produce a lot more output, the cognitive load shifts. You have to think at a much higher and more complex level. These great advances lie in augmentation versus replacement of folks.

The final summary here: slow down. Change your own and your org's behavior to embrace AI. Remember the physics: coordination costs degrade your odds exponentially. The golden rule: when you remove a dependency, you double your odds. AI process engineering before AI product engineering: be great at re-engineering your processes. Rewire layer two and layer three of your org to embrace AI. AI enables independence of action. Don't shortcut CI and DevOps for AI. And finally: battle the coordination costs with AI enablement, meaning knowledge, capability, capacity, linearization, parallelization, incrementalism, and optionality.

Thank you very much, Gene.

Host Outro (Gene Kim)

Thank you so much, Scott. Another masterpiece. I love how you presented the physics of coordination costs two years ago and how you're using it to really create this wonderful mental model of what GenAI is for. We are long overdue to catch up. Scott, let's talk in the next couple of days.