DevOps 2020 - The Next Decade

London 2020

Senior Director, Global Transformation Office · Red Hat

John has over 35 years of experience, focusing on IT infrastructure and operations. He has helped early startups such as Chef, Enstratius (now Dell), and Docker navigate the "DevOps" movement.

He is one of the original core organizers of DevOpsDays and has been a prominent keynote speaker at various DevOps events throughout the years. He is also a co-author of The DevOps Handbook along with Gene Kim, Jez Humble, and “the Godfather” of DevOps, Patrick Debois.

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

As many of you know, I've had the pleasure of working with John Willis on so many projects since we met in 2010. He's one of the programming committee members for this conference. He was my co-author on both The DevOps Handbook and the Beyond the Phoenix Project audiobook. We worked on an astounding panel here at this conference with Dr. Richard Cook and Dr. Sidney Dekker from the safety culture community, as well as Dr. Steven Spear from the Lean community, who will be presenting tomorrow. Hang on to your seatbelt: he's going to take you on a wild ride, describing some of his learnings from the past several years and his belief in the importance of platforms, not just for every developer, but for every company. Please welcome John.

John Willis

Hi, everybody. This is John Willis. The presentation is called "DevOps 2020 Rethink." This was a collaboration with a few of my coworkers: Jay Bloom, Andrew Clay Shafer, and Kevin Behr. I want to make sure they get credit for some of the slides they collaborated with me on. I called it DevOps 2020 to set the stage: for the last 10 years, we've done a really good job in DevOps. The question is, what are we going to do now? It's the start of the next decade. There are some areas that I think we need to delve into a little deeper. The three I'm going to focus on are what I call organizational conversations, organizational design, and something that we're calling, in my team, the three economies.

This is my team. I started at Red Hat last October in 2019, and that's Andrew Clay Shafer. He's my boss. That's Kevin Behr next to him, and that's me, the small guy. Jay Bloom, who I've been working with, is getting a PhD in transition design. A lot of these ideas that I'm going to be pointing out really come from him about how design and transition design apply. As Andrew likes to say, we wrote some books. I was the co-author of The DevOps Handbook, co-author of Beyond the Phoenix Project, Kevin was co-author of The Phoenix Project, and Andrew co-wrote Web Operations and some of the site reliability engineering material.

Here's the deal. This is the Pete Cheslock joke. We came to find out this slide was actually originally created by Patrick Debois. All things lead back to Patrick. If you think about the last 10 years like this, we've been this unicorn poop, if you will, on the enterprise, and I mean that in the best possible way: DevOps Enterprise Summit, all the accomplishments we've made. But it's been a struggle. Starting off, the first enterprise conversations were, "I don't think the enterprise can do DevOps." Then it was, can security apply the security things around DevOps? Here we sit at 2020, and I think we've got that pretty much solved in terms of everybody has the memo.

But if we look at digital transformation, by all accounts this conversation has been creeping up for the last couple of years, even though the idea has been around forever. You see a lot of stories about failures, different reports, different studies. One in particular says 70% of all digital transformations fail. So jokingly, maybe the next 10 years is digital transformation unicorn poop. I'm only half joking, because the truth of the matter is we do have to have a better, stronger, bigger conversation about what we're doing. Most of the people at this conference have been doing that already. It's everyone else we need to educate.

Andrew Clay Shafer has come up with this idea of five elements. If we think about the five failures: leadership is still preventing change in a majority of organizations, either in the form of governance and risk or just general business. Product is building things that don't matter. We still haven't gotten Eric Ries's memo from The Lean Startup. Development, in a lot of cases, is still building the wrong things. Architecture is often not even involved in the design decisions, so teams are building the things wrong. In operations, we still have a split mindset about incidents and outages: half in on service management and half in on DevOps or newer thinking about incident management.

What I wanted to propose are some areas I've been thinking about as this rethink. We're sitting here at 2020. We could say, as I said earlier, that we've done a really good job, and we have. We should all collectively, and I'm being serious, pat ourselves on the back. We've done a tremendous job in the industry improving commerce and improving people's lives. But the question now is: at 2020, what are we going to do for the next decade? If we're still talking about GitOps and CI/CD five years from now, then we probably have failed miserably and the digital transformation discussion will have overtaken us. We really need to start thinking about how we improve. Five years ago, a conversation about continuous delivery was a novel idea. Today it's table stakes. We want the new things to be table stakes.

The first thing I want to talk about is something I've been doing for the last three or four years. I've called it a lot of things. I'm just going to call it organizational conversations. This is where I've gone into large organizations and literally spoken to hundreds and hundreds of people. I usually come in at the CIO level, and usually a champion inside the organization says, "You should talk to this guy, John Willis. He sort of knows what he's doing." Then I get to talk to the CIO, and I convince the CIO to let me just have conversations. I want conversations where I talk with the people at the edge, the people who put their fingers on the keyboard. I'm more interested in that form of discussion than I am in talking to leadership and going top-down. I want to go bottom-up.

I've done this over the last few years at very large banks and insurance companies. One thing I came up with is this quote I've made: "You can't Lean, Agile, SAFe, or even DevOps your way out of a bad organizational culture." Lean, Agile, and SAFe are great frameworks or great pattern and practice tools for us. But if we don't get to the bottom of how things really work, or the truth, or have the real conversations with the people doing the work, these things can actually give us false truths.

Over the last few years I've put together what I call the seven deadly sins of DevOps. I won't go into this in detail, but there are patterns you find when you have these conversations, and one of the most interesting is that they all seem to funnel down into what I call security and compliance theater. In other words, your audits are basically nonsense. I've got full presentations on this and on some of the work I've been doing in automated governance and automated cloud governance. You can look me up and find that.

I love this story about Abraham Wald. Actually, it was a story that Sidney Dekker told at one of the DevOps Enterprise Summits. During World War II, scientists, mathematicians, and specifically statisticians were looking at how to do proper repairs of fighter planes that would come back. They'd figure out where the bullet holes were, the weight ratios, and all that. At one point, Abraham Wald had this aha moment where he said, "We've got it wrong. What we're doing is repairing and fixing where the bullet holes are. Those are the planes that are coming back. What we need to do is look where the bullet holes aren't, because those are the planes that aren't coming back." It was the original definition of survivorship bias. Sidney Dekker says we don't need to look at the absence of negatives. We need to look for the presence of capacity, the things that go right. I use this in the whole organizational conversation dialogue.

Eliyahu Goldratt wrote The Goal, and as most of you know, The Phoenix Project was a modern-day rewrite of The Goal. Twenty years after he wrote The Goal, he also did an audio-only project called Beyond the Goal. In one part, he talks about complex systems and complex adaptive systems. He says that if you look at these two systems, system B and system A, and ask people which one is more complicated, most people would say system B. But if you ask physicists or somebody who really understands complex systems, they're more likely to say system A because it allows more degrees of freedom.

When I go in to have conversations with customers, working for the CIO but literally talking to the edge, people who are doing the work, they tend to want to give me system B answers. They want to say, "Well, that works," or, "My CMDB is fine, John. Don't worry about it." What I really need to do is get beyond that to the truth. I like to use the "this is fine" dog cartoon. People are constantly telling me, "Don't worry about it. This is fine." Once you earn their trust or create an open, collaborative dialogue, a psychologically safe environment, what you actually wind up getting to is the real conversations where you're finding the places that are really on fire. When you get psychological safety and trust, people will tell you the most fantastic workarounds and the real fire stories, which are the system A discussions I'm looking for.

If you look at the Equifax breach in 2017, it's a classic example of a system A, system B conversation. For those who know this, it was a library called Struts 2. In there was a Jakarta parsing module. If you ran a curl command against a system that had that library, chances are you could compromise that system with that one little command. When it was all said and done, the CEO said, "We know what was wrong. The breach was basically a single person who failed to deploy the patch." That's a system B answer. But when you dig in, starting with the 2018 congressional oversight report on the breach, you find tons of complex problems and systems. The chief security officer reported to the chief legal officer. When the chief security officer was asked in congressional review, "Why didn't you notify the CEO of the breach?" the answer was, "I didn't think about it." They didn't think about it because they were reporting to the chief legal officer. The IDS, the intrusion detection systems on the perimeter, had certificates that had been expired for 18 months. There were all these things. What I look for are those complicated, honest answers and discussions.

The second area that I've been thinking about for 2020's focus is organizational design. A lot of this I get from Jay Bloom in terms of transition design and thinking about design when we talk about transformation. If we look at a simple evolution, everybody knows this: we go from agricultural economy to industrial economy to knowledge economy. We're in a knowledge economy. But right now, if we talk about Lean and the Toyota Production System, we're still in this struggle, this conflict over how to map the things we know work really well in an industrial economy onto a knowledge economy. The knowledge economy is still sort of an art. Things like Lean have tried to apply science, but we still have debates on what really maps properly. We need to get over that. We need to start applying true science, such as operations research and everything else we could learn from the industrial economy, in a knowledge economy. I would say we're still not doing a great job there.

I talked earlier about how Andrew had come up with the five elements. Andrew spent five or six years at Pivotal on large transformations. When we all came to Red Hat, we had this powwow: glass half full, glass half empty; what are the things we've done right, and what are the things we haven't done a great job on? In a pre-DevOps conversation, it was all about development. It was the Agile Manifesto. The DevOps conversation opened up this balance theory between operations and development: differentiation versus scale. You could say it's DevOps, and it was an engineering-focused discussion. We've done a really good job there. What we haven't done a good job on is architecture, enterprise architecture. In large companies I talk to, the DevOps people are screaming, "Please, if you could help us get the enterprise architects on the same page." In a lot of cases, enterprise architects are still working off the 1990s paradigm of architecture. Product, in most cases, is a mess as well.

It's like Chinese medicine, based on balance, with leadership in the middle. We use this canvas to start a discussion: if you've got too much weight in development or development and ops, but not in architecture, what's your balance theory among these five elements?

If you go back to Toyota, one of the more successful parts, as we talk about Lean as a definition of what Toyota Production System is, was the Toyota supply chain and something they called the four V's of learning: variety, variability, velocity, and visibility. When we talk about that middle area between an industrial economy and a knowledge economy, could we take Andrew's five elements, or what we're calling our global transformation office five elements, and map that with the four V's to get a better sense of how we can do knowledge economy based on these pure principles?

I created a grid looking at motivations and conflicts. Pretty obvious, but if you look at a developer, a developer wants increased variety: more choices. For variability, they don't really want tolerance and lockdown. They want to expand. Of course they want increased velocity. But they want decreased visibility. They don't, in general, want GRC, governance, risk, and compliance. They don't want the CAB. They don't want NFRs. Product is pretty much aligned. Leadership wants everything: increase everything. But if you look at operations and architecture, which are reasonably aligned on our five-element grid, they want to decrease variety. They want to decrease optionality. They want reuse and scale. They want to decrease variation and tighten your tolerance. On velocity, they generally want to decrease the speed. I know everybody has gotten the DevOps mantra memo, but in large organizations, a big part of the organization is still trying to decrease speed. But they want to increase visibility. They want more NFRs, more operationalization, some sort of audit and control, and architecture is the same way.
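The grid described above can be written down as a small lookup table. This is only a sketch: the +1/-1 encoding, the role keys, and the `conflicts` helper are assumptions made for illustration, and "product is pretty much aligned" is read here as aligned with development.

```python
# Desired direction for each of the four V's, per role, as described in the talk.
# +1 means the role wants more of it, -1 means it wants less.
# The numeric encoding is an illustrative assumption, not part of the talk.
FOUR_VS = ("variety", "variability", "velocity", "visibility")

MOTIVATIONS = {
    "development":  {"variety": +1, "variability": +1, "velocity": +1, "visibility": -1},
    "product":      {"variety": +1, "variability": +1, "velocity": +1, "visibility": -1},
    "leadership":   {"variety": +1, "variability": +1, "velocity": +1, "visibility": +1},
    "operations":   {"variety": -1, "variability": -1, "velocity": -1, "visibility": +1},
    "architecture": {"variety": -1, "variability": -1, "velocity": -1, "visibility": +1},
}


def conflicts(role_a, role_b):
    """Return the V's on which two roles pull in opposite directions."""
    a, b = MOTIVATIONS[role_a], MOTIVATIONS[role_b]
    return [v for v in FOUR_VS if a[v] != b[v]]


if __name__ == "__main__":
    for pair in [("development", "operations"), ("operations", "architecture")]:
        print(pair, "->", conflicts(*pair))
```

Read this way, development and operations disagree on all four V's, while operations and architecture disagree on none, which matches the alignment described in the talk.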

I'm going to talk about variety and variance or variability, and save the other two V's for another conversation. In variety, we're talking about optionality. We're talking about balancing market demands and operational efficiency. The Toyota Supply Chain Management book is excellent if you want to understand the details of how they competed, how the Volt competed against the Prius. On variety, we can look at systems thinkers. Alicia Juarrero says constraints enable freedoms. By curtailing potential variation in component behavior, context-dependent constraints paradoxically also create new freedom. Certain types of governance systems enable freedoms. We need to learn more about systems thinking. We need to learn from the four V's. We have to understand what Toyota did incredibly well with variety.

Another great one is "The Tragedy of the Commons" by Garrett Hardin. Acting in self-interest, individual users behave contrary to the common good of all users, depleting and spoiling the shared resources. To summarize it: consumers must be managed to preserve the system. Too many cows consume all the grass and the field collapses. Then we have Ashby's law. Ultimately, I'm going to tell you that I think all this has to be balanced in the five elements, and my conclusion is going to be that you really need a platform. Ashby's law says that for a system to be stable, the number of states of its control mechanism must be greater than or equal to the number of states of the system being controlled. If you think about a platform, a platform does that. It does that balancing act between the controller and the controlled. In a stable system, the controls must be greater than or equal to the controlled systems.
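Ashby's law can be illustrated with a toy check. This is a simplified sketch, not a formal statement of the law: it treats the controller's variety as the set of system states it has a response for, and the state and response names are hypothetical.

```python
def uncovered_states(system_states, responses):
    """Return the system states for which the controller has no response."""
    return sorted(set(system_states) - set(responses))


def has_requisite_variety(system_states, responses):
    """Toy Ashby-style check: the controller must distinguish at least as many
    states as the system can be in, which here means every state is handled."""
    return not uncovered_states(system_states, responses)


if __name__ == "__main__":
    # Hypothetical states a controlled system can be in.
    system = ["node_failure", "traffic_spike", "expired_certificate"]
    # Hypothetical controller: a mapping from state to response.
    controller = {"node_failure": "reschedule", "traffic_spike": "scale_out"}

    print(has_requisite_variety(system, controller))  # controller is missing a state
    print(uncovered_states(system, controller))
```

In this toy model, a controller with fewer responses than the system has states always leaves at least one state unhandled, which is the instability the law warns about.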

Last but not least, Don Reinertsen. The problem with any prioritization decision is that it is a decision to serve one job and delay another one. In general, without all the gory details or reading his intense book, focus on high-value, high-probability items in your backlog. The common theme here is economic balance and how to make those trade-offs and decisions. We've got tons of literature and science from incredibly smart people to help us. To recap: constraints enable freedoms (Juarrero); consumers must be managed to preserve the system (Hardin); stability (Ashby's law); and cost (Don Reinertsen).

Quickly, variability, which is variation. I love this quote by an unknown author: "Misunderstanding variation is the root cause of all knee-jerk reactions, overcontrol, micromanagement, and tampering." If you go to Deming's writings, he basically says, "The importance of operational definitions in collecting data. Without them, the data is suspect. Change the definition and the data changes, and when you don't have a written definition, the different opinions of those collecting data results in muddled data." Here's the bottom line: we quote the hell out of Deming, but we very rarely actually listen to him. Every presentation now has this Deming quote, but are we really doing operations research? Are we really applying the science: statistical process control, the system of profound knowledge, Deming and Shewhart's thoughts about plan-do-check-act or plan-do-study-act?
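To make "applying statistical process control" concrete, here is a minimal sketch of a Shewhart individuals (XmR) chart. The talk does not show one; the lead-time numbers are made up for illustration. The constant 2.66 is the standard XmR scaling factor (3 divided by d2 = 1.128).

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (lower, center, upper) natural process limits for an
    individuals (XmR) chart built from a baseline series."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def signals(samples, baseline):
    """Return the samples that fall outside the baseline's process limits."""
    lower, _, upper = xmr_limits(baseline)
    return [x for x in samples if x < lower or x > upper]


if __name__ == "__main__":
    # Hypothetical deployment lead times, in hours.
    baseline = [10, 12, 11, 13, 12, 11, 10, 12]
    print(xmr_limits(baseline))
    print(signals([12, 14, 19, 9], baseline))
```

Points inside the limits are routine variation and reacting to them is the tampering the quote warns about; only points outside the limits are signals worth investigating.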

Another place to look at variance, and at how to create opportunity from it, is Taguchi and the Taguchi loss function: "Cost is more important than quality, but quality is the best way to reduce cost." Find the edges of your variability. It's not how tight your tolerance levels are; it's how far you can stretch them. Where can you get the value? The hidden values are at the corners. Then there's the Red Queen effect, from the Red Queen in Through the Looking-Glass, the sequel to Alice in Wonderland. In general, when we talk about sitting here in 2020, at DevOps 2020, if we're running in place, we're losing. In summary: statistical process control, tolerance and Taguchi, and the Red Queen effect.
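For reference, the Taguchi loss function in its usual quadratic form is L(y) = k(y - m)^2, where m is the target value and k is commonly set to A / d^2, with A the cost incurred at the tolerance limit and d the distance from the target to that limit. The sketch below assumes that form; the latency and cost figures are hypothetical.

```python
def taguchi_loss(value, target, cost_at_limit, tolerance):
    """Quadratic Taguchi loss: k * (value - target)**2,
    with k = cost_at_limit / tolerance**2."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    k = cost_at_limit / tolerance ** 2
    return k * (value - target) ** 2


if __name__ == "__main__":
    # Hypothetical example: target latency 200 ms, tolerance +/- 100 ms,
    # and a cost of 50 (arbitrary units) when a request sits at the limit.
    for latency_ms in (200, 250, 300):
        print(latency_ms, taguchi_loss(latency_ms, 200, 50, 100))
```

The point of the quadratic shape is that loss starts accruing as soon as you drift from the target, not only once you cross the tolerance limit.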

One thing I want to say here is that if you think about what's in common with all these things I just talked about, they're math, engineering, and statistics. In the next stage, we need to do a better job of stopping this knee-jerk reaction: we get a failure, let's hire; we get a breach, let's hire 100 new security professionals. That's a true story from a bank. We do finger-in-the-wind. We have this knowledge, and it's been used by industrialization and power plants. It's 100 years of engineering that's sitting in our face that we can actually apply and think about and be better at. There's a great quote from Deming about Walter Shewhart, saying in 1980 that it would be 50 years before we figure out the real value of what Shewhart was saying. Shewhart created the genesis of most of Deming's work: statistical process control, plan-do-study-act. We're still 20 years away from Deming's prediction. The bottom line is that there's a lot of really good information in industrial engineering and operations research. We need to stop just quoting that stuff, and actually start looking at the real science and make breakthroughs. I'm not trying to trivialize the people who have done tremendous work. I'm saying that in general organizations say, "I'll never get my management to understand Taguchi." Well, Toyota got their management to do it, and they decimated a market for 50 years.

I want to end with a couple of things about platforms. This is an idea we've been talking about internally, and Jay Bloom has been writing about it for a couple of years. He calls it the three economies. Most of our discussions around how we think about infrastructure, scale, or even DevOps have been bound around this idea of two economies: differentiation and scale. We can call that dev or ops, or infrastructure and development. It's been a bimodal discussion. Differentiation economy: velocity, novel, niche, experimentation, incubation, the things you expect from differentiation and development. Scale is the ops or infrastructure side: regulate, reduce, create resilience, reuse, consolidation. We understand those two economies pretty well.

I've been fortunate enough to have great conversations with Mark Burgess, who wrote the foreword to the SRE book. We were talking about SRE and how Google has built its infrastructure over the last 10 or 15 years. If you understand Google's history, they started with Borg, turned that into Omega, and ultimately we see that as the open source project Kubernetes. Mark said one of the brilliant things Google did was make a non-deterministic infrastructure look deterministic to their developers. The developers didn't know anything about the particulars of the virtualization or storage platform. They just had APIs or interfaces. In most cases, they weren't even given the ability to know those things. They created applications and services through interfaces bounded by what Jay would call a scope economy. Google didn't call this a scope economy, but it was this clutch between scale and differentiation. It's not just a platform; it's an interface. It's an abstraction that allows developers to get the best value and allows infrastructure to get the best value. You can think of differentiation and scale crushing in toward the middle, where the scope economy creates the adoption, control, and all those things you want.
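The idea of a platform as an interface can be sketched in a few lines. This is an illustration only, not how Google or any real platform is built: `Platform`, `InMemoryPlatform`, and `release` are hypothetical names, and the placement logic is a stand-in for whatever the infrastructure actually does.

```python
from typing import Protocol


class Platform(Protocol):
    """Hypothetical interface: the only thing application developers see."""

    def deploy(self, service: str, replicas: int) -> str: ...


class InMemoryPlatform:
    """Stand-in for the infrastructure side. Scheduling details live here,
    hidden behind the interface."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._placements = {}

    def deploy(self, service, replicas):
        # Placement is an infrastructure decision the caller never sees.
        self._placements[service] = [
            self._nodes[i % len(self._nodes)] for i in range(replicas)
        ]
        return f"{service}: {replicas} replicas running"


def release(platform: Platform, service: str) -> str:
    """Developer-side code depends only on the Platform interface."""
    return platform.deploy(service, replicas=3)


if __name__ == "__main__":
    print(release(InMemoryPlatform(["node-a", "node-b"]), "checkout"))
```

The developer-side `release` function never learns which nodes were chosen, so the infrastructure behind the interface can change without touching application code.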

The scope economy gives us the ability to enable the V's: velocity, variability, and variety. It allows recombining all those things from the tragedy of the commons. Scale controls velocity and variability, so scope gives the best of both worlds. It becomes this clutch. If you Google three economies, you'll find more presentations and more details. Jay has written a couple, and there are some really good Wardley mapping examples with this as well.

In the end, one of the most important things we talked about in 2020, and this is self-serving because I work for a company that sells a platform, OpenShift, but I do truly believe this is true: Stephen O'Grady said developers are the kingmakers. When I sit here in 2020, I say if you're not thinking about the next-decade platform and how that platform is going to look, how you're going to utilize that platform, and what strengths of your organization use that platform, you didn't get the memo, and you're probably going to lose. I fundamentally believe, whether I work for Red Hat or not, that the new way forward is platforms. Platforms are the new kingmakers. Get the memo.

Platform by design: if you think about what I would call cloud titans or the early experimenters in scale of infrastructure, your Googles, Twitters, Netflixes, even Facebook, they all did this by platform. The question is what a platform means and how you use it. Don't get lost in the marketing hype, which anybody can give, including my company. Think about what Mark Burgess said about the brilliance of Google. They created this abstraction that allowed developers to be completely divorced from the infrastructure. All they were given was a set of APIs, interfaces, and a well-documented interface to do anything they needed to do. That's where enterprises have to get to. It's a long haul because Google has one application base or one infrastructure and few applications, while banks have 20 or 30 lines of business, thousands, maybe 10,000 services. It's a much harder ask, but in the end you have to get there.

To get there, stop thinking about platform as a differentiation economy, a platform as a service or self-service, or even worse, a container system that manages clusters. Start thinking about what I'm calling a scope economy and how a platform really becomes a platform as an interface. It starts enabling the things you need from the differentiation economy and the scale economy. Those things get collapsed in your scope economy. The platform becomes this enabler, and you tend to start looking more like Google. Whenever I hear a conversation in a large organization, I say, "Hey, calm down. You're not Google. You're a bank and you're healthcare, and you can calm down because you're reading all this stuff about the way infrastructure's supposed to work." Then I say later in a presentation, "By the way, you can be like Google." But you have to understand, Google didn't really create a PaaS. They created a platform as an interface. As we see things like service mesh, Istio, Envoy, all those things, the platform becomes this experience.

If I look over the last four or five years, some of the smartest people I know used to work for software companies. They're all taking VP of engineering jobs in large global 1000 companies, shoe companies, and banks. Why? Because these are the people the enterprises know have to get them through the next 10 years. It has to be based on a new generation, a new way of thinking. I would say it's a scope economy based on a platform as an interface.

My ask for everybody is that I would love to push this conversation about organizational conversations, about some of the things we've learned from Toyota, some of the things we should be doing better from operations management. Are we applying the right science? Maybe I'm right, maybe I'm wrong, but I don't think we're applying the right science. I think we're a lot of platitudes and a lot of quotes. All this is bound into a discussion about platforms. My ask is: anybody who wants to have a next-generation conversation about these three areas, please hit me up. Help me help you drive that conversation. Thank you very much. My name is John Willis, @botchagalupe on Twitter, jwillis@redhat.com.