DevOps: Better, Faster, AND Cheaper. How?

London 2016

John Willis

Director of Ecosystem Development · Docker

Damon Edwards

Managing Partner · DTO Solutions, Inc.

DevOps makes it possible for high-performing companies to work better, faster AND cheaper.

Remember the old adage “high quality, quick delivery, or low cost… pick two”? The DevOps movement has repeatedly shown that this simply isn’t true. It turns out that you can have it better, faster, AND cheaper. What are the common patterns and techniques of high-performing companies who have achieved this breakthrough? Why does it work? What is next? In this presentation from DevOps Enterprise Summit in London 2016, Damon Edwards and John Willis, hosts of the podcast DevOps Cafe, explain what they've discovered working with successful organizations.

How are businesses operating Better, Faster AND Cheaper?

- Org structure that stays out of the way

- Build everything through an SDLC

- Make the work visible

- Immutable Infrastructure Delivery

- Microservices

- Respect for People

Full transcript

The complete talk, organized by section.

Damon Edwards

So I'm Damon Edwards, and that's John Willis.

John Willis

I am John Willis.

Damon Edwards

So we, just by nature of our work, DTO Solutions, John at Docker, we get to see a lot of companies: high performers, low performers, everybody in between, big enterprises, high-flying startups. So Gene asked us to talk about common threads, things we see the high performers doing.

John Willis

And so Damon and I, I don't know, seven years ago, used to have all these conversations about this stuff, and we started recording it. So for about seven years now, we've been doing this DevOps Cafe podcast series.

Damon Edwards

Yeah. And you can get it on iTunes. We talk to people we find interesting, people who like to kind of do long-form style interviews about all things DevOps transformations, try to dig into what makes people tick.

So talking about high performance, I'm sure you've all seen these State of DevOps Reports. There's new numbers now in 2016. But something really interesting is, well, first of all, it's just this amazing tale, right? You've got 30 times more frequent deploys, 200 times faster lead times, faster mean time to recovery. Even got some business statistics here.

So it's fascinating because people are doing it faster, they've got higher quality, and they're more effective, and the data is pretty clear about that. But of course, this flies in the face of this conventional wisdom, right? I mean, we always know this kind of iron triangle, right? You can have it fast, you can have it good, you can have it cheap, but you definitely can't have all three. You've got to pick two, right?

And the crazy thing is this data is showing that that's not true, right? That this triangle has been busted. So the conversation's been for a while, faster, better, and cheaper. Is that even possible? And I think the data has shown that is possible. People are achieving that.

So I think the question that we want to focus on now is how? How are these companies achieving this high performance? How are they doing it faster, better, and cheaper?

That's you.

John Willis

Oh, yeah. Thank you. I'm tired. I just flew in this morning.

So, at the first DevOpsDays in the US, it was an amazing event, and me and Damon did a podcast after that. Normally we interview people, but this one, we just wanted to talk about it. And we kind of accidentally created this acronym just by trying to summarize what that event was all about.

So, kind of a checklist: culture, automation, measurement, and sharing. It's been used as a loose taxonomy for DevOps for about five or six years now.

Damon Edwards

Yeah. So when Gene asked us to talk about how are these companies breaking this triangle, how are they doing it better, faster, and cheaper, we said, well, let's come up with another checklist, right? To say, hey, are we thinking about all the different... What are the different aspects that we see people focusing on?

So I'm going to talk about a few, John's going to talk about a few, and then we're going to get off the stage.

The first one: an org structure that stays out of the way. A focus on redefining, kind of wiping the slate clean and saying, how are we designing the conditions of our organization to go better, faster, and cheaper?

And I think the first point of this is thinking about silos. Siloed organizations, functional silos, are really the enemy of throughput and stability. And we have this idea of, well, it makes business sense, right? Well, we're going to group like with like. As we grow, we're going to put the planning stuff together, the dev stuff together, the release people together, the ops together, QA. We find it cheaper offshore, so we're going to put that over there. We're going to build these functional silos because that makes good sense, right?

Well, the reality, as we know by now, is that these silos, people are working out of context of each other. They start enforcing more inward, and these walls build up between them, right? And then the handoffs become more and more difficult.

The application knowledge starts very heavy in the development. By the time it gets to operations, you've just got a bunch of tickets. You're not sure what's going on. Operational knowledge, super deep in operations, but now you've got folks way, five steps ahead in development. They really don't know what production's even going to be looking like or what the conditions are that they have to deal with.

And the business intent, which is huge in planning, by the time, again, it gets down to operations, bunch of tickets, you've lost all context. And these forces are pulling these organizations apart, right?

On one side, you've got all this ownership. They can decide what has to happen, but they have no accountability to run it. And then, of course, on the other side, you've got all the accountability to run it, but you can't change anything. You're like the abused partner in the relationship. You just got to take whatever they give you.

So these natural forces just pull these organizations apart and actually make it far more expensive, far more inefficient. Throughput goes down, stability goes down, and the costs actually go up, even though they're supposedly optimizing for cost and quality.

So you see what organizations are doing to change this. They're thinking about, well, instead of having these functional silos, how can we build these cross-functional, service-aligned teams? Meaning, how can I put everybody together who needs to deliver and run, right? So it's the full life cycle for these particular services and align it towards some type of customer-identifiable service. So it's kind of one value stream, one team, with the notion of we're going to get rid of the handoffs and we're going to make things as smooth as possible.

Of course, well, hey, we can't do that. We're a huge bank. We can't give everybody their own data center. We can't give everybody their own security team. We can't give everybody their own monitoring infrastructure. So we see them building these. Instead of, say, these functional silos, they're really internal service providers that are providing operational capabilities as a service through these pull-based interfaces. So kind of a redesign and rethinking of the org structure is a strong trend we see with the high performers.

Next one up here: they're building everything through an SDLC, right? So everybody knows CI loops. It's the common way to do things now on the application side. I check my application code into a repo, a CI server spits out some packages, got a package repo, do some distribution, do some automation to deploy my code. I can do it through an automatic trigger, or I can do it on demand. It's kind of the way things are, right?

But think about the high performers. They want that same discipline. So here, infrastructure as code, that's not the key part. The key part is infrastructure through an SDLC, right?

They're using the same discipline to say we want everything, from our app code to our environment specifications to our runbooks, to go through this same SDLC process, and that becomes the point of collaboration. The specifications in the source repository are how development and operations actually communicate, because their work is going through this same... Everything's got a build process, everything's got a packaging point, everything's got a distribution.

Might be different tools, but they're all kind of running through this same mental philosophy and the same people. You only have kind of two ways to interact with the system. One is through the source repository, two is through these procedures that go that way.
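As a rough illustration of the idea above (not something shown in the talk), here is a minimal Python sketch in which app code, an environment specification, and a runbook all take the same build-and-package path out of one source repository. The names `Package`, `build`, `run_pipeline`, and the repository contents are hypothetical, and the content hash standing in for a version is an assumption made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    kind: str      # "app", "environment", "runbook", ...
    version: str   # derived from content, so identical source gives an identical version


def build(name: str, kind: str, source: str) -> Package:
    """Build step: turn checked-in source into a versioned package."""
    version = hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
    return Package(name=name, kind=kind, version=version)


def run_pipeline(source_repo: Dict[Tuple[str, str], str]) -> List[Package]:
    """Every artifact, whatever its kind, goes through the same steps."""
    package_repo = []
    for (name, kind), source in sorted(source_repo.items()):
        package_repo.append(build(name, kind, source))
    return package_repo


# Hypothetical repository contents: the only way in is a commit.
source_repo = {
    ("checkout", "app"): "def handler(): ...",
    ("checkout-env", "environment"): "cpu: 2\nmemory: 4g",
    ("restart-checkout", "runbook"): "drain; restart; verify",
}

packages = run_pipeline(source_repo)
for package in packages:
    print(package.kind, package.name, package.version)
```

The point of the sketch is only that the kind of artifact does not change the path it takes; real pipelines would use different tools per step.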

John Willis

And it should be clear, I'm sure people talked about this yesterday, but I put two points. I saw the last presentation: automate everything, right? But then the other thing is everything starts in source control.

Damon Edwards

Right.

John Willis

And that's the big change, right? Whether it's a script, it's a config file, source, everything starts from that.

Damon Edwards

Yeah. And I'd say almost more important than the choice of automation is the idea that we're putting SDLC discipline. So instead of the software delivery life cycle or software development life cycle, it's a service delivery life cycle, right? Everything is going through an SDLC discipline. Doesn't matter if it's shell scripts or the latest, greatest automation tool. It's getting that discipline as an organization.

And the third one here is the notion of making the work visible, right? So you see one of the key, most stark differences between high performers and low performers is the work that they do, the flow of information artifacts through the organization, is visible, right? And it's often the idea that there's a shared visibility across the organization. So folks in development, on the operations side, they're aware of what the factory floor looks like.

And in low-performing organizations, again, that kind of functional-siloed point of view, all they really see is what's going on in their corner of the world. They might have seen a PowerPoint, they might have an idea what happened somewhere else, but no one can really tell you what actually happens. How do things go from a requirement all the way through to when they're running in production? What is the actual step-by-step life cycle there?

So you see that the high performers do this job of making that work visible. This is actually a value stream map. This is actually in the cookbook, right? Or the handbook, sorry. And the notion is, do you have this point of view? Do you have that supply chain, factory-floor point of view in your organization?

And what's the point of it? Number one, it builds alignment and consensus across team boundaries, right? So if we're having a problem, say I'm on the ops side, John's on the dev side. From the ops side, I might say, "Well, developers keep giving me broken stuff." And obviously from the development side, John's like, "Well, you keep breaking stuff," right? But why does that happen?

Is it because I'm not providing the development side with the production-like environments to test against? Or maybe from the development side, they're giving me this 900-parameter configuration file when only two ever change per release. All these different issues that come along that only really can be solved if that end-to-end life cycle is visible, right? And that helps organizations empower the people who do the work to find the problems and fix the work.
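To make the end-to-end view concrete, the following Python sketch adds up a value stream's lead time and finds where work waits longest. It is an illustration under stated assumptions: the step names and all of the hour figures are invented for the example, and `Step`, `lead_time`, `flow_efficiency`, and `biggest_queue` are hypothetical helpers, not something taken from the talk or the handbook.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_hours: float  # time someone is actively working on the item
    wait_hours: float     # time the item sits in a queue before this step


def lead_time(steps):
    """Total elapsed hours from requirement to running in production."""
    return sum(s.process_hours + s.wait_hours for s in steps)


def flow_efficiency(steps):
    """Share of the lead time spent on actual work rather than waiting."""
    total = lead_time(steps)
    return sum(s.process_hours for s in steps) / total if total else 0.0


def biggest_queue(steps):
    """The step where work waits longest before being picked up."""
    return max(steps, key=lambda s: s.wait_hours)


# Invented numbers for a single value stream.
stream = [
    Step("requirement", 4, 40),
    Step("development", 24, 8),
    Step("QA", 8, 72),
    Step("release", 2, 120),
    Step("operations", 2, 16),
]

print(lead_time(stream))
print(round(flow_efficiency(stream), 3))
print(biggest_queue(stream).name)
```

With numbers like these, no single team's view shows the problem; it only appears when the whole flow is laid out side by side.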

John Willis

And the key point here, too, is most times that when you go into an organization, and Damon does this a lot, but you go in and what they think the organization is, is the org chart.

Damon Edwards

Yeah.

John Willis

Right? So they see that, and when you show them the value stream, it's so enlightening because it is cross-functional. There is no kind of boundary. You basically see the work as it flows.

Damon Edwards

Yeah, and it becomes an important part for actually transforming the organizations. We've talked about organizations that they went from a functionally aligned organization to more of the horizontally aligned, service-delivery-aligned organizations.

What made that possible is they first understand the flow of work and then mold the organization around the flow of work, rather than the traditional, "I'm going to start by defining this org structure and then I'm going to try to stuff the work into that org structure." And that's where you get that functional silos. That's where you get the bad handoffs. That's where things break. Sounds great on paper. When it actually translates down to the work level, it creates chaos in the work.

John Willis

And I've got to tell you a story. So Damon does this a lot where they go in, they do value stream mapping, they do a couple day workshop, and they'll take a whole board and they'll just sticky the whole board and fill out the flow. And he tells this great story where the C-level person who brought him in came in after the second day and looks at the board, and I'm going to do a little color commentary, but, "Hmm. Hmm. There's a whole lot of stupid up there."

But in his defense, he said he probably has his name on most of it.

Damon Edwards

Yeah. So yeah. It was his choice.

And really the visibility becomes the heart of... We see the high-performing organizations, they have a continuous improvement system. Now some, they just kind of do it reflexively because it's been part of their culture. Other organizations, where it hasn't been part of their culture, they have to be a lot more prescriptive and actually build an official continuous improvement system in their organization.

But the idea just being, hey, do we have a way to constantly get a shared perspective on what it is that we're actually doing? Do we have an actual way to identify what we're going to go work on and go and fix those things? Do we have the right improvement metrics to actually look at the end-to-end system and say, are we improving how we deliver? And do we have the right visibility and oversight so the executives can actually steer how the work is happening, not just try to steer the org chart? And looking at that sort of for each value stream in the organization.

Again, some folks can do it naturally. You talk to the guys at Netflix, it's just like, well, this is how we've always done it. We get together every once in a while, figure out what needs to change, we change it, and we go on our merry way. But everything else in the organization is built around giving that visibility so they have that shared understanding and they can drive it.

Other large, more functionally siloed organizations, it needs to become more of a specific improvement system that they put into place, all towards the idea of do we have those hooks? Do we have the visibility to drive the continuous improvement around how we work, right?

We've got a lot, John said, we've got the org charts, we've got the application architecture, but are we looking at actually the factory? Are we looking at how we deliver, how we work, and do we have the right tools in place to constantly look at that and constantly be improving that?

And in talking about transformation, the organizations that can do that, actually, they fix themselves. There's enough brainpower in all of your organizations to fix the organization. The point is, how do you harness that? How do you get all of that aligned?

And just by saying, "Go fix stuff," or, "We're going to start small in all these different places," if you don't have the improvement system in place to harness all that and keep it aligned towards improving the end-to-end system, you get a lot of little localized optimizations, which people think they're fixing things, but they're fixing it in their silo. It makes the whole end-to-end system actually slower or worse. People get frustrated. They're like, "See, I knew we could never change." They give up. Blahbity, blahbity, blah.

So, John, you're up.

John Willis

I'm up.

So, around 2011, 2012, I think the first article was by Kief Morris at ThoughtWorks, who wrote an immutable servers article. And then Netflix wrote an article about building infrastructure with Legos.

And the idea was, the Netflix one was... Well, the Kief Morris one was like, imagine you could beat your server up with a bat. But the Netflix one was describing how instead of using kind of application JAR files as your delivery mechanism, what if you could actually create Amazon AMIs, machine images, as your immutable artifact?

And so what they did is they actually kind of built their AMIs, stored them in a repository, an artifact repository, and at delivery time, basically delivered immutable servers. Now, caveat is there is no such thing as immutable server. Once you power on a machine, it's not immutable by definition. But as a pattern, it was immutable in that you never change it. You don't change files, you don't change systems, you don't update it. And literally, you just kind of roll and replace.

There's a meta layer that you connect things, but in general, you don't change any of the infrastructure. So that became reasonably popular as a delivery mechanism.

And then, a couple of years ago, containers got really interesting. And we started seeing this pattern of what I coined as immutable delivery. And so instead of starting at the kind of artifact from the software repository as immutable infrastructure, we're seeing developers now create immutable artifacts on their desktop.

Because the idea now is, particularly in a service-oriented architecture or microservices, which we'll talk about in a few minutes, literally a developer can build the whole construct of everything that's needed for that delivery: the operating system, any potential middleware, and the app, all in a binary container.

And if they need to test that container in a service architecture, they could literally pull from the artifact repository the other services that they're going to test on their laptop. When they're done with that, they push it down the line and it goes into a test, acceptance test, automated testing. And then again, that process is then immutable. All the other artifacts by other teams and that developer's artifact.

And then when it goes into production, again, it's immutable. So it's immutable all the way through the process. And so Werner Vogels, the CTO of Amazon, says, "You build it, you own it." One of the patterns of DevOps early on is developers wear pagers.

So imagine that the binary that you tested on your laptop is bit for bit the binary that you have in production. And the thing about binary containers or container images, they don't care where they live, like bare metal, Amazon, GCE.
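The "bit for bit" property can be sketched in a few lines of Python. This is an illustration only: `ArtifactRepository`, `promote`, and the stage names are hypothetical, and a SHA-256 content digest is assumed as the way to identify an image; it is not a description of how any particular container registry works.

```python
import hashlib

# Hypothetical stage names for the example.
STAGES = ["laptop", "test", "acceptance", "production"]


def digest(image_bytes: bytes) -> str:
    """Identify an image purely by its content."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()


class ArtifactRepository:
    """Stores images by digest; nothing is ever modified in place."""

    def __init__(self):
        self._blobs = {}

    def push(self, image_bytes: bytes) -> str:
        image_digest = digest(image_bytes)
        self._blobs[image_digest] = image_bytes
        return image_digest

    def pull(self, image_digest: str) -> bytes:
        image_bytes = self._blobs[image_digest]
        if digest(image_bytes) != image_digest:
            raise ValueError("artifact changed after it was built")
        return image_bytes


def promote(repo: ArtifactRepository, image_digest: str) -> dict:
    """Pull the same digest at every stage and record what each stage saw."""
    return {stage: digest(repo.pull(image_digest)) for stage in STAGES}


repo = ArtifactRepository()
built = repo.push(b"operating system + middleware + app")
seen = promote(repo, built)
print(seen)
```

Because each stage pulls by digest rather than rebuilding, what runs in production is the same bytes the developer tested.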

Damon Edwards

Yeah, so.

John Willis

Yeah.

So about a year and a half ago, I ran into Josh Corman. The thing about immutable delivery was it's reasonably dangerous for a lot of reasons. It's kind of a black box thing. And Josh told me about this book that... I'm a big Deming and Toyota Production System fan, but I missed this one, The Toyota Supply Chain. And it's this concept of the four Vs in learning. It's called 4VL.

And basically, this pattern by Toyota, which was extremely successful for their supply chain, marries really well with immutable delivery.

And so, for example, the variety, the first V, if you have immutable delivery, and so you marry that with containers. So Docker happens to be my favorite one, but the...

Damon Edwards

Yeah, pick your poison.

John Willis

But containers instantiate in like 400 or 500 milliseconds. VMs instantiate in two minutes. To build a cluster or a service structure of multiple containers, a second, two seconds to converge that. On a good day, to converge a virtual image stack, eight, nine minutes, 10 minutes, right?

So, you get this kind of speed. So in the variability, you get to learn fast. Developers don't have these kind of 15-minute context switches. They're actually iterating really fast.
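A quick back-of-the-envelope calculation shows what those numbers mean for iteration speed. The two-second and nine-minute figures are the rough ones quoted above; treating convergence time as the only cost of a rebuild is a simplifying assumption, and `rebuilds_per_hour` is a hypothetical helper.

```python
# Rough figures quoted in the talk.
CONTAINER_CONVERGE_S = 2.0    # converge a multi-container service structure
VM_CONVERGE_S = 9 * 60.0      # converge a virtual image stack on a good day


def rebuilds_per_hour(converge_seconds: float) -> float:
    """Upper bound on rebuild cycles per hour if convergence is the only cost."""
    return 3600.0 / converge_seconds


speedup = VM_CONVERGE_S / CONTAINER_CONVERGE_S

print(rebuilds_per_hour(CONTAINER_CONVERGE_S))
print(round(rebuilds_per_hour(VM_CONVERGE_S), 1))
print(speedup)
```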

Me and Damon had done a podcast at one point, actually a couple of podcasts, where we talked to developer managers. And early on in the Docker experience, they'd say their developers get furious if they can't rebuild and converge their service structure in three seconds, right? So you start getting this learning fast at the developer stage.

At the integration stage, you can build these crazy horizontal scale tests. There was an early story of like a Postgres table state with Docker where they actually test the state of that table 1,000 times. With VMs, that could be like five or six days. So basically, you turn days and weeks into minutes and hours.

And then in production, the ability to iterate a cluster happens in seconds, right? So you have this kind of variability, all the things that we've learned from Lean Startup, Lean Enterprise, build-measure-learn, learning fast.

And then on the velocity, I already kind of described that. This idea that you can literally build service structures in seconds. You can roll fast. So the velocity is kind of inherent with immutable delivery.

So one of the things I do want to point out is, if you take containers and immutable delivery, there's a perfect marriage there, right? Because containers just fit really well with this immutable delivery pattern. And in fact, a lot of people have picked up on this pattern since containers have become popular in the last few years.

Variability is my favorite V, because in the old, old days, basically the way people built infrastructure was checklist. Manual, right? And the variability of your infrastructure at every stage was chaos. You'd be testing software on your laptop, and then it went into kind of integration testing, was a completely different infrastructure. And it went into production, it was completely different, right?

And so then, about 2006 or '7, Puppet became very popular. CFEngine had been around for quite a while, but it really wasn't that popular. And people started adopting this infrastructure as code model. Way better than the checklist. Decreased variability, right? I can infrastructure code my laptop. I can infrastructure code my integration. I can infrastructure as code my production environment. Great.

But guess what? There's a level, and I would say at scale, a significant level of variability there as well. Because, by the way, infrastructure as code, it's a script build. It's not immutable. So what you build here, whether the package repository is available, if a script dies. There's a great paper on this, it's called "Why Order Matters," right?

And so the beauty of immutability is that variability goes away. There is no variance. You have hosts and then you have containers on a host. I'm not going to say there's zero variance, but in general, you have a binary and that binary is basically immutable. So you really decrease the variability.

And at scale, when you add that with speed and variety, you get this speed. And we're seeing this now. We're seeing companies that are adopting these things and being able to deliver software at an incredible pace.

The last one's visibility, and that ties in very well to microservices, which will be the next topic. Which is that I'm not talking about like a GUI that shows the delivery. It's about bounded context. You're building services that are domain-driven. They're smaller pieces. Basically, it's easier for you to kind of grok the architecture and the visibility of what's delivered and what is actually running in your infrastructure.

In fact, one of the big things that I loved about this Toyota supply chain is that it really expresses ideas of bill of material, things like that. So when you have immutable delivery, the danger is, what is all that black box stuff running in production?

If you follow the kind of 4VL, you basically do things like you build a bill of materials, you define provenance. You do all those things all the way to the line, and you have this really good visibility.

So imagine you can restart a cluster in three seconds, and you've basically tagged all the bounded services that are running in these immutable infrastructures, and you have kind of a zero-day attack and you need to basically fix that immediately. So you could roll a cluster in three... Basically, now you know where everything is, and you can roll a cluster in a couple seconds.
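Here is a small Python sketch of why a bill of materials helps in that situation: if every image records what it contains and where it came from, finding the services to roll is a lookup. All names and data are hypothetical (`Image`, `affected_services`, the digests, the `libexample` component), and the structure is an assumption for illustration rather than any real tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    digest: str
    source_commit: str      # provenance: where the image was built from
    components: frozenset   # bill of materials: (name, version) pairs


# Hypothetical inventory of built images.
images = {
    "sha256:aaa": Image("sha256:aaa", "commit-1",
                        frozenset({("libexample", "1.0"), ("runtime", "3.5")})),
    "sha256:bbb": Image("sha256:bbb", "commit-2",
                        frozenset({("libexample", "1.1"), ("runtime", "6.0")})),
}

# Hypothetical record of which image each running service uses.
running = {
    "checkout": "sha256:aaa",
    "search": "sha256:bbb",
    "billing": "sha256:aaa",
}


def affected_services(running, images, component):
    """Services whose image contains the given (name, version) component."""
    return sorted(
        name for name, image_digest in running.items()
        if component in images[image_digest].components
    )


to_roll = affected_services(running, images, ("libexample", "1.0"))
print(to_roll)
```

Once the affected services are known, the fix is to build new images and roll them, rather than patching anything in place.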

Damon Edwards

I think a key point here is that, of all this, right, if you've probably gathered by now, it's not just doing the same thing faster or doing the same thing a little more reliably. But fundamentally, these things build up for organizations to change how they do business, to change how they function and deliver as an organization. It's pretty revolutionary.

And then, microservices. Next slide. Next slide, sir.

John Willis

Sorry. Go ahead.

So microservices. I'm sure most people know this by now. There's two good definitions, in my opinion. There's probably a lot more. But Sam Newman, who wrote the O'Reilly microservices book: small autonomous services that work together. My favorite is Adrian Cockcroft's. Almost everything Adrian says is my favorite. But he says it's loosely coupled service-oriented architectures with bounded context, right? So again, the idea is loosely coupled and bounded context.

The thing is, there's a really cool convergence that has happened within the last couple of years. We went from SOA to Eric Evans' domain-driven design to 12-factor apps from Heroku, and now we've got microservices. So we've got this idea that we've loosely coupled bounded service that we develop. At the same time, containers come in and fit really well.

People ask me, "Does Docker have to have microservices? Does microservice have to have Docker?" The answer is no, but they converge really well. And then if you add this kind of small two-pizza team, which is kind of a Lean idea that we've adopted, you have this nice little thing that you've got small teams, you kind of have bounded context or microservice delivery, and you can build immutable kind of service artifacts.

Next slide. So I joked to Damon, like, the world would be really boring if we didn't get to steal Adrian Cockcroft's slides periodically.

So, anyway, the idea is microservice is the opposite of kind of monolithic services. So your classic monolith. If you're delivering a release plan, you basically have to have some coordination, right? You can do the next slide if you don't mind.

So like, as you're coordinating, you got to coordinate if there's bugs. So there's a lot of kind of inertia around a monolith. Now, some companies do it really well, but it's really hard. Like, they pay their dues. Etsy has a monolith, Facebook is a monolith, but they work really hard to do that well.

Damon Edwards

And I think the idea is the transactional overhead, right? So the process of that integration, the testing, the discipline has to go in there. Everything you have to build to try to grease this process because you have these natural bottlenecks, because you have this natural friction that has to happen. Both friction in the communication in the organization, friction with dependencies at the application level.

You've got all these long rework loops, right? We don't find bugs till they're integrated later, or we don't find them till they're integrated later again in production, and that drives rework through it. And the idea is it just adds friction to the entire life cycle, and you have to spend a lot of effort then to try to decrease that friction.

You're constantly fighting to decrease that friction, and that eats up a lot of the capacity and value-add time that could be going to value-add activities in your organization. John said some people do it great, but think about all the effort that has to go into greasing that pipeline to get things through.

John Willis

Yeah, the unicorns are awesome at it, right? The horses are kind of going to be better off with this kind of microservice architecture.

And the beauty of microservices, you go back to Adrian's definition, right? Loosely coupled, bounded, and then you have this parallelism on delivery. And so there's other side effects, too, right? Like notice that some of the services are PHP, Node, Python. That doesn't get in your way. You get to make these bounded decisions for the delivery. You get the speed. You can parallelize the delivery.

So by nature of this delivery model, smaller teams, and again, immutable infrastructure, and less inertia of how you have to deal with a monolith from delivery.

Do you want to talk about databases? And then, yeah. Amazon has this great presentation where they talk about this kind of just distributed database, where there's no central database. So there's even a kind of a microservices architecture for data delivery.

And then, finally, so the whole idea is Damon went through kind of the org structure of high-performing organizations, SDLC, which is important, making work visible, value stream mapping. I think immutable delivery, containers, small teams, and microservices are the way to get a competitive advantage.

We saved the best for last, which is respect for people. So next slide.

So here's the thing. We never really set out to define DevOps by CAMS, and I'm going to say this, and I'm not trying to create a new definition for DevOps, but I think a lot about DevOps is, if I had to create a definition today, I'd say it's patterns that turn human capital into high-performing organizations, right?

Because it's all about the cap. If we've learned anything from Toyota, if we've learned anything from Lean, if we've learned anything from all the presentations of the brilliant people that Gene has gotten together, we basically understand that it is how we collaborate and how we work together as humans. The tools are secondary, right?

So the high performers care about their people. And so the high performers embrace diversity. Etsy uses it as a competitive advantage. High performers understand empathy. Things like embedded ops into dev, dev and ops, C-level people having to check in code, C-level people having to show up at hackathons, board members coming and checking code, companies that rotate people. It's all about empathy hacks, right?

And then finally, the area I've been very passionate about over the last couple of years is this concept of burnout.

So originally, I started presenting about burnout because a young gentleman committed suicide in the Los Angeles DevOps community about a year and a half ago. I started talking about it, I started doing research, and what was interesting is I got to meet this woman, Christina Maslach.

Damon Edwards

Berkeley.

John Willis

She is the foremost authority on occupational burnout. She's a PhD professor at Stanford.

Damon Edwards

Berkeley.

John Willis

What? Oh.

Damon Edwards

Berkeley.

John Willis

No, no, Stanford.

Damon Edwards

Oh, Stanford.

John Willis

Go to Stanford.

Damon Edwards

Sorry. Incorrect.

John Willis

So.

Damon Edwards

I'll be back.

John Willis

And I'm from Atlanta.

Damon Edwards

I'll be back here. Sorry.

John Willis

So, he's from San Francisco.

But she's identified this concept of the kind of workload balance scale, what they call the six mismatches. And the idea is that this kind of clinically defined burnout can occur when these six mismatches occur in an organization. And what's interesting is it's about the system, right?

And so, overwork, overload, those are classic. Death marches are not cool. Lack of control, like we kind of embrace intent models. But rewards and fairness, there's a balance there.

In some organizations, the idea... Come on, give me two more minutes.

Damon Edwards

Three more seconds.

John Willis

In some organizations, the concept of fairness, like salespeople might be the highest-paid people in our organization. For you, that might be okay. For somebody else, that's not okay.

The point here is, when I was doing this research, I realized that these clinical definitions of burnout, the mismatches between an organization and its people, are the anti-patterns of DevOps.

And so Christina Maslach says in some of her presentations that she goes into a corporation and she says she wants to help them with burnout, and the common response she gets is, "I'm not running a country club." Like, "What, should we get couches for everybody?"

And I thought about what if we could flip this, kind of the inverse of that. If we were able to understand how to treat our people in this kind of way, what if we could turn that into optimization? And what if we could sell C-level people on not only doing the humane thing for our people, but actually using it as a competitive advantage?

Damon Edwards

So that's our presentation. And so John and I are going to be over at the unconference tables after this if you guys want to talk about any of these things. I know it was kind of a fast survey course. Please do.

Also, you can contact us directly. Please follow me on Twitter because John is way ahead of me in the Twitter follower count...

John Willis

And lords it over me.

Damon Edwards

And our podcast, DevOps Cafe, and buy the book.

John Willis

Right.

Damon Edwards

Awesome. Thank you very much.

John Willis

Thank you so much, guys.

Damon Edwards

Yeah. Cool. Thanks, John.

John Willis

Thanks, guys.