How Continuous Delivery and Lean Management Make your DevOps Amazeballs

San Francisco 2015

Director of Organizational Performance Analytics · Chef Software

Dr. Nicole Forsgren will present the latest research that uncovers what really drives business outcomes of market share, profitability, and productivity as well as DevOps transformation awesomeness…

Hint: these include continuous delivery (and what is most important when you do CD) and lean management (and what that means for us).

This exciting research was done with Jez Humble and Gene Kim, and is promising fun new projects in the space.

Full transcript

Dr. Nicole Forsgren

I'm here to talk about how continuous delivery and lean management make your DevOps amazeballs. I'm very excited about this.

Last year when I was here, we talked about how IT investments... Finally, we found out that these investments in automation and tooling, when accompanied by practice and process and culture, actually impact your bottom line. And we call this DevOps. That was exciting.

It totally was. It was exciting because, for decades, this hadn't been a thing. We really hadn't seen it. So DevOps and the bottom line showed us that.

Well, it's a year later, and there's new data. So was this a fluke? Does this actually still show up in the data? Is there validation? I rub science on things. Is the science still there? What do we see? Let's take a look, see what happens.

Okay, here's the deets. I ran the 2014 and the 2015 Puppet Labs State of DevOps Report. In the 2014 study, we had about 9,300 respondents. In the 2015 study, we had about 5,000 respondents.

So when we ran the analysis, I did a cluster analysis. We come up with three distinct groups of IT performers. Because I am infinitely creative, I named them high, medium, and low.
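As a rough illustration of how a cluster analysis can separate respondents into three groups, here is a minimal sketch. The talk does not say which clustering algorithm was used, so plain one-dimensional k-means is an assumption here, and the composite scores and the names `kmeans_1d` and `label_performers` are hypothetical. The real study clustered on several metrics at once, not a single score.

```python
from statistics import mean


def assign(values, centers):
    """Put each value into the group of its nearest center."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, k=3, max_iterations=100):
    """Plain k-means (Lloyd's algorithm) on one-dimensional data."""
    ordered = sorted(values)
    # Deterministic start: pick centers spread evenly through the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iterations):
        groups = assign(values, centers)
        # Move each center to the mean of its group; keep it if the group is empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


def label_performers(scores):
    """Name the three clusters low, medium, and high by their centers."""
    centers, groups = kmeans_1d(scores, k=3)
    order = sorted(range(3), key=lambda i: centers[i])
    names = ["low", "medium", "high"]
    return {names[rank]: sorted(groups[i]) for rank, i in enumerate(order)}


# Made-up composite IT performance scores, for illustration only.
example_scores = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.4, 8.8]
example_groups = label_performers(example_scores)
```

The group names are attached after clustering, by ordering the cluster centers, which mirrors the point that the groups come from the data first and the labels second.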

Because, and I'll chat about this a little bit later, all of the metrics that we're about to chat about all move together, so we don't see trade-offs. So the high performers are amazeballs at all the things. The low performers are not amazeballs at all the things. And the medium performers are super hanging out in the middle at all the things, at statistically significant levels. So I can think of super exciting words, but really: high, medium, low. That's what we see.

For those of you playing statistics bingo: correlation, CMV, partial least squares, structural equation modeling, exploratory factor analysis, Varimax, and eigenvalues. I did get a request for that last year at a conference because someone actually was playing statistics bingo with a friend.

Okay, let's just cut to the good stuff. We know that IT performance impacts organizational performance. We found it last year. The super great, fun thing is that it shows up again this year. Completely separate set of data, different values, different numbers. We do have some overlap in the respondents, yes, but the data's validated.

So high-performing IT organizations outperform low-performing IT organizations. We see that they're two times more likely to exceed productivity, profitability, and market share goals, and that is across both the 2014 and the 2015 datasets. So these are consistent results year over year. This is fantastic.

Now, I mentioned the 50% higher market cap growth over three years, the prior three years, with a little asterisk. I didn't get enough data in the 2015 dataset to collect this. I didn't have enough statistical power, but I mention it. It was significant in the 2014 dataset. But in terms of the organizational performance metric, totally consistent: 2X.

So boom. For those of you only interested in that, we're at three, four minutes in. Peace out, y'all. Thanks.

Okay. DevOps is good for business. Now, DevOps is also good for IT performance. Again, we're finding this again. Different dataset, different year. Results are validated. Results are confirmed.

In case we want a refresher on what I mean by that, we're talking about throughput and stability. We're talking about software delivery, and this can be across several types of organizations. We're not just talking about WebOps. We're talking about software delivery. We're even talking about firmware. We're hearing amazing stories here at DevOps Enterprise Summit.

But we're generally talking about agility, reliability. What do we mean here?

In agility, we're talking about deploy frequency. How often do you deploy? Make a note here: this isn't necessarily how often you're pushing all the way through to customers. That's reserved as a business decision. You can push all the way to customers whenever it makes sense, but you're not hamstrung by your technology.

We're also talking about lead time, length of time from code commit to code deploy. That is lead time. We're also talking about reliability. The important one here is MTTR, mean time to restore. How reliable is your infrastructure?
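The three measures just described can be sketched from timestamps alone. This is a minimal illustration and not the survey's method (the study collected these as survey responses); the function names and the sample data are made up for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def lead_times(changes):
    """Lead time per change: code commit to code deploy."""
    return [deployed - committed for committed, deployed in changes]


def mean_timedelta(deltas):
    """Average a list of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def deploys_per_day(deploy_times, period_days):
    """Deploy frequency over an observation window of period_days."""
    return len(deploy_times) / period_days


def mttr(incidents):
    """Mean time to restore: average of (restored - started) per incident."""
    return mean_timedelta([restored - started for started, restored in incidents])


# Hypothetical (commit, deploy) pairs.
example_changes = [
    (datetime(2015, 10, 19, 9, 0), datetime(2015, 10, 19, 10, 0)),
    (datetime(2015, 10, 19, 13, 0), datetime(2015, 10, 19, 16, 0)),
]
# Hypothetical (outage started, service restored) pairs.
example_incidents = [
    (datetime(2015, 10, 20, 2, 0), datetime(2015, 10, 20, 2, 30)),
    (datetime(2015, 10, 20, 14, 0), datetime(2015, 10, 20, 15, 30)),
]

average_lead_time = mean_timedelta(lead_times(example_changes))
average_restore_time = mttr(example_incidents)
frequency = deploys_per_day([d for _, d in example_changes], period_days=1)
```

Lead time here starts at commit and ends at deploy, matching the definition in the talk; anything earlier in the value stream is outside this sketch.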

Now, change fail rate is there with a little asterisk. I mention it and I note it because it's interesting, and we capture it in the data. However, this is not something that actually predicts or is part of my IT performance construct. This is not something I use to classify or categorize organizations because it's not something that holds up from a statistical point of view. But it's still something we use to describe the organizations because we still think it's interesting and relevant. So that's the little asterisk there.

So that's IT performance. I hinted that this is still a big deal this year. Still holds. DevOps is good for IT performance. Our high IT-performing organizations are more agile. And these numbers were the same across both years, 2014 and 2015. We're seeing 30 times more frequent deployments. We're also seeing 200 times faster lead times than their peers: code commit to code deploy.

So think about this in terms of your organization and what this can mean in terms of speed to market. Is this relevant to us? Think about it in terms of new content creation. Think about it in terms of A/B testing. Think about it in terms of a compliance or a regulatory environment. You may think, "Oh, I don't need to get to the market fast. I have no competitors." That would be awesome. I'm excited for you.

But at some point, a security vulnerability will come out, or there will be compliance or regulatory changes, and you have to get that code out quickly. That affects all of us. Imagine having that ability to get that code through your system very, very quickly. This is huge. This is a great ability for these companies and these organizations and these teams.

Okay. The other side of this was reliability. High IT performers are also more reliable. Now, we saw marked improvements over here. In 2015, we saw 168 times faster MTTR, likely because of an improvement in the emergency change process, because you're no longer panicking. "What do I do?"

Also, change success rate. Again, the little asterisk because this was something that we're just collecting and gathering and noticing. So change fail rate is when I introduce something into code, does it fail? What's the rate of failure when I introduce it?

This is also likely why MTTR is improving. When I introduce changes that do fail, they're likely smaller. Because my deployments are frequent and my lead times are short, I'm introducing much smaller chunks of code. So when something goes wrong, when something fails, it's very easy to push forward a new change or roll back a change.
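Change fail rate, as described above, is simply the share of changes that fail once introduced. A minimal sketch, assuming each change is recorded as a hypothetical `(change_id, failed)` pair:

```python
def change_fail_rate(changes):
    """Fraction of changes that failed; each change is (change_id, failed)."""
    if not changes:
        return 0.0  # No changes recorded, so nothing has failed.
    failures = sum(1 for _, failed in changes if failed)
    return failures / len(changes)


# Made-up change log for illustration.
example_change_log = [
    ("c1", False),
    ("c2", True),
    ("c3", False),
    ("c4", False),
]
example_rate = change_fail_rate(example_change_log)
```

Note that the rate says nothing about the size of each failure, which is the point made above about small changes being easier to roll back or fix forward.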

So validation from last year: organizational performance is improved by DevOps. IT performance is improved by DevOps. This is amazing.

Now, last year, we also took a look at what types of things contribute to this. We had some great data on correlations. One thing that's interesting to note: looking at MTTR, lead time for changes, and deployment frequency, version control for all production artifacts was highly correlated with all of those. Monitoring was important for MTTR. Automated testing was important for lead time for changes. Deployment frequency also had continuous delivery in there.

We were starting to notice some patterns around some tooling, maybe some processes. Also super important: culture, job satisfaction, climate for learning. I put some asterisks there around culture because culture was highly predictive. So culture was predictive of both IT performance and organizational performance.

Okay. So this was great, right? We really, really liked this. This year, though, we wanted to see what made IT performance, well, amazeballs. What really drives our IT performance? Not just correlates with it. Correlates is great, but we want to know what drives it.

So we designed the study so that we could run analyses that were based on prediction. So the rest of what I'm going to present actually looks at prediction analyses across this entire dataset for 2015.

So what this required, again, quick recap framework. Step one: understand IT performance in the context that we're studying. Step two: question mark. Figure out what comes next. Step three: who knows step three?

Profit. Profit.

So we started with continuous delivery. A lot of the things that were showing up in some of those correlation analyses sounded familiar: continuous integration, some ideas around version control. We looked into continuous delivery.

Okay. But continuous delivery is a complex concept. It's comprised of several different things. So we included several more items, and we ran a more complex analysis. This is PLS. This is partial least squares.

So this type of analysis allowed us to combine several different things. And from a statistical point of view, we can say that based on the data that we looked at, continuous delivery is comprised of these things: test and deployment automation; continuous integration; and all of the production artifacts being in version control. These were the things that meaningfully contributed to continuous delivery in organizations.

And when they did, continuous delivery made our work better. It made our IT performance get better, and it decreased our change fail rates.

And we see this. So Yahoo has a great story. So a product owner said, "We never had testability before. We have it now. We have this experience, and we know this stuff is working, and it's working with controls."

So they talk about having automated configuration of deployments of 250,000 nodes. They can currently deploy up to 140,000 node configurations in eight hours, and they can patch their entire infrastructure within six hours of a patch being made available to the team.

Do you know what's even better? You can work better, but it also makes it feel better. Because we know that deploying's a thing, but back in the day, when John Allspaw and Paul Hammond presented their seminal talk at Velocity 2009, "10+ Deploys per Day: Dev and Ops Cooperation at Flickr," everyone thought it was amazing, but they also internally cringed because they thought about how much that hurts. Deployments have this visceral reaction of how painful it is.

So we also wanted to collect data about what the work feels like. Continuous delivery: the teams that had stronger and better continuous delivery processes report feeling less deployment pain. So imagine a situation where not only is your work happening better, performing better, but you also feel better about it, and you're happier about it.

Amazeballs.

And by the way, all of these arrows are predictions. And by the way, this carries through to organizational performance. All of these carry through.

So that's the continuous delivery piece. But we also wanted to take a look at something else. Any guesses on what that is?

Lean management. Lean management.

But what does that mean? That's a whole can of worms, right? There's so many things to start with and so many things to look at. And so, especially within our context, because lean comes out of manufacturing, we wanted to think of what handful of things or practices or processes might really contribute to this, particularly in our space.

And these are the things that, from a statistical point of view, within our sample, really, really contribute to lean management in a DevOps space in a meaningful way.

So WIP limits that drive improvement. By the way, we are the worst at this. But WIP limits make a big, big difference. Okay?

The second one is visualizations to show and monitor our work, because when everyone can see where we are, it really, really helps.

The third one: monitoring to make business decisions. Note that it's monitoring to make business decisions. Monitoring is great, but this isn't just to let us know when something goes down or to page us in the middle of the night because we were having this amazing dream that we shouldn't keep having. This is monitoring to make business decisions. We use these charts and these graphs to help us know what to do next.

This lean management contributes to IT performance, and it makes us do our work better.

A great example of this is Etsy. They've said, "If it moves, graph it." Here's a great example of this. They include walls of charts and graphs, and they include these monitors several places throughout their environment. The vertical lines are color-coded, depending on what it was that they pushed, but these vertical lines represent a code deploy.

And by the way, they've said that this is out there. They're posted so that people then know what's worked on. If you push code and something drastically changes to the graph, you know what happens. So in part, it's almost part of your continuous delivery because it's fast feedback. But the visualization also helps drive business decisions because then everyone knows what work should be done next. This helps with their firefighting. Everyone knows what is coming. And this is part of their Graphite toolset.
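To make the idea of marking code deploys on a graph concrete, here is a minimal sketch of recording a deploy as a data point using Graphite's plaintext protocol, where each line is `<metric path> <value> <timestamp>` sent over TCP (port 2003 by default). The metric name, host, and helper names are hypothetical, and this is not a description of how Etsy wires up its dashboards.

```python
import socket
import time


def graphite_line(metric_path, value, timestamp=None):
    """Format one data point in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{metric_path} {value} {ts}\n"


def send_to_graphite(line, host="localhost", port=2003):
    """Send a formatted line to a Graphite (carbon) plaintext listener.

    The host and port are assumptions; adjust them for a real installation.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric name for a production web deploy.
example_line = graphite_line("deploys.web.production", 1, timestamp=1445356800)
```

Calling `send_to_graphite(example_line)` from a deploy script would record the event; how the point is then drawn as a vertical line depends on the dashboard configuration.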

So, by the way, lean management makes us work better. It also makes us feel better. So lean management contributes to our IT performance. It also contributes to two other things. It decreases feelings of burnout among our workforce. So we know that burnout is a problem in our industry, particularly in our field. We've seen it. We've felt it. I'm sure several of us have pulled way too many all-nighters.

It also contributes to that Westrum culture construct. If you've heard any of these talks before, and I called it out a little bit earlier, culture was a significant predictor from 2014.

So culture showed up again. We collected it again this year, and now we have something else that helps enforce an organizational culture.

But back to that decreased burnout. We can all think about our own stories and our own scenarios, right? To a certain extent, we can all, or at least several of us can probably identify with the fun and the adrenaline rush that comes from saving something. Right? We can save something. We can stay up all night. We can be the hero.

But at some point, we know that WIP limits are a good thing. Now, we have a quote here from Julia Wester at Turner Broadcasting: "I was trying to figure out why my team was working themselves to death but not getting anything done. By implementing WIP limits, we were able to focus on our work. Finishing work feels better than sprinting and feeling like a hero in the moment, because that's only a moment."

So again, WIP limits make us work better. They make the work better. They also help us feel better.
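A WIP limit can be expressed very simply: a column of work refuses new items once it is full, so something has to be finished before something new starts. The sketch below is an illustration only; the `Column` class and `WipLimitExceeded` exception are hypothetical names, not part of any tool mentioned in the talk.

```python
class WipLimitExceeded(Exception):
    """Raised when starting an item would exceed the column's WIP limit."""


class Column:
    """One board column (for example, "in progress") with a WIP limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = []

    def add(self, item):
        # Refuse new work when the column is already at its limit.
        if len(self.items) >= self.wip_limit:
            raise WipLimitExceeded(
                f"{self.name} is at its limit of {self.wip_limit}"
            )
        self.items.append(item)

    def finish(self, item):
        # Finishing work is what frees capacity for the next item.
        self.items.remove(item)
```

With a limit of two, a third `add` raises until one of the first two items is finished, which is the "focus on our work" effect described in the quote above.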

So again, it helps the work go better. It also carries all the way through to organizational performance, and I did include that highlight on organizational culture. Because DevOps drives IT performance. It drives organizational performance.

But if you remember, these IT investments only hit the bottom line when they're accompanied by all three things: automation and tooling, practice and process, and culture. You can't just buy that server, throw it in the closet, give it that pretty uplighting.

So the tooling and automation: continuous delivery. Practice and process: good lean management principles. And we know what comprises that now in a meaningful way. Culture: Westrum. And what characterizes Westrum? High trust and information flow.

And if anyone's interested in seeing one good way to measure that in particular, you can take a look at the metrics white paper that the DevOps Enterprise Summit team put out. And it includes six or seven items that you can offer to people. It's questions, and they've been fully statistically validated and tested.

So if this is something that interests you, there's measuring and benchmarking for teams, from the team that brought you the Puppet Labs State of DevOps Survey: Gene Kim, Jez Humble, and myself. You can benchmark your performance across four axes: culture, automation, process, and measurement. You can compare your IT performance against an industry standard, and you can get a personalized report with results that you can share.

So, if you'd like to learn more, you can receive the following: an exclusive invite to our DevOps benchmarking tool; a chance for a personalized analysis of your results with Gene, Jez, and I; a copy of this presentation; as well as a copy of that white paper that I mentioned previously on how to measure metrics throughout your organization, not just culture. It includes several scenarios.

So just send an email to nicolefv@sendyourslides.com with the subject of DevOps. Again, if you want the invite to that DevOps benchmarking tool, we're offering this just to the DevOps Enterprise Summit attendees this year. A chance for that personalized analysis. We're thinking probably about 10 people would get that. A copy of the presentation and a copy of the metrics guidance white paper. That email is nicolefv@sendyourslides.com, and the subject is DevOps.

And I do believe we have a few minutes for maybe just one or two questions.

Q&A

We do. Any questions? Right there, front row. Front row. Are we on?

Q: I was just wondering what part was the predictive part. I didn't quite catch that.

A: The predictive part? Yeah. So the boxes and the arrows, every single arrow is predictive.

Q: I see. So the lead time we're starting with there was for code commit. Is there any thought about moving further up in the value stream to work getting to that point, and do you have thoughts about getting some information about that?

A: Absolutely. So we've looked a lot into, and we've spoken with several companies and customers, and I have myself, looking beyond or earlier in the value chain, beyond code commit. The challenge is that doing a study such as this with thousands of respondents, operationalizing that, and measuring it with any kind of fidelity beyond ideation or ticket submission or anything earlier than that is really, really difficult.

So it's easy for me to do on a case-by-case basis or even within an industry. But once you get earlier than code commit, it can be really, really challenging. But yeah, there are several discussions around that.

If you want to just say it, I can repeat. I'm supposed to stay here.

Q: When you present these statistics to folks, and in their insanity, they push back on them and say, "These don't relate to me," what are the objections that you hear most often to this stuff, and then how do you get past those?

A: So it sort of depends on the objection. Sometimes I'll get objections on methods, and then it's like my dissertation defense, and then we just have this awesome banter on stats methods and which ones we prefer. And then that's just an inherent limitations discussion.

Sometimes I get objections based on, "Well, it doesn't apply to me," or, "Well, that's a different sample," or, "There's no way that could be true because we've never seen it before."

The great thing is that, in particular, the State of DevOps Report in 2015, the Puppet team put together a really, really fantastic report. They laid it out really well with even more charts and graphs than we had in the 2014 one that seemed to communicate really, really well.

Some of the challenge is that we didn't present things in ways that some executives are used to seeing. So we don't have an ROI in ways that are traditionally presented. We don't have ROI. We don't have internal rate of return because when you collect data from around the world in that kind of a dataset, I don't have financial statements from all of those different companies.

And so I'll get that kind of pushback. And when I get that kind of pushback, then I will say, "I would love to do that kind of study. I've done those studies before," because my background's academia, by the way, with a master's in accounting and a PhD in MIS. But I would need some kind of funding to be able to get access to that kind of a dataset, and then I'll ask them for that funding to do that kind of research.

But sometimes it's just positioning. So if you can figure out exactly what their concern is or what their business challenge is, you can then speak to that. Sometimes it's ROI specifically, and then you can position it around either the value of infrastructure as code or the value of organizational performance.

Or specifically, the best answer I've had is around either throughput and agility or the stability metrics and say, "What's most important to you? Is it speed to market? If it's not speed to market, is it compliance and regulatory changes? Is it stability and reliability? And how can this movement speak to you specifically? What are the types of use cases where having more uptime..."

And sometimes it's reputation around uptime or downtime. And those types of statements or exercises, thought exercises in particular, work very well. And then you can just throw those two or three numbers of, "Okay, you think 30X is unreasonable? Imagine being five times faster than we are right now, at being able to respond to those things in time to market or time to respond to the latest SSL vulnerability that just hit everybody." That's been the very best one that gets everybody.

That was way too long. I'm sorry.

Host: Thank you, Nicole. That's all we officially have time for. The expo hall is open. Please be sure to go downstairs and support the sponsors that help make all this happen for us. And Nicole still has a voice left this year, and she'll hang out and answer individual questions for you guys.

Thank you.