Episode 3: The Quest for Accelerated Delivery


San Francisco 2016



Carmen DeArdo

Director, Build Capability · Nationwide Insurance

Jim Grafmeyer

Systems Architect · Nationwide Insurance

Cindy Payne

Director, IT Architecture · Nationwide Insurance

Nationwide's journey began 8 years ago with an Agile at Scale implementation. This transformation created over 200 Agile teams and produced demonstrable results. But our drive for Continuous Improvement led us to realize that we needed further changes in process, technology and culture across the entire Delivery Value Stream.

Nationwide, like many other Fortune 100 companies, acknowledges that having a world-class IT Delivery Capability is essential to remaining competitive in the next decade and beyond.

This third DevOps Enterprise Summit installment focuses on the progress made to date and the journey that lies ahead on our continuing Quest to Accelerate Delivery.


Full transcript


Carmen DeArdo

So a little bit about Nationwide. There is a Nationwide post-Peyton Manning retirement. We continue to rise on the Fortune 100 list. We're involved in a lot of different things. We have 20 different business-facing IT areas: insurance, pet insurance, annuities, etc. There's property and casualty, and there's financial. Jim and Cindy are going to be talking a lot about the great work they're doing to prove these concepts in the financial area.

So a brief recap of the journey to date. Episode one was sort of "A New Hope": agile. And I say that tongue in cheek because it really was a hope. This was about 10 years ago. Cindy was also a key part of that, and it was an experiment. Can we actually make this work? We started out with three teams. We're now over 200 teams. And like Heather described this morning for Target, we're committed to being 100% agile.

So the experiment worked. And as you would expect, we got some good results: improved quality, productivity, availability, delivery. So it's like, "Hey, we're done." Well, as Lee Corso would say, "Not so fast, Carmen."

There's some more work to do. And the work comes from the fact that we're going very quickly from the point of a card getting into the backlog of our agile teams through a set of iterations. But if you go to an agile team and say, "Why can't you go faster?" they're going to say, "Well, we're waiting."

So what are they waiting for? Well, they're waiting for work. They're waiting for somebody else to do something for them, like another service or some functionality they need, or they're waiting for an environment. So at the beginning of our process, we don't have continuous flow from our portfolio into the backlogs of our agile teams.

And then once the series of iterations is done, we're not going to production. There's a lot of manual and what I call, oh, this is being taped, high-ceremony processes. And they're kind of necessary just because of where we were, and Cindy will kind of relate some of our history around that. So we have this Water-Scrum-Fall thing going on.

So that leads us into episode two, which is that we need to take a broader look at our value stream: all the way from the business, and the kind of things Mark Schwartz talks about around business value and what business value actually is, through to monitoring and feedback to determine whether we actually added value, and feeding that back to the business to turn the crank again.

And the red part, as you can see, is where we're pretty... I'm not going to say we're perfect, but that's the part that we sort of focused on with the first transformation. This transformation is really looking at kind of the orange and the green. How do we get work more rapidly into our teams, and then really moving to a model of certification or release when ready. And Cindy is going to talk a little more about that.

So up till now, we really hadn't been focused on that kind of mindset, and it is a mindset around certifying when things are ready.

So that sort of brought us into episode three, which was, okay, we have this value stream. The last two years, I talked about how we automated that. Some of the practices that you'll see in our DevOps house of pancakes, as some people refer to it.

We built an automated pipeline. We used technology like Tasktop to identify our four sources of work, which I know we all know from The Phoenix Project: there's business value, there's unplanned work, there's operations, and there are changes. So we reduced variance by defining those sources of work. We used Tasktop to provide an integration pattern to bring that work into the pipeline, and then we used products like UrbanCode to put together automated release and deploy.

But again, that's not enough. You've heard over and over again about culture, about mindset. John Willis talked about Deming and the whole mindset of psychology, and you have that whole culture pillar there. And if we're going to reach our true north, which is reducing lead time so we can become more responsive to the business, we're going to have to address some of those things from a culture perspective. Just having the technology available is not enough.

So that brings us to another topic you hear at this conference, which is organizational structure. We may or may not have a typical structure. I sit in the shared services area. Cindy and Jim are in the financial business unit, and so they're actually doing real work, while I'm just talking about how people can do work.

And then we have this interesting dynamic with the actual owners of the tool. I don't own any of these tools. I just have these wild ideas about how maybe we can do stuff. And as you can see by the caricatures or their icons, some tool owners may be a little more open to trying new things than others. Some are very risk-averse.

You heard some of the DevOps stories this morning about leaders yelling at each other over availability, invective like that, and, "What do you want my application to do again?" So how do we get this work through this organizational structure in order to reach our goals?

So the way that we're doing this is with a lean concept called model lines. We had one model line in property and casualty and one model line in financial, which is Jim and Cindy. And as I put it, they really have the street credibility, right?

I'm sort of sitting off and it's like, "Oh, here comes Carmen and his team again to talk about how we can do this continuous delivery stuff," right? But people like Jim and Cindy have been known to be innovators in the enterprise. They're people that didn't wait for tools. They went out and did things themselves. Sometimes they were kind of anti-establishment because we really weren't providing them with the capabilities they needed to satisfy their customers.

So it's a tremendous opportunity to take somebody like that and partner with them and help them drive the transformation by demonstrating what actually can work, right? This is no longer, "Here's Carmen talking about his theory." It's, "Here's what actually can work." And then we can run experiments, and we can develop patterns for other organizations.

So with that, I'm going to have Cindy talk about... Oh, I have the clicker. So if anything goes wrong, it's my fault.

Cindy Payne

Thanks, Carmen.

We like to joke with Carmen about being in his DevOps ivory tower, and we think we can get away with that since architects are usually getting the ivory tower bad rap, or as we heard this morning, the Visio bad rap from Gene Kim.

So, what's it like from our side, being in Nationwide Financial, which is a heavily regulated business unit of Nationwide? Well, first, we have very large project portfolios that we need to deliver. We need to do it at a rate faster than we've ever done it before. We need to deliver within a yearly funding cycle, which means the money allocated each year has to produce the delivery within that year. And we have to do it with what sometimes feels like a half-empty tool belt. Not a problem, right?

So let's look at that a little bit closer.

So first, what are we doing in the business units? Well, we're really, really busy. We have large transformational projects that are modernizing our legacy systems. And in Nationwide Financial specifically, we're currently responding to one of the most disruptive regulatory events that has happened in our industry in decades. That work is really important and very difficult, and it's the primary accountability for Jim and me.

Second, Carmen's not done yet, and he can't do it alone. So DevOps at Nationwide is not sitting on a shelf, ready to be consumed by practitioners wholesale. There's still a lot of work to do. And as Carmen said, we've been busy building our own solutions for years.

And we're not the only business unit to do that. That's what causes us all to look like snowflakes, and we're heavily invested in those solutions. So, as an architect, I understand the problem that causes.

And finally, we have a business partner. Our digital marketing partners have a real need for speed, so they're asking us to accelerate the refresh rate in their digital experiences. They're also asking us to help finish building those end-to-end digital experiences for financial advisors and consumers.

So what we've done is we've decided to optimize around speed. We feel when you optimize around speed, everything else falls into place. And we've seen this happen with a recent project where we rewrote Nationwide Financial, and it came in six months ahead of schedule and 40% under budget.

So what I want to talk about next are some tool sets that Carmen usually doesn't cover in his three-year episode journey. And what I'm going to do here is I'm going to focus on how these tools help us optimize for speed because the tools themselves are not anything surprising.

So GitHub is the first one. At Nationwide, our standard version control currently is Subversion, and it is one of the most widely adopted enterprise tools in our company, and it's very cost-effective. So why would we even look at replacing this tool?

Well, I'm going to give you three reasons why, and all of them focus on speed. The first one is Git is just better at version control than Subversion. When we look at something as simple as doing merges, we're seeing a 90% reduction in merge times, and at Nationwide, we do a lot of merging.

The second thing is that GitHub enables a model of self-service for things like creating new repositories and doing team management. With Subversion, our current state, those work requests would go to our infrastructure and operations team and take about three days.

Finally, my favorite reason is that GitHub really enables inner sourcing. Think about the potential of this for a company like Nationwide, where we have, as Carmen said, hundreds of teams, many of them with Java skill sets. When one of them discovers that it needs a single line of code written by another team, our current process would have them make a request that could easily take two months of calendar time and hundreds of hours spent not on anything technical, but simply in our demand management process.

That's true even when there's no skill-set or funding reason why the team that found the problem couldn't make that code change itself. So embracing an inner-sourcing model can really help us avoid all of that waste.

So the next one I want to talk about is New Relic. Our current application monitoring tool does a great job of helping us identify when something goes wrong, but it doesn't do a very good job of helping us determine why. We found that New Relic helps us with both. It gives us the feedback we need, but it also gives us the full-stack view, which helps us pinpoint problems and respond faster to outages.

The next tool is Splunk, and I have no complaints about Splunk itself as a tool at Nationwide. It solves the problem of manually navigating and searching through logs. What I want to share today, though, is what happens when we don't implement DevOps tools in a self-service manner, and with Splunk, we didn't.

So teams still have to put in work requests to get logs added to Splunk. That's a problem we recognize and think we can fix. But the second problem is that we have an internal chargeback model at Nationwide, and in this case, if a team wants to use Splunk, we're going to charge them for it, and the more they use it, the more we charge them. That's not really the message we want to send with our DevOps tool chain.

So what we really want to think about going forward as we build out the tool chain further is we really need to find a way to make it barrier-free for teams to adopt DevOps practices and tools.

And finally, I'm really excited to share that in 2016 Nationwide has an internal cloud offering with self-service, auto-provisioned infrastructure at a fraction of what it has ever cost before. And what I really love about that story is that it motivates application teams to change their application architecture to fit onto standard platforms.

Instead of leading with a plea to please become standard, which a lot of our infrastructure and operations teams do, we get to lure them in with the promise of total control, ease of use, and low cost.

So now I'm going to go ahead and introduce Jim, and he's going to talk a little about what application architecture changes we've been making.

Jim Grafmeyer

Thanks, Cindy.

I'm sure most of you are familiar with the idea of two-speed IT. So systems of engagement in your organization might move at a faster pace than systems of record. This is not something we're embracing at Nationwide. However, we're applying the concept to something we'll call two-speed applications.

The problem we faced was that our systems of engagement happened to be legacy monolithic web apps. These legacy apps had a tightly coupled business tier and presentation tier, so any change to either required a complex deployment of both. This complexity led to long lead times, so these releases only happened once every two or three weeks.

So, for example, if one of our business partners wanted to add some marketing text to the website to, say, promote some new product, they had to wait for two or three weeks for that to go live.

What we did was completely separate the presentation tier from the business tier, and we built out separate paths to production for each. The presentation tier deployment process was engineered to be as lean as possible. We basically made it free for our business to change the presentation tier as much as they want, and it's something that we encourage.
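The talk doesn't show how a change gets routed to one path or the other. As a minimal sketch, assuming a hypothetical repository layout where presentation assets live under `web/` and `static/`, a pipeline could send a change set down the lean path only when every changed file belongs to the presentation tier:

```python
from typing import Sequence

# Hypothetical layout: presentation-tier assets live under these directories.
PRESENTATION_ROOTS = ("web/", "static/")


def is_presentation_file(path: str) -> bool:
    """True if a changed file belongs to the presentation tier."""
    return path.startswith(PRESENTATION_ROOTS)


def choose_pipeline(changed_files: Sequence[str]) -> str:
    """Route a change set to the lean path or the full release path."""
    if changed_files and all(is_presentation_file(f) for f in changed_files):
        return "presentation-fast-path"
    # Any business-tier file (or an empty change set) takes the full release path.
    return "business-tier-full-release"
```

The routing rule is deliberately conservative: a single business-tier file sends the whole change set down the full release path.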

So now, that marketing change that I talked about that could take two or three weeks has now been reduced to hours, and those large complex releases dropped by 80%. They were replaced with 150 small presentation-tier deploys per month.

And probably the most important result of all is our business is in love with us. We gave them the speed that they desired without making them pay for some large application rewrite to microservices. We didn't make them buy a CMS system. This was basically accomplished with process changes to how we deploy our code.

So now that we can move really fast in the presentation tier, what's the next step, obviously? Let's shove a whole lot more into the presentation tier. So we've embraced things like Angular, where the whole MVC framework runs in that presentation tier.

So now we can do things like add new pages, new navigation, other complex functionality that historically took us weeks can now be done in hours.

Next, I'd like to talk through some of the architecture changes that we've made to the application. The first is this whole idea of a 12-factor app. If you're not familiar with 12-factor app, I highly encourage you to go back to your hotel rooms tonight, curl up, and do some light reading. But it's basically a manifesto for how to build modern, scalable, portable applications.

So what's an example of one of the factors? The 12-factor app says you should treat the services your app depends on as attached resources, and you need to be able to attach and detach them at will.

Historically, a lot of us might be good at this for things like web services, but when you look at your entire app and all of its dependencies, it's hard to treat things like databases and file I/O as services that can be easily attached and detached.
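To make "attached and detached at will" concrete, here is a minimal sketch in the 12-factor style, where the app reads the location of a backing service from the environment instead of from code. The `<NAME>_URL` naming convention and the `backing_service` helper are assumptions for illustration:

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def backing_service(name: str, env: Mapping[str, str] = os.environ) -> dict:
    """Look up an attached resource such as DATABASE_URL from the environment.

    The code never hard-codes where the service lives, so attaching a different
    database (internal or cloud-hosted) is a config change, not a code change.
    """
    url = env.get(f"{name.upper()}_URL")
    if url is None:
        raise KeyError(f"backing service {name!r} is not attached")
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path.lstrip("/"),
    }
```

For example, `backing_service("database", {"DATABASE_URL": "postgres://db.internal:5432/policies"})` resolves to host `db.internal`, port `5432`; moving to a new hosting platform would only change the environment value.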

So why is this important to us? There's the obvious benefits of decoupling, but what we're finding the most important is this whole idea of being able to control what we can control inside of our application areas.

Our friends in our infrastructure group are hard at work developing Nationwide's hosting platform of the future. At this point, we don't know if it's internal cloud, external cloud, or some sort of hybrid of both. We also have no influence on the timelines. What we can control, though, is ensuring that our applications are ready to use that platform the day it becomes available. And 12-factor argues that if you follow all those guidelines, your application will be portable to whatever the hosting platform of the future turns out to be.

Next is the idea of feature toggling. Many of you might be familiar with dark launches. A dark launch releases code into production turned off, ideally to be turned on at a later date by a business partner. We consider this table stakes for achieving any sort of continuous delivery.
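A dark launch reduces to a conditional around the new code path. The sketch below uses a hypothetical in-memory toggle store and a made-up feature name; a production system would keep toggle state somewhere a business partner can change it without a redeploy:

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store; real systems persist toggle state."""
    enabled: set = field(default_factory=set)

    def is_on(self, feature: str) -> bool:
        return feature in self.enabled

    def turn_on(self, feature: str) -> None:
        # Flipped after legal/compliance sign-off; no redeploy required.
        self.enabled.add(feature)


def render_product_page(toggles: FeatureToggles) -> str:
    # The new code is already deployed ("dark") but only runs once toggled on.
    if toggles.is_on("new-retirement-calculator"):
        return "product page with new calculator"
    return "current product page"
```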

What we were finding was, even when we were able to achieve speed, sometimes our business wasn't ready to accept the changes at the speed that we were delivering. There's a lot of reasons for this. I think Carmen hinted. Nationwide Financial is heavily regulated, and any change to our web apps could require legal approvals, different regulatory approvals, compliance approvals.

And this idea of dark launching code really decouples us from those processes that we don't control. Once you get good at dark launching, there are a few other variations that we're finding important.

The first would be canary launches. So with canary launches, you roll out a feature to a small subset of users, test it out, things go well, roll it out to everybody.
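One common way to pick the canary subset, shown here as an illustrative sketch rather than Nationwide's implementation, is to hash each user ID into a stable bucket from 0 to 99 and enable the feature for buckets below the rollout percentage:

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_canary(user_id: str, feature: str, percent: int) -> bool:
    """True if this user falls in the first `percent` percent of buckets."""
    return bucket(user_id, feature) < percent
```

Because the bucket is derived from the user ID, a given user sees a consistent experience, and raising the percentage from 5 to 50 only adds users to the canary; nobody who already had the feature loses it.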

And then finally, full-blown multivariate testing, where you develop multiple versions of a feature, roll that out to subsets of users, and then perform true test-and-learn experiments.
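Multivariate testing extends the same hashing idea from on/off to several variants. The sketch below assumes traffic shares expressed as whole percentages; the experiment and variant names are hypothetical, and recording outcomes per variant is not shown:

```python
import hashlib
from typing import Dict


def assign_variant(user_id: str, experiment: str, weights: Dict[str, int]) -> str:
    """Deterministically assign a user to one variant of an experiment.

    `weights` maps variant name -> share of traffic in percent (must sum to 100).
    """
    if sum(weights.values()) != 100:
        raise ValueError("variant weights must sum to 100")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest, 16) % 100
    cumulative = 0
    for variant, share in weights.items():
        cumulative += share
        if point < cumulative:
            return variant
    raise AssertionError("unreachable when weights sum to 100")
```

For example, `assign_variant(user_id, "homepage-banner", {"control": 50, "a": 25, "b": 25})` keeps half of users on the current banner and splits the rest between two new versions.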

All this we're finding is key to drive down our batch size, which Cindy's going to talk about.

Cindy Payne

Thanks, Jim.

So we reviewed tools, platforms, and application architecture with you, but one of the most surprising things is the challenges we face when we look at process changes.

Apart from the example Jim provided and a handful of others, we've been unable to scale our current process down to accommodate small batch sizes. Our normal process is designed for large batch sizes, and its primary focus is to ensure predictable delivery. It does that, but we recognize it is highly inefficient.

So now we have initiatives at the enterprise level that are focusing on workforce, culture, and capabilities that'll drive efficiencies through those processes.

So what are the types of things we're doing? To get out of this vicious cycle of large batch sizes, one of the first things we're doing is we're focusing on fixing our methodology. Our current methodology is pretty heavyweight and the culture around it is that you're expected to justify any time that you have to skip a step.

The new methodology is agile-based, and it flips that expectation: if you want to add a step to the process, you have to justify it.

The second thing that we're doing is looking at our release processes. Currently, we're stuck in a time-based release cycle, where every application releases on the same day every month. What we want to move to is a readiness-based release, much like what Jim described when we decoupled our presentation tier from our business tier. Our presentation tier now has a very lean process that's all about releasing when ready. So as soon as we get business approval and compliance approval, we can go to production.
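The difference between time-based and readiness-based release can be expressed as a gate. In this sketch, the set of required sign-offs (`business`, `compliance`) is an assumption drawn from the example above, and the `Change` type is hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Assumed sign-offs a change needs before it may ship.
REQUIRED_APPROVALS = frozenset({"business", "compliance"})


@dataclass(frozen=True)
class Change:
    name: str
    tests_passed: bool
    approvals: FrozenSet[str]


def ready_to_release(change: Change) -> bool:
    """Ready means tested and carrying every required approval."""
    return change.tests_passed and REQUIRED_APPROVALS <= change.approvals


def release_when_ready(changes: Iterable[Change]) -> List[str]:
    """Ship whatever is ready now instead of waiting for a fixed monthly date."""
    return [c.name for c in changes if ready_to_release(c)]
```

Run on every pipeline cycle rather than once a month, this lets each change go out as soon as it is certified, while anything still missing an approval simply waits.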

Once those two fixes are made, we will have designed our processes and methodology so they don't crush our small batch sizes when we put them through. And once we can get to small batch sizes, we can get to simpler releases with fewer dependencies and faster delivery for our business partners.

On the business partner side, they will need to embrace dark launching, things like test and learn, being able to take calculated risks, and occasionally being willing to fail because we can recover quickly. Those are all new concepts. And those are still the challenges ahead of us.

But now Carmen's here to talk about the next steps.

Carmen DeArdo

Thank you, Cindy. Thanks, Jim.

So just to pick up a little on the last point Cindy made, sometimes I use the analogy of a basketball team. I think in the Nationwide culture, you're better off taking two shots and making them both than taking six and missing one or two, right?

So the idea that we would actually run an experiment, get results, and ask, "Did that add value or not?" is really counter to the culture and the way we thought, and those are a lot of the psychological barriers you have to get through to actually run those kinds of experiments with your business.

So the results, I think, are what give credibility to the culture change, right? We have an internal peer-to-peer teaching model. We have something called Teaching Thursdays, and an internal TechCon, where Gene wowed them last year.

Now I'm not the one talking about these things, right? It's much more powerful when the model lines, Jim, Cindy, the other model line that we have in our direct area, they're the ones talking. They're the ones demonstrating the results. They're the ones, again, that can lend credibility and say, "This is possible," and it kind of drives an internal competition among areas, which allows you to build that critical mass that you need to do a transformational change like this.

It does show that you can balance innovation and some kind of... standardization's kind of a bad word, but I'll call it disciplined innovation, right?

I started my career at Bell Labs, and Bell Labs was known to be very innovative, but no matter where you went, there was a standard way to do things, and I used to say it was sort of in the air. It was not a manual, it just was. You just sort of knew this is the way you did software development.

And so it is possible to do that balance of innovation and still reduce the variance that you need to, to provide the automation patterns that you need in order to go more quickly and build technology solutions. And then again, it gives you those results, right? They're solving real business problems. They're respected leaders, unlike me.

So their story is a powerful agent for transformational change. And I'm very grateful for their support because without that, we could not have gone further in this transformational journey.

However, the journey goes on, right? So where are we going next? One of the things Cindy alluded to is this whole funding model, right? Part of it is how we charge ourselves for things, which creates disincentives. Why would we want to do that, right?

We still have a yearly funding model for our business initiatives. So we still have this kind of waterfall mindset around how we fund and how we're going to do work, right?

This whole idea of decision velocity. So again, a lot of folks I'm sure have moved to Git before us and it was like, what's the big deal? But in our case, it's like there still is that kind of risk-averse culture. If it's not broke, don't fix it. Well, that's not going to get you too far, right? Going to the next decade. You're going to have to break yourself, right?

I say you're going to get disrupted. You're either going to get disrupted externally, or you better disrupt yourself, but no industry is going to get through to 2020 without going through some kind of disruption.

This ecosystem for decisions, right? You can't look at each tool in your chain and say, "Well, I've got to make a business case for that, and a business case for that." I sometimes ask, do you have to make a business case for a hubcap, right? There are certain things that add value to what you're trying to do, and you have to look at them more holistically.

And that's part of breaking down those silos. If the application owners hold the budgets, and they're paying the run support budget and the licensing budget, right? They look at me and say, "Well sure, Carmen, it's not in your budget. You want to bring in these new tools," right?

So that kind of model again is what works against you when you're trying to do something from a DevOps perspective.

I think we need help, right? We see a lot of opportunity with IT service management and continuous delivery, right? We've recently implemented a new service management suite. As I said, we have a delivery pipeline. Things like release when ready. We want to automate that release, that RFC process, right? Why does this have to be this manual process? Why does it have to go in front of a CAB, right? We're doing all that readiness certification.

So there are lots of opportunities there to provide automation between processes like release, deploy, change, and config. All of those are ripe for automation. And now with technologies like Tasktop, you have the capability to do that if you can define those integration patterns.
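As an illustration of automating the RFC step, here is a sketch that assembles a change record from evidence the pipeline already has and routes fully certified changes as pre-approved. The field names and the three evidence checks are assumptions, and posting the record to a service-management tool is not shown:

```python
from datetime import datetime, timezone
from typing import Dict

# Assumed pieces of evidence the pipeline already collects for each release.
REQUIRED_EVIDENCE = ("unit_tests", "security_scan", "business_approval")


def build_change_record(app: str, version: str, evidence: Dict[str, bool]) -> dict:
    """Assemble a change request from pipeline evidence instead of a manual form."""
    missing = [check for check in REQUIRED_EVIDENCE if not evidence.get(check)]
    return {
        "application": app,
        "version": version,
        "created": datetime.now(timezone.utc).isoformat(),
        "evidence": {check: bool(evidence.get(check)) for check in REQUIRED_EVIDENCE},
        # Fully certified changes skip the CAB; anything else goes to manual review.
        "route": "pre-approved" if not missing else "manual-review",
        "missing": missing,
    }
```

Because the record is generated from the same data that certified the release, it could also serve as audit evidence.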

But that gets us to the last step, which is you've got to trust your certification, right? A lot of the anecdotal stories I hear about why we slow down go like this: well, we have to slow down because this happened once, and this happened another time, and that happened another time. But the reality is that those things happened because we didn't really certify readiness.

So once we start to get to a model of certifying readiness and automating it, we have to be able to trust it, right? Because if we don't, we're not really going to achieve the accelerated velocity that we want.

And then finally, and this gets back to another focus area at this conference, one I think we had a very good breakout on today led by Sam and Topo, there's audits and security, right? We have to be more innovative. There are more opportunities with this pipeline, with integration, with more process integration. There's actually more information available to enable audits and security, but you have to build that into the pipeline, take advantage of it, and automate around it.

And so again, as I say every year, we need lots of help. It's an honor to be able to present and talk to people who... I know there are answers out there, so I'm hoping you'll seek us out later and provide us with those answers.

And again, thank you very much.