Patterns for Enterprise Success: The DevOps Journey at Nationwide

San Francisco 2014

Carmen DeArdo

Director Application Development · Nationwide Insurance

Hayden Lindsey

VP and Distinguished Engineer · IBM

Your business depends on software. So it’s critical that enhancements be meaningful and timely – but you are confronted with legacy IT systems, process complexity, and organizational silos that take too long to deliver the software changes needed to support your business. To tackle the delivery of complex enterprise applications, businesses are embracing DevOps, a software delivery approach that focuses on speed and efficiency without sacrificing stability and quality.

In this session, join Carmen DeArdo, Director Application Development, Nationwide Insurance, and Hayden Lindsey, IBM VP and Distinguished Engineer, to learn about DevOps. Carmen will share how Nationwide has improved software quality by 50 percent and reduced system downtime by 70 percent by implementing DevOps processes and tools. Hayden will discuss IBM’s point of view on DevOps, including current and future capabilities to drive ROI by integrating new systems of engagement applications with existing systems of record running on System z and Power platforms.

Full transcript

The complete talk — auto-generated from the talk's captions.

I'm going to talk about DevOps, and I'm going to focus on DevOps for the systems of record. And you saw that little upside-down triangle or what have you, that showed that those systems move more slowly, and maybe there's not too much DevOps happening there. That doesn't have to be the way it is, and the fact of the matter is, if the systems of engagement are moving very fast and the systems of record are not, you have a problem because the apps span both of these, and that's what runs the business. And so we have to be looking at DevOps for the systems of record as well.

And just to motivate a little bit, I just like to call these the fun facts. We forget these things sometimes. Maybe you never knew them. But CICS is used tremendously.

This is a transaction processing system that runs on the mainframe, and its transactions run 18 times more frequently than Google searches. And COBOL is still responsible for the majority of the business transactions in the world, not to mention PL/I, and Assembler, and RPG, and things like that. However, the people that are coding these systems, maintaining, writing new systems, are not getting any younger, and so we need to do something about that as well. The only thing I want to mention here, since you already know what systems of engagement and systems of record are, is that there is this impedance mismatch.

There's need for speed on the front end. If you can't keep up on the back end, you're going to put the business at risk. So we have to do something better in the back end. And here we are dealing with inertia.

People have been doing things the same way for decades. Not years, decades, and it's time to improve. And by the way, if you're not, most of you I know are not, working on these back-end systems and these older, mature programming languages, you probably work with some people that do, and so it is important. Your job will be a lot easier if they embrace these technologies.

Now, this is my one slide from marketing. This is the IBM point of view of DevOps, which, as you can see, is the entire development and delivery life cycle. And we're not trying to just confuse everybody with doing this, but we thought about DevOps and we said the principles of DevOps, like automating everything you possibly can, are exactly what we've been doing with our collaborative life cycle management or application life cycle management for years. And so we said, "Let's just fold it all under this term." And if that confuses you, I'm sorry.

The other thing, it's not just about, obviously, speed of delivery, it's about speed of getting feedback. And you guys all know this, but that's why there are a lot of circles and arrows and such on this slide. All right. I condensed a whole lot of slides onto this one, and I'm just going to give a few examples of the things that I see out there, because I do visit with a lot of clients around the world and talk to them about what is possible.

But the state of the art, or the state of the norm, out there is that people doing these back-end systems on the mainframe or on Power systems or what have you are using the most outdated tools you can imagine. How many people have heard of something called ISPF? All right, there's a few people who are going to understand how awful this is. This is a green screen editor that is over 30 years old, and I would assert that at least 95% of the mainframe developers are still using it, despite the fact that, for more than a decade, we have had Eclipse-based modern tools for doing COBOL development or Assembler development, which are exactly the same as what you'd use for doing web or mobile or Java or what have you.

So there's a lot of room for improvement, and the same is more or less true for the team tools, source management, defect tracking, and that type of thing. Second problem, as I already mentioned, these people are not getting any younger. So the challenge, of course, is to bring in new talent to take care of these systems and enhance them when folks decide to retire, which they will. Additionally, especially the team tools, but also the IDEs are totally different, and they're disconnected from what the distributed web and mobile folks are using.

And so you have a lot of time and effort wasted on coordinating across the environments. And then there is a lot of FUD out there. I don't know who to attribute this to, but there's a lot of FUD saying, "Well, oh my God, they're not teaching COBOL in university anymore." Well, by the way, when I went to university almost three decades ago, they weren't teaching COBOL. Okay?

I learned that very useful commercial language called Pascal. Okay? And then I went to work for IBM one summer and read Kernighan and Ritchie and wrote 3,000 lines of C code because programmers learn languages. And the people that are coming out of university now, you don't ever hear them say, "Well, I can't learn PHP.

I can't learn JavaScript." Well, they can learn COBOL as well. You just pay them. They will learn it. Okay?

So, anyway. That one I like to get on my soapbox just a little bit about. All right. The other thing is manual testing is the norm, and if you can't automate your testing, that is going to be the bottleneck in your software development and delivery lifecycle.

And there's a lot of cross-platform coordination required and so forth. Now, what can you do in order to move at least in the direction of the unicorns and start down this DevOps path? Now, when you're coming from the place where this backend development is going, it's a tremendous challenge, it's a tremendous cultural change, but of course, the opportunity is way greater than if you're coming from a more modern place. So the things that you can do, you can have modern IDEs, as I said, that are exactly the same as they're using for the distributed development.

In fact, we have one IDE that has everything from mobile support to COBOL and Assembler all in a single IDE. We have team tools that allow you to unify onto one team platform all of your development people. And whether you're dealing with JavaScript or COBOL, independent of where it's going to run in production, you have a tool to integrate and let the tool do the coordination, the builds, deciding when to integrate into a build. If you have a defect that requires mobile changes and COBOL changes, it will do that coordination.

All right? There's a lot more tools, but that's just an example. As you've heard, if you're trying to do culture change, you're not going to roll it out all at once, so start with a pilot. Gain some success and some confidence, and then put together a rollout plan.

And if you're talking about large enterprise, it is going to take several years to roll out to hundreds or thousands of developers across dozens of teams. And there's already been mention about having exec sponsorship. When you think about these backend teams where they've been doing things for decades, inertia is the enemy. You're not going to have the grassroots adoption like you have in the distributed arena.

Okay? So you must have the top-level support in order for the tree huggers to start changing. And then once you start getting some success, people will get on board. Okay?

But nobody wants to be first. People are very risk-averse. All right. And there's lots of other things.

Clearly, you need to automate tests, then I think worry about automating deploy. You can virtualize services or stub things out so you can test earlier. In the mainframe case, we have a solution for you to actually test off the mainframe so you're not using MIPS. On the mainframe, you have to pay for your cycles.

And so if you're going to increase your builds and tests by a factor of 50, or 100, or 1,000, and you're paying for those cycles, the CFO will not allow it to happen. So move it off the mainframe, have your own test environment owned by the development community, just like it's done in the distributed arena. All right. So I want to give just a couple of case studies very quickly.

I'm going to do these very quickly. And this one is a little bit eye-popping, but these are the customer's numbers. By adopting Agile and DevOps, developer productivity up 1,600%. There's no company name on here, so it's hard to verify, but that's what they say.

All right. By doing, I don't know, by automating deployment, deployment time goes from up to two days to a few minutes. We all know that this is possible. That's what automating deployment's all about.

And automating impact analysis with a tool versus manually. This client says 10 to 20 times faster. Again, you believe it because that's what a tool can do, crunch through stuff. And when you're talking about millions of lines of code, trying to do that manually, you know it's going to be hugely error-prone as well.

VP Securities, a client I work closely with, they converted, did an automated language conversion from an old 4GL to a modern one that we have with virtually no problems when put in production. And Cathay's, they decided to consolidate from three team platforms and SCMs and so forth onto a single one, and they feel like it improves communication and integration and so forth a lot. Now, and this is my last slide, and I'm going to ask Carmen to come up in just a second, and I know I've taken too much time. The one help that I would say, and it was actually the IDC speaker talking about it.

We have a lot of metrics. Now, this one... Okay, I keep wanting to point up here. Cuts cost.

Is it 1% or was it 75%? So we need to get a little bit more quantified results, and the other thing is be able to translate things not just in technical terms like we deployed in two seconds. What does it mean to the business? So anything you guys can think about around business metrics, that helps convince the CFO and the CEO, not just the CTO or CIO.

All right. Carmen. Thank you very much. Hi.

So I don't have Buzz Lightyear. I could've shown Peyton Manning eating a chicken Parmesan sandwich. Anybody seen that commercial? Feedback for marketing department.

So Nationwide, all aspects of insurance, pet insurance, financial systems, 35,000 employees, lots of regulation in different industries. We have about 8,000 IT professionals, and we have now 105 agile teams, and that's growing at a rate of 35% a year. So 64% of our development work, new project build, is going through agile teams. So how'd we get there? I joined Nationwide nine years ago.

We didn't have anything going as far as agile goes. Hayden, I think there was some reference to an idiot. I don't know. There was unicorns and horses, and I guess I'm the idiot.

But yes, go talk to the mainframe teams. If there's anything worse than talking about agile, talk to a mainframe team who hasn't hired anybody this century about continuous integration, and then it's like, what...

You. Yeah. So how did we get there? So I think we started a lot like I think you heard about GE.

We started with a small area. We did have some believers. We had what I would say dabbled in agile. We had started projects.

We hired consultants. We did the project. We did the norm, form, storm, perform, whatever order that's supposed to be. And then we declared victory, everybody went off, and we had nothing persistent in our environment because everything was project centric.

So we said, "Look, that's not going to work. We've got to change our model. We've got to build agile teams around our most important assets and bring the work to the teams, and keep those teams together and invest in practices, both engineering practices and management practices, around agility." So we started with some true believers that were spread across the organization. We had the cover of a VP, a senior VP, which was very important to get started because in the beginning, we talk about DevOps being a bad word.

Well, agile was a bad word, and if anything went wrong, it was those agile guys' fault, right? If the toilet overflowed, "Were those agile guys in there? What's going on?" Right? It's agile.

Everything was agile. So you need some time because things are not going to go smoothly at first. You need some time. You need to get your footing.

You need to establish some common practices and tooling. Yes, you need innovation, but you need some kind of standardization if you're going to have some kind of scaled approach if you have 2,000... We have 2,000 developers, so we're going to have to have some discipline to go along with that innovation. After about a year, we started to get results, right?

And you'll see them on one of my later charts. Quality's up, productivity was up. Everything was good. And so then you have a story to sell.

But in that first year, you need a little bit of cover, and it was investing in that concept of an application development center and having the backing and some of the cover of management that allowed us to begin. We did start with systems of engagement, okay? So we didn't jump right into a mainframe system. But we also were saying, "We're going to apply this across all technologies.

We're not going to build something that works just for one set. We're going to do this for Java, we're going to do it for .NET, and yes, we're going to do it for mainframe." So we did start very early on to bring mainframe teams into the application development center and apply some of these practices. So it makes you think about what does continuous integration mean for mainframe? Right?

You need a development environment. One thing we learned early on was we were getting a lot of value out of the development environment. The teams own the environment. They could deploy on it constantly, multiple times a day.

They would run all their tests. Anything that was broken, they would fix at the beginning of the next day. We developed patterns of provisioning those environments so that new teams could get them right away. They had Maven, Jenkins, Sonar, security analysis, all those things in that environment.
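One way to picture a "pattern of provisioning" is as a reusable template that gets stamped out for each new team. The sketch below is only an illustration under assumptions: the class names, the pattern name `java-dev`, and the team name are hypothetical, and the tool list simply repeats the tools named above. It does not describe Nationwide's actual provisioning mechanism.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EnvironmentPattern:
    """A reusable description of a development environment (hypothetical)."""
    name: str
    tools: Tuple[str, ...]


@dataclass
class TeamEnvironment:
    """One environment created from a pattern and owned by a single team."""
    team: str
    pattern: str
    tools: List[str]
    owner: str


def provision(team: str, pattern: EnvironmentPattern) -> TeamEnvironment:
    """Create a team-owned environment from a shared pattern."""
    return TeamEnvironment(
        team=team,
        pattern=pattern.name,
        tools=list(pattern.tools),  # copy, so a team can adjust its own list
        owner=team,                 # the team owns what it deploys to
    )


# Tool names come from the talk; the pattern and team names are made up.
JAVA_PATTERN = EnvironmentPattern(
    name="java-dev",
    tools=("Maven", "Jenkins", "Sonar", "security-analysis"),
)

env = provision("claims-team", JAVA_PATTERN)
```

The point of the template is that a new team gets the same toolchain as every other team without filing separate requests, while ownership of the resulting environment stays with the team.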

Well, we needed to do the same thing for our mainframe teams. And there is the ability within the IBM stack to start to bring some of that modernization, not just in the Eclipse IDE, but also in having a virtual mainframe that you can run and treat as a development environment. And it's not quite as slick. I'm not going to say yes, it's just as slick.

You push button, check in, runs continuous integration. But you can meet the goals of having an environment that the team owns for their hard test cases. And the first time we gave it to them, I used to say it's almost like bringing a Ferrari to people who don't know how to drive that much. They look at it, it's pretty, but I don't know if I want to touch it, and I really don't want to get in it, right?

Because it required a skill set they weren't used to. They weren't used to system programming level of really owning. It's like, "Yes, it's okay. You can IPL the box." Really?

I don't have to call five infrastructure people and submit five requests? "No. Look, there it is." Oh, wow. Call the police.

So it was... people say they want to be empowered, but sometimes you sort of challenge that. Well, here it is. Go run with it, right?

And we do have traction now with our mainframe teams taking advantage of some of those modernization tools. So what was the road to change? As I said, we sort of changed the model. As you've seen before in some of the other presentations, we have cross-functional teams.

So we have teams of 12 to 24 people. We have pair programming, eight to 10 developers, testers. There's a product owner who sits with the team. There's a Scrum Master, or iteration manager, who sits with the team.

We have an infrastructure delivery lead who's assigned to an area who can help work through some of the environment issues and other infrastructure issues for that team. So if you look at it, what we did is, although we weren't thinking about it at the time, we sort of took the DevOps model that Hayden described, and we optimized the middle. We optimized the design, develop, test part of it. You have to start somewhere, and that's sort of where our investment was, and it has proved very successful from that perspective.

So where we are today is, as I said, we have about 100 or so of these teams, and they're growing at 35% a year. And this is not being mandated. So we did not come across with a heavy hand and say everybody's got to do this. What we did say, if you're going to do Agile, here's the way you do it.

We also obviously have a culture of learning, a culture of continuous improvement. So it's not as if we feel we have the answers. We know we don't have the answers, and we continue to get better. But if you're going to do this, we have two organizations.

We have the ADC. This is where we sort of have the pilot and where we've grown to scale this, and we continue to do new things there from a continuous improvement perspective. We also have an organization that I'm part of which helps do the training, the practices, makes the standard tools, sort of drives some of that roadmap. Because you do need to make an investment if you're going to get 2,000 people doing this at scale from a development perspective.

So as I said, we sort of optimized the middle. But we're not really satisfied with that, because if you look at the end-to-end process, we still have some issues. So from the middle perspective, everything's very good. It shows up in a prioritization.

We have epics, we have stories. It shows up on this backlog. We do visual management. It goes through its set of iterations, and then we mark it done, and life is good.

But we have some problems. First of all, where do these things really come from the backlog? We have all these team spaces now, open spaces. We have tooling.

We have some of the Rational tooling for the visual system management. But it's like, where's the attachment to the actual business portfolio? Because we have a very complex business model. We have around 20 different business units.

So if you look at our main offices, life and auto and homeowners and pet insurance and financials and retirement plans, they all have their own portfolio leaders and plans and everything they have to do. So that planning component is kind of separated. It's not visible. We sort of have this hidden factory going on because the work gets planned, it's in spreadsheets, and then through a lot of manual flurry of activity, it may show up in a backlog, and then it goes very smoothly through these iterations.

But then where does it go? Well, we don't deploy... I like the distinction made this morning between delivery and deployment. We don't deploy at the end of that iteration.

It's still got to go through some managed environments. But as far as having our visibility, it's sort of dropped off the face of the earth. So we need to expand some of the agility and lean principles that we're doing across that life cycle. So some of the things that we're currently doing is things like how do we start with the beginning, the projects, the work requests?

We use Clarity for a lot of our initial planning work. That starts it. We need visibility to that as soon as possible. A lot of the work we do impacts multiple applications.

There's a lot of dependencies. You need to have visibility to that, because if you don't, then what's going to erode speed is going to be all those dependencies. We heard about that before. The other thing is trust.

If you can't see what's going on and you have a bunch of manual activities, you're going to slow down because you're really not sure what's around that next curve. So we're starting to work to build out this end-to-end model, which we have prototyped right now in the lab, where you start to see the release planning. These portfolios are becoming visible. They're attached to the team's backlog.

So you can start to do some flow leveling. DevOps and continuous delivery is about continuous flow. So you need to be able to see where do I have opportunities to do work. If I don't know when I have opportunities to do work, I'm going to miss those.

We're very good at delaying things. We're not good at looking for early opportunities to do work. Sometimes we have teams sitting around starved for work. Well, that's a form of waste.

They're sitting around, they need work to do, and yet we don't have enough agility and visibility to our front-end processes to actually get that work to them so that they can continue to go through this design, develop, test model that we've created. And then on the other side is the deployment side. We still have a lot of governance and a lot of... How shall I put it nicely?

We have this thing called 4-2-1. Which essentially guarantees we're going to delay every delivery by four weeks. And actually, it's a good thing, though, because we had so much chaos around delivery, you have to start somewhere. So we started with some standardization around scope lock, code freeze, scope freeze, test freeze.

I'm sure this resonates with some of you out there. And that's okay as a stopping point, but if we're still doing 4-2-1 in five years, we've failed. We haven't gotten any better. I was kind of joking with the release manager, and I said, "Well, if I take the software and I put it in the corner for a week, and if I put it in the corner and come back a week later, did it get better?" I don't think so.

Time does not make software better. It's the fact we're doing all these manual activities to guarantee the quality of the software that causes us to go slow. It's like, again, driving a car. If you don't trust the road, if the weather's bad, you're going to slow down.

Well, we were slowing down because we couldn't trust the quality of what we were producing, and one of the reasons we couldn't trust it was because we couldn't see the information. We have different information and different tools. In our quality system, we have information about defects. In our security system, we have information about security scans.

When we deploy, our deployment process was completely disparate from our release process. It was like another manual activity. Nobody knew what the path to production for these systems was. Unless you talk to the tech lead or the system engineer, you don't know how many environments they have and what they're going through.

Or do they have an IT? Do they have an ST? When are they doing PT? Are they not doing PT?

Where's their UAT? You don't know these things. And so because of that, it was a bunch of manual activity, and again, we didn't really have a model of the system. So what we've done now and what we're piloting now is that complete end-to-end model where, through some integration capabilities, we have a lot of different tools.

IBM's a great partner, but we have HP tools, we have CA tools, we have other tools. So we're leveraging things like Tasktop to synchronize data across all that. So when a tester puts in a defect in Quality Center, it shows up in our Rational Team Concert as a work item to work the next day. If there's requirements work getting done in RC, it shows up in Quality Center so that you can write test cases against those requirements.
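The defect-to-work-item flow described here can be sketched as a one-way, idempotent sync. This is a minimal illustration under assumptions, not Tasktop's API or Nationwide's configuration: the `Defect`, `WorkItem`, and `WorkItemTracker` types, their fields, and the sample defects are all hypothetical stand-ins for records in the two tools.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Defect:
    """A defect as logged by a tester in the test tool (hypothetical shape)."""
    defect_id: str
    summary: str
    severity: str


@dataclass(frozen=True)
class WorkItem:
    """The mirrored record in the team's work tracker (hypothetical shape)."""
    source_id: str
    title: str


@dataclass
class WorkItemTracker:
    """In-memory stand-in for the work-tracking tool."""
    items: Dict[str, WorkItem] = field(default_factory=dict)

    def upsert(self, item: WorkItem) -> bool:
        """Create or update an item; return True only if something changed."""
        if self.items.get(item.source_id) == item:
            return False
        self.items[item.source_id] = item
        return True


def sync_defects(defects: Iterable[Defect], tracker: WorkItemTracker) -> int:
    """Mirror each defect as a work item and return how many items changed."""
    changed = 0
    for defect in defects:
        item = WorkItem(defect.defect_id, f"[{defect.severity}] {defect.summary}")
        if tracker.upsert(item):
            changed += 1
    return changed


tracker = WorkItemTracker()
defects = [
    Defect("D-1", "Quote page times out", "critical"),
    Defect("D-2", "Typo on policy summary", "minor"),
]
first_run = sync_defects(defects, tracker)   # both defects are new
second_run = sync_defects(defects, tracker)  # nothing changed, nothing rewritten
```

Keying each work item on the source defect's ID is what makes the sync safe to repeat: running it again neither duplicates items nor touches ones that have not changed.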

So there's a lot of capability now to integrate tools, integrate practices first, because tools can only serve practices, and if you don't have some common practices, it's going to be hard to implement common tooling. But to integrate some of those tools in a way so that you can sort of see the work being visible and also drive continuous delivery. So to me, that's sort of the next hurdle: to take what we've learned in the agile development space and apply it end to end. So here's some of the results we talked about.

Critical defects. We're getting to the point where we find very few defects in system tests. We're able to reduce those intervals. Productivity, on-time delivery.

Our system availability is higher. So the kind of things that you would expect when you're starting to apply some of these practices and you're finding things earlier in the process when they're less expensive to fix, and you have a lot of automated testing that you can rely on so that you can embrace change. So to the Yodas out there, what could we use help with? So I think if you've listened to me, you know I need a lot of help.

So this whole planning thing, we need to get better at agile planning. We still have a yearly planning cycle. So even though we've sort of changed everything behind the curtains, we've tried to keep the business sort of out of it as much as possible, but at some point, the business has got to be more engaged. Yes, we have a product owner who sits on the line and helps drive the team, but the actual people in the business, senior levels, the people who are planning the work, we have to get off this yearly cycle and start to be more agile and also break things down into smaller chunks so that we can feed them through the factory.

Product ownership for shared applications is a problem. The model sounds nice when you have somebody there who's the product owner and can prioritize, but if you have five different business units all using this application, all thinking their work is the most important, how do you deal with that when you're trying to schedule the work in the next set of iterations and releases? The whole fear of silos, you've heard it. How do you overcome fear of the comfortable silos?

One of my favorites, ITIL. We have a very strong ITIL presence. I had some passionate conversations about people who want to optimize the ITIL stack, service, problem, change. When they talk about release, I say, "No, no, release can't go there.

Release is part of the continuous delivery," and that's when you get called an idiot again. So how do you deal with that? I've seen some talks, well, ITIL and this, and they can all work together. But I'm looking for suggestions on how to make some of those things work because you have to sort of pick what you're going to optimize and there can be some clashes there.

And then again, executives. Executives like things planned. They like certainty. This is creating more of an adaptive mindset.

It's saying we may not be sure the next three months. For now, we know what we're doing, but three months from now, I don't know. It depends what we find out. Well, that's not going to get you many points working with executives.

They would rather see the next 18 months planned. That's sort of counter to where we're trying to go here. And then metrics. I appreciate some of the metrics that have been shown, because I'm going to use them; I think that's one of the things we're trying to sell. I really like that S&P metric. If I'd have known that, we could all have been on vacation.

But anyway, those metrics are important because we don't really even have a metric right now around delivery speed. I like the saying, time is the new currency. I actually said, I was in a meeting with one executive, and I said, "Well, this is going to help us get faster for delivery." And they said, "Well, do we want to get faster?" This is being taped, unlike the Disney thing, so I could be fired after this. I shouldn't have signed that waiver.

I don't know. I think we want to get faster. So you can be very successful. We didn't hit rock bottom.

We had a talk about, well, you hit rock bottom. You hit rock bottom, that's bad, but in a way, that's good because then people are ready to change. If you do heroic work to keep from reaching rock bottom and you hide all these things, people say, "Well, it's not really a problem. What do we really need to do?" So those are the kind of things I need help with, and anyway, thank you very much.