Banking on Innovation & DevOps

San Francisco 2015

Capital One, one of the largest credit card companies and the largest digital bank in the US, is also a 20-year-old technology company. A couple of decades ago, Capital One disrupted the credit card industry using “Big Data” analytics to customize credit offers before the term “Big Data” was coined.

Our current focus is bringing humanity and simplicity to banking, and we are poised to disrupt the industry through technological innovations. To do this, we are embracing modern technology such as public cloud and open source, and principles such as Agile and DevOps, and recruiting and fostering the best technical talent in the industry.

We started this journey about four years ago. Today we practice Agile, DevOps, and Lean principles across the whole enterprise. Insourcing technical talent, rather than depending on vendor-provided solutions, is what helped us rapidly transform and get back to our roots as a technology company. We have internal communities of practice for all the latest technology areas (DevOps, Cloud, Open Source, Containers, Programming Languages, etc.), and we recently held our first internal Software Engineering Conference with 1,200+ participants.

We are creating our own solutions based on a deep understanding of what our customers value.

We love open source. We follow an “open source first” policy across our entire software stack. We are also in the process of open sourcing many of our internal tools, which allows us to increase quality and speed to market. We are building cloud-native digital solutions using microservices and deploying them on the public cloud. Through it all, we have developed a culture of innovation, openness, and creativity while living up to our startup mindset.

In this presentation, I will share some background on this journey and some of our learning along the way.

Full transcript

Topo Pal

Good morning.

I think it's the first talk, so it might be a little weird to start the talk with a bank being here and talking about their DevOps transformation. And thank you, Gene, for the kind introduction.

The topic of my talk is "Banking on Innovation and DevOps."

Before I do that, I want to say that I'm really honored to be here and to present Capital One's journey to become a modern-day technology organization. I have been a fellow traveler in this journey, and a lot of times I've been leading some of the key initiatives that are going on in Capital One in terms of DevOps and open source.

More than sharing my story, I'm here to learn a whole lot more from you guys, and this has been an awesome conference. Can I just pause here and ask for a big round of applause for the organizers?

Thank you. Thank you.

Before I begin my talk, I want to talk about Capital One. Most of you know that Capital One is a credit card company. We are one of the largest in the U.S., with over 70 million accounts.

Many of you know that we are also one of the largest banks in the country. In fact, we are the largest digital bank in the country.

But I don't know how many of you know that we are a small technology company led by our founder, and we are just 20 years old. If you look at the chart, our closest peer is about five times older than us, so we are kind of a startup in the banking industry.

But we disrupted the credit industry with our leader's vision. He thought that every single person in the U.S. had the same credit card, and it made absolutely no sense.

So we wanted to fulfill that vision. We used data technology and data science to execute an information-based strategy. We designed products based on customers' needs, passions, and life stages. It's not one size fits all. Everybody had different needs. We made adjustments to our products and offerings based on the data analysis, the results that showed up in that process. We predicted business results before full-scale market deployment.

And we have been doing big data even before the term "big data" was coined.

But we had this concept or this notion of banking in our head, and we wanted to change that to become a digital bank. Digital is our new branch.

What's striking is the way mobile has become the preferred channel for banking. It's a complete digital revolution.

Look at this chart. If you notice, right now mobile banking usage is about double the size of the usage on the web. So it becomes almost a given that we need to transform our core software engineering practices to compete with that kind of advancement in mobile technology.

This is what our leader said: "Ultimately, the winners in banking will have the capabilities of a world-class software company."

And that is our current focus: to be a world-class software and technology company.

But it took some time to get there, because our core value was data analytics. We were not so good at delivering software.

This was about four years ago. We were mostly outsourced. We were waterfall. Even though waterfall looks good in a picture, in the software industry, it's no good.

We did quarterly releases. We released software very carefully, slowly, with lots of planning, lots of documentation, meetings, approvals, tickets, so on and so forth.

And we had manual processes all over the place, right from building software to testing, to deployment, meetings, all those things.

And the most amazing was the change order process. I'll tell you a story about that.

I was hired at Capital One as an SOA architect like five years ago. This is my fifth year running. And I was in a design review meeting, or not a design review meeting, it was like a code review meeting.

People came up with a problem saying that their testing was failing because of some mismatch between the WSDL on the client side and the service side. What came out is that one XML tag field was kind of mismatching: `custLName` versus `custLastName` or something, and testing was failing.

Now everybody was thinking what to do. I said, "Change it."

I know what you're thinking.

And people kind of looked at me and said, "Change what?"

I said, "The tag."

"Well, we can change it on your laptop, but that's where it ends. It's not going anywhere."

I said, "What do you mean it's not going anywhere? If I change it and commit my code, it should get built automatically and get tested automatically and..."

"Ugh, stop it."

So what happened was it came to a point where, to change the tag name, you needed to have some kind of change order, because the people who coded are an outsourced company, and they did not touch any code unless you asked them with all these processes.

So I said, "I need to do something about it. Maybe just automate the build process."

And that's where I started with the seed of DevOps. We didn't call it DevOps at that point, because I think the DevOps term was just emerging in the industry. It was purely and simply: how do I automate my build process?

And then we brought in some other concepts also, such as a good static code analysis tool, a good binary repository, and we lit some campfires. We took a small team, did all this stuff, and we found some good results.

Initially, before all this, the build used to take 48 hours. Yeah, two days at least, if everything went well. And if something broke, it came back, and then you had to submit a manual ticket to get your software built.

But with this, the build just happened in minutes, as you all know.

And then we shared this amongst our peers and leaders, and that resulted in many campfires.

Now, before it went out of control, we paused a little bit and thought that maybe we need to build a strategy around DevOps for the whole enterprise.

I was a part of enterprise architecture as an enterprise architect, as I said, so we thought that it must be a good idea to build an enterprise-level strategy around DevOps to actually tell people what DevOps means.

And this is what we came up with. We have the development teams, which had architecture, design, coding, and testing. And, of course, this is business coming closer to the development: typical agile transformation.

To get our builds done in time, deployments done in time, with a lot of automation, we needed operations to come closer to the development.

At the same time, there was this movement going on in the company about going from waterfall to agile. I showed this picture to everybody, everybody nodded their head, and then the last one was with InfoSec.

And they said, "Yeah, sure. You're doing that. How about security, application security, and all that? You can deploy your code to development, but then we need to scan and look at many things."

So I said, "Why don't you guys help me?"

And there it was. Information security said, "We are going to come closer to the development and see what you're doing and try to help you."

And we call that DevOpsSec. Whether you call it Rugged DevOps, DevOps security, or just pure DevOps, it's the same thing. Get all the people who have a stake in the software delivery life cycle to come closer and do the same thing.

This is our DevOpsSec pipeline: everything as code. And then we build it, we deploy and do the test execution, and then we release and monitor. Pure and simple.

Everything is code. Infrastructure is code, application is code, test, of course, is code, and everything is checked into source control. Nothing is sitting on somebody's laptop or some server. Everything is in source control, and it is peer-reviewed before you get merged to the main branch. You can do pull requests or feature branching or whatever, but the idea is the same.

It gets built and then deployed in many environments, starting from dev to int to QA to security to performance, all the way to production. Lots of test automation happens in every step. Sometimes they get manually promoted to the next environment. Sometimes they are automatically promoted to the next environment.
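The promotion flow described here can be sketched in a few lines of Python. This is only an illustration, not Capital One's pipeline: the environment order comes from the talk, while `MANUAL_GATES`, `promote`, and `PromotionResult` are hypothetical names, and the assumption that only production needs a manual approval is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Environment order as described in the talk.
ENVIRONMENTS = ["dev", "int", "qa", "security", "performance", "production"]

# Hypothetical policy: environments that need a human approval before
# a build may be promoted INTO them. Everything else promotes automatically.
MANUAL_GATES = {"production"}


@dataclass
class PromotionResult:
    reached: str          # last environment the build was deployed to
    stopped_because: str  # "done", "tests_failed", or "awaiting_approval"


def promote(run_tests: Callable[[str], bool],
            approvals: Dict[str, bool]) -> PromotionResult:
    """Walk a build through the environments, testing at every step."""
    reached = ""
    for env in ENVIRONMENTS:
        # Manual gate: stop and wait if nobody has approved this step yet.
        if env in MANUAL_GATES and not approvals.get(env, False):
            return PromotionResult(reached, "awaiting_approval")
        reached = env  # stand-in for the real deployment step
        # Automated tests run in every environment; a failure halts promotion.
        if not run_tests(env):
            return PromotionResult(reached, "tests_failed")
    return PromotionResult(reached, "done")
```

In this sketch, a build whose tests pass everywhere stops at performance until someone approves production, which mirrors the mix of automatic and manual promotion mentioned above.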

And then we release the infrastructure code and application code.

We also support tools like service virtualization, test data management, browser and device farms to automate many things in the process.

One thing that is striking about this pipeline: I have to say that there are two pieces of news. One is good, one is bad.

The bad news is it's never going to be perfect. You'll keep working on modifying this pipeline till the cows come home.

The good news is it's only going to get better.

The flow of events goes from left to right. Feedback is on the other side.

But there was a problem building servers in a data center. I can do all this if there are servers available to me. Now, what we found is that building a server in a data center is not easy.

Sixty-two steps. Sixty days, at least. And these are some rough numbers. It can be different for you, but that's kind of the ballpark.

$25,000.

But that's not all. The pain. And every time that happened, I felt like this.

And the opportunity cost, who knows?

So what we thought is, we'll basically go this route: public or internal cloud. We called it a next-generation infrastructure team, and I'm a part of this particular team.

Our goal is to actually not have developers wait for any infrastructure for doing anything. They can go from code to deployment to wherever they want, with whatever kind of infrastructure they want, just with a click of a button.

We have been running our dev and QA environments on public cloud for a while now, and we have actually moved to production with some of the public clouds. We also heavily use our internal cloud, and we want to verify that what we do is correct before we move on to public cloud.

And we did this too. Right now, we are running some of our critical production load in Docker containers on public cloud. And being a bank, I think that's huge.

By the way, we also launched our first open-source tool in July during OSCON. The reason I bring this up is because, while doing public or private cloud and Docker containers on cloud, we did a lot of tooling ourselves because the whole area is not mature enough. So we went to production with the tooling that we developed internally.

As a part of DevOps, this is one product that I'm proud of. We use that to actually track the health of our software delivery pipeline, and we thought that we should open source this so that people can use it.

I'm the community manager of this product, and I'm also a core committer of this product. Our internal committers are from DevOps, quality engineering, and the delivery transformation team.

You can go to this GitHub site, `github.com/capitalone/hygieia`, and you'll find there's a good video on this. But I'll give you a preview of that here.

It's a single dashboard where you can see the features that your team or the product team is working on, the coding activities. It can plug into your source control. It shows the status of your coding activity over the last 14 days, like through two sprint cycles.

It gives you an overall view of your build health, the failed builds, the passed builds. It gives you statistics on how many builds you are doing over the last seven days, the last 14 days, how many you have done today.
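To make the build-health numbers concrete, here is a minimal Python sketch of how such a summary could be computed from a list of build records. It is not how Hygieia implements it; the `Build` tuple shape and the `build_summary` function are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical build record: (time the build finished, whether it passed).
Build = Tuple[datetime, bool]


def build_summary(builds: List[Build], now: datetime) -> Dict[str, int]:
    """Count builds today, over 7 and 14 days, and pass/fail over 14 days."""

    def in_window(finished: datetime, days: int) -> bool:
        return now - timedelta(days=days) <= finished <= now

    recent = [(finished, ok) for finished, ok in builds if in_window(finished, 14)]
    return {
        "today": sum(1 for finished, _ in builds if finished.date() == now.date()),
        "last_7_days": sum(1 for finished, _ in builds if in_window(finished, 7)),
        "last_14_days": len(recent),
        "passed_14_days": sum(1 for _, ok in recent if ok),
        "failed_14_days": sum(1 for _, ok in recent if not ok),
    }
```

A dashboard widget would feed this from the build server's job history; the 14-day window matches the two-sprint view mentioned earlier.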

It gives you a view of your software quality and shows you if you have any blocker or critical defects in your source control, your unit test coverage. And it also can show you some security vulnerability in your code if you can tie into some of these commercial products that you may have.

It also gives you a deployment view in your environments and shows you whether your environment is up or down.

It also has a pipeline view where it shows your component life cycles through dev, QA, int, performance, and production: the versions.

And in the next releases, which are coming out soon, we are going to have a cloud view: number of instances that you have been running, how many are tagged correctly, how many are encrypted versus non-encrypted.

By the way, this is just some sample data, not the actual data. All our instances are encrypted.

And the number of instances that are up for a while and you're not bringing them down, and the CPU utilization on all these instances. You can get a list of all these things just by clicking any of these charts.

The whole idea is to manage your cost and health of your instances on any cloud environment. Because in your cloud environment, if the instances are running for long, it's no good, because then you are losing the patch cycles and all that.
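The checks behind such a cloud view (tagging, encryption, long-running instances) can be sketched as a simple audit. This is an illustrative Python sketch only: the `Instance` record, the `REQUIRED_TAGS` set, and the 30-day `MAX_AGE` limit are all assumptions, not details from the talk or from any cloud provider's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

REQUIRED_TAGS = {"owner", "application"}  # hypothetical tagging policy
MAX_AGE = timedelta(days=30)              # hypothetical limit before a rebuild


@dataclass
class Instance:
    instance_id: str
    launched_at: datetime
    encrypted: bool
    tags: Dict[str, str] = field(default_factory=dict)


def audit(instances: List[Instance], now: datetime) -> Dict[str, List[str]]:
    """Group instance IDs by the hygiene rule they violate."""
    report: Dict[str, List[str]] = {"untagged": [], "unencrypted": [], "stale": []}
    for inst in instances:
        if not REQUIRED_TAGS.issubset(inst.tags.keys()):
            report["untagged"].append(inst.instance_id)
        if not inst.encrypted:
            report["unencrypted"].append(inst.instance_id)
        # Long-running instances miss patch cycles, so flag them for replacement.
        if now - inst.launched_at > MAX_AGE:
            report["stale"].append(inst.instance_id)
    return report
```

In practice the instance list would come from the cloud provider's inventory; here it is passed in directly so the rules stay easy to read and test.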

And there is more to come.

We have developed some awesome tools in-house to help our data analysts, and we call that Analytics Garage.

What it is: if you look at any data analyst, they want to use a lot of tools in isolated form, and every day they ask for new tools. So we have built a platform, a Docker-based platform, that will help analysts to actually launch an instance with some specified tool or tools of their choice, run their analytics, get the result out, and kill that instance just by clicking a button.

We'll be open-sourcing this sometime early next year.

We also developed a tool which is basically a push button for getting a developer's environment. Our goal is to have a developer join Capital One, get his or her laptop, start coding, and deploy that code to production on day one. And this is kind of going towards that.

That is not all. We have been actually contributing to open source for a while, for about 18 months.

But before I go there, this is one of our blog sites, `capitalone.io`. We have been posting technical blogs over there, and we'll be announcing our future open-source software releases in that blog post.

These are some of our open-source contributions. We have lots of people contributing to many of these GitHub repositories, as well as to these well-known open-source projects, from Hadoop to D3, Apache Spark, Storm, Solr, Jenkins, Elasticsearch, NGINX, and others.

Now, there's a reason that we are doing this. The question is: why are we open sourcing our tools?

Any guess from the audience why we are open sourcing our tools? I just want to know what people feel, because I've been attending many conferences, and I've been in a booth in OSCON and many other places, and people ask me, "Why are you open sourcing your tools?"

Any guess from the audience? I just wanted to make it a little more interactive.

Recruiting.

Re-development.

Better quality.

I heard recruiting, better quality.

Development.

Good development and building a community.

But I think it is the right thing to do because...

Thank you.

Because if I ask the audience how many people don't use open source, guess what the answer would be. But if I ask how many people use open source, all hands will go up.

I think it's for too long that open source communities have been carrying big enterprises like ours on their shoulders, and I think it's time to give back. Time to give back to the community and be a part of it.

We have been using open source unknowingly for a long time, but right now we are quite open and honest about it.

Our policy is open source first. So if somebody is looking for a tool and asks for permission to use some tools, if it is not open source, they need to clarify why it is not open source. So anything that you want to use in our company, it must be open source first.

Second thing is a culture of continuous experimentation and learning. The Third Way of DevOps. Thank you, Gene.

So what it does is basically give developers, and all the people involved in the whole delivery pipeline, the freedom to actually experiment with many things. And if it is open source, you are free to experiment. You can actually create your own tools and see how it feels. It fosters a culture of innovation and experimentation throughout the organization.

Open sourcing makes it better.

You all know that you have teams in your company who are actually developing tools, and you go and ask them and say, "Can I take this tool and have the other guys use it?"

And they go, "Oh, no."

So the only way you can do that and make it better is tell them that, "That's a great tool, and we want to open source it," and then you find amazing things happen. The tools get better from day one.

We found it inevitable to do DevOps security the right way, and the reason I say that is this.

So when we were transforming our whole CI/CD pipeline, to start it, we said that there is going to be only one copy of one version of a particular library in the binary code repository so that we can check for security vulnerability and the legal vulnerability.
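The "one copy of one version" rule lends itself to an automated check. The Python sketch below is a hedged illustration rather than the tooling used at Capital One: it assumes the repository contents can be listed as `(library name, version)` pairs, and `find_version_conflicts` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_version_conflicts(
    artifacts: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return every library that appears with more than one version."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for name, version in artifacts:
        versions[name].add(version)
    # A library is compliant only when exactly one version is present.
    return {name: found for name, found in versions.items() if len(found) > 1}
```

An empty result means each library has a single version to scan for security and license issues; any entry in the result points at copies that need to be consolidated.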

And what we found was that prior to that, developers had all these different kinds of libraries. I don't know where they came from. They're in their own code repository, and it was very hard to manage.

So we said, "Okay, let's figure out where these libraries came from."

And what we found was that some libraries were already modified, and some libraries were kind of a collection of other libraries, and they're all sitting over there.

So we said, "Okay, first thing is we need to clean that up. But most important is what do we do with the things that we have already modified?"

And the answer was, "I don't know."

So I said, "How about giving it back to the community?"

And they said, "We are not allowed to."

But we have already modified it. Why can't we give it back?

So we started talking with our lawyers, and that's an amazing story.

One of my colleagues had been very vocal in this area in our company. He and I said, "Okay, let's do something about it. Let's get our act together. Let's talk to the lawyer to see what he makes of our notion of giving back to the community."

So we had a huge list of things that we wanted to give back to the community, and were looking to meet the lawyers. There was a meeting set up. In our company, the lawyer is sitting in a different section where you need a special badge to get in. So the meeting room was inside that particular section.

He and I go there, and the door was not open, and our badge was not working. He was trying to call the lawyer, the legal VP, and he was not responding, and the meeting time was up, or almost five minutes late.

So he started knocking on the door. "Please open the door. Please open the door."

And I think that was hilarious, but it also tells you the motivation that our people have to solve for things that they really care for. I think that was a great story to share.

Where are we now with all this?

Many of our early adopters, or the teams or the product teams, are seeing some amazing results. Some of these teams are heading towards a zero-touch continuous deployment model.

At a high level: code commits went from a random occurrence to hundreds per day for a given product team.

Integration, which is end-to-end integration with all the other things, dependencies, from once a month to about every 15 minutes.

Deployment: manual to completely automated.

QA performance: once a month to at least four times a day.

Production release: monthly or quarterly to at least once per sprint.

Unit tests: unknown at that time, four years ago. Now it is about 90% plus coverage, which is pretty good.

On the other side, we also found some good statistics on defects during release. We saw about 60% defect reduction in production. The feature delivery rate has been increased by 50%. These are some rough numbers. We are trying to get more stats around it so that I can present it in an official way.

How do you scale at the enterprise level?

There are many ways that we scale. One is CoPs, communities of practice. We have lots of communities of practice around DevOps, security, cloud, architecture, design, programming languages, test automation, et cetera.

Office hours. We have an enterprise team, which is a collection of teams that support all these automation tools at massive scale. They're on office hours every day for one hour.

The whole idea is, if you have a problem, if you have a question on any of these tools or how to build a pipeline, or if your Jenkins job is failing, or you don't know what happened to your Maven POM file, you come to these office hours and get your problems resolved.

Voice of customer. Once a week, we have voice of customer, the customer being the development team and the providers being the providers of these automation platforms. They hear what the customer wants. Do they want a new tool? Do they want a new process? Is any process creating some bottleneck that they cannot get their work done? So we hear those and try to solve for that.

We also have an internal website called Pulse, which is our way to communicate within ourselves. And in that Pulse, we have a page called Got Goo? where people come and post their goo list, what is causing problems for them, and then people actually take up at a higher level and try to solve for those.

This is our internal Stack Exchange, where people post questions and answer them.

And that is not all. Recently, we had our first software engineering conference. We have been working a lot. We have offices all over the place, so we thought that it's a good idea to get all the people in one place and hear their voice and build some personal relationships face-to-face.

But what's striking is that it not only attracted developers. There are a few talks that came from our risk office that manages our risk across the enterprise.

For me, as a DevOps enthusiast, my day was made when I saw these two slides that I want to share with you.

Okay, before that, let me go through the stats. It was for two days, 1,200 attendees, 13 learning tracks, 28 tech expo booths that are organized by our internal people, 52 sessions, zero vendors.

Thank you.

Now, these two slides.

Our risk officers came and presented this slide in one of the talks: "We hear you, that you need to push code more quickly. You need to understand the risk requirements earlier than later."

So I think that was a big win for having this kind of conference and having the risk office come and talk to us.

And this one: "Innovations in risk? Hell yeah."

That made my day.

I'll end my talk with a short clip of that software engineering conference.

Video Clip

It is very, very rare that we get this much talent together in one place at one time, and it's great that we're coming to this conference and learning a bunch of stuff that's new and what's going on in the engineering field.

But I think the other big value of this is actually the building the relationships piece. And so I challenge you guys, by the time that the day is done, that you build a relationship with three folks that you've never worked with before.

And today, why host a Capital One internal software engineering conference? And what does success look like for something like this?

The technology organization, I hope you've noticed, has been increasingly investing in the technology organization, right? That starts with the people. It includes our capabilities, and it includes the infrastructure.

I find conferences are a great place to learn, and if I can get a nugget out of every session, I find it to be hugely valuable.

In something like today, where your Capital One peers are telling you the things they're working on, the impediments they're hitting, the successes that they're having, it's actually applicable directly in our environment. So I hope that that's actually something that you also found today.

Topo Pal

That's it. Thank you very much.

Q&A

Q: So I got to spend a couple of days with Topo, and there was actually a piece of the story that you left off in terms of how heroic the effort was to find a lawyer, internal counsel inside of Capital One, to support your cause. Could you just briefly tell us about that? Because I think you sort of underplayed just how heroic an effort that was.

A: Yes. So I think, just to start with, it was not that the lawyers were against us. They just didn't know what we were asking for. I think that is very important: to communicate to them or any stakeholders in this process and let them understand what you are looking for.

It's not just about, "I want to use this open source, and please approve that," because they don't know what it means, right?

So this Al Sell, he's my awesome colleague, and he's tall, right? He can make things happen.

And he had this burning desire actually to make things happen. So he wanted to meet the lawyer. It's just that he was getting frustrated.

So that knocking on the door, I still remember seeing that in front of my eyes. "Please open the door. Let me talk to you. I have this list of open-source software that we have modified, and we don't know what to do with it."

So I think that was amazing. It shows you how motivated people are in this environment.

Q: And how many lawyers, internal counsel, said, "No, I'm busy"?

A: Initially, there used to be a lot. Now, actually, they are with us. They understand. And as I said, the risk management office is saying that, "We heard you, and what you are doing is the right thing to do, and let me help you."

Q: So this vision of, what's his name?

A: Al Sell.

Q: Al, is that right? But basically going door to door, talking to every lawyer, trying to find someone to adopt the cause.

A: Right.

Q: It's just this astonishing story. Thank you so much.

A: Thank you.