DevOps at Verizon

San Francisco 2016

Chivas Nambiar

Director, DevOps and Public Cloud Platform Engineering · Verizon

Ross Clanton

Fellow, DevOps · Verizon

Full transcript

Chivas Nambiar

So I just want to give you a sense of who Verizon is, right?

You've heard of Verizon. I'm sure a lot of people have used Verizon services. But what is not always apparent is the range of things that we do. We are your internet provider. We are your largest LTE cell phone provider. But we also have a very strong enterprise presence, and now over time, as you've heard us talk about Yahoo and AOL and things like that, we're building a digital and media services organization.

So, a pretty wide range of things that we work across. And that's relevant to what we're going to talk about today, because the one thing we see that is common across all of those areas is that our customer base is changing, and our customer profile of what people want from us is really focusing in on being digital, being instant, being very simple, because people want simple experiences, and something that's reliable and intensely personal, right?

So as we start to think about how we, as a company, evolve, this is the backdrop. This is what our customers want from us. And everything that we do, these are the common themes.

It's not just customer experience, though. You have to think about the deep technical changes that are happening in our industry. If you look across the top of that screen, you can see year over year what's happening is the data that our customers are using, that's starting to ramp up, and it's doubling every year, really, more than anything else.

And it's driven by experiences that customers are starting to demand across their streaming experience, their high-bandwidth TV experience, and some experimental stuff around virtual and augmented reality. So there's a deep core technical component to what we're trying to change across our enterprise.

So what this all means is that we need to retool that consumer business that we have, that we run. And we kind of recognized that this was coming way back in 2010, 2011, and we said we're going to get on a three- to five-year journey around how to do this. And the largest aspect of it was, how do we get faster?

And we decided that the way we get faster is that we start to reduce the complexity of what we're doing. So if you look at the right, there are some key architectural principles that come to play when you make a decision like this, right?

Which is, we say we're going to get to a single customer experience. And to get to a single customer experience, you have to build a set of systems and build a set of architectures that enable that. And if you think about the history of Verizon as a company, we've come together over acquisitions over many, many years.

And so at some point, like any other large company, we had thousands of systems that all kind of got patched together, that all got put together to make it work. And so defining that we would end up with a single, for example, catalog, or a single billing environment, is a massive undertaking, but something we absolutely had to do to kind of retool the business.

The second part of it is, we looked at it and said, this can't be a business-value-driven conversation where we're just talking about cost savings, or we're just talking about what is it going to do for us in the next year. This is a seven- to 10-year journey, and we're setting ourselves up for success.

I'll give you an example of what that looks like for one entity. So we said we're going to build a unified customer experience, and this is kind of what we started to look at. If you look at this, each of these is a suite of applications that we use to provide our online experience, our billing experience, and our back-office and order-management experiences.

And as we look at it year over year, what we're starting to do is eliminate the cost associated with systems that we don't need. We start to reduce the number of systems that need to talk to each other, right, and simplify that. And again, that goes back to those architectural principles about getting to a single one of each of these.

And we're well along our journey, and as you can see, we started to shut down systems that we don't need to use: 140 systems that we shut down to actually create one national billing platform. So, a tremendous amount of work that you don't often see directly in customer experience, other than the fact that, over a few years, our customers start to get a better billing experience.

Anyone who's worked in a large company has seen a chart that looks like this. This was our systems landscape for one part of our business. And as you can tell, and it's not intended for you to be able to read it, there's tons of systems and everybody's favorite acronyms in there.

But over the course of these years, from 2011 to 2016, where we are today, as we built the single view of the customer, and as we simplified all of the call-in experiences that you expect to get, and as we simplified billing, we went from about 700 applications down to just over 200. And we're still churning through these systems and shutting them down.

This is important because if you don't do this, you really can't start to do a lot of the things that you talk about when you're talking about Agile systems, right? If every one of your changes has to touch the 200 systems or 300 systems that make up the value chain of how you get a change out the door, there is no way you're going to get to Agile or DevOps or any of the cool stuff that we want to do.

There are some interesting things that happen, which is you start to consolidate all of your data in single places, and now you've got a good, unique view of your customer that builds the personal experiences. We decided architecturally that we're going to build patterns of API gateways that allow us to kind of take the changes as they were happening and protect layers above it, right?

So how do you make sure that one change in one system, or as you swap it out, it doesn't break something upstream? We decided on one single unified user interface, and out of all of this came a common analytics platform that we've been using to personalize.
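To make the gateway idea above concrete, here is a minimal sketch in Python. It is an illustration only, not Verizon's implementation: the `ApiGateway` class, the `billing/balance` route, and both backend functions are hypothetical names invented for this example. It shows the one property described in the talk: callers depend on a stable contract, so a backend can be swapped out without breaking anything upstream.

```python
from typing import Callable, Dict

# Hypothetical backends: a legacy billing system and its consolidated replacement.
def legacy_billing(customer_id: str) -> dict:
    return {"customer_id": customer_id, "balance_cents": 1250, "source": "legacy"}

def national_billing(customer_id: str) -> dict:
    return {"customer_id": customer_id, "balance_cents": 1250, "source": "national"}


class ApiGateway:
    """Stable entry point; callers never reference a backend directly."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[str], dict]] = {}

    def register(self, route: str, handler: Callable[[str], dict]) -> None:
        # Registering an existing route again replaces its backend.
        self._routes[route] = handler

    def call(self, route: str, customer_id: str) -> dict:
        response = self._routes[route](customer_id)
        # Return only the fields in the stable contract, hiding backend details.
        return {
            "customer_id": response["customer_id"],
            "balance_cents": response["balance_cents"],
        }


gateway = ApiGateway()
gateway.register("billing/balance", legacy_billing)
before = gateway.call("billing/balance", "c-42")

# Swap the backend; upstream callers keep using the same route and contract.
gateway.register("billing/balance", national_billing)
after = gateway.call("billing/balance", "c-42")

print(before == after)
```

A real gateway would also handle authentication, versioning, and failures; the sketch leaves those out to keep the swap itself visible.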

So this is all kind of the technical path to how we got there. Again, if you're interested in details around how we went around doing this for the different functions, stop by. I'm happy to chat about it.

There were some secondary benefits to this, and the secondary benefits are what actually happened in our infrastructure consolidation effort around this. So as we start to take these systems out of the environment, the single biggest benefit is that we start to remove pets from our ecosystem.

So we were pretty much a shop of pets. Every single time we had a new feature, a new product that we wanted to build, we'd go out and build this amazing, new, unique thing. These grew up to be big, hairy dogs that just stayed around forever and ever. I'm not going to do the, "We took our dogs out back," part of this conversation, but really that's what happens.

And then, as we start to simplify and we get to single customer profile, single billing interfaces, we took all of that infrastructure out of the system, out of the ecosystem, and that allowed us to shut down our data centers where we didn't need them. So we went from 14 to seven. We started consolidating mainframe, and yes, we have a lot of mainframe, and we still have a lot of mainframe. But at least you start to take some of that processing out of the ecosystem.

And then the other part of it is it allowed us to make technical changes in our infrastructure, where we went from physical to virtuals and got to a state where these things no longer look like pets. And these are things that we can instrument, and these are things that we can consider infrastructure that's driven as code.

So that was phase one. This is phase two.

So the beginning of 2015 is when we said we're going to start to change what we're doing and use DevOps as an acceleration strategy. So the simplification phase was done, and we kind of laid out this aspiration for the entire company.

And communicating this aspiration was important. We kept at it for almost a year, year and a half. A big part of it was trying to push through that incredulity, where people come in and say, "Oh, we've heard this story before. You always tell us about all the cool things that you're going to be able to do, but can we actually do it?" And we went through that for a little while until we started to see some results.

And we went and talked to teams about it, and we kind of walked people through this concept of a value-creation loop, which is, how do you allow developers, how do you allow your engineers to start doing smarter experiments? How do you allow them to unleash their creativity that really drives the customer experience?

And then there's a little bit about how you scale, which is something we're working on. But if you think about the first two segments of what we were talking about, a large part of that was kind of my new job in the last couple of years, which is build a platform that makes it easier for your developers.

Don't go tell them that every time they need to go build something that they have to start from scratch. So we started building a platform, and everybody knows all these logos. I'm sure most of you use it if you're down this journey, but really it is building a better software development lifecycle platform.

Getting to that single code repository where people can all see what everyone's working on. A default, open Agile tracker is super important, because really, you don't want to have people hiding what they're working on or just not even knowing what they're working on.

Starting to build some automated testing infrastructure. Again, it doesn't have to be perfect. Just start and give the team something that they can use by default, and what you'll see is that that gets better and better every single day.

We started to embed some code quality and security scanning in the mix as well, because part of this is, how do you take that ecosystem where you're pushing more changes through, you keep that safe, you keep that secure.

And I like to say we optimized for enthusiasm in the first year. It was grassroots-driven. It was not a top-down mandate. And the way, really, that we pushed this forward is we took the business benefits that we were able to derive through this process, and we kept publicizing that, and we got teams to show it off.

And then you get this conversation around, wait, if that team can do it, why can't I? And that's what's driven most of the journey at Verizon for the past year, year and a half.

Did it succeed?

So if you kind of look at it and how we're measuring adoption, at least in terms of how the tools are used, how teams are transitioning, it's widespread across the enterprise. We started with, I want to say, 100 Jira licenses at the beginning of 2015. We're at 15,000, which meant that a couple of months ago, when we went to the Jira Summit that they had, they told us that we were one of the largest Jira customers in the world at this point.

And if you think about it, that's all driven by grassroots adoption. Making the platform easier for your developers to use really gets them to adoption.

The other part of it is measuring success. Not driving people to the metric, but just saying, "Okay, here's what you're doing today. Here's what you're able to do day after day after day." And we have proven measures and metrics that show us that quality and velocity started getting better, and these are kind of the things that we look at.

The other side of it is what allows us to look at our change and the safety in our changes. We were able to see 30% reduction in production incidents, 30% reduction year over year in our outages. That's a pretty cool story.

The second part of this conversation was around building and scaling. So how do we build and scale? And this is what we've been working on for the past, I'd say, almost eight months. And that's really building a cloud environment where we got to this point about halfway through this journey where everybody was building things faster, and then they'd run into the infrastructure wall, because you'd have operations teams that still consider infrastructure this amazing storyline that nobody could touch anything because things would be unstable.

So we started to retool our private cloud infrastructure, built an OpenStack and a Cloud Foundry environment over the course of about three months, got it out into production. We started putting applications in there. And what that did was it helped us test our hypothesis that if you give developers infrastructure that's API-addressable, that they will start to use it, and then they will be able to start to take those experiments that they want to try and build it in an environment that gets them to deliver it faster and, more importantly, that they can then scale if it's actually useful and successful.

Also, an interesting thing that happened was teams started to kind of figure out the microservices patterns that they needed to use, and we started to see some scaling in those lines.

Got to the end of last year, and we said, wait, we can either go out and build this at scale across our entire footprint, or we can start to build a better model for allowing this experimentation. And so we started looking at the public cloud environment, how we enable that in an enterprise setting for our developers.

So we started building out this model of how do you take everything that you're doing today and allow us to do this securely within AWS as a starter, and we'll be working across a couple of other public cloud providers over the next year.

So, we started with non-prod workloads. We're teaching and retraining our teams to do this infrastructure as code at scale, building in the security and compliance pieces, because this is super hard. It is really hard for security teams, audit teams, compliance teams to understand why it's okay for somebody to go out and spin up an entire new server farm whenever they feel like it, and they don't have an IP address, and they can't scan it, and they don't know what's going on on those machines.

So retraining those teams, teaching those teams what that meant, was important. And that's the journey that we're on over this past six months. Yeah. So making it easy.

What is this going to lead to? For us, this is what our strategy is. We're going to allow this lift and reinstall into the public cloud for a lot of our workloads. It'll help us build the technical capability. We'll drive automation, and just based on conversations across the industry, we're expecting, and we're starting to see with our first workloads, about a 30% reduction in the infrastructure cost that we have.

And this is not to say anything about going from two months to a couple of days to get from code to infrastructure out the door. So that's a big change.

And then the second part of it is the conversation around cloud native. How do you start to transition these applications, break these big, giant monolithic applications into smaller cloud-native components?

The innovation we're also looking for there is, how do you build A/B infrastructure? And the expectation is, again, based on what we've seen other folks do in the industry, that will drive another 40% efficiency in the infrastructure.

So what does success look like? This is what success looks like for a couple of applications from a systems perspective.

Verizon.com, which is our primary online customer destination, we go from code to deploy multiple times a day now. We have end-to-end environments that we build up in minutes versus weeks. We have a 70%-plus test automation environment, which is not super great, but we're still working on it.

And Coffee Anywhere, which is kind of our customer-care system, went from monthly to releases twice a week. We have elastic scaling now built into the platform, and we have proven cost reduction in our infrastructure.

So this is what each of your systems could look like if you kind of keep at this journey, right?

Culture. And I'll let Ross talk about this because culture is kind of interesting. We wanted to build a model where teams are not just thinking about business and business value, but we're thinking about how we get better at what we do. And this is super important.

So you kind of see where we are, and we thought the easy part was done. And since the easy part was done, we brought in somebody cool to work on the tough part.

Ross Clanton

Yeah. That slide is wrong on so many levels.

So Chivas, at one level, well, we don't embrace heroes in DevOps culture, for one. But when Chivas and I were bouncing slides back and forth, you can kind of see where the pecking order is here. Every time he'd send the slides, I would pull this out, and it always made its way back.

So I've only been at Verizon a few months. And so Gene kind of likes people to come and talk about the story and where they're going and what they've accomplished. I haven't accomplished that much yet because I just got there.

So I'll talk a little bit about the plan and the strategy moving forward, which is where I'm largely focused, and it's going to be, how do we focus more on the people side, and how do we accelerate innovation across the enterprise?

I will say that before I go there, Gene did ask me to talk a little bit about what's it like shifting from one large corporation to another. And I will say, same types of challenges, different complexities. I will say one big shift for me was moving from more of a people-leadership role to a fellow role.

And if you want to know more about what a fellow is, Adrian Cockcroft is going to be presenting about that tomorrow afternoon. But really what that meant is I'm focused more on strategy coming in. And so we're going to unveil a little bit of what we're thinking moving forward, and we'll see next year how close we were.

So how do we scale this across the entire enterprise?

And what we're really focused on, and I think you've heard this theme already today, how do you scale engineering culture, and how do you scale engineering practices? There's a whole workshop on modern technology practices from the DevOps Enterprise Forum, and it's going to get into some of these things.

And the argument here is, in this new model where we're moving away from waterfall-driven, functionally siloed organizations, there are a core set of technology practices that you have to build proficiency and competency across the entire organization.

So what's the best way to do that? Because that's a very different model than when those things were specialized skills on different individual teams. You'll see the ones I've listed there, but essentially cloud, DevOps, Agile, security, these are core practices that every team needs to understand.

So we're thinking about this in terms of four key tracks. And I'm going to start kind of top and scale down because I think they're kind of foundational for each other in that regard.

First, how do you foster engineering culture? And there's really two things that we plan to drive here. One is building and strengthening internal community. And when you think about our organization, we have a lot of lines of business, and we have a lot of silos within the organization. So how do you build community across those organizations?

There's definitely playbooks in the industry around running internal DevOps conferences. It's something I have familiarity with in my last job. We're going to look to do stuff like that. There are some virtual approaches and pockets of innovation and culture that I'm seeing in different parts of Verizon. So how do we actually draw that out and expose it to the other parts of the organization?

That's going to be a big focus as well. And then finally, our external technology brand. Really important that as we're going through these changes, we're out talking about them, and we're out showing the changes that we're going through. And we're active in the local communities where we have a strong technology presence.

Then scaling the engineering practices through immersion, and I'm going to talk about that in a minute, what our strategy is to scale those practices.

Finally, how does our operating model need to change? And there's actually another workshop readout from the forum on operating model and organizational structure. To me, this is super important. I'm not going to get much into where we're going on that here. I'll leave you guys with a question at the end on it, though.

And then really continuing down the technology-excellence path we've been on, moving more aggressively into cloud, moving more towards cloud native, some of the stuff that Chivas already alluded to.

So how do we plan to scale practices, and what's our strategy there?

And really, as you kind of look at this bottoms-up, there are common sets of patterns and frameworks and practices that you want to make available to the organization. There's still value in having a shared-services type function in this model.

But the thing I would stress is that's a function focused on enablement, not focused on control. And part of this is actually harvesting practices that already happen. There's innovation happening all over the company, and you've got to figure out how to pull that in and expose it out to the others.

Enable, which is where I am going to spend some time. There is a growing trend of internal labs and dojos. Heather talked about it a little earlier, the Target dojo story. Capital One's going to be talking about their strategy for dojos.

That's what I'm going to spend a little bit of time on in terms of how you drive immersive learning and getting people in an environment where they're learning by doing, which all the research shows is how it sticks. You don't go to training classes to really drive deep learning.

And then dedicated embedded coaches. Key strategic programs will get that type of service.

I stole a Target slide because it was public, so I figured I could.

So the picture on the left, this is just what I was talking about in the last slide. There's a growing trend of companies using this model, this immersion model, to drive learning in their organization. I thought I would throw out a few relatively well-known examples of that.

Like I said, Capital One will be presenting shortly on this. The Target example was discussed at length last year, actually. And another really good one in the insurance industry is Allstate has stood up these CompoZed Labs. They're modeled somewhat off of what a Pivotal Labs would be.

And the vendor model is proven too. So Pivotal Labs is kind of a pioneer in the space, but you look at IBM Bluemix, you look at Red Hat Open Innovation Labs, same concept. Bring people into an immersive environment. They work on their actual product, not some fake thing, and they're building it out, learning new practices in the process.

So our strategy is to operationalize some dojos across key hub locations. They are a core part of our strategy. This is by no means everything that we're doing, but I had seven minutes, and now I have three.

So first, what are those core engineering practices? What are some key roles? We're going to establish some key roles to drive those practices in different locations. And we're moving to kind of a hub model, where we'll have dojos stood up in close proximity to where we have large engineering presences, both in the US and in India.

And we'll have a strategy to kind of scale up the dojos based on demand and scale out coaching and space. We've been partnering with our real estate organization, and we might be able to announce some pretty creative dojo ideas publicly here in the near future.

And then finally, how do we know we're going to get there? Measurement is really important. The last talk on measuring business value I think is very interesting, because we get hung up on this discussion a lot with DevOps, like how do you actually measure that you're being successful with it?

My view is the best measures are outcome measures. We've been working with Hygieia across multiple parts of our organization. I saw Topo in the room earlier, so Topo from Capital One. This is an open source project that Capital One leads that allows you to get kind of measurements across your DevOps toolchain, so you can actually measure how your software factory is working.

But you can couple that with capability measures. DORA just got released this year. It takes the State of DevOps survey that Gene Kim and Jez Humble and Nicole Forsgren have been very involved in, and it now offers an internal tool for companies to benchmark themselves with that same survey, against both the industry and themselves.

We've piloted it with one area in Verizon, got some very interesting results. We're moving through an action plan there, but that's essentially giving us a way to benchmark capabilities.

By itself, capability's okay. It's not my favorite type of measurement. But when you can couple it with outcome measures, I think it's really powerful.

My challenge to the community, and I was just talking to Topo last night about it, is I would love to build a widget in Hygieia where we can actually plug in the capability measures for teams too, so you can see it all on one nice dashboard. You can see your outcomes and capabilities together.
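As a rough illustration of what "outcomes and capabilities together" could look like, here is a minimal Python sketch. It assumes a lot: it does not use Hygieia's or DORA's actual APIs or data models, and every name in it (`OutcomeMeasures`, `CapabilityMeasures`, `dashboard_row`, the team name, the capability names, and the 1-to-5 survey scale) is hypothetical. It only shows the join itself: one row per team that carries both kinds of measure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OutcomeMeasures:
    # Hypothetical outcome measures pulled from a delivery toolchain.
    deploys_per_week: float
    production_incidents: int


@dataclass
class CapabilityMeasures:
    # Hypothetical survey scores on an assumed 1-to-5 scale, keyed by capability.
    scores: Dict[str, int]


def dashboard_row(team: str, outcomes: OutcomeMeasures,
                  capabilities: CapabilityMeasures) -> dict:
    """Combine outcome and capability measures into one dashboard row."""
    average = sum(capabilities.scores.values()) / len(capabilities.scores)
    return {
        "team": team,
        "deploys_per_week": outcomes.deploys_per_week,
        "production_incidents": outcomes.production_incidents,
        "capability_avg": round(average, 2),
    }


row = dashboard_row(
    "example-team",
    OutcomeMeasures(deploys_per_week=10.0, production_incidents=2),
    CapabilityMeasures(scores={
        "test_automation": 3,
        "deployment_automation": 4,
        "trunk_based_development": 2,
    }),
)
print(row)
```

Averaging the survey scores is just one possible summary; a real dashboard might show each capability on its own next to the outcomes it is expected to influence.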

And so, in the zero minutes that I have left, where do we need more help?

As we get into that last track on how do we transform our operating model, I would love to hear how people are moving to more customer-aligned product and services organizations, and away from kind of the functional, COBIT, waterfall, siloed organizations.

People talk about the product model with tech companies all the time, the ones that were kind of born in that model. I think the product model in a traditional IT company is an under-discussed topic, and I think it is a trend starting to happen. I'm hearing about more companies doing it, so I think we need to be talking about it and sharing more about it.

And that's it.