San Francisco 2014

Huawei’s CD Transformation Journey

Huawei is a $40B company delivering communications technologies for telecom carriers, enterprise and consumers. This talk will provide an overview of Huawei’s Continuous Delivery and DevOps transformation initiatives in a complex embedded software environment. Ting Zhou, Principal Solution Architect for Huawei, alongside Electric Cloud’s CTO Anders Wallgren, will present the business drivers and benefits of their Continuous Delivery solution, encompassing build, infrastructure provisioning, deployment and testing, plus reporting, including release pipeline visualization and progress dashboards.


Huawei’s CD and DevOps implementation is a centralized, shared cloud service currently used by 2,000 developers supporting 20 applications and is in the process of being extended to 40,000 developers servicing over 1,000 applications. Some additional statistics on our current implementation:


– More than 2000 releases per year

– More than 50,000 compile & builds per day

– More than 1 million test cases run per day

– More than 30 million LoC, product is complicated

– More than 480K code review/analysis per year

– More than 170k system integration testing per year


The benefits of our solution include:

– Reduced cost of delivering software

– Increased resource utilization and productivity

– Shorter time to market with higher quality


Full transcript

The complete talk — auto-generated from the talk's captions.

So I'm Anders Wallgren. I'm the chief technology officer at Electric Cloud, and I was supposed to be joined by my friend Ting from Huawei today, but unfortunately, he was called away on other business. So it's just me today, I'm afraid. We were just talking about my gorgeous picture here when I had a little bit more hair, which really meant I forgot that I was supposed to get my hair cut before we took all the pictures.

So just a few slides. So what I'm going to talk about today is Huawei and a lot of information about their CD transformation, and a lot of metrics that they've gathered and what they went into, why they went into it and what they did and so on. But just a few slides on Huawei. Huawei is, if you've never heard of them, I often refer to them as sort of the Cisco of China, but that really kind of undersells them because they're huge.

They're in all kinds of different businesses. And these numbers here are actually from 2012. Since then, I know the enterprise and consumer sides of this have grown significantly as a portion of their revenue. So you're going to see them all over the world, basically.

You don't see them a lot in the US for various reasons. But they're basically all over the world. Pretty large company, lots of R&D centers. Out of the 150K employees that they have, half of them, roughly, are in R&D.

So this is a huge R&D operation in lots of different centers. And again, all over the world. Not a huge amount of presence here in the States, but they're significant in the rest of the world. In terms of their R&D investments and innovation, tens of thousands of patents are coming out of these processes, out of these organizations, and they're just pouring money into R&D.

I know these numbers have grown even since 2012. These are the latest numbers I could find that weren't in Mandarin. So I don't speak Mandarin, so I was a little hobbled trying to put that one together. And they have a pretty significant presence in standards organizations as well.

A lot on the carrier side, obviously, but also participating a lot in things like TOSCA and other things that are relevant to what we're doing and talking about here this week. And really what it comes down to is what it allows them to do: in 2012, they produce a phone which kind of looks like a phone from about 2006. No offense intended, but a couple of years later, they produce a phone that looks like this. And I've seen these things.

They're pretty competitive with anything else that's out there. They're fast, they're gorgeous, slim, all of those kinds of things. And so they are just cranking stuff out at incredible speeds. But where were they before they kind of started down on this journey?

In some ways, they were doing pretty well. They had developer builds of about 10 minutes on relatively large code bases. That's not bad. But I think where they had some pain, in particular, was in the follow-on, which is things like six hours in order to do a production build, four hours to do regression testing, and lots and lots of hours, I'm not even going to try to do the math, on doing a full test thing.

And the cycle time was hurting them. And if you've done Agile, you've done Scrum, you've tried to do CD transformations, those things, you know that cycle time is just key. Cycle time will kill you when you're trying to go do Agile or do continuous D, whether that's deployment or delivery or however you define it. Thousands upon thousands of CPU cores available on demand for doing all of these processes.

So there's no lack of resources here. Really what it was was a lack of automation, lack of orchestration is really what they were suffering from. But they decided they're going to fix it. And these guys, when they make a decision to do something, they don't go in halfway.

So they set up a team to do an evaluation and try to figure out how they're going to solve this. And they very much went after this in a, "Hey, we want to do this in an open source way as much as possible. We'd like to leverage all of the cool stuff that's out there." And to a large extent, they have. I'll talk a little bit about where they didn't.

And you can guess which vendor I'm going to talk about. But here are the things that they looked at. So they were looking at project management tools, SCM, and they basically evaluated the whole stack. Right?

Where are we going to go? Where do we want to be? And really realized that they had to look at this as one large end-to-end problem, not just a set of pockets or set of silos where they had to fix things. So they were looking at project management tools, SCM, collaboration, provisioning, monitoring, design development, testing, everything, and including, more significantly, at least from my perspective, how are we going to orchestrate all of this?

What's going to be the glue? What's going to be the backbone that really orchestrates, that really does the continuous delivery, if you will, delivering from sort of a pipeline stage to pipeline stage, product to product, tying everything together. And they looked at a lot of open source solutions there. The only commercial solution that they selected in all of this was Electric Cloud because none of the open source solutions could do what they needed to do in the orchestration side of things.

I'll stop the product pitch right there. So what they did was they picked a group of 20 people to do this implementation. Really the core of what they're building on top of is three or four products, Jira, Rally, and Electric Cloud, and then tons of other open source stuff that they're building it around, obviously. But they picked 20 people to work on the implementation of this, gave them three months to get to version one. And with a goal of doing about 1,000 pipelines a day in that version one product.

And they got there. They started late last year, and by March, they had their first product team onboarded. Now, the onboarding process was kind of a little bit interesting. They asked for volunteers, and there was sort of the awkward silence of crickets.

But they finally found one group in the router group who were willing to try this on. And there's a reason I'm telling you this. There's a reason why I'm telling you that this was sort of a little slow in getting started, not a huge line of volunteers lining up to change the way they work completely, if you will. But what ended up happening was they went from our current metrics that we saw before to one-minute dev builds for this group that they onboarded.

Ten-minute prod builds. So a significant speed-up in their build times. But even more so doing 4X, 6X type speed-ups on their testing. And sort of the most significant part of this, I would say, is getting their feature delivery time down from 30 days to 7 days, which means that they can do significant feature functionality turnarounds within their iteration windows.

And that was really the big deal, right, was that they went from 30 days, which means now you're spanning... Unless you're doing really long iterations in your agile processes, you're spanning multiple iterations throughout your process, and it's much nicer to be able to fit them into one window. That really changes the way that you're able to work and speeds up your processes. So, what's faster for them?

I mean, their design processes, they've driven from days to hours. And part of the way that they've been able to do this is they get such rapid feedback. So, rather than design something and, 30 days later, hear about whether it's working in functional testing or so on, they're getting feedback in days, and even in hours, as opposed to where they were before. Development time, they do quite a lot of code analysis.

They do lots of static analysis, those kinds of things. And that was starting to slow them down. They've been able to get those times down from hours to minutes, or even a minute, in the case of developer builds. Their compile build times and obviously a lot of hardware emulation type things that they're doing because they are a device manufacturer.

These things have been driven down quite significantly. And then the product validation phase as well, really torn down. So, where they are in their process right now is they onboarded their first team in March. At this point, they have 10 product teams with another two on the way that have been deployed.

Their product teams are on the order of 100 to 1,000 engineers per product team. So they're significant size teams that they're onboarding. They've got a goal of getting another 10 on board by the end of the year. They've got two in process right now as we speak.

Their goal for next year, for 2015, is to double that, to get to somewhere between 40 and 50 product teams that are onboarded. And to be honest with you, I don't know what the total number of product teams is, but if you do the math of 100 to 1,000 and 70,000 R&D employees, we're looking at maybe 100, a couple of hundred teams, somewhere in that range. The overall goal, obviously, is to get everyone on there. And ultimately, the kind of workload that they're pushing through this process is pretty significant.

They're doing 10,000 plus releases a year across all of these product lines. A million system integration builds a year, 100,000 builds a day, easily, I would say, on top of a 30 million lines of code base. 100 million test cases run every single day and a lot of code reviews. They're pretty fanatical about code reviews, static analysis, code quality type things.

And that was part of their challenge, was being able to speed those things up and really crank more of that stuff through the system. So now, so we go back to the little cricket slide. They actually have teams lining up and volunteering to come onto this because it's made their life so much easier that you can iterate so much faster. And the program managers and project managers, they've got some fantastic dashboarding that unfortunately, they wouldn't allow me to show you.

But they've got some fantastic dashboarding that they've built up where they can literally, from a product group, start at the very top and look at what are the various product groups doing in terms of their release cycles, timelines, product feature functionality releases. Drill down from that into product group, into product line, individual product, all the way down to individual engineers. And when are they checking in? How long are their builds?

What was the timeline for this feature release? All of those things. So they've built up a huge amount of... They've really done the kind of top to bottom analytics approach to this to give themselves a dashboard where everything is available into there.

So that's kind of the reason I put that one slide in there of they had a hard time getting the first team on board. Nobody really volunteers to change the way they work. You have to be pretty miserable to do that. And it isn't necessarily a culture where people are going to put their hand up and say, "I'm miserable, please help me." But now they've seen how successful the groups that have been onboarded are and how much easier a time they have of kind of cranking through all this workload, they've got volunteers.

So it's kind of exciting to see the speed at which they're now pulling this stuff up. So since there's only one person talking and not two, we went a little fast today. But I figured I'll take questions, and see if anybody has any questions. I'll try to answer them. Unfortunately, Ting's not here to give you the detailed answers, but I'll do my best if anyone has any questions.

Crickets. Question in the back. Do you know which part of Huawei exactly adopted Electric Cloud? So the question was, do I know which group in Huawei is adopting Electric Cloud? So I know that the first group that came on was building a router.

That's about the only thing I know, unfortunately. So I know in the networking group, I would imagine. So more on the enterprise side. I'm pretty sure that some of the phone side groups are also now coming on board.

But I don't know that for a fact, so take that with a grain of salt. Unfortunately, when I visit them, I don't speak Mandarin, so I need to have translators with me. And obviously our people that work over there are fantastic, but it's still a question of, they sit there and talk for about five minutes, and then somebody looks at me and says, "Yes." And so there's a little bit lost in the translation, unfortunately, which is why I was hoping to have Ting here today, who can be very specific about these answers. Yes.

How would you characterize the releases? You said there's 10,000 releases a year. What kind of... So a release for them is really a feature release. Sorry, a functional release.

So it's not necessarily something that's going to an end customer. So they have pipelines that they build up, and what they're trying to do is basically drive their iterations through the pipeline every single time and get to where they're in a releasable state, customer releasable state. So they really are kind of following this mantra of, we want the release point to be a business decision, not a technical decision. And so their goal is, at the end, be in a releasable state.

Doesn't necessarily mean it goes to a customer. So as you can imagine, the product cycles on some of these things are quite lengthy, even though they're trying to do it through CD. And I think that's kind of an interesting observation, actually, because I think we all hear the great stories from Netflix and Facebook, and they all say, "No, we're not unicorns." But in fact, yes, they are unicorns. They're a little bit different than the rest of us.

And especially for Huawei, for a company that does so much work in embedded type systems, routers, switches, telephones, all of those kinds of things. I think it's really illustrative of how useful CD is in general, right? And then the notion that continuous delivery isn't about, I have a website that I want to update every 10 seconds. It's really not about that.

What it is, is I want to be in a releasable state at all times. And that doesn't mean that you're not building something that goes in a box that goes to Amazon. So I think it's really an interesting case study, what they've done. So I think question there.

So did they evaluate some other continuous delivery tools, like Go, for example? They did, yeah. I can probably pop back to the slide. Oops, let me do that and do this.

Yeah, there was a whole... Sorry, let's go to here. Build this out. So this is basically the list of things that they evaluated.

And so in the orchestration arena, yeah, they did look at Go, Rundeck, Jenkins. I know that they chose us obviously. I know they're working with Cloud Foundry. I believe they're working with Chef, although I'm not 100% sure of that.

Can you elaborate on the reasons why they decided on Electric Cloud? This person does not work for Electric Cloud, by the way. So, why choose Electric Cloud? It comes down to, I think there's a couple things. One is we look at this problem as an end-to-end problem and not something which is just a, you do CI, and then you pick another tool to do your deployment, and then you pick another tool to do your testing, and all of those kinds of things.

It's an end-to-end process that needs to be integrated end to end. Any time that you have kind of silos of automation, you have delay, and you have room for error and all of those kinds of things. So I think our philosophy on how to look at this was part of it. And then quite frankly, the scale at which they want to operate was beyond what these other tools could do.

The number of pipelines that they want to run, the number of engineers that they have interacting with the system. They have servers that are handling hundreds of API calls per second, in order to deal with the load that they're putting on here. And so the scalability was an issue. And then they're really putting together a platform for doing this.

And like I said, they've built lots of UI on top of this, quite frankly. They've built lots of dashboarding and analytics on top of this as well. And so they wanted and needed a platform, essentially. Something that they could build on that didn't necessarily need to be the center of the universe in terms of user interface, and that had a full API to all functionality and had a security model that worked for them and all of those kinds of things.

So I think it was the complete package. But I think also just our goals and our needs aligned in terms of we look at this as an end-to-end problem, not as a set of siloed issues that you need to work with the CI team to do this and the release team to do that. You need to be able to look at this as one problem for all of us to solve end to end. So yeah.

Other questions? So what were the tools that were selected? So I don't know all of them. I know they selected Jira. I'm pretty sure they've selected Chef, although I'm not sure how far along they are in their automatic provisioning. That's a separate group that we're also working with, but separate from the CD group that we've been working with.

I know they use Selenium. I don't know on the code analysis and unit testing and so on. Honestly, I've seen what they use, I just don't remember which one it is, so I don't want to pick the wrong one. And I know that they're working with Cloud Foundry as their cloud platform to do this.

Because ultimately, their desired end state, which I only just talked about with them a few weeks ago, is they really want to get to the point where not only do they have this wonderful sort of CD process stood up, but that the back end IT aspect of it is also completely elastic. And so that as they stand up a new team, all of their build servers, all of the infrastructure that they're using gets set up in Cloud Foundry, grows as they need it when the product releases are hot and heavy, and then does the elastic thing and shrinks when they don't need it. So today, they're running largely on a virtualized, but a static virtualized environment. So they're not necessarily standing things up and tearing them down every five minutes.

That's the desired end state from an IT perspective. They want to get to where all of this is very elastic and not quite as static as it is now. It is virtualized, but sort of static virtualized, not dynamic elastic virtualized. So, yep.

Any other questions? We've got some time, so. So, Huawei's not a new company, so I know they have a lot of legacy code. Yeah.

And they went in and they did all this stuff in three months. Yeah. All that legacy code. No, so to the point where they could get the first team on board.

Right. So version one, first team onboarded, three months. They pushed their code. Yes.

They picked one of the ones that was probably doable in three months. Correct. It's not the legacy. Exactly, yeah.

And I think they're definitely biasing this a little bit towards more newer products, right? Easier. Teams have just been formed. It's easier to change.

There isn't so much established, not just legacy code, but legacy process and legacy people, and those kinds of things. So for their evaluation of the tools that they use, were they going to try to maintain a consistent set across the products? Or was the first introduction a combination for the one product, and they might be looking at different combos for different products? Honestly, I think their going in assumption was we're going to pick one thing that's going to work for everything.

And I think they found out pretty quickly that isn't really going to fly. Or at least that that makes it much more difficult to transition teams onto it. Because it's one thing to say, "Look, we're going to do a new process. You're going to check in every day," all of those kinds of things; it's another to say, "Well, yeah, that tool that you're using, not so much.

Here's a new one." So I suspect that this will grow over time in terms of when they start to onboard everybody else. And I think they realized that pretty quickly. But they also had, with a three-month timeline to get to version one and onboard the first team, they focused quite a bit. Obviously, that's the only way to get that kind of process done that quickly.

So. Quick follow-up. Sure, absolutely. What was the mode of collaboration with you?

Did you have a consultant on site, or did you have just some initial training and they did it all by themselves? What did it look like? I would say we spent a significant amount of time with them. But they also did the lion's share of the actual implementation work.

Like I said, they had 20 people on a team working on this. But we've had people on site for days and weeks at a time, either at kind of critical design stages or critical implementation stages or critical rollout stages. Including I've been on site two or three times in the last year, working with them to kind of get to the next level, work on troubleshooting, work on doing all those kinds of things. So it's definitely something that we've been heavily involved in.

Because they have such a long vision for this and an aggressive vision for it that it actually helps pull us in the right direction, and so they're a great customer to work with from that perspective. They push and pull us pretty hard, and it's always fun to work with customers where they've got the support from on high to do this. And even though it took a while to sort of get the first team onboarded and there weren't necessarily 100 teams that were volunteering, now that that's really changed and they're seeing the results and they've now got people lined up to get onboarded to this, it's actually been sort of a fun transformation to watch. Anybody else?

If nobody else has any more questions, then you get seven minutes back. All right. So that's my gift to you.