Intel’s Journey to Large Scale DevOps Transformation
Is it possible to transform a large enterprise with hundreds of in-flight projects across myriad technology stacks and entrenched processes, one that requires massive workforce re-skilling? In this session, I’ll share approaches we employed to increase the likelihood of success in DevOps adoption:
-Offering a common Continuous Delivery Service, similar to industry offerings from Codeship.io, CloudBees, and others
-Establishing a Maturity Model to help teams incrementally adopt DevOps practices
-Coaching teams through Kaizen sessions to eliminate bottlenecks and waste in their value stream
Full transcript
Sherry Chang
I'm the chief architect for DevOps and continuous delivery. I work for Intel IT. I've been a software developer for over 20 years now. My title is architect, but I still consider myself a software developer.
How I got on this journey is, two years ago, our team was tasked with solving the path-to-production issue. Our path to production was taking, on average, about eight weeks. That is code that was already complete, having to go through all the various change control board (CCB) approvals before it could be released. So we were told, shrink it down to something less than eight weeks.
Through research, I came across the book by Jez Humble, Continuous Delivery. You may have heard of it. I read the book cover to cover. At that time, I was already familiar with continuous integration and test-driven development, and I was an advocate for those, so I felt that, hey, this was the logical next step.
So I basically convinced, cajoled, bribed the rest of my team to get on this exciting journey with me to adopt this new practice.
Fast-forward two years, we have a couple of successes under our belt. We have our showcase app. About a year ago, they finally said, "Okay, you guys have the official task of figuring out how to bring the rest of the company along."
To give you a sense of what the rest of the company looks like, again, I work for Intel IT. We have about 6,000 employees, supporting about 100,000 employees worldwide across 66 countries. The 6,000 number, by the way, doesn't include contingent workers or a lot of embedded IT, so the real number is much larger.
Our challenges are not unlike any big corporation's: we have a lot of applications. Already in our environment, we have over 1,300 applications of every imaginable technology stack you can think of, except mainframe. We don't have mainframe. Intel architecture and IBM don't mesh, so I'm thankful for that.
We found in our assessment analysis that we have a large workforce, thousands of people, to retrain. We have developers who don't have the right skill set. Most of them don't know test-driven development, don't know how to write unit tests, or don't have the desire to. We have a lot of testers who don't have the automation testing skill set, and sysadmins who are used to just following the manual, executing things manually.
In addition, Intel is also a big adopter of ITIL. I've heard conversations rumbling about whether ITIL and DevOps mesh. For us, we finally figured out that the sooner we accept that this is not going to change and we just have to deal with it, the happier we'll be.
So sharing some of our approaches, our successes, and some approaches that are not so successful.
The first thing we did, actually even before we were officially tasked with the role of making this transformation happen for our enterprise, we started a community of practice. We're hearing a lot of people say, "Well, what can you do? You're dealing with large enterprises, slow to change," et cetera.
We found out there are a couple of approaches you don't need permission or money for, and one of those was a community of practice. So we started meeting every two weeks, just a few of us, and it grew to a couple hundred people. We would deep dive into certain topics and tools. We started sharing each other's successes and horror stories. We invited external speakers like Adrian to come in and talk to us and share industry best practices. And we recorded all our sessions and made them available online.
So that is cross-geo friendly. You remember the previous slide, we have employees across 55 different countries, different time zones. So the recordings made the information easy to access for others, and we are finding out that we get a lot of replays after the fact. As more and more people found out about this community of practice and the information that is available, they also started rewatching our old videos.
When we got enough critical mass, enough excited volunteers, we also started having some hackathons. For us, it was a successful approach in terms of getting people to look at and experience our continuous integration, continuous delivery tool stack.
All the output out of our hackathon and out of our project, we put it in our internal open source. That's another tool that was really helpful for us. So if you don't have one in your company, I highly recommend getting some sort of internal open source initiative happening, because it makes the uplift so much lighter if you're spreading the responsibility amongst others.
When we finally got to a point where we had an official program and some money, we started hiring and getting official vendor training in place for people who prefer classroom settings.
Another approach we took, actually, this is one example of things that didn't work so well for us. We should have thought about making things easier way earlier, and we should have thought about user experience.
But instead, we listened to, or I listened to, gurus like Jez Humble, who just said, "If you want to get started, don't be encumbered by tools. Just go ahead and get started. You don't need anything besides a rubber chicken and a bell to do continuous integration."
Luckily, we didn't have to use a rubber chicken and a bell. We already had a continuous integration tool in place. But we figured out that, all of a sudden, we had 10 other tools that we needed to stitch together: things like mocking libraries, dependency injection, and our UI testing tools.
We found out that all these tools are owned by different teams. Every tool that we want that is supported, we need to file a ticket, we need to apply for permission, we need to provision the environment. In some cases, if we want integration in between the tools, we also need to file another ticket.
And by the way, that's just getting started. You started in a development environment. If you want to move something to production, that's another set of hurdles to jump through.
Basically, by the time we had 20 or 30 other projects on board, we were starting to lose credibility. People said, "Hey, you're touting this DevOps thing that's supposed to make our life easier. It's supposed to automate our deployment, one-click deployment. I don't have to work weekends anymore. But instead, now I have this overhead of 20 tools that I didn't have before with different escalation paths."
So we had to backtrack and go remediate this. We found out making it easy is not so easy because all the tools are owned by different teams, and we do have critical gaps. So we're still in the process of stitching all that together and making it easier for everybody. This is very much a work in progress.
So my recommendation for you: start thinking about ease of use early if you're thinking about scaling.
One thing that really worked for us is this continuous improvement model that we finally put in place. How that works is, we came up with our own DevOps maturity model. We looked around, and there wasn't a good one for us to adopt. We didn't feel that we had all the answers, but we finally got to a point where it was very difficult to explain DevOps to people.
I don't know if any of you have the experience, but explaining DevOps to people is very difficult because DevOps is very squishy.
So we're having conversations that, hey, people tell us, "Go away. We're already doing DevOps because our developers are already carrying pagers," right? Or say, "Go away. We're using the cloud, so therefore we're already doing DevOps."
We finally had to put our foot down and say that if you say you're doing DevOps, these are the four things you need to be doing: agile, continuous integration, continuous delivery, and continuous process improvement. If you're not doing those four things, you are not doing DevOps.
We further refined those four areas and broke them into 10 different categories. For each of those, we identified five levels, level one through level five.
That turned out to be a turning point for us to be able to influence. We're finally able to have the conversation with teams, especially teams who are like, okay, they come to us, "We want to get started. Yeah, we heard all these cool things that you could do, but we don't know where to get started."
So this model allows us to first perform an assessment with the team. Say, okay, let's figure out where you are right now, and then let's figure out what your next target condition is. Take small baby steps. You don't have to boil the ocean.
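The assessment and "next target condition" steps described here can be sketched roughly in Python. This is only an illustration under assumptions: the talk does not list the actual ten categories, so the category names, the `Assessment` class, and the "one level up" rule for the next target are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

MAX_LEVEL = 5  # the talk describes level one through level five


@dataclass
class Assessment:
    """Current maturity level per category for one team (hypothetical shape)."""
    team: str
    levels: Dict[str, int]  # category name -> level, 1..MAX_LEVEL

    def __post_init__(self) -> None:
        for category, level in self.levels.items():
            if not 1 <= level <= MAX_LEVEL:
                raise ValueError(f"{category}: level {level} is outside 1..{MAX_LEVEL}")

    def next_targets(self) -> Dict[str, int]:
        """Assumed rule: the next target is one level up, capped at MAX_LEVEL."""
        return {c: min(lvl + 1, MAX_LEVEL) for c, lvl in self.levels.items()}

    def weakest_category(self) -> str:
        """Lowest-scoring category, a candidate for the first small step."""
        return min(self.levels, key=self.levels.get)


# Hypothetical category names and scores, for illustration only.
team = Assessment(
    team="example-team",
    levels={
        "continuous integration": 3,
        "automated testing": 2,
        "infrastructure as code": 1,
    },
)
targets = team.next_targets()
first_step = team.weakest_category()
print(targets)
print(first_step)
```

Moving one level at a time per category is one way to encode the "small baby steps" idea; a real model could just as well let the team pick a single category to work on.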
That was a successful approach for us, to get teams going on the DevOps journey.
Another thing that we found helpful, especially with teams that are coming to us, they say, "Oh, yeah, we want to get started, but we're too intimidated to get started. We don't know where to begin," is the value stream analysis.
This is another tool that we found success with in helping people identify areas, bottlenecks, low-hanging fruit that they can automate or just improve and streamline the process right away.
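To make the idea of a value stream analysis concrete, here is a minimal Python sketch. It assumes each step in the path to production is recorded with an active working time and a waiting time; the step names and all the numbers are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    process_hours: float  # time someone is actively working on the step
    wait_hours: float     # time the change sits idle before the step starts


def lead_time(steps: List[Step]) -> float:
    """Total elapsed hours from the first step to the last."""
    return sum(s.process_hours + s.wait_hours for s in steps)


def flow_efficiency(steps: List[Step]) -> float:
    """Share of the lead time that is active work rather than waiting."""
    total = lead_time(steps)
    return sum(s.process_hours for s in steps) / total if total else 0.0


def biggest_bottleneck(steps: List[Step]) -> Step:
    """Step with the longest wait, a first candidate for streamlining."""
    return max(steps, key=lambda s: s.wait_hours)


# Hypothetical value stream; the numbers are illustrative only.
stream = [
    Step("code review", process_hours=2, wait_hours=6),
    Step("manual regression test", process_hours=16, wait_hours=40),
    Step("change control board approval", process_hours=1, wait_hours=120),
    Step("production deploy", process_hours=4, wait_hours=11),
]

print(lead_time(stream))
print(round(flow_efficiency(stream), 3))
print(biggest_bottleneck(stream).name)
```

Separating waiting from working time is the point of the exercise: the step with the longest wait is often a cheaper place to start than the step with the most work.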
At an organization level, we knew earlier on that, okay, what I just described are things we can do at a team level to help teams move down the journey. But at an organization level, how do we influence the change in behavior for the entire organization? Because ultimately, that's what DevOps is about: influencing the culture, the behavioral change for the organization.
I just recently came across this model. It's an influencer model to help change behavior for an entire organization. This is something we are experimenting with currently, and if any of you have heard of this before, I welcome feedback.
Basically, their model goes something like this: in order to change the behavior for the organization, you need to obviously first identify measurable results. But more than that, you need to look at, at an individual level, what are the motivations for change, and does the individual have ability to change, and what can we do as a central organization to help impact that in a positive direction?
So in addition to the personal level, you also look at the social aspect. Who are the influencers, and are peers helping the individual and the team change? Do they have the ability? And finally, at a structural level, are your reward systems, or even down to the room layout, supporting the right behavior change?
We can't talk about organization impact without talking about metrics. There have been a couple of sessions that talked about metrics already. But what I will say is, what really resonated with us was differentiating between action and result metrics, and I believe this is from the Toyota Kata book.
Basically, instead of looking at hundreds and hundreds of metrics and trying to make sense of them, we are trying to differentiate between action metrics, which are measures of how well you're doing something, and result metrics, which are ultimately the metrics that the business cares about. The two are not the same, and you really need to correlate the two to make sure that what you're doing is having the desired impact.
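One simple way to correlate an action metric with a result metric is a Pearson correlation over the same time periods. The sketch below assumes one value per month for a single team; the metric names and the data are hypothetical, and the talk does not say which statistic was actually used.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly data for one team, for illustration only.
builds_with_tests_pct = [20, 35, 50, 65, 80, 90]    # action metric
path_to_production_days = [56, 50, 41, 30, 22, 15]  # result metric

r = pearson(builds_with_tests_pct, path_to_production_days)
print(f"correlation: {r:.2f}")
```

A strongly negative value here would suggest that as more builds run tests, the path to production gets shorter. Correlation alone does not show that one causes the other, so it is a prompt for investigation rather than proof.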
Now, we talk about the maturity model. How do we know that we're putting the right thing in the maturity model, we're providing the right guidance? We're putting it through the same rigor in that, are we giving the right guidance in the maturity model? Are we setting the right target condition? And how do you measure that those people are doing the right thing? Ultimately, does it lead to the right behavior or the right result?
So, two years into the transformation, how did we do?
To get some sense on how we're doing, one of the indicators we look at is our continuous integration tool. We look at our user accounts. Two years ago, we had 70-plus people, and over a period of the last two years, we got a 10X increase. So now we have over 700 users in our continuous integration tool. That's the good news.
The bad news is half of the users are inactive. So remember what I said earlier about making the tool easier to use. Our hypothesis is we can get the rest of the inactive users active again if we make the process easier.
Another indicator we look at is how well the teams that we coach, or the teams that use the collateral we made available, are doing in maturity.
How do you measure the maturity? This is a model that we developed ourselves. Our initial take was, okay, well, you can ask people to do an assessment and go back and check with them regularly. But at the same time, we want to see trending analysis over time, and we don't want to keep bugging the team every week and every month about how they're doing.
So what we figured out is that, hey, we can infer some of these maturity levels from studying what they're doing in the continuous integration tool.
For example, how frequently are they committing code? How frequently do builds happen? Do all the builds trigger tests? Are they measuring code coverage? So depending on what they're doing, we assign a maturity level automatically. And what you see in the graph is the result.
How do we interpret the result? As I mentioned, we ended up with five categories where we determined that these are the things we can really see from observation in a CI tool. The rest of the things, like how well a team is performing as an agile team, are really difficult to observe through a CI tool, so we threw those out.
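As a sketch of how a maturity level might be inferred from CI activity, the Python below maps the four signals mentioned here (commit frequency, build frequency, whether builds trigger tests, and whether coverage is measured) to a level from 1 to 5. The `CiActivity` fields and every threshold are assumptions made for illustration; the talk does not give the actual rules.

```python
from dataclasses import dataclass


@dataclass
class CiActivity:
    """Signals observed in a CI tool for one team (hypothetical fields)."""
    commits_per_week: float
    builds_per_week: float
    builds_with_tests_pct: float  # share of builds that trigger tests, 0..100
    measures_coverage: bool


def infer_ci_level(a: CiActivity) -> int:
    """Map observed CI activity to a level from 1 to 5 using assumed thresholds."""
    if a.builds_per_week == 0:
        return 1  # no automated builds observed
    if a.builds_with_tests_pct < 50:
        return 2  # builds run, but most do not trigger tests
    if a.builds_with_tests_pct < 100:
        return 3  # most, but not all, builds trigger tests
    if not a.measures_coverage:
        return 4  # every build triggers tests, coverage is not tracked
    # Every build triggers tests and coverage is tracked; frequent commits
    # (assumed here as five or more a week) earn the top level.
    return 5 if a.commits_per_week >= 5 else 4


# Hypothetical teams, for illustration only.
examples = {
    "no-builds": CiActivity(3, 0, 0, False),
    "few-tests": CiActivity(10, 8, 30, False),
    "tested-infrequent": CiActivity(2, 2, 100, True),
    "tested-frequent": CiActivity(12, 12, 100, True),
}
levels = {name: infer_ci_level(a) for name, a in examples.items()}
print(levels)
```

Because the inputs come from the CI tool itself, the levels can be recomputed on a schedule to show a trend over time without asking teams to repeat a manual assessment.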
Now, in terms of interpreting how well we're doing, across the board, we think that teams are making good, say, B's and C's progress in the area of continuous integration, continuous delivery. Because the areas that are associated with those practices, we're seeing some distribution in the higher maturity, and even distribution across.
Now, the area where we're not making any impact, or hardly any impact, is configuration management, infrastructure as code. That's validated through our observations, our conversations, and GitHub repository check-ins: codifying infrastructure is still not a common practice across our teams. So this is helping guide us toward where we need to make the investment.
As a future target for our program, we set ourselves a goal that in two to five years, we want 100% of the new projects to be using continuous integration and continuous delivery. Essentially, we want the graph, the curve, to shift right.
Key takeaway from our lessons.
I'm sure you've heard a lot of ideas and approaches at this conference. There is a lot of good content out there, and I've shared some of our approaches. Some may work for you and some may not, but I think the most important thing that we've found is that you need to develop a model to figure out what works for you and what doesn't.
For us, our measurement was developing a maturity model and correlating it with our action metrics and with the result metrics. For you, it might be different. But whatever it is, come up with a model that works for you. Usually that'll entail some sort of measurement, defining what success means, and developing some plans to identify actions to take.
Every investment and every action that you take, because your time is precious, you should measure the impact and know whether you're making any difference.
One last slide. I know Gene asked all of the speakers to provide their wish list, things I need help with.
I mentioned before, better end-to-end integration. That's still a challenge for us. If you have used tools, or any methodology, or any internally developed tool that worked well for you, yes, please share with me. I'd love to hear about it.
Also, what we found in testing is that, even with all the testing tools that are available today, there are still some areas that require human judgment. We still need humans to review the test results, particularly in the area of UI testing, especially if you need to be testing across different platforms and different form factors, and you need your UI to be pixel perfect.
Machines today can't make the judgment whether it looks right. But I think that this is an area that is achievable with some machine learning. So if you're an open source contributor or a vendor out there, here are some ideas for additional features.
The other area that we found lacking in our journey is security scanning tools. There are a lot of security scanning tools, but where they're lacking is, we haven't found any tool out there that tells us whether we have personally identifiable information in our system, in our application.
Unfortunately, that's one area where we're still asking people to check the checkbox: do you have personally identifiable information? The other area, which is related, is security level: whether we're storing top secret or restricted secret information. Again, that's something where we rely on a form and a checkbox. Unfortunately, if you know that you're going to have to jump through more hoops if you check those boxes that say, "Yes, I have restricted content," then there's an incentive to under-report.
So this is a real pain for us. If you know of any tool, or have any idea about how to figure out that kind of information, please share with me.
Finally, more tips and collateral on engaging the majority and the laggards. Like I said before, I think we are beyond the phase of the early adopters, the innovators. Our challenge is engaging the rest of the majority and the laggards. I think this is the same journey that you're going through as well, engaging the rest of your corporation to come along.
So please continue the conversation in a social media format. I'd love to talk to you and share experiences.
That's everything I have. I think I have a minute left. If anybody has any questions?
Q&A
Thanks. Any questions from the audience?
Q: How would you apply this to your microprocessor business in terms of DevOps practices?
A: It's a challenge. For the hardware side, we do have a lot of firmware and BIOS. Those are software, so the DevOps practices definitely apply. But when it comes to hardware, where the constraint is hardware or manufacturing, that's definitely where we're getting pushback. And we also haven't figured out how to apply that successfully in those settings.
Q: How big is the organization you're moving? You said you went to 800 people using the tool. What percentage is that? How many people are you trying to reach?
A: Out of 6,000.
Q: Okay.
A: But engineers are a smaller percentage, a fraction of those.
Q: So you think you're maybe halfway there or a third of the way there, or most of the way there?
A: Probably halfway there.
Q: Okay. That's a great move. Any other questions?
Q: Looking back on this process, knowing what you know now, what would you have done differently and started sooner or later?
A: I think one of the things I mentioned is, start thinking about making the tool easier to use sooner. We're kind of playing catch-up right now, and we feel like we may have lost some of the momentum because we lost some credibility: people felt they weren't getting the benefit, and instead they were saddled with all these additional tools they have to deal with. I feel that had we started earlier, people would have had a better experience, and we would be further along.
Any other questions?
Okay. Well, thanks very much, Sherry.
Okay.