Building Quality In: QA and Test Automation at Intel


Las Vegas 2018


Software quality matters. Building quality in and shift-left testing are key principles of DevOps and continuous delivery.

And yet testing and QA often end up as the “middle child” of your DevOps journey: the stage in the pipeline that is too often put on the back burner until you realize you have a serious bottleneck, or until things break.

Test automation is one of the most difficult hurdles, and the Achilles' heel, for large enterprises looking to accelerate their delivery. The problem is especially acute when you need to support legacy code, a complex matrix of targets and supported platforms, or embedded chips and devices that cannot be updated as easily.

Too often, we see testing being handled manually, introducing risk, delays, rework, and unpredictable processes. Test engineers commonly scramble to balance the pace of Dev against the requirements of Ops around environments, compliance, security, and more. In this talk we’ll share Intel’s journey to systematically build quality in, treating testing and QA not just as an integral part of the pipeline, but as the key driver, and the poster child, of our DevOps transformation.

Learn how Intel builds quality into the delivery pipeline, the patterns we’ve adopted to simplify the complex testing matrix and scale test automation, and the processes we have in place to ensure test coverage and security, detect errors quickly, and optimize for quality.

In addition, we’ll share how our detailed quality dashboards and testing data became the key indicators—for both technical teams and the executives—to gauge our release readiness, expected quality and DevOps maturity.

Manish is a Software Engineering Manager at Intel. He works in the Data Center Group, delivering networking semiconductors and solutions for data centers and mobile networks. The portfolio includes not only advanced, asymmetric multicore communications processors, but also a growing family of powerful, innovative network and media accelerators.

Manish has over 20 years of software design, development, and execution experience in the telecom and networking domain. He believes that quality products lead to happy customers, and happy customers keep companies in business. His mantra is: build the capability to deliver products of “known” quality at any point in time. THIS is the most important capability your customers are looking for!

Full transcript

Manish Aggarwal

Good morning, everyone. Good morning. I'm glad to see so many people here who are interested in knowing Intel's journey to build quality in. I know I am the only one standing between you and your lunch, so I'll try to make it worth your time.

Some legal stuff. You want to read through it? I don't know.

Okay, so what's on the agenda today? Some introduction, then what we do, then the software delivery challenges, which will resonate with a lot of you. Then where we started, and then I'll go into the solution space to discuss how we tried to solve the problem, and then where we ended up. That's actually the wrong way to put it, because there is no end; this is just a means. And then we'll have some time for Q&A.

So, a little bit about myself. I work as a software engineering manager at Intel in Austin, Texas, and I've been doing software design and development in telecom and networking for over 20 years. At Intel specifically, I work in the Data Center Group. I strongly believe that quality products lead to happy customers, and happy customers keep companies in business. So let's get this right. Whether you buy something from the Dollar Store, Walmart, Target, or Costco, your expectation as a customer is always the same: you want your product to work. Software is exactly the same. It doesn't matter whose software you are buying, open source or a commercial product, you want it to work. So let's see how we can enable ourselves to deliver quality products.

So what do we do, that is, what does my group do at Intel? We deliver SoCs, systems on chips, to the top-tier wireless customers of the world. So whenever you use your smartphone, which I believe all of you have in your hands right now, to watch a YouTube video or to talk to somebody over Skype or LTE, you're probably using some hardware or software that my team has worked on. So I always say, don't waste your time comparing iPhones or Samsungs or whatever. Buy all of them. It helps.

So why is QA so difficult? Why is quality assurance such a challenge? Because it is treated as the middle child of your pipeline. It gets the least love and affection. Why? Because it is the last checkpoint in the release cycle. If the requirements get delayed, design absorbs it. If the design gets delayed, coding absorbs it. If coding gets delayed, unit testing absorbs it. But guess what? If all of them are delayed, there's nothing left to absorb it, so the QA cycle gets shrunk. That's the reality, because you cannot change the timelines. The delivery dates have already been committed, and you have to meet them. As a result, the QA function always gets the stepchild treatment, and its time keeps shrinking.

What are the other challenges? Testing is actually hard, and I'll have a few slides to explain what I mean by that. And we always have an ever-increasing test matrix. The number of combinations keeps increasing. SUSE comes out with a new flavor. Red Hat comes out with a new flavor. Guess what? Minimal development effort is required, but the QA team has to verify every new combination that comes its way. And I'm in the embedded space, where device configurations are not easy to keep up to date all the time. That adds another challenge.

And traditional manual testing is very risky. It's slow and error-prone. It is repetitive and monotonous, and whatever is monotonous leaves a lot of room for errors to creep in. If the engineer working on it isn't motivated, if the job is monotonous for him, he's more prone to mistakes. That means the work is subject to delays, rework, and unpredictability, and that is the last thing you want. Test engineers are the ones who have to scramble between the paces of the different teams: how fast the developers are delivering software on one side, what marketing needs on the other. They are the shuttle, the bridge in between, trying to balance both sides of the story.

So what was our challenge, in the specific group at Intel that we're talking about? We have about 20,000 tests, and the number is always growing. We have 10 chips, five different software platforms, and 16 features. What that means is that we have hundreds and thousands of tests and scenarios to validate, release after release after release.
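To see why the matrix grows so quickly, multiply the dimensions from the talk. Assuming every chip supports every feature on every platform (an upper bound; the talk doesn't say which combinations are actually valid), 10 × 5 × 16 gives 800 configurations before counting the individual tests per feature. A minimal sketch with placeholder names:

```python
from itertools import product

# Dimension sizes from the talk; the labels themselves are hypothetical placeholders.
chips = [f"chip{i}" for i in range(10)]
platforms = [f"platform{i}" for i in range(5)]
features = [f"feature{i}" for i in range(16)]

# Every (chip, platform, feature) triple is a configuration that may need validation.
matrix = list(product(chips, platforms, features))
print(len(matrix))  # 800
```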

And then, as if this is not enough, there is legacy. Whatever was delivered five years ago, you still have to make sure nothing breaks. Backward compatibility. The customer says, "My feature that was working three years ago should continue to work as is. Yes, I want you to deliver these new features, but I shouldn't even have to recompile my code." That's what customer requirements look like these days. New flavors, of both platforms and software, keep pouring in, and you have to make sure you're compliant and your software works on them. And you have to decouple the dependencies between different teams.

And then time to market, which is always a challenge. There is always less time and more work to do. And last but not least, how much testing is enough? Where do I stop? Every test engineer can write thousands and thousands of test cases given infinite time, but that's not reality. It's not feasible.

So this slide shows where we were when we started our DevOps journey. We were making about four releases per year. The QA lead time per CI cycle was measured in days, and we were shipping about 85% of the code with unknown test coverage. We had about three million lines of code. We were running hundreds of test cases per day, but a hundred is a small number given the pace at which we have to operate these days. And we had 96 combinations of software and hardware that we had to verify. We'll come back to this slide later in the presentation to show where we are today and how we are trying to improve.

Back at that starting point, we had a firm belief in God. Why? Because you push the release and then you pray. That's the only way you can survive.

So what is the reality on the ground? We talk about all these tools, but what challenges do we face day to day? First and foremost: "It worked on my machine." Ask any developer about a feature that isn't working, and you'll hear, "It worked for me. Go check it out." Right?

Flaky and inconsistent tests. A test fails, you create a defect, you call the developer, you get the whole world together to debug it, and guess what? It works. The issue doesn't show up. Right? That's the biggest challenge for the QA team.

And then the numbers. Everybody wants to know how many tests you have, how many are passing, how many are failing, when a given test last worked. That is the only way to get a handle on the state of your deliverable, the state of your software at this very moment. And guess what? You have to do all that with no additional headcount, because we are on a hiring freeze.

Right? So what are the options? As you can see, there is development time and there is testing time. What happens with every subsequent release? The development team gets its time to add the new features. Ideally, the testing time should increase, because testers have to verify the previous deliverables, the features already delivered, and then validate whatever delta has been added.

But what's the reality? Development gets its time, but the testing cycle gets squeezed further. So what do we have? An unacceptable gap that nobody wants to address.

So what are your options? One option is to increase the cycle time: the delivery manager and the QA manager put their hands up and say, "Okay, guys, this is not done. Go tell the customer they're not getting their deliverable." Not acceptable.

Or shorten the cycle time and risk quality: "Okay, I'll deliver, but I won't take any responsibility for the quality, or for the issues your customer will run into." Not acceptable.

So what is the only solution? You have to get smarter, you have to test faster, you have to automate.

But how do you support 20,000 test cases? That's a monster number. Automation, right? I'll share a few slides now on how we approached automation, and I hope you'll be able to take some knowledge from them for your own needs.

We started using ElectricFlow, which provides a single, flexible, scalable platform, and we designed our automation solution around it. As of today, all of our tests are automated using ElectricFlow. We hired the professional services team to build a custom UI, and I'll have a slide later showing what that custom UI is used for.

All the results are automatically emailed to all the stakeholders, so you don't have to ask anybody for status. It's right there in your mailbox. And based on the results, the pipeline flows on its own. If your gating tests fail, no other tests are triggered. If your smoke tests fail, the check-in isn't even accepted; it is rejected. That's how everybody is kept in check. It's no longer a case of code being thrown over the wall to you while you scramble to figure out which change is causing the failure.
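The self-flowing behavior described above (a failing stage stops everything after it, and a smoke failure rejects the check-in) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ElectricFlow's API: the stage names and the `Stage`/`run_pipeline` helpers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # True if every test in the stage passed


def run_pipeline(stages: list[Stage]) -> tuple[str, list[str]]:
    """Run stages in order and stop at the first stage that fails."""
    executed: list[str] = []
    for stage in stages:
        executed.append(stage.name)
        if not stage.run():
            if stage.name == "smoke":
                # A smoke failure rejects the check-in outright.
                return "check-in rejected", executed
            # Any other failure stops later stages from being triggered.
            return f"stopped: {stage.name} failed", executed
    return "passed", executed


status, ran = run_pipeline([
    Stage("smoke", lambda: True),
    Stage("gating", lambda: False),
    Stage("regression", lambda: True),
])
print(status, ran)  # stopped: gating failed ['smoke', 'gating']
```

In a real setup each `run` callable would launch a test suite and report its result; here they are stubs so the ordering logic is visible.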

So what do we have? A set of reusable, parameterized test automation scripts built with a 100% object-oriented approach. Every tier is parameterized: the procedure, the release, the team working on it, the engine under consideration, the feature, the chip type. It is a fully data-driven approach, and each procedure operates based on these parameters.

So whether a feature has to be verified on my current chip, on the SoC that was delivered two years ago, or on an SoC that is just coming up, the same test configures itself based on the parameter and executes on the DUT (device under test) running that chip. That saves a lot of time and effort compared with copying the same test into each variant. Instead, because it is parameterized, a single procedure gives me the same value with minimal investment.
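The parameterization described above can be sketched as one procedure driven by a parameter record. The parameter dimensions (release, team, engine, feature, chip, platform) come from the talk; the chip names, firmware labels, and the `RunParams`/`run_procedure` helpers are hypothetical, and the procedure only reports what it would execute rather than touching real hardware.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunParams:
    # The dimensions the talk says every procedure is parameterized on.
    release: str
    team: str
    engine: str
    feature: str
    chip: str
    platform: str


# Hypothetical per-chip settings looked up at run time instead of being
# hard-coded into separate copies of the test.
CHIP_CONFIG = {
    "previous_soc": {"firmware": "fw-1"},
    "current_soc": {"firmware": "fw-2"},
    "upcoming_soc": {"firmware": "fw-3"},
}


def run_procedure(p: RunParams) -> str:
    """A single procedure whose behavior is driven entirely by its parameters."""
    cfg = CHIP_CONFIG[p.chip]
    # A real procedure would configure the DUT with cfg["firmware"] and run the
    # test for p.feature; this sketch only reports what it would execute.
    return f"{p.feature}/{p.engine} on {p.chip} ({cfg['firmware']}), {p.platform}, {p.release}"


base = RunParams(release="R1", team="qa", engine="crypto",
                 feature="ipsec", chip="current_soc", platform="linux-a")

# The same procedure covers previous, current, and upcoming chips.
runs = [run_procedure(replace(base, chip=c)) for c in CHIP_CONFIG]
for line in runs:
    print(line)
```

Retargeting to a new Linux flavor or a new CRB is the same move: `replace(base, platform="linux-b")` changes one field and leaves the procedure untouched.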

And it is all on demand. You can run it at any point in time, and anybody can run it: the developer, QA, the support team. If the module owner is out, anybody else can do it. You move away from the days when only one person knew how to run a test. The work never stops. If the engineer is out on vacation or on a break, you shouldn't have to bug him, call him, and get him on the phone just because you have a release.

Users can view all the results and procedures based on their own search criteria, with their own custom filters. If you are interested in only two modules, or only two sub-features, then you see only those, not all 20,000 test cases, because 18,000 of them you know nothing about and don't care about. Right? So why bother with those 18,000? You should focus only on the ones you are interested in.
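A per-user filter like the one described above amounts to a simple predicate over result records. This sketch assumes each result carries a module and sub-feature field; the `TestResult` type, `filter_results` helper, and sample data are hypothetical, not the custom UI's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str
    module: str
    sub_feature: str
    passed: bool


def filter_results(results, modules=None, sub_features=None):
    """Keep only results matching the user's saved filter; None means 'any'."""
    return [
        r for r in results
        if (modules is None or r.module in modules)
        and (sub_features is None or r.sub_feature in sub_features)
    ]


results = [
    TestResult("t1", "ipsec", "esp", True),
    TestResult("t2", "ipsec", "ah", False),
    TestResult("t3", "dma", "copy", True),
    TestResult("t4", "compress", "deflate", True),
]
mine = filter_results(results, modules={"ipsec", "dma"})
print([r.name for r in mine])  # ['t1', 't2', 't3']
```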

And these tests can be run as is, or you can change just one, two, or three parameters. If a new Linux flavor comes in, you keep everything else the same: your Git repository, your test procedure, your toolchain. You're just running on a different platform. Or you get a whole new customer reference board, a different CRB, and you have to re-verify and re-validate the existing scenarios. Well, guess what? It's very easy. You change that one parameter, and you're good to go.

And everyone shares the same tests. Sometimes we develop the test cases from the requirements before the developers have even finished the feature, and they can then use our tests for their unit testing. They can make sure they have covered all the requirements, because everything is automated. I don't have to baby-step or spoon-feed the developer, or sit with him to explain how my test runs.

So this slide shows the custom UI. It's not very pretty, but it does the job. It lets me select the feature, the sub-feature, and the product that interests me at the moment. I can select which test procedure to run, and decide when and how to run it. Total customization to my needs.

And this is my favorite slide. It's green, and we have a concept in our organization called "easy green." You get a bunch of emails with tests passing and failing, but if it's easy green, everybody is happy, because it didn't just run for me or work for you; it worked for a third party that has no priorities or preferences. And since it's an independent validation, everybody accepts it.

So I'll shuttle between these two slides now. First, a few seconds on this one. This is our bird's-eye view: for every branch and each test type, we have pie charts that are automatically populated and updated from the results of the tests we run. How do they help us? They give me a bird's-eye view of the critical tests and, at the same time, the ability to drill down. I'll cover how we do that.

And this uses a shared language. Everybody understands the language of numbers: how many tests ran, how many passed, how many failed. I don't know if you can read it, but there is a number associated with every color, so it tells me how many tests are green, how many are yellow, and how many are red.

Now, green and red are self-explanatory, but what is yellow? Yellow means the tests are failing, but I know why. I don't need to go through the logs again and rediscover that there is already a bug open for this issue. Yellow tells me nothing has changed between yesterday and today, so I'm not wasting any engineer's time revisiting logs for a failure we already know about.
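The green/yellow/red scheme described above can be sketched as a classification step followed by a count per color, which is what each pie chart displays. This assumes "known" means the failing test is linked to an open bug, as the talk describes; the test names, bug IDs, and `classify` helper are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def classify(test: str, passed: bool, known_failures: dict[str, str]) -> str:
    """Green = passed; yellow = failing with an already-tracked bug; red = new failure."""
    if passed:
        return "green"
    if test in known_failures:
        return "yellow"
    return "red"


known = {"ipsec_rekey": "BUG-123"}  # hypothetical: test name -> open bug ID
run = {"ipsec_basic": True, "ipsec_rekey": False, "dma_copy": False, "crc": True}

colors = {name: classify(name, ok, known) for name, ok in run.items()}
counts = Counter(colors.values())
print(counts["green"], counts["yellow"], counts["red"])  # 2 1 1
```

Only the red slice needs fresh log analysis; yellow results point straight at a bug that is already being tracked.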

What else? It gives me reports, dashboards, and at-a-glance visibility for all the stakeholders. Everybody can go there; the dashboard has no password. You can go there and see the status of a particular branch. If a delivery manager is trying to decide whether we are release-ready, he shouldn't have to call a meeting with a bunch of QA managers and developers to ask for status. He can just go there and decide for himself.

And these charts are easy to consume for higher-level visibility. The same chart is used by the developers and by the delivery managers.

I can also look at a specific release and a specific platform, and see which tests are failing. Once I know a bunch of tests are failing, the immediate next question is: which subset of tests is failing? I can get that information right from here.

And then the obvious next question. If this test is part of regression, it worked at some point in time, right? To isolate the culprit, the code change causing this test failure, this data is immensely useful. It shows the previous 20 or 25 runs of the same test and whether each passed or failed. So I can easily tell whether this is a flaky test or whether a check-in on a particular date caused my test to start failing. This helps me debug failures very quickly.
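The distinction described above, a flaky test versus a check-in that broke it, can be read off the run history with a simple rule. This is a minimal sketch, not the team's actual logic: it assumes the history is a list of pass/fail booleans ordered oldest first, and `analyze_history` is a hypothetical helper.

```python
from __future__ import annotations


def analyze_history(history: list[bool]) -> tuple[str, int | None]:
    """Classify a test's recent runs, ordered oldest first (True = pass)."""
    if all(history):
        return "passing", None
    if not any(history):
        return "always failing", None
    last_pass = max(i for i, ok in enumerate(history) if ok)
    first_fail = min(i for i, ok in enumerate(history) if not ok)
    if first_fail > last_pass:
        # Consistent passes followed by consistent failures: the culprit
        # check-in landed between run last_pass and run last_pass + 1.
        return "regression", last_pass + 1
    # Passes and failures interleave (this simple rule also lumps a
    # fail-then-recover history in here): treat as flaky.
    return "flaky", None


history = [True] * 18 + [False] * 7  # the last 25 runs of one test
print(analyze_history(history))  # ('regression', 18)
print(analyze_history([True, False, True, True, False]))  # ('flaky', None)
```

For a regression, the returned index is the first failing run, so the suspect check-ins are those that landed between it and the run before it.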

And on-demand is the key. You can run these tests at any point in time. Every check-in triggers them automatically, and anybody can run them. It is fully automated: I don't need IT to provision anything for me. The scripts themselves reserve the hardware, set it up, set up the environment, and run the tests.

This is a flowchart of the hardware reservation process. As an SoC company, a chip company, all of my tests have to run on a hardware platform, which is a costly shared resource. So I've integrated the same steps a human engineer would follow to reserve and use a resource, and my automation system does exactly that. It logs in to our reservation system, marks the board or system as reserved for a couple of hours, or however long the test needs it, runs the test, and frees up the resource.
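The reserve, run, release flow described above maps naturally onto a context manager, which guarantees the shared board is freed even if the test crashes. The `ReservationSystem` class below is a hypothetical in-memory stand-in; the talk doesn't describe the real reservation system's interface.

```python
from contextlib import contextmanager
import time


class ReservationSystem:
    """Hypothetical in-memory stand-in for a lab board-reservation service."""

    def __init__(self, boards):
        self.free = set(boards)
        self.held = {}  # board -> (user, reservation expiry as a UNIX timestamp)

    def reserve(self, user, hours):
        if not self.free:
            raise RuntimeError("no board available")
        board = min(self.free)  # deterministic pick for the example
        self.free.remove(board)
        self.held[board] = (user, time.time() + hours * 3600)
        return board

    def release(self, board):
        del self.held[board]
        self.free.add(board)


@contextmanager
def reserved_board(system, user, hours):
    """Reserve a board for the test's duration and always free it afterwards."""
    board = system.reserve(user, hours)
    try:
        yield board
    finally:
        system.release(board)  # runs even if the test raises


lab = ReservationSystem(["crb-01", "crb-02"])
with reserved_board(lab, user="ci-bot", hours=2) as board:
    print(f"running tests on {board}")  # running tests on crb-01
print(sorted(lab.free))  # ['crb-01', 'crb-02']
```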

What's the bonus? What else do I get? All the metrics we need to associate with quality: code coverage, memory leaks, and binary compatibility validation, that is, confirming that the same software, the same set of libraries, still works with a test that was compiled against the previous version. I can do all of that by leveraging the existing automation framework, without adding more engineers or resources.

But keep in mind, this is not something you can achieve overnight. You have to divide the problem; that's the only way to conquer it. We also started with only a subset of tests for the POC (proof of concept), and then gradually built on top of that. It's a continuous improvement model. Always keep your focus on the end goal, and you will get there. You may not be there in one month or in three months, and depending on your scale, it can take years.

And then the most important part. Everybody understands that code can have bugs. We have to press the message home that automation can have bugs too, and that's okay. It's code at the end of the day, and just like any other code, it has the right to have bugs. It matures and stabilizes as you run it and use it more and more.

So where are we today? This is where we were: four releases, 85% of code with unknown coverage, a belief in God, push and pray. Where are we today? We are making more than 100 releases. Our CI cycle has gone from days to 10 minutes. Instead of 85% of code with unknown coverage, we now have 87% of code with known code and test coverage, and we are running about 15,000 test cases per day.

Audience member: Per day? You said a day?

Manish Aggarwal: Per day, yes. Well, it depends on how many systems you have, right? How many platforms you are running on. But yes, based on where we are today, this is the number that we have.

So what did we achieve in a nutshell? It helped us manage the enormous amount of work we had. It allows us to make more frequent customer deliveries. It helps me keep an eye on the quality of my product at every point in time. And we all use the same shared language that everybody understands: the language of numbers, the language of metrics.

So what else? We started with automation that ran after the fact. But as we progressed, every check-in is now gated: if I have a set of tests and a new code check-in comes my way, it is not allowed to break any of my existing tests. Obviously, I cannot run all 15,000 of them, because then a check-in would take a few days to go in, which is not acceptable. So we have a subset of tests, which is continuously changing, that makes sure I'm not breaking legacy on a large scale. It's not a DOA (dead on arrival) situation where everybody stops working.

I have multiple levels of tests. Once my smoke tests pass, I run the level-one tests. Once they clear, we go to level two, and so on. And we are making continuous improvements: we are doing more and more parallel test execution. And instead of every check-in running a fixed set of tests across all engines, which does not make sense, we are trying to get smarter, using Robot Framework and other tools to make our test selection more relevant. If there is a code change for feature X, then I should run only the tests for feature X and the features that depend on it, not everything.
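The selection rule described above (run the tests for the changed feature plus every feature that depends on it) is a transitive walk over a dependency graph. This is a plain-Python illustration, not Robot Framework's API; the feature names and the `DEPENDENTS` map are hypothetical.

```python
from collections import deque

# Hypothetical map: feature -> features that directly depend on it.
DEPENDENTS = {
    "crypto": ["ipsec", "tls"],
    "ipsec": ["vpn"],
    "tls": [],
    "vpn": [],
    "dma": ["compress"],
    "compress": [],
}


def select_features(changed):
    """Changed features plus everything that transitively depends on them (BFS)."""
    selected, queue = set(changed), deque(changed)
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in selected:
                selected.add(dep)
                queue.append(dep)
    return selected


print(sorted(select_features(["crypto"])))  # ['crypto', 'ipsec', 'tls', 'vpn']
```

A change to `dma` would select only `dma` and `compress`, leaving the crypto stack's tests out of that check-in's run.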

Thank you, and we have some time for Q&A.

Q&A

Audience member: Sure. In the pie chart with the results, you have three colors--

Manish Aggarwal: Uh-huh.

Audience member: ...green, yellow, and red. When something goes wrong in test execution, we have automation as well, but these failures cost us a lot of triage effort. So what is your best practice for reducing triage time for failures?

Manish Aggarwal: Okay, so the question is about the three colors in the pie chart; let me go to that pie chart. Okay, it's taking too long. So the question is: we have three colors, red, yellow, and green, it takes a lot of time to drill down into the failures, and is there a best practice for drilling down faster?

We have realized two things. First and foremost, the quicker you identify a failure, the quicker you can get feedback to your development team about the failing test, and the faster the resolution. Thanks to this automation, we get results on the newer code base in less than a day, which means the code change is still fresh in the developer's mind. Because I'm able to give early feedback, I get faster fixes.

The second piece of information they are most interested in is when the test started failing. Because I can go back through the logs of that particular test, I can tell them the exact time, the exact run, where the test started failing, and that before that point it was working. And because I have the logs as well, everything is ready for them to start triaging the issue.

Audience member: How much time do you spend on maintaining your existing automated set?

Manish Aggarwal: Okay, very good question. It took us close to two years to stabilize this. It took us about six months to develop it, but a year and a half to stabilize it. That's where my statement that even automation has bugs is very relevant, because automation that is not reliable should not exist, right? Those one and a half years were very hard: every time there was a failure, people would ask, "Is it a test failure or is it an EC failure?" But gradually the system improved, its stability improved, and we got buy-in from everybody else because people started seeing the value. Since then, maintenance needs have been minimal. But yes, in the beginning it was a lot of effort.

Audience member: Do you have any strategies for determining which tests to include in your sanity suite as opposed to the robotic one?

Manish Aggarwal: Yes. The question is whether there's a best practice for deciding which tests should be part of the sanity suite. We use two criteria. First, if you have an upcoming release, the features targeted for that release are part of our sanity suite. Second, we have a set of samples at a broader level, which cover the breadth of the product rather than the depth, and we make sure that no code check-in is allowed to break the breadth of the product. Those are the tests that make up our sanity suite.
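The two criteria described above combine as a set union: tests for features targeted by the upcoming release, plus a fixed set of breadth samples. A minimal sketch, assuming each test is tagged with the feature it covers; the test names, feature tags, and `sanity_suite` helper are hypothetical.

```python
def sanity_suite(tests, release_features, breadth_samples):
    """Tests for release-targeted features plus breadth samples covering the whole product."""
    targeted = {name for name, feature in tests.items() if feature in release_features}
    return targeted | set(breadth_samples)


# Hypothetical mapping: test name -> feature it covers.
tests = {"ipsec_basic": "ipsec", "ipsec_rekey": "ipsec", "dma_copy": "dma", "crc_check": "crc"}
suite = sanity_suite(tests, release_features={"ipsec"}, breadth_samples=["dma_copy"])
print(sorted(suite))  # ['dma_copy', 'ipsec_basic', 'ipsec_rekey']
```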

Audience member: How do you manage the underlying infrastructure? Do you reset your agents on your configurations? Do you assign them to a particular project? Do you compete for resources? Do you have any challenges with that? Because you have--

Manish Aggarwal: Okay, one more time, please.

Audience member: How do you manage your underlying infrastructure for agents or resources for your job executions?

Manish Aggarwal: Okay.

Audience member: Do you segregate them by project or not? Do you share across entire group? Do you have competing resources, issues, and priorities? How do you address that?

Manish Aggarwal: Okay, so the question is how we maintain the EC agent infrastructure for the group or for the whole company. The solution I discussed is specific to one group: the EC server and agents are dedicated to running tests for that group. Gradually, more and more teams are showing interest and are running their trials and POCs using our agents, but the responsibility for maintaining them lies with our team. And as I said in answer to the previous question, there was a period of about one and a half to two years when we had to dedicate resources to managing and maintaining it. At this point, it runs in stealth mode: I no longer need to dedicate a resource to managing it.