Build Quality in at Scale
Bringing DevOps to the enterprise is a challenging task.
Changing the culture of a large established organization brings its own unique challenges. When undergoing this kind of transformation, it’s useful not only to reach out to all parts of the enterprise but to find partners within the enterprise to advance the cause.
As part of the changes at Salesforce.com, we realized pretty quickly that the professional quality engineers were very like-minded with respect to the principles of DevOps. A large component of DevOps thinking is driven by Deming’s third point: “Eliminate the dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place.”
That is exactly what our quality engineers bring to their jobs every single day and that is what has made them such a perfect partner for achieving cultural transformation in the enterprise.
Full transcript
Reena Matthew
Good morning, everyone. It's been a great couple of days at DevOps Enterprise Summit.
Dave and I would like to share Salesforce's story about our journey on this transformation, and we want to explain why we find quality as a key piece of that whole journey.
I'm Reena Matthew. I'm a principal architect at Salesforce. I've been with Salesforce for more than eight years right now. My background has been software engineering, but I've played different roles: software developer, software tester, quality engineer, product owner. My passion is essentially enabling teams to deliver the services they're really proud of and making sure they're high-quality services.
On to you, Dave.
Dave Mangot
My name is Dave Mangot. I'm an architect at Salesforce. I've been there a little more than two years, and I've been kind of a sysadmin by trade, and I've been heavily involved in our DevOps transformation.
Before we begin, our lawyers and PR and all those other people, it's important that we tell you that anything that we say here, you should not use to make a purchasing decision about Salesforce. We have all kinds of websites and things like that for that kind of thing, so please use those appropriately.
We're going to start off with a little bit about Salesforce. For people who don't know, we've been around for about 15 years. We like to call ourselves enterprise cloud computing. We're about 15,000 employees, and the reason that we're on this DevOps transformation is because of this graph right here.
This shows the number of transactions per fiscal year, and you can probably easily do the math. The number on the right is in the hundreds of billions, and if you divide that by 365, you get about a billion transactions a day.
This slide is actually a little bit out of date. In September, we passed two billion transactions a day, and we're over that now. So 15 years to get to a billion transactions, a year to get to another billion transactions on top of it. Salesforce is growing really fast, so we have to change the way that we work.
We're going to give you a little bit of perspective about what we're going to talk about today in the talk with a video from 1984.
You're gonna love the quality, because that's what goes into a car whenever we build one. You're gonna love the quality. You'll love the quality built by Ford.
From the folks in engineering to the people in design, from the factory to the showroom to the people on the line, we say, "You're gonna love the quality. Ford quality is job one."
From the folks in engineering to the people in design, from the factory to the showroom to the people on the line.
Thirty years ago, Ford was doing DevOps. They're like the original DevOps hipsters. They were doing it before it was cool. Might as well throw in security and engineering and operations in there.
Quality, quality, quality. If we had a drinking game for the number of times people said quality in the past day and a half, oh my God. But quality is central. Quality is one of the most important things. At Salesforce we always say trust is number one. Quality is a huge part of trust, and we're going to talk a little bit about why it's so important for DevOps at Salesforce.
When you go to people, you say, "We want a DevOps transformation," everyone says, "Oh, great. This is awesome. We're going to have lower MTTR, shorter lead times, easier deployments. That's going to be great. Sign me up. It's going to be awesome."
I spent a lot of time as a DevOps ambassador at Salesforce. The issue is that when you actually go to these people and you start talking about, well, we want you to do stuff for DevOps, they're like, "Well, okay, I'm hearing some of what you're saying." It's like that old Gary Larson cartoon when they're talking to Ginger. They kind of hear a couple things that apply to them, and then everything else is sort of hard to really understand.
At Salesforce, it's a big enterprise company. Like any other enterprise company, we have lots of silos, like all of you as well. So we have to get into these different places.
When I got to Salesforce, I was placed on a team called ISD, infrastructure software development. Everybody just called us the Puppet team because we were the team that was writing all the initial Puppet code. And I was like, "Oh, we're going to talk to everybody. We have to look at the beginning, we have to look at the end, we have to look at the whole process."
And there were these people on my team that were like, "Yeah, Dave, of course we do that."
And I was like, "Oh, no, but this is like DevOps. We're going to look at the whole system. We're going to test in the beginning. We're going to have all this stuff automated after it goes out. We're going to ensure..."
They're like, "Yeah. Yeah, we know all this stuff."
And these were the quality engineers. These were the QEs at Salesforce on this team. And the language that I was speaking was their native language. I've been really lucky to have been able to work with really high-quality quality engineers at Salesforce, and Reena is one of them, and she's going to tell you a little bit more about quality engineering at Salesforce.
Reena Matthew
I'm one of the quality engineers at Salesforce, and I started having conversations with Dave, who's a DevOps ambassador at Salesforce. We started having these conversations and we could relate to what we wanted to drive in the organization.
He talked to me about Edwards Deming. Frankly, I didn't have much knowledge about him. But for those of you who don't know, Edwards Deming was a statistician. He's actually considered to be, in some circles, the father of quality. After World War II, he actually went to Japan and preached his principles of management, and that's considered to have revolutionized the automobile industry in Japan.
He authored this book, Out of the Crisis, and one of the key things in this book is the 14 points, the management principles that organizations should run by. This particular one that's called out here is the third of those 14 points, and it's calling out, "Don't rely on inspection." In our world, we sort of say, "Don't rely on testing." The message is, build quality in from the beginning, always. Don't think about it at the end of the cycle.
Many of you have probably been in root cause analysis meetings or postmortems. How many of you have heard this as the root cause of a service disruption or an upset customer: "Oh, testing didn't catch the issue"? Testing is the obvious thing to blame. Yeah, if we had found the bug, we would've fixed it. We would've never deployed it to production like that.
So if your root cause is testing, you're not thinking about what could've been done beforehand, upstream, to have actually prevented the bug, and, moreover, how we could have found it earlier on.
Quality engineering, the definition, if you look into the manufacturing world, it actually calls out, it's not just about the output that you're generating, but it's also the whole process involved. Your output could be software, hardware, and the intermediate deliverables that you have, like designs, test plans. What is key is how are you generating these intermediate outputs as well as the final output? That's what quality engineering is all about.
Quality engineering at Salesforce. I want to give a little bit of history about that. We consider ourselves a network of guardian angels. We're always collaborating with each other. We're connecting. We actually want to share the knowledge that we have.
Salesforce is 15 years old. Six months after the company was started, they actually started hiring quality engineers. So quality engineers have been there from the beginning. They're a dedicated resource to the team. They're working closely with the product owner, the developers, whoever else is required to actually build that service for your customers.
The skill sets that we look for in these engineers are, they're definitely passionate about technology, but in addition to that, we want to make sure that they have a research-oriented mind. We want them to be actually asking questions and coming up with theories and disproving theories so that they understand the system as a whole.
And you can't do that sort of research in a silo. You need to collaborate with people within your team and across your teams. It could be usability engineers, security engineers, performance engineers, anything that's a critical aspect of the service that they're building.
At Salesforce, as I said, it's not just about testing. Testing is definitely a critical piece of how quality is assured, but that's not the only way.
We want quality engineers involved in all stages. I don't know how many of you have seen this I Love Lucy episode where Lucy and Ethel are there. They're on this chocolate pipeline. Chocolate keeps coming. They can't actually keep up. It's good quality chocolate, but the delivery pipeline is not efficient. They're just stuffing it in their hats and in their mouths.
We want to make sure whatever output that the company is delivering, it's actually being built and delivered efficiently. Quality engineers at Salesforce, we involve them in the release planning as well as all the design reviews. Throughout that whole cycle, we want to make sure that they're analyzing the risk. What can we mitigate? What's our strategy to make sure that we're reducing the risk for all parties involved?
Definitely testing is a key aspect to that. All the engineers, developers, and quality engineers are definitely focused on test automation to make sure that, in future, we don't actually cause any regressions. But it's also about manual testing. We don't completely rely on test automation because we know the end customer experience has to include some manual testing.
With quality engineers involved in the tools, we want to make sure the continuous integration tools that we're choosing, the test automation frameworks that we're building, they're all involved in those discussions. Finally, the product, when it's finally deployed to production, we want to make sure quality engineers are also coming up with strategies to validate that our customer is adopting this service. How should we release it? Should we do canary releases? These are the different aspects that quality engineers are involved with.
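The canary release question raised here can be pictured as a deterministic percentage rollout. What follows is only a minimal sketch, not a description of how Salesforce releases software: the function names `in_canary` and `route`, the hashing scheme, and the idea of bucketing by customer ID are all assumptions made for illustration.

```python
import hashlib


def in_canary(customer_id: str, percent: int) -> bool:
    """Return True if this customer falls in the canary cohort.

    Hypothetical helper: hashes the ID into a stable bucket from 0 to 99,
    so the same customer always gets the same answer for a given percentage.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def route(customer_id: str, percent: int) -> str:
    """Pick which release serves the request (labels are illustrative)."""
    return "canary" if in_canary(customer_id, percent) else "stable"
```

Because the bucket comes from a hash rather than a random draw, raising the percentage only ever adds customers to the canary cohort, which makes it easier to validate adoption step by step.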
What's our latest journey? Dave and I are talking about the DevOps transformation at Salesforce. Dave talked about the high transaction growth that Salesforce is experiencing right now. We know we need to have the software teams and infrastructure teams and operations teams all working closely to make sure that we're all aligned in how we're going to address that growth.
For a customer, it's a service. It's the infrastructure plus the software all bundled up together, and we want to make sure that the service doesn't impact the customers running the business.
Dave and I, in the next few minutes, we want to talk about three key functional groups and how this DevOps transformation has helped them. We want to talk about how QE has collaborated with them, and most importantly, we want to call out some wins for some of those groups as part of this journey that we've had.
On to you, Dave.
Dave Mangot
We go to the developers and we say, "We want a DevOps transformation." And they say, "Whoa, this means we're going to have to carry a pager." Nobody actually wants to sign up for it.
So how do you bring the developers along? Well, there's a couple of ways you can do this. These are the ways that have worked for us at Salesforce.
The first way is empathy. This is actually a big topic, obviously, in the DevOps community right now. I have a guy, a friend of mine at Salesforce I've actually worked with at multiple companies now, Walter, and I love Walter. I was talking to Walter after one of our DevOps conferences and I said, "What are you going to do to bring your development team into this?"
And he's like, "Well, we don't want to do this because that means we're going to get a pager and we're going to get woken up at 3:00 AM."
And I was like, "Well, okay."
He's like, "Well, that's somebody's job to get woken up at 3:00 AM. That's the operations people's job."
And I said, "Walter, those people don't want to get woken up at 3:00 AM either. They want to be doing their work during the day just like your team does."
And he's like, "Oh my God, you're right."
It was awesome, using empathy to actually bring people along, and Walter's been a huge proponent of the DevOps stuff that we're doing right now.
Also lots of carrot, pride in the service. Service ownership, very popular with companies like Amazon and Google and things like that. We're really bringing service ownership a lot more into Salesforce. We're giving people the tools. We're saying, "You wrote this code, like Werner Vogels, 'You wrote it, you run it.' You have to get involved, you have to take pride, you have to make sure that your service is up all the time."
And to help them with things like that, we're building new tools. So No More Flying Blind is a reference to a metrics framework that we built. The old way of getting metrics out of your production systems at Salesforce was very clunky. We built a brand-new self-service system. You instrument your code, you do all the right things, and your stuff just shows up in the metrics framework very easily. So you can actually take pride in your code and actually be able to have this concept of service ownership.
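To make "you instrument your code and your stuff just shows up" concrete, here is a minimal sketch of what self-service instrumentation can look like. It is not the Salesforce metrics framework, whose API the talk does not describe: the `instrumented` decorator, the in-memory `COUNTERS` and `TIMINGS` registries, and the `lookup_account` function are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory registries standing in for a real metrics backend.
COUNTERS = defaultdict(int)
TIMINGS = defaultdict(list)


def instrumented(name):
    """Decorator that records call counts, error counts, and durations."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                COUNTERS[f"{name}.errors"] += 1
                raise
            finally:
                # Runs on success and on failure, so every call is counted.
                COUNTERS[f"{name}.calls"] += 1
                TIMINGS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("lookup_account")
def lookup_account(account_id):
    """Example service function; the name is illustrative only."""
    if account_id < 0:
        raise ValueError("invalid account id")
    return {"id": account_id}
```

The design point is that the service owner adds one line to a function and gets call, error, and latency data without filing a request with another team.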
There's other things that we've also done, and we're doing some of these things with QE.
Reena Matthew
How do quality engineers collaborate with the developers?
One thing, as I said, we want to have the quality engineers involved from the beginning, especially the design phase. One key thing is engineers want to assume that nothing is ever going to fail. But the reality is, failure's going to strike at some time. Some component's going to go down. You need to make sure your system is built to be resilient against it. We make sure that everyone's thinking about that.
We want to make sure we're testing the design. You can actually find a lot of bugs just by reviewing the design and the functional specs. That's how we want to prevent a lot of bugs, before a single line of code is written.
We want to make sure that we have continuous integration set up, so whatever tests are there that the engineers, both developers and quality engineers, have written, they're actually continuously running for every check-in. This is a key thing to make sure engineers get continuous feedback.
At Salesforce, we started adopting Puppet, and as most of you know, there aren't a lot of testing tools in the infrastructure world compared to our traditional software testing frameworks. So one of our Salesforce quality engineers named Connor, he came up with this open source product called Rouster, and it enabled the team to write more efficient Puppet functional tests. This is what we look for in quality engineers. You find a problem, how can you solve it and build it yourself as required? And they took pride in this work that they did.
Talk about another one, Dave.
Dave Mangot
Another thing that we developed is what we call PIAB, or Puppet in a Box. When we were on this Puppet team with Connor, we developed this development environment, which is all based on Vagrant and virtual environments and things like that. But we used it so we could write the Puppet code that was going to be running in production to manage all of our servers.
The great thing about this is it turns out that this is actually a nice environment, not just for writing Puppet code, but for actually having a production environment on your desktop. This got adopted all over the place.
There are tons of Jenkins servers now that have Puppet in a Box and Rouster and things like that that are doing functional tests on the code that we write for production. It also enables people to take the Salesforce app and run it on their desktop and see what it's going to look like in production.
The old way, you'd have to wait for weeks or months in order to get a development environment. Now you do a Git clone, and you can have it in minutes.
The old way cost millions of dollars sometimes to build out environments that look like a Salesforce pod, for people who know about our unit of deployment. Now it's basically free.
The old way, you'd have to wait for network engineers or systems engineers to do whatever they had to do in these big, complex environments, and now you can actually do all this stuff on your desktop.
But developers aren't the only people you have to bring along on this DevOps journey. You have to bring along operations.
Operators, you go to them and you say, "We want you guys to do DevOps," and they're like, "Awesome! Someone else is going to carry a pager too, not just me. Sign me up. Where do I sign?" As a systems engineer, I'll definitely be involved in that. And selling the idea of DevOps to operations people, not terribly difficult. They see a lifeline. They're pretty excited.
We've been holding an internal DevOps mini-conference twice a year based on a recommendation that Damon Edwards gave in his DevOpsDays Rome presentation from a few years ago. At the very first DevOps conference, I was talking to an engineering manager and I said, "What's your impression of the conference so far?"
He said, "I see a lot of really angry ops people, and I don't know exactly why they're like this, but I kind of feel like I need to find out."
And that's a sign of a great engineering manager, that he actually cared to find out.
Bringing the ops people on wasn't terribly hard, but, as Reena said, operations and QE is sort of a new field right now.
Reena Matthew
Traditionally, quality engineers are always trying to see how they can break the system, what are the failures, and make sure engineers fix the bugs. Operations engineers, they're always constantly running the service, they're seeing the failures, and they want to actually make sure that the system still runs in spite of those failures.
A couple of things that we do at Salesforce right now is we have a concept of tabletop exercises. This is where the whole team, the developers, the product management, site reliability, that's the team that actually runs our service at Salesforce, they all sit together and they're actually going through the architecture and they're figuring out, okay, what happens if this particular component fails? Do we have the right monitoring in place? Do we know what's going to happen?
They're constantly thinking of the various failure points. That sort of builds the culture of the team to make sure that we hope this event never happens, but if it does happen, we're not going to panic. We know how we're going to deal with it. That makes sure that the developers understand the pain that the site reliability folks are going through, and site reliability folks can provide constructive feedback to make sure we're building resiliency back into the system.
Game days. This is where, in our internal test environments or even in production, we actually inject failures and see what happens. So we validate the design that we built a while ago. Is it working fine? The things that we identified through the tabletop exercises, did we assume all of that correctly? So when you actually go through a real scenario of failures in production, you're making sure that the team is prepared when that event actually does happen.
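The idea of injecting a failure and checking that the service still answers can be sketched in a few lines. This is a toy model under stated assumptions, not the tooling used at Salesforce: the `Node`, `Service`, and `game_day` names are hypothetical, and a real exercise would target actual hosts and monitoring rather than in-process objects.

```python
class Node:
    """A hypothetical service node that can be failed on purpose."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Service:
    """Tries nodes in order and fails over to the next healthy one."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.failovers = 0

    def handle(self, request):
        for node in self.nodes:
            try:
                return node.handle(request)
            except ConnectionError:
                # Each failed attempt is something monitoring should surface.
                self.failovers += 1
        raise RuntimeError("no healthy nodes left")


def game_day(service, victim):
    """Inject a failure into one node and check the service still answers."""
    victim.healthy = False
    try:
        return service.handle("ping")
    finally:
        victim.healthy = True  # always restore the node after the exercise
```

For example, with `Service([primary, standby])`, calling `game_day(service, primary)` should return a response served by the standby and leave `service.failovers` at 1, which is the kind of observable outcome a tabletop exercise would have predicted.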
Dave Mangot
At Salesforce, destructive testing is a normal part of service deployment. You do not take a service into production without having gone through destructive testing. You actually have to prove that you understand what your failure scenarios are.
We've been working very hard on Gene Kim's Third Way, repetition is the pathway to mastery. Here's a service that we've been practicing failing over, and before we ever even tried to fail the service over, people's estimates were about maybe 12 hours to fail the service over.
Well, when we finally actually started testing this, it took about two hours to fail the service over. And what we've been doing is repeatedly doing game days, testing, failing the service over. You can see in September it says we were down to about 35 minutes, and I've heard that they have estimates that the next time we do this, it's going to be down to five minutes. Five-minute failover from over two hours, just by repetition and getting that kind of mastery.
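The numbers quoted here are easy to turn into a quick calculation of how much repetition bought. The sketch below uses only the figures from the talk, rounded as spoken; the constant and function names are made up for illustration.

```python
# Failover durations in minutes, as quoted in the talk (all approximate).
ESTIMATE_BEFORE_TESTING = 12 * 60  # "maybe 12 hours"
FIRST_MEASURED = 2 * 60            # "about two hours"
SEPTEMBER = 35                     # "about 35 minutes"
NEXT_TARGET = 5                    # estimate for the next exercise


def reduction(before, after):
    """Fractional reduction in duration going from `before` to `after`."""
    return (before - after) / before


def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    print(f"Estimate vs. first test: {speedup(ESTIMATE_BEFORE_TESTING, FIRST_MEASURED):.0f}x off")
    print(f"First test to September: {reduction(FIRST_MEASURED, SEPTEMBER):.0%} shorter")
    print(f"First test to target:    {speedup(FIRST_MEASURED, NEXT_TARGET):.0f}x faster")
```

Going from two hours to five minutes would be a 24x speedup, and the original 12-hour guess was six times longer than the first measured failover, which is an argument for measuring instead of estimating.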
The last group we're going to talk about is security. Security, they're kind of the scary ones, right? You go to them, you say, "We're going to do DevOps." And they are like, "Oh my God, everyone's going to need access to production. Everyone's going to carry a pager. Ah!" Not so good, but you can actually get through to security people, too.
The security people, first of all, you have to offer them a spot early in the process. We've been really collaborating with security a lot with all the services that we're trying to get out. We don't want to go to security at the end and say, "Hey, we came up with this great design," and then they're like, "Yeah, no, you totally cannot do that." We want to work with them early. We want to work with them throughout the process.
One of the things that we've done is, the team that I'm on right now, we're deploying a bunch of appliances out to production to perform this service. We went to the vendor and we were like, "Do you guys have virtual versions of those appliances? We're going to use them for development and testing."
But we bought extra ones and we went to security and we said, "These appliances, these virtual machines, are dedicated to you for the duration that we're going to run this service. Every time we're going to do an upgrade, we're going to upgrade these virtual machines first, and you guys are going to get first crack at trying to break this stuff because we want to know about things that are going to break before it goes out to production."
Our CTO of security, Taher Elgamal, likes to say that security is just another aspect of quality. And so working with quality engineers and security is a big part of how we're deploying DevOps things at Salesforce.
Reena Matthew
Again, it's the same principle. Involve the right people at the beginning. We have all our security experts involved in the design phase to make sure we're capturing all the security requirements.
In addition to that, we also want to build a group of security testing experts in the quality engineering group because Salesforce is always going to have a dedicated security team, but we know it probably won't scale. We want to make sure we find the issues earlier on, and that's why we want to build these groups of security experts in the quality engineering group.
This is a different mindset. Normally quality engineers, our testers, they're always asking, "What is the customer viewpoint?" When you're talking about security, it's different. Like, okay, how are the hackers thinking? How are they thinking about breaking into the system? This is a new skill set that we need to build, and we offer training programs. The security cloud that we have, they're always offering training programs throughout the organization, and quality engineers are heavily involved with that.
Security tools. Security team runs a lot of tools. Normally, it's been at the end, but we're trying to integrate all of that into our continuous integration system so all stakeholders get faster feedback. We believe this is the same approach we'll even take with compliance later on, because if we meet most of security, we'll probably meet most of our compliance as well. So building all these tools into a continuous integration system, it just gives faster feedback for everyone involved.
Dave Mangot
Issue remediation. Has anyone here deployed a patch in the last month? Okay, great. So, Salesforce is just like all of you.
One of the great things about deploying Puppet in this infrastructure as code is that we can now deploy patches in a few hours. And one of the things that we've done in change management is change management now considers infrastructure as code changes, Puppet changes, to be standard. A standard change is a much lower barrier to get something done than a regular change.
Non-Puppet-managed hosts still go through traditional change management, but Puppet changes now, we've already proven that there's quality in what we're doing because we have a whole CI pipeline that proves it. You don't have to look at it. You don't have to inspect it. You look at the dashboard, you're like, "Oh, okay, your test passed. There's quality there." This has been really huge.
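The rule described here, where a green CI pipeline stands in for manual inspection, can be written down as a small decision function. This is an illustrative sketch only: the `Change` record, its field names, and the "blocked" outcome for a failing pipeline are assumptions, since the talk only states that Puppet changes are standard and that non-Puppet-managed hosts go through traditional change management.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical change record; field names are illustrative."""
    description: str
    puppet_managed: bool  # is the target host managed by Puppet?
    ci_passed: bool       # did the CI pipeline for this change go green?


def classify(change: Change) -> str:
    """Decide which change-management path a change takes."""
    if not change.puppet_managed:
        # Stated in the talk: these still go through the traditional process.
        return "traditional"
    if not change.ci_passed:
        # Assumption: a red pipeline means the change does not ship at all.
        return "blocked"
    # Quality was demonstrated by the pipeline, so no manual inspection.
    return "standard"
```

The point of the sketch is that the approval decision reads from the pipeline result, not from a person inspecting the change.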
We deployed a change recently, and on Chatter, which is like our internal Facebook (and buy Salesforce Chatter), one of the security guys, after we rolled out this fix so fast, said, "I, for one, welcome our new robot overlords." Security guys were completely on board for this.
So where do we want to go from here? We want to spread DevOps to more parts of the organization. We want to go to product, we want to go to sales, we want to go to compliance, marketing. There's a lot of different places that we want to go. This is the DevOps Enterprise Summit. We want DevOps throughout the enterprise. We don't want it just in Dev, Sec, Ops; we want it in Mark, Comp, whatever you want to call it, tack all the other people on the end there. We want the whole company to be involved in this.
Quality is everyone's responsibility. Deming said, "It's a mistake to assume that if everyone does his job, it'll be all right. The whole system may be in trouble." I personally feel that the quality engineers at Salesforce, they're the ones who are looking at the whole system. They're allowing people to do their job. They're allowing the functional areas to do stuff. The functional areas are taking responsibility, but it's the quality engineers who are looking at the whole system and making sure that we're not in trouble.
Reena Matthew
The role of quality in the DevOps transformation, if you want to summarize it from the journey we just had, it's bringing everyone together to deliver this enterprise-quality system. We want to delight our customers. We want to focus on preventing bugs with faster feedback loops. Because, yeah, it's good, we're going to definitely find bugs, but we want to prevent as many bugs as we can.
We want to do incremental improvements, not just to the software, the infrastructure code that we're delivering, but to the process and tools.
If you look at all of this, it maps to the Three Ways that Gene Kim calls out. It's like looking at the system as a whole, ensure there's faster feedback loops for everyone involved, and do continuous improvement. That's what helps the whole organization.
What we've learned is if you're actually building a culture of quality, you are actually adopting DevOps principles. You might not be calling it that, but it's the same set of principles about collaboration, sharing, automation. They all align pretty well.
The key takeaways that we want everyone to remember is quality is everyone's responsibility. One team alone cannot achieve that. Quality engineers are definitely there to be the champions, but the whole organization needs to be involved.
We do believe that the continuous quality mindset that's there throughout the whole organization ensures that everyone is thinking about how can we make this place better, how can we make the service better? In the end, everyone wants to be proud of their work.
As part of this DevOps transformation, leverage the quality engineers that you have in your organization. They already believe in this philosophy, so they're going to be your partners in this.
A question for all of you, we're asking from your experience. We did mention we're trying to build these quality engineers into our infrastructure and operations teams. It's difficult to find quality engineers who have a testing skill set with infrastructure. We're interested in how do you go about doing that? Who are the champions of quality at your companies? There's various techniques that you can do. We'd like to know more about that.
And as Dave sort of alluded to, we want to bring more people, more groups into this whole transformation. Product is definitely one key group that we want to bring in because there's always this constant battle between delivering services and making sure there's not too much debt. So we want to know how to bring them along.
That's all we have. Hope you all find that quality is a key part of our transformation, and I think it'll be for all of you as well. Thank you.