DevOps and the Healthcare Giant
United Health Group is a large-scale health care technology enterprise with a strong focus on security and compliance. In our efforts to respond to market demands quickly while remaining secure and trusted, we have been rethinking the way we deliver technical solutions.
Follow our journey as we have tackled the use of automated security testing with the Gauntlt Rugged Attack Framework within software delivery pipelines to provide teams with on-demand security and compliance feedback.
Full transcript
Aaron Rinehart
Hey, everybody. Welcome to DevOps and the Healthcare Giant.
I think you're saying you're the healthcare, I get to be—
James Wickett
Yeah, the giant.
Aaron Rinehart
I don't know why. Why is that?
James Wickett
Well—
Aaron Rinehart
I don't know what you're saying there.
James Wickett
Yeah.
Aaron Rinehart
It's weird. Okay, cool.
That's me, if you could figure that out. I'm Aaron Rinehart. I'm the Chief Security Architect at United Health Group. My passion is really around security chaos engineering. We'll spend a little time later today talking about that and what it means, because it sounds cool. There's my contact information in case anyone wants to get ahold of me.
A little bit about my journey: I started my transformation journey at United Health Group from a DevOps perspective about two years ago. It started with an idea, an email to the senior executive team about the need for bold ideas to demonstrate the art of the possible and inspire transformational change.
James Wickett
You kind of Jerry Maguire'd that thing, right?
Aaron Rinehart
Yeah, totally.
James Wickett
Yeah. Okay.
Aaron Rinehart
So if you haven't figured out by now, people think of United Health Group, they think of UnitedHealthcare. We really haven't been just UnitedHealthcare since the '70s. We're actually 360-plus companies large. We acquire anywhere from 20 to 30 companies a year. We like to call ourselves the United Nations of technology because we have everything. If you name it, we own it, we have it. But also, what that does, that breeds a level of complexity that really makes things challenging. So we are large, and we are complex.
So my journey from a DevOps perspective began with trying to enable the developers. In that, I foresaw that no matter how fast they moved, how fast they delivered, security would always be in the way. And in my career, I've been tired of being that guy. Tired of having a finger pointed at me that I'm stopping someone from being successful.
So what I chose to do is drive feedback loops, and drive security as a functional quality. And in that pursuit, I wanted to build a better model. A better model was possible. And through that, we developed a strategy that drove security automation into CI/CD.
James Wickett
Got it. Be mean to your code.
Aaron Rinehart
That's right. That's how we met.
James Wickett
That is how we met.
Aaron Rinehart
So James has been very instrumental as part of our desire for bold ideas. It wasn't just to automate the security tool capabilities that we had. We had to do something bold. We had to show not only the developer community, but also the security community where we sat in the value chain, that we could be a part of the value chain.
James Wickett
Yeah.
Aaron Rinehart
So here's some lessons learned from the transformation a few years ago. Automation was important, but it was very easy to get distracted by it. We had a lot of success in our desire to simplify and to standardize what we did. And for those of you that are developers in the DevOps community out here, you know that it's easier to code things that are simple and standard.
James Wickett
Yep.
Aaron Rinehart
Embrace failure as a friend. It was not only our successes but really our failures that drove change. We failed when we tried to automate things like static code analysis with Fortify or dynamic application security testing with WebInspect. We determined that we were not able to upgrade because our IT platforms were broken. But that failure, and being open about it across all the security teams, was a great way to share information and to collaborate.
James Wickett
And I'm James Wickett. I work over at Signal Sciences. Yeah, we took funny pictures in the lobby here where we were just nervously getting ready for the slides and the presentation. I work at Signal Sciences over there as the head of research. I am happy to be over there. We do defense for a lot of enterprises. We're kind of on the vendor side, but we protect web applications, APIs, microservices, kind of in the application security space.
I help organize DevOpsDays Austin, and if you have a Lynda.com subscription and you're like, "Oh, that guy was really funny," then you can go watch this Lynda class. I think there's eight hours of content on Lynda, which is pretty crazy on that.
But let me tell you my journey. So first I came out of school and I was working at this web and e-com company. It's about a billion-dollar company. We were doing brutal on-call rotations. It's a big enterprise. We would do these deployments where it's like, okay, we're going to deploy, we're going to make a code change over the weekend, and it was slotted to take a couple of hours. But you knew that was never a couple-hours event. That was going to be six hours, 12 hours, and sometimes it was 24-hour-plus deployments. Everything is broken on Sunday morning. You're really questioning, why did I do this with my life? How did I decide to get to this point, and how could this be better?
And yeah, it's like, man, computers was not a good choice for life at that point. And we were just doing waterfall. But at that point, some of my closest friends now, we kind of all went through that same pressure together.
And then for a little bit, I was like, okay, I can't handle that. So I went to a startup and we were doing Amazon Cloud. I learned at that point a little bit about what actually makes me happy, a lot about failure because we didn't really make it.
And then I was able to rejoin some of my old buddies that were doing the brutal enterprise on-call thing in operations. And they're like, "Hey, let's launch some software-as-a-service type things running in the cloud, and we're going to do this thing called DevOps." I was working at National Instruments, and we were one of the first enterprise DevOps case studies back in the day. And yeah, I was doing it with my buddies, Ernest Mueller and Karthik Gaekwad, and some others.
And we were doing infrastructure as code. Dev and ops were all conjoined on the same team. We delivered four SaaS products in a little over, I think it was almost two years' time. And the business was pretty excited about it, and that went really well.
And then I found this thing called Rugged Software, and John Willis was talking a little bit about it earlier today. But I was like, "Oh, there's some aspirational things we need to do security-wise." I realized back then that we can't do everything the same way that we've been doing it. In fact, all the time when I was buying stuff for the team, I would just call up vendors and ask, "Do you have an API?" If the answer was no, we would just hang up, right? And we did that for monitoring and for security and everything else that we could as we were building that original team back in 2010.
But then later on, I was working on some Cucumber stuff, and I was like, oh, Gauntlt seems kind of cool, or the idea of doing some security testing that way. And I worked on this project, Gauntlt, and then later I was able to join Signal Sciences, and we really helped push a lot of the DevOps stuff as well.
And just to kind of give you my backstory, because I really like security, and I've been a part of security, and I care about it, but I think security is in a crisis state. I was really hoping that was going to get some laughs. So that was my funny slide. That's it. I think that's all we got, so—
Aaron Rinehart
That's as good as it gets.
James Wickett
I apologize. There's another plaza over there. It's a good talk going on.
But I think a lot of security teams, they work in this worldview where they want to inhibit as much change as possible. Does that resonate with anybody? Yes? Yeah? Yes. Oh, sigh, right?
And it's not just the business or other people inside of the teams that are feeling this. Security feels it itself, right? Steve Bellovin, he said, we're still reading—we didn't start the presentation out with Equifax and all that junk that you could have. We could have just read news clippings, time after time. But clearly a lot of things are wrong.
And he goes on to say, we keep reading about all these massive breaches, and we're spending a lot of money, but nothing's getting better. And he says we're protecting the wrong things, and we're hurting productivity in the process.
I really also like this quote, to help you think, okay, is security in a crisis? I think we just know it in our souls, as we are. But I think when I read this for the first time, I was like, man, this really bothers me.
"Security by risk assessment introduces a dangerous fallacy that structured inadequacy is almost as good as adequacy, and that underfunded security efforts plus risk management are about as good as properly funded security work." So what we're saying is security could just do actuarial stuff and take out insurance policies for stuff and kind of mitigate the risk that way and not have to do engineering.
And that sort of struck me, and I know Aaron feels the same way. So we kind of combined our journeys together, and then we're going to go through four new ways that we think security can kind of go forward together. And we juxtaposed this in an old path and a new path.
So the old way, security would embrace secrecy. The new way is create feedback loops.
Old way, you do slow validation. Now you do fast and non-blocking validation of stuff.
Old way, you'd really enforce stability. Stability was king. And now we're saying, let's create chaos in our system.
And then we used to say, let's make sure we do certainty testing. And now we're saying adversity testing.
So we're going to break this up, and I think we're going to try to swap back and forth. All right. You ready?
Aaron Rinehart
Yeah.
James Wickett
You got this? All right.
Aaron Rinehart
I think I'm ready. I'm ready.
So the old way was really focused on embracing secrecy: there was this wall between DevOps and security. The idea is that creating feedback loops begins a level of understanding. By tightening the feedback loops, we're shifting them left. That creates a means by which to communicate why security is important, right? And how security can actually become part of the value chain.
So I often feel like this is how security is seen by the development community: we're just right there waiting to knock you down when you're trying to stand up.
So are you under attack? And how do you know that security controls are working as intended?
So what I'm next going to talk about—oh, yeah, we should switch that up, actually.
James Wickett
We wrote that. Yeah.
Aaron Rinehart
Yeah, we totally switched that up.
Well, what I'm going to talk about in a second is the advent of the application of chaos engineering in the field of information security. And what we're going to talk about is how to ensure your security controls are working as intended.
James Wickett
And before we—that was a good prep.
Aaron Rinehart
Oh, yeah?
James Wickett
You sold that.
Aaron Rinehart
I sold that?
James Wickett
Yeah. I think people are hanging for the next part. Okay, good.
Aaron Rinehart
I did that. I did that.
James Wickett
So I think now we're looking at fast, non-blocking stuff. I really think about this in terms of continuous delivery. Security has to say, we're not going to slow things down. We're going to do continuous testing and validation. Sometimes you'll take it and test in a sidecar setup on the side of the pipeline, so you're not necessarily blocking the whole thing. And maybe you do penetration testing outside of delivery, on a different sort of cadence than you would have originally.
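A minimal sketch of the non-blocking idea described here, assuming a toy pipeline. The function names `build_and_deploy`, `security_scan`, and `deliver`, and the rule that flags commits containing "unsafe", are hypothetical stand-ins and not anything from the Signal Sciences pipeline. The point is only that the scan runs alongside the delivery path and reports back, rather than sitting in front of it as a gate.

```python
from concurrent.futures import ThreadPoolExecutor


def build_and_deploy(commit):
    """Stand-in for the main delivery path; returns a release label."""
    return f"deployed {commit}"


def security_scan(commit):
    """Stand-in for a slower security test suite; returns a list of findings."""
    # Hypothetical rule so the sketch has something to report.
    return ["possible XSS"] if "unsafe" in commit else []


def deliver(commit):
    """Run the scan beside the delivery path instead of in front of it."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        scan = pool.submit(security_scan, commit)        # sidecar work
        release = pool.submit(build_and_deploy, commit)  # not gated on the scan
        # Findings come back as feedback; they do not cancel the release here.
        return release.result(), scan.result()


if __name__ == "__main__":
    print(deliver("abc123"))
    print(deliver("unsafe-42"))
```

Whether findings should ever stop a release is a policy choice; this sketch only shows the non-blocking variant.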
So I mentioned I work at Signal Sciences. We came out of Etsy. Our founding team came out of Etsy. We built the tool there, and then now we kind of launched it as a startup here. And so we do the 15—the number of deploys we do per day isn't like we're hitting a goal or anything like that. It's just like, as people write code, they push it to production. We have a pipeline that we built for that. Roughly in the last two and a half years or so, we've done about 10,000 deploys, and we continue to march that forward.
And it was funny. When we were first launching the company, the CTO, Nick, was like, "Hey, James is going to be demoing this continuous delivery pipeline thing at the meetup that we're going to have in two weeks." And I was like, "I am?" That was complete news to me. I was like, "I must have missed a call," or something. "What happened?"
So I picked up Jez's book and I read it, and I was like, "Okay, I think we can do this." And then I looked at some of the stuff that Etsy had done before, like their Deployinator and stuff like that. And we put together something that tied all the way from a single Git commit to deploy to production, walking through all the stages.
And we built our own tool in-house. We call it Deployer. But we really, as a team, spend a lot of time optimizing for overall cycle time. Gene talks about this a lot. This is the time from actual code commit to running in production. And I think it's real easy to look at pieces of your pipeline in isolation and say, "Well, the builds take 10 minutes, but deploys take, I don't know, half an hour," or something like that.
And I kid you not, we would have meetings and it was like, "Okay, the builds take six minutes. That's too long. We need to get it shorter. Okay. The deploys take a minute, so they need to get shorter." And so we started looking at the overall cycle time from code commit to running in production.
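The arithmetic behind that point can be sketched as follows. The six-minute build and one-minute deploy are the figures quoted above; the two waiting stages and their durations are invented for illustration, as is the helper name `overall_cycle_time`. Under those assumptions, the stage that dominates the commit-to-production time is neither the build nor the deploy.

```python
from datetime import timedelta


def overall_cycle_time(stages):
    """Sum every stage, including waits, from code commit to production."""
    return sum((duration for _, duration in stages), timedelta())


stages = [
    ("wait for CI runner", timedelta(minutes=4)),   # hypothetical
    ("build", timedelta(minutes=6)),                # figure quoted in the talk
    ("wait for approval", timedelta(minutes=20)),   # hypothetical
    ("deploy", timedelta(minutes=1)),               # figure quoted in the talk
]

total = overall_cycle_time(stages)
slowest = max(stages, key=lambda stage: stage[1])[0]

if __name__ == "__main__":
    print(f"cycle time: {total}, slowest stage: {slowest}")
```

Shaving the build from six minutes to five would change the total by about three percent here, which is why measuring the whole path matters more than tuning one stage.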
And so it was a really empowering thing. And John talked about this a little bit today. It's like, you build it, you secure it, in that model. We have the same sort of policy at Signal Sciences. And because we're not just a software company, but we're also a security company, we had to do security as part of the CI/CD pipeline as well.
And so we look at our overall pipeline, and I kind of mentally map it into these five stages of design, inherit, build, deploy, and operate. And I think you can really ask some key questions. Of course, there's other stuff you can ask of those other two bullets. It's just those aren't my specialties.
But I think about, what kind of stuff have I put in my application? What libraries or things that I'm bundling in? Or services or servers I'm running on, or whatever I depend on, whatever I'm inheriting. I like to think about that word.
And then I think about build. Whenever I build, and I run all the acceptance tests and integration tests, am I going to catch security issues before they're released? And I think you'll do the chaos thing, and then at the end, I'll talk about one of the tools, Gauntlt, and how you do some of that stuff as well.
And then I think operationally, we need to look at, am I being attacked right now? And are people having success? And thinking about overall end-to-end. So security can answer that question. So you're not just optimizing for one, but you're looking at the whole piece.
So okay, there's some notes—
Aaron Rinehart
Ah, there are those slides. There they are.
James Wickett
There they are. These are those slides.
Aaron Rinehart
So what I want to talk about is the idea of security chaos engineering. So this is actually a really interesting project. It's sort of a product and the fruits of the ongoing transformational change at United Health Group. Transformation is very much a journey, and that's where we are with United Health Group.
What ChaoSlingr represents is actually a completely inner source product. It went from inner source to open source in seven weeks, built by a cross-functional team of folks that did not know each other. There are about 300,000 people at United Health Group. These folks came together and involved me in the journey of trying to apply the concepts of chaos engineering to security.
How I originally came up with this concept is that the idea and the definition of chaos engineering resonated with me. It's the idea of experimentation on distributed systems to try to determine how they respond to turbulent actions.
James Wickett
Yeah.
Aaron Rinehart
Next slide.
James Wickett
Do you want to switch? Because I think—
Aaron Rinehart
Oh, sure.
James Wickett
Let's see if we mess up the video feed here.
Aaron Rinehart
So the nature of security is chaotic. Somebody trying to attack a system has to get one thing right. When you're defending, you have to get multiple things right. So, on the idea of chaos engineering: Optum is one of the brands of United Health Group. Optum hired its first SRE about three months ago, three and a half months ago. He started telling me about chaos engineering and these concepts, and it just started resonating with me.
And I started thinking, how does this apply to the field of information security? And I started thinking about how we build security, is that we focus so much on preventative security controls. We actually do little to nothing outside of maybe a scheduled pen test yearly, quarterly, and usually that's on a very small percentage of applications. We do very little to instrument the preventative controls we've put in place.
We make a lot of assumptions. I'll actually talk about an example in a second.
Here's that definition again: the idea of experimentation on distributed systems to build confidence in a system's ability to withstand turbulent conditions.
So in this new world, and there's been several talks throughout the conference thus far about chaos engineering, it's important to remember that failure inherently exists. Humans, if you think about it, we are a culmination of our failures, and the things we build and design aren't much different. We build failure in.
And so it's important to understand that "if it ain't broke, don't fix it," that's old hat. If it ain't broke, you need to be trying harder, because it is broken. There is failure. It's just whether you choose to find it or not.
So failure happens.
So what I want you to do is take a look at this. This is from the Ponemon Institute, which produces metrics and data about data breaches. This data is from this past year. And I've done further research since I originally looked at it: these numbers have not changed much over the past 10 years.
If you look at the part of the graph on the left, 47% of all the data breaches are caused by malicious criminal attacks. This is the spy versus spy. This is the sexy stuff. This is what the security industry focuses on, is they focus on the malicious attack, the nation state, the Chinese attackers attacking the federal government.
But really, if you look at the right-hand side of this picture, actually 53% of data breaches are caused by human error and system glitches. So this is the space that we're interested in with security chaos engineering.
So these are all the different kinds of places where failure occurs from a security perspective, from everything from poor testing practices to common misconfiguration, to drift, to the complexity of distributed systems, to dependency failures, to all kinds of places.
So hope is not a strategy, right? Throughout my career, what I've found is that many organizations don't realize their security controls didn't work until they didn't work.
James Wickett
Mm.
Aaron Rinehart
And what is that time? It's an incident. Security incidents are not detective measures. It is too late.
James Wickett
Mm.
Aaron Rinehart
We have to be more proactive. We have to instrument. We have to determine and identify and dig at our own failures, because they exist.
So in this new world of security chaos engineering, another key concept, and I think this will resonate across all domains in IT and for IT engineering, is the concept of antifragility, the idea of building systems that get stronger by recognizing their own failures.
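One way to picture a proactive experiment of this kind is the sketch below. It is an in-memory toy under stated assumptions, not how the tool discussed in this talk is implemented: the `Firewall` class, `detect_drift`, and `run_experiment` are hypothetical names, and port 8080 is an arbitrary choice. The shape is the usual one for a chaos experiment: state a hypothesis (the control notices an unauthorized open port), inject that condition on purpose, check whether the control caught it, and always roll back.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Toy stand-in for a firewall or security group configuration."""
    allowed_ports: set = field(default_factory=lambda: {443})
    open_ports: set = field(default_factory=lambda: {443})


def detect_drift(firewall):
    """The control under test: report ports that are open but not allowed."""
    return sorted(firewall.open_ports - firewall.allowed_ports)


def run_experiment(firewall, injected_port):
    """Inject a misconfiguration, check that it is detected, then roll back."""
    already_open = injected_port in firewall.open_ports
    firewall.open_ports.add(injected_port)  # the deliberate failure
    try:
        return injected_port in detect_drift(firewall)
    finally:
        if not already_open:
            firewall.open_ports.discard(injected_port)  # restore the system


if __name__ == "__main__":
    fw = Firewall()
    print("control detected the injected port:", run_experiment(fw, 8080))
    print("ports after rollback:", sorted(fw.open_ports))
```

A result of `False` is the interesting outcome: it means the control was assumed to work and did not, and that was learned before an incident rather than during one.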
So there's the release of ChaoSlingr. We released the tool, like I said, in seven weeks. What's kind of funny about the release and this tweet: when we released the tool, it was the first open source project for all of United Health Group. So it was a very big transformational moment.
But at the same time, actually, when I did this tweet, I did it on a Friday. On the Monday morning, I got a call from the marketing office, the Optum brand office, saying, "What is this GitHubby thing? Because you are trending number two for the entire brand."
So it was a transformational sort of discussion, in that the marketing office, the business, saw the value of open source, and that was a big moment for me.
So chaos—
James Wickett
I always knew you would be trending.
Aaron Rinehart
Huh?
James Wickett
I always knew you would be trending.
Aaron Rinehart
Thank you.
James Wickett
Yeah.
Aaron Rinehart
Thank you. Yeah.
James Wickett
Okay. Nice.
Yeah. So the last one we'll talk about is adversity testing, and then maybe we'll have some time for some questions. The idea behind adversity testing is that, hey, we want to take our code, but we actually want to be mean to it. And we also wanted to create a place where operations and security could join together.
We see this big chasm where you have a 100-to-10-to-1 ratio of developers to operations to security. Those groups needed a way to come together and speak the same language, which they weren't doing.
So Gauntlt, that's what we started. It was born out of the idea, let's do security inside of a DevOps transformation inside of a large enterprise. And the very great graphics are, it runs all these attack tools. Did you see that? That's pretty much it.
Okay. But it's open source licensed. It has a bunch of pre-canned hooks for it. But it runs a lot of times inside of your CI/CD pipeline, which is helpful as just like another one of those components that you can think about putting in there.
It reads plain English, which is how you kind of have the crossover between developers and operations and security. So you can say, in plain language, "Hey, I'm going to look for cross-site scripting using Arachni against this URL." And then you set up a scenario, and you say, "Okay, on localhost on this certain port, I'm going to have that test." And you can run this inside of Jenkins or Travis or whatever.
And then so you have your given conditions, and then you say, "But when I run it, I want it to run these checks, and then the output should say zero issues were detected." And this one's a simple Arachni, but anybody who's ever used it and tried to put in login parameters, it's all the security domain knowledge, how to run these tools. It's sort of locked into the security group.
So a developer could read this and say, "Well, I don't know what that line does. I don't even know what Arachni is, but I get the last bit. It shouldn't find any problems. And if I get those problems, we now are able to have conversations about that."
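The given/when/then shape described above can be sketched in Python as follows. This is not the tool's own file format or implementation: the `scenario` dictionary, `run_attack`, and `fake_tool` are hypothetical names, the target URL is a placeholder, and the scanner is replaced by a stub so the sketch runs anywhere. The one detail worth noticing is the final assertion, which is the part a developer can read without any security tooling knowledge.

```python
import re

scenario = {
    "name": "look for cross-site scripting using arachni",
    "url": "http://localhost:8008",       # given: hypothetical target
    "command": "arachni <url>",           # when: placeholder command line
    "expect": "0 issues were detected",   # then: text the output must contain
}


def run_attack(scenario, run_tool):
    """Fill in the target, run the tool, and check the expected output."""
    command = scenario["command"].replace("<url>", scenario["url"])
    output = run_tool(command)
    # The lookbehind stops "10 issues were detected" from counting as a pass.
    pattern = r"(?<!\d)" + re.escape(scenario["expect"])
    return re.search(pattern, output) is not None, output


def fake_tool(command):
    """Stub standing in for the real scanner."""
    return f"ran: {command}\n0 issues were detected"


if __name__ == "__main__":
    passed, output = run_attack(scenario, fake_tool)
    print("PASS" if passed else "FAIL")
```

In a real pipeline the stub would be replaced by something that actually launches the scanner, for example through `subprocess.run`, and a failed check would feed the conversation between developers and security that the talk describes.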
So you can find more on that at gauntlt.org. It was mentioned in the Agile Application Security book. There's a section on Gauntlt in the book, so I took a quote out of there. They nailed it pretty right for the thought of how we can put these sorts of tests and this tooling inside of our pipelines, and then every time your code goes out, it gets tested.
We do that at Signal Sciences. Every time our code goes out, we sidecar a Docker container and run the Gauntlt tests, and we make sure we haven't introduced any flaws. We built it because security tools are hard to use. They're difficult. I mentioned the collaboration aspect as well.
If you're interested in more, there's a demo environment. We also have a starter kit. And here's the demo running in Travis CI. And like I said, a lot of people use it in Docker containers because that's a real easy way to kind of suck that into your workload.
And I did a thing with Matt Johansen, Matt J on Twitter, where we did kind of a workshop with eight labs on how to do it. We did this at South by Southwest a couple of years ago for developers, and we had a test for cross-site scripting, SQL injection, and all other types of stuff.
All right. So that's kind of our talk. These are some of the things we've learned and some of the tools we've put in place. I don't know. We have about five minutes or so. We could have some questions if people are interested.
Aaron Rinehart
Yeah, I noticed I really didn't cover how security chaos engineering works.
James Wickett
Yeah, we sort of—
Aaron Rinehart
We sort of left that out.
James Wickett
We're trying to cram a lot of stuff in.
Aaron Rinehart
If anybody's interested in the methodology and, from a programming perspective, how you use it, it's all on GitHub, or you could just Google ChaoSlingr and you'll find it.
James Wickett
Yeah.
Q&A
Q: Can you talk about, sorry, in terms of something like HIPAA compliance, there's a set of policies. Can you talk about possibly how were you able to pick out just enough of the things that you can put into CI and what you leave as sort of a paper exercise that your compliance officer will continue to do?
James Wickett
Well, you're the healthcare.
Aaron Rinehart
And you're the giant. So there you go.
So here's how I see the world when it comes to compliance and security: I actually feel like the advent of compliance ruined security. What I believe is that good security begets good engineering. And I'm a firm believer in good, solid engineering practices from a security perspective. Compliance is just a documentation exercise at that point, and I believe that is truly possible.
I'm very passionate about that subject because at United Health Group, you saw, we have GDPR, we have HIPAA, we have HITRUST, we have, you name it, FISMA, we have the whole gamut, right? And it gets noisy to think about all of that, right? So I tend to tailor my thinking towards good engineering, because if you're doing good engineering, you can actually theoretically be ahead of compliance. So that's how I see the world.
Q: Thank you.
James Wickett
Yeah. Good question.
Any other questions? Not that I can see from—
Q: I have a question.
James Wickett
Spaceman.
Q: So, taking a look at your ChaoSlingr and Gauntlt, and then adding on to the question from the back about FISMA and MARS-E and so forth, the regulatory compliance world: is the industry talking about adopting some of this and making this the way you do it, or is it still just new? What are you seeing from that point?
James Wickett
Do you want to take that?
Aaron Rinehart
Well, if it's outside of the healthcare stuff, I could say stuff.
James Wickett
I mean, I think people are—I don't know. I used to say, "Hey, security is like..." Whenever we got to DevOps, we were like, oh, ops was where developers were. It's like in 2009, 2010, it was like, source control. What is this magical thing you speak of, right? And why should we do these things?
I feel like security is kind of a laggard behind operations, and I've been saying that for a while. I used to say five years, now it's six years, seven years, and we keep expanding it. But I do see a lot of positive growth in security right now. At security conferences, there's a lot of DevOps tracks. There's a lot more DevOps talks. There's DevSecOps conferences. The RSA Conference has a DevOps-style track.
So I wouldn't say it's mainstream in security yet. It's not like everybody at Black Hat is filling the room to talk about DevOps, by any means. But I do think people are becoming a lot more interested in it, and we see pockets of instrumentation.
I think the chaos stuff has been really good. Netflix has done some great work. Riot Games has done some good cultural stuff.
Aaron Rinehart
Capital One.
James Wickett
Yeah, Capital One has done some as well. Sorry, Topo. Sorry. My mind's not great. So anyways. I think it's coming, but I think we're still a little ways off until we have more saturation. I think DevOps is still kind of gaining traction, right, so.
Aaron Rinehart
Well, what I like about it is I feel like DevOps is bringing engineering back—
James Wickett
Yeah.
Aaron Rinehart
—to security.
James Wickett
Yeah.
Aaron Rinehart
We became an audit compliance type of world, right?
James Wickett
Yeah.
Aaron Rinehart
And that bled into developers and development communities believing that all they had to do was compliance. Well, security was founded out of an engineering problem, right? And then compliance was a way to get people to actually do it.
James Wickett
Yeah.
Aaron Rinehart
Right, so.
James Wickett
Yeah. The book I recommended, The Tangled Web, is where I had that quote about doing actuarial science and stuff instead of engineering. It's a book on browser security, but the first chapter's totally worth it because it walks you through the '50s and the '60s and the '70s. It's like, security used to be doing real engineering work and doing crypto and all this stuff. And then somewhere in the '90s we're buying insurance policies like crazy people, right? I mean, the book words it more eloquently than that. But I was like, what happened?
How did we make that shift? It's a real curious thing, how security compliance became the reason to do something, not the assertion that you did the thing, which was the original goal. That's what policymakers expect it to be. They're like, "Oh, you're just doing the right thing. We're just checking." But now it's like, no, no, no, we budget towards that, right? There's a mind shift that happened in our industry.
Aaron Rinehart
Yeah.
James Wickett
Which is weird.
Aaron Rinehart
Yeah.
James Wickett
Yeah.
Aaron Rinehart
Definitely. It's good to see it come back, so.
James Wickett
Yeah. All right, cool. Hey, thanks everybody.
Aaron Rinehart
Thanks. Thank you.