DevOps Will Save The World! : Public Safety, Public Policy, and DevOps In Context

San Francisco 2014

Joshua Corman

CTO · Sonatype

Full transcript

The complete talk — auto-generated from the talk's captions.

My name's Joshua Corman. My day job, I'm the CTO for Sonatype. Many of you probably use our Nexus products or Maven Central for your components and libraries or component lifecycle management. I've also been a collaborator with Gene Kim for many, many years.

We both come from the cybersecurity world, and we've both realized that the best way to fix security is to leave security, and to focus upstream on the digital infrastructure that we're all building. He asked me to go big and talk about how DevOps can save the world, and I agree with him. But I have to make the case to you in the next 30 minutes. And this is about public safety, public policy, and DevOps in context.

It's not what you meant to build, but you've actually built something that could be pretty transformative. Every time I hear the Marc Andreessen quote, "Software is eating the world," I hear something very, very different. I don't think it's eating the world. I think it's more like a virus.

If you were on the moon looking down... Oh, boy. We have a- Well. We have a PowerPoint crash.

I don't think software is eating the world. I think it's infecting the world. I think PowerPoint is eating the world. PowerPoint does not like my assertion.

All right. Here we go. Good timing on that; it was perfect. It was.

Happy accident. So I know Marc meant something incredibly different, but when people say software is eating the world, like I said, I think software is infecting the world. It spreads much like a virus, right? If you think about it, software and connectivity are showing up in every aspect of your life now.

We have software in our cars, in our medical devices, in our homes, in our public infrastructure. You're hard-pressed not to find software in something. In fact, I was talking to one of the automakers, and he said they have 69 separate microprocessors in a car. A modern car now has more lines of code than the Windows XP operating system.

Mm. So, we kind of forget this, but that infection model, right? It's really spreading, and it looks much like that. Now clearly Josh is wrong here, right?

Because it's not negative. It's not like the bubonic plague or Ebola that's spreading, right? We get benefits out of it or else we wouldn't use it, right? So maybe it's not so much a virus or an infection, but maybe it's a symbiote, right?

We get certain benefits, right? So then I start thinking about metal. I built my own deck and I had to buy... Bless you, by the way.

Thank you. Sorry. I had to buy galvanized screws and galvanized nails. And if you know anything about metal, that process basically keeps it from rusting in the elements.

But what people don't realize is that the heat from the process actually makes the metal more brittle. So it's really a trade-off: we only do that process when the benefit we're seeking from weatherproofing outweighs the risk we introduce by reducing the metal's tensile strength. And I would love for us to get to the point where we view software the same way, right? So as we put software and connectivity into things, we do it only when we've judged that the benefits are acceptable in the face of the risks we introduce.

And of course, you can't have this presentation without some sort of Deming reference. But in the cybersecurity industry, I don't think we actually do know what we're supposed to be doing. It's not enough to try your best if you don't actually know what you're supposed to be doing. And I think the same is true for software development.

Best practices typically aren't. And when it comes to security, good enough, which tends to be the goal, typically isn't. So what I like about DevOps is the idea of failing early and often, experimentation, trial and error, and measuring everything. And that's why I think the patterns Deming brought to post-World War II Japan, which helped save its automotive industry and made it into total quality management through Toyota, have now made it into Lean and into DevOps.

John Willis likes to say that Deming is really the Shakespeare of the DevOps movement. And hopefully, I'll paint the picture here where I think Deming is actually also the savior for our public safety issues in the Internet of Things. But we'll get to that in a minute. So think about Deming's role in history: the Industrial Revolution was massively disruptive, and so was the revolution in modern automobile assembly.

A lot of people don't realize the role he played in Toyota quality management, and in the rigor that we have now brought into the software world. When I took the job at Sonatype this January, I made a joke: "If people built cars the way we build software, no one would ever have the confidence to drive a car again." Fast-forward nine months, and I realized, wait a second, we are building cars like we're building software. So I think we're on the cusp of an intersection where Deming's influence on software development and on physical manufacturing is coming together.

And I'd like to plant a seed. If you take one thing out of this conversation, I want you to start thinking of software as a software supply chain. Right? If you think about it, we don't write code anymore.

We assemble it, and we assemble that from third-party and open source components. They're like Lego blocks, the building blocks. Why reinvent the wheel? So you have a supply chain.

You just don't manage it like one. And if you think about it, whether it's cars or drug manufacturing or the food industry, any modern industry that has the potential to inflict harm on life and limb has matured to the point where it has a rigorous supply chain model. I was talking to a retailer who's very familiar with supply chains, and it instantly clicked: if you look at the way we procure, design, and develop our digital infrastructure, we can import a lot from the science of supply chain management. So let's talk about Heartbleed.

Most of you were affected either directly in your day job or indirectly through your social media sites, right? Most of you had to do a password reset for at least a handful of your accounts. Heartbleed is far more disturbing to me, for a lot of non-obvious reasons. But let's just take a quick look. The good news about Heartbleed is it revealed to the world that, guess what?

Open source is not infallible. Human beings write code. Human beings introduce some level of defects per million lines of code. The same human beings that write commercial code write open source code.

This isn't wading into the religious debate over whether open source is better or worse than closed source. It's code, and code has defects. So while this one had a pretty logo and made a lot of news because it affected your social media sites, it's actually not the only flaw. This year alone, at the point at which I made this slide, there were 17 different vulnerabilities in the NIST National Vulnerability Database for the OpenSSL code.

I think one of the reasons it was interesting is that OpenSSL is so widely and pervasively deployed, because why reinvent that stack when you can use it? And I actually think it was a great example of finding and fixing a bug pretty early. In a longer version, I might point out, for example, that the code was committed in the wee hours on New Year's. So the real lesson of Heartbleed is don't let friends drink and code.

If you can read the slide, I highlighted a couple of these. The red one's the one that got the pretty logo, but there are actually four different vulnerabilities that affected Siemens industrial control systems. These control hydroelectric dams. They control nuclear facilities. They control all sorts of things with really high-consequence kinetic impact.

If you've heard of the Stuxnet virus that damaged the centrifuges Iran used to enrich uranium, that was a custom piece of malware targeting similar systems. And I think what should scare us isn't so much that there are flaws in software and open source projects; it's how pervasively these open source projects have made it into places that can actually affect public safety. Now, to their credit, the reason I call out Siemens is that they're a success story. There are about 18 major industrial controls companies, many of which were affected by both Heartbleed and the newer Bash bug, also known as Shellshock, and did not admit that they were vulnerable.

And also to their credit, they had the ability to remediate, and many of these competitors do not. So what it reveals is that, as our dependence grows, third-party and open source code, often of unknown origin, quality, security, and provenance, is making its way into places where it can cause more damage. The practices you're stumbling upon and enhancing in DevOps can actually save the world. Or rather, our failure to bring better practices and better supply chain rigor could have the inverse effect.

Now, what really scares me, and some of you know about my Cavalry initiative, is that the same vulnerable open source code is making it into our bodies, our cars, our homes, and industrial controls like those from Siemens. A friend of mine hacked his insulin pump. The manufacturer had put a Bluetooth stack on it, using third-party code, and didn't really do any adversarial testing.

It's wide open to hacking and can be made to deliver a lethal dose of insulin. Several other examples have been discovered. Some other friends of mine hacked a Toyota Prius and a Ford, and then the following year, they hacked 30 more cars. With so much software and connectivity in these vehicles, they're computers on wheels, and the manufacturers aren't really realizing it.

I'd like to tell you they're using custom hardened code that goes through enhanced evaluation levels and whatnot, but unfortunately they're not. Many of them are using multi-year-old vulnerable open source code that's sitting naked and exposed and could actually allow full remote code execution on some of the cars that you drive. And worse, they don't actually have any way to update these systems. Many carmakers are pursuing ways to update them.

But this is actually getting pretty serious. And I think part of the reason is we've just become accustomed to the idea that we should put software on everything because software makes things better. But back to my galvanization point, there's a trade-off. We get benefits by putting software and connectivity into things, but we need to make sure the cost-benefit math is worth it.

Now, a couple of years ago, actually here in San Francisco, some friends of mine and I thought we'd try to inspire software developers to realize the awesome responsibility they now carry, and we wrote something called the Rugged Manifesto. Many of you know about the Agile Manifesto. It was just a 10-line attestation; think of it almost like the Hippocratic oath for doctors. And one of the ways we put it is, if you build bridges and you build skyscrapers, you're building modern infrastructure.

And we can depend on infrastructure. We don't even think about it. Not one of you sat here thinking this building would collapse upon you. We can take for granted the steel and concrete was built in a reliable and trustworthy way.

When you drive to the airport, you're not going to wonder if that bridge is going to collapse. And the simple assertion is that we're becoming as dependent on digital infrastructure, on software, as we are on steel and concrete, but it's not nearly as reliable. As Joe Jarzombek at DHS likes to say, we don't have any building codes for building code. It's a really simple point, but it's true.

When software failed, that used to be a nuisance, but now it's becoming much more serious. And there's one line I like to call out when I think about that Bluetooth stack on an insulin pump that can kill you. It says, "I recognize my code will be attacked by talented and persistent adversaries who threaten our physical, economic, and national security." It's just one of the many sentences in this little Hippocratic oath for software developers. But I wonder whether that developer, had they taken this oath, would have put that Bluetooth stack there.

And we ultimately hunted down the person who made the decision and said, "Why the heck is there Bluetooth on that device? Is there a medical and necessary reason?" They said no. They said, "It's like bacon. Everything's better with bacon.

Everything's better with Bluetooth." And then once they did it, all their competitors did it. So that gives me pause. In response to these kinds of issues, a bunch of us white hat hackers launched a group at DEF CON one year ago, basically realizing that the cavalry isn't coming. Nobody in the government is really paying attention to this.

We went to the FDA, we went to different organizations, DHS, and because the cavalry's not coming, that means it falls to us. So our basic problem statement is that our dependence on technology is growing faster than our ability to secure it. And as such, our mission was to be an educational foundation and a voice of reason that drives technical literacy, educating policymakers, insurers, and the manufacturers of devices that can affect human life.

So the mission statement is to ensure technologies with potential to impact public safety and human life are worthy of trust. And it's not just for white hat hackers. We want to work with willing partners in these target industries, and we've found several. We have four major projects, medical device security, automotive security, connected homes, and public infrastructure.

Each has different R&D lifecycles, different parts of the government and oversight, different market dynamics, different competitive fields. And we've had some healthy DevOps-like experimentation, and much like DevOps, the killer app has been empathy. Instead of judging them or pointing a finger, we're extending an open hand, understanding their level of knowledge and experience, understanding their lingo. I learned much of that from working with Gene and seeing what was magical about the boundary spanners and the catalysts and the positive troublemakers he was encountering, meeting all these unicorns and later horses in the DevOps movement. And as such, we published a five-star automotive cyber safety framework.

And I'd encourage you to both sign the petition and look at it later. But what we really want to drive home is that now that cars are computers on wheels, they inherit some of the same risks your home PC does. If you think about it, we spend $80 billion protecting credit cards annually, and we still have a breach a week. Imagine that failure rate carrying over to things that have much higher consequences.

And we're simply saying: Tell us what you do to avoid failure. Tell us you won't sue third-party researchers who want to help you find failure. Tell us you can notice and capture evidence of failure, that you can have a prompt and agile response to failure, and that you can segment, isolate, and contain failure. You can find it at bit.ly/5StarAuto.

And "I am the cavalry" is not about Josh; it's an attestation you make. So I am the cavalry, and so are you, I hope. But let's get back to software and building codes. Do you guys remember the Haitian earthquake?

This is a great infographic. I hope you can study it when you look at the slides. Essentially, about a week after that, there was a much, much larger earthquake in Chile. And this is a direct comparison of the two.

In fact, at the epicenter, they said it was 500 times stronger than the one in Haiti. So the big question I asked, which many people asked, is why were there 230,000 deaths in Haiti and only about 100 or so in Chile? And while there were several contributors, the prevailing belief was building codes. It wasn't the strength of the earthquake.

It wasn't the placement of the earthquake. It was that one had appropriate building codes, appropriate building materials that were defensible and reliable and understood the presence of earthquakes, and one didn't. Now, there are other factors, clearly. But I take that as an object lesson for us.

So as we build digital infrastructure, are we building it from the best possible building materials? When we do our component selection for these third-party and open source components, are we choosing ones from high-quality suppliers? And are we scrutinizing whether the supply from those suppliers is secure and trustworthy? Do we even know?

And in this presentation, I want to give you a glimpse of some ways we can start acting like a supply chain to do that scrutiny. So, supply chains in the bigger context. A couple of questions. I'm a hardcore security guy, so I'm approaching this in terms of attackers and defenders.

But I ask myself, what are the attackers most focused on? Followed by, what are we most focused on as defenders? And then if there is a gap, spoiler alert, there's a big gap, which activities will have the most asymmetrically positive impact on closing that gap? So if you go to the Verizon Business Data Breach Investigations Report, the DBIR, it's the annual report with the most scientific rigor, and they look at failures.

They look primarily at credit card failures. But every single year, the same headline comes out, which is that the number one attack vector leading to breached records is weak software. In this particular case, they call it web application attacks. So the punchline in this shorter version of the presentation is that the number one priority for bad guys is weak software.

So hopefully, the defenders have gotten the memo and our top priority is also weak software, right? Sorry, I've got some bad news. As an analyst, I totaled up all 140 different security product categories and all the spending on them. People security, like identity and access management and password management, is about $4 billion. Data security, like encryption, was about $5 billion.

Host endpoint security like antivirus was about $10 billion annually. And network security like firewalls, IDS, IPS, about 20. That little thin blue line is application security. The thing we spend the least on is the thing the attackers spend the most on.

We could not be further out of phase, and we shouldn't be surprised that we're having a breach a week despite spending $80 billion. But it actually gets worse than that, because if you peel apart the paltry amount, the $0.5 billion that we spend on weak software, we're spending most of it analyzing the code we write. And I started this presentation by saying we don't write code anymore. We assemble it from third-party and open source components.

Our estimate is about 80 to 90%. Gartner says it's 90% or more. So I call it the neglected 90, because we're taking for granted that it's safe and secure. But when you look at the root cause of many of these breaches, like when most of the banks went down last July, the root cause tends not to be the code we wrote.

It tends to be the code we assembled from some other shared source, like an open source project. This is not a rant against open source. It's just that, logically, as we have shared dependence on shared code bases, we also have shared risk and shared attack surface, and attackers have taken notice. So it's not even an 80/20 rule.
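To make the mismatch concrete, here is a small worked calculation using only the approximate annual spending figures quoted above ($4B people, $5B data, $10B host, $20B network, and roughly $0.5B for application security). It is a sketch, not an analysis: these five categories cover only part of the roughly $80B total, so the percentages are shares of these five figures alone, and the category labels are paraphrased from the talk.

```python
# Approximate annual security spending by category, in billions of US dollars,
# as quoted in the talk. These five categories are only part of the ~$80B total,
# so every share computed below is relative to these five figures alone.
SPENDING_BILLION_USD: dict[str, float] = {
    "people (identity/access, passwords)": 4.0,
    "data (encryption)": 5.0,
    "host/endpoint (antivirus)": 10.0,
    "network (firewalls, IDS/IPS)": 20.0,
    "application security": 0.5,
}


def share(category: str, spend: dict[str, float]) -> float:
    """Return the fraction of the listed spending that goes to one category."""
    return spend[category] / sum(spend.values())


if __name__ == "__main__":
    total = sum(SPENDING_BILLION_USD.values())
    # Print categories from largest to smallest spend.
    for name, amount in sorted(SPENDING_BILLION_USD.items(), key=lambda kv: -kv[1]):
        print(f"{name:38s} ${amount:5.1f}B {share(name, SPENDING_BILLION_USD):7.1%}")
    print(f"{'total of listed categories':38s} ${total:5.1f}B")
```

With these figures, application security comes to roughly 1.3% of the listed spending (0.5 / 39.5), while network security alone is about half.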

So I think the bad news is we couldn't be further out of phase. The good news is that small efforts scrutinizing the quality of our suppliers and our supply could have a very large impact on reducing the number of failures we have, whether it's credit card breaches or safety in our cars and medical devices. Now, Gene and I have worked on many different projects, and one of them was my zombie pyramid; I won't go into my whole zombie talk right now. It started as a joke, but I now use it as the centerpiece of a grad school course I teach at Carnegie Mellon University on how to actually protect things. And the counterintuitive thing is that all the security products are countermeasures at the top of the pyramid, which have the smallest effect on defending things.

The bottom is really where you get tremendous value for the defensibility of an enterprise, a car, a phone, anything. And at the very bottom, it's defensible infrastructure. Are you choosing to fight the zombies in a wooden barn that's falling apart or in a brick building that can be defended? Too often, your fate is preordained to failure by the choices a CIO or a CTO makes about infrastructure, whether it's defensible or indefensible.

The second layer is really Gene's first masterpiece, the work he did on the Visible Ops project. So this one's not about having defensible infrastructure; it's about how well you operate it. Do you have discipline and rigor? Do you know what you have?

Do you know when it changes? And do you have a tolerance for zero unplanned changes? So in that zombie apocalypse, it's really about do your survivors act as a unit and remain calm and have a plan and reduce the chaos and entropy. The third one's about situational awareness, and this isn't about buying security products, it's about having eyes and ears to notice whispers and echoes.

It's do you have floodlights to notice how many zombies are attacking and from which direction? Or is it werewolves or vampires? Because as we know the countermeasures differ, right? Right.

All right. I'm going to send you this talk. You got to cut all their heads off. It's okay.

So that's the safest policy, is cut their heads off. That's the safest policy. Yeah. So only in the context of defensible infrastructure that's well-operated and well-instrumented can we actually have a chance to deploy our limited resources and countermeasures.

And unfortunately, most of the security industry focuses on the top, with very little yield. So the good news for you, again, is that as DevOps folks, we have the potential to impact IT and operational choices. Okay. A few months ago I asked a question, and now I know the answer: is it open season on open source?

If you think like an attacker, if you wanted to attack a bank, you usually went after their bespoke custom application code, spent three months finding a flaw, exploited the flaw and got as much as you could, and then you were done. But now, as an attacker, I can simply attack Struts and get every single bank. So the amplification from that shared dependence has been noticed and is being acted upon by adversaries. One easy target.

Now, we are the custodians of the Maven Central Repository, so we see all the consumption of open source code out of our very large repository. Last year we closed with 13 billion unique requests, largely fueled by things like cloud computing and mobile applications. Add the explosion of IoT on top of that, and it's going to get much, much worse. Heading into January, our estimate was that we'd close the year at about 18 to 20 billion.

We're already on track for 25. So it's actually exceeding our expectations, and IoT is just going to make it worse. As open source consumption goes up, so too does the attack surface. So back to the Struts example: I went to the NIST National Vulnerability Database, or NVD, and I wanted to see, is it really true that with many eyeballs all bugs are shallow?

That's the common trope about open source code. And that Struts vulnerability that took down every bank last July is the one I highlighted. We looked, and that code had been there for 7 to 11 years: a minimum of 7 years, possibly 11. So many eyeballs on a very popular and pretty well-maintained project weren't sufficient to find those bugs.

Then I mapped the project's entire vulnerability history: each dot is a vulnerability. The vertical axis is how serious it is on the 10-point CVSS, or Common Vulnerability Scoring System, scale. And the trend line shows that more are being found and they're more serious. And this is just one of many projects that you depend upon.

And granted, it's a large framework, but it's indicative. I've done similar plot graphs for other ones. The one that kills me as a security guy is Bouncy Castle. Its full name is actually The Legion of the Bouncy Castle Cryptography APIs for Java.

And aside from reminding me of my daughter's birthday parties, this is a crypto library, and you only use it when you have a security need. You don't do crypto just for fun. So these are people who used it because they needed it.

It was written by security people for security use cases. And seven years ago, it had a CVSS 10 out of 10, worst-case-scenario flaw. We all make mistakes. That's not the headline.

The headline is that while it was found seven years ago, and fixed seven years ago, our data shows that 4,000 unique organizations took the wrong, broken version 20,000 times, putting it into God knows how many applications. Now, I know the answer to this, but you could speculate: are these unimportant organizations that just happen to like crypto, or are they banking sites, high-frequency trading sites, and large social media sites? What kills me is that this is an entirely elective, avoidable risk.

If we were simply scrutinizing, we'd have seen that the fixed version was right next to it, and yet 4,000 organizations took the broken one. This is a shame, and if any of our breaches and security failures, especially the ones that can affect public safety and human life, are due to simple, easy-to-avoid mistakes like this, shame on us. That's why I believe we can start looking at this more like a supply chain, because this risk is entirely optional. We don't have to be looking for zero-day attacks or exotic Shakespearean, Chinese-espionage-type attacks.

This is really brain-dead simple stuff. There are countless examples, and we'll give you as much data as you want. HTTP Client is another popular one. So I believe, and my CEO and our company believe, that we should usher in an era where we treat software like a factory, or more like a supply chain, where we scrutinize the quality of the suppliers we do business with.

We measure and inspect the quality of the supply coming from them, and when a defect is encountered, we have a prompt and agile response. Think of something like a faulty airbag that deploys in the field while people are driving. If you were a car company and you continued to willingly and knowingly use a faulty, defective airbag, you'd be sued into oblivion. And we're doing this every single day with faulty open source components.
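The point that "the fixed version was right next to it" can be illustrated with a small version-selection check: given the releases of a component and the set of releases known to be vulnerable, pick the closest release at or above the one in use that is not on that list. This is a hypothetical sketch; the function names, the version numbers, and the assumption of purely numeric dotted versions are illustrative and not tied to Bouncy Castle's real release history or to any particular tool.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '1.45.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def nearest_safe_upgrade(current: str,
                         available: list[str],
                         known_vulnerable: set[str]) -> Optional[str]:
    """Return the lowest available version >= current that is not known-vulnerable.

    This is the 'slide to the right' move: stay as close as possible to the
    version already in use, but skip any release on the known-vulnerable list.
    Returns None when every candidate is vulnerable or older than `current`.
    """
    floor = parse_version(current)
    candidates = [v for v in available
                  if parse_version(v) >= floor and v not in known_vulnerable]
    return min(candidates, key=parse_version) if candidates else None


if __name__ == "__main__":
    # Hypothetical release history and advisory data for a hypothetical library.
    releases = ["1.9", "1.10", "1.11", "1.12"]
    vulnerable = {"1.9", "1.10"}
    print(nearest_safe_upgrade("1.9", releases, vulnerable))  # prints 1.11
```

Versions are compared as integer tuples rather than strings because a plain string comparison would sort "1.10" before "1.9".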

So I'm working on Capitol Hill a lot. One of the reasons the title says public policy is that this has to be a public policy reform. Right now, there are only two things in the world that have no liability for their claims: religion and software. And software's days are numbered, whether through software liability or something else.

There's a lot of discussion right now that, now that some kid could get run over by a self-driving car, that'll probably be the trigger for reform that's well overdue. But for my own part, I'm pushing for procurement guidelines for the financial services sector. The FS-ISAC already has a white paper on this.

I'm doing similar things in retail and other sectors. But I'm also trying to get the federal government to adopt a really elegant trio of requirements so we don't introduce perverse incentives. The simple logic here, you can read it for yourself, is: I want to know what's in it, I want it to not have known bad stuff, and I want to be able to fix it when bad stuff reveals itself later. So basically, anything sold to the federal government, for example, must provide a bill of materials listing the third-party and open source components used to construct it, with their version numbers.

This is simply an ingredients list. It's not the recipe for how you made the cake; it's what's in it. Number two, that list should not contain known vulnerable components, like a seven-year-old version of Bouncy Castle, for which a less vulnerable version is available. There are exceptions if you absolutely have to.

And number three, it must be patchable. Why put that one there, when nobody wonders whether a website is patchable? Remember I told you Heartbleed made it into Siemens industrial controllers. HTTP Client has made it into medical devices and other things with life-and-death consequences, and many of them cannot be patched.

On day one of Heartbleed, there were 600,000 discoverable vulnerable devices. Today, there are 300,000. Only about half of them were patched or patchable. We have a long tail: about 50% of the originally vulnerable devices cannot be remediated.

So again, tell us what's in it. It can't be known defective. It would be tantamount to negligence or malpractice. And number three, it has to be capable of being updated.
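The three procurement requirements (tell us what's in it, no known-vulnerable components, must be patchable) map naturally onto a simple automated check. The sketch below assumes a hypothetical in-memory data model: the `Component` and `Product` classes, the `waivers` field standing in for the "exceptions if you absolutely have to", and the advisory set are all illustrative, not a real procurement standard or bill-of-materials format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    name: str
    version: str


@dataclass
class Product:
    name: str
    bill_of_materials: list[Component]  # requirement 1: the "ingredients list"
    patchable: bool                     # requirement 3: can it be updated in the field?
    waivers: set[Component] = field(default_factory=set)  # documented exceptions


def check_procurement(product: Product, known_vulnerable: set[Component]) -> list[str]:
    """Return policy violations for a product; an empty list means it passes."""
    problems: list[str] = []
    # 1. Tell us what's in it: a non-empty list, every entry with a version.
    if not product.bill_of_materials:
        problems.append("no bill of materials provided")
    for c in product.bill_of_materials:
        if not c.version:
            problems.append(f"{c.name}: version number missing")
    # 2. It can't be known defective, unless an exception was granted.
    for c in product.bill_of_materials:
        if c in known_vulnerable and c not in product.waivers:
            problems.append(f"{c.name} {c.version}: known vulnerable")
    # 3. It has to be capable of being updated.
    if not product.patchable:
        problems.append("product cannot be patched")
    return problems


if __name__ == "__main__":
    # Hypothetical advisory data and a hypothetical device.
    advisories = {Component("example-crypto", "1.9")}
    device = Product(
        "example-device",
        [Component("example-crypto", "1.9"), Component("example-http", "4.2")],
        patchable=False,
    )
    for problem in check_procurement(device, advisories):
        print(problem)
```

Matching on exact (name, version) pairs keeps the sketch short; real vulnerability advisories usually describe affected version ranges, so a practical check would need range matching.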

Put that in the face of Bouncy Castle, and they would've said, "Oh, wow, I didn't know we were using a CVSS 10. Let me just slide to the right." So I think it's much more like supply chain management. And you know what? You know who we have to thank for this kind of idea, so we don't have to invent it from whole cloth?

We got the Deming guy again. Post-World War II, he was sent to Japan to help with the recovery, and many of the root ideas that made it into Toyota quality management came from that man. And you're familiar with many of his practices because you're using them in Lean and you're using them in DevOps in continuous improvement. So I would argue that the four V's that you might find in Toyota supply chain also apply to software.

So I have some sexy data, but I don't think I'm going to have much time to show it to you, so I'll take questions over drinks at the bar. But think about it in these terms (bless you again): a part might be something like an airbag, which might make it into several cars, which are compound products, and those might reach many consumers. Struts might make it into Bank of America, Bank of Brazil, any bank you want to pick, and then it affects their customers.

But guess what? It's turtles all the way down, because Struts could make it into IBM WebSphere, which doesn't fix it for a calendar year. So even if you've systematically eliminated Struts 2 from your vulnerability list, the products you've consumed haven't. So I think this kind of supply chain visibility and transparency will really help us manage these things down in an economic way.
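The "turtles all the way down" problem is a transitive-dependency question: a framework you removed from your own code can still arrive inside a product you bought. A breadth-first walk over a dependency graph finds every path from your application to a given component. The graph below is hypothetical (placeholder names, not any real product's dependencies), and the adjacency-list representation is an assumption for illustration.

```python
from collections import deque

# Adjacency list: each component maps to the components it bundles directly.
DependencyGraph = dict[str, list[str]]


def paths_to(graph: DependencyGraph, root: str, target: str) -> list[list[str]]:
    """Return every acyclic dependency path from root down to target, breadth-first."""
    found: list[list[str]] = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        for dep in graph.get(node, []):
            if dep not in path:  # skip edges that would form a cycle
                queue.append(path + [dep])
    return found


if __name__ == "__main__":
    # Hypothetical graph: your app no longer uses the framework directly,
    # but the app server you bought still bundles its own copy.
    graph: DependencyGraph = {
        "my-bank-app": ["app-server", "logging-lib"],
        "app-server": ["web-framework"],
        "logging-lib": [],
        "web-framework": [],
    }
    for path in paths_to(graph, "my-bank-app", "web-framework"):
        print(" -> ".join(path))  # prints my-bank-app -> app-server -> web-framework
```

Each returned path names the intermediate product carrying the component, which tells you whose update you are waiting on. Enumerating every path can grow quickly on large, densely connected graphs, so this is meant as an illustration rather than a scanner.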

And the last thing I'll show you is one piece of data. There's a white paper I wrote with Dan Geer on supply chain risk, and bottom line, DevOps actually helps at all three layers of it. In my last 30 seconds here, I'll show you what we found when we started looking at the integrity of the projects we house in Central, to see who does the best job taking care of software flaws as they're found.

Across the entire population, the sad news was that when a vulnerability is introduced into one of these projects, only 41% of them ever get fixed. So at the moment, the state of the union for our supply chain is very poor. Less than half of those flaws are ever fixed, which means you're inheriting quite a bit of risk. When they do fix them, the mean time to remediate, the MTTR, is 390 days, over a calendar year.

Then I looked to see whether they do a better job on the really serious flaws, the CVSS 10s, and they do a little bit better: about 224 days. Now, the reason I say this is that at the macro level, the story's very, very bad. But on a project-by-project basis, some of them fix all of their flaws in 30 to 90 days or less.
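The two supplier-quality metrics mentioned here, the share of vulnerabilities ever fixed and the mean time to remediate (optionally restricted to the most severe flaws), are straightforward to compute once you have per-vulnerability records. This is a minimal sketch under stated assumptions: the record layout is hypothetical, the clock is assumed to start when the flaw was reported (the talk does not specify the start point behind its 390-day and 224-day figures), and the sample dates are made up for illustration, not Sonatype data.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class Vulnerability:
    reported: date          # assumed start of the remediation clock
    fixed: Optional[date]   # None if the project never shipped a fix
    cvss: float             # CVSS base score, 0-10


def fix_rate(vulns: list[Vulnerability]) -> float:
    """Fraction of vulnerabilities that were ever fixed."""
    if not vulns:
        raise ValueError("need at least one vulnerability record")
    return sum(v.fixed is not None for v in vulns) / len(vulns)


def mttr_days(vulns: list[Vulnerability], min_cvss: float = 0.0) -> Optional[float]:
    """Mean days from report to fix, over fixed flaws with cvss >= min_cvss.

    Unfixed flaws are excluded, so always read this alongside fix_rate().
    Returns None when no fixed flaw meets the severity threshold.
    """
    durations = [(v.fixed - v.reported).days
                 for v in vulns
                 if v.fixed is not None and v.cvss >= min_cvss]
    return mean(durations) if durations else None


if __name__ == "__main__":
    # Hypothetical history for one project; not real data.
    history = [
        Vulnerability(date(2013, 1, 10), date(2013, 3, 11), 10.0),
        Vulnerability(date(2013, 5, 1), None, 5.0),
        Vulnerability(date(2013, 6, 1), date(2014, 6, 1), 4.3),
    ]
    print(f"fix rate: {fix_rate(history):.0%}")
    print(f"MTTR, all fixed flaws: {mttr_days(history)} days")
    print(f"MTTR, CVSS 10 only: {mttr_days(history, min_cvss=10.0)} days")
```

Reporting MTTR alone would flatter a project that simply never fixes its hard cases, which is why the fix rate is computed separately.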

And my belief is by adding this transparency and visibility, and if you start acting as a supply chain, you will gravitate towards trustworthy, reliable projects. And we have a chance to affect those automobiles, those cars, those medical devices, and finally save the world. So take out your wallet. On one side, I have my family photos.

On the other side, I have my credit card. We spend $80 billion and all our best and brightest trying to secure those credit cards. We're not doing very much of anything to protect the cars, the medical devices, and the critical infrastructure our families depend upon. So Gene said, what can you do to help?

My request from you is help me fix that ratio. So thank you for your time.