Exemplars, Laggards, and Hoarders - A Data-driven Look at Open Source Software Supply Chains

London 2019

In a year-long collaboration with Gene Kim and Dr. Stephen Magill, we objectively examined and empirically documented software release patterns and cybersecurity hygiene practices across 54,000 commercial development teams and open source projects.

In this session, we will present evidence on the outcomes of that research, highlighting organizational and technology practices that enable exemplar open source teams to deliver 50% more commits, release new code 2.4x faster, and remediate security vulnerabilities 2.9x faster, all while delivering a level of value that makes them standouts in terms of popularity and adoption.

Stephen Magill is a Principal Scientist at Galois, Inc. and a world-recognized expert on programming languages and program analysis, with work ranging from development of high-level languages to static analysis of low-level systems code. He has a Ph.D. in Computer Science from Carnegie Mellon University, and his work has been widely published. Stephen has led several research and development projects, including serving as principal investigator on a number of DARPA programs. Prior to Galois, Stephen was a research scientist at the Institute for Defense Analyses Center for Computing Sciences and a researcher at the University of Maryland. Stephen also serves on the University of Tulsa Industry Advisory Board and numerous program committees and funding panels.

Derek E. Weeks is the world's foremost researcher on the topic of DevSecOps and securing software supply chains. For the past five years, he has championed the research of the annual State of the Software Supply Chain Report and the DevSecOps Community Survey. He currently serves as vice president and DevOps advocate at Sonatype, creators of the Nexus repository manager and the global leader in solutions for software supply chain automation. Derek is also the co-founder of All Day DevOps, an online community of 65,000 IT professionals. In 2018, Derek was recognized by DevOps.com as the "Best DevOps Evangelist" for his work in the community.

Full transcript

The complete talk, organized by section.

Derek Weeks

I'm Derek Weeks. I'm Vice President at Sonatype. I'm also co-founder of All Day DevOps. I am joined here this morning by Dr. Stephen Magill, who's been my collaborator on a project for the last 10 months, and we're going to share some great research with you today. So we all know as part of the DevOps community that frequent deployments and release cycles are proven to show that there are terrific outcomes that can come from our DevOps investments. We know from the State of DevOps Report by DORA that we're all releasing 46 times faster than our peers. We have seven times lower failure rates. We have 2,600 times faster mean time to repair any failures that do occur within our environments. But we also know from these reports that our DevOps teams are 1.7 times more likely to extensively use open source within our development efforts. As part of this development effort and the velocity that we're trying to achieve, I've been tracking the use of open source and development across large enterprises for the last five years. Just last year alone, in the Java development realm, there were 146 billion open source components downloaded from Maven Central. If you're JavaScript developers, you're part of the community that is downloading or consuming 11 billion npm JavaScript packages each week from their repository. Now, how this translates into your enterprise, if you're just doing Java development alone, in a study of over 12,000 Java development shops, the average organization was downloading 300,000 Java components each year. These were coming from over 2,700 open source component suppliers that your developers are relying on across 8,200 different releases. But not all of these open source components are created equal, because over 27,000 of those components that you downloaded had known vulnerabilities the day that you downloaded them. 
Now, part of this is research that we've done, Stephen and myself, Gene Kim, and others at Sonatype have been collaborating over the last 10 months on an extensive research project, and we're going to bring the results of that research project to you today. You are the first ones in the world to actually see this, because the report itself just launched this morning. So what did we do in this report over the last 10 months? Why did it take us 10 months? Well, we examined 36,000 open source development projects to get a sense of their practices, software development practices. We looked at 3.7 million open source component releases. We evaluated behaviors across 12,000 development organizations and surveyed 6,200 developers in the process. We also examined 86,000 applications that were built with open source components, and we're going to share some of what we've learned about the development practices within open source projects and enterprise software development teams that you can then take away to improve your own practices. So I'm going to introduce Stephen, who will walk through some of the first findings on our open source projects.

Stephen Magill

Thank you, Derek. Great. Yeah, so as Derek mentioned, there's this huge open source ecosystem out there, with new components being released all the time, including new versions of existing components. And what we wanted to do was take a deeper dive over part of this ecosystem and see if there were any trends that we could extract that could provide advice or guidance for enterprises that are making use of these open source projects. And in presenting these results, since we are here at the DevOps Enterprise Summit, I want to start with the core DevOps mantra of faster is better. Right? So, as Derek mentioned, we've heard a lot about, both anecdotally and then in some great research by Nicole Forsgren, Jez Humble, and Gene Kim, support for this concept that improved deployment times, more frequent deployments, leads to a number of positive outcomes, including in the dimensions of profitability, market share, quality, and so forth. And so we're all familiar with that from the enterprise side. But one question that we wanted to ask when we were looking at these open source components is, does this trend hold in open source? And there's no reason necessarily to think that it would, right? These are two very different worlds, the enterprise and open source. So on the enterprise side, we can achieve multiple deploys per day. On the open source side, for better or worse, we're stuck in this world of versioned releases and using things like semantic versioning to communicate API changes. On the enterprise side, we have consistent development teams, whereas in open source, that's a more fluid group of developers. And on the enterprise side, we have well-resourced development teams. Or if you're snickering at that, at least predictably resourced development teams. On the open source side, it's much less predictable, much more variable. 
But there are similarities between the two as well, in particular, when it comes to the sorts of metrics that we might want to track in each world. So deployment frequency on the enterprise side corresponds to release frequency pretty directly on the open source side. They occur on different time frames with different cadences, but those are very analogous concepts. Similarly, in the enterprise, mean time to restore is a key reliability and performance metric. And on the open source side, we have things like vulnerability discoveries, which require the same sort of all-hands-on-deck, push-out-a-new-version-as-quickly-as-possible approach, and so a sort of analogous concept to remediation in the open source world. And then we have organizational performance metrics on the enterprise side, things like profitability and market share. And on the open source side, things like popularity are key indicators. People contribute to open source because they want their software to be helpful, to make other people's development lives more enjoyable, and so they want to see people using that software. So now we can look at a couple of these attributes and say, does this faster is better relationship hold in open source? And look at release frequency versus popularity. And this is one of the hypotheses that we entered this project with. Let's see if we can find data to validate this hypothesis, that projects that release frequently have better outcomes. And, in fact, we find support for this. So if you look at the top 20% by release frequency, that group is five times more popular than the rest of the population, attracts, on average, 79% more developers to contribute to the project, and has 12% greater rates of foundation support. All right. So that's cool. So, that was one hypothesis that we wanted to test. Question? Yeah. So this is a correlation. Yes. Thank you. I wanted to say that upfront. In this research, we're investigating trends and correlations among the attributes. 
We can't, for example, with this say, whether these projects release more frequently because they have larger development teams, or maybe they attract more developers because devs see them accepting changes and releasing frequently, and they see their code having an impact. We don't know which way that goes. And that will be the case for all of the things that I'll describe. You should view these as descriptive statistics about the population. All right. So before I get into the other hypotheses we investigated, I want to talk a little bit about the dataset. So first of all, we focused on Java projects published to Maven Central. There are about 260,000 of those. And then we applied a number of filters to get down to a core set of components that we felt we could analyze well. And so those filters were, first of all, we looked in the last five years, right? Because development trends, culture, tools, and technology have changed over time, and we wanted to find things that hold today and in the recent past. So we looked at the last five years. We also threw out components that we didn't have enough data about to really draw conclusions. So, for example, we wanted to measure release frequency, the average time between releases. The component has only put one release out there. There's never been a follow-up release. We can't even measure that, right? So we take those out. We similarly take out projects that are not actually part of the software supply chain. They don't use any open source libraries, and they're not used by any other projects, so they're just sort of isolated off by themselves. So when we apply all of these, we get down to a core set of 36,000 components that we looked at in our research. And for those components, we looked at a number of different attributes. Things like popularity, the size of the development team, development speed, release speed, and so forth. For many of these, we have data across those entire 36,000. 
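The three dataset filters described above (recent activity, more than one release, and participation in the supply chain) can be sketched roughly as follows. This is only an illustration, not the study's actual pipeline: the `Component` record, the `in_study_set` function, the sample data, and the choice to require two releases inside the five-year window are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Set


@dataclass
class Component:
    """Hypothetical record for one Maven Central component."""
    name: str
    release_dates: List[date]
    dependencies: Set[str] = field(default_factory=set)  # libraries it uses
    dependents: Set[str] = field(default_factory=set)    # projects that use it


def in_study_set(c: Component, today: date, window_years: int = 5) -> bool:
    """Apply the three filters from the talk (exact thresholds are assumptions)."""
    cutoff = today - timedelta(days=365 * window_years)
    recent = [d for d in c.release_dates if d >= cutoff]
    # Release frequency needs at least two releases to measure a gap.
    if len(recent) < 2:
        return False
    # Drop components isolated from the supply chain:
    # no dependencies and nothing depending on them.
    if not c.dependencies and not c.dependents:
        return False
    return True


today = date(2019, 6, 1)
components = [
    Component("active-lib", [date(2018, 1, 1), date(2018, 6, 1)], {"util"}),
    Component("one-shot", [date(2018, 1, 1)], set(), {"active-lib"}),
    Component("isolated", [date(2018, 1, 1), date(2019, 1, 1)]),
    Component("dormant", [date(2010, 1, 1), date(2011, 1, 1)], {"util"}),
]
kept = [c.name for c in components if in_study_set(c, today)]
print(kept)
```

Only the first sample component survives all three filters; the others fail on release count, isolation, or recency respectively.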
So, for example, popularity we define as the average daily Maven Central downloads, and we have that data for every component in that dataset. For other things like size of development team, we get that from GitHub data associated with the project. So we only have that for the projects that are on GitHub, and there are about 10,000 of those. And so most of these attributes are self-explanatory. There's a couple at the bottom, though, that warrant a little bit more discussion. So security and update speed are a little bit more complicated because of the complexity of open source supply chains or software supply chains in general. So I want to give a visualization of how those are defined. So here we have an example of three components, A, B, and C, and the dependency relationship between them. So the way to view this chart is time is marching along from left to right. So version 2.2 of B comes out, then version 2.2 of A, then version 2.3 of A, and so on, left to right. The lines show dependency relationships, so for example, version 2.2 of C depends on version 2.2 of B. And then we also have vulnerability disclosure represented here. So there's a point in time at which there's a vulnerability reported against component B. And then B releases version 2.3 to mitigate that vulnerability. So there's this period of time where B is vulnerable, and because C includes B as a dependency, there's a period of time where C is vulnerable. And so we can measure each of these times, but if you think about it from C's point of view, the important timeframe to think about is how long it takes him to respond to that patched version, right? The release of the patched version of B is really the first opportunity C has to now mitigate this downstream security risk that he's imported via his software supply chain. And so that's the key security relevant metric that we measure, and we call that time to remediate or TTR. We also just measure update time in general. 
And so that's a new release of B. So C takes some time to incorporate that new release of B. That's the update time for B. There's an update time for A as well, even though there's no security vulnerability against that. Every new release, there's some associated time to update, and so we track that as TTU. And then the last attribute we looked at is this notion of stale dependencies. So we often see a project release and maybe some of its dependencies will be updated to the latest version, but others will be behind. And you see that happening here with C, where A version 2.3 has been released at the point where C version 2.2 comes out. But C is not using that. They're not using the latest version of A, so we record that as a stale dependency. So those are really the three key metrics that we track from an update hygiene perspective, right? Time to remediate, which is the security relevant portion of these metrics, time to update, and stale dependencies, which are just general update hygiene metrics. And, I want to focus on the security relevant part for just a bit, because of what Derek was saying about the prevalence of vulnerabilities in the supply chain and how that trickles down into users of those open source projects. So, if we look at the time it takes these projects to apply security relevant patches, the median time is about six months, which is already not great. And it gets even worse if you look sort of at the right of this figure, at the 95th percentile, we see that 5% of projects take three and a half years or more to apply a security relevant patch. And these are not projects that just never applied the patch. They did eventually apply it. It just took them three and a half years to get there. So clearly if you're thinking about projects to use in your software, you want to be over at the left-hand side of this figure, right? 
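The three update hygiene metrics defined above (time to update, time to remediate, and stale dependencies) can be illustrated with a small sketch built on the A, B, C example. All dates, the `time_to_update` and `stale_dependencies` helpers, and the representation of versions as `(major, minor)` tuples are hypothetical; the study's real computation is not described in the talk at this level of detail.

```python
from datetime import date

# Hypothetical release dates for the two dependencies in the example.
# Versions are (major, minor) tuples so they compare naturally.
dep_releases = {
    "A": {(2, 2): date(2018, 2, 1), (2, 3): date(2018, 5, 1)},
    "B": {(2, 2): date(2018, 1, 1), (2, 3): date(2018, 8, 1)},
}
# B 2.3 is assumed to be the release that patches the disclosed vulnerability.
patched = {"B": (2, 3)}

# Each release of C: (release date, dependency versions it shipped with).
c_releases = [
    (date(2018, 6, 1), {"A": (2, 2), "B": (2, 2)}),   # C 2.2
    (date(2018, 9, 15), {"A": (2, 3), "B": (2, 3)}),  # C 2.3
]


def time_to_update(dep, version, dep_releases, consumer_releases):
    """Days from `dep` releasing `version` until the consumer first ships
    a release using that version or newer; None if it never does."""
    released = dep_releases[dep][version]
    for when, pins in sorted(consumer_releases, key=lambda r: r[0]):
        if when >= released and pins.get(dep, ()) >= version:
            return (when - released).days
    return None


def stale_dependencies(when, pins, dep_releases):
    """Dependencies for which a newer version already existed at release time."""
    stale = []
    for dep, pinned in pins.items():
        available = [v for v, d in dep_releases[dep].items() if d <= when]
        if available and max(available) > pinned:
            stale.append(dep)
    return sorted(stale)


ttu_a = time_to_update("A", (2, 3), dep_releases, c_releases)
# TTR is the same measurement, restricted to the patched release.
ttr_b = time_to_update("B", patched["B"], dep_releases, c_releases)
stale_at_c22 = stale_dependencies(*c_releases[0], dep_releases)
print(ttu_a, ttr_b, stale_at_c22)
```

With these made-up dates, C takes 137 days to pick up A 2.3, takes 45 days to pick up the patched B 2.3, and ships C 2.2 with A as a stale dependency.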
And one question we wanted to ask when we saw this was, okay, if you look at these projects that are really good at attending to security and applying security patches, do they just tend to be good about staying up to date in general? Because if they do, that's a useful thing to know, because then you can look at update behavior more broadly. And not every project has a security vulnerability that they've had to mitigate, right? So there's much more data about update behavior in general than there is about how you respond to vulnerabilities, right? So that could serve as a good guide when selecting components. And actually, we do see a correlation between update behavior in general and update behavior for security relevant updates. This is actually a plot of how quickly projects apply non-security relevant updates versus how quickly they apply security relevant updates. And we see a reasonable correlation here. There's a 0.6 correlation coefficient between these two data values. And you certainly see projects that fall on one side or the other. They are a bit better about security or for whatever reason end up performing better on non-security. But if you dig into the data a little bit more, we see that 55% of projects have an MTTR and an MTTU that are within 20% of each other. So they're sort of close to this line in this figure here. And if you look for projects that manage to stay up to date from a security perspective while not updating dependencies in general, so they do very good at remediating vulnerabilities, but don't keep the rest of their dependencies up to date, it's a small population. Only 15% of projects end up exhibiting that behavior. So the conclusion to draw from this is that most projects stay secure by staying up to date. A common behavior is to just stay up to date in general and as a consequence, be secure. So that was a second hypothesis that we entered into this research with, and we found data to validate that. 
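The two statistics mentioned above, a correlation coefficient between MTTU and MTTR and the share of projects whose two values are "within 20% of each other", could be computed along these lines. The sample numbers are invented, and the reading of "within 20%" as a difference of at most 20% of the larger value is an assumption; the talk does not spell out the exact definition.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def share_within(mttr, mttu, tolerance=0.20):
    """Fraction of projects whose MTTR and MTTU differ by at most
    `tolerance` times the larger of the two (assumed definition)."""
    close = sum(
        1 for r, u in zip(mttr, mttu) if abs(r - u) <= tolerance * max(r, u)
    )
    return close / len(mttr)


# Hypothetical per-project values in days, one entry per project.
mttu_days = [30, 60, 90, 200, 400]
mttr_days = [28, 70, 100, 150, 120]

r = pearson(mttu_days, mttr_days)
close_share = share_within(mttr_days, mttu_days)
print(round(r, 2), close_share)
```

In this toy sample, three of the five projects fall close to the diagonal, which mirrors the idea of projects that treat security updates like any other update.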
Another hypothesis that we came in with was that projects with fewer dependencies will stay up to date better. And intuitively, this seems to make sense, right? If you only have two or three dependencies, it should be pretty easy to keep them up to date with the latest version. Certainly easier than if you have 10 or 15. In fact, we actually found the opposite. So components with more dependencies actually had better update hygiene. They stayed more on top of their dependency version updates, and to statistically significant levels. And so this was so counterintuitive and intriguing that we dug a bit deeper, and found that actually the reason this occurs is because components with more dependencies also tend to have larger development teams. And if you look at large development teams, just having a larger development team is associated with a faster MTTU rate and a faster release frequency. So you can see here's a plot of number of dependencies is increasing as you go to the right, size of development team is increasing as you go up. This is a smooth plot, so you can see the trend line better. But there's this correlation between dev team size and dependency number. And again, we don't know which direction it goes. Maybe you need more developers to manage all these dependencies, or maybe every developer brings his own favorite dependency, and you end up with four unit testing libraries. We haven't investigated that yet. All right. Hypothesis four was that more popular projects will be better about staying up to date. And we really wanted to look into this one because so many people use popularity as a proxy for security, right? Everyone else is using it, so it must be a good project. It must be secure, it must be useful, it must be reliable, right? And again, this was one where we did not find support in the data for this. So first of all, there's plenty of popular components with poor update hygiene, but there's always those outliers, right? 
But more interestingly, we don't see any sort of correlation between these two attributes. And even if you look at the most popular projects, you say, "Okay, I'm just going to look at the top 10% by popularity," those are not statistically better with respect to update behavior than the general population. So if you take one thing away from this talk, don't choose your components just based on popularity, right? Fold something else, other things into your consideration in those sourcing decisions, because popularity on its own is not a great guide. So those were the hypotheses we tested. In addition to those hypotheses, we wanted to just break the data set up into groups based on this update hygiene and look to see what sort of behaviors and other associated attributes we see with these different subpopulations. And so we identified five different groups of interest, two that have exemplary update behavior, so these are the small exemplar and large exemplar, and then three that are not exemplary with respect to updates. What was interesting in the exemplar category, so projects that really have exemplary update hygiene, they're always staying up to date, they're at the cutting edge, they remediate vulnerabilities quickly. You saw sizable subpopulations there, that had small development teams as well as populations with large development teams. And so, in the small category, the average dev team size is just 1.6 developers. So these are very small projects, but still managing to stay very much up to date. In the large category, you see on average nine-developer teams, exemplary MTTU. These are very likely to be foundation supported, which is interesting. They're also much more popular than the rest of the population. And so this is sort of the open source industrial complex, if you will, or open source foundation complex, I guess. So really doing a great job delivering value there. 
We have a category of laggards, which are the bottom 20% with respect to update hygiene, and then we identified two other interesting classes. So, there's the features first class, which is, they're updating frequently, so they have sort of that update bandwidth to stay up to date with dependencies. They're just not focusing their effort there. They're releasing new versions, releasing presumably new features and so forth, but not maintaining their dependencies. And then there's this cautious group, which is kind of interesting. So they stay generally up to date with their dependencies, but not at the bleeding edge. They tend to adopt updates a little bit later after maybe they've been vetted by the community. And so the sizes of the classes are there in parentheses. And this is just a nice graphical depiction of what I showed you before. So we have the different subpopulations in different colors. You can see the exemplars over here at the left. So this is a plot of popularity versus update time. So at the left, you see projects releasing frequently, and they tend to be more popular. And the exemplars in particular are generally more popular and release frequently. And then, another thing to note is what I said before about hypothesis four, not all popular projects are exemplary. So you can see the prevalence, the big spread of red dots across there, and some very popular projects that have very poor update hygiene. All right. So I'm going to turn it back over to Derek to talk about the enterprise side of this. So as you're thinking about consumers of open source, how do they behave?

Derek Weeks

Great. Thanks, Stephen. Yeah. We've been looking in depth at this for weeks and months now, and I'm still like, "Oh my gosh, this is so awesome," looking at just Stephen present it. So as we are going through the data, one of the questions that we had, we're looking at open source project development teams, and we wanted to understand, how does this behavior actually correlate to enterprise or commercial software development teams? Are they updating dependencies and have this kind of rigor that we're seeing in open source projects? So back at the beginning of May, we went out and surveyed 652 developers and commercial development teams about their practices. And we asked them, how frequently are you updating? Do you have processes in place to update your dependencies? Is there a process to remove vulnerable or troublesome dependencies? Do you apply automation to any of these practices? And I think surprising to me, I actually expected 10% of the population to come back and say, "Yeah, we're totally doing this. We have a practice." But what you see from this is these are the answers of people that within the survey agreed or strongly agreed that these practices were in place. So, 38% of people have a regular update practice for daily work associated with dependencies. 50% have a process to support a new dependency coming in. 37% are applying automation into these environments. So not only was this behavior very visible to us in the survey, but we are able to use a couple of the different questions within the survey to identify exemplar behavior within this population. And the cool thing about this was the exemplars, when you looked at the practice of updating dependencies as part of daily work, they were 10 times more likely to do this than their counterparts. When we were looking at using some form of process to update dependencies, the exemplars were 11 times more likely to be doing this. 
When applying automation to managing dependencies and updates and compliance within these environments, they were 12 times more likely to apply automation. Now, the interesting question that we ask associated with this, that Gene and Stephen brought into the survey, was how happy were developers with these processes in place? And the exemplars were 3.2 times less likely to say updating dependencies were painful, or 2.6 times less likely to say updating vulnerable dependencies is painful. And what we see is that when you have these practices in place, when you're climbing this mountain every day, the journey is really easy. You're not huffing and puffing along. This is okay. I'm feeling better. Yes, it's part of my daily work. But we're doing it in stride. If you're doing this once or twice a year, trying to update your dependencies, you hate this kind of work. So there's a high correlation between what's happening in terms of regular frequency of updates with the satisfaction of the people that have to do the work. So one of the other things that we did is earlier in the year, in January, we went out and surveyed 5,500 developers and DevOps pros. And as part of this, we asked them a question in this survey about, do you have an open source governance policy? And if yes, do you follow it? Now, the cool thing about this, when we looked at mature DevOps organizations versus those with no DevOps practice in place, there was 62% of developers in mature DevOps organizations said, "We have a policy and we follow it," versus 25% in the non-DevOps shops. 
But what this really told us is that where automation was visible, and we asked them what kind of tools that they were using, where automation was visible, it was a lot more difficult for the developers to ignore that automation within those environments because the tools are presenting information about there's a new dependency or there's a new update or there's a new version available, or you're not compliant using that version. So they're able to act upon the information through automation. Now another thing that I mentioned is we looked across 86,000 applications that were using open source components as part of a managed software supply chain. Within this, the first thing that we wanted to know is how old are these components? Because that would tell us are enterprises actually updating their components to the latest versions? And what we found was about 50%, just over 50% of the components that they were using had been created since 2017. So they're about three years old. And the other 50% it goes on for a decade of old components in those infrastructures. But when you take these components and you look at the vulnerability defect density, what you see is that components that are less than three years old have a defect density rate of 9.3%, which is 65% lower than the older components, which generally have about a 15% defect rate. Now, we also compared this against applications that were in unmanaged software supply chains. And in those applications, we studied about 500 of those, they're not as easy to get a hold of, but they had about 21% of the components in those applications had known vulnerabilities associated with them. So where managed software supply chains and observability were being applied into these, and manageability were being applied into these supply chains, there was a dramatic reduction in the presence of known vulnerable components. 
So I'm going to invite Stephen back up as we wrap up the discussion here about three quick takeaways that we can bring to you and your organizations. So the first part, when we share this data with people, first of all, it's like, wow, this is interesting. I haven't seen this before. I want to dig into it a little more, but if I'm going to use any of this information, where do I start within my own organization? And the first answer that I have given people for years is observability. If you don't know that you're using open source components, you need to get some kind of visibility to that. So you need to open up to what are we putting through our DevOps pipeline? What is in the applications that we're building with? What components are we consuming? How many of those are we consuming? And what's the quality of those components, right? Right.

Stephen Magill and Derek Weeks

And on the quality perspective, I mentioned that popularity is not a great guide for selecting these components, so what is a good guide? Well, the time to update metrics that we tracked, I think, are a good one. Release frequency is also highly correlated with time to update, so you can look at release frequency as a possible proxy metric there. But I think that the thing to keep in mind when you select open source components is that you're inheriting their approach to mitigating vulnerabilities in their dependencies, right? The supply chain traces back through those components, and if they're using vulnerable components of their dependencies, you're sort of stuck with that. So you want to be picking projects that you know have good hygiene with respect to adopting these updates. And then this last point is if you have open source releases that your company is putting out there, and you want to be in this exemplary category of releasing frequently and updating your dependencies, these are sort of the guidelines to shoot for in terms of behavior there. And then, if you're contributing to some open source projects out there, think about updating a dependency as a useful contribution. It's like contributing documentation. It doesn't always get the most notoriety, but it's an important contribution to make to these projects. Yeah, and think about just as you're trying to release faster, and you have these higher deployment frequencies and faster mean time to remediate from failure within your enterprises, think about who are the suppliers that you're actually working with. Because these open source projects aren't just, hey, it's a component. These are external suppliers of software that you're using to build or assemble your own applications. So what are the qualities of suppliers that you want to rely on, and what kind of metrics can you apply to that kind of practice? So one of the other things that we wanted to share with you was the report itself. 
So my email address is very simple. It's weeks@sonatype.com. My out of office message is on today. If you just email me, there's a link in that out of office message which goes to the report. You don't need to sign up on a website to download it or whatever. It's on SlideShare. You can download the PDF. You can read it for yourself. I think one of the key things that we'd love you to do while we're at the conference, myself, Stephen, Gene Kim as well, being involved and collaborating on this research, is that as you read the report, now all of the charts that you saw within the presentation are all in the report. We just copied them into the slide deck, is questions from you about what else do you want to know? What would be helpful in your organizations? What do you want us to put in next year's report of questions that we can really dive into that would help your organization apply this kind of knowledge into the enterprise? Yeah, I think we got some good results, but we're just scratching the surface in terms of what's there to be discovered in this data. So, please reach out with suggestions for things to investigate. Thank you. And then, just a quick note for more discussion on this, we're going to have a panel discussion tomorrow afternoon. I think it's 11:40, so just after the morning keynotes break out, we're going to have Jane Groh, Stephen, David Jones from Credit Suisse is going to join us as well. I'm going to moderate the discussion, but to hear more insights on the data and the research and how enterprises can apply it to upskilling, observability techniques, and so forth, that we'll cover there. So thank you very much. We appreciate your time.