A Data-Driven Look at Open Source Software Supply Chains
In a year-long collaboration with Gene Kim and Dr. Stephen Magill, we objectively examined and empirically documented software release patterns and cybersecurity hygiene practices across 54,000 commercial development teams and open source projects.
In this session, we will present evidence on the outcomes of that research, highlighting organizational and technology practices that enable exemplar open source teams to deliver 50% more commits, release new code 2.4x faster, and remediate security vulnerabilities 2.9x faster, all while delivering a level of value that makes them standouts in terms of popularity and adoption.
Full transcript
The complete talk, organized by section.
Derek Weeks
Derek Weeks: Okay, we're live. Hey. Hi, everyone, and welcome this afternoon.
I am Derek Weeks. I'm a vice president at Sonatype. I'm also co-founder of the All Day DevOps conference. I'm joined today by Dr. Stephen Magill. He is CEO of MuseDev and principal scientist at Galois. We are going to talk to you about some research that we've been collaborating on with some others over the last 10 months or so, or year or so by now.
An organization's journey to excellence begins once it ceases to sacrifice quality for speed.
Now, wait, we're at a DevOps conference, right? We talk about speed all the time, and we know that high-velocity, exemplary DevOps practices deploy 46 times more frequently according to DORA and the State of DevOps report. We know they have seven times lower change failure rates. We know that they're 2,600 times faster to recover when failures happen within their organizations. We also know from the State of DevOps report that these organizations are 1.7 times more likely to extensively use open source within their environments.
Over the past five years or so, almost six years now, I've been studying the use of open source components within software development around the world across tens of thousands of organizations.
One of the things that we saw last year, and we've documented this in a report we're going to share with you, is that in the Java development realm alone, about 10 million Java developers last year consumed 146 billion download requests of Java open source components. If you're a JavaScript developer, or you have JavaScript developers in-house, six and a half million JavaScript developers around the world consume an average of 60,000 components a year, and they are downloading, as a gross population, over 11 billion npm packages a week.
The consumption of open source is prevalent in DevOps practices as well as non-DevOps practices. To see what this means for the average enterprise, like your own, we studied over 12,000 enterprises using open source within their development practices. Just in Java alone, the average organization is downloading 313,000 Java open source components on an annual basis. They are downloading these from over 2,700 different suppliers or open source projects within the community. These are projects or suppliers that they are relying upon to write code for them, sourcing it externally because they don't want to write the code themselves. And they're relying on over 8,200 different versions of those projects, or individual releases from those projects, within those downloads. But we know not all of those downloads are created equal. In fact, almost one in 11 of those downloads had a known security vulnerability at the time that it was downloaded into your enterprises.
What does it really mean that we're consuming all of this open source? It means that 85% of the applications that we are building are composed using code from these external suppliers, code that we didn't write ourselves, that is making us more efficient and allowing us to be much faster in our development practices as a result.
It was this knowledge of the massive consumption that's happening, and of the speed of development and efficiency that comes with it, that brought together Stephen, myself, Gene Kim, Bruce Mayhew, Gazi Mohammed, and a number of other security researchers and data scientists to pull together the 2019 State of the Software Supply Chain report. We spent almost a year collaborating on this research, and we're going to share a lot of that research with you today.
To give you a sense of what research it was, we walked through a software supply chain, which every one of you has within your organization or relies upon, where you have open source projects that contribute code to internet-based warehouses. That code is then downloaded from those warehouses into your software development teams and built into the finished goods and software applications.
We looked at over 36,000 open source projects for this research. We looked at over 3.7 million releases across that software or across those components. We also studied over 12,000 organizations developing applications using open source components. We surveyed over 6,000 developers this year for this study. We also evaluated over 86,000 applications that were built using open source components to get a better idea of who the best open source projects were and who the best suppliers are of this code.
I'm going to hand off to Stephen Magill now, who's going to walk you through the first part of the research from the report, and then I'm going to come back and cover the second part.
Dr. Stephen Magill
Stephen Magill: Okay. Thank you. I'm going to walk through the analysis that we did of the open source ecosystem and some of the results that we found there.
The first thing I want to talk about is this notion of faster is better. We're at the DevOps Enterprise Summit, and as Derek mentioned, we have all these stories about how speed and release velocity, deployment velocity, are really tied up with a lot of positive outcomes from a technology and business standpoint. We've heard that in anecdotes during the experience reports that we've heard at the conference this year, and then we've also heard it empirically and rigorously validated by Nicole Forsgren's research and her work with her team.
This faster is better premise holds in the enterprise, but does it hold in open source? Can we find the same sort of signal, the same connection between velocity and positive outcomes in the open source world?
There's no reason to think that this would necessarily be the case. The enterprise and open source community are two very different worlds. On the enterprise side, we can achieve multiple deploys per day. In the open source world, it's more about versioned releases, using semantic versioning to communicate API changes, and pushing out new code on a several-month timescale. On the enterprise side, you have a consistent group of developers. There's some turnover, and people switch teams and so forth, but by and large, you have a predictable set of developers. It's much more fluid on the open source side, with developers coming and going and many just contributing a single code change. On the enterprise side, our dev teams are well-resourced, or if that sounds ludicrous to you and you're snickering, at least predictably resourced. Maybe the budget's too low, but you know it'll probably be that same number or close next quarter. On the open source side, those resources are highly variable. It's much different doing project planning when you don't know how many developers you're going to have next week.
They are two different worlds. On the other hand, there are similar metrics on each side. We can find analogous attributes on the open source side that correspond to some of the things that we're interested in and track on the enterprise side. Deployment frequency, this key metric on the enterprise side, corresponds in some sense to release frequency: how often are open source projects releasing new versions? The timescales and the cadences are very different, but they're analogous concepts. Similarly, when we talk about organizational performance metrics on the enterprise side, things like market share and profitability, I would argue the open source analog for that is popularity. Open source contributors contribute because they want their work to have an impact. They want to improve other developers' lives, and that means getting their code used, which corresponds to popularity. On the enterprise side, mean time to restore is a key metric of how well the organization responds to incidents and downtime. On the open source side, a very similar situation occurs when you have a security vulnerability that's reported and you have to push out a new version that mitigates that vulnerability. It's the same sort of all-hands-on-deck, let's-push-a-new-release-as-quickly-as-possible scenario that happens when you respond to an incident in the enterprise.
To answer this faster-is-better question, we can look at these analogous attributes on the open source side, and we can compare release frequency and popularity and see what sort of connection there is. Going into this research, this was our first hypothesis: do projects that release frequently have better outcomes?
We did find supporting evidence for this in the data. Projects that release frequently, if you look at the top 20% by release frequency, are five times more popular on average. They have 79% more developers, and they're supported at greater rates by open source foundations. These are all statistically significant differences in those attributes. I'm stating correlations here. We haven't looked into the causation aspect of this to see whether these projects are more popular because they have more developers or they attract more developers because they're more popular. We don't know which is the leading or lagging indicator, but that's one connection that we found.
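A comparison like the top-20% one above can be sketched in a few lines. This is only an illustration: the tuple layout, the function name, and the choice to compare the top quintile against the remaining projects are assumptions, and the sketch does not test statistical significance.

```python
from statistics import mean


def top_quintile_popularity_ratio(projects):
    """Compare mean popularity of the top 20% by release frequency with the rest.

    `projects` is a list of (releases_per_year, daily_downloads) tuples.
    Both fields are hypothetical stand-ins for the study's attributes.
    Needs at least two projects so that the "rest" group is not empty.
    """
    # Rank by release frequency, most frequent first.
    ranked = sorted(projects, key=lambda p: p[0], reverse=True)
    cut = max(1, len(ranked) // 5)  # size of the top 20%
    top, rest = ranked[:cut], ranked[cut:]
    return mean(p[1] for p in top) / mean(p[1] for p in rest)


# Invented sample data, not figures from the study.
sample = [(12, 500), (4, 100), (3, 100), (2, 100), (1, 100)]
print(top_quintile_popularity_ratio(sample))  # 5.0
```

A ratio like this is a correlation only; it says nothing about which attribute leads and which lags.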
We also looked at these more security-relevant metrics, things like mean time to restore and time to remediate vulnerabilities. I'm going to spend most of my time talking about those security-relevant and update- and responsiveness-relevant metrics.
First, I want to say a little bit more, following on to what Derek said, about how we constructed our dataset. We started with Java components that were published to Maven Central. That was the starting set. Then we filtered out components that we didn't really have enough data about to analyze in a productive way. For example, we're looking at things like update frequency, so a component has to have published an update. If it only ever released one version and that's it, that's the only version of that component, that's not helpful to us. We filter those out. We also look just at components over the last five years because development trends, technologies, and tools have all changed over time, and we wanted to find interesting correlations in the data that hold for the current development environment.
We also filtered out components that just aren't part of the software supply chain. They don't use any open source libraries, and they're not themselves used by any other components. They're isolated, not part of the supply chain, not part of a dependency tree, and so we filter those out. We do all of that selecting for what we need to collect the attributes that we wanted to analyze, and we get down to this core set of 36,000 components. That is what we focused on.
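The filtering steps just described can be sketched as a single predicate. This is an illustration under stated assumptions, not the study's actual pipeline: the `Component` record, its field names, and the rule of requiring at least two releases inside the five-year window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Component:
    # Hypothetical record; the field names are illustrative, not from the report.
    name: str
    release_dates: List[date]  # one entry per published version
    dependency_count: int      # open source libraries this component uses
    dependent_count: int       # other components that use this one


def in_study_set(c: Component, window_start: date) -> bool:
    """Return True if the component passes all three filters."""
    # Only consider releases inside the study window (the last five years).
    recent = [d for d in c.release_dates if d >= window_start]
    # The component must have published an update, i.e. two or more versions.
    has_update = len(recent) >= 2
    # Isolated components (no dependencies and no dependents) are outside the supply chain.
    in_supply_chain = c.dependency_count > 0 or c.dependent_count > 0
    return has_update and in_supply_chain
```

For example, a component with a single release, or one with no dependencies and no dependents, would be dropped before any attributes are computed.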
For that 36,000, we had enough information to compute a variety of attributes. We looked at popularity, measured as the average number of downloads from Maven Central for that component each day. We looked at the size of the development team, the development speed, so how often code was committed to these repositories, release speed, presence of CI, and so forth, and then a couple of security- and update-related metrics, so security and update speed. Most of these we have for all 36,000. A couple are based on GitHub metadata, so size of team and development speed are based on information that we get from GitHub statistics about a project. We only have those for projects that were hosted on GitHub. There were about 10,000 of those.
These last two metrics, security and update speed, are a little more complicated, but they're really the core attributes that we studied here. They're both measures of how quickly a project responds, how responsive a group of developers is in various scenarios. One is more security-oriented, and one is more oriented toward staying up to date.
To demonstrate these, I've got a graph of a typical set of software releases. We have three components here: A, B, and C. You can view time as marching along from left to right. Dependencies are solid lines. C depends on A and B. In particular, version 2.2 of C depends on version 2.2 of A and 2.2 of B. We have a couple of interesting events here, this vulnerability B event, indicating that at some point in time, someone discovers a security vulnerability in component B. There's some period of time now where B is vulnerable. Someone knows about a vulnerability there, and it hasn't been patched. Then we're assuming that version 2.3 of B actually mitigates the security vulnerability. They patch it. They put out a new version that doesn't contain that problem.
C, because C depends on B, also inherits this vulnerability, and it has a certain vulnerability time. But from the perspective of C's development team, the core time period to focus on is the time where B has released a new version, that version patches a security vulnerability, and then the clock is ticking for C to incorporate that themselves. That's the first point at which they can fix their upstream vulnerability. We're going to call this remediation time for C. We can compute this remediation time for individual updates and average it for components, and talk about this TTR metric, time to remediate.
That's a security-oriented metric. We then have a general update-oriented metric, which is time to update. Since B has a release, it takes some time for C to adopt that release. That release happens to be security relevant, but there are other releases that aren't security relevant. A here publishes version 2.4; again, the clock is ticking to see how long it takes C to adopt this new version of A. When we consider that time period for all updates, whether security relevant or not, we call that time to update, TTU.
Then we also look at what we call stale dependencies. A component might release a new version but not update all of its dependencies. Those dependencies that are lagging behind, we call stale, and we account for those as well. Here A has released version 2.3. It had already been published when C released version 2.2, but C didn't update to that newer version.
Those are the key metrics: time to remediate, time to update, stale dependencies. We're going to explore a number of questions about those metrics and how various open source projects behave with respect to update hygiene, security hygiene, et cetera.
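A minimal sketch of the three metrics, using the A, B, C example from the graph. The dates, the record layout, and the function names are invented for illustration, and comparing version strings for equality is a simplifying assumption rather than how the study identified stale dependencies.

```python
from datetime import date
from statistics import mean

# Hypothetical adoption records for component C; the dates are invented.
UPDATES = [
    # B 2.3 patches the vulnerability; C adopts it later.
    {"dep": "B", "released": date(2019, 1, 10), "adopted": date(2019, 3, 11), "security": True},
    # A 2.4 is an ordinary release with no security fix.
    {"dep": "A", "released": date(2019, 2, 1), "adopted": date(2019, 2, 21), "security": False},
]


def adoption_days(update):
    """Days between the upstream release and the downstream adoption."""
    return (update["adopted"] - update["released"]).days


def mean_time_to_update(updates):
    """MTTU: average adoption time over all updates, security relevant or not."""
    return mean(adoption_days(u) for u in updates)


def mean_time_to_remediate(updates):
    """MTTR: average adoption time over security-relevant updates only."""
    return mean(adoption_days(u) for u in updates if u["security"])


def stale_dependencies(used, latest_at_release):
    """Dependencies whose used version is not the newest one available at release time."""
    return sorted(dep for dep, v in used.items() if latest_at_release[dep] != v)


print(mean_time_to_update(UPDATES))     # 40
print(mean_time_to_remediate(UPDATES))  # 60
print(stale_dependencies({"A": "2.2", "B": "2.2"}, {"A": "2.3", "B": "2.2"}))  # ['A']
```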
I want to start with the security-relevant one. This is time to remediate, that security metric I was discussing before, and this is just a graph of the TTR behavior of the entire population. We can see, first, this median TTR: the dot at the far left shows 180 days median remediation time, which means that 50% of the population takes more than six months to remediate a security vulnerability in a dependency. That's already not great. It gets even worse if you look at the far right. The slowest 5% of projects take over three and a half years to adopt security-relevant changes. Those aren't projects that just never adopted a security patch. It came out, they did eventually adopt it, it just took them three and a half years.
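The two points read off the graph, the median and the slow tail, correspond to a median and a 95th percentile over per-project remediation times. A small sketch, assuming the times are available as a list of days; the sample values are invented and the function name is hypothetical.

```python
from statistics import median, quantiles


def summarize_ttr(ttr_days):
    """Return (median, 95th percentile) of per-project remediation times in days."""
    # n=20 yields 19 cut points at 5% steps; the last one is the 95th percentile.
    # method="inclusive" keeps the result within the observed range.
    p95 = quantiles(ttr_days, n=20, method="inclusive")[-1]
    return median(ttr_days), p95


# Invented sample values, not data from the study.
sample = [30, 90, 180, 400, 1300]
med, p95 = summarize_ttr(sample)
print(med)  # 180
```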
Clearly, if you're selecting projects, if you're selecting dependencies to fold into your software supply chain, you want to be identifying projects that are attending to this, that are keeping on top of updates, not just to their direct dependencies but to everything in their supply chain.
One question we had is: this section of the population is good from a security perspective, but are they just good about updating in general? Are they just paying attention to security, or are they just keeping up to date? Which behavior dominates? What we find is that they're actually closely connected. This is a graph of the time to adopt security-relevant updates versus the time to adopt non-security-relevant updates, and you can see there's a correlation here. The correlation coefficient is 0.6 in this case. There is a sense in which the two track each other.
We can also dig a little deeper or slice it different ways and find that if we look at MTTRs and MTTUs that are within 20% of each other, trying to characterize similar behavior in terms of security and non-security-relevant updates, 55% of the population falls within that. If you imagine a cone, a slice near that diagonal line, those are the projects we're talking about there. In particular, if you think about the opposite of this, are there projects that don't update frequently but still manage to stay secure? We don't see a lot of that behavior. If you look at the projects that manage to maintain better-than-average security behavior while having worse-than-average update behavior, only 15% of the population falls into that category.
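The three slices described here can be sketched as follows. The function names are hypothetical, and two definitions are assumptions made for the sketch: "within 20% of each other" is measured relative to the larger of the two values, and "average" behavior is taken to be the population mean.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def share_within(mttr, mttu, tolerance=0.20):
    """Share of projects whose MTTR and MTTU differ by at most `tolerance`."""
    close = sum(1 for r, u in zip(mttr, mttu) if abs(r - u) <= tolerance * max(r, u))
    return close / len(mttr)


def share_secure_but_slow(mttr, mttu):
    """Share with better-than-average MTTR but worse-than-average MTTU."""
    avg_r = sum(mttr) / len(mttr)
    avg_u = sum(mttu) / len(mttu)
    hits = sum(1 for r, u in zip(mttr, mttu) if r < avg_r and u > avg_u)
    return hits / len(mttr)


# Invented values: the second project remediates in 100 days but takes 50 to update.
print(share_within([100, 100], [90, 50]))  # 0.5
```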
That's hypothesis two: projects that update dependencies more frequently are generally more secure, and we found a variety of evidence for that. That's an interesting finding because it gives you another way to evaluate the quality of projects. We're all interested in security; that's the top-line concern, and so we can evaluate that by looking at security update behavior. But a lot of projects never have a security vulnerability reported against them. We have no data for those projects and no way to evaluate them. Whereas we have update behavior data across the board. Every project, we know how frequently they're updating. If we can say frequently updating is a good proxy for quality and security, that gives us another thing to latch onto to include in our set of criteria for how we evaluate these projects.
Hypothesis three was that projects with fewer dependencies would stay more up to date. This seems intuitive: you have less to keep on top of, so you'll be better at getting it done. We actually found the opposite. Components with more dependencies are actually better at keeping them up to date. This was so not what we expected that we dug a little bit deeper and found that what's going on is actually projects with more dependencies tend to have larger development teams. Projects with larger development teams tend to be better at keeping things up to date and attending to these project hygiene concerns. You can see this relationship here. I've graphed the number of dependencies versus the average size of the development team, and you can see that increases as dependencies increase. The more dependencies you bring on, the more pizza you have to buy.
Hypothesis four, and this is I think the most interesting finding of the report, is: if you focus on one thing or take one thing away from this talk, think about this. We went into this thinking probably more popular projects will be better about staying up to date. They'll be better about adopting security updates, et cetera. Again, we found no evidence for this. First of all, there are plenty of popular projects that have poor update hygiene, but that's not really surprising. There are always outliers. But if you dig a little deeper, you find that popularity doesn't even correlate in any sense with update hygiene. Even if you just focus in on the most popular projects, say the top 10% by popularity of the population, there's no statistical difference between the update behavior of those projects and the update behavior of the rest. Popularity is not a good proxy for update hygiene and security. If there's one takeaway, don't base your decisions just on popularity. Look at other things. Consider other factors.
To see some of the behavior that's behind that and get more of a sense of how different groups of project maintainers behave, we broke the data down further and looked at various clusters to see what behaviors were in common between different types of projects. We broke it down into five different categories: two that have exemplary update behavior, in the top 20% of update hygiene, and three that are not. Among the exemplars, there's representation from both small and large teams. There's a sizable set that has small development teams, an average of 1.6 developers per team, and are still staying very on top of updates. Then there's the large exemplars, which have on average almost nine developers on their team. They have exemplary update behavior. They're very likely to be foundation supported, and they're high on popularity. This is like the open source industrial complex, or open source foundation complex: Apache Foundation, Linux Foundation, those supported projects, those big projects that are really setting a high standard for quality.
Then we have the laggards, which are behind in MTTU, have high dependency count, and are just not keeping up. One of the most interesting classes is this features-first class. I talked about these projects that are popular but are not keeping up from a security perspective. Some of those are in this category where they're doing frequent releases and so they have the release bandwidth, essentially, to stay up to date from a security perspective, but they're not spending their effort there. They may be prioritizing features or something else. Whatever they're doing, they're not attending to security. They're just prioritizing features. Then there's the cautious group, which has good update hygiene, so they keep up to date generally, but they're not at the latest version. You see them adopting updates basically a version or two behind. They're not falling behind and staying completely out of date. They're generally staying secure, but they take a more cautious approach.
This is that data represented graphically. I've got the different groups tagged with different colors. You can see, first of all, that exemplar category is all the way here at the left. As I said, these release quickly and tend to be more popular. If you're sourcing projects for your open source supply chain, you should try to draw them from here. This is a representation of what I was saying with popularity not being a good guide. You can see that in how far this box spreads to the right. On the x-axis is average days between release: how quickly do you release updates? How up to date are you staying? On the top is popularity. You can see there are some popular projects that are not great from an update hygiene perspective.
Now I'm going to turn it back over to Derek. I've been talking about the open source supply side of the equation. Derek's going to say more about the consumer side and what's happening in the enterprise when we look at how they deal with open source dependencies.
Derek Weeks
Derek Weeks: Cool. Thank you. Thanks, Stephen.
One of the things that we found as we were going through the research was we saw all these open source projects that were updating frequently, and we saw this exemplary behavior. But we wondered: what are developers doing in your enterprises? How are you behaving with dependencies?
While we were in the middle of doing this research, we said, let's go out and survey a bunch of developers, which we did. We surveyed 658 developers, and we asked them about how they are managing dependencies. Do you have a process for managing your dependencies? Do you have any automation that you're using to manage dependencies? Do you have a process in place for removing troublesome dependencies when problems arise in there? Are you using the latest version of dependencies in projects that are out there in your environment?
The surprising thing about the survey was how many organizations and developers said that they had these practices in place. I think we all kind of expected about nine or 10% of these organizations to say, yes, we're doing this. I think we all felt like these answers might be more aspirational versus what they're actually doing, or they're representing that they have a part of a process in place to do this. It might not be the most mature process, but there is some part of a process that exists or some piece of automation that exists to support us. It may not be comprehensive in that realm.
But when we looked at these practices and began to identify the clusters of exemplary behavior versus non-exemplars, what we found out is that in the exemplar behaviors, they were 10 times more likely to schedule updates of dependencies. They were 11 times more likely to have a process in place to update dependencies. They were 12 times more likely to have automation in place to support these practices. The cool thing about this, and the finding at the top of the mountain here, is that when it came to updating dependencies, whether there was a security vulnerability in place or it was just the practice of updating dependencies, it was a lot easier for these organizations. They were less likely to consider that activity painful if they were doing this frequently.
When you're climbing the mountain every day, when you're updating your dependencies every day and you have these practices, it's pretty easy, or it feels easier to do this. If you climb the mountain once a year, you're trying to update your dependencies once a year or once every other year, it's going to be a difficult trudge up the mountain. It's just like deployments: when you're doing multiple deploys a day, deployments get a lot easier. When you're doing one deploy every six months or one deploy every year, it's a lot harder. We're seeing the same behavior within enterprises and how they're managing their open source components and dependencies.
The other thing we found through the research came from another survey, where we went out and surveyed over 5,500 developers. In this one, we asked the organizations to identify: are you using DevOps practices? Do you consider yourself to be mature in your DevOps practices versus having no DevOps practice? Of those organizations, we asked them: do you have an open source policy in place? If you do, do you follow it? We also asked about automation, and we found that where automation existed, the policy was more difficult to ignore. Developers are aided by information about the components, what's a good component, what's a bad component, and they're two and a half times more likely to apply those governance policies in those organizations where automation was more present.
The other thing we wanted to understand is what open source components everyone is using out there and what the quality of those components is. One quality that we looked at across these 86,000 applications was the age of the components. Within this chart, 51% of the components are three years old or younger. They've been developed or released in the last three years. But half of the code within the applications that you are all building is three years old or more. That means you're relying on parts from suppliers that have been out for a long time.
This makes a difference because when we look at the vulnerability defect ratio within these components, you see that the components that are younger than three years old have a 9.3% defect density, and those that are older than three years have a 15% defect density. If you just had a rule in place that says your developers can use any open source components they want, as long as they're three years old or younger, then the components they pull in carry a 9.3% defect density instead of 15%, a relative reduction of roughly 38%, just on that practice alone. You don't have to say, use more secure components. You just have to say, use the latest, newest versions of these components, and you will, by default or by consequence, remain more secure as part of that.
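A rule like the three-year one could be checked mechanically. The sketch below is an assumption-laden illustration: the component names, the use of 3 × 365 days as the cutoff, and the function names are all hypothetical, and a real policy would take release dates from a repository or a bill of materials.

```python
from datetime import date

MAX_AGE_DAYS = 3 * 365  # "three years old or younger", approximated in days


def too_old(released, today, max_age_days=MAX_AGE_DAYS):
    """True if the component version was released more than `max_age_days` ago."""
    return (today - released).days > max_age_days


def policy_violations(components, today):
    """Names of components that break the age rule, in alphabetical order.

    `components` maps a component name to the release date of the version in use.
    """
    return sorted(name for name, released in components.items() if too_old(released, today))


# Invented example data.
in_use = {"lib-old": date(2015, 1, 1), "lib-new": date(2019, 1, 1)}
print(policy_violations(in_use, today=date(2019, 10, 30)))  # ['lib-old']
```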
We also saw in this survey that exemplar DevOps teams were relying more on tooling to tell them about security information and security issues within their applications versus the laggards or those that didn't have DevOps practices, who were not relying on tools as much. When tools are present, one, they're alerting developers to more information. We saw in one of the previous slides they're more likely to follow that information that's provided by their tools and by the automation, and therefore staying more secure as a result.
The last part of the research that we showed is about managed software supply chains, where you're looking at the quality and the attributes of the components that are being consumed in the enterprise. Those organizations are staying 55% more secure, that is, they have a lower security defect density, than those with unmanaged supply chains, where 20% of the components in the applications developed had known vulnerabilities when the applications were built. We went from about 8% of the downloads being known vulnerable to 20% of the components that were used in the end applications being vulnerable.
As we wrap up, we wanted to offer some quick takeaways: okay, you got this data, what do you do? There's even more data in the report. The first takeaway that I'll offer is you have to start with observability. You have to know what you're using. If you don't know what open source you're using and what you're consuming, you cannot do anything about picking the right quality from the right suppliers. You have to have an active view on what you're consuming within the enterprise and where it is within your applications.
Stephen Magill: Pay attention to the criteria that you're using to select these components. Don't just use popularity, as we were saying. A better proxy is maybe release frequency, things like that.
I'd ask everyone to be good open source stewards. If you're making contributions to open source, think about updating dependencies. I was surprised in the research by how many popular components just have very out-of-date dependencies and transitively are importing security risk. Pay attention to fixing that up. As you contribute to open source, if you put a project out there, aim for four releases with 80% of your dependencies up to date. That will put you in that exemplar category and ensure that you're one of these better-performing open source projects and not introducing vulnerability into the supply chain.
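The 80% target can be checked with a one-line ratio. This is a rough sketch: the dependency names and versions are invented, and treating a dependency as up to date only when its version string equals the latest one is a simplifying assumption.

```python
def up_to_date_share(used, latest):
    """Fraction of dependencies pinned at the newest available version."""
    current = sum(1 for dep, version in used.items() if latest.get(dep) == version)
    return current / len(used)


# Invented example: four of five dependencies are current.
used = {"a": "1.2", "b": "3.0", "c": "2.1", "d": "0.9", "e": "4.4"}
latest = {"a": "1.2", "b": "3.0", "c": "2.1", "d": "0.9", "e": "5.0"}
share = up_to_date_share(used, latest)
print(share >= 0.8)  # True
```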
Derek Weeks: The quick and easy answer, if you want a copy of the slides or the State of the Software Supply Chain report or the 2019 DevSecOps Community Survey, my out-of-office message is on. If you email weeks@sonatype.com today, it's only on today, it has links to the report so you don't have to register to download it or anything like that. I tested it, so it does work. You can find those and download those and read the research yourself. We didn't cover the complete body of the research in this presentation today.
Thank you very much for attending. We really appreciate it. I know the other session's coming in, but we'll be available for questions. The other thing I'll say, just because this is recording, my out-of-office message is not on all the time. If you're watching the video at some later date, please say, hey, I was watching your presentation at DOES in Las Vegas. Could you send me those slides so I know what you're referring to? There's always someone that just sends me an email blank, with no reference, and I'm like, what is this about? So just a note for those watching the video. Thank you.
Stephen Magill: Thank you.