Exemplars, Laggards, and Hoarders: A Data-Driven Look at Open Source Software Supply Chains

London 2020

In a year-long collaboration with Gene Kim and Dr. Stephen Magill, we objectively examined and empirically documented software release patterns and cybersecurity hygiene practices across 54,000 commercial development teams and open source projects.

In this session, we will present evidence on the outcomes of that research, highlighting organizational and technology practices that enable exemplar open source teams to deliver 50% more commits, release new code 2.4x faster, and remediate security vulnerabilities 2.9x faster, all while delivering a level of value that makes them standouts in terms of popularity and adoption.

Full transcript

Dr. Stephen Magill

Hi, I'm Stephen Magill. I've been doing academic research in software analysis, security, and programming languages for more than 15 years, first as part of my PhD work at Carnegie Mellon, and then at other universities and industry research labs. Over the last few years, I've been getting more and more interested in the practice of software: in particular, open source development practices, how enterprises approach software, how to best contribute to these communities by improving tools and practices, and how this all comes together to generate modern software.

Gene Kim

And my name is Gene Kim. I've been studying high-performing technology organizations for 20 years, and one of the funnest things I've ever gotten to work on in my career was something called the State of DevOps Report with Dr. Nicole Forsgren and Jez Humble, which resulted in the "Accelerate" book. And it was so fun to be able to take many of the things I learned there and apply them to a separate project, which I will describe in just a moment. So this is a reprise of a presentation that Dr. Stephen Magill and I did at GitHub Universe last year, and I thought this was so relevant to the technology leadership community that I asked Stephen if he'd be willing to present this again with me here at DevOps Enterprise.

So in terms of problem statement, back at GitHub Universe, Nat Friedman, the CEO of GitHub, said that 99% of all new software projects use open source in some fashion. And then Erica Brescia, their COO, said, "We are inviting thousands of developers into your code, when you use open source dependencies, into your living room." And Steve and I were in the audience and we were laughing going, "Is that a good thing or a bad thing?" Right? Are they going to trash the place or are they going to help with the kitchen project? And so that's actually what we wanted to understand better, what practices result in good security outcomes in the software supply chain.

So on the next slide, this is just a brief thumbnail of the State of DevOps research. Over six years, we benchmarked over 36,000 respondents, really trying to understand what high performance looks like, and what architectural practices, technical practices, and cultural norms result in great performance. And the punchline is that we found this group of IT performance metrics that were dominant: deployment frequency, deployment lead time, change success rate, and mean time to restore.

And so the goal of this study is really to understand what structures and practices correlate with exemplary outcomes, such as fast time to update and fast time to remediate security vulnerabilities. And the question was, will we find a lot of the practices that we found in the State of DevOps report also applicable in open source projects? The opportunity came up a couple of years ago, when some friends at Sonatype reached out to me about being able to analyze the data in Maven Central, which struck me as an incredible opportunity.

So for those of you who don't know, Maven Central is to Java what npm is to JavaScript, PyPI is to Python, and Gems is to Ruby. And so it is the second largest package repository in the world. And this was especially interesting to me because my favorite programming language is Clojure, and it leverages all the amazing Java components inside of Maven Central. So it just struck me as something I got tremendous joy out of doing, and it was amazing to work with the team at Sonatype, and it also delighted me that Stephen has a particular fondness for Maven as well.

Dr. Stephen Magill

Yes. Haskell was maybe my first true programming language love, but Scala, as a statically typed, mostly functional language, is definitely close to my heart, and being able to leverage everything in the Java ecosystem is certainly amazing. So, it's a great set of components, and a great data set to dig into from an analysis perspective.

Gene Kim

Absolutely. And so this is what resulted in collaborating with Sonatype on the state of the software supply chain. And so we worked with not just Dr. Stephen Magill, but Bruce Mayhew, an engineering director there, Ghazi Mahmoud, and Brian Fox, who was actually one of the inventors of Maven and Maven Central, which was a treat beyond words. So really the hypotheses that we want to focus in on-

Dr. Stephen Magill

... were these. So we had a number of things that we wanted to test. We wanted to see if projects that release frequently have better outcomes. This is something that we see in how enterprises operate. Do we see it in open source? Are projects that update dependencies more frequently generally more secure? Do projects with fewer dependencies stay more up to date? It seems like it should be easier to stay on top of things when you have fewer things to keep up with. And then, are more popular projects better about staying up to date? These are all sort of intuitive outcomes that you would expect given our day-to-day experience developing software, but do they actually hold? When you look at the data, does it back these up?

Gene Kim

And so we had the opportunity to investigate this by digging into this Maven Central data set, which, when I pulled these numbers, was up to over 310,000 Java components. There are of course multiple versions of each component, so when you look at the actual releases, you're at over four million different releases, and a whole lot of those are associated with GitHub repos, which just gives us another data source to draw on. We can go in and look at that metadata from GitHub to learn more about those projects. And amazingly, almost 9% of those have known vulnerabilities. There are vulnerabilities known against those components, and so we dug into that vulnerability data as well.

Dr. Stephen Magill

And in particular, what we focused on here was the dependency structure of the components: how do they treat their dependencies, and how often are they upgraded when they're someone else's dependency? This dependency structure is really complex and rich. When you think about all the libraries that you're pulling in, and all the libraries that they're pulling in transitively, and so on, it's this huge collection of things, and your application code becomes this small, tiny little bit of the system. And so we thought it was really valuable to look at the security posture of the rest of that system. What do you see in all those dependencies and transitive dependencies, and what should that tell you in terms of how you should treat the dependencies that you bring in and things you should consider there? So yeah, if you're way down here, it's a small part of this ecosystem.

So, to look at this, we started with all components published in Maven Central. We focused first based on time, so we looked at the last five years, because practices change over time, development trends change over time, and we wanted what we discovered to be relevant for software development as it's done today. And then we looked at components that are actually part of that software supply chain. So if you look at dependency structure-

Gene Kim

You might want to advance the slides, Stephen.

Dr. Stephen Magill

Yes. There we go. So if you look at the dependency structures, they are connected into that tree of dependencies. They're used by someone or they're using some open source component.

And then we looked at components that follow Maven Central guidelines for versioning. We have to be able to tell when a component is actually upgrading, and what are major releases and minor releases, so having correct versioning was important for doing the analysis. And then, like I said, we want to look at the dependency structure and transitive dependencies, and so we required at least all the dependencies of a project to also satisfy all these criteria. So you do all that filtering to get a really good data set that you can analyze in a variety of different ways.

And that leaves us with 13.6% of those components that we started with. These are really the components that are part of the supply chain and are following best practices, so then we can dig into what differentiates them from a security perspective. And we looked at a number of attributes here. Across this data set, we collected information on popularity, which we measured as average daily Central repository downloads: how often are these pulled by users of that library? How frequently do they release? Average number of commits per month, size of team, presence of CI.

Those three are all things we actually get from that connection to GitHub metadata that I mentioned earlier. We also looked at foundation support. Is there an open source foundation that is providing resources to these projects? And then we looked at security data as well as this update lag, so how they treat their dependencies.
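The filtering steps described in this section (recent releases, connected to the dependency graph, parseable versioning, and dependencies that pass the same checks) can be sketched roughly as follows. This is a minimal illustration in Python, not the study's actual pipeline: the `Component` record, the `SEMVER` pattern, and `select_components` are hypothetical names, and the version check is a simplified stand-in for Maven Central's versioning guidelines.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Set

# Hypothetical, simplified version check: MAJOR.MINOR or MAJOR.MINOR.PATCH.
SEMVER = re.compile(r"^\d+\.\d+(\.\d+)?$")


@dataclass
class Component:
    name: str
    last_release: date
    versions: List[str]
    dependencies: Set[str] = field(default_factory=set)


def select_components(components: Iterable[Component], cutoff: date) -> Set[str]:
    """Return the names of components that survive all four filters."""
    components = list(components)
    by_name = {c.name: c for c in components}
    used_by_someone = {dep for c in components for dep in c.dependencies}

    def passes_own_checks(c: Component) -> bool:
        recent = c.last_release >= cutoff
        # "Connected" here means it uses something or is used by something.
        connected = bool(c.dependencies) or c.name in used_by_someone
        well_versioned = all(SEMVER.match(v) for v in c.versions)
        return recent and connected and well_versioned

    kept = {c.name for c in components if passes_own_checks(c)}

    # Every dependency must also be kept. Dropping one component can
    # disqualify its dependents, so repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for name in sorted(kept):
            if not by_name[name].dependencies <= kept:
                kept.discard(name)
                changed = True
    return kept
```

The last requirement is the one that shrinks the data set the most in a sketch like this, because a single old or badly versioned library removes everything built on top of it.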

Gene Kim

Yeah. So one of the areas of data was what we could actually grab from the repositories and merge with our data set. And so I'll go through the major ones. The popularity metrics included, as Stephen mentioned, the number of daily downloads from Maven Central. We also used this amazing data set called Libraries.io, so for each repo we could actually look up the number of GitHub stars, forks, and pull requests. So that's a sort of proxy for popularity as well. And then we got access to the Sonatype enterprise data, in terms of how frequently certain repositories were updating through their Nexus IQ product.

We also were able to go into the GitHub repos and look at commit activity. And so, written in my favorite programming language, Clojure, it was so fun to be able to go and get the number of commits per month. And then we also measured the number of developers on those teams: for every month, we measure how many developers are committing, and then we can compute the average number of committing developers.

And then we also looked for the presence of CI on those GitHub repos, so any trace of a Travis configuration file, Jenkins, CircleCI, and so forth. And that'll be the last time I mention the CI presence. It just stands to reason that repos that had continuous integration would perform better than those that didn't. But that was not the case. There was actually no difference in any metric between those that had CI and those that didn't, which was very surprising to me.

So that's the first of many big surprises in this project, and so that's the last time I'll mention that.
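As a rough illustration of the repository metrics Gene describes (commits per month, average number of committing developers per month, and a CI marker check), here is a small Python sketch. It is not the Clojure script used in the study. The function names, the `(author, date)` input shape, and the specific marker file names are assumptions chosen for the example, and months with no commits are simply ignored.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Tuple

# Assumed marker files at conventional locations; the study's exact list is not given.
CI_MARKERS = (".travis.yml", "Jenkinsfile", ".circleci/config.yml")


def monthly_activity(commits: Iterable[Tuple[str, date]]) -> Tuple[float, float]:
    """Return (average commits per month, average committing developers per month)."""
    commits_per_month = defaultdict(int)
    devs_per_month = defaultdict(set)
    for author, day in commits:
        month = (day.year, day.month)
        commits_per_month[month] += 1
        devs_per_month[month].add(author)

    months = len(commits_per_month)
    if months == 0:
        raise ValueError("no commits to summarize")

    avg_commits = sum(commits_per_month.values()) / months
    avg_devs = sum(len(devs) for devs in devs_per_month.values()) / months
    return avg_commits, avg_devs


def has_ci(repo_paths: Iterable[str]) -> bool:
    """True if any path (relative to the repo root) is a known CI marker file."""
    return any(path in CI_MARKERS for path in repo_paths)
```

Counting distinct committers per month, rather than over the project's lifetime, keeps a long-retired contributor from inflating the team size.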

Dr. Stephen Magill

But what did perform well were Gene's Clojure skills. So the efficiency advantages of Clojure are real. That script came together very quickly. So we also gathered project-level data, looking at, as I mentioned, the support that a project might have. Is it foundation supported? Does it have those extra resources that can really help make a difference in terms of how much effort they can put into development? We looked at the number of dependencies, which we took as the maximum count of dependencies for any component. We looked at stale dependencies, and I'll say more about what a stale dependency actually is a bit later. And then this release period, so the frequency metric as a key metric here: how quickly and frequently does a project release?

Gene Kim

All right, last thing in terms of background. So I had mentioned the key performance measures from the State of DevOps report. And so one of the things that we did was explore what the parallels would be between this work and the State of DevOps research. For deployment frequency, we said the analogous measures are commits per month, releases per month, and commits per dev per month. For deployment lead time, we wanted to look at PR lead times (in other words, how long did it take for a PR to actually get closed?) and issue resolution time. Deployment success rate we mapped to breaking changes: when you update a dependency, does that actually break functionality? Mean time to restore maps to mean time to remediate security vulnerabilities, mean time to update when new components become available, and the stale dependency profiles. And then organizational performance might map to popularity, stars, and so forth. So we didn't get a chance to go through all of these, but it did give us a mental model of how to use and map the State of DevOps research and thinking to this project.

Oh, so hypothesis number one was exactly that. One of the decisive findings in the State of DevOps research is that those organizations that deploy more frequently, because they have better deployment lead times, definitely have better performance. So the first question is, will we find similar findings in this data set as well? And the answer is, Stephen?

Dr. Stephen Magill

The answer is yes. Very much so. So projects that release frequently have better outcomes. And what we mean by better outcomes is they're more popular. In terms of organizational performance metrics, you're contributing to open source so other people can use it, so you can make the world better. So what better measure of impact than popularity, right? They're five times more popular on average. They also have larger development teams. And we didn't get into the causal connection between these things. These are correlations that we're talking about here. So we don't know if they're releasing more frequently, so developers see that their code changes get in and get out there, and that attracts more developers, or if the larger team itself is what lets them release more frequently. But those are definitely connected. And then they also, in general, have higher foundation support rates.

So, the next thing we looked at was the security side of the equation.

So those are outcomes in terms of usage and popularity, but what about security? Here, we collected a number of security-relevant metrics. And so just to give you an image of what this looks like, here is a sample dependency structure for some made-up components. So we have A, B, and C. C depends on B and A, and there are several version releases of each of these components. And so if you think about a vulnerability coming out in component B, in that red hexagon there in the middle, that vulnerability happens at some point in time. At some later point in time, B releases a new version that corrects that vulnerability. So we're seeing B version 2.3 fixes that security vulnerability, and then at some point later, C adopts that-

Gene Kim

I think you mean build ahead, Stephen.

Dr. Stephen Magill

Yeah. Thank you. At some point later, C adopts that version of B. Right? And so, there's this period of time where B is vulnerable, and because C uses B, there's some period of time where C is vulnerable.

But really, C isn't able to remediate this vulnerability until B does their job and fixes the security vulnerability. And so what we count as the remediation time is really that period: when does the fixed version of B become available, and how long does it take C to then apply that fixed version? So we call that the remediation time or the update time. We use the term update time to refer to any update. So if C is adopting this new version of A, that counts as an update, even though A, in this example, doesn't have a security vulnerability.

And so the terms to keep in mind are remediation time and update time, where remediation time refers just to those updates that are security relevant. We also have this concept of a stale dependency. It's often the case that some version comes out, but it's never adopted by C. So C continues to use the old version of A, and we consider this dependency to be stale at that point. So the key metrics that we track are time to remediate, time to update, and stale dependencies. We look at the median of these, actually, because these data sets are log-normally distributed, and so median makes the most sense there. And this is the data that we found.
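To make the three terms concrete, here is a minimal Python sketch of how update time, remediation time, and stale dependencies could be computed for one consumer such as C. The `Upgrade` record and `summarize` function are hypothetical, and the sketch assumes each row means "a newer version of this dependency became available, and this is when (if ever) the consumer moved to it". It is an illustration of the definitions above, not the study's implementation.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Upgrade:
    dependency: str
    version: str
    released: date            # when the dependency published this version
    adopted: Optional[date]   # when the consumer started using it, None if never
    security_fix: bool        # True if this version fixes a known vulnerability


def summarize(upgrades: List[Upgrade]) -> Dict[str, object]:
    # Update time covers every adopted upgrade, security relevant or not.
    update_days = [(u.adopted - u.released).days for u in upgrades if u.adopted]
    # Remediation time covers only the upgrades that fix a vulnerability.
    remediation_days = [
        (u.adopted - u.released).days
        for u in upgrades
        if u.adopted and u.security_fix
    ]
    # A dependency is stale if a newer version was published but never adopted.
    stale = sorted({u.dependency for u in upgrades if u.adopted is None})
    return {
        "median_update_days": median(update_days) if update_days else None,
        "median_remediation_days": median(remediation_days) if remediation_days else None,
        "stale": stale,
    }
```

The clock in this sketch starts when the fixed version is released, not when the vulnerability is introduced, which matches the point that C cannot remediate until B ships a fix.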

Gene Kim

Oh, yeah. It's not good. So this is a visualization of what the update behaviors are. If you look at the X-axis, that's time, and the clock begins whenever a vulnerability is detected. The Y-axis is, at any given point in time, what percentage of the population has updated. And so the way to read this appallingly bad profile is this. The median time to remediate vulnerabilities is 180 days. So that's like a half year. The mean is even worse. It's almost a full year. And how long does it take for 95% of the population to update? It's 3.5 years. And so we actually had to cut the graph off at three years because otherwise you wouldn't even be able to see the curve. And so I'll just take a moment to say that's much worse than I thought.

And then, here's another way to look at it: let's take a look at the bottom left-hand corner. Who's updating quickly? And what factors cause that? One problem is that the vulnerability data is actually kind of a sparse data set, and so we also looked at update behaviors in general. And it fits a very similar curve. It takes a little bit less time. The median is 130 days. But you can see that there's a very similar profile between MTTU, mean time to update, and mean time to remediate security vulnerabilities. And my friend Stephen says, "Oh, yeah, they both have a log-normal distribution." Which gets to the next point.
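The curve Gene describes is a cumulative adoption curve: for each point in time, the share of projects that have updated by then. A small Python sketch of how its points and summary numbers could be read off a list of delays is shown below. The function names are hypothetical, and the input is assumed to be a plain list of delays in days, one per project.

```python
import math
from statistics import mean, median
from typing import List


def fraction_updated(delays_in_days: List[int], t: int) -> float:
    """Share of projects that had updated within t days (one point on the curve)."""
    return sum(1 for d in delays_in_days if d <= t) / len(delays_in_days)


def time_to_reach(delays_in_days: List[int], fraction: float) -> int:
    """Smallest observed delay by which at least `fraction` of projects had updated."""
    ordered = sorted(delays_in_days)
    k = max(math.ceil(fraction * len(ordered)), 1)
    return ordered[k - 1]


def skew_summary(delays_in_days: List[int]) -> dict:
    # With a long right tail (as in a log-normal distribution), a few very
    # slow projects pull the mean well above the median.
    return {"median": median(delays_in_days), "mean": mean(delays_in_days)}
```

This is why the talk reports medians: in a long-tailed distribution the mean describes the stragglers more than the typical project.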

Dr. Stephen Magill

Yeah. So, right. At the population level, the distributions clearly line up here, in terms of how long it takes for a certain percentage of the components to update or remediate. If we also look at, for an individual component, their median time to update versus remediate, that matches up very closely. So on the X-axis, we have time to update for non-security-relevant updates. On the Y-axis, it's time to update for security-relevant updates. And we can see that there is a correlation here. It's not perfect, but there's a 0.6 Pearson correlation coefficient. And so, by and large, projects that update more frequently tend to be more secure and vice versa. In fact, if you dig in a little bit deeper, 55% of the projects have an MTTR and an MTTU that are within 20% of each other, and only 15% of the projects that have worse than average MTTU manage to maintain better than average MTTR.

So this gets to this question we had: what about that mythical project that only updates when there's a security vulnerability, right? They keep things tight, but they're really paying attention to security. There just are not a lot of those projects. And so, yeah, most of the projects fit within this cone that's close to performing the same on a security and non-security-relevant update basis.
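For readers who want to reproduce this kind of comparison on their own data, here is a hedged Python sketch of the two calculations mentioned: a Pearson correlation between per-project MTTU and MTTR, and the share of projects whose two values are close. The function names are hypothetical, and "within 20% of each other" is interpreted here as the absolute difference being at most 20% of the larger value, which is an assumption since the talk does not define it.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def share_within(mttu: Sequence[float], mttr: Sequence[float], tolerance: float = 0.2) -> float:
    """Share of projects whose MTTU and MTTR differ by at most `tolerance` of the larger."""
    close = sum(
        1 for u, r in zip(mttu, mttr) if abs(u - r) <= tolerance * max(u, r)
    )
    return close / len(mttu)
```

Because the underlying delays are long-tailed, a correlation on log-transformed values may be worth trying as well; the talk does not say which form was used.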

Gene Kim

All right, so that was hypothesis two actually, that updating more frequently corresponds with a higher security level, and that was validated. Yeah, so all of that data points to this fact that security and update behavior tend to go hand in hand. That actually mirrors something that Jeremy Long, the inventor of the OWASP Dependency-Check project, said: the best way to stay secure is just to update your dependencies. He had a great example, the OpenFaces project. They had a vulnerability that was actually fixed in 2006, but the first CVE was disclosed in 2018, and that's when all the Bitcoin miners took over these vulnerable components. So it was actually the publishing of the CVE that led to the exploits. And so had they just stayed current, they would not have had any problems. So again, it just points to saying the best way to stay secure is to integrate updating dependencies into developers' daily work.

Dr. Stephen Magill

That's right. Yeah, that version you're upgrading from might not have a known vulnerability right now, but that doesn't mean one won't be discovered later. And so if you just update, then you're protected. So hypothesis three was that projects with fewer dependencies will stay more up to date. It's sort of easier to manage a less complicated mess of dependencies. Of course. But surprisingly, this was not the case. And it wasn't just that there was no connection between the two, it was actually the opposite connection of what we expected.

So components with more dependencies actually have a better MTTU on average, and it's statistically significant. So that's weird, right? So we dug into that, and discovered that actually there's another factor at play here. So projects with more dependencies also tend to have larger development teams. And so, this maybe explains part of why having those extra dependencies, it can be managed, right? They can still manage to stay up to date, because they do have more developers contributing here. And so, we graphed it, right? Here's the connection on the x-axis, number of dependencies, y-axis, size of team.

You can see the team growing, and moving to the point where you have to buy an extra pizza. All right. So hypothesis four was that more popular projects- Oh ... will be better about staying up to date. Yeah.

Gene Kim

Oh, so actually to me, just a little commentary, right? So this is-- I got to echo Stephen here, right? This is so surprising, right? That more dependencies mean they're actually better up to date. And so, which way-- What's the causality? Is it that having more developers causes more dependencies, or do the number of dependencies create so much work that you have to get more developers? It's just super interesting, and yeah, so let's just say that's an open question. Very surprising.

Dr. Stephen Magill

That's right. Is everyone bringing their favorite unit testing library, and you end up with five different ways of testing, or what's going on there? Definitely something to dig into in the future. All right, you want to talk about popularity, Gene?

Gene Kim

Oh, yeah. Well, sure. So, hypothesis four is that more popular projects will be better about staying up to date. And maybe just to set the context: whenever you have a problem to solve, say you have logs you want to generate, or you want to aggregate data, or use a statistical package, in general you kind of pick the most popular project as measured by the number of stars or forks or something, right? I mean, that's the way I do it. So it sure stands to reason that popularity would be a good proxy for how much you can trust the software component. Makes sense to me.

Dr. Stephen Magill

I agree. It seems intuitive, so let's look at the data. Yeah. Uh-oh. Yeah. And what we find is that actually not all popular projects are exemplary, releasing fast and staying secure and up to date. In red here are those projects that do not have good update hygiene. They're not keeping their dependencies up to date. On the y-axis is popularity on a log scale, so things get much more popular very quickly. And you can see that what's in this box here is a really large group of very popular projects that are lagging behind when it comes to update behavior.

Gene Kim

Yeah. So refuted, I guess, right, Stephen?

Dr. Stephen Magill

Yeah. That's it. Yeah, there are plenty of popular components with poor MTTU. And even more so, popularity does not correlate with MTTU. So, the most popular projects are not statistically different from any others with respect to MTTU.

Gene Kim

Yeah, so I mean, this is kind of a problem for me, because this is a heuristic I've been using. In fact, I just used it a week ago, right? When I go shopping for components to use to solve my problem, I don't have a lot of great alternatives besides looking at the number of stars and forks. And so let's chalk that up for future work.

Dr. Stephen Magill

Yeah. All right. So then, we did some clustering based on this. What can we see if we look at this data set and try to break it apart into which projects are close to each other and which are farther apart in terms of their update behavior? How does this break down? And we basically see five different clusters here. There are two exemplary clusters that are performing very well in terms of update behavior, and we divided those further into small teams versus large teams. So you have large teams that are really keeping up to date, performing super well, with an average of nine developers per team for the large ones. And then you have a large group of these really small single-dev, two-dev projects that are also doing a great job staying up to date. So you don't have to have a large development team to stay up to date.

Those are very different behaviorally from what we're calling the laggards. These are the projects that do not have great update hygiene. They're in the bottom 20% in terms of update behavior. And then there are two really interesting classes that are differentiated in terms of quirks in their behavior. One we're calling features first. They actually release frequently. They have, if you will, a lot of release bandwidth, right? But they're not using it to stay up to date. They're just maybe adding new features or improving performance or something. They're not keeping on top of their dependencies and updating those. And then we have what we're calling the cautious crew, which is they actually do sort of maintain a cadence of update, but they're like a version or two behind. So they sort of wait. They don't want to be that first mover. They wait for other people to take the step.

Gene Kim

Yeah, my favorite one is the features first one. They seem to have the technical practices to do high rates of releases, but they don't care about updating the dependencies, so I think that's a very promising cluster, because it's probably easiest for them to become exemplars. The other interesting cluster to me was, in my head, what I call the open source industrial complex. These are the large exemplars: large teams with 10 or more active developers constantly committing, and it's probably part of their day job. So what's interesting to me is that the exemplars are not just the domain of these OSS industrial complexes. Exemplars can be found both in large groups and small groups, as Stephen said.

Dr. Stephen Magill

Oh, yeah. So Gene, you want to talk about the survey data that we collected as well?

Gene Kim

Yeah. So this was really fun. We saw this, and we wanted to put out a survey, on very short notice, just to see if we could test some hypotheses about behavior: how did they think about it, and what are the practices that led to these great behaviors? So we were able to push out a survey that went out to 658 respondents, and what was amazing is that we were able to cluster them into high, medium, and low amounts of pain associated with their updates.

And so when you compare high pain to low pain, the high-pain group is three times more likely to say updating dependencies is painful, and two and a half times more likely to say updating vulnerable components is painful. The low-pain group is 10 times more likely to schedule updating dependencies as a part of their daily work. That's amazing. They strive to use the latest version. They're 11 times more likely to have some sort of process to add dependencies, and 10 times more likely to have a process to proactively remove problematic dependencies. And they're 12 times more likely to have automated tools to monitor policy compliance around dependencies. So when you see a difference like that, it just really says there's something there: the behaviors are different, and that leads to these different outcomes.

Dr. Stephen Magill

Yeah, and what we see here, I think, really is that the practices that we expect to go together and to support staying up to date and staying secure, we are, at least in this self-reported survey data, seeing those track together. And we're actually doing a lot of research right now, digging through the data for this year's version of the report. And the one thing that we're focusing on is this side of the equation, this usage side, and what people are discovering there.

So, really excited about all the work that we were able to do together with the Sonatype team that fed into this report. If you're interested in seeing the report, you can go find it on the Sonatype website or just email me at that address. I'm happy to send you a direct link to download it. And as I said, we're working on the next iteration right now and studying that consumer side. So we looked a lot at the open source producers in what we're reporting here. Now let's dig even more into the consumers of that open source software, and in particular, look at their practices and the impact that has on various outcomes.

So in the future, we want to also start looking at breaking changes, and really look at the connections between the consumer side and the producer side. What practices of open source library authors can help the consumers more effectively use those libraries and more effectively stay up to date? Breaking changes might be one of those. There might be other practices that also really help that whole supply chain work smoothly.

Gene Kim

Perfect. And if you go to the slide on the help we're looking for, we'll post this in the Slack channel. We're specifically looking for help with future research: how to detect breaking changes (does immutability of APIs help?), how to get data on pull request lead time and issue resolution time, and how to find some sort of authoritative source for foundation support. So we'll post that in the Slack channel and, Stephen, take it away.

Dr. Stephen Magill

Yeah, that's right. So quick takeaways. What practices can you adopt based on this? Integrate updating dependencies into your daily work. It's not about picking and choosing, just always be up to date. Contribute updates back to components you use. Don't make decisions based just on popularity. And tell us what hypotheses you're interested in, because like I said, we're continuing to do data analysis here.

Gene Kim

And as Stephen mentioned, we're deep in the midst of analyzing data for year two of this collaboration, and it's super exciting. So stay tuned for that.