For Better Security, Stop Wasting Developer Time
The current economic climate has the entire industry asking: how can we save time and money? For any organization focused on software, that question translates to: how can we reduce technical debt and developer waste? In this talk, I will present results from the latest Sonatype State of the Software Supply Chain report, which, simply put, answers this question. Hint: it’s not just the introduction of AI. Exploring open source consumption behavior shows that development practices are still widely inconsistent, ultimately creating more risk, unproductive developers, and lost time and money. I’ll provide an update on open source usage and best security practices based on a year’s worth of data from Maven Central and hundreds of survey responses, and on what we can all learn from stopping unnecessary waste in our development practices.
Full transcript
Stephen Magill
Great, thank you.
I'm Stephen Magill, and I'm going to be talking about the latest State of the Software Supply Chain report, which was released just yesterday. I've been involved with this report since 2019, when, as Gene mentioned, I collaborated with him and Sonatype on an analysis of Maven Central data. We discovered some interesting things about open source security practices, about exemplary open source projects, and about what makes them exemplary.
The next year, we ran a developer survey to discover what separates high-performing teams from less productive ones. Then Gene started working with a different Steve on an amazing new book that hopefully you all got a copy of last night. I know I can't wait to read it.
Since then, we've kept alive the spirit of the research program Gene started when he began collaborating with us, trying each year to find new data, ask new questions, and see what we can discover about development practices and what advice we can distill from that for the community.
I'm really excited to talk today about the themes that emerged from this year's research. One theme is the importance of respecting developer time: not wasting it, and making the most of it. The other theme is innovation.
2023 was a year of remarkable innovation across the technology industry, and particularly in open source. The two themes are related, because you need to be efficient and move quickly if you're going to make the most of the innovation happening in the industry, right? We've all seen process get in the way of productivity.
I want to start with the innovation piece. This graph shows the growth rate of new open source projects over the last three years. You can see a dip in the growth rate during the pandemic years, 2021 and 2022. The number of projects was still growing, with new projects still coming out, but it was growing less quickly than it had in the past. That surprised me, because we all had more time, right? We weren't commuting or going into the office. Surely some people were using that time to contribute to open source projects.
But we didn't see that. One possible explanation is that this reflects the growing importance of commercial activity in open source. Many open source projects are maintained by corporations, or at least receive substantial corporate contributions, and fewer new companies started during the pandemic, so maybe that had an impact. I don't have data on that; it's just a hypothesis. I'm really interested if anyone has alternative explanations we could explore.
Whatever the explanation, the growth rate is recovering. The pace of innovation has increased over the last year, and we're now seeing a higher growth rate in new open source projects than we have in the past.
Another data point on open source innovation comes from the set of maintained projects and how it changes over time. These are projects with active contribution: there are regular contributions to the code, and maintainers respond to issues on their issue tracker. Across open source, about 11% of projects qualify as maintained, according to the Open Source Security Foundation (OpenSSF) Scorecard project. Scorecard analyzes over a million open source projects each week, and one of its checks is whether a project qualifies as maintained.
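As a rough illustration of what a "maintained" check might look at, here is a minimal Python sketch. It is loosely modeled on the idea behind Scorecard's Maintained check (recent commits and issue activity), but the 90-day window, the 13-event threshold, and the `is_maintained` function are assumptions for illustration, not Scorecard's actual scoring logic.

```python
from datetime import date, timedelta

# Hypothetical thresholds for illustration only; Scorecard's real check has its own rules.
LOOKBACK_DAYS = 90
MIN_ACTIVITY_EVENTS = 13  # roughly one commit or maintainer issue response per week


def is_maintained(activity_dates: list[date], today: date, archived: bool = False) -> bool:
    """Return True if enough recent activity falls inside the lookback window."""
    if archived:
        # An archived repository is read-only, so it cannot count as maintained.
        return False
    cutoff = today - timedelta(days=LOOKBACK_DAYS)
    recent = [d for d in activity_dates if cutoff <= d <= today]
    return len(recent) >= MIN_ACTIVITY_EVENTS


today = date(2023, 10, 1)
weekly_commits = [today - timedelta(weeks=i) for i in range(13)]
print(is_maintained(weekly_commits, today))  # True
```

A project that only had activity more than 90 days ago, or fewer than 13 recent events, would drop out of the set under these assumed rules, which is exactly the kind of churn the next figures describe.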
Comparing 2022 with 2023, there's been an 18% reduction: 18% of the projects that were maintained in 2022 no longer qualify as maintained. Their activity level has dropped to the point where they've fallen out of that set.
At the same time, 12% of the projects that are currently maintained are new to that designation. They existed before, perhaps as small projects with one or two contributors, but in the last year interest has grown, people have swarmed to them, and they're getting more activity now.
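The two churn figures use different denominators: the 18% is measured against the 2022 maintained set, while the 12% is measured against the current set. A minimal sketch of that set arithmetic follows; the `churn` function is hypothetical, and the project names in the example are synthetic, sized only so the output reproduces the talk's percentages.

```python
def churn(maintained_before: set[str], maintained_now: set[str]) -> tuple[float, float]:
    """Return (share of the old set that dropped out, share of the current set that is new)."""
    if not maintained_before or not maintained_now:
        raise ValueError("both sets must be non-empty")
    dropped = len(maintained_before - maintained_now) / len(maintained_before)
    newly_maintained = len(maintained_now - maintained_before) / len(maintained_now)
    return dropped, newly_maintained


# Synthetic example sized to reproduce the talk's percentages (not real project data).
before = {f"project-{i}" for i in range(1100)}
now = {f"project-{i}" for i in range(198, 1100)} | {f"newcomer-{i}" for i in range(123)}
dropped, new = churn(before, now)
print(f"{dropped:.0%} dropped out, {new:.0%} of today's set is new")
```

Because the denominators differ, the two percentages cannot simply be subtracted to get net growth of the maintained set.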
This really shows the agility of open source: developers and maintainers can shift their attention to new projects as technology evolves and the industry moves forward. It's incredible to see, and it's happening faster than I expected. I didn't expect an 18% reduction in maintained projects; that's a large amount of churn, and I think it's really interesting.
One area where we see churn is the AI and large language model space. No discussion of innovation in 2023 would be complete without talking about AI.
This graph shows demand for large language model libraries over a little less than the last year. This is actual enterprise usage: the number of enterprise applications using these libraries as a dependency. At the top, the blue line is the Hugging Face Transformers library, which has seen more than a 3x increase in usage over that period. The orange line is LangChain. It didn't even exist a year ago, and it has already seen remarkable growth in usage.
So we can really see open source embracing this new technology, making it available, and then industry picking that up and using that innovation to accelerate business value.
We also see increased use of machine learning more broadly. This graph shows traditional machine learning libraries, such as scikit-learn and TensorFlow, and they've also seen an uptick in usage. So it's really a general AI renaissance, driven by the developments in generative AI.
However, attackers are also innovating. This graph shows malicious supply chain attacks over time. These include malicious contributions to open source projects, where an attacker gets involved in a project and contributes code they know is vulnerable, as well as typosquatting attacks, where malicious packages are published under names that mimic popular ones.
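Typosquatting relies on package names that are one keystroke away from a popular one. A common defensive heuristic, sketched below as an assumption rather than the report's detection method, is to flag a requested name whose edit distance to a well-known package is small. The `levenshtein` and `possible_typosquats` helpers and the tiny `POPULAR` list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical allow-list; a real check would use download statistics from the registry.
POPULAR = ["requests", "numpy", "flask"]


def possible_typosquats(candidate: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Popular names that `candidate` is suspiciously close to, without being identical."""
    return [p for p in popular if p != candidate and levenshtein(candidate, p) <= max_distance]


print(possible_typosquats("requets", POPULAR))  # ['requests']
```

Edit distance alone produces false alarms for legitimately similar names, so in practice it would be one signal among several.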
We've seen remarkable increases in these attacks. We've been tracking them for the last few years, and every year the growth rate is astonishing. In 2023, there were three times as many attacks as in 2022.
Put another way, in the last year alone we've seen more attacks on the open source software supply chain than in all the previous years we've tracked combined. So attackers are innovating. They're using our software repositories, especially npm and PyPI, essentially as laboratories, running what amounts to a world-class research program on exploitation and vulnerability. This is a real danger, and something we want to avoid: a bad kind of innovation.
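The "put another way" claim follows from the 3x figure plus the growth in earlier years. Writing A_y for the number of attacks observed in year y (notation introduced here; the talk gives no per-year counts), the 2023 total exceeds all earlier years combined exactly when the pre-2022 total is less than twice the 2022 total. A sufficient condition, shown in the second line via a geometric series bound, is that every earlier year grew by more than 1.5x:

```latex
\begin{aligned}
A_{2023} > \sum_{y<2023} A_y
  &\iff 3A_{2022} > A_{2022} + \sum_{y<2022} A_y
  \iff \sum_{y<2022} A_y < 2A_{2022}, \\
A_{y-1} \le \frac{A_y}{r}\ \ (y \le 2022),\ r > 1.5
  &\implies \sum_{y<2022} A_y \le A_{2022}\sum_{k \ge 1} r^{-k} = \frac{A_{2022}}{r-1} < 2A_{2022}.
\end{aligned}
```

This only illustrates why sustained multi-fold growth makes the claim plausible; it is not derived from the report's underlying counts.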
And so that's the first of the challenges that I'd like to talk about. How do we embrace this innovation without exposing ourselves to that risk?
One answer is good old-fashioned dependency management: monitoring your dependencies to learn when vulnerabilities are discovered in them, then updating those dependencies, bumping the version to remediate the security risk. This is the bread and butter of any open source risk management process.
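The monitor-then-bump loop can be sketched in a few lines of Python. This is a simplified model, not any specific tool's API: the `Advisory` type, the assumption that every version below `fixed_in` is affected, the purely numeric version strings, and the package names in the example are all hypothetical.

```python
from dataclasses import dataclass


def parse_version(v: str) -> tuple[int, ...]:
    # Assumes plain numeric versions with the same number of segments, like "1.5".
    return tuple(int(part) for part in v.split("."))


@dataclass(frozen=True)
class Advisory:
    package: str
    fixed_in: str  # simplified model: every version below this one is affected


def versions_to_bump(dependencies: dict[str, str], advisories: list[Advisory]) -> dict[str, str]:
    """Map each affected package to the lowest version that clears all of its advisories."""
    bumps: dict[str, str] = {}
    for adv in advisories:
        current = dependencies.get(adv.package)
        if current is None or parse_version(current) >= parse_version(adv.fixed_in):
            continue  # not a dependency, or already on a fixed version
        target = bumps.get(adv.package)
        if target is None or parse_version(adv.fixed_in) > parse_version(target):
            bumps[adv.package] = adv.fixed_in
    return bumps


# Hypothetical packages and advisories.
deps = {"log-lib": "1.1", "json-lib": "2.4"}
advisories = [Advisory("log-lib", "1.5"), Advisory("log-lib", "1.8"), Advisory("json-lib", "2.0")]
print(versions_to_bump(deps, advisories))  # {'log-lib': '1.8'}
```

Comparing parsed tuples rather than raw strings matters: as strings, "1.10" would sort before "1.9".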
This year we took a hard look at how to save time there and make that process more efficient. A lot of time goes into it, so the potential savings are large. The average Java project has 150 dependencies, and those dependencies release, on average, 10 times per year. That's 1,500 updates to consider per year for an average enterprise Java project, which adds up to an astonishing amount of time.
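As a quick back-of-the-envelope check on those averages (real projects will vary), the update volume works out to roughly 29 candidate updates every week:

```latex
150\ \text{dependencies} \times 10\ \frac{\text{releases}}{\text{dependency} \cdot \text{year}}
  = 1{,}500\ \frac{\text{updates}}{\text{year}}
  \approx 29\ \frac{\text{updates}}{\text{week}}
```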
When you look at how that time is spent, one inefficiency is simply how upgrades are performed. Take an example project that starts at version 1.1 of some library, where the developer updates to 1.5. That's what the first line is showing.
Meanwhile, 1.9 is the latest version; that green dot is the current version of the library. We often see an update like that followed very shortly by another one, from 1.5 to maybe 1.8. The developer bumped the version again, but still isn't on the latest.
If your goal was to get to 1.8, you could have done that in one shot, and maybe you should have gone straight to the latest version. Breaking these updates into baby steps, instead of moving to the best version when you do the update work, is wasted effort. That's time you don't need to spend.
And this happens a lot. The average Maven Central component that's downloaded has 10 superior versions available. So people aren't necessarily getting the best version when they do update work, and there are savings to be had there.
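One way to spot the baby-steps pattern in a project's history, sketched below under simplifying assumptions, is to check whether the version a developer eventually landed on had already been published at the time of the first bump. If so, every bump after the first was avoidable. The `avoidable_upgrades` function, its inputs, and the dates in the example are hypothetical; a real analysis would also have to account for breaking changes between versions.

```python
from datetime import date


def avoidable_upgrades(history: list[tuple[date, str]], release_dates: dict[str, date]) -> int:
    """Count bumps that could have been skipped by jumping straight to the final version.

    history: (upgrade_date, version_adopted) pairs in chronological order.
    release_dates: the date each version was published.
    """
    if len(history) < 2:
        return 0
    first_bump_date = history[0][0]
    final_version = history[-1][1]
    if release_dates[final_version] <= first_bump_date:
        # The final target already existed at the first bump, so one update would have done.
        return len(history) - 1
    return 0


# Hypothetical timeline mirroring the talk's example: 1.1 -> 1.5 -> 1.8 while 1.9 is out.
releases = {"1.5": date(2023, 1, 10), "1.8": date(2023, 3, 1), "1.9": date(2023, 4, 1)}
bumps = [(date(2023, 5, 1), "1.5"), (date(2023, 6, 1), "1.8")]
print(avoidable_upgrades(bumps, releases))  # 1
```

This check is deliberately conservative: it only credits savings when the final version was available from the start, and it ignores that 1.9 would have been an even better one-shot target.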
Another question is when to update. You don't necessarily need to update every time a new version is released, right? I think most people try not to, but we still see a lot of it.
This graph shows upgrade urgency for an example project. Along the bottom are the library's versions, color-coded by how good each one is. What does good mean? It means the version isn't vulnerable, you're not falling too far behind what the crowd is using (you're not languishing), and it's not a prerelease or beta version.
Dark green is best, but green generally is good; it only starts to get iffy in the red and orange. So if you're sitting on a green version, you don't necessarily need to update. You can wait until something prompts it.
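The stay-on-green policy can be expressed as a small rating function. The three criteria (known vulnerabilities, prerelease status, lag behind what most users run) come from the talk; the `VersionInfo` fields, the lag metric, the `MAX_LAG` threshold, and the collapse of the graph's shades into three colors are assumptions for illustration, not the report's scoring model.

```python
from dataclasses import dataclass

MAX_LAG = 5  # hypothetical: how many releases behind the crowd before a version is "languishing"


@dataclass(frozen=True)
class VersionInfo:
    version: str
    known_vulnerable: bool
    prerelease: bool
    releases_behind_crowd: int  # hypothetical metric: distance from the version most users run


def rating(v: VersionInfo) -> str:
    if v.known_vulnerable:
        return "red"
    if v.prerelease or v.releases_behind_crowd > MAX_LAG:
        return "orange"
    return "green"


def should_update(v: VersionInfo) -> bool:
    """Stay put on green; only a non-green rating prompts update work."""
    return rating(v) != "green"


print(should_update(VersionInfo("1.5", False, False, 2)))  # False
```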
What usually prompts an update is the disclosure of a vulnerability. You learn about a vulnerability that applies to what you're using, and now you need to do some update work.
So we looked at the impact of noise in those vulnerability notifications. What if the tool you use to monitor vulnerabilities has a 25% false positive rate? The graph might then look like this: some versions that are actually green, and fine to stay on for a while, are flagged red, telling you to move off them because there's a vulnerability to remediate.
That's clearly wasted work. The version isn't actually vulnerable, but you think it is, so you do the update work anyway. What does that cost you?
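A rough cost model makes that question concrete. The sketch below treats the false positive rate as the share of alerts that turn out not to be real vulnerabilities (one reading of the talk's 25% figure) and assumes a fixed effort per update; the `wasted_remediation_hours` function, the alert count, and the hours in the example are invented inputs, not report data.

```python
def wasted_remediation_hours(alerts_per_year: int, false_alert_share: float,
                             hours_per_update: float) -> float:
    """Expected hours spent updating away from versions that were never actually vulnerable."""
    if not 0.0 <= false_alert_share <= 1.0:
        raise ValueError("false_alert_share must be between 0 and 1")
    false_alerts = alerts_per_year * false_alert_share
    return false_alerts * hours_per_update


# Invented inputs: 40 alerts a year, 4 hours per update, noisy vs. quieter scanner.
noisy = wasted_remediation_hours(40, 0.25, 4)  # 40.0 hours
quieter = wasted_remediation_hours(40, 0.05, 4)
print(noisy - quieter)
```

The waste scales linearly with both alert volume and the false positive share, which is why a lower-noise scanner pays off most on projects with many dependencies.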
So look at the two combined: using a vulnerability scanner with a low false positive rate, and doing the optimal update when you actually go to remediate those vulnerabilities. Do both, and you can save 1.5 months of time per dev team per year. That's six dev weeks per team per year.
Just imagine what you could do with that time. We really can make this process more efficient, and that's the first stop-wasting-developer-time finding.
The second place where developer time and efficiency come up is in responding to recent regulations. I gave a whole talk at DOES Amsterdam on the regulations coming down the pike. If you're interested, check it out; it's in the video library you all have access to with your registration.
But I want to focus on a couple of regulations that aren't just coming; they're already here. These are regulations around software bills of materials (SBOMs). Vendors that supply the federal government, and medical device manufacturers, now have to start providing SBOMs.
What does that mean for us? Well, potentially it means more work. But let's see.
First, I want to point out that this isn't limited to the federal government. The requirements started there, but they're percolating down to the industry as a whole. In a survey this year, 42% of respondents said their customers were asking them to provide SBOMs for their software, and 38% were asking their vendors for SBOMs.
So it's happening on both sides of the contractual relationship. We need to start producing these things: new requirements mean new artifacts to produce.
How many of you have ever thought, "Oh boy, now that I have all these extra things to do, I can move so much faster. This is really going to accelerate my delivery"? No.
But there's another way to approach this: never let a good crisis go to waste. We have a potential crisis on our hands: there's extra work to do, so how are we going to manage it?
What if, instead of treating it as an unavoidable cost, we say: okay, we can deal with this. If we invest more in automation, it doesn't have to slow down our development.
With forward-looking leaders who see the value in that, we can come out of this regulatory environment with more efficient development processes. But we have to hold the line.
It's so easy to fall into the trap of saying, "We only have to produce an SBOM every month or two, or when we release, so we'll just put it on the sprint and have a developer do it every now and then." We keep it a manual process, and once you do that, it tends to stay manual, time and time again. And then we're wasting developer time again.
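Automating the SBOM is what keeps it off the sprint board. As a minimal sketch, assuming the dependency list is already available as a name-to-version map, the Python below emits a bare-bones document using the top-level CycloneDX JSON fields (`bomFormat`, `specVersion`, `version`, `components`). The `cyclonedx_sbom` function is hypothetical and omits fields, such as package URLs and hashes, that real SBOMs usually carry; in practice you would run an established generator, such as the CycloneDX Maven plugin, as a CI build step rather than hand-rolling this.

```python
import json


def cyclonedx_sbom(dependencies: dict[str, str]) -> str:
    """Build a bare-bones CycloneDX JSON SBOM from a {name: version} map."""
    components = [
        {"type": "library", "name": name, "version": version}
        for name, version in sorted(dependencies.items())
    ]
    bom = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }
    return json.dumps(bom, indent=2)


# Hypothetical dependency map, e.g. produced by the build on every CI run.
print(cyclonedx_sbom({"log-lib": "1.9", "json-lib": "2.4"}))
```

Sorting the components keeps the output deterministic, so successive builds produce diffable SBOMs.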
I loved Nicole Forsgren's talk yesterday about developer experience, or DevEx, and the importance of giving developers time to go deep on tasks and giving them interesting, engaging work.
There's nothing, I think, less engaging than dependency management. It's definitely grunt work. Developers don't want to spend their time managing dependency versions and fixing whatever build errors that produces.
So we need to make this more efficient and less painful for them. If we do that, we can free up the time we need to harness all the innovation happening in the open source ecosystem.
So with that, thank you.
I encourage everyone to go read the report. There's a lot more in there, including comparisons of open source practices across ecosystems. For example, Java projects are 70% more likely than JavaScript projects to do code review, which is interesting. There are many more stats like that, including more about AI and what we're seeing in that space.
I'll be at the Sonatype booth after this. If you want to come up and chat, I'm happy to talk more about this and hear your thoughts on what you're seeing.
Thank you.