What does Log4j teach us about the software supply chain?
Dr. Stephen Magill was the CEO and co-founder of MuseDev, and is now VP of Product Innovation at Sonatype. He has spent his career developing tools to help developers identify errors, gauge code quality, and detect security issues. Stephen is a world-recognized expert on program analysis and has led multiple large-scale research initiatives including DARPA projects on privacy, security, and code quality. He also served as research lead for the 2020 and 2021 State of the Software Supply Chain reports. Dr. Magill earned his Ph.D. in CS from Carnegie Mellon University, and his BS from the University of Tulsa. He is a member of the University of Tulsa Industry Advisory Board and has served on numerous program committees and funding panels.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
Thank you, Mick. And by the way, Steve Spear and I will be talking more on this topic at the end of day three.
Okay, the next speaker is Dr. Stephen Magill, whose primary research area was static code analysis. He was the founder of MuseDev, which was acquired by Sonatype and is now their VP of Product Innovation.
So Stephen and I, we have a common love for functional programming languages like Haskell and Clojure, and we got to work together on a variety of projects, including the State of the Software Supply Chain Report, where, thanks to Sonatype, we got to explore the dependency update behavior in the Maven ecosystem.
So Maven is to Java as npm is to JavaScript, gems are to Ruby, and so forth.
It was so fun because we got to explore how people use dependencies, how they migrated from one version to another, and what open source projects did when critical vulnerabilities were published against them, and see how quickly they fixed it, and then see how those updates propagated through the dependency chain.
It was such a fascinating project and I learned so much about the software ecosystem that we depend on every day.
So Dr. Magill will be talking about the software supply chain with a very specific focus on Log4j, something that I suspect was unfortunately very relevant to all of us late last year. Here's Stephen.
Stephen Magill
Thank you, Gene. It was super exciting work. And it was exciting because we had access to some really incredible data, right? All this data about Maven Central and the usage of components in the Java ecosystem.
We were able to ask some really interesting questions about how open source projects manage their supply chains. And then it led to some interesting advice and best practices around what you can do to be more secure in your use of open source.
And so, that was all really exciting and involved a lot of cool experiments and analysis. And then at the end of 2021, we had this amazing natural experiment of Log4Shell.
So this was a zero-day vulnerability in the Java ecosystem. It affected the Log4j component, which is a super widely used logging library. And it really provided this amazing stress test of our ability to manage the software supply chain.
And I use the word stress test quite literally because it was a very stressful event. It landed on a Friday before a weekend, very close to the holidays, affected many components across almost every organization, and led to this huge scramble to deal with this and get it patched and get the patches out there quickly. And we'll hear more about what that was actually like on the ground from Paul. Paul's from Morgan Stanley and is going to talk about their experience with Log4j.
But what I'm going to talk about here is sort of the high level: what did we learn from this experiment, and was it consistent with what we had found so far in the software supply chain research?
And so I want to start with the question of, was Log4j, this huge event, like plate tectonics or quantum mechanics, where it changes how we have to think about things, now we have to throw away our previous view of the world and how it works? Or is it more like the CERN Large Hadron Collider, the particle accelerator that's doing these advanced physics experiments, but sort of largely validating the hypotheses that we had about how the world works?
And so what we found is that we're in Large Hadron Collider territory, right? So we've seen confirmation of the sorts of effects, the sorts of trends, the sorts of things that are effective that we saw in the software supply chain research. We've seen those manifest in the community's response to Log4j. And so I'm going to talk about this. I'm going to talk about four different concepts from the reports and how we saw that reflected in the Log4j response.
So the first is this concept of exemplars and laggards, and this has been there from the very beginning, the work that Gene and I did in 2019, where we discovered that there's not just sort of a general approach to supply chain management. There's really a lot of individual approaches. Different teams have different focus areas. Some are very focused on attending to their dependencies and keeping them up to date, others not so much. And so we really see the population break down into these clusters, and there's a cluster of exemplars that update very quickly. And then there's laggards that are sort of not attending to their supply chain at the same level.
And that's displayed on this graph here. What this is showing is what percentage of the population updates their dependencies within a certain period of time. So within 20 days of a new version being released or within 30 days, right? And so over at the lower left are those exemplars; they're updating in tens of days when new versions come out. And then you can see the laggards at the top that are taking months or even years to update their dependencies in some cases.
And these are large groups. Each one is 20% to 30% of the population.
We see the same thing in the Log4j response. And so, I'm going to come back to this graph several times, so let me explain it real quick. What you're seeing here is a record of downloads of various versions of Log4j from Maven Central.
So Sonatype hosts Maven Central, and because of that we have visibility into what people are downloading, what components are in demand, which ones aren't, which ones are being pulled down, and how that shifts over time. And so we can use that to analyze what is happening out there in the community in terms of shifting away from certain versions and over to other versions.
It's a bit like monitoring COVID by looking at wastewater treatment plants, right? Minus the sewage. I like Maven Central. It's great.
So what we can see here is very quickly, there's a contingent of projects that upgraded. So the red versions here are the vulnerable versions. And so you can see within one to two days, so the very first column here is the first day after the disclosure of the vulnerability, 40 to 50% of projects had moved on, right? They had updated and adopted secure versions of Log4j.
But then it sort of plateaued, right? So then there was this much longer timeframe where the laggards are sort of slowly updating, and then you sort of see slow progress and then another plateau. So, it's sort of a very real-world manifestation of what we had seen about having exemplars and sort of laggard cohorts.
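As a rough illustration of the download analysis described above, the share of vulnerable downloads per day can be computed from per-version download counts. This is a minimal sketch, not Sonatype's actual pipeline: the record layout, the function name `vulnerable_share_by_day`, and every number in `SAMPLE_DOWNLOADS` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical download records: (days since disclosure, version, download count).
# The counts are invented for illustration; they are not Maven Central data.
SAMPLE_DOWNLOADS = [
    (1, "2.14.1", 600), (1, "2.15.0", 400),
    (2, "2.14.1", 500), (2, "2.15.0", 300), (2, "2.16.0", 200),
    (3, "2.14.1", 450), (3, "2.16.0", 550),
]


def vulnerable_share_by_day(records, vulnerable_versions):
    """Return {day: fraction of that day's downloads that were vulnerable versions}."""
    total = defaultdict(int)
    vulnerable = defaultdict(int)
    for day, version, count in records:
        total[day] += count
        if version in vulnerable_versions:
            vulnerable[day] += count
    return {day: vulnerable[day] / total[day] for day in sorted(total)}


# Which versions count as vulnerable is an input, since that set changed
# as follow-up issues were disclosed.
shares = vulnerable_share_by_day(SAMPLE_DOWNLOADS, {"2.14.1"})
for day, share in shares.items():
    print(f"day {day}: {share:.0%} of downloads still vulnerable")
```

A fast early drop followed by a flat stretch in this series is what the exemplar and laggard cohorts look like in download data.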
Concept two is all about staying secure by staying up to date. And so the idea here is that, yeah, you can sort of just pay attention to big events like Log4j, right? And update then when you hear that you have to because there's this sort of critical patch that you need. Or you can just stay up to date as a matter of practice, right?
And what we find is that those exemplars, those teams that are best at updating and staying secure, they do so by adopting a culture of just keeping dependencies up to date. So, staying up to date leads to the security.
And so we see that again here in the Log4j adoption. So another thing to know about the Log4j vulnerability is, because it was a zero-day, the security community had not seen it until it was publicly disclosed.
We got to see the vulnerability research process play out in real time. And so, immediately, the Log4j maintainers got a patch together to fix the initial vulnerability that was disclosed.
But then as the community took a deeper look, they discovered, oh, there's some additional issues that need to be fixed as well. In this case, under these conditions, with this configuration, under these sorts of deployments, right, there's additional issues. And so there were a series of patches. There were actually five patches that came out in less than a month following that initial disclosure.
And we were really interested to see, are people going to adopt these subsequent patches? Because the first one was the highest criticality one, right? That affected everyone. And then the subsequent ones were sort of more niche, right? You needed certain conditions for them to apply to you.
And so, would people do this calculus of, am I actually vulnerable or not, right? And what we found is that they didn't, right? They just adopted these new versions. And so you can see each of these colors, so the non-red colors are various patch levels of Log4j, and you can see those get pretty consistently adopted, right? There's certainly some people who only adopt the first, but it's down there, sort of very low, right? Most people have moved on at this point to the latest, which is 2.17.1.
And we've seen that in the software supply chain work as well. So this is a graph from last year's report showing Spring Framework and showing how people migrate to various versions of Spring Framework. And there's a lot of stuff in here. You can go back and re-watch and pause if you want to read sort of all the call-outs. The main things here are in orange. The thing to know about this graph is that time is marching downward, so each row is a week of updates. If someone decided to update their project's version of Spring Framework that week, then there's a little square here showing what version they moved to.
And the new versions that come out, those are on the right of the graph. So, there's a new column added each time a new version is released. And so that right part of the graph that's circled there is really, that's the cutting edge, right? That's the new versions that are coming out.
And what you can see, the fact that those squares are all very dark indicates that there's a lot of activity there. There's a lot of people moving to those versions. And so we see that's essentially this cohort that is staying up to date, sort of at or very close to the edge of what's current.
And so that means that you get security. You get a certain amount of security from that. Log4j was a bit odd in that it was a zero-day. It was not known by the security community before it was announced.
Usually, vulnerabilities get discovered by, say, the white hat security research community. They work with the maintainers, not in public, sort of privately work with the maintainers to get a patch out there, and then the vulnerability is disclosed later. And so in those cases, if you are at the front there, then you will already be secure when that vulnerability is disclosed.
And so there's essentially this vulnerability buffer, right? You can think about it this way: there's the people who are current with respect to their dependencies, and then there's these old versions that have known vulnerabilities that you sort of don't want to be on. And what's not depicted here, but you can sort of imagine, is that over time this wall of red, so the red is vulnerable versions, sort of marches to the right, right? Because new things get discovered in prior versions, those eventually get patched, new versions come out.
And so, if you're close to where the red is, you're much more likely to need to react to a disclosure, right? And so keeping this buffer helps you be proactive versus reactive.
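One way to make the vulnerability buffer concrete is to count how many releases separate the version in use from the newest release with a known vulnerability. The sketch below is only an illustration of that idea: the function name `vulnerability_buffer`, the release list, and the version numbers are all hypothetical.

```python
def vulnerability_buffer(releases, current, known_vulnerable):
    """Count releases between the newest known-vulnerable version and `current`.

    `releases` is ordered oldest to newest. A result of 0 or less means the
    version in use is itself vulnerable or older than a vulnerable release.
    Returns None when no release has a known vulnerability.
    """
    current_index = releases.index(current)
    vulnerable_indexes = [
        i for i, version in enumerate(releases) if version in known_vulnerable
    ]
    if not vulnerable_indexes:
        return None
    return current_index - max(vulnerable_indexes)


# Hypothetical release history for an imaginary component.
RELEASES = ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5"]
KNOWN_VULNERABLE = {"1.0", "1.1"}

for in_use in ("1.5", "1.2", "1.1"):
    buffer = vulnerability_buffer(RELEASES, in_use, KNOWN_VULNERABLE)
    print(f"on {in_use}: buffer of {buffer} release(s)")
```

A project sitting at a buffer of 1 has to react the next time the wall of red advances, while a project near the newest release usually has time to plan.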
Concept three is that transitive dependencies matter. And so, this is one thing that's difficult about, again, something like Log4j. If you're using Log4j directly, you can update your version of Log4j, no big deal, right? But if you're using a package that itself uses Log4j, you're kind of dependent on that project to adopt the update.
And so you kind of have to wait for them to update their dependencies so you can update your version of that package so that you can be secure.
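The transitive case can be sketched as a search over a dependency graph: every path from an application to the affected component shows which intermediate package has to ship an update first. This is a minimal sketch under assumed data. The graph, the function name `paths_to`, and every package name except `log4j-core` are invented for illustration.

```python
# Hypothetical dependency graph: package -> list of its direct dependencies.
DEPENDENCIES = {
    "my-app": ["web-framework", "report-lib"],
    "web-framework": ["log4j-core"],
    "report-lib": ["pdf-writer"],
    "pdf-writer": [],
    "log4j-core": [],
}


def paths_to(graph, root, target):
    """Return every dependency path from `root` to `target` (depth-first search)."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for dependency in graph.get(node, []):
            if dependency not in path:  # guard against cycles
                stack.append(path + [dependency])
    return paths


for path in paths_to(DEPENDENCIES, "my-app", "log4j-core"):
    print(" -> ".join(path))
```

Here the only path runs through the imaginary `web-framework`, so `my-app` cannot pick up a fixed Log4j through that route until `web-framework` releases a version that depends on it.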
What we saw with Log4j was the community really worked quickly to update all the various uses of it. So these are commits to GitHub that mention the CVE, the Log4j CVE. You can see a bunch of them happening on December 9th, which is when it was first announced.
And we also found in last year's research that the community in general is getting much better at remediating these issues and at sort of keeping up to date with respect to their dependencies. So what you see here is a by year graph, sort of a histogram of how quickly do various projects update their dependencies. And so to the left is better, and you can see sort of each curve, as the years march on, is getting a little bit higher and a little bit farther to the left.
And what that shows is, first of all, the amount of open source is increasing because the height of the curve here is sort of proportional to the number of projects doing releases. But also the update speed is improving. So it's moving to the left. It's taking less time for these projects to push updates when their dependencies get updated. So that's great. That means the capacity is there to deal with vulnerabilities and transitive dependencies.
Concept four is that some dependencies just never get upgraded. And I want to ask a series of questions and think about this. So how long did it take to get to 90% remediation? And I have the answer here already. Infinity, or well, it's not done yet. Maybe we'll get there, but the clock is still ticking. All right.
And this is for Log4j. So what about 80%? Same story. What about 70%? Okay, finally we get an answer. To get to 70% of downloads being non-vulnerable versions took 52 days.
And so you can see that here on the graph. It happened, there was a spike on January 31st where most of the downloads, 70 plus percent of downloads were non-vulnerable. There was some slipping after that. This can change based on who's automating what and various changes in CI processes. But it sort of reaches this steady state of 35% of downloads being vulnerable. And that hasn't been changing for a couple of months now.
So you can imagine these projects probably are just going to stay vulnerable.
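The "how long to reach N% remediation" question can be sketched as a lookup over a daily series of non-vulnerable download shares, where "never" is a legitimate answer. The series below is invented and only loosely echoes the shape described in the talk; the name `days_to_remediation` is likewise an assumption.

```python
def days_to_remediation(non_vulnerable_share_by_day, threshold):
    """Return the first day on which the non-vulnerable share reaches `threshold`.

    Returns None if the threshold is never reached in the observed data.
    """
    for day in sorted(non_vulnerable_share_by_day):
        if non_vulnerable_share_by_day[day] >= threshold:
            return day
    return None


# Hypothetical series: {days since disclosure: share of downloads that are non-vulnerable}.
SAMPLE_SERIES = {1: 0.45, 10: 0.55, 30: 0.62, 52: 0.71, 60: 0.65, 90: 0.65}

for target in (0.70, 0.80, 0.90):
    answer = days_to_remediation(SAMPLE_SERIES, target)
    label = f"{answer} days" if answer is not None else "not reached yet"
    print(f"{target:.0%} remediation: {label}")
```

Returning `None` instead of a number keeps "not done yet" distinct from any finite answer.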
And actually we've seen that in the supply chain research as well, more broadly. So when we looked at this pattern, when we showed these projects, this cohort, staying up to date, doing really great, making good choices when they update their dependencies, that was for the dependencies they were updating.
Actually, 75% of dependencies were never upgraded. And so that's sort of disappointing. That's something to be aware of. Make sure that you don't have dependencies that are languishing like this. Make sure everything is getting attention when it comes to this practice of keeping things up to date.
The takeaways in terms of what to learn from this, what do you apply at your organization? The first is the stay secure by staying up to date principle. What goes a long way to keeping you secure is just having this practice, this culture of keeping dependencies up to date.
And as open source update performance gets better, so that graph we saw of open source getting better and better at dealing with this transitive dependency problem, that becomes more and more effective because if you're keeping your dependencies up to date and they're keeping theirs up to date, then you're protected in your direct and your transitive dependencies.
And then make sure you're updating all your dependencies. Make sure you're not one of these teams that has a majority of their dependencies that just never get love. Never get updated. Do all these, and you too can be an exemplar. So that's where you want to be. Sort of modeling what it means to have really good software supply chain practices.
And then I have some additional guidance when it comes to zero-days. So all of those things I just mentioned, those will keep you generally healthy, work for the majority of sort of responsibly disclosed vulnerabilities.
Zero-days have sort of an extra thing to consider, which is that when they're announced, you have to be reactive. I showed that graph where, if you have this vulnerability buffer, you can sort of be proactive and plan your work in terms of updating. That doesn't happen with something like Log4j. It's an immediate fire drill.
And so how do you manage that fire drill? What helps with getting a good outcome?
The first is inventory. If you have a full software bill of materials for all your applications, that helps you answer the first question that comes up in this remediation step: where am I using this? First of all, are we even using Log4j? I think there are a lot of organizations where that's the first question. What logging libraries do we use? And then if we do use it, where? What applications, what teams need to be aware of this?
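The inventory question ("are we even using Log4j, and where?") can be sketched as a search across per-application component lists. The layout below loosely mirrors the `components` list of a CycloneDX JSON document, but it is a simplified stand-in rather than a validated SBOM. The application names, the `com.example` components, and the function name `find_component` are all hypothetical.

```python
# Hypothetical inventory: application name -> components listed in its SBOM.
SBOMS = {
    "billing-service": [
        {"group": "org.apache.logging.log4j", "name": "log4j-core", "version": "2.14.1"},
        {"group": "com.example", "name": "billing-core", "version": "1.0.0"},
    ],
    "search-api": [
        {"group": "org.apache.logging.log4j", "name": "log4j-core", "version": "2.17.1"},
    ],
    "static-site": [
        {"group": "com.example", "name": "templates", "version": "3.2.0"},
    ],
}


def find_component(sboms, group, name):
    """Return {application: sorted versions of the matching component it contains}."""
    hits = {}
    for application, components in sboms.items():
        versions = sorted(
            {
                component["version"]
                for component in components
                if component.get("group") == group and component.get("name") == name
            }
        )
        if versions:
            hits[application] = versions
    return hits


usage = find_component(SBOMS, "org.apache.logging.log4j", "log4j-core")
for application, versions in usage.items():
    print(f"{application}: {', '.join(versions)}")
```

An empty result answers the first question with "we are not using it", and a non-empty one gives the list of applications and teams to notify.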
And then there's the ability to centrally monitor consumption. You can use something like an artifact repository, a cache, a proxy to pull in your dependencies so that you have awareness of what's coming into your organization. That gives you a lot of visibility about where you are, and about how the remediation is progressing.
And then to actually do the remediation, there's continuous monitoring and remediation guidance. The more that sort of software supply chain guidance is integrated into development workflows, the more it can happen just as a matter of practice and just automatically. You can let the development team know, oh, go check your report, and follow the remediation guidance. You don't have to get involved individually with producing that guidance for each team.
And then a big part of pushing those out into production is having mature DevOps practices using CI/CD and having that ability to fix an application and then deploy it very quickly to remediate that vulnerability.
And in the reports, again, this was last year, and we also did it the year before, we did a survey of practices to see where organizations are with respect to these things. And by and large, these were areas of maturity that organizations reported that they felt good about.
Certainly there's a range, but they were clearly areas that were being prioritized. So I think the industry is moving in the right direction in terms of appreciating the importance of these things. It's just a matter of making sure that they are rolled out consistently, that they are getting the attention that they need.
So if you can get all these practices in place, then the next time the next Log4j comes out, it can be less of a fire drill, maybe it's still a stress test for the software supply chain in general, but less of a stressful situation for you and your employees.
Thank you.