54,000 Tales - A Data-driven Discussion of Exemplar Development and DevSecOps

London 2019

Dr. Stephen Magill

Principal Scientist · Galois

David Jones

Head of Developer Tools and Services · Credit Suisse

Derek E. Weeks

Vice President · Sonatype

Jayne Groll

CEO · DevOps Institute

In collaboration with Gene Kim and Dr. Stephen Magill, we spent the past 10 months objectively examining and empirically documenting software release patterns and cybersecurity hygiene practices across 54,000 commercial development teams and open source projects. We also surveyed 6,000 development professionals in 2019 with regard to the state of DevSecOps practices and maturity.

Join this rapid-fire panel discussion to hear DevOps practitioners, researchers and visionaries from market-leading companies who are committed to transforming how the best software is created. We will also discuss DevOps, its relationship to cybersecurity and how the best organizations are mitigating threats at scale.

Full transcript

Derek Weeks

Hey everyone, welcome. You are in for a treat today. I'm Derek Weeks. I'm vice president at Sonatype. I'm also co-founder of All Day DevOps. Today we're going to be talking about some research that we've been doing over the last 10 months, and walking you through some tales and information about software development practices across open source projects as well as commercial software development teams.

This research and these practices come from the State of the Software Supply Chain Report for 2019. It was released yesterday by IT Revolution, Galois, and Sonatype. My email address is up there because I'm imagining most of you haven't seen the report. If you want a copy of the report, my out-of-office message is on. You just email. You don't have to put a subject line or anything. You'll get a link back that goes to SlideShare. You can download a PDF. You don't have to register or anything for it. If you do want a copy of the report, it's just weeks@sonatype.com. I'll put that up at the end as well.

Everyone in this room that has a software development team has a software supply chain. As part of this year's research, we looked at the different aspects of the supply chain, including 36,000 open source projects that we investigated. We also looked at 3.7 million open source components. We looked at behaviors across 12,000 different commercial software development teams and how they were consuming open source components, as well as containers flowing through their supply chains and tens of thousands of applications. We really tell the story of what's happening within your software supply chains to give you a better feel for that.

On the panel, I'm honored to have Jayne Groll, CEO of DevOps Institute, with us; Stephen Magill, who is Principal Scientist at Galois and also CEO at MuseDev; David Jones, our friend and Director at Credit Suisse. Gene Kim, while he's in the program and spent 10 months with us doing the research and writing the report with us and spending countless hours on it, is running the conference, so can't be with us here today. But he's certainly here in spirit, and the research and data in the report that you'll see, he played a very significant part in.

To give you a feel for what's happening in software supply chains: I told you you all have one, but you might not realize how active they are. In Java development alone, there are 10 million Java developers around the world. Last year, those developers made 146 billion download requests for Java open source components. If you have JavaScript developers on staff, six and a half million JavaScript developers download 11 billion JavaScript packages a week. It's about 50,000 JavaScript packages annually for those developers. Also, in this environment, there are 20,000 new open source component releases every day across the open source ecosystem.

For your enterprises, what this means is the applications that you're building are about 85% to, in some cases, 95% composed of these open source components. You're not writing all these applications from scratch, but building with these parts. The impact is, if you just look at Java development alone, the average organization, when we looked across 12,000 organizations, was consuming 300,000 Java parts, or Java components, annually. These come from over 2,700 open source projects that you are relying on as external software suppliers. Not all of those parts are created equal. Some are old, some are outdated, some have security vulnerabilities. On average, 8.5% of them had known security vulnerabilities.
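As a rough worked example, the figures cited here can be combined into a single number. This is a back-of-the-envelope sketch, not a calculation from the report: it assumes the 8.5% rate applies uniformly to the 300,000 components an average organization consumes each year, and the variable names are illustrative.

```python
from fractions import Fraction

# Figures as stated in the talk (averages across the organizations studied).
components_per_year = 300_000          # Java components consumed annually
vulnerable_rate = Fraction(85, 1000)   # 8.5%, kept exact to avoid float rounding

# Assumption: the rate applies uniformly to every component consumed.
vulnerable_components = int(components_per_year * vulnerable_rate)

print(vulnerable_components)  # 25500
```

Under that assumption, an average organization would be pulling in roughly 25,500 components with known vulnerabilities per year.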

As we think about these consumption patterns out there and the demand for open source components that are fueling the velocity that we want to maintain, if you want to do anything with managing this environment, you first need to get visibility to it. I want to turn to David and ask him: as a large enterprise, you have 25,000 or so IT professionals in the organization. How do you gain visibility to what's happening with software supply chains that are feeding the bank? If any people in the audience are working in large enterprises, how do you approach even getting visibility to this if you didn't know you had a software supply chain before walking in here?

David Jones

Yeah, sure. It's very challenging scaling up something that works quite easily on a small team and then building it across an estate of users that's tens of thousands of users. What I've found over working across several places, several enterprises, is that you have to do some quite fundamental things before you can even create that observability.

If you have many pathways of software into your organization and you have many places where you can build and deploy from, it makes the challenge more complex in creating that observability. What you need to do is make sure that every path into your supply chain and every path out of your supply chain has a way that you can manage that and a way you can create that transparency. This takes time.

You need to make sure that all your assets, all your binary assets, are in a place logically that you can look at, that you can observe, and you can see what's in there. You need to see how things are built, how things are constructed, as best you can, because in larger estates, it's harder to cover 100%. But as best you can, you want those builds to be consistent across your estate so you can observe things in a consistent way. This gives you an idea of how you can start to make informed decisions.

From there, you can look at how you could introduce controls or measures or whatever it may be. But first of all, it's being able to see what's going on within your environment.

Derek Weeks

Follow-on question. When you got to Credit Suisse, it probably wasn't the first question that you asked, but somewhere along the line, you had to say, "We have a software supply chain. How do I gain visibility to it?" Was that a question that had already been asked before you arrived, or did you have to go and find ways to get to that observability?

David Jones

Yeah, I think it's the same in most enterprises. You don't get asked to build observability. You get asked specific questions. It could be you're asked a risk question: do we have a lot of vulnerabilities in our software? It could be that you're asked architectural questions: are we using many different types of the same solution? Do we have a lot of technical variance? From there, that's when you need to create that observability.

You get asked point questions, and in the process of trying to work out how to answer them, you come to the conclusion that you need to build a view across your estate, across your pipeline, across all your software.

Derek Weeks

Yeah. You also talked about centralizing your build so that you can get visibility on this in terms of what you're actually consuming, what is composing your software.

David Jones

Yeah, I would think it's the same in a lot of large enterprises, where over the years, teams have grown up and built their own automation in their build process. You get a lot of different approaches across the organization. Sometimes that coalesces and you get areas where people are doing similar things, but really what you want to try and do is provide solutions that anyone can use to get that consistent approach, and then try and build in benefits that encourage people to do that.

As I said, again, when you're talking of user groups that are tens of thousands, it's very difficult to get that 100% coverage. But all the time, what you're trying to do is just make that problem smaller and do this in an efficient way that reduces the friction for developers. That's always the aim.

Derek Weeks

It's always kind of the starting point: if you don't know what you have, then trying to manage any part of it is useless because you don't know what's flowing where into your organization.

I'm going to get into this conversation. You have all of these parts, hundreds of thousands or maybe millions of parts flowing through, whether they're containers or open source components, build artifacts and other things, but not all of these are created equal. This is something that we studied within the research across the open source projects. Gene Kim and Stephen Magill did the brunt of this research for open source projects. Within the research, you'll see that they identified five behavioral clusters of exemplary practices, small and large development teams, as well as some that were laggards or focused on, "We're going to develop new features first," over building in or remediating security vulnerabilities. There were some top performers in the research that you did across these 36,000 open source projects.

Some suppliers are good. Some suppliers are not so good. Stephen, you've done this research. If you know which parts you're using, how do you begin to tell which are good and which are bad?

Dr. Stephen Magill

Yeah. We took as our primary quality metric a measure of how quickly components would update their dependencies. When you think about your software supply chain, it's not just about the direct open source projects that you're dependent on, but those projects have their own dependencies, and so on, transitively in a pretty deep dependency tree. You end up inheriting any vulnerability anywhere in that tree. It gets inherited by your project. But you're generally only maintaining your direct dependencies, so you're inheriting their approach to managing their own supply chain.
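The point about inheriting vulnerabilities through the dependency tree can be illustrated with a minimal sketch. The package names, the graph, and the vulnerable set below are all hypothetical and are not drawn from the report; the code simply walks the tree breadth-first and reports vulnerable packages that the root project never declared directly.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-parser"],
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-lib"],
    "template-engine": [],
    "json-parser": [],
    "tls-lib": [],
}

# Hypothetical set of packages with a reported vulnerability.
VULNERABLE = {"tls-lib"}


def transitive_dependencies(root, graph):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen


def inherited_vulnerabilities(root, graph, vulnerable):
    """Return vulnerable packages anywhere in root's dependency tree."""
    return sorted(transitive_dependencies(root, graph) & vulnerable)


print(inherited_vulnerabilities("my-app", DEPENDENCIES, VULNERABLE))
```

In this toy graph, `my-app` declares only two direct dependencies, yet it inherits the `tls-lib` vulnerability three levels down, which is why the update habits of each supplier matter.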

That's the primary metric we looked at: how well do they manage their dependencies? How quickly do they update them when new versions come out? Initially with an eye toward security, looking at security-relevant updates. When a vulnerability is reported in a dependency, how long does it take them to move on to a fixed version of that dependency?

We found a wide variance there, anywhere from the best projects taking a small number of days to update to the worst being three or more years to make that update. We actually found that update behavior in general was closely correlated with update behavior for security vulnerabilities. By and large, most projects stay secure by just staying up to date. They're applying these patches quickly, regardless of whether it's security relevant or not. In the report, we give a list of the top 20% by update hygiene, by how quickly they stay on top of things. You can certainly go look at that.
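One way to picture the update-speed metric described here is as the number of days between a dependency publishing a new version and a project adopting it. The sketch below is an illustration under assumptions: the dates are invented, and using the median as the summary statistic is a choice made for this example rather than the report's stated method.

```python
from datetime import date
from statistics import median

# Hypothetical update events for one project:
# (date the dependency released a new version, date the project adopted it)
UPDATE_EVENTS = [
    (date(2019, 1, 10), date(2019, 1, 14)),
    (date(2019, 2, 1), date(2019, 2, 3)),
    (date(2019, 3, 5), date(2019, 4, 4)),
]


def days_to_update(events):
    """Days elapsed between each release and the project's adoption of it."""
    return [(adopted - released).days for released, adopted in events]


def median_time_to_update(events):
    """Median lag in days; less sensitive to one slow update than the mean."""
    return median(days_to_update(events))


print(days_to_update(UPDATE_EVENTS))         # [4, 2, 30]
print(median_time_to_update(UPDATE_EVENTS))  # 4
```

Restricting the events to releases that fix a reported vulnerability would give the security-specific version of the same measurement.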

We also found that while the best projects from an update-hygiene perspective were more popular, popularity by itself was not a good guide to performance according to those metrics. When you're sourcing components, there are reasons to consider popularity: there's documentation, there's community support, and so forth. But you shouldn't make that the only metric that you consider when you're looking at these projects. You want to make sure they have good hygiene practices, good release frequency, and so forth, so that you're getting other benefits by importing them.

Derek Weeks

One thing that came out of this research, and I think it was a surprise for all of us: an application may have 100 open source components in it, or in JavaScript, 200 or 2,000 components, and each of these has its own dependencies. It sounds like, boy, the more components you have, the more dependencies you have, the more complex it gets, and the harder it is to maintain. Tell us about what you found on the impact of the number of dependencies on quality and other attributes.

Dr. Stephen Magill

Initially, we went into this research with a number of hypotheses, things that we thought would be true that we wanted to look to the data to see if they're supported or not. One of those hypotheses was that it would be easier to stay up to date if you had a small number of dependencies. Three or four dependencies, that's pretty easy to manage. Actually, we found the opposite was true.

If you look across the population, those projects with more dependencies were actually updating their dependencies more quickly and staying up to date better. That was so surprising and unintuitive that we dug a little bit deeper and found that actually what happens is there's a correlation between the size of the development team and the number of dependencies. The projects with more dependencies also tended to have larger development teams, and that's really where the release velocity, the staying on top of updates, and so forth comes from. If you look at large teams broadly, they do a better job at staying up to date with those metrics.

Derek Weeks

The interesting thing in this report as you go through it is we do state what the hypotheses were, and then we state whether those were actually validated or rejected, and some things, like you mentioned, were obvious or not so obvious.

I want to go back to the simple takeaways. We're talking about open source projects and dependencies in all of this and what's in your software, but you have to recognize that these open source software projects are really the suppliers to your organization. Your organization has chosen not to write this code from scratch, and you're working with 2,700 different suppliers out there. Quality actually has to come into play at some point if you're not going to write the code yourself. How do you select these components, and is it on popularity or other metrics that come into play?

As we move in the report, a big part in chapter three covers the open source projects. We also looked at commercial practices in terms of whether commercial development teams were updating their dependencies. Did they have processes in place to update their dependencies? Were they applying automation or tools that help them manage the compliance of these components that were being used?

The interesting thing that was found in the report was that there were exemplar groups within this survey that we did back in May of 650 developers. Basically this exemplar behavior looked at practices of: we have a process in place; we update dependencies regularly; we have a practice for removing troublesome dependencies. The more frequently we were doing this, the easier it would get. The exemplars would say, "I'm considerably less likely to consider this behavior painful. I'm considerably less likely to consider updating vulnerable components as painful."

I want to get into a discussion with Jayne. Say you have practices in place, you have visibility into your components, and you know which ones are good or bad. How do we bring the right skills into the organization to manage our development practices? You can't just say, "Use good components," and all of a sudden tomorrow everyone in the organization is doing that. What do you have to think about from the skills perspective?

Jayne Groll

If you think about supply chains and we talk about components, there's very much a human element. Sometimes I think we forget about the human element, particularly when we start to talk about some of the key aspects of DevOps. Underpinning all of this are going to be people. You mentioned, Stephen, the size of the development team. Well, those are human beings. We use the word human, by the way, so that everybody knows because regardless of anything else, we're human.

When we look at kind of man versus machine or human versus machine, we need to make sure that we update our humans as much as we update the components. There is a cycle to it. It doesn't happen organically. It doesn't happen via YouTube. It doesn't necessarily happen off-hours. There is a sort of assumption that the humans are going to be able to organically update themselves, that there are auto updates that happen overnight, the same as they happen with technologies. There has to be intention there.

When you look at the exemplars, particularly when we look at the higher-level exemplars, even the smaller exemplars, I'm certain that when you peel away those layers, you're going to find intentional upskilling. The elements of the supply chain are only going to be as good as the individuals that create the recipes, that understand what needs to be achieved through the software supply chain. The same goes for those that are helping to amplify the transformations, because they're the ones that know which components need to be updated.

A lot of it happens automatically, I understand that. That's part of our shift left. But there really does have to be an intentional effort to do that. Some of that's going to be mentoring, some of that's going to be through formal training, some of it's going to be through micro training. We're a really big proponent of micro education, where you learn one thing and then you take it forward, and then you use building blocks and you learn the next thing.

At the end of the day, the message is: don't forget there's a human element. Don't assume the human element is just going to upskill organically, because I don't think that's necessarily where we all want to be. I also think when we start to look at these different categories, you're going to see somewhere in there, and maybe you can share a little bit more about what you found when you were exploring these hypotheses, in terms of what organizations did differently, perhaps, as far as their human element of the supply chain went.

Dr. Stephen Magill

Beyond size of development team, we don't have a lot of insight into how these teams work. I think in future years it would be great to. We got that data from GitHub, by looking at the GitHub projects and the statistics there. There certainly is finer-grained data about commit frequency of individual developers. I think we'd like to look at the code and see how complexity of code relates to some of these outcomes.

In all of this, it's useful to say we got some really cool results this year, but I think we're just scratching the surface of what's there to be had from this data. When you were speaking, another thing that came up is it's not just about sourcing the right components, but you need to be using them correctly as well. There are good and bad ways to use libraries and to architect things and so forth. That really does become about the development team and their skills.

Jayne Groll

And sourcing the right humans.

Dr. Stephen Magill

Yeah.

Jayne Groll

Right? I think that has to be a consideration there as well.

Derek Weeks

David, how does a big bank like yours think about bringing the right skills to the people? Are you bringing them tools that are just easier to use, and because of that, does upskilling happen organically, or do you have to bring people through training exercises or retraining exercises within the organization? And are the newest developers that you're hiring easier to train than the people that have been at the bank for 20 years?

David Jones

Wow. Lots of questions there. Answer that in 30 seconds.

I think across all large organizations, especially talking about tens of thousands of users, you're not going to know very many of them. You're going to know just a small subset, and quite often the most vocal users as well. These tend to be the people who are most interested in what's going on.

When a lot of people talk about a tools-versus-culture type thing, and this gets discussed quite a lot, I think quite often you need a minimum set of tooling. It's like a prerequisite. There shouldn't be any debate. You need the right tooling to begin the discussion. Beyond that, it's people doing the right thing. It's neither one nor the other. You can definitely spend too much time on either and still reach failure. You need a combination of both to be successful.

I don't think there's any difference in people based on years of experience once they know what they're doing. I think individuals are just very different. You get some people who care a lot about this type of topic, and some people not so much. I don't write much code these days. When I was actively developing, I was incredibly lazy, so I probably resolved things just by doing something that made my job easier. I suspect that paradigm is true with a lot of people. If you offer them something that's going to make their lives a little easier and it has the right result, that's probably a good thing to try.

Derek Weeks

Playing on that and going back to that mountain image that was in the report, part of making things easier that we were seeing from some of the data is the more frequently that people were doing this kind of update behavior, and knowing this is part of my daily work and we have processes around it, the easier it was becoming for them. Those people that were saying, "We only update once or twice a year through these practices," were finding it very difficult. If you climb the mountain every day, it's easier. If you climb the mountain once or twice a year, you're huffing and puffing by the time you get to the summit. I think that's just kind of part of the behavior that plays in.

Another topic that I know we all have different opinions and viewpoints on is release speeds. We're about DevOps. We're about velocity. We want to move faster, faster than our competition, release new capabilities to market faster. But it also plays into: can we move faster and stay secure at the same time? This is one of my favorite charts from the report. Quickly, what it shows is those that are on the left side of the chart are releasing fastest. The further right you are on the chart, the slower your release cadence is. The further toward the top of the chart, the more popular you are. The exemplars that we found within these different clusters are the blue dots and the green dots.

What we found is that those that were updating fast, that had good release frequencies, good repair frequency, good use of various technologies and practices, were the more popular components. But it didn't mean that the most popular components, as you mentioned before, were necessarily the best. You couldn't choose on popularity alone. As we move fast, what choices do we make? Picking the most popular thing out there doesn't always serve us well.

This is a question that David asked before the panel as we prepped. As things go quickly, does it really matter if software contains security issues or issues at all?

David Jones

Well, there is an argument. If you move quick enough, no one's ever going to find the problems in your software. You just release ahead of people finding issues. If you've got 200 deploys a year, who knew it was in deploy 152?

As with everything, it's a balance. You can definitely get things very secure and not move at all. But you also find that things like risk or architectural management, everything that goes into how you want your software to look or how you want it to perform or the quality or whatever it may be, all these things take effort. If you're able to assert those things quickly and through the entire development life cycle and through the delivery pipeline, then you're able to do these things. You're able to become more secure, and you build better software, and you get the results you want.

As I said before, quite often what we're looking to do, or what everyone should be looking to do, is to accommodate lazy people like me. Get the result that I'm looking for with all the other benefits as well, and look at reducing that risk or getting the best architectural result. Also, some of the things we're looking at in terms of security actually introduce best practice. We want to have more standardization of how our infrastructure looks. We want to make sure that things are built using code or configuration, and that's versioned and immutable so we can see what's happened over time. Doing these things will naturally help you accelerate how you're deploying software, how you're delivering, and how you're building things. You can very much build better software and build it more quickly.

And yeah, you always have to care about what that software looks like.

Derek Weeks

By doing that, are you spending a lot more time on that if you've built these things into the tools? "I don't have time to spend on security because I need to really meet my release deadline," right? Am I spending more time on security?

David Jones

I think as with any problem, you can solve it many ways. You have to be very careful. If you solve a problem an incorrect way, you're going to slow everyone down, and they will hate you. You have to take these problems and figure out how you do them with the least amount of friction, or at the same time enabling people to do the right thing or get something they want back. I think you can move quickly and get a better result. I think it's very hard to go slowly and get better results.

Derek Weeks

I'm going to shift over to Stephen. Stephen, in the report, you and Gene are doing this research on these projects, and you find out that those projects that update their dependencies most frequently are also consequently updating vulnerable dependencies in that practice. So if I'm just updating quickly and updating my dependencies, do I not have to worry about security, because I'm kind of covering it as a consequence?

Dr. Stephen Magill

You do have to be careful there. The way I think of it, updating quickly and releasing frequently gives you this capacity for running experiments and improving various aspects of the software. But you have to be making good use of that.

If you're using those deployment opportunities to say, "Can we push this dependency forward in terms of versioning, or can we fix this security vulnerability and still meet our requirements and everything works?" then that's good. That's the sort of good practice, good use of that release capacity.

But we did see in the dataset, actually, there's this cluster that we identified as features first, that is releasing very frequently and moving their software forward but completely neglecting their dependencies and not keeping them up to date. Thankfully, it's a small portion of the population, but it does exist, and you want to make sure you're not taking that approach. You want to be making optimal use of this capacity.

Derek Weeks

The clusters are really interesting to observe. What are they doing, and what are they not doing?

Jayne, years ago when I read Continuous Delivery by Dave Farley and Jez Humble, part of what they talk about in chapter one is build quality in. How do we have to think about equipping the people in our organization with that mantra of not only go fast, but build quality in? The Phoenix Project talks about it as well. Don't pass known defects downstream is part of the First Way of DevOps. Talk about that a little.

Jayne Groll

We're really moving toward the evolution of what we call the T-shaped professional. If you're not familiar with the T-shaped professional, the stem of the T is your deep competency. We want you to keep elevating your deep competency, but we really want you to fill the top of your T with a broad base of other knowledge. More broad-based knowledge like security.

If you're a developer, you have to have some core security capabilities so that when you're writing code, it is a security-first approach. Security is everybody's responsibility. You have to have some testing capabilities so that, again, you can become a test-driven developer, among many, many other types.

For the overachievers in the room, you don't have to be pi-shaped or broom-shaped or comb-shaped or whatever. Everyone's going to try to get deep competencies in everything, and that's almost unachievable because sooner or later you have to leave something behind.

I would encourage everybody to think about this, and I think the report really demonstrates that the concept of T-shaping has validity in this context: you could have those core security capabilities as part of your broad base so that you could start securing earlier, which is the whole mantra of shift left. We can't shift left if people can't shift left, if skills don't shift left. That's part of your journey, and upskilling is being able to identify what you need to put into the top of your T.

Derek Weeks

It's a good point. The report talks about the processes, the practices, the guidelines, the thought patterns, the automation that plays into it. It's not just one-dimensional in that regard.

Again, for those of you that came in late, if you want a copy of the report, you don't have to register. Just email weeks@sonatype.com. My out of office is on. There's a link in there that allows you to download the PDF. If anyone tried it and it didn't work, let me know because there's like 80 people in here that are wondering how to get that.

But yeah, you can have the report. It's really great. Jayne, Stephen, myself, a bunch of others spent a lot of time on it. Once again, I appreciate the panel. David, Stephen, Jayne, thank you very much for this quick exploration, and we're around for questions as well for any of you that have them. Thank you very much.

Panel

Thank you.