Embedding Security: How We Use Automation to Reduce Time and Effort for Cisco Developers to Secure their Products


US 2021


In this session, Chet Burgess, Principal Engineer at Cisco, will chronicle how the Developer Experience team for the Cloud & Compute Business Unit helps improve the security of the unit's containerized applications.

As Cisco increasingly used containers for both SaaS and on-premises software, it needed to adapt the processes and tooling used to secure its products. The Developer Experience team used automation to streamline security checks and remediation within existing DevOps processes, making the software development teams it supported more efficient.

In this session, Chet Burgess, Principal Engineer at Cisco, will discuss:

- The role of a Developer Experience team in making developers more productive

- Why the Software Bill of Materials was a critical foundation for security

- How the team automated security checks from development through to ship/production

- How they complied with internal mandates for OSS compliance

This session is appropriate for all skill levels and will appeal to those looking to spearhead new DevSecOps initiatives within their organization. Attendees will take away proven strategies for improving the software development process with actionable recommendations on how to implement them.


Full transcript


Chet Burgess

Hello and welcome. My name is Chet Burgess. I'm a principal engineer in the developer experience team at Cisco Systems, as part of the Cloud and Compute business unit. Today, I'm going to talk a little bit about how the developer experience team helps to embed security and automation to make it easier for our developers to do their job.

First, let me talk a little bit about what the developer experience team is and what it does. Our goal is to enable our engineers to deliver greater product value by focusing on reducing the friction that can be caused by things like tooling, process and procedures, and compliance, which we're going to spend some of this presentation talking about.

We do this by applying an engineering and product-level approach to what we do. By this, I mean we view ourselves as delivering a product to the other engineering teams, and we treat them very much as our customers. We listen to their feedback, and we work with them in partnership to make it easier for them to deliver their products.

Here are just a few of the things the developer experience team does or supports. We run a number of engineering labs for our product development teams. This covers everything from physically racking and maintaining the servers to running the networking and supporting the operating systems and virtualization layer.

We also help support their CI and CD pipelines. This can involve things like running Jenkins and very large build farms. We also support various developer services, ranging from artifact repositories to the software and endpoint scanning solutions we run for security. Finally, we continually develop and release automation to enable all of these activities and, in general, make our developers' lives easier.

The developer experience team is a small, dedicated team of engineers from a wide range of backgrounds. Given the work I just described, we have everyone from data center specialists to networking specialists, DevOps specialists, compliance people, and project managers. Each team member tends to have broad experience across several disciplines, so we can easily switch between the types of tasks we're asked to do.

Finally, we support the delivery of various types of product offerings. Some of the product teams we work with ship very traditional-looking software, where it's something that's put up for download and the customer downloads it and installs it in their data centers. We also support teams that deliver software-as-a-service solutions, and we even support teams that deliver hardware products.

Earlier, I talked a little bit about compliance, so let's dig into that a little bit more. Broadly, compliance can mean a lot of different things, but I like to boil it down to three primary obligations when we're talking about compliance.

First: ship only the software you need. This one's pretty simple. Don't include software in your product that you don't need. In addition, make sure the software you do need is configured to enable and do only the things you need it to do.

Number two: keep your software up to date. Again, this is fairly simple. If you ship a component, you need a way to ship updates to it, so that if there's a bug or a security issue in that component, you can quickly deliver an update to your customers.

Number three, and the one I want to dig into a little more: what's called open source software compliance.

Open source software is something we're all familiar with in DevOps and across all of IT. Most of us use open source software every day, often without realizing that the components we're using are open source, or without fully understanding what being open source actually means for a given component. Open source software exists within a legal framework that creates obligations for both developers and distributors. Every piece of open source software you use comes with its own license, and that license has requirements you must meet as a developer or software distributor.

Not all licenses are as open or as free as you might think. There's been a recent trend toward what are called source-available licenses and what I like to call open-ish licenses. A recent example of these open-ish licenses is what's been happening between Elastic and AWS over the licensing of Elasticsearch.

Additionally, license obligations can differ based on how you intend to use the software. Some licenses permit things in a SaaS offering that they would not permit in a traditionally distributed product that simply links against the component, or vice versa: something you can do when your software merely links against an open source component may not be permissible if you then deliver that software as a service.

Finally, I want to call out that the Linux Foundation has a great site to help people understand what open source software compliance is all about. It has sections for developers, distributors, and legal specialists, and it's an excellent resource if you haven't checked it out or have additional questions about open source software compliance.

Cisco faced this problem a few years back where we said to ourselves, "We have all these new container-based products and container-based platforms, so how do we meet these compliance obligations for containers?" We set out to try and answer that question for ourselves.

One of the early things we decided was that the SBOM, or software bill of materials, was going to be key to being able to meet all of those compliance obligations. After all, if we don't know what is inside of the containers we're shipping, how do we know if we're only shipping what we need? How do we know if it's up to date? And how do we know if we're complying with the licenses that those components have?

Additionally, one of the keys to having a good software bill of materials is that it has to be a complete inventory of all components you ship. What do I mean by all components? Components can be broken down into two categories.

There are first-order components: the things your product uses directly. Most developers can list these either off the top of their head or because they keep a configuration file in source control that defines their software's dependencies. In Python, for example, this is typically a requirements.txt file listing the Python modules your product needs.
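To make "first-order" concrete, here is a minimal sketch that reads a requirements.txt-style text and returns only the directly declared packages. It is a simplified parser written for illustration, not a full PEP 508 implementation (the `packaging` library's `Requirement` class handles the complete grammar), and the function name is hypothetical.

```python
import re

# Captures the project name at the start of a requirement line; whatever
# follows (extras, version specifiers, markers) is kept as the raw constraint.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def first_order_components(requirements_text: str) -> dict[str, str]:
    """Return {package_name: constraint} for each requirement line.

    Comments, blank lines, and pip options (lines starting with '-') are
    skipped. Names are lower-cased so 'Django' and 'django' collapse.
    """
    components = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # blank line or pip option
            continue
        match = _NAME.match(line)
        if match:
            components[match.group(1).lower()] = line[match.end():].strip()
    return components
```

Note what this cannot tell you: nothing in requirements.txt lists what `flask` or `requests` themselves pull in, which is exactly the gap described next.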

What most of these files do not capture are what we call second-order (and higher-order) components: the components those first-order components need in order to function. Or, as I like to call it, it's turtles all the way down.

The best example of this is: let's say you want to go install libfoobar into your favorite Debian operating system. What do you do? You SSH in, you run apt-get install libfoobar and hit enter, and then it says, "Oh, I want to install these 12 components." Well, what happened there? You only asked for one component. What of the other 11? Those are the second-order dependencies. Those are the things that libfoobar needs to do its job. An accurate SBOM has to encompass all of these components because all of them could have security issues and have their own unique license obligations.
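The "12 components for one request" effect is a transitive closure over the dependency graph. The sketch below walks that graph breadth-first from the first-order components; the graph contents are hypothetical, standing in for what a package manager such as apt resolves internally.

```python
from collections import deque


def all_components(direct: list[str], depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component reachable from the direct (first-order) ones.

    Breadth-first traversal; the `seen` set prevents revisiting packages,
    so shared dependencies and dependency cycles are handled safely.
    """
    seen = set(direct)
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        for dep in depends_on.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency graph for illustration only.
graph = {
    "libfoobar": ["libbar", "libc6"],
    "libbar": ["libc6", "zlib1g"],
    "zlib1g": ["libc6"],
}
second_order = all_components(["libfoobar"], graph) - {"libfoobar"}
```

Here `second_order` is `{"libbar", "libc6", "zlib1g"}`: none of them were asked for, yet each can carry its own CVEs and license terms, which is why a complete SBOM must include them.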

So what compliance challenges did we face with container-based products when we got started several years ago? First and foremost, our existing tooling lacked container support. We've been shipping software and other products for decades, so we knew how to inventory those types of products, do compliance on them, figure out their licenses, and keep their software up to date. But a container was a new type of artifact for us, a new distribution method, and our existing tooling didn't really support it.

Additionally, the container ecosystem in and of itself provided a couple of challenges for us. Oftentimes what happens is someone just posts a container to their favorite registry, like Docker Hub, and then a bunch of people download it, but they don't really know what's inside of that container. So of course, there was unknown content in there.

Also, we find that many of the publicly posted containers out there have a lot of unnecessary software installed on them. A lot of people come from a virtualization background or even a hardware background where they think, "Oh, I need to install my operating system, and then after I've installed my operating system, I need to put in the software components that I need in my application." Containers can work fundamentally differently, and they enable us to dramatically reduce the footprint of the software that we ship inside those containers, thus reducing the surface area for attacks.

Finally, a lot of containers are built, published once, and then the developer moves on. Or the developer keeps releasing updates to the software but uses the exact same base image month after month, year after year, so some of the base components get out of date.

What were our requirements as we approached this problem? First and foremost, as I said, we decided that the SBOM was going to be key for us, so we wanted really good SBOM reporting. We wanted a tool that could generate an SBOM containing all the components, both first-order and second-order, then run vulnerability analysis on that SBOM and perform some amount of open source license analysis on it.

We additionally needed to be able to scan and catalog all of our releases. We needed to have retention based on the type of artifact we were scanning. If it was a dev build or a pre-release or a beta build, we only needed to keep that for a couple of weeks usually, maybe a few months, versus a full release that we wanted to keep in perpetuity.
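A retention rule keyed on artifact type could be expressed as a small lookup table. This is a sketch under assumptions: the type names and the exact windows are hypothetical, chosen to match the talk's "a couple of weeks, maybe a few months" for non-release builds and "in perpetuity" for releases.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention windows per artifact type; None means keep forever.
RETENTION = {
    "dev": timedelta(weeks=2),
    "prerelease": timedelta(days=90),
    "beta": timedelta(days=90),
    "release": None,
}


def expires_on(artifact_type: str, scanned: date) -> Optional[date]:
    """Date after which a scan record may be purged, or None to keep it forever."""
    window = RETENTION[artifact_type]
    return None if window is None else scanned + window


def should_purge(artifact_type: str, scanned: date, today: date) -> bool:
    """True once a non-release scan record is past its retention window."""
    expiry = expires_on(artifact_type, scanned)
    return expiry is not None and today > expiry
```

Keeping the policy as data rather than branching logic makes it easy to adjust a window without touching the purge job.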

Additionally, we needed to make sure that we could support the full life cycle of the artifact. Again, development artifacts don't live very long, but a release may be released and supported for quite some time. So we needed to make sure that as long as that release was available for download and fully supported, we were continually monitoring it for vulnerabilities, ensuring that we knew what the issues were, and being able to report those to customers and release updates.

Finally, we needed our solution to integrate with the existing tooling inside Cisco that was already designed to do that vulnerability management and license compliance.

What was our approach to solving this problem? One of the things we decided early on is that we wanted to leverage existing tooling for SBOM and vulnerability data. There's a huge industry out there for analyzing your software products of different types, including containers, generating software bill of materials, and then doing vulnerability analysis on top of that by pulling data from all the various vendor feeds, NVD, GitHub, and other sources. We wanted to survey the market and find a leading tool that would be able to do this for us that we could then integrate on top of.

Then what we decided to do was write the middleware to integrate with the existing Cisco tooling. It made more sense to us to do that ourselves than to try and find a vendor that could both provide a really great tool for SBOM and vulnerability data and then do custom integration for us.

We obviously looked at a bunch of different products years ago, and we finally decided to choose a suite of products that comes from a company called Anchore. They provide our core container scanning solution. Anchore has a number of offerings that they make available, and we use most of them.

Anchore Enterprise, the first service we started with, is a persistent service that can monitor images stored in Docker registries, generate software bills of materials for them, and perform continual, ongoing vulnerability analysis.

More recently, Anchore has released two new open source tools, Syft and Grype. Syft is designed to generate a software bill of materials from a number of different artifacts. It primarily works from a local image, a saved image in the form of a TAR file, or an image in a registry, but it can also analyze things like software distributions unpacked on disk.
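As a rough illustration of driving Syft from automation, the sketch below builds a `syft <scheme>:<source> -o json` command for the source types just mentioned and pulls (type, name, version) out of the `artifacts` array in Syft's JSON output. It assumes the `syft` binary is on PATH; the wrapper function names are hypothetical, and the JSON schema should be checked against the Syft version you run.

```python
import json
import subprocess

# Source schemes Syft accepts: local Docker daemon, saved TAR, registry, directory.
_SCHEMES = {"docker", "docker-archive", "registry", "dir"}


def syft_command(source: str, kind: str = "registry") -> list[str]:
    """Build a Syft invocation that writes its native JSON SBOM to stdout."""
    if kind not in _SCHEMES:
        raise ValueError(f"unsupported source kind: {kind}")
    return ["syft", f"{kind}:{source}", "-o", "json"]


def packages_from_syft_json(doc: dict) -> list[tuple[str, str, str]]:
    """Extract sorted (type, name, version) tuples from Syft's 'artifacts' list."""
    return sorted(
        (a.get("type", ""), a["name"], a.get("version", ""))
        for a in doc.get("artifacts", [])
    )


def generate_sbom(source: str, kind: str = "registry") -> dict:
    """Run Syft and return the parsed SBOM; raises if Syft exits non-zero."""
    result = subprocess.run(
        syft_command(source, kind), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `generate_sbom("build/app.tar", "docker-archive")` scans a saved image, while `kind="dir"` covers the unpacked-on-disk case.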

Additionally, there's the Grype tool. Where Syft is about generating software bill of materials, Grype is about taking those software bill of materials and then giving you vulnerability analysis for what's in that content.
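The hand-off from Syft to Grype can be sketched as follows: Grype's `sbom:` source scheme reads an existing SBOM file, and its JSON report contains a `matches` array whose entries carry a `vulnerability` object with a `severity`. The helper names are hypothetical, the `grype` binary is assumed to be on PATH, and the sample report used to exercise the summary is illustrative, with placeholder CVE IDs.

```python
import json
import subprocess
from collections import Counter


def grype_command(sbom_path: str) -> list[str]:
    """Grype invocation that analyzes a pre-generated SBOM instead of an image."""
    return ["grype", f"sbom:{sbom_path}", "-o", "json"]


def severity_counts(report: dict) -> Counter:
    """Count Grype matches by vulnerability severity (e.g. 'Critical', 'High')."""
    return Counter(
        m["vulnerability"].get("severity", "Unknown") for m in report.get("matches", [])
    )


def scan_sbom(sbom_path: str) -> dict:
    """Run Grype against an SBOM file and return its parsed JSON report."""
    out = subprocess.run(
        grype_command(sbom_path), capture_output=True, text=True, check=True
    )
    return json.loads(out.stdout)
```

Splitting generation (Syft) from analysis (Grype) means an SBOM produced once at build time can be re-scanned later as new vulnerability data arrives.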

Finally, Anchore makes available a number of different integrations for Anchore Enterprise, Syft, and Grype to allow you to easily integrate with things like GitHub, Jenkins, and the other various components that exist within your CI and CD pipelines.

The Anchore solution can inventory and monitor a number of different package types to generate the SBOM and report CVEs. This includes standard OS packages on distributions like Ubuntu, Debian, CentOS, and Alpine; it knows how to interrogate those package managers to figure out what's installed. It can also analyze the image's file system, find Python, Java, Node.js, and Ruby gem components installed outside the package manager, include those in the software bill of materials, and report their vulnerabilities.

Additionally, Anchore Enterprise supports a form of what's called self-discovery, where you can basically write a hint file that you include in your container and provide an additional list of things that you want Anchore to include in your bill of materials and do vulnerability analysis for. This enables you to package up some of your dependencies in different ways that may not be discoverable or easy for Anchore to figure out. Finally, Anchore Enterprise supports a very rich policy engine that you can write different rules in to enforce different types of container best practices.

Why did we choose Anchore? Several years ago when we surveyed the space, there were obviously several players. There are a lot more now. We put several of them through a series of evaluations, and the reason we chose Anchore is the following.

First and foremost, it had a very easy-to-work-with and very clean REST API. This was critical to us because one of the key things we had decided was that we were going to write the middleware integrations with the existing internal Cisco tooling. We needed a service that could serve up the SBOMs and the vulnerability data to us in a very clean and easy-to-consume way. Our preference, of course, is REST APIs.

Additionally, when we compared the various solutions, Anchore produced a very accurate software bill of materials and had a very good accuracy rate on the vulnerability data and the vulnerabilities it was reporting.

As I previously mentioned, Anchore has a number of integrations that were available out of the box that enabled us to quickly start integrating parts of it into our ecosystem as we evolved. They have a great support system. We've had tremendous success opening support tickets with them.

Finally, they truly wanted to form a partnership with us. They wanted to understand what our needs were and how they could evolve the product moving forward to help us meet our goals. That's something that we value very highly, since that's the approach we try and use internally with our engineering teams and our products.

I'm happy to say it's been a great partnership with Anchore. They've really listened to us. The product has continued to evolve in a great direction. They've added a number of features for us that have added tremendous value to what we do.

What did we do with Anchore then? We ended up creating a couple of different implementations. The first thing we created is a service that we called Helios. This is an in-house RESTful service that we use for cataloging and reporting on releases.

When a Git tag is pushed to one of our repositories, it triggers our pre-release or release workflow. The format of the tag determines whether this is a pre-release or a release, and each workflow can take different actions. We then send the release, along with all the container images that make it up, into Helios.
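The tag-format check can be as simple as two regular expressions. The talk does not say what Cisco's tag conventions are, so the `vMAJOR.MINOR.PATCH` and `-rcN` / `-betaN` / `-alphaN` patterns below are hypothetical examples of the idea.

```python
import re
from typing import Optional

# Hypothetical tag conventions: v1.2.3 is a release; v1.2.3-rc1 or
# v1.2.3-beta.2 is a pre-release.
RELEASE = re.compile(r"v\d+\.\d+\.\d+")
PRERELEASE = re.compile(r"v\d+\.\d+\.\d+-(rc|beta|alpha)\.?\d+")


def workflow_for_tag(tag: str) -> Optional[str]:
    """Map a pushed Git tag to the workflow it should trigger, if any."""
    if RELEASE.fullmatch(tag):
        return "release"
    if PRERELEASE.fullmatch(tag):
        return "pre-release"
    return None  # not a release tag, so nothing is cataloged
```

Using `fullmatch` rather than `match` keeps a tag like `v1.2.3-hotfix` from being mistaken for a release.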

Helios pulls those images from the registries, sends them to Anchore, and waits for Anchore to analyze them and generate the SBOM, which Helios then retrieves. Anchore also continually re-runs vulnerability analysis on those images, and Helios receives a continual feed of new vulnerabilities from Anchore.
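The "send, then wait for analysis" step is essentially a polling loop. Helios itself is an in-house service, so this is only a sketch: `get_status` is a hypothetical callable wrapping whatever REST call reports an image's analysis status, and the terminal status names `analyzed` and `analysis_failed` are assumed from Anchore's image analysis states and should be verified against your Anchore version.

```python
import time
from typing import Callable

# Assumed terminal analysis states; verify against your scanner's API.
TERMINAL = ("analyzed", "analysis_failed")


def wait_for_analysis(
    get_status: Callable[[], str],
    timeout_s: float = 1800,
    poll_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until analysis reaches a terminal state or the timeout elapses.

    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"image still '{status}' after {waited:.0f}s")
        sleep(poll_s)
        waited += poll_s
```

Returning `analysis_failed` instead of raising leaves the caller free to record the failure against the release in the catalog.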

Helios also allows us to do the state management for the releases. Helios understands things like RC versus released versus EOL and understands how to communicate that to the other Cisco internal tools to indicate what the status of a given release is.
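Release state management maps naturally onto a small state machine. The three states come from the talk; the allowed transitions below (an RC can ship or be abandoned, a release eventually reaches EOL, EOL is terminal) are an assumption for illustration.

```python
from enum import Enum


class ReleaseState(Enum):
    RC = "rc"
    RELEASED = "released"
    EOL = "eol"


# Hypothetical transition rules; EOL is terminal.
ALLOWED = {
    ReleaseState.RC: {ReleaseState.RELEASED, ReleaseState.EOL},
    ReleaseState.RELEASED: {ReleaseState.EOL},
    ReleaseState.EOL: set(),
}


def transition(current: ReleaseState, target: ReleaseState) -> ReleaseState:
    """Return the new state, rejecting moves the rules above do not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move release from {current.value} to {target.value}")
    return target


def is_monitored(state: ReleaseState) -> bool:
    """Only releases that are still available and supported are watched for new CVEs."""
    return state is not ReleaseState.EOL
```

Tying `is_monitored` to the state keeps continuous vulnerability monitoring aligned with the support lifecycle described earlier.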

As I said, Helios synchronizes with all these internal Cisco tools for us so that it can accurately reflect our SBOM, as well as help us do the license compliance in our internal systems. Additionally, because it has all that vulnerability data, it helps us with vulnerability management. And of course, it provides real-time reporting for both the software bill of materials and the current vulnerabilities.

The next thing we wrote was a piece of software that we call Minerva. This is an in-house service that is designed to do trending analysis for vulnerabilities only. Obviously, there's a lot of focus on making sure that our software is secure, and there's a huge industry and a lot of people out there trying to find vulnerabilities and exploits all the time.

What this service does is enable us to do real-time and historical reporting on the vulnerabilities we have in various releases and components. It can chart what a release looks like over time as far as how many vulnerabilities it had at release and how many it has a week later, two weeks later, three weeks later. It can chart individual products as well as individual components release over release. This helps us get a feel for how different engineering teams are doing as far as improving their security posture.
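The "at release, one week later, two weeks later" view can be computed from dated scan snapshots. This is a sketch of the idea, not Minerva's implementation: it assumes each snapshot is simply the count of open vulnerabilities on a given scan date, and it uses the latest snapshot on or before each weekly checkpoint.

```python
from datetime import date, timedelta


def weekly_trend(snapshots: dict[date, int], released: date, weeks: int) -> list[int]:
    """Open-vulnerability count at release and at each of the following weeks.

    For each checkpoint, the most recent snapshot on or before that day is used.
    """
    ordered = sorted(snapshots.items())
    trend = []
    for week in range(weeks + 1):
        checkpoint = released + timedelta(weeks=week)
        counts = [n for day, n in ordered if day <= checkpoint]
        if not counts:
            raise ValueError(f"no scan on or before {checkpoint}")
        trend.append(counts[-1])
    return trend
```

With illustrative snapshots of 3 open issues on 2021-06-01, 5 on 2021-06-09, and 4 on 2021-06-20, `weekly_trend(..., date(2021, 6, 1), 3)` yields `[3, 3, 5, 4]`, showing new CVEs appearing after release and then some being fixed.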

Minerva pulls all the software bill of material and all the vulnerability data from Helios, as well as some vulnerability data from various other Cisco internal tools. It can then crunch all that and determine the vulnerabilities that exist in our product and how we want to address them. It integrates with the other Cisco internal tooling for vulnerability reporting, and it automates the engineering ticket creation for remediation of these vulnerabilities.

After Minerva has determined that a vulnerability is valid and would actually affect our software, it reaches out to Jira and creates a ticket in the appropriate queue. It sets all the right attributes: which image and component has the issue, which CVEs apply, which fixed versions, if any, are available and need to be installed, and the time to remediation based on the severity of the issue.
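A ticket like this could be sent as the JSON body of a `POST /rest/api/2/issue` request to Jira's REST API. The sketch only builds the payload. The remediation windows, the `Bug` issue type, and the labels are hypothetical, since the talk does not publish Cisco's targets and issue types vary by Jira project.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical time-to-remediation targets, in days, per severity.
REMEDIATION_DAYS = {"Critical": 7, "High": 30, "Medium": 90, "Low": 180}


def jira_issue_payload(
    project_key: str,
    image: str,
    component: str,
    cves: list[str],
    fixed_version: Optional[str],
    severity: str,
    found: date,
) -> dict:
    """Build a Jira create-issue body with a due date derived from severity."""
    due = found + timedelta(days=REMEDIATION_DAYS[severity])
    cve_list = ", ".join(cves)
    fix = (
        f"Upgrade {component} to {fixed_version}."
        if fixed_version
        else "No fixed version is available yet."
    )
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Bug"},
            "summary": f"[{severity}] {component} in {image}: {cve_list}",
            "description": f"Image: {image}\nComponent: {component}\nCVEs: {cve_list}\n{fix}",
            "duedate": due.isoformat(),  # Jira expects YYYY-MM-DD
            "labels": ["vulnerability", severity.lower()],
        }
    }
```

Deriving the due date from severity in one table keeps the remediation policy auditable and consistent across every ticket the automation opens.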

Finally, we've started shifting left by integrating pull request-based scanning. Here we use a combination of tooling provided by Anchore and some integration we wrote ourselves. Anchore provides a Jenkins plugin that enables Jenkins to scan directly against an Anchore Enterprise instance. We then wrote an in-house wrapper that parses the results available in the Jenkins build and posts them as a comment on the GitHub PR.

Today, this is a non-voting, informational-only vulnerability report. As a test, I cooked up an image that I knew had an issue in it and sent that through a fictitious PR to get it scanned. What you see here at the bottom is what the actual comment in our GitHub Enterprise instance would look like if any image had a particular issue. Here it tells you that this image had a vulnerability that's classified as high or greater. It tells you the name of the image. In this case, I made up a fictitious OpenJDK for OpenShift and installed a few things in it. You can see what the CVSS score is, as well as the CVE ID and the package that has that vulnerability.
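A wrapper like the one described might render its findings as Markdown before posting them through GitHub's `POST /repos/{owner}/{repo}/issues/{issue_number}/comments` endpoint (pull request conversation comments are issue comments). The `Finding` shape, severity ranking, and table layout below are hypothetical; they mirror the fields the talk mentions: image name, CVSS score, CVE ID, and package.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve: str
    package: str
    severity: str
    cvss: float


SEVERITY_RANK = {"Negligible": 0, "Low": 1, "Medium": 2, "High": 3, "Critical": 4}


def pr_comment(image: str, findings: list[Finding], threshold: str = "High") -> str:
    """Render an informational Markdown comment for findings at or above threshold."""
    floor = SEVERITY_RANK[threshold]
    hits = sorted(
        (f for f in findings if SEVERITY_RANK.get(f.severity, 0) >= floor),
        key=lambda f: f.cvss,
        reverse=True,  # worst CVSS score first
    )
    if not hits:
        return f"No vulnerabilities of severity {threshold} or greater found in `{image}`."
    noun = "vulnerability" if len(hits) == 1 else "vulnerabilities"
    lines = [
        f"**{len(hits)} {noun} of severity {threshold} or greater found in `{image}`**",
        "",
        "| CVE | Package | Severity | CVSS |",
        "|---|---|---|---|",
    ]
    lines += [f"| {f.cve} | {f.package} | {f.severity} | {f.cvss:.1f} |" for f in hits]
    return "\n".join(lines)
```

Because the comment is informational, the function never fails the build; turning it into a voting gate is a separate decision about exit status.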

What's next for us? We don't think we're done yet; there's still room to improve.

On the Helios side, we want to be able to start ingesting software bill of materials that are produced directly out of Syft. As I mentioned earlier, Syft has a number of different scanning modes, one of which is you can ask it to scan a directory on disk. Rather than having to have something packaged as a container, we can start scanning other types of artifacts that are just existing in a file system. We even think this will enable us to start doing limited scanning of full VM-based images. We want to get to the point where Helios can take either an existing Docker image or it can take a pre-generated software bill of materials from Syft.

We also want to enable better SaaS support. Anchore has what's called a Kubernetes Runtime Inventory plugin. This is a neat feature where you install a small agent on your kubelet worker nodes that continually monitors the images running as pods in your Kubernetes cluster and reports them back to Anchore. We want to start enabling this for some of our SaaS offerings and have that inventory reported back through Helios so that we can integrate it with other Cisco tooling.

Finally, we expect that there's going to be a lot of evolution as it relates to software bill of materials. Earlier this year, the Biden administration here in the United States issued the executive order on improving the nation's cybersecurity. It's a fairly comprehensive document about cybersecurity as a whole, but one of the key elements in there talks about the need and future requirements for vendors to provide accurate and up-to-date software bill of materials for all solutions delivered to the federal government.

Additionally, there are a number of competing standards for what a software bill of materials should look like; the two with the most support are SPDX and CycloneDX. We expect a lot of work in this space over the next year or two as we, and the industry as a whole, figure out exactly what the executive order means and how we're going to meet it. We expect that a standardized SBOM format will probably emerge, possibly based on SPDX or CycloneDX.
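For a sense of what a standardized SBOM looks like, the sketch below builds a minimal CycloneDX JSON document. `bomFormat`, `specVersion`, `version`, and the per-component `type`, `name`, `version`, and `purl` fields come from the CycloneDX specification; the helper names and the sample component are illustrative. Tools such as Syft can also emit CycloneDX and SPDX directly; check your version's help output for the supported format names.

```python
import json


def cyclonedx_bom(components: list[tuple[str, str, str]]) -> dict:
    """Minimal CycloneDX 1.4 BOM from (name, version, purl) tuples."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,  # revision of this BOM document, not of the software
        "components": [
            {"type": "library", "name": name, "version": version, "purl": purl}
            for name, version, purl in components
        ],
    }


def to_json(bom: dict) -> str:
    """Serialize the BOM with stable key order for diffing between builds."""
    return json.dumps(bom, indent=2, sort_keys=True)
```

Package URLs (`purl`s) such as `pkg:pypi/flask@2.0.1` give each component an ecosystem-qualified identity, which is what lets downstream tools match it against vulnerability feeds.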

What are the next steps for Minerva, our vulnerability management system? One big thing we want to do is move more heavily into risk-based assessment. Today, much of our assessment, scoring, and severity is based purely on CVSSv3 scores and the elements that make up those scores. We want to broaden that by integrating additional tooling that takes a risk-based approach to evaluation and goes deeper than the CVSS score alone.
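For reference, the CVSS-only baseline maps a v3.x base score to the qualitative ratings defined in the FIRST CVSS specification (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). A risk-based approach would layer context such as exposure or exploitability on top of this; that part is not shown here.

```python
def cvss3_rating(score: float) -> str:
    """Qualitative severity rating for a CVSS v3.x base score."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"
```

The limitation motivating the risk-based work is visible here: two findings scored 7.5 get the same "High" rating whether or not the vulnerable code is actually reachable in the product.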

Additionally, we're looking forward to incorporating new data available in the latest major release of Anchore, Anchore 3, including vendor disposition. Some vendors, such as Debian or Red Hat, evaluate a CVE and conclude, "Yes, it's valid, but it's of such low severity or risk that we're choosing not to fix it," and flag it as won't fix. We want to incorporate the concept of won't fix, along with a risk-based assessment, into our workflow so that we can more accurately determine whether a vulnerability is one we should, and can, take action on.
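The "should we, and can we, act" question can be sketched as a filter that defers findings the vendor has marked won't-fix and findings with no fixed version to upgrade to. The finding dictionaries and their `fix_state` and `fixed_versions` keys are a hypothetical shape, not Anchore's actual response format.

```python
from typing import Iterable


def actionable(findings: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split findings into (actionable, deferred).

    Deferred: the vendor flagged the CVE won't-fix (we shouldn't act), or
    no fixed version exists yet (we can't act). Everything else is actionable.
    """
    act, deferred = [], []
    for f in findings:
        if f.get("fix_state") == "wont-fix" or not f.get("fixed_versions"):
            deferred.append(f)
        else:
            act.append(f)
    return act, deferred
```

Deferred findings are kept rather than dropped, so they can be re-evaluated if the vendor later ships a fix or the risk assessment changes.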

Finally, the PR-based scanning is probably where we have the most work that we want to look to do. We want to move to using Syft predominantly to generate a local software bill of materials that we then send to our Anchore Enterprise instance for vulnerability analysis. By moving to Syft, as I've said earlier, this should allow us to expand to be able to support non-container-based artifacts with this workflow.

We want to make our PR-based scanning a voting gate job once we incorporate some of that risk management. We also want to add a license compliance check to PR-based scanning. Today, all license compliance is done by following a set of rules that live in external tools or other tools internal to Cisco. We want to shift that left into the PR scanning process to get an early view of any potential license issues.
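A voting gate ultimately comes down to a CI step's exit status. This sketch combines a count of blocking vulnerabilities with a license check over SBOM packages. The denylist of SPDX identifiers is purely hypothetical: Cisco's actual rules live in internal tools, and, as noted earlier, whether a license is a problem depends on how the software is distributed.

```python
import sys

# Hypothetical policy: SPDX license IDs that require legal review before merge.
DENIED = {"AGPL-3.0-only", "AGPL-3.0-or-later", "SSPL-1.0"}


def license_violations(packages: list[dict]) -> list[str]:
    """List 'name version: license' for each package carrying a denied license."""
    problems = []
    for pkg in packages:
        for lic in pkg.get("licenses", []):
            if lic in DENIED:
                problems.append(f"{pkg['name']} {pkg.get('version', '?')}: {lic}")
    return problems


def gate(vuln_blockers: int, packages: list[dict]) -> int:
    """Exit status for a voting CI job: 0 lets the PR merge, 1 blocks it."""
    problems = license_violations(packages)
    for p in problems:
        print(f"license check: {p}", file=sys.stderr)
    return 1 if vuln_blockers or problems else 0
```

In a Jenkins stage this would end with `sys.exit(gate(...))`, so a non-zero status turns the PR check red.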

Thank you. I hope this has been somewhat informative, and I hope you've enjoyed this presentation, and I hope you enjoy the rest of the week and the wonderful conference. Have a great day. Cheers.