VendorDome: CTO’s Dilemma: More Code, More Problem?
Full transcript
The complete talk, organized by section.
Stephen Magill
And welcome to the VendorDome. We're all really happy to be here. I've participated in a few of these VendorDome events in the past, and it's a really great opportunity to interact with subject matter experts, learn about trends in the industry, and just get your questions answered.
We have a great panel of experts with coverage across the spectrum of concerns that arise in modern software development processes: first-party code and static analysis of the code that you and your developers are writing; third-party code, the open source code that forms the foundation of so many applications; infrastructure as code, the infrastructure descriptions that live in the codebase and define the operational context of your applications; and container scanning, which provides visibility into that end product as it's deployed.
Please put questions in Slack about any of these topics: first-party code, third-party code, infrastructure as code, container scanning, trends in the industry, what you can do in your environment, things you're seeing, and questions about best practices. I have discussion points with me, but it's much better if it's an interactive session driven by all of you.
I'll start with myself. I'm Stephen Magill, former co-founder and CEO of MuseDev, which you might remember from previous DevOps Summits. MuseDev was acquired by Sonatype in March, so I'm now VP of Product Innovation at Sonatype and working on integrating those Muse technologies into Sonatype's products. My background is in static analysis. That's what Muse focuses on, and that's what I did my PhD work in. I'm particularly interested in how we can make static analysis tools less painful and more useful to developers, and make them a smooth, low-friction part of the development process.
Brian Fox
Hi, I'm Brian. As Stephen said, co-founder and CTO at Sonatype. I have a long background in open source development, involved in a lot of projects at Apache, including the Apache Maven Java build system. At Sonatype we've been the stewards of the Maven Central Repository for a long time, so we've had a long view on how open source dependencies have evolved, how usage has evolved and grown, and more recently how software supply chain attacks have been unfolding. That's a topic I've been speaking about for about four years. With the unfortunate rise of SolarWinds and Codecov and everything else that's going on, it's finally getting attention, which is both good and bad: bad in how it came about, good in that it's getting attention.
Josh Stella
Hi, everyone. Great to be here. My background is a little different. I've spent my career in and out of national security environments in the DC area. At Fugue, we do security for cloud deployments and cloud-based systems, starting with infrastructure as code static analysis all the way through analysis of running environments for misconfiguration errors and so on.
I've spent about 30 years as a software developer and software architect. Prior to founding Fugue, I was a principal solutions architect, I think the third or fourth solutions architect in the public sector practice at AWS, focused on national security, DoD, and IC environments. I've spent the last seven years purely working on this new software-definable infrastructure we have in cloud and the security and correctness concerns around it.
Tracy Walker
Hi, everyone. Tracy Walker with Neuvector. I'm a solution architect with about 25 years in the software industry as a developer, in operations, and in consulting. Here at Neuvector, we're very focused on container security, full lifecycle container security both from a scanning perspective and for running environments in Kubernetes, where we're able to see network traffic, identify threats, and really become a security plane within your Kubernetes deployments.
Q&A
01 Where do you start, and what can you trust?
Stephen Magill: With modern supply chains, including tens or hundreds of dependencies, codebases growing in size, infrastructure becoming more complex, and containers pulling in even more third-party software, it can be hard to know where to even look for vulnerabilities. Or if you flip that around, what can you trust? Where do you start? Where do you start when you think about security in your domain, and is there any root of trust that you can link to? Maybe starting with you, Josh.
Josh Stella: I'm not a big believer in trusting without verification. You really need to fully understand what is actually going to get deployed. Our focus is primarily on infrastructure as code and cloud infrastructure. In the data center days, infrastructure was relatively static and you deployed applications to it. In the cloud, the applications drive the infrastructure, often in real time. There's constant mutation of infrastructure environments, and all the big hacks in cloud leverage misconfiguration of those resources.
The press often focuses on public S3 buckets, but that's an oversimplification of most breaches. They're more sophisticated than that, and they often chain different misconfigurations together. In the cloud world there is never a moment where you can really relax. You have to check everything in advance. You have to dereference whatever code you're writing in terms of what it's actually going to do out there. You can't just look at cloud components in isolation; you have to look at them combined with other resources in the cloud. If your containers are deployed into an insecure network configuration, you need to know that context.
Also, most things you're building in cloud are mutable. There are immutable kinds of resources, but most are mutable and there are lots of ways to mutate them. Even once you've deployed, maintenance windows and other paths often allow changes to be made, even if you think it's all going through the CI/CD pipeline. So trust, to me, in cloud infrastructure, is about knowledge: keeping knowledge fresh and current, being aware of what's going on, and not thinking you've gotten it right once and therefore it's still right. Constantly verify and stay on top of things.
Stephen Magill: Tracy, as you look at containers and operation, what do you see in terms of trust and the vulnerability surface there?
Tracy Walker: Echoing what Josh said: trust but verify. When we talk with customers doing a lot of scanning early in development, we've seen a pattern of customers starting to use two scanners, because different approaches to scanning and identifying vulnerabilities are a good thing. Especially when we're talking about trust and sources of truth, you need the ability to verify what you're seeing and do it in an automated fashion.
If we're going to add more security, it makes sense to let the machines help us automate a lot of that security. We can identify network traffic, maybe using the network traffic itself as the source of truth. You need actionable, accurate, complete information that you can take action on, and confidence in the sources of truth you're using, so that if you see something anomalous in a zero trust environment, you can investigate it. The more automation that can make that possible, the better.
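The two-scanner pattern Tracy describes can be automated as a simple cross-check. The following is a minimal sketch, assuming each scanner's report has already been reduced to a list of finding IDs; the function name `cross_check` and the `VULN-n` identifiers are hypothetical and do not come from any product discussed here.

```python
def cross_check(scan_a, scan_b):
    """Compare finding IDs reported by two scanners for the same artifact."""
    a, b = set(scan_a), set(scan_b)
    return {
        "confirmed": sorted(a & b),  # both scanners agree
        "only_a": sorted(a - b),     # reported by scanner A only; verify
        "only_b": sorted(b - a),     # reported by scanner B only; verify
    }


# Hypothetical finding IDs, for illustration only.
report = cross_check(
    ["VULN-1", "VULN-2", "VULN-3"],
    ["VULN-2", "VULN-3", "VULN-4"],
)
print(report)
```

Findings that only one scanner reports are the ones worth a second look, since they are either a gap in one tool or a false positive in the other.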
Stephen Magill: Brian, when we're talking about trust, your topic earlier was supply chain attacks and whether you can even trust upstream sources of content when it comes to third-party code. Maybe you could comment and expand on that.
Brian Fox: There are two interrelated things. First, you need to understand what components are in your software and where they came from. That's an old message many of us in DevOps and DevSecOps have been carrying for what feels like a decade, and yet so many companies don't do it. Last year's State of the Software Supply Chain study said about 50% had a process, which is up from the teens several years ago, but if you were choosing a car or getting on a plane, would you be satisfied if only 50% of the parts had been vetted by the manufacturer before the assembly line put them in? I don't think so. We shouldn't be happy with that number in software.
Second, many software supply chain attacks, especially as they evolved over the last year, focus on your developers and development infrastructure. Early attacks like Struts focused on consumers of open source as common-mode failure targets. Phase two attacked open source publishers, with malware aimed at stealing credentials to publish to public repositories, not necessarily at minting Bitcoin or stealing end-user data. The third phase is attacks like the Verkada camera incident, where attackers went in through a Jenkins server and then into the rest of the organization. It wasn't about getting into software distributed to the end user. It was about using development infrastructure to directly get into the company.
I hammer that in talks because I still see many legacy application security programs designed to protect the cars, but not the factory. Knowing what's in the car and making the car better is something you should do, but doing that alone doesn't stop an intentional attack on the factory. That's where the game is really happening right now.
Josh Stella: We see the same thing in cloud infrastructure. Some people say, "If we've secured production, we're okay." No, you're not. If I'm a hacker, dev is much more attractive because the liberties granted in dev create many more opportunities, particularly with less monitoring than production. You often see breaches where a test environment was hacked and the consequences were severe. The traditional security mindset is to protect the cars or build a fortress around prod, when dev and test are very attractive targets.
Brian Fox: It's magnified in cloud-native development because development infrastructure probably has the keys to all the prod kingdoms. You may have many boundaries around prod, but if automation deploys it and someone gets into that automation infrastructure, game over.
Josh Stella: Even if you're good about credentials hygiene, which is rare and hard, cloud has attack vectors that didn't really exist in the data center. People use snapshots of disks and databases, effectively cloud backups, to instantiate new databases in other accounts or outside production. Standing up a new database cluster is now a way to steal data from a database. In the data center, a hacker might not even know what backup solution you used, but in cloud those backups may be sitting there in S3. That becomes a very attractive surface.
02 How do you close dev and staging gaps without killing innovation?
Stephen Magill: How do you close these gaps in the developer environment and staging without killing innovation? How do you balance that?
Brian Fox: That has been our mission over the last 10 years. We try to let the organization define policy in a way that can be automatically detected and evaluated against components. Developers need the ability to choose components without approval on every single one because there may be thousands, tens of thousands, or hundreds of thousands in an organization. There's no way to deal with that manually.
With recent supply chain attacks, we've had to take different approaches that understand the behavior of a typical project: who's committing to it, what types of dependencies they use, where releases are from, when releases happen. Then we can identify potential behavioral shifts that indicate something is fishy. Those algorithms were instrumental in early detection of what later became dependency confusion attacks. Our system was flagging those since last summer, before the research came out in February, precisely because they had malware-like behaviors.
When a new version appears in a repository, it's not sufficient to wait weeks or months for the community to figure out that it was a bad release or somebody hacked in. By then you're exploited. We're trying to approach this like a credit card company that can detect a suspicious transaction while you're still standing at the register and not allow it to go through.
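One signal associated with dependency confusion is an internal package name that also shows up on a public registry with a higher version number. The sketch below illustrates only that name-collision check; it is not the behavioral analysis Brian describes. It assumes purely numeric dotted versions, and the package names and the `find_confusion_candidates` helper are hypothetical.

```python
def parse_version(version):
    """Turn '1.4.0' into (1, 4, 0). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def find_confusion_candidates(internal, public):
    """Flag internal names that also exist publicly with a higher version."""
    flagged = []
    for name, internal_version in internal.items():
        public_version = public.get(name)
        if public_version is None:
            continue  # name is not published publicly
        if parse_version(public_version) > parse_version(internal_version):
            flagged.append((name, internal_version, public_version))
    return sorted(flagged)


# Hypothetical data: internal registry contents and a public index snapshot.
internal_packages = {"acme-billing": "1.4.0", "acme-auth": "2.1.0"}
public_index = {"acme-billing": "99.0.0", "some-public-lib": "3.0.0"}

candidates = find_confusion_candidates(internal_packages, public_index)
print(candidates)
```

A resolver that prefers the highest version across registries would pick the public `99.0.0` release here, which is why a collision like this deserves investigation before any build pulls it in.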
Stephen Magill: One thing we found last year in supply chain research was that it doesn't have to be a trade-off between productivity and security. There are companies that achieve both day to day. It comes down to process and tooling. If it takes three months to go through a heavy approval process to import a new library, that won't work. But if you have automated processes, you can still be agile while being secure.
Josh Stella: For decades we've had development tools like compilers, interpreters, and debuggers that tell engineers when we're wrong, usually around functional concerns. Security was this other concern in the organization, often handled by non-programmers manually using tools and SIEMs. There was a big gap between those two. If you bring the same kind of feedback loop into the toolset developers are using for security, you can return security information in a developer-friendly way.
A lot of people who aren't developers think developers have a master plan in their heads and then commit it to code. We actually try stuff until it works a lot of the time. If you add security as one of the checks on what you're trying, and into your definition of working, then you can bake this stuff in sooner. I'm optimistic about the future of security because automation can let it get baked in earlier and more fully.
Tracy Walker: In Kubernetes, there are many benefits from microservice container orchestration and how it abstracts away the network, but we've shifted a lot of security responsibility onto developers, asking them to write network security policies and pod security policies as they approach deadlines or the end of a sprint. We need automation so developers don't have to do that manually. Use machines, behavioral analysis of network traffic, and processes that are running to collect sources of truth, generate security policies, and apply them automatically to other environments in the pipeline. That's the kind of automation necessary to create a zero trust environment and limit anomalous behaviors you want to investigate later.
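The idea of learning policy from observed behavior can be sketched in a few lines. This is a simplified illustration, assuming connections are recorded as (source, destination, port) tuples during a trusted baseline run; the service names and helper functions are hypothetical, and real products inspect far more than this.

```python
def build_allowlist(observed):
    """Collect the distinct connections seen during a trusted baseline run."""
    return set(observed)


def find_anomalies(allowlist, live):
    """Return live connections that were never seen in the baseline."""
    return [conn for conn in live if conn not in allowlist]


# Hypothetical traffic: (source service, destination service, port).
baseline = [("web", "api", 8080), ("api", "db", 5432), ("web", "api", 8080)]
allowlist = build_allowlist(baseline)

live_traffic = [("web", "api", 8080), ("web", "db", 5432), ("api", "db", 5432)]
anomalies = find_anomalies(allowlist, live_traffic)
print(anomalies)
```

Here the web tier talking directly to the database was never part of the baseline, so it surfaces as something to investigate or block, without a developer having written that rule by hand.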
Josh Stella: The hackers are automated. The time between putting a public IP address or DNS record on the internet and that endpoint being examined for vulnerabilities is less than seven minutes. If you're not automated in understanding your vulnerabilities, the first people who know about them are hackers, not you. Security is a knowledge war. It's about being aware of what you have and denying that information to hackers. The last mile is the CVE exploit or misconfiguration breach, but the rest is finding what's there. If you're not using automation to understand your own security posture, you're inviting headlines.
Stephen Magill: That's an interesting thought: when you launch a new product, your first user will be a hacker.
03 Is one tool enough?
Stephen Magill: A question asks whether this is even possible to tackle. It seems like an impossible task when you look at the number of dependencies many projects work with. The tools are getting better, but there still seems to be a gap between tools and the problem. Can you be okay with a single approach or tool, or do you need multiple layers and defense in depth? What are you trading off if you don't do that?
Tracy Walker: No single tool is usually adequate because the people you're working against are not using any single tool. They're using anything they can, whether it's an exploit in open source components or tooling you're using, or probing what you have that's publicly facing. It comes back to what you know. It's a knowledge war: what you know versus what they know.
There are many CVEs that don't get patched, and knowing that you have those vulnerabilities is knowledge you should have. Probably the most important thing you can do to fight the unknown is make sure you understand the known: the known behaviors of your applications, network connections, and processes, so you can establish a zero trust perimeter. Zero trust doesn't always have to mean blocking everything. Sometimes what people need is accurate, actionable information, so that you become aware of anomalous activity, because those are the clues you'll follow to identify what's happening in your environment that you didn't want happening.
Josh Stella: The goal of writing software isn't to make secure software; it's to make useful software, with security incorporated so the usefulness manifests safely. Security is often seen as a tax, but doing it well doesn't have nearly as much tax as doing it poorly. Doing it poorly and manually is awful. I had to run applications through NIST 800 certification and accreditation processes, and it was painful security theater. But if you automate and give useful tools, you can go fast, leverage cloud computing, innovate quickly, and compete without being as exposed.
I also hear a lot about risk management: how much risk are we willing to take, and how much effort should we put in? But the blast radius is getting radically larger in these attacks and breaches. The Uber breaches, for example, apparently had credentials in a GitHub repo useful in production for a couple of years, with a large data blast radius. The notion that we can understand the risk and blast radius may be getting obsoleted. In modern cloud infrastructure, you don't have the same segmentation based on TCP/IP networks that existed in the data center.
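Credentials committed to a repository, as in the example Josh gives, are the kind of problem an automated check can catch before a push. The sketch below is a minimal illustration, assuming only the widely documented shape of AWS access key IDs (`AKIA` followed by 16 uppercase letters or digits); the sample text uses the example key from AWS documentation, and `find_key_ids` is a hypothetical helper, not a replacement for a real secret scanner.

```python
import re

# Shape of an AWS access key ID: "AKIA" plus 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_key_ids(text):
    """Return (line_number, match) pairs for strings shaped like key IDs."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((number, match.group()))
    return hits


# Sample file content using the placeholder key from AWS documentation.
sample = 'region = "us-east-1"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
hits = find_key_ids(sample)
print(hits)
```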
04 Why not just use cloud-native security tools?
Stephen Magill: Cloud environments provide their own tools. AWS has capabilities around securing the infrastructure. To what extent are those tools insufficient? What are the gaps? Why not just use the cloud-native tools that come with your environment?
Josh Stella: There's a reason I left AWS to found Fugue. AWS builds building blocks. If you look at their documentation on how to integrate the dozen or more security services they have, it's non-trivial and requires deep domain expertise. There are also things cloud service providers don't do fully because their business is selling cloud capacity. They want to accelerate utilization.
Many default behaviors in cloud seem nuts from a security perspective. For example, when you build a VPC network in AWS, if you don't specifically define egress rules, the default is egress on all ports to anywhere. From a data exfiltration perspective, that's a bad idea, but it's the default. I love AWS and worked there, but you have to understand their motivation: they want you secure and safe, but primarily they want to sell compute and data services.
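The open-egress default Josh mentions is straightforward to check for in code. The sketch below assumes security group data shaped like the EC2 `DescribeSecurityGroups` response (`IpPermissionsEgress`, `IpProtocol`, `IpRanges`, `CidrIp`), supplied here as plain dictionaries rather than fetched from AWS; the group IDs and the `has_open_egress` helper are hypothetical.

```python
def has_open_egress(group):
    """True if any egress rule allows all protocols to 0.0.0.0/0."""
    for rule in group.get("IpPermissionsEgress", []):
        all_protocols = rule.get("IpProtocol") == "-1"  # "-1" means all traffic
        anywhere = any(
            ip_range.get("CidrIp") == "0.0.0.0/0"
            for ip_range in rule.get("IpRanges", [])
        )
        if all_protocols and anywhere:
            return True
    return False


# Hypothetical groups: one with the allow-all default, one restricted.
groups = [
    {
        "GroupId": "sg-default-example",
        "IpPermissionsEgress": [
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
        ],
    },
    {
        "GroupId": "sg-restricted-example",
        "IpPermissionsEgress": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
            }
        ],
    },
]

open_groups = [g["GroupId"] for g in groups if has_open_egress(g)]
print(open_groups)
```

This sketch covers IPv4 ranges only; a fuller check would also look at IPv6 ranges and at rules that are nearly as permissive.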
At Fugue, we manage millions of resources across clouds. Most organizations at scale may not build single applications that span cloud providers, but they have applications in different cloud providers. You need a way to understand security posture across them, and none of the cloud service providers are motivated to do that for competitors.
Stephen Magill: Tracy, anything come to mind when it comes to behavioral analyses in cloud ecosystems that maybe leave gaps in scanning?
Tracy Walker: Absolutely. It goes back to the information war: what do you know versus what do they know? It's important to know which vulnerabilities you might have in your chain. Scanning for vulnerabilities and looking for CVEs is an IT debt wheel. We're constantly spinning on remediation, updating libraries, and patching. You have to do those. It's like a rearview mirror in a car: if you rented a car and it was missing the mirror, you'd probably get a different car. It's important for safety.
But you have to go another level. Before those CVEs are publicized, they exist in environments all over the world. You have to go beyond the historical remedial approach to IT debt and fixing vulnerabilities, and start looking at application behavior. In zero trust environments, that means not necessarily blocking everything, but squelching all the normal behavior that you know and trust and have seen in development, QA, staging, and production. If you can establish that known behavior, it's easier to identify anomalous behavior: network connections, bash running on containers, container escape. Multiple layers and multiple tools are the only way to establish known versus unknown.
Stephen Magill: That answers the second part of the question about unknown unknowns. If you're not assuming anything about the compromise level of some component you're interacting with, that zero trust approach says it doesn't matter what I don't know about this service; I have certain protections.
Tracy Walker: Exactly. We can't always judge whether it's good or bad from the beginning. But understanding that it's anomalous, whether it's an internal actor or external, means we can reduce the noise enough to find the needles in a smaller haystack.
Josh Stella: In cloud misconfiguration, if you're noticing behavior, it's too late. In networks, watching behavior is exactly what you want to do, and I agree with Tracy there. But for data exfiltration out of object storage, there is no traversal of packets over a customer-viewable network. There are only GET requests in a web log, and object stores are very good at serving content.
For unknown problems in cloud resources, one example is changes to machine identities. In cloud, identity and access management is how you build trust relationships and networks between infrastructure components. If something changes its identity or allowances, that's something to look at right away because it is effectively lateral network movement or privilege escalation. Many people still aren't aware that this identity-based network is a fundamental part of cloud security.
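Watching for changes in what a machine identity is allowed to do can be reduced to diffing two snapshots. The following is a minimal sketch, assuming each snapshot has already been flattened to a mapping from role name to a set of allowed actions; the role names and the `diff_permissions` helper are hypothetical, and real IAM policies carry conditions and resources that this ignores.

```python
def diff_permissions(before, after):
    """Report actions added to or removed from each identity between snapshots."""
    changes = {}
    for identity in sorted(set(before) | set(after)):
        old = before.get(identity, set())
        new = after.get(identity, set())
        added, removed = new - old, old - new
        if added or removed:
            changes[identity] = {
                "added": sorted(added),
                "removed": sorted(removed),
            }
    return changes


# Hypothetical snapshots of role permissions taken a day apart.
yesterday = {
    "build-role": {"s3:GetObject"},
    "app-role": {"sqs:SendMessage"},
}
today = {
    "build-role": {"s3:GetObject", "iam:PassRole"},
    "app-role": {"sqs:SendMessage"},
}

changes = diff_permissions(yesterday, today)
print(changes)
```

A build role that suddenly gains `iam:PassRole` is the kind of change the passage describes as worth looking at right away, because it widens what that identity can reach.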
05 Embedded systems and software bill of materials
Stephen Magill: We have a question about embedded systems. There you may have more knowledge of the environment and more control, but the supply chains can be involved and complex. With IoT and embedded systems, you're often running much more open source software than you realize. Brian, maybe you have comments on software bill of materials and SCA in embedded systems.
Brian Fox: The embedded space is its own microcosm. It's probably most invisible to downstream consumers. Embedded can mean medical devices or medical tools in hospitals, not just cameras and smart plugs. These devices can have significant impacts, and because they're distributed as hardware, people are used to inspecting physical build quality: how the car doors sound, how it looks, how the panel fits. But you can't see what's in the software, and the software probably matters more: airbags that don't deploy because the computer never told them to are a bigger problem than a door that sounds hollow.
In those instances, because software is distributed on hardware, the bill of materials is even more important. It's difficult enough to inspect a commercial package and figure out what's in it and what vulnerabilities it might drop on your network. There's zero visibility into the hardware part. For that reason alone, software bills of materials and those initiatives will become even more important. The rule is always trust but verify, but in this instance probably the only thing you can do is trust, so that's something we have to grapple with as an industry.
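What a software bill of materials makes possible is a mechanical lookup of shipped components against known advisories. The sketch below assumes an SBOM already parsed into a list of name and version records and an advisory list keyed by (name, version); the component names, the advisory ID, and the `affected_components` helper are all hypothetical and do not follow any particular SBOM format.

```python
def affected_components(sbom, advisories):
    """Return SBOM entries whose (name, version) appears in the advisory list."""
    flagged = []
    for component in sbom:
        key = (component["name"], component["version"])
        if key in advisories:
            # Copy the record and attach the matching advisory ID.
            flagged.append({**component, "advisory": advisories[key]})
    return flagged


# Hypothetical firmware SBOM and advisory data, for illustration only.
device_sbom = [
    {"name": "tiny-tls", "version": "1.0.2"},
    {"name": "mini-httpd", "version": "3.4.0"},
]
known_advisories = {("tiny-tls", "1.0.2"): "ADVISORY-0001"}

flagged = affected_components(device_sbom, known_advisories)
print(flagged)
```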
06 DAST, cloud services, and changing application boundaries
Stephen Magill: We had another question about where DAST is going in the future, specifically scaling it and speeding up runtimes. One thing that comes to mind is the trend of relying on cloud more for development, testing, security, and scanning as cloud-based services rather than traditionally on-prem offerings. That opens opportunity to optimize on the tool side, scale horizontally, and do things not feasible or cost-effective for an individual company. You see it in mobile app testing across tens or hundreds of devices. I think we'll see the same with DAST and other runtime-oriented security measures.
Josh Stella: The application boundary is changing radically. The more cloud native you get, the more the boundary shifts. Fugue, for example, is mostly built out of functions as a service, mostly Lambda on AWS. I think of cloud as giant distributed computers: the AWS computer, the Microsoft computer, the Google computer. When you're talking about truly distributed architectures, cloud has unleashed a million distributed application architects on the world. Where does infrastructure end and application begin? These become interesting tooling questions when you're trying to knock problems down to manageable chunks.
Because it's all software defined, real cloud is API. There's no human in the loop for real cloud services, which means we can use the knowledge of computer science and software engineering developed over decades to evidence correctness and do good things that we simply couldn't do when humans were shoving things in racks.
07 What security automation belongs around Jenkins and build servers?
Stephen Magill: Going back to the dev environment and staging, there's a question specifically about what kind of security automation you would put in place around Jenkins and build servers.
Brian Fox: The obvious part is all the things you would think about in a normal production scenario. What's less obvious is that the Jenkins server is building code, and in some ecosystems installing dependencies executes scripts. That's an injection point for remote code execution. Running unit tests is also executing that stuff.
The new attacks here are not necessarily trying to go in the front door of Jenkins and leverage misconfigurations in its API, although that happens. Those are traditional production runtime problems. I'm seeing more upstream supply chain attacks intended to get dependencies into the thing Jenkins is building, and then as it installs or runs a unit test, that's where the attack happens. That's the entry point many people don't think about because they think, "It's our code. We have to trust our developers not to do silly things, or we'll never get anything done." But what about the 10,000 other developers you don't know, whose names or motivations you don't know, who represent all the open source software you're pulling?
For the most part, it's not the open source maintainers doing something nefarious. There are very few cases where someone on the project did something nefarious. It's credentials getting stolen, commits introducing intentional bugs that are hard to detect, or a typosquatted component that developers inadvertently pull in. Just that act can execute code on their system, and within the last month we've seen attacks trying to put backdoors on developer machines. If that isn't caught, gets committed, and runs on Jenkins, the same thing happens there. These servers are running code from people you never thought of. That's the eye-opening moment that should make people reevaluate how they're thinking about this.
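A typosquatted component differs from a trusted name by only a character or two, which makes edit distance a reasonable first filter. The sketch below is one simple way to flag such names, assuming a list of trusted package names is available; the package names, the distance threshold of 2, and the `likely_typosquats` helper are hypothetical choices for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def likely_typosquats(requested, trusted, max_distance=2):
    """Flag requested names that are close to, but not equal to, a trusted name."""
    flagged = []
    for name in requested:
        if name in trusted:
            continue  # exact match with a trusted name
        for known in trusted:
            if edit_distance(name, known) <= max_distance:
                flagged.append((name, known))
                break
    return flagged


# Hypothetical package names, for illustration only.
trusted_names = ["acme-http", "acme-json"]
requested_names = ["acme-http", "acme-htpp", "acme-jsom", "unrelated-tool"]

suspects = likely_typosquats(requested_names, trusted_names)
print(suspects)
```

Running a check like this before dependencies are installed matters because, as described above, the install step itself can execute code on a developer machine or build server.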
Tracy Walker: Brian Fincher mentioned the US Air Force Platform One in Slack. That's a great example of what the US Air Force and DoD are doing: pre-hardening, pre-scanning, and pre-vetting not only vendor applications but open source tools, operating systems, and everything used to build containers. They pre-vet those so they have a certificate to field within the DoD and can assure people, "Come and use the tools out of this pot because we've already scanned these and made sure they're good." Many companies may already have infrastructure teams pre-vetting things, but it's a team effort and sometimes you need those early pre-vetting steps before bringing things into your environment.
Brian Fox: You probably need to reevaluate blast radius, like Josh was talking about, and minimize it on developer machines and CI infrastructure. Make sure the part running unit tests can't get to prod. If you're kicking off Terraform scripts from the Jenkins build automatically, and the same system is touching all of these things, that seems like poor separation of concerns. It may mean more sophisticated credential management. It may also mean containers, VMs, or building in the cloud so you can control and quarantine those environments. The mental model is: if someone you didn't know except through email said, "I want to send you some code, and I'm going to have you build it and run it," and you absolutely had to do that, how would you make yourself safe? It probably looks like what security researchers do when investigating viruses. Anything other than that leaves doors unclosed.
Josh Stella: One small note: don't move your production data into dev or test, please. We see that. It's obvious to say, but it still happens and it's still bad.
08 Product strengths and closing
Stephen Magill: What Brian was talking about highlights the challenge for tools in this space: making more rigorous security approaches low friction. We had a question to end on: what does each of you see as your product's strengths? Tracy?
Tracy Walker: Our product strength with Neuvector and Nexus Container, our partnership with Sonatype, is behavioral analysis using network traffic and container processes as sources of truth. There are many sources of truth for monitoring applications and doing security, but using actual network traffic, validating at layer seven, and seeing network attacks as they come in is a capability Neuvector was built to give back to developers and operators. They were losing that when they started using orchestrators like Kubernetes, OpenShift, and similar platforms.
That ability to use network traffic as source of truth means we can build security policies automatically based on behavior. We're not using settings or other sources to build those policies. We're using the best source of truth, network traffic, which lets us be accurate with those policies and enforce them, or at least create zero trust bubbles.
Stephen Magill: Josh?
Josh Stella: We're all about securing your cloud environments. We have an exciting partnership with Sonatype and Nexus where we're able to use our technology to scan Terraform templates and infrastructure as code in advance. One unique Fugue advantage is the ability to work throughout the entire software development lifecycle. The same policy you're using through Nexus to scan a Terraform template can be constantly used in production once that template has been used to build things. Also, if you want to understand how to secure things, study what hackers are doing. We do a lot of that. A lot of the attack surface and approaches hackers use in cloud are not obvious and require research, and we bake that into the product.
Stephen Magill: Brian?
Brian Fox: Our strength is that we've always approached and architected our solution and the data we provide as a developer-first problem. Even though legal and security largely funded this for a long time, we knew from early consulting that developers would make or break its success. We've architected a developer-first control plane for bringing these types of data together. The acquisition of Muse brought that further with a developer feedback loop that's native and conversational in pull requests.
With partnerships with Neuvector and Fugue, we're going toward taking wider, disparate sets of data that different parts of the organization care about but need developer compliance to act upon. These are among the first ways we're expanding the platform, but there are nearly infinite opportunities. A lot of niche systems pushed onto developers will not work. Systems designed for lawyers are rejected by developers; the same thing happens with security. We're trying to normalize all this data, prioritize it, and bring it to developers where they are so they can make the right decisions that get everybody safer and happier.
Stephen Magill: Music to my ears, which is why I'm glad to be part of Sonatype now. Muse's focus was always delivering a great experience to developers and taking static analysis, a traditionally painful process that didn't offer a lot of value, and making it more palatable, useful, and transparent so issues get fixed as an ongoing part of development. Providing visibility into impact matters too: are bug rates going down, are issues getting fixed, what is the impact of this rollout on software security and quality?
There's a lot of exciting work combining all these areas of concern, from code you're writing to the containers it's running in, that full spectrum, and bringing it together in a way that works with the development process and lets people continue to have agility. Going back to the question at the beginning: how do you maintain security while not impacting innovation? It's about process and tools, and I think there are great solutions here. Thank you all for joining me. This was a great discussion. We'll keep the conversation going in Slack. We're all on the DevOps Enterprise Summit Slack, so feel free to reach out with further questions. Thanks for spending this time with us.