Our Journey to Shift-left Security with Infrastructure as Code


Amsterdam 2023


Sr. Cloud and Security Research Engineer · VISMA

As organisations increasingly host their services in the cloud, Infrastructure as Code (IaC) tools are widely used to automate the provisioning of cloud services. These tools can also introduce security weaknesses and risky changes into cloud platforms, which have become a highly attractive attack surface for attackers. This presentation is a study of the IaC security of 22 Visma projects whose cloud infrastructure is hosted in GCP, AWS and Azure. Its aim is to make practitioners aware of the vulnerabilities that can appear in their infrastructure when using IaC, and to share what we have learned on our journey to shift-left security for the cloud.


Full transcript


Romina Druta

Hello everyone. I'm Romina Druta, senior cloud and security researcher at Visma. If you had asked me a year ago whether I would speak publicly, I would have said no, but things happen: my manager asked me to present this project outside the company, so here I am.

I usually forget to say what I do, so I won't skip it this time. I started as a test engineer testing mobile applications, then became a software developer, and then moved a little into operations, so I have been through the whole DevOps lifecycle.

I work for Visma. For those who do not know Visma or what we do, we are a top-five software company in Europe and number one in cloud ERP in Europe. We have a wide network of distributors and partners, and we operate across Europe and Latin America. What I love about Visma is our large, diverse technology stack. We are not forced to use only one type of tool: we analyze tools and then decide. My presentation is about how we analyzed three tools to make the best decision for infrastructure-as-code scanning.

In numbers, we have more than 15,000 employees, of which about 5,000 are developers. We have 1.4 million happy customers. We like to count the payslips and invoices each month to see whether we are going to beat another record. We provide services in accounting, financial management, HRM and payroll, and e-government. In 2022, Visma created 2,056 million euros of revenue.

On cloud software, we focus on hosting our services in the cloud. In 2022, we had 1,700 million euros of cloud revenue. About 90% of our R&D is spent on cloud solutions. We host our solutions on all three public cloud providers: GCP, AWS, and Azure.

This is a helicopter view of our Visma Security Program, which we call VSP. In order to create business value, we need at least these four assets: people, infrastructure, applications, and solutions. These are exposed to threats. Without threats, maybe we would not need security.

To reduce the risk of being attacked, we offer teams, companies, and products a layer of security. For people, we do a lot of security learning. We do phishing simulations for developers and offer secure-code training. For infrastructure, we have other services that I will not go into now. We also have solutions like GitHub, GitLab, and Coverity, which are hosted by us, and these are also tested with programs like bug bounty or manual application vulnerability testing. In the end, we also have applications, which are code-based. If we have code vulnerabilities, we are exposed to threats.

Here is more detail on how we handle security for applications. Treating applications as assets exposed to threats and risks, we offer programs such as data-protection self-assessment, security self-assessment, static application security testing, and so on. What you see in yellow is what we call shift-left security. At the end, you also see that we have an index, which we need in order to see each product's level of security. If you are at a gold level, you are good in security. If you are platinum, you are not so good. We try to encourage teams to reach a good level of security. These results are shared openly with teams across the company, and we encourage teams to share their index results with clients as well.

For those who do not know what shift-left security is, in my words it is the process of preventing security vulnerabilities from reaching production. In other words, you start the security process at the planning and design stage of your application, not after your code is in production. This strengthens the security of your environment and also teaches developers how to secure their code and write good code.

Here is the well-known DevOps lifecycle. For code and build, on the left side, we use static application security testing, software composition analysis, dynamic application security testing, and so on, before we release and deploy, in other words before reaching production.

Now I am going to talk a little bit about infrastructure as code. We have all these assessments and security programs in Visma for application code, but we saw that a lot of teams use infrastructure as code to provision their services in the cloud. We asked whether we had tools or services to scan this code as well.

We started to look at which security risks infrastructure code propagates to the cloud. We saw poorly configured IAM policies, network misconfigurations such as publicly exposed resources, and secrets-management issues. If a secret is deleted and you cannot recover it, that is a problem, so you need to enable the configuration options that protect your secrets, for example against accidental deletion.
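To make the three risk classes above concrete, here is a minimal Python sketch of rule checks over parsed IaC resources. It assumes a simplified, hypothetical resource shape (`{"type": ..., "properties": {...}}`); the property names inside `properties` follow the real CloudFormation (`PolicyDocument`, `SecurityGroupIngress`, `CidrIp`) and Azure ARM (`enableSoftDelete`, `enablePurgeProtection`) schemas. It is an illustration, not the rule set of any tool evaluated in the talk.

```python
from typing import Any

Resource = dict[str, Any]


def _as_list(value: Any) -> list[Any]:
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_resource(res: Resource) -> list[str]:
    """Return findings for one resource in a simplified (hypothetical) IaC shape."""
    findings: list[str] = []
    rtype = res.get("type", "")
    props = res.get("properties", {})

    # IAM: an Allow statement that grants every action on every resource.
    if rtype == "AWS::IAM::Policy":
        for stmt in _as_list(props.get("PolicyDocument", {}).get("Statement")):
            if (stmt.get("Effect") == "Allow"
                    and "*" in _as_list(stmt.get("Action"))
                    and "*" in _as_list(stmt.get("Resource"))):
                findings.append("IAM policy allows '*' on '*'")

    # Network: ingress rule open to the whole internet.
    if rtype == "AWS::EC2::SecurityGroup":
        for rule in props.get("SecurityGroupIngress", []):
            if rule.get("CidrIp") == "0.0.0.0/0":
                findings.append(f"port {rule.get('FromPort')} open to 0.0.0.0/0")

    # Secrets: soft delete makes deleted secrets recoverable; purge protection
    # prevents them from being permanently purged before the retention period ends.
    if rtype == "Microsoft.KeyVault/vaults":
        if props.get("enableSoftDelete") is False:
            findings.append("Key Vault soft delete disabled")
        if not props.get("enablePurgeProtection"):
            findings.append("Key Vault purge protection not enabled")

    return findings
```

Real scanners parse each file format natively and ship hundreds of such rules; the point here is only that each risk class maps to a checkable property in the code.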

We also saw many publicly exposed resources. You may have heard of incidents where misconfigured S3 buckets leaked customer data from companies hosting their resources in AWS. Since we try to automate everything through infrastructure as code, we also want to reduce risk by adopting a shift-left security approach. After we design the infrastructure and write the code, we scan the code. If the infrastructure code follows security best practices, we create the resources in the cloud. Otherwise, we go back and rewrite the infrastructure code.
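The scan-then-deploy loop described above is usually enforced as a pipeline gate: the step returns a non-zero exit code when blocking findings exist, so the deploy step never runs. Below is a minimal Python sketch; the `Finding` type, the severity scale, and the `HIGH` threshold are assumptions for illustration, since scanners use different severity vocabularies.

```python
import sys
from dataclasses import dataclass

# Assumed severity scale; real scanners differ, so map their levels onto this.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    file: str


def gate(findings: list[Finding], fail_at: str = "HIGH") -> int:
    """Return an exit code: 0 lets the pipeline apply the IaC, 1 sends it back."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings
                if SEVERITY_RANK.get(f.severity.upper(), 0) >= threshold]
    for f in blocking:
        print(f"BLOCKED {f.rule_id} ({f.severity}) in {f.file}", file=sys.stderr)
    return 1 if blocking else 0
```

In a CI job this would be called as `sys.exit(gate(findings))` between the scan step and the provisioning step; lowering `fail_at` tightens the gate as teams clean up their backlog.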

Our goal was to find a tool that could be adopted by all teams in our company, not only teams hosting services in Azure, but also teams on GCP and AWS. Since all of these have their own infrastructure-as-code file types, we needed a tool that could scan multiple infrastructure-code file types.
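Because each cloud has its own IaC formats, a cross-cloud scanner first has to recognize what kind of file it is looking at. The Python sketch below shows one heuristic approach; the detection rules (file suffix, the ARM `$schema` URL, CloudFormation's `AWSTemplateFormatVersion` and `AWS::` resource types) are assumptions chosen for illustration, and YAML is matched by text because the standard library has no YAML parser.

```python
import json
from pathlib import Path


def classify_iac(path: str, text: str) -> str:
    """Best-effort guess of the IaC flavor of one file (heuristic sketch)."""
    suffix = Path(path).suffix.lower()
    if suffix in {".tf", ".tfvars"}:
        return "terraform"
    if suffix == ".bicep":
        return "bicep"
    if suffix == ".json":
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return "unknown"
        if not isinstance(doc, dict):
            return "unknown"
        # ARM templates reference a ...deploymentTemplate.json schema URL.
        if "deploymenttemplate.json" in str(doc.get("$schema", "")).lower():
            return "arm"
        if "AWSTemplateFormatVersion" in doc or _has_aws_types(doc):
            return "cloudformation"
        return "unknown"
    if suffix in {".yaml", ".yml"}:
        # Text heuristic only: no YAML parser is used here.
        if "AWSTemplateFormatVersion" in text or "Type: AWS::" in text:
            return "cloudformation"
        if "apiVersion:" in text and "kind:" in text:
            return "kubernetes"
        return "unknown"
    return "unknown"


def _has_aws_types(doc: dict) -> bool:
    """True if any entry under Resources has a CloudFormation AWS:: type."""
    resources = doc.get("Resources", {})
    return isinstance(resources, dict) and any(
        isinstance(r, dict) and str(r.get("Type", "")).startswith("AWS::")
        for r in resources.values()
    )
```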

We started to search for this tool in Visma. We try to validate new services using academic methods. This was joint research, not only within Visma, led by a professor from NTNU (the Norwegian University of Science and Technology) in Trondheim. We divided the research into four steps. We analyzed the existing scanning tools on the market and chose three that could scan Terraform, CloudFormation, JSON, and other file types. We spoke with representatives of each chosen tool. Then we asked development teams to volunteer so we could test the tools on real code. There are many public repositories you can use to test these tools, but we wanted to use real code and see how infrastructure-code security looks in practice at Visma.

After teams provided their code, we scanned it using the three tools. We obtained the results and had to normalize, explore, and visualize them in order to interpret them.
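Normalizing output from three different tools is easier when they share a format. Assuming each tool can export SARIF 2.1.0 (the talk does not say which formats were used), the Python sketch below flattens a SARIF log into one record per tool, rule, file, and line, and drops exact duplicates so that counts can be compared across tools.

```python
import json
from collections import Counter
from typing import Any


def normalize_sarif(sarif: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a SARIF 2.1.0 log into one record per (tool, rule, file, line)."""
    records = []
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            phys = locations[0].get("physicalLocation", {})
            records.append({
                "tool": tool,
                "rule": result.get("ruleId", "unknown"),
                "level": result.get("level", "warning"),  # SARIF default level
                "file": phys.get("artifactLocation", {}).get("uri", ""),
                "line": phys.get("region", {}).get("startLine"),
            })
    # Drop exact duplicates so one tool reporting twice does not inflate counts.
    unique = {tuple(sorted(r.items())): r for r in records}
    return list(unique.values())


def findings_per_tool(paths: list[str]) -> Counter:
    """Count normalized findings per tool across several SARIF files."""
    counts: Counter = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for rec in normalize_sarif(json.load(fh)):
                counts[rec["tool"]] += 1
    return counts
```

Rule IDs still differ between tools, so comparing *which* issues each tool found needs an additional, manually maintained mapping of rule IDs to common categories.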

For the tool evaluation, we took multiple metrics into account. We asked developers to look at ease of installation and configuration, documentation, report readability, interoperability, and how well the tool integrates with CI/CD platforms. From our department's point of view, besides features, we looked at security (whether we could trust the tool and whether data was stored outside Europe), flexibility (whether we could adopt the solution quickly), innovation, the cost of implementing the tool and integrating it with our index application, and license cost.
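The talk lists the evaluation criteria but not how they were combined. One simple way to compare tools on such a list is a weighted scoring matrix; the Python sketch below assumes 1 to 5 ratings per criterion and uses hypothetical weights (security counted double) purely for illustration.

```python
# Hypothetical weights; the criteria come from the talk, the weights do not.
WEIGHTS = {
    "installation": 1.0, "documentation": 1.0, "report_readability": 1.0,
    "interoperability": 1.0, "ci_cd_integration": 1.0, "security": 2.0,
    "flexibility": 1.0, "innovation": 1.0, "implementation_cost": 1.0,
    "license_cost": 1.0,
}


def weighted_score(scores: dict[str, float],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of 1-5 ratings; a missing criterion counts as 0.

    For the cost criteria, a higher rating is assumed to mean cheaper.
    """
    total_weight = sum(weights.values())
    return sum(w * scores.get(c, 0.0) for c, w in weights.items()) / total_weight


def rank_tools(tools: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Sort tools by weighted score, best first."""
    return sorted(((name, round(weighted_score(s), 2)) for name, s in tools.items()),
                  key=lambda item: item[1], reverse=True)
```

Counting a missing rating as 0 penalizes tools that could not be assessed on a criterion, which is a deliberate choice here rather than a neutral one.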

When we first sent out the message to involve teams, we did not get much applause. It seemed teams were reluctant to take part in this kind of code scanning because they feared their vulnerabilities would be made public. We assured them that we would not publish the results. Instead, we would extract the vulnerabilities and derive best practices from what we found. We wrote blog posts, did live demos, posted in Slack channels, and so on.

Twenty-two teams volunteered to participate. Besides looking at documentation and reports, they identified false positives together with specialists and answered feedback forms. We split the teams by size, based on the lines of code in their projects. Mostly Azure teams volunteered, but the AWS and GCP projects were larger.
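Splitting projects by size requires a consistent line count and fixed bucket edges. The Python sketch below counts non-blank, non-comment lines and assigns a size bucket; the edges (10K, 20K, 100K) are inferred from the categories mentioned in the results, and the comment prefixes are an assumption that fits Terraform and YAML but not every IaC format.

```python
from bisect import bisect_right

# Assumed bucket edges, based on the 10K, 20K and 100K boundaries in the results.
EDGES = [10_000, 20_000, 100_000]
LABELS = ["<10K", "10K-20K", "20K-100K", ">=100K"]


def count_loc(text: str, comment_prefixes: tuple[str, ...] = ("#", "//")) -> int:
    """Count non-blank lines that are not single-line comments."""
    return sum(1 for line in text.splitlines()
               if line.strip() and not line.strip().startswith(comment_prefixes))


def size_bucket(loc: int) -> str:
    """Map a line count to its size category (lower edge inclusive)."""
    return LABELS[bisect_right(EDGES, loc)]
```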

Here are the results of the three tools. We used two commercial tools and one open-source tool. As projects get bigger, with more lines of code, the number of vulnerabilities increases. One commercial tool performed differently depending on the infrastructure-code file type: on CloudFormation files, it performed better than both the open-source tool and the other commercial tool. In the 10K and 20K categories, there were more AWS projects written in CloudFormation. In the 20K to 100K range, we had more Azure projects with infrastructure code written in JSON.

Across providers, common issues included many publicly exposed resources (storage accounts, serverless strategies, Kubernetes APIs), IAM issues, and many encryption problems. Developers rely heavily on the security defaults provided by the cloud provider. In Azure, we had many public storage accounts because storage accounts were public by default. In AWS, there was a lot of unencrypted storage because, at least last year, AWS did not enable encryption by default. We saw that when developers left default configurations in place, their resources could be vulnerable. Teams also had misconfigurations in newer services such as Kubernetes.
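One way to stop relying on provider defaults is to require that security-relevant settings are stated explicitly in code, so a change in the provider's default cannot silently change your posture. The Python sketch below flags resources where such a property is missing or has an unexpected value. It reuses a hypothetical simplified resource shape; `allowBlobPublicAccess` (Azure storage accounts) and `BucketEncryption` (CloudFormation S3 buckets) are real property names, and the required-settings list is illustrative only.

```python
from typing import Any

# (resource type, property, required value); None means "must be set, any value".
EXPLICIT_SETTINGS = [
    ("Microsoft.Storage/storageAccounts", "allowBlobPublicAccess", False),
    ("AWS::S3::Bucket", "BucketEncryption", None),
]


def implicit_defaults(resources: list[dict[str, Any]]) -> list[str]:
    """Flag security settings left to the provider default or set unexpectedly."""
    findings = []
    for res in resources:
        props = res.get("properties", {})
        name = res.get("name", "?")
        for rtype, prop, wanted in EXPLICIT_SETTINGS:
            if res.get("type") != rtype:
                continue
            if prop not in props:
                findings.append(f"{name}: {prop} not set, relying on provider default")
            elif wanted is not None and props[prop] != wanted:
                findings.append(f"{name}: {prop}={props[prop]!r}, expected {wanted!r}")
    return findings
```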

We found no hard-coded secrets in the 22 projects. We attribute this to the many awareness sessions we have held about passwords in code, which suggests those sessions are working. We analyzed all of these vulnerabilities and the affected services, and wrote blog posts to raise security awareness in the company.
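Hard-coded secret detection is mostly pattern matching over source lines. The Python sketch below uses three illustrative patterns: the well-known `AKIA` prefix of AWS access key IDs, PEM private-key headers, and quoted literals assigned to names like `password`. It is a toy rule set under those assumptions; dedicated secret scanners add entropy checks and far more patterns.

```python
import re

# Illustrative patterns only; real secret scanners ship much larger rule sets.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "inline-credential": re.compile(
        r"""(?i)\b(?:password|secret|api_?key)\s*[:=]\s*["'][^"'\s]{4,}["']"""),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every pattern match in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Note that a reference such as `var.db_password` is not flagged: only quoted literal values count, which is the behavior you want when secrets are injected from a vault.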

The teams preferred a tool with good documentation, a good interface, and good reports. They also preferred a tool that could scan application code too: rather than adding a separate tool for infrastructure code, they wanted the tool they already used for application code to cover infrastructure code as well. As for the experience, teams appreciated doing this research project together with us because they understood their infrastructure better, learned a lot about security, and were motivated to adopt infrastructure-as-code scanning.

On our side, we discovered projects whose infrastructure was only partially provisioned through infrastructure code. In some projects, Bash and PowerShell scripts were used to provision infrastructure, and none of the tools covered these scripts. Infrastructure-code scanning also does not help with drift: if someone modifies resources directly in the cloud and the change is not reflected in code, you are not covered. Teams were also prioritizing code security and neglecting runtime security.
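Drift detection boils down to comparing what the code declares with what the cloud actually reports. The Python sketch below assumes both sides have already been reduced to hypothetical `{resource_id: {attribute: value}}` dictionaries (fetching live state from cloud APIs is not shown) and reports unmanaged, missing, and changed resources. For Terraform-managed resources, `terraform plan` surfaces some of these differences, but it cannot see resources that were never in code.

```python
from typing import Any


def detect_drift(declared: dict[str, dict[str, Any]],
                 live: dict[str, dict[str, Any]]) -> list[str]:
    """Compare resources declared in code with the live cloud inventory."""
    report = []
    # Created by hand or by scripts: invisible to IaC scanning.
    for rid in sorted(live.keys() - declared.keys()):
        report.append(f"unmanaged: {rid} exists in the cloud but not in code")
    for rid in sorted(declared.keys() - live.keys()):
        report.append(f"missing: {rid} is in code but not in the cloud")
    # Same resource, but an attribute was changed outside the code.
    for rid in sorted(declared.keys() & live.keys()):
        for attr, want in sorted(declared[rid].items()):
            have = live[rid].get(attr)
            if have != want:
                report.append(f"changed: {rid}.{attr} is {have!r}, code says {want!r}")
    return report
```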

We concluded that infrastructure-as-code scanning is good, but not enough. We need to think about cloud infrastructure from the beginning, when planning and designing our systems, and threat-modeling sessions may be useful there. At runtime, we need to check that workload protection and data protection are in place.

Here is an image of Gartner's cloud-native application protection platform (CNAPP) model, published some months ago. Besides infrastructure-code scanning, we saw that we need to focus on cloud security posture management, which is similar to dynamic application security testing but looks at the live cloud configuration. This is especially important for catching drift if you use infrastructure as code. We also need cloud infrastructure entitlement management for IAM configurations, a cloud workload protection platform for malware in VMs or storage, and a data security protection platform.

We are currently evaluating two tools with more teams than in the previous project. For me, at least on the data security side, it was a surprise to find some sensitive information stored unencrypted.

We are trying to use these security issues to improve our security-by-design process. What we find at code level and runtime level, we want to use at design level. We extract the issues, and based on the issues and the resources where we found them, we will create questions that will be asked during threat-modeling sessions.
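The idea of turning findings into threat-modeling questions can be sketched as a lookup from issue category to question, grouped by the resource type where the issue was found and ordered by how often it occurred. The category names and question texts in this Python sketch are hypothetical examples, not Visma's actual question set.

```python
from collections import Counter

# Hypothetical mapping from finding category to a design-time question.
QUESTIONS = {
    "public-exposure": "Which components must be reachable from the internet, and why?",
    "iam": "Which identities need write access, and can it be scoped to specific resources?",
    "encryption": "Which data stores hold sensitive data, and who manages their keys?",
    "secrets": "Where are secrets stored, and how are they recovered if deleted?",
}


def questions_for(findings: list[tuple[str, str]], min_count: int = 1) -> list[str]:
    """findings are (category, resource_type) pairs; most frequent first."""
    counts = Counter(findings)
    out = []
    for (category, rtype), n in counts.most_common():
        if n >= min_count and category in QUESTIONS:
            out.append(f"{rtype} ({n} findings): {QUESTIONS[category]}")
    return out
```

Raising `min_count` keeps the session focused on recurring patterns rather than one-off mistakes.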

What we want to achieve is to extend our security program to take cloud infrastructure security into account and define layers of protection such as infrastructure-as-code scanning, cloud workload protection platforms, and other services. Of course, we want to track how well our teams are doing on the infrastructure side from a security point of view.

What we learned: not blaming teams that have security issues encourages trust and openness in the company. Involving teams in choosing the right tool motivates them to adopt new technologies. Discussing findings with teams increases security knowledge. By involving the teams, we train them, they become our ambassadors in the company, other teams join the program, and we have higher security coverage.

In conclusion, cloud security is a shared responsibility. You should not rely only on the cloud provider. Shift-left with infrastructure-as-code scanning is one step in your defense line against threats. By involving teams, you can achieve more in raising security awareness in your company.

If you have questions, you can write to me on Slack. This project was part of our Visma Security Program, led by Dr. Daniel Lacruz and Dr. Monica Jovan, and was joint work with my colleague Nicole. Thank you all.