Fast Product Development in Digital Banking Without Sacrificing Security

Las Vegas 2020

Rafael Alvarez

CTO and Co-Founder · Fluid Attacks

Camilo Piedrahita

IT Manager - DevOps & Software Engineering · Bancolombia

In this session, you will learn how Bancolombia, one of the largest banks in Latin America, faces the challenge of improving the lead time and security standards of its applications to stay competitive with the new products emerging from the new 100% digital banks and the emerging fintechs.

Get to know how a bank with 30,000 employees and 14 million customers moved from a six-month lead time for its applications to under a week through the implementation of DevOps and security practices.

Full transcript

The complete talk, organized by section.

Camilo Piedrahita

Hello everyone. I hope you and your families are very well despite the current situation. Today, we want to tell you a little bit about how you can achieve high speed in an increasingly competitive financial system without losing security.

I am Camilo Piedrahita, DevOps and Software Engineering Manager at Bancolombia. I'm currently responsible for defining the stacks and best practices for the software development of the Bancolombia Group. In the past, I led DevOps transformation in large companies such as Suramericana, the leading insurance company in Colombia. I am here with Rafa.

Rafael Alvarez

Camilo, thank you for this great introduction, and hello everyone. My name is Rafa. I'm the CTO and co-founder of Fluid Attacks. I lead one of the largest ethical hacking teams in the Americas, focused on fast exploitation and vulnerability disclosure.

I look forward to sharing with you how we collaborate with Bancolombia in their adoption of DevSecOps and some takeaways you can apply to your business.

Camilo Piedrahita

Bancolombia is a bank with 140 years of history and one of the largest in Latin America. We are present in nine countries, and two years ago, we received an award for the most sustainable bank in the world. We have more than 30,000 employees and more than 3,000 of them under the IT vice presidency. During 2020, our clients highlighted the digital capacity of Bancolombia, and 87% of all transactions have been totally digital. In that way, we serve more than 14 million clients with an agile value proposal featuring the best software development practices.

Rafael Alvarez

And at Fluid Attacks, we have tested over 240 million lines of code. In 50% of the applications we test, we find at least one critical or high-severity vulnerability, which puts the business at risk. We have guaranteed the security of more than 28 million pipelines, compilations, and deployments. Our false positive rate is 2% and our false negative rate is 4%.

Camilo Piedrahita

Competition in Latin America is becoming more demanding. Although we no longer have to worry about other Latin American banks, much less other banks in Colombia, our competition is really being affected by all the fintechs entering the financial market and, of course, by the proximity of other giants in our sector, such as Apple Pay and PayPal, as well as by the entrance of JP Morgan into the Colombian market.

This is why we have to accelerate our processes. We cannot continue with the old traditional approaches, such as core banking implementations and monthly value deliveries to our mobile application. Bank users need an even more digital and straightforward experience. For these reasons, a few years ago, we decided to start the path of DevOps, Agile, and Cloud.

So how has our journey been? Six years ago, we undertook a project called Synthace, in which we created more than 200 work cells, and decided to be 100% agile. However, we realized changing the methodologies wasn't enough. We also needed new processes and new tools.

That's why, after learning about the success stories of Capital One, Walmart, Verizon, Nationwide, and other companies, we chose to embark on a journey with DevOps practices in 2017. As a result, today we have more than 700 applications with continuous delivery processes. This allowed us to make several value deliveries in a single day.

But what was happening with security during this time? We were still very slow. Security at the end of the process made deploying to a production environment impossible. We had open vulnerabilities that had to be resolved by the production delivery date. That's where DevSecOps came in. We wanted to shift left in the security process of Bancolombia. So now we have DevSecOps, one of the big trends worldwide. But what next?

We were definitely not as fast as we wanted to be. You cannot compare the speed to change or deploy a microservice with deploying a complete COBOL program on a mainframe. Architecture matters. It matters a lot.

Therefore, one of our next challenges was to be 100% cloud. So we created a project called All In, through which we've migrated or created more than 100 applications using serverless approaches, allowing us to be much more cost-effective, elastic, and resilient.

Our journey is inspired by Nationwide. When we started the DevOps path in 2017, we looked for certain consultants and suppliers to help us define the tools and processes. At the time, we embarked on a referral journey visiting the DevOps Enterprise partners, as well as world leaders such as Capital One, Microsoft, Disney, and other mature DevOps companies. We formed a team of sherpas, consisting of DevOps engineers with experience in other companies.

We arranged the first encounter so that the whole organization could understand that we were going to climb the highest peak possible. In the base camp, we defined tools, changed processes, and created many training sessions called DevOps Dojo. We found early adopters, or an advanced squad, to accelerate the organizational cultural process, and had the first automatic deployment with many hours for those who adopted the best practices.

Then we started to promote the strategy and define a role called mobilizer, responsible for introducing the practices to each team. Each team had a mobilizer, seeking to avoid centralization of knowledge in the DevOps team.

Later, when we arrived at north camp, we decided to unify the roles of developer and tester, following the words of Tapabrata Pal in 2017: "You build it, you test it, you own it." And we started measuring four key metrics.

Subsequently, with the support of the vice president, or expedition leader, the order was given that the only way to make changes in the production environment was through DevOps. DevOps became a mandatory policy and practice in the organization. Today, we are able to perform over 3,600 changes to production on a monthly basis.

But what are the next challenges? The next peak we wanted to climb is to extend continuous chaos and DevOps for database. But the big challenge is to be 100% cloud and open first in the organization.

Why fast? In Bancolombia, we definitely have medium-performance and low-performance applications. But with much joy, we can say we also already have elite-performance applications. Here, I have examples of different applications that have benefited from DevOps in Bancolombia, from our people app, which has had a 50% improvement in lead time, to inclusive applications that have more than seven deployments per day with an MTTR of less than an hour and a change failure rate of 1%.

We have a lot to improve, but our process has currently allowed us to deploy new features in a couple of days that have generated transactional growth of up to 30% for those applications. If you look back to three years ago, that was completely impossible. But how do we go from an application with a lead time of months to applications with seven releases per day?

DevOps in mainframe? Hmm. We know that we want to have the same practices as a microservice, but we need to improve and modernize our processes while we kill our monolith. Here, we can automate continuous delivery processes and regression testing. We are also in the process of defining our unit testing strategy.

Also, with our great ally Liquibase, we automated our database deployment process. Previously, we had many DBAs exclusively reviewing scripts and running them on production. We've integrated Liquibase with Git and created an extension of Azure DevOps to execute the rule engine that Liquibase has. Then we invoke the Hammer stack for Liquibase in order to execute the promotion of these changes.

Now, we guarantee homogeneous environments in databases. One year ago, it would take us a whole day to perform the database deployment process, including ticket sign, work queues, guide construction, and other side tasks. Now, we need 20 minutes to carry out the continuous delivery process for our database changes, and we are working hard to extend this strategy with appropriate continuous testing for database changes.

We've also introduced chaos engineering strategies in Bancolombia. Usually, this kind of strategy is difficult to carry out in highly regulated companies. But how can we get the speed if our business continuity exercises are conducted every six months? Can we really guarantee business continuity?

We're using Chaos Toolkit in our pipelines. We no longer validate our resilience at the end of the semester. We do it directly in the pipeline. Every time we release to production, we are able to guarantee our resilience to failures. This is the best way to train ourselves for potential problems we might encounter.
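As an illustration of what validating resilience inside a pipeline can look like, here is a minimal plain-Python sketch of the usual chaos experiment shape: check a steady-state hypothesis, inject a failure, check the hypothesis again, and roll back. It does not use the Chaos Toolkit API, and `FakeService`, the replica names, and the minimum of two healthy replicas are all hypothetical stand-ins rather than anything described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class FakeService:
    """Hypothetical stand-in for a deployed service with several replicas."""
    replicas: dict = field(
        default_factory=lambda: {"a": True, "b": True, "c": True}
    )

    def healthy_count(self) -> int:
        return sum(self.replicas.values())

    def kill(self, name: str) -> None:
        self.replicas[name] = False

    def restore(self, name: str) -> None:
        self.replicas[name] = True


def steady_state(service: FakeService, minimum: int = 2) -> bool:
    """Hypothesis: at least `minimum` replicas are healthy (assumed threshold)."""
    return service.healthy_count() >= minimum


def run_experiment(service: FakeService, victim: str) -> bool:
    """Return True if the steady state survives losing one replica."""
    if not steady_state(service):
        raise RuntimeError("system not in steady state; experiment aborted")
    service.kill(victim)              # method: inject the failure
    try:
        return steady_state(service)  # re-check the hypothesis under failure
    finally:
        service.restore(victim)       # rollback: always undo the injection


resilient = FakeService()
fragile = FakeService(replicas={"a": True, "b": True})
print("three replicas survive:", run_experiment(resilient, "a"))
print("two replicas survive:", run_experiment(fragile, "a"))
```

A pipeline step could fail the release whenever `run_experiment` returns `False`, which is the sense in which resilience is checked on every change rather than every six months.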

The 100% DevOps strategy is supported by several pillars, including continuous testing, because working without quality just gives defects a fast track to production. At Bancolombia, we have largely automated all our tests, always having control in the radar. We decided to include robotics in our ATM tests and integrated this into our pipeline. In that way, we have gone from taking 18 days to run a complete regression to taking only two days.

But how can we guarantee that we really are fast? As Gene Kim, Jez Humble, and Nicole Forsgren said years ago, it's necessary to have four DevOps working metrics as a compass. Of course, having a tool like Hygieia has allowed us to have a real lake house of information for the organization. We can see the success of deployment lead time in more than 2,000 bots. Today, all our leaders have the visualization of their applications, and developers are able to access the relevant information of their components in one single place. Thank you, Capital One, for releasing Hygieia.
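To make the four metrics concrete, the following sketch computes deployment frequency, lead time, change failure rate, and time to restore from a list of deployment records. The `Deployment` record, its field names, and the sample data are assumptions made for illustration; they are not Hygieia's data model or Bancolombia's figures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deployments: list, days: int) -> dict:
    """Summarize a non-empty list of deployments observed over `days` days."""
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore_hours": (
            sum(restore_times) / len(restore_times) if restore_times else 0.0
        ),
    }


# Hypothetical sample: four deployments over two days, one of which failed.
start = datetime(2020, 10, 1, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start, start + timedelta(hours=8)),
    Deployment(start, start + timedelta(hours=6), failed=True,
               restored_at=start + timedelta(hours=6, minutes=30)),
    Deployment(start, start + timedelta(hours=2)),
]
metrics = four_key_metrics(sample, days=2)
print(metrics)
```

The median is used for lead time here so that one unusually slow change does not dominate the number; a mean or a percentile would be an equally defensible choice.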

Speaking of Capital One, in 2017, they had a session called Better Governance, where they emphasized the importance of giving their counterparties peace of mind through governance and not compliance. They presented the concept of clean room, and at Bancolombia, we've decided to adopt this concept, and we have all gates and controls in the same pipeline.

The policy validates topics such as coverage, successful build, code review, and different strategies that allow us to have a real quality control in our pipelines. In addition, we have different engagements in the pipeline that make automatic validations, such as: were the performance and regression tests successfully executed? Is the technical debt not increasing?
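A policy of this kind can be pictured as a single function that evaluates every gate in one place and reports which ones failed. The sketch below is only an illustration under stated assumptions: the `PipelineRun` fields, the 80% coverage threshold, and the way technical debt is measured are hypothetical and are not taken from Bancolombia's pipelines.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    build_succeeded: bool
    coverage_percent: float
    code_review_approved: bool
    regression_tests_passed: bool
    performance_tests_passed: bool
    tech_debt_minutes: int           # debt measured on this run
    previous_tech_debt_minutes: int  # debt measured on the previous run


MIN_COVERAGE = 80.0  # hypothetical threshold, not from the talk


def policy_violations(run: PipelineRun) -> list:
    """Return every failed check; an empty list means the gate passes."""
    checks = [
        (run.build_succeeded, "build failed"),
        (run.coverage_percent >= MIN_COVERAGE, f"coverage below {MIN_COVERAGE}%"),
        (run.code_review_approved, "code review missing"),
        (run.regression_tests_passed, "regression tests failed"),
        (run.performance_tests_passed, "performance tests failed"),
        (run.tech_debt_minutes <= run.previous_tech_debt_minutes,
         "technical debt increased"),
    ]
    return [message for ok, message in checks if not ok]


run = PipelineRun(
    build_succeeded=True,
    coverage_percent=72.5,
    code_review_approved=True,
    regression_tests_passed=True,
    performance_tests_passed=True,
    tech_debt_minutes=130,
    previous_tech_debt_minutes=120,
)
violations = policy_violations(run)
print("PASS" if not violations else f"FAIL: {violations}")
```

Collecting all violations instead of stopping at the first one gives the developer the full picture in a single pipeline run.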

And we've even created an integration with the Hygieia API to validate correct standards of pipelines, repositories, and deployment pipe. We have fewer stages, fewer approvals, and more governance. But if we have governance, where is the security?

Automation isn't enough. At Bancolombia, our goal was to take security to the left of the process, as I mentioned before. Developers have to have all the tools available to find the vulnerabilities as soon as possible. We customized our SonarQube profile and used Prisma Cloud to validate security in our containers and CloudFormation for AWS.

Moreover, we integrated JFrog to monitor the Docker images and open source libraries used by our developers. Were we a success? Definitely not. We have tools, but the vulnerabilities didn't diminish. In the last report, we had 3 million vulnerabilities to resolve with tools and over 50,000 with continuous hacking.

When were we going to resolve the vulnerabilities? We were able to scan our code, infrastructure, containers, and so on. But what was next? That's when we formed the Fluid Attacks continuous hacking strategy, which allowed us to reduce the gap between open and closed vulnerabilities within our applications. But Rafa, can you show the secret sauce?

Rafael Alvarez

Sure. Let's have a look at how we implement DevSecOps at Bancolombia and what you can apply to your company.

Let's now understand the magic. We call it hackers at the center accelerated by AI. To illustrate this, we will use a simple diagram that shows the interaction sequence of the different actors involved in a continuous hacking process for DevSecOps.

The first actor is the development and operations team, responsible for creating or maintaining either an application, a system, or a server. This team, regardless of the methodology it follows or the product's architecture, will make commits to a long-term branch in a Git repository. This Git repository is the only prerequisite for any continuous hacking approach and will allow full traceability of the changes made by the team.

Using the code stored in Git as input, continuous hacking, called Drills from now on, will begin performing attacks through two types of techniques: static application security testing and software composition analysis. The first one allows us to review the security of the application, even without having any functional versions deployed, and the second one allows us to determine if the third-party components used are secure.

The fact that hackers carry out this process makes it possible, first, to rule out the existence of false positives reported by the tools. Second, to find vulnerabilities not detected by them, also called false negatives. And third, to relieve the developer of the hard work of discovering and understanding vulnerabilities related to complex attack techniques that evolve on a daily basis.

Then, Drills teams of hackers will report the confirmed vulnerabilities, providing detailed evidence of the attack, be it animated videos, screenshots, structured records, or other media, via an attack surface manager called Integrates. Integrates prevents anyone from removing existing vulnerabilities, controls when each vulnerability is viewed and by whom, and in general, minimizes or prevents zero-day vulnerability management from becoming a spreadsheet- or email-based process.

Integrates also allows an independent hacking team to mark the vulnerability as closed after a technical, not documentary, re-attack on the target system. Integrates, therefore, will be responsible for adequately notifying stakeholders about new vulnerabilities found, as well as confirming their status, open or closed, after re-attacks by hackers.

Integrates' web interface also allows direct written communication between developers and hackers, which is especially important at the beginning when developers require explanation about the nature of the risk and vulnerabilities. The hackers focus on explaining the problem and never the solution, so they can guarantee their independence for future re-attacks.

The strategic positioning of the attack surface manager as a vulnerability storage vault means that senior management, whether they are CEOs, CTOs, product owners, scrum masters, auditors, or even customers, can always know the security status of each system and how every remediation process is evolving.

Once the peer environment is available corresponding to the code in the Git branch being monitored, more advanced techniques such as dynamic application security testing and interactive application security testing are used. Since Bancolombia has continuous integration, Drills hackers add an agent to the pipeline called Forces, which is able to break the build, avoiding going into production if the software is still vulnerable. This also obliges the team to make explicit risk acceptance decisions that are documented in Integrates.

Due to the formality of this approach, the applications team prefers remediation rather than signing off on this acceptance of the risk associated with a vulnerability.
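The build-breaking behavior described above can be sketched as a small gate: the build fails while any confirmed vulnerability is still open, unless someone has explicitly accepted its risk. This is not the Forces agent or its interface; the `Finding` record, its fields, and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    identifier: str
    status: str          # "open" or "closed"
    risk_accepted: bool  # explicit, documented acceptance of the risk


def blocking_findings(findings: list) -> list:
    """Open findings without a documented risk acceptance block the build."""
    return [f for f in findings if f.status == "open" and not f.risk_accepted]


def gate_exit_code(findings: list) -> int:
    """Return 1 (break the build) if any finding blocks, otherwise 0."""
    blockers = blocking_findings(findings)
    for finding in blockers:
        print(f"BLOCKING: {finding.identifier} is open and not accepted")
    return 1 if blockers else 0


# Hypothetical findings for one application.
findings = [
    Finding("VULN-001", status="closed", risk_accepted=False),
    Finding("VULN-002", status="open", risk_accepted=True),
    Finding("VULN-003", status="open", risk_accepted=False),
]
exit_code = gate_exit_code(findings)
print("exit code:", exit_code)  # a pipeline step would pass this to sys.exit()
```

Because acceptance is a separate, explicit flag, the only ways to turn the build green are to close the finding or to put a name on the decision to live with it.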

As you can see, continuous hacking is focused on the hacker. But to be fast and to support teams with many daily production steps, there has to be more to it. Let's look at the behind-the-scenes of Drills.

The first component of our hacking team artillery is called Skims. It is an internal tool that allows us to locate low-criticality peripheral vulnerabilities. Then we have an artificial intelligence engine called Sorts. Sorts learns daily from the vulnerabilities found and tells the hacker which files or areas of the application should be checked first, as they are more similar to files or areas that have had vulnerabilities in the past.
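The idea of checking first the files that most resemble previously vulnerable ones can be shown with a toy ranking. The sketch below is not how Sorts works; it swaps in a deliberately simple heuristic, Jaccard similarity between file path tokens, and all file names are invented for the example.

```python
import re


def path_tokens(path: str) -> set:
    """Split a path into lowercase tokens on '/', '.', '_' and '-'."""
    return {t for t in re.split(r"[/._\-]+", path.lower()) if t}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prioritize(candidates: list, past_vulnerable: list) -> list:
    """Order candidates by their best similarity to any past vulnerable file."""
    past = [path_tokens(p) for p in past_vulnerable]

    def score(path: str) -> float:
        tokens = path_tokens(path)
        return max((jaccard(tokens, p) for p in past), default=0.0)

    return sorted(candidates, key=score, reverse=True)


# Hypothetical history and review queue.
past_vulnerable = [
    "src/auth/login_controller.py",
    "src/payments/transfer_service.py",
]
candidates = [
    "docs/readme.md",
    "src/auth/session_controller.py",
    "src/ui/theme.css",
]
ranking = prioritize(candidates, past_vulnerable)
print(ranking)
```

A trained model would use far richer signals than path names, but the output plays the same role: an ordered queue that tells the hacker where to look first.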

After work, the hacker at the center does his magic and reports confirmed vulnerabilities so that the development team only has to focus on the final remediation. What were the results of Bancolombia with this approach? Let's remember that the bank had 3 million reported vulnerabilities via tools. With attacks under the continuous hacking model, Fluid Attacks reported over 50,000 confirmed vulnerabilities over an 18-month period.

Bancolombia has been able to reach a closing rate of 83%, and the numbers keep improving.

To get an idea of what the attack surface manager looks like, here is a screenshot of Integrates. In it, we can see the evolution over time of the search for vulnerabilities and, better yet, of their closure. The mobile interface is aimed at top management, so it focuses on high-level indicators that allow them to understand the system's current security status.

The first indicator is the rate of effective remediation, which is basically calculated by considering how many of the vulnerabilities found have been effectively remediated. Below, we can see the weekly progress on this indicator. That is, whether the past week has been better in terms of remediation than the week before.
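As a worked example of this indicator, the sketch below computes a remediation rate as remediated vulnerabilities divided by vulnerabilities found, plus the week-over-week change. The exact formula Integrates uses is not given in the talk, so this definition, the handling of the zero-findings case, and the sample counts are assumptions.

```python
def remediation_rate(found: int, remediated: int) -> float:
    """Share of found vulnerabilities that have been effectively remediated."""
    if found < 0 or not 0 <= remediated <= found:
        raise ValueError("counts must satisfy 0 <= remediated <= found")
    # Assumption: with nothing found, report 0.0 rather than claim 100%.
    return remediated / found if found else 0.0


def weekly_change(this_week: float, last_week: float) -> float:
    """Positive values mean remediation improved compared with last week."""
    return this_week - last_week


# Hypothetical counts for two consecutive weeks.
last_week = remediation_rate(found=200, remediated=150)
this_week = remediation_rate(found=200, remediated=166)
print(f"remediation rate: {this_week:.0%}")
print(f"weekly change: {weekly_change(this_week, last_week):+.0%}")
```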

Finally, we can see that this organization, in its entire continuous hacking portfolio, has only two systems under attack. Therefore, management can understand the real context of the remediation rate and not have a false sense of security.

With this simple data in the hands of top management, the team's rates of remediation rise to incredible levels as progress indicators become daily, automated, transparent, and understandable matters. In turn, this leads to an increase in security priorities and an improvement in resource allocation, or alternatively, an implicit acceptance of risk by the responsible executive. Camilo, do you want to share something with us?

Camilo Piedrahita

I want to share with you some lessons during the journey.

Architecture matters. You can include a lot of practices in your monolith, but you don't have the same speed. You need to refactor, migrate, and create modern applications.

If you want speed, you need to have lean processes and simple pipelines. You can have governance in them, but if you include the traditional compliance, you are losing DevOps.

You can have an average, but you can't create an insight with centralized knowledge. Tools aren't enough. You need to integrate the security team in the process.

Rafa, do you have any takeaways?

Rafael Alvarez

I think so.

So to conclude, we want to share with you five lessons we have learned after several years implementing this methodology in Bancolombia and for many other clients, and which we consider can maximize the security of the systems.

First and foremost, it's about putting the hackers at the center of the action. Only they can find what is critical. Only they can connect one vulnerability to another to achieve a high-level attack like those we read about in the news, and only they can minimize false negatives, one of the most structural flaws of the tools that promise speed, and one that is usually not discussed.

Hackers also allow us to implement another of the great conclusions and stages: discarding false positives. This security contest, even by the hackers, is an imperative. Therefore, thanks to this process of discarding false positives via real tests, we allow developers to save valuable and expensive time that can be more intelligently invested in remediation.

Since humans are at the center, we must accelerate the process, and for this purpose, artificial intelligence can be used to help them rather than replace them. A precisely trained and validated AI engine helps speed up the hacker's job by allowing them to focus on the high-risk areas of the application.

Feedback is one of the three key principles of DevOps. Feedback is given precisely by breaking the build. The red indicator tells the developer that something is wrong and that they must repair it before trying to go into production again. Paradoxically, DevOps, and especially DevSecOps, increasingly violate this principle.

DevOps has become a buzzword for manufacturers giving priority to speed and automation at all costs. We see countless companies investing millions of dollars in low-precision, high-speed tools with high rates of false positives that don't give timely feedback, don't break the build, and simply become out-of-band report generators to show you that due diligence is being done.

Therefore, the conclusion is to break the build with manually confirmed vulnerabilities to force remediation and resume the principle of feedback. This should supersede the approach of imprecise speed, something that we consider going in the wrong direction at a fast pace.

Finally, we have the typical conclusion of involving top management, but here our suggestion is concrete: put in their hands, ideally via a mobile application, a single indicator, the organization's remediation rate. With this level of visibility, development teams accelerate their pace. And not only that, management allocates resources to it because it would be an act of negligence to see, at the touch of a button, that the organization has 50% of confirmed open vulnerabilities and not do anything about it.

In Bancolombia's case, a big corporation, the executive VP has constant visibility of the global remediation rate, and that is the key to what makes this process work more than anything else. Now we just need the CEO to install the application, too.

Please remember, due to the formality level of this approach, the application teams prefer remediation rather than signing off on the acceptance of the risk associated with a vulnerability.

Thank you all very much for attending our talk, and please do not hesitate to contact us if you wish to discuss any details, share an experience, or need any clarification.