Shift Left Security in Ludicrous Mode
VM (Virgin Media) Digital and security teams worked with Accenture as a blended squad and delivered a highly successful and effective DevSecOps project in Summer 2021.
This was a marquee project sponsored by the CDO to create a shift-left security approach, addressing the long wait times for security approvals that were impeding the release of value into production. The dedicated project team moved the idea from conception to go-live in 9 weeks. To power the change, the team delivered the first application on the new container platform (Anthos) on VM’s new Google Cloud Platform.
Since the launch date, we’ve had zero incidents (security or otherwise); deployments are highly performant and provide scalable customer-facing systems. This approach has brought a paradigm shift in how VM looks at security when releasing to production and a major mindset shift in the culture.
Full transcript
The complete talk, organized by section.
Fortune Barnard
Virgin Media O2 is a telecommunications provider in the UK. We provide fixed-line, mobile, internet, and TV or cable services to households and businesses in the UK.
Virgin Media O2 is a joint venture between Liberty Global and Telefonica. Virgin Media, which is a cable company in the UK, merged with O2, which is a mobile company, and that merger created the second-biggest telco in the UK. At the time of the merger, it was valued at 31 billion sterling.
This journey started when it was announced in May last year that Liberty Global and Telefonica had entered into a discussion and agreement to merge. It was very important for us to send a very strong message to the market straight after the merger. So we started this transformation, or DevSecOps journey, seven to eight weeks before the merger. The message was very clear: sending a strong message to the market. The project was a seven-week project, not only to merge organizations, but also to prepare for a different way of developing software.
We had a few challenges, just like many companies. Security in the past used to be very difficult, one of the biggest headaches in releasing software. We had environments where you could spin up a cloud environment within minutes, but it could take you up to three weeks to get it into production. So we were no different from any other company.
Some of the headaches, some of the challenges we had, were the siloed and manual way that we were doing security, and excessive governance processes. And don't get me wrong: it's very important to do those processes and governance, and those central teams were key to helping us reduce risk of data breach. But they were slowing down the business, and we needed to look at different ways.
The security team had limited bandwidth. The business was moving at 100 miles per hour. People don't scale, so we needed to find a different way. There was also the need to digitalize. As a telco, we needed to become more of a tech company to be able to meet the needs of our customers. Finally, we needed to move away from the monolithic way we were running our applications to microservices, and all this required a change in ways of working and a change in security, moving from police officers to doctors and nurses.
To be able to address the new ways of working and the shift, we had to adopt a top-down approach. That meant sponsorship from my chief digital officer and myself to ensure that security could move faster without compromise on security.
Empowering others. Shifting left security, but also continuing shifting right. Adopt security by design. The only way to adopt security by design for us meant security as code. Privacy by design: ensuring that we do threat modeling at the beginning to understand what we're trying to do, what could go wrong, and how we could solve it. Built-in self-services for the developers and the software engineers.
What does that mean? Making sure that they can help themselves throughout the environment they are building, scan the application, and make sure that those vulnerabilities are addressed before it goes into production. It also gives us a single view of the risk. We can all understand what the vulnerabilities in those applications are, and whether we are comfortable with the security posture of an application before it goes into production.
We had to shift to everything as code, which is a model that we still use today and that helps us make security no longer blockers, no longer police officers, but more guardrails, more doctors and nurses. That also helped us change the narrative. We now talk about security as value creation. We now talk about security as a growth driver. That's what helps us today.
Karel Kohout
Thank you, Fortune. My name is Karel Kohout, and I lead our application security practice in Europe. I'd like to talk to you about how we think about application security all around, how we look at this end to end, like the example of the Hummer cars in the picture.
We don't only look at securing the code, the application code, through testing. We look at the whole ecosystem from that application, through the libraries that are linked and connected, and making sure that those are secure, up to the environment and the infrastructure the application is deployed to.
Also, what we look at is making sure that the pipeline, the whole automation around, is secure and providing that paved road that Fortune mentioned to the developers. And making sure that we don't focus only on one pipeline for the main programming language, but things like infrastructure as code have security as well and are protected.
What's more, we find that the majority of our clients have multiple pipelines, and we can secure all of them, even legacy and ad hoc scanning, and provide that overarching layer on top of that through our internal assets and our intelligent application security, and provide a single pane of glass and single point of view.
We also look at the data, because the application is a gateway to the data, and securing the data and making sure that when the data is managed, handled, and accessed, it is done securely together with the application. As well as, as I've mentioned already, the infrastructure. Whether the application is deployed to a physical server or to a container, we want to make sure that that environment is secure as well, and there is that layer and wrapper around the application that protects the application as well.
What's also very important: all of this helps to find security defects, vulnerabilities, and risks. However, addressing those risks, either through mitigation, meaning doing something around the application to protect it, or through actually addressing the code, is very important. We actually work with our clients on auto-remediation, where we can help and where we can suggest the remediated fixed code as code that can be copied, or actually as a new branch that can just be checked in.
What is really important, and this is a nice segue, is the training and the culture. Because DevSecOps is really a journey and it is a cultural change. It is very important to make sure, as Fortune has said, that it is as easy as possible for the developers, so they adopt it. Meaning that they can trust the technology, they can trust that it will help them deliver the application securely, and it will create that paved road where they don't need to become security experts, but they can actually trust the system, trust the setup, and know that it will help them deliver fast and secure code to production, and it won't stop them. Again, as Fortune has said, it is an example.
Pulak Agrawal
Hello. Thank you, Fortune, and thank you, Karel. So you have by now heard what the original problem was, what the challenges were, and the commissioning of this new idea, this new project, which Virgin Media needed to do, not really wanted to do. They had to. And then we went on the journey of selecting the right platform.
Now comes the fun part, because this is strategy. Now I'm heading into implementation, the challenges faced, the achievements, the outcomes in the next few slides. My name is Pulak Agrawal. I'm the DevSecOps lead for Accenture UK, and I was a very integral part of delivering this project, so I can speak to exactly what we did.
The first challenge or the question was workload selection. If you see the car theme we have in this, yes, we had a great road in Google Cloud Platform. We had a great car in the Anthos Kubernetes solution, GKE. But we needed to select the right workload.
Remember, IT, in my mind, exists as a supporting function to most businesses, and so it's about business at the end of the day. So we went on the journey of selecting the right workload, which was a content management system. The parameters, if you see, were easy to implement, small, high business value, and building from scratch a new infrastructure, which was in a way clean and in a way risky because it was new. We had multiple third parties we needed to bring on board.
Obviously, we had different tools for SAST, DAST, SCA, container security, RASP, and threat modeling, all of those integrated in the single pane of glass, which was one of the key objectives as well. Virgin Media, like any other enterprise, has a lot of tools, and for this marquee project the idea was to have a single pane of glass, which eventually gives you a dashboard. You will find many pipeline tools which claim they do that, and of course most of them do, but it needed a bit of integration.
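As an illustration of the single pane of glass idea, here is a minimal Python sketch that normalizes findings from several scanners into one roll-up. It is not the integration used on the project: the tool keys, the record fields (`severity`, `title`), and the function names are all hypothetical, and real scanners each have their own report formats.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical shared severity scale; each real tool uses its own labels.
SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class Finding:
    tool: str      # assumed tool key, e.g. "sast", "dast", "sca", "container"
    severity: str  # one of SEVERITIES, or "unknown"
    title: str


def normalize(tool: str, raw: dict) -> Finding:
    """Map one tool-specific record (assumed shape) onto the shared Finding."""
    severity = str(raw.get("severity", "")).lower()
    if severity not in SEVERITIES:
        severity = "unknown"  # surface unmapped labels instead of hiding them
    return Finding(tool=tool, severity=severity, title=raw.get("title", "untitled"))


def dashboard(reports: dict[str, list[dict]]) -> dict[str, Counter]:
    """Return severity counts per tool plus an 'all' roll-up across tools."""
    summary: dict[str, Counter] = {"all": Counter()}
    for tool, records in reports.items():
        counts = Counter(normalize(tool, record).severity for record in records)
        summary[tool] = counts
        summary["all"].update(counts)
    return summary


if __name__ == "__main__":
    example = {
        "sast": [{"severity": "HIGH", "title": "SQL injection"}],
        "sca": [{"severity": "Critical", "title": "Vulnerable library"}],
    }
    for name, counts in dashboard(example).items():
        print(name, dict(counts))
```

The point of the sketch is only that every tool's output is mapped to one severity scale before it is counted, so the dashboard can answer "how many highs do we have?" across all scanners at once.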
So there was a tooling aspect, there was a container workload aspect. We selected a content management solution, which was high business value because immediately after the joint venture, that system had to be available to the end consumer and provide the business value. So it's the right ROI.
Moving on, what we did after that was go into selecting the right team and the delivery model. It sounds as obvious as it is, but we were fortunate to have a model which was first badges out the door. This is basics of delivery. Everyone tries to do it, but very rarely will you get the perfect combination, and I'm fortunate to say we had the perfect combination.
Badges out the door is what it means. Me, as an Accenture employee; Virgin Media employees; other third parties involved; the product vendors of the content management tool; support from Google; support from tooling vendors like GitLab: all of them were part of the same team, working together.
We also did multi-geography. This was about this time last year in May, so you can imagine everyone was pretty much remote. This has been done in IT for 20 years now, but we were able to use very effectively the follow-the-sun model. Part of the reasoning for choosing a multi-geography team was skills. We didn't care where the people were sitting. We cared that they had the right skills. So we got all the people from four different countries with the right skills.
Anthos was a relatively new product about a year ago. I wouldn't call it beta, but just released, and it was coming up with new features every couple of weeks. So we had very limited skills in the market, but we did get them. Obviously, security is a common skill, but it's also not a common skill, so we found the right people in the right location.
We also did a Kanban approach because this was a short-lived project. A lot of planning wouldn't have helped. Priorities were changing all the time. Another thing was trusting your SDLC and your people and your tools. That's what helped us. If you have a security scan running with a tool, don't go to a human to get them to check the tool again. The tool is telling you. Everyone can read that. So if you've done the implementation well, if you have the right people doing the implementation whose skills you trust, you should move on. Otherwise, you'll never finish it in this short time.
Remember, Fortune must have spoken about a slide where we mentioned six plus one weeks. I'll come to the one-week part. This was a six-week project to be taken into the development environment, and then the fun started, which you'll see in the next slide.
This is tools. This is a DevOps Enterprise Summit, so I'm not going to spend hours talking about this pipeline. I'm pretty sure most of the audience of this summit have done it, delivered it, are probably owners of these tools, and have seen hundreds of types of implementation. But I'll leave this slide on for a few seconds, few minutes, so that you can digest. GitLab is the base pipeline tool with a lot of security tools and in between peppered with your standard CI/CD tools for your build and deploy and test and your ticket management and your risk management.
So like any delivery project, everyone has problems and obstacles. We'd like to think a three-wheeler rickshaw running on a dirt track race would have been easier. This wasn't easy. This wasn't easy delivery. This was tough. Even with the best skilled teams, with the best intentions, things obviously went wrong, like most deliveries do. If you know Hofstadter's law, nothing ever finishes in time, or Cheops' law. So that happened with us.
The key obstacles: one was the organizational mindset, the existing silos, and I think most of the people who are part of this movement, this journey within DevOps, have seen that problem most places. So we had that as well. This was a pioneer on the Google Cloud Platform digital area for Virgin Media, the first project. On top of that, the first project on Anthos, GKE, and GCP, which, as I mentioned, was a relatively new product in the market at that time.
There were also some last-minute performance volumetric requirements, which meant that we had to scale the system to handle 10 times the load, which wasn't in the original requirement and design. Organizational alignment of different functions, these third parties, and the old ways of working obviously had an impact.
Interestingly, this being a DevSecOps project, security had a dual role. While helping us and enabling us to shift left security, it was also a case of them doing their job and blocking us or stopping us when they needed to. So the pen test team found that we needed an antivirus solution because the content management system had a file upload feature, so obviously it was required, and we didn't have a plan for that, nor any tools. So that gave us a bit of a hit. We would overcome that later on.
Interestingly, we found a security bug while doing supply chain testing of the product itself, and we had literally days. So we went to the vendor of the tool and asked them, and they couldn't fix it in time. So we had to find ways to ensure that the penetration testing passed while the feedback was given to the vendor, and they changed the tool later on in future releases. So these were the obstacles.
And now, the priorities. Yes, priorities change. We were supposed to go only as far as the development environment, but since this was a big business event of the joint venture, it was deemed appropriate by the business and parts of the IT function to take it to production with three days' notice. So yes, we were in a development environment on a completely new cloud platform, on a completely unused or untested container platform, with a workload that was sitting on another cloud provider and had to be migrated. And we were taking it from development to production on three days' notice. There was no production infrastructure in place.
But most of us work in the automation area. We know what infrastructure as code can do. We know how mature the cloud service providers are now. We know the experience in the global teams we had. So there was a fighting chance, even if it meant long hours and weekend working and whatnot. We took up the challenge.
Interestingly, if you've seen my car theme, you'll see that this is a different one. The reason is, I, as a member of the team, wasn't planned to continue after development. Then I was in Peppa Pig World on a planned holiday, which is when we understood that this needed to go to production. So I was in Peppa Pig World attending meetings because the priority of the business changed. That's why this picture: business will change priorities, and working in this industry, we all adapt and we all come through.
And then reminding again that IT is just an enablement function. Business is what's more important. Sticking the best product on the best container platform on one of the leading cloud platforms is a great thing for us in IT. It's a good learning, and we can write it in our CVs, but what really matters is what the business wants. Some people call it the BizDevOps moment, whatever you want to call it. The end result was business needed something, it made sense, so we made it happen. So the priorities changed.
Then we move on into ludicrous mode. I don't have a lot of text on this slide because this is worth talking about. What happens is we deliver this project from development to production. We have our security tools. We have a single pane of glass. We have functional testing. We have unit testing. We have performance testing. All the challenges I listed were overcome eventually. And we had a penetration testing team, the red team, testing directly in production. That's something I'll speak to in the lessons learned in a second.
So we had done everything, as much as everything can be said. I've worked across more than 30 different companies and clients or delivery projects, really, and this was one exception, which was, as a personal experience for me and for a lot of other people, we were ready. We had done everything and we wanted to say, "What else do we need?" We've done all sorts of testing. The workload is proven. Whatever we needed is met, and we have done phoenixing and cattle behavior in production. So we know it's rock solid.
The benefit of going on a new technology, apart from the risk, was we didn't have a lot of technical debt and legacy, so we could do all of those things. Then we engaged with Fortune, and he was a person who allowed us to say, "Go." It is very rare that you do not have a formal CAB change process or five humans sitting in a meeting discussing whether there is a risk there, fifty other teams we should inform, and whatnot.
Part of it was enabled by the decoupled architecture, where this application could stand on its own and dependencies were met. SDLC was done properly. Delivery was all green text. The security tool dashboard was telling you no criticals, no mediums, and no highs. There were false positives, which we got rid of. But eventually, it was a case of we were all in a call when we said, "Hey, it's done. What do we want to do next?" And we were told, "Go, ludicrous mode."
This was a massive change. This was a massive change because very rarely do you see anyone doing that, just relying on a tool and saying, "Yes, you're ready to go live." No other discussion, no other meetings, no approvals. In the context of Virgin Media having past challenges where three weeks, sometimes even six, were required just to approve things, this was a major step. So this was the ludicrous speed we went with: from development to production in three days and a total end-to-end process of seven weeks, which was, in my book, a reasonably good achievement. That's why this title of the presentation: it was really ludicrous.
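The go-live decision described above, trusting the dashboard when it shows no criticals, highs, or mediums once false positives are triaged out, can be sketched as a simple automated gate. This is a hedged illustration in Python, not the project's actual pipeline logic: the `Finding` fields, the `BLOCKING` set, and the function name are assumptions.

```python
from dataclasses import dataclass

# Assumed policy taken from the talk: criticals, highs, and mediums block a release.
BLOCKING = {"critical", "high", "medium"}


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str
    false_positive: bool = False  # set during triage (hypothetical field)


def release_decision(findings: list[Finding]) -> tuple[str, list[Finding]]:
    """Return ("go" | "no-go", blockers) without a manual approval step."""
    blockers = [
        f for f in findings
        if f.severity.lower() in BLOCKING and not f.false_positive
    ]
    return ("go" if not blockers else "no-go", blockers)


if __name__ == "__main__":
    findings = [
        Finding("F-1", "low"),
        Finding("F-2", "high", false_positive=True),
    ]
    decision, blockers = release_decision(findings)
    print(decision, [f.id for f in blockers])
```

In a pipeline, a "no-go" result would typically fail the job and list the blocking findings, so the conversation is about fixing specific issues and not about waiting for an approval meeting.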
Recap. These are just a few bullet points. Most of them are commonsensical or well-known, but these were very specific to us in the same project.
We saw that the senior leadership buy-in with Kai and Fortune was important because we were able to overcome communication barriers and directly reach the leaders when we needed quick support. It was also a case of a senior leader, if you read Linda Rising's "Fearless Change," having a general's ear helps. It's one of the methods she talks about. So senior leadership buy-in was important.
Security for enabling rather than blocking. So the mindset change where security are not your enemies; they're a friend. They want you to be successful. Choose a simple workload like we did, which is easy to implement and still has business value. Trust your people and tools. We do end up doing repetitive behavior when it comes to testing and security and approvals, so we did what we needed to, not any more than that.
The retrospective learning is that using bleeding-edge products and tools is great for learning, but you should rather use cutting edge; don't do bleeding edge. If you have to, then have the right team with the right skills. We were in that fortunate position to have the team with those exact skills we needed for all the tools and products we were using, and the team were quick learners when new things came in.
The red team, the penetration testing team, has to be completely independent. We had relatively new engineers from a third party coming in telling us that our pen testing had failed because of five reasons, and they were empowered to say no to anyone in the firm. That was very important. I know the security constructs are such that these teams are kept separate, but that was very much something which was useful to us.
Remember, even if you build a new platform, unless you are a startup or one of the unicorns built in the last five years, you will likely have differences in non-production and production just because of the way it is. There is legacy, there is integration, there are people, there are mistakes, there is a limit to everything as code. So remember, production is going to be different. Wherever you can test in production, like we did with performance and penetration testing, do that.
Everything as code and cattle are only useful if you've tested them in production. That's obviously one of the main points. We all know it, but very rarely get to do it. We were fortunate because it was a new platform with some limited integration, and it was blue-green across different platforms because we were able to build this new platform and the product on top of it, and test it in canary before flipping the switch from the old platform to the new platform. You won't get this opportunity every time, but if you do, please definitely test your cattle and everything-as-code approaches in production. Blow it up. Do it again. It's something Gene or Martin would have said 15 years ago. It's still not done enough. So if you have the opportunity, definitely do that.
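The idea of testing in canary before flipping the switch from the old platform to the new one can be sketched as a small check. This is an assumption-laden Python illustration, not what the team ran: the error-rate comparison, the `tolerance` default, and the function names are hypothetical, and a real rollout would look at more signals than error rate.

```python
def canary_passes(old_errors: int, old_requests: int,
                  new_errors: int, new_requests: int,
                  tolerance: float = 0.01) -> bool:
    """True if the new platform's error rate is within `tolerance` of the old one.

    Returns False when either side has no traffic, because there is
    nothing to compare yet.
    """
    if old_requests <= 0 or new_requests <= 0:
        return False
    old_rate = old_errors / old_requests
    new_rate = new_errors / new_requests
    return new_rate <= old_rate + tolerance


def choose_live_platform(old_errors: int, old_requests: int,
                         new_errors: int, new_requests: int) -> str:
    """Flip to the new platform only when the canary check passes."""
    if canary_passes(old_errors, old_requests, new_errors, new_requests):
        return "new"
    return "old"


if __name__ == "__main__":
    # Hypothetical traffic numbers for illustration only.
    print(choose_live_platform(5, 1000, 1, 100))
    print(choose_live_platform(5, 1000, 5, 100))
```

Because the old platform stays in place until the check passes, a failed canary simply leaves traffic where it was, which is what makes blowing the new platform up and rebuilding it a safe thing to practice.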
I think that's it. That's me. Thank you very much.
Karel Kohout
Thank you, Pulak. So what is next for Virgin Media?
We've enabled the development teams with various security techniques and tools, from threat modeling through static analysis, up to supply chain and antivirus, as Pulak mentioned. We've demonstrated clearly that we can do not only one release a day, but actually that the North Star is multiple releases, even hundreds of releases a day, and that implies scale.
We are taking Virgin Media on the journey from a traditional approach, not only to security, but also development, to the modern approaches: zero trust, cybersecurity mesh, and so on. What's more, you've heard it from all of us: this is a paradigm shift for not only security, but also for the dev teams. Moving security from blockers to enablers, changing the connotation of security, being positive, helping, and proving that we can do those multiple releases a day without compromising on security. Security being an enabler and allowing us to address risks.
As I said, this is a journey, and what is ahead of Virgin Media now is continuing that culture change, changing the minds and winning the hearts of the developers, adopting all the security practices without really needing to become security experts.
The other part is scale. How can we move to those 100 releases, applying the pipeline that we've shown you to each and every development effort across Virgin Media O2, across the company? This is where approaches like Accenture's Intelligent Application Security can really help, where we embed seamlessly, help to find only the critical issues, and accelerate remediation through the auto-remediation that I've mentioned. Meaning, the developers do not need to spend time figuring out what this security vulnerability is or what they need to do with it, but are actually offered options automatically on their code with the fix that they can apply and just check in.
Also, helping the whole organization, again, both parties, the dev, the sec, all three parties, I should say, and the ops, to run this at scale. Meaning 100 scans a day and making sure that security is, again, enabling, is embedded, and provides better outcomes from a code quality perspective and risk management for everyone.
So with that, thank you very much. Been a pleasure.