Global Continuous Delivery in a Financial Organization
After the move to Agile methodologies, the next step in innovating the ING IT landscape was the transformation to Continuous Delivery and DevOps. At the beginning, this transformation was local to each team. Every team could define and implement its own continuous delivery pipeline, integrating the ops activities with the dev activities. However, this approach caused a lot of duplicated work, because every team had to implement and maintain its own pipeline. Moreover, whenever a global improvement was needed in any step of the continuous delivery process (e.g., Quality Assurance improvements), every team had to apply different changes to integrate the same improvement into its own pipeline.
To avoid these problems, we developed a global continuous delivery pipeline that is available as a service to all ING DevOps teams. Having a single continuous delivery pipeline has several benefits. First, repetitive development activities have been removed by giving this responsibility to a dedicated team. Second, a global continuous delivery pipeline enables the seamless integration of improvements for all teams using it. This is particularly relevant for consistently assuring the high quality and compliance of ING software systems. Finally, having all applications go through a single pipeline makes the pipeline the single source of truth. The continuous delivery pipeline is not only technical support that eases the release of software systems. It is the ideal means to apply the scientific approach and, hence, methodologies like Six Sigma and Lean Analytics. Currently, the pipeline is becoming a framework for applying the DMAIC (Define, Measure, Analyze, Improve, and Control) data-driven improvement cycle. Since all software systems go through the same pipeline, managers and DevOps engineers can solve problems following a data-driven approach. Once a problem is defined (e.g., improving productivity or reducing software problems), metrics can easily be measured by mining all the components of the pipeline (e.g., version control systems, quality assurance systems). The values of these metrics provide relevant insights, first to act on a problem and then to measure the impact of the proposed solutions. As a consequence, the pipeline is becoming a framework to speed up the resolution of organizational and technical problems, not only the release of software.
However, the globalization of the pipeline can hide critical risks for the agility of the entire organization. First, it’s challenging to build a single pipeline that perfectly fits the needs of hundreds of different teams, and the mismatch causes frustration in teams. Second, having a centralized team dedicated to the pipeline isolates its engineers from the rest of the IT landscape. The main risk is that the CD development team takes away the teams’ responsibility for mastering the tools and the entire delivery process, bringing the organization back to a pre-DevOps era with its silo-oriented organizational problems. To eliminate these risks, the Global Continuous Pipeline at ING is becoming a framework that everyone can extend and contribute to. In this scenario, the central continuous delivery team is the integration team that integrates all contributions coming from all teams while assuring the concept-of-one and
Dr. Daniele Romano, Product Owner Continuous Delivery as a Service, ING
Full transcript
Dr. Daniele Romano
As explained, I'm Daniele Romano. My role within ING is Global Product Owner of the Continuous Delivery Pipeline. And as product owner, I'm responsible for defining the vision and making sure that we execute the vision and roll out the continuous delivery pipeline across the globe.
But today, I don't want to only show the continuous delivery pipeline. Trust me, it's a nice pipeline. I don't think it's the best pipeline in the world, but I would like to explain the journey a bit: what we did at ING to reach this level of continuous delivery. It's a journey that started back in 2012. And, well, I think we made great progress, but we still have many challenges ahead, and I hope that senior management will keep investing in this initiative.
To better understand the journey, we should take a step back, back to 2012. We were not playing rugby. Well, still football, soccer, is our favorite sport. We are sponsoring the Dutch national soccer team. But we were doing Scrum. So Agile, Scrum especially, was our religion. We had all the practices on the floor. We were doing the planning, we were doing the retro, the daily stand-up to track the progress and to monitor the impediments.
Yet management was not really happy about the lead time, the time to market, and the quality of our software systems. And, well, look at this picture: we loved Agile and Scrum so much that we started visualizing the release journey, the release process, on brown paper. And you can see that, well, it's not the easiest journey. I'm sure you cannot read the blue stickies, but trust me, they represent time, and for some steps we are talking about weeks. Actually, I could not even fit the entire release process on one slide. So we were definitely doing something wrong.
With this complex release process, well, it's no surprise that when you are about to release, you don't have any confidence in what you are doing. You cannot even remember all the steps you followed across weeks or months. And if you somehow find the courage to go live, then, well, most likely you cannot go back home to your loved ones and enjoy dinner with your family, because you have to stay and work, fixing all the bugs that this release process could not catch.
And that's where our journey started.
Usually, I don't like to advertise anything during a presentation, but I think this book doesn't need Daniele Romano to advertise it. Anyway, senior management bought this book. They read it, and then, well, they bought many copies. They put the copies on the different engineering floors, and then we started, first of all, understanding continuous delivery, and then doing it.
What was continuous delivery back then? With continuous delivery, we wanted to solve two main issues. First of all, we wanted to automate as much as possible. We wanted a manufacturing-like production line that could automate all, or ideally all, of the manual steps and the steps that needed human intervention, simply because they were too error-prone and the cycle time was too high.
While doing so, while improving the lead time, we definitely didn't want to forget about quality; we had the ambition to improve the quality as well. So being faster, but with higher quality. And this means automating, in particular, all the quality checks that we had in the release process.
So let me show you the pipeline, the CDaaS offering, as we have it now. This is an example for a Linux/Java application. Everything starts in ServiceNow. In ServiceNow, the managers and the people responsible define their vision and the themes, and then we split the vision into epics and stories.
Then the DevOps team starts working on these stories. We have Git as the version control system, and we have GitLab to enable collaboration across teams, including teams in different countries. When you do a commit, or even better, a merge request, the team should link the commit and the merge request to the story ID. In this way, we have traceability back to the vision.
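Linking commits and merge requests to story IDs is what makes a change traceable, so it lends itself to an automated check. A minimal sketch, assuming a hypothetical story-ID format of "STRY" followed by seven digits; the talk does not describe the actual ID format or how ING enforces the link.

```python
from __future__ import annotations

import re

# Hypothetical story-ID format ("STRY" plus seven digits). The real format
# and the way ING enforces the link are assumptions, not taken from the talk.
STORY_ID_PATTERN = re.compile(r"\bSTRY\d{7}\b")


def linked_story_ids(message: str) -> list[str]:
    """Return every story ID referenced in a commit or merge-request message."""
    return STORY_ID_PATTERN.findall(message)


def is_traceable(message: str) -> bool:
    """A change is traceable to the vision if it references at least one story."""
    return bool(linked_story_ids(message))
```

For example, `is_traceable("STRY0012345: add retry to payment client")` returns `True`, while `is_traceable("quick fix")` returns `False`.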
And ideally, you go straight away to the continuous integration stage. There, in the case of a Java and Linux application, we have Jenkins. Jenkins orchestrates the build, but also some quality assurance. For instance, we can have Maven to run the build, and then checks like OWASP, SonarQube, and Fortify that ensure the code, both the in-house developed code and the externally delivered code, doesn't contain any critical or high vulnerabilities.
The output of this stage is a binary, which goes to Artifactory. Artifactory is the place where we version the binaries, and it's the place where, again, we enable collaboration across teams in the different countries. So if you don't want to contribute through code, you can take a binary and reuse it in your own application.
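The rule that code must not contain critical or high vulnerabilities amounts to a quality gate over the scanners' findings. A minimal sketch, assuming the findings from the different scanners have already been normalized into one hypothetical `Finding` shape; the real report formats of OWASP, SonarQube, and Fortify differ, and that normalization is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass

# Severities that fail the build: the talk states that code must not contain
# critical or high vulnerabilities. The Finding data model is hypothetical.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


@dataclass(frozen=True)
class Finding:
    tool: str      # scanner that reported the finding, e.g. "SonarQube"
    severity: str  # scanner severity, e.g. "high" or "HIGH"
    rule: str      # scanner-specific rule or CVE identifier


def blocking_findings(findings: list[Finding]) -> list[Finding]:
    """Return the findings that should fail the continuous integration stage."""
    return [f for f in findings if f.severity.upper() in BLOCKING_SEVERITIES]


def quality_gate_passes(findings: list[Finding]) -> bool:
    """The gate passes only when no critical or high finding remains."""
    return not blocking_findings(findings)
```

Comparing upper-cased severities keeps the gate tolerant of scanners that report "High" or "high".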
This continuous integration step is integrated and automated with what used to be the ops part of our delivery process. Here we have a release automation and orchestration tool. With this release automation, we orchestrate the different steps you should follow in the release journey. Basically, it allows the DevOps team to go through the classical development, testing, acceptance, and production environments in an automated fashion.
And depending on the environment, you can execute, always automatically, some test cases. We have different testing frameworks, for instance JMeter, SoapUI, etc., depending on the technology of the team. These test cases are, again, defined in GitLab. So apart from GitLab, the DevOps team doesn't need to log in to any other system.
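The promotion through development, testing, acceptance, and production, with automated tests in each environment, can be sketched as a loop that stops at the first failing environment. This is an illustration only: `deploy` and `run_tests` are hypothetical hooks standing in for the release automation tool and the JMeter or SoapUI runs, whose actual APIs the talk does not name.

```python
from __future__ import annotations

from typing import Callable

# The classical DTAP sequence mentioned in the talk.
ENVIRONMENTS = ["development", "testing", "acceptance", "production"]


def promote(
    version: str,
    deploy: Callable[[str, str], None],
    run_tests: Callable[[str, str], bool],
) -> list[str]:
    """Deploy `version` to each environment in order, running that environment's
    automated tests after each deployment.

    Promotion stops at the first environment whose tests fail; the function
    returns the environments the version was deployed to.
    """
    reached: list[str] = []
    for env in ENVIRONMENTS:
        deploy(env, version)
        reached.append(env)
        if not run_tests(env, version):
            break  # do not promote a failing version any further
    return reached
```

Passing the hooks in as functions keeps the ordering logic independent of whichever deployment and testing tools a team uses.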
Cool. We implemented it. I think after two years, two and a half years, we already had a lot of applications going through the pipeline: around 700 applications, delivering the features that our customers were using.
Yet management was still not happy. We still had long lead times and some reliability issues. And basically what we recognized was that we were kind of building a Ferrari, a high-speed car, but we had no idea how to drive it.
And that's where, basically, we added an extra layer on top of the pipeline. We used Kafka to get all the events out of the pipeline and to monitor the way we are using it. Because it's not only about the tools; it's also about using the tools in an appropriate way.
With this analytics approach, we built a tool we call the Cubes. It's an in-house developed tool, and here, basically, we were visualizing more than 50 metrics, from deployment frequency to continuous integration frequency, etc. And we started applying the scientific approach: getting the facts, trying to understand what was slowing down delivery, and continuously trying to improve it.
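As an illustration of one such metric, deployment frequency can be derived from the stream of pipeline events. A minimal sketch, assuming the events have already been consumed from a Kafka topic as JSON strings with hypothetical fields `type`, `application`, `environment`, and `timestamp`; the Kafka consumer and the Cubes' actual implementation are not shown.

```python
from __future__ import annotations

import json
from collections import Counter
from datetime import datetime


def deployments_per_week(raw_events: list[str], environment: str = "production") -> Counter:
    """Count deployment events per (application, ISO year, ISO week).

    Each raw event is assumed to be a JSON document with hypothetical fields
    "type", "application", "environment" and an ISO-8601 "timestamp".
    """
    counts: Counter = Counter()
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("type") != "deployment" or event.get("environment") != environment:
            continue  # ignore CI runs, test runs and deployments to other environments
        year, week, _ = datetime.fromisoformat(event["timestamp"]).isocalendar()
        counts[(event["application"], year, week)] += 1
    return counts
```

The same grouping works for other event types, for example counting continuous integration runs instead of deployments.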
And we didn't do this alone as a banking company; we did it together with academic institutions, for instance Delft University of Technology in the Netherlands and the University of Sannio in Italy. For those of you who are interested, we have published some papers at scientific conferences. In one of them, we explain the best practices, and the worst practices as well, that we were following within ING with regard to continuous delivery and Agile. Other papers discuss the main bottlenecks in the continuous integration stage. And of course, we have papers that were rejected, and we are trying to improve them.
So what were the problems? I don't think it was a great surprise, but we had facts. So first of all, we recognized that we needed to invest more in cultural innovation. So again, it's not only about the cool technology, it's not about the best tool, it's about using it and having the mindset of using it.
The DevOps journey. We didn't start with DevOps. We started with Agile, then we applied continuous delivery, and only at a later stage, we recognized that we had to improve our DevOps journey.
And no surprise, well, we are providing banking services. We have regulators. We have compliancy requirements, and we noticed that this compliancy was too expensive. So we had to find a way to get the same assurance in less time.
So let's start. Let's go through these problems one by one, and let me give you an idea of how we are trying to solve them.
Cultural innovation. When it comes to cultural innovation, well, I think this involves your entire organization, and definitely it starts with managers.
Why management? Because portfolio management, or lean portfolio management, can be a really critical step. If you spend weeks or months in a long decision-making process, then you can go to production in one day, but you still spent a lot of time discussing which feature to deliver. And continuous delivery is about delivering the feature and then, based on customer feedback, deciding how to improve it or testing your hypothesis.
The second point is about understanding what the bottlenecks within your organization are. Most of the time, there are some teams in the organization that simply become a bottleneck the moment you want to deliver a functionality. If you have one team that is a bottleneck and gets all the feature requests from hundreds of different teams, well, I think you should reconsider your organization and change it.
The product owner is another important role in this journey. I think it's the most important role because, in the end, the product owner is the person who should make room for improvements within the sprint. If we have only pressure from the product owner and management, and we never have time to improve the quality of our product, it's impossible to improve that quality, even with a highly automated continuous delivery pipeline.
And above all, continuous delivery is about releasing small and frequently. If the product owner is not able to split a big vision into smaller steps, then we are still doing a kind of waterfall approach.
And last but not least, the most important actors in our organization are the engineers. The engineers also need a shift in mindset: all the practices that are well explained in the Jez Humble book should be applied, and the engineers should have the time to apply them.
Integrate your changes continuously. Don't work in isolation and only after weeks or months go to your peers to integrate your change. I think it's one of the most important practices.
Push your change as far as possible through the continuous delivery process. I think this is another important guideline. If you have a continuous delivery pipeline, but you first spend months developing and only after months go to continuous deployment and testing, well, this is not continuous delivery. In that case, it's better not to spend money automating the delivery journey.
And of course, shift all the quality checks left. In this way, you can fix issues as soon as they appear, instead of waiting weeks, by which time you have already forgotten all the context of your change.
Okay. How did ING apply this cultural innovation? I think we have great senior management, great executives, and, well, it's not easy to transmit a new culture. What they did was reorganize thousands of people across the entire company, applying the Spotify model.
I don't want to go into details about the Spotify model. I think you already know this new organizational model, but I want to highlight two points. The first one is a product owner with an IT background. The product owner should definitely understand the business, but if he or she has no idea what mutation testing, smoke testing, or a confidence check is, then the quality of the system will never improve, simply because they will never reserve time for it in the sprint.
And the last point, which I have heard many times since this morning, is business in the tribe. Once your business colleagues work together with you, it's easier to do portfolio management, it's easier to test hypotheses, and you don't spend a long time in the decision-making process.
Another important thing we did after reorganizing was to review the offering of our continuous delivery pipeline. We decided to call it Continuous Delivery as a Service. Why? Because we don't want to just give teams the tools. We want to give them the mindset and the know-how to use the tools as well. That's why we call it continuous delivery as a service.
Right now, we basically have one opinionated flow that is reused by all the teams using the pipeline. So we are telling all the teams to adhere to this flow, and this is challenging. How can you make sure that you have a generic flow that works for hundreds of teams?
You can follow two approaches. With the first one, you call your CIO, your CAO, or another powerful person within your organization and ask, "Okay, please define the flow in the pipeline, and then we're going to dictate it on the floor."
Cool, but this is not the approach we followed. Instead, we created communities. Basically, we took the open source model and created communities that told us what the pipeline should look like. Then we made sure that this pipeline fit the continuous delivery mindset. It took some more time, but I think the engineers are happier. At least I hope so. And now we have communities working on the front end, APIs, Scala, .NET, COBOL applications, TIBCO, etc.
Okay. Cultural innovation was the first challenge, and I think it's the most important step in this journey. But then, as I already told you, we recognized that we had to do a DevOps transformation. I don't want to spend a lot of time on this slide, even though this is a DevOps conference, but I would like to highlight one important aspect of the DevOps transformation.
How did we make Dev and Ops work together? Basically, we put everything as code. Not because it's cool to have code, but because that way we could make sure that Dev and Ops are working together. Right now in GitLab, we have authorization groups, and only the people within an application's authorization group can contribute to that application. There they define everything: a set of blueprints and a manifest file where they specify the entire IT stack of their application, from the different systems that compose the domain and the connections between those systems, to all the features from the operating system up through the different middleware layers.
Basically, we are shifting away from a deployment approach in which each ops team added its own layer to the IT stack, and at the end, the application team put an extra layer on top of this multi-layer system.
Now we are moving towards assembling the entire IT stack with a few configuration management tools, with everything versioned in GitLab.
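Because the whole stack is described in a versioned manifest, it can be checked for consistency before anything is assembled. The talk does not show the manifest format, so the sketch below assumes a hypothetical shape (systems with an operating system and middleware, plus connections between systems) and only checks that systems are unique and that every connection points at a declared system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class System:
    """One system in the application's domain (hypothetical manifest shape)."""
    name: str
    operating_system: str
    middleware: list[str] = field(default_factory=list)


@dataclass
class Manifest:
    application: str
    systems: list[System]
    connections: list[tuple[str, str]] = field(default_factory=list)  # (source, target)


def validate(manifest: Manifest) -> list[str]:
    """Return a list of problems; an empty list means the manifest is consistent."""
    errors: list[str] = []
    names = [s.name for s in manifest.systems]
    for name in sorted({n for n in names if names.count(n) > 1}):
        errors.append(f"system '{name}' is declared more than once")
    declared = set(names)
    for source, target in manifest.connections:
        for end in (source, target):
            if end not in declared:
                errors.append(f"connection {source} -> {target} references undeclared system '{end}'")
    return errors
```

Returning a list of problems rather than raising on the first one lets a merge-request check report every issue at once.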
Compliancy. Well, we are a bank; we are not delivering pizza every day. We have to be in control, I would say, of what we are releasing and make sure that our customers don't suffer any problems. Here we made, I think, two important improvements.
First of all, we are moving more and more towards immutable patterns. Basically, we want to be sure that after you release an application, its state will not change over time, so you're confident that what you put in production stays as you intended. We have different ways to implement this concept. One is simply having immutable servers, containers, or similar: you make your virtual machine read-only, so you're sure it will not change. Another way is to use configuration management or deployment tools like Puppet, Ansible, or Chef, which were born as self-healing technologies.
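The talk relies on read-only machines and on tools such as Puppet, Ansible, or Chef to keep the released state unchanged. A simpler, different technique, swapped in here only for illustration, is drift detection: record a fingerprint of the deployed files at release time and compare it later. The sketch assumes the deployed files are available as a mapping from path to content.

```python
import hashlib
from typing import Mapping


def fingerprint(files: Mapping[str, bytes]) -> str:
    """Deterministic SHA-256 fingerprint of a set of deployed files (path -> content)."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so paths and contents cannot run together
        digest.update(hashlib.sha256(files[path]).digest())  # fixed-length content hash
    return digest.hexdigest()


def has_drifted(released_fingerprint: str, current_files: Mapping[str, bytes]) -> bool:
    """True if the current state differs from the state recorded at release time."""
    return fingerprint(current_files) != released_fingerprint
```

Sorting the paths makes the fingerprint independent of the order in which files are listed; detection only reports drift, whereas the self-healing tools would also correct it.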
But before going to production, you should be sure that what you're releasing is compliant with regulations. And here, remember the Kafka layer: on top of it we implemented an in-house tool called iValidate. iValidate takes all the events from the pipeline and applies business logic that can automatically tell whether the application you are bringing to production is compliant or not.
So now it's becoming an automated tollgate: before going to production, the release orchestration sends an API call to iValidate and asks, "Okay, can this application with this version number go to production or not?" And you get a yes or no. Of course, for managers there is a magic button to overrule all these criteria, if you want to risk your job. But here, basically, you get all the vulnerabilities and all the testing results, and you can profile your application to understand how ready it is to go to production.
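The yes-or-no answer of such a tollgate, including the manual overrule, can be sketched as a pure decision function. iValidate's real data model and business rules are not described in the talk: the fields and criteria below are hypothetical, loosely based on what the talk mentions (vulnerabilities, test results, story traceability, manual penetration testing), and the HTTP layer that would expose the function to the release orchestration is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReleaseEvidence:
    """Facts gathered from pipeline events for one application version.

    Field names are hypothetical; the talk does not describe iValidate's data model.
    """
    application: str
    version: str
    critical_or_high_vulnerabilities: int = 0
    failed_tests: int = 0
    linked_to_story: bool = True
    pentest_required: bool = False
    pentest_passed: bool = False


@dataclass
class Decision:
    allowed: bool
    reasons: list[str] = field(default_factory=list)
    overridden_by: Optional[str] = None


def can_go_to_production(ev: ReleaseEvidence, override_by: Optional[str] = None) -> Decision:
    """Answer the tollgate question "can this application version go to production?"."""
    reasons: list[str] = []
    if ev.critical_or_high_vulnerabilities > 0:
        reasons.append(f"{ev.critical_or_high_vulnerabilities} critical/high vulnerabilities open")
    if ev.failed_tests > 0:
        reasons.append(f"{ev.failed_tests} automated tests failed")
    if not ev.linked_to_story:
        reasons.append("changes are not linked to a story")
    if ev.pentest_required and not ev.pentest_passed:
        reasons.append("required manual penetration test not passed")
    if reasons and override_by:
        # The manual overrule: allowed, but the failed criteria and the approver are kept.
        return Decision(allowed=True, reasons=reasons, overridden_by=override_by)
    return Decision(allowed=not reasons, reasons=reasons)
```

Keeping the failed criteria in the decision even when it is overruled preserves an audit trail of who accepted which risk.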
Okay. Finally, management, I think, is happier. At least it's more relaxed. They know that we can complete this big continuous delivery journey. We have improvements both in reliability and in lead time. The lead time, or cycle time, now takes minutes or hours, depending on which functionality you have to bring and on whether you have to perform manual penetration testing. But most of all, I would say, we have a more relaxed engineering floor: you can no longer tell which teams are releasing and which teams have just started their release.
What's next? Next is a global continuous delivery offering. We already started, and we have already made huge progress. ING has retail or commercial business in over 40 countries, and we want the teams there to reuse this central, shared CDaaS offering.
And before concluding, here are some numbers to give you an idea of the usage of this pipeline. On average, we are onboarding around 50 applications into the pipeline every month. In total, we have 1,000 applications going to production through the pipeline. This means that roughly 700 teams from across the globe are using the pipeline, mainly within Europe.
To wrap up, because I think my time is up: I showed you where we were back then and how complex our release process was. I showed you, of course, our continuous delivery pipeline. And I went through what we think are the main challenges of this journey, from cultural innovation, to the review of the DevOps journey, down to better control of compliancy.
And with this, thank you for joining this presentation. I hope I gave you some useful input for your own journey. I'm not sure if I have time for questions. No, I don't. Thank you.