Go Faster, Break Less: DevOps Transformation at HSBC

Amsterdam 2023

David Keane · Global Head of DevSecOps Transformation · HSBC

An interactive account of HSBC Technology’s DevSecOps transformation programme featuring David Keane, Global Head of DevSecOps Transformation.

This talk will provide rich insight into transformation from concept through to operational delivery, revealing the real and ‘in the field’ impact for employees and HSBC customers.

Full transcript

Host Intro (Gene Kim)

Gene Kim: The first talk of the conference comes from David Keane. He is Global Head of DevSecOps Transformation at HSBC, a global bank that operates in 70 countries with 236,000 employees, of which 50,000 are technologists.

I am so delighted that David is presenting for a variety of reasons. Certainly one of them is because of the massive scale at which David has influenced the technology practices across his organization, which I suspect will astound you, much as it has astounded me.

Another reason is that David attended this conference in 2017, when he saw Jon Smart, then head of ways of working at Barclays, present. David thought, "I want to be able to get to the point where we can share our story too." So I am so delighted that six years later, David is here to share his story with us. So here is David.

David Keane

David Keane: Thank you.

My memory, actually, of that discussion that Jon gave was this sense of relief. You know, my God, we are not alone here. There was somebody else experiencing what it is like to attempt to do a transformation in a big, old, traditional kind of bank. It was a relief to understand that there are other people with the same struggles.

I work for HSBC. I am sure many of you know HSBC, the brand. If you do not: big, old, traditional bank. It is about 160 years old, present in 70 or so countries. I remember when I joined about 12 years ago, I was lucky enough to be invited to an open forum like this. The newly minted CEO at the time of HSBC, Stuart Gulliver, was giving a talk. There is one line from that that has really stuck with me down through the years.

He said he is often asked: HSBC, the Hong Kong and Shanghai Banking Corporation, headquartered in London, is it a Chinese bank or is it an English bank? His answer was, well, it is neither. It is a Scottish bank. With apologies to any of my Scottish friends out there, his reasons were: one, we are tight with money. I can confirm that from personal experience. Two, we are deeply conservative. But three, we endure.

He saw those as characteristics of HSBC, and I think it was probably the best introduction to the culture of the place that anyone could get. I hope it gives you some idea of what it is like trying to do a transformation, some of the challenges that you are going to come across.

As Gene was saying in the introduction, it is a large bank. I think there are 50 countries in the world that have a smaller population than HSBC has staff numbers. We are regulated in all the countries, so that is 70 regulators. We have three different business lines: we have an investment bank, we have a commercial bank, and we have a retail and wealth bank. All these things lead to great complexity and maybe do not lend themselves easily to a transformation of this sort.

My own DevOps journey started in 2014 by a printer. You remember those things, when we used to go to meetings and you had to print everything out before you went. We do not do that so much anymore. I was based in the UK at this time, and I was running a large part of IT operations for the investment bank, as well as doing some transformation activities.

The printer, as it happened, was next to a colleague of mine, Peter Tans, who was leading the charge, really, on all things Agile ways of working within the FX department. Just as I got to the printer, our CIO rocked up, Richard Herbert, and he kind of collared myself and Peter. He said, "Hey guys, what do you think about this DevOps thing that Gene Kim has just invented?"

Fortunately, Peter was more clued in than me, and he was able to give some fairly convincing answers. Anyway, as Richard walked away, he turned back and he looked at us and said, "Peter, you are dev, and David, you are ops. Go do this DevOps thing for me."

Without very careful career planning, a few weeks later I found myself leading the DevOps transformation for the investment bank, and then a number of years later for the wider bank.

What are the reasons that we are trying to do it? There are three, really. One is competitive advantage. We have to be able to deliver functionality to our users, to our customers, at pace. You might even say it is survival. We need to be able to respond to cyber threats and other events much more rapidly than we used to do in the past. But we also know from the State of DevOps Report from 2014 that firms with high-performing IT organizations are twice as likely to exceed their profitability, their market share, and their productivity goals. So it is good for your business.

Finally, and most importantly, we wanted to make it a better place to work.

So how did we go about that? This is not a 13-year transformation plan. That would be a bit brave. Do not take this back to your management, to your board of directors, and say, "I have a plan for 2036." This is more a look back at the main events over the last 10 or so years.

Where did we start? A small group of innovators. They taught us the art of the possible. With all the headwinds that you have in HSBC, or an organization like us, they showed us what you could do. We landed on a strategy somewhat inspired by Accelerate, but it was based on and focused on speed of delivery. Over time we built numerous capabilities, but I would say two really stand out.

We agreed a small set of metrics and we automated them so that we could know where we were heading, and there was no extra toil involved for the teams in order to understand that. We also went after some of our biggest blockers. In a highly regulated environment, you have 70 regulators in different countries and 40 million customers. It is not surprising that IT controls are a big deal for us. Going after that as a blocker was one of the other big things that we did.

Where did we start? FX eVolve is a foreign exchange platform for institutional clients. It is not like me coming here getting my 100 pounds sterling turned into 120 euros. This is multi-billion-dollar trades that it handles. It needs to be available 365 days a year, 24 hours a day. It is an incredibly important application for the firm.

We had a bunch of really highly skilled engineers that joined to work in this team through 2012, 2013. They had been used to working in a very agile way, so I think the Scottish bank was a little bit of a shock to their culture. They found a platform that was not that stable and a very strained relationship with the business. Very traditional business-IT relationship, a poor one.

Then disaster struck. The application, the system, went down and it was out for 48 hours, for two days. This is unheard of. I think if it had been down for two hours or half a day or something like that, they would have just fired somebody. They would have fired lots of people, but they would have carried on. Because it was down for so long, desperate times, desperate measures: they sat down and they talked. The business and IT decided that they could not continue to operate in the way that they had in the past. Something had to change.

From an IT perspective, they started doing a bunch of things. They looked at their structure. They were very traditionally structured, lots of silos. They had a dev team, they had a support team, they had a testing team, they had a bunch of BAs and the rest of it. Over time, but pretty quickly, they decided to get rid of all of that. Everybody had to carry the pager. Everybody did deployments. BAs learned how to do deployments; so did management. If you did not like that, you were thanked for your service but encouraged to go somewhere else.

From a business perspective, the role of a business product owner really started to become much more pronounced in that group. The culture in that group changed: quite a big change, a noticeable change.

They decided amongst them that one thing they needed to focus on was to release more frequently into production. They used to do it at that time maybe once a month. They challenged themselves to do it once every two weeks, then once a week. If you met that team now, they release into production a hundred times or more a day. They do it 365 days a year if they want to. They typically do not do it at the weekends. If you were to walk into a meeting with IT and the business, you would really struggle to tell which is which. The empathy levels between the two teams are such that it would be really hard to tell.

For me, I was a bit nervous about this, I have to admit. The idea was that we needed to go faster and that this was going to improve production. I do not know what the worst day of my career was, but I am pretty sure it was a Monday. For somebody who has worked in operations for a large part of their career, you are familiar with all these complex releases that have taken months to prepare and have been greatly tested going into production on a Friday. Then you discover on Monday, well, maybe it was not so great.

All of my career I had seen and understood a very direct correlation: more change meant more failure. The idea that more change was going to give us less failure was not something that came naturally to me. That is why this graph is so important. For the first time, it demonstrated to me and to many others that if you do all of the small things right - if you do small, de-risked changes, if you automate all the things that you need to, if you have proper product ownership in place - you can not only break this correlation between change and incident, you can send it into reverse. That is what this group did.

Our mantra became, "go faster and break less." We realized, as we tried to broaden this across the organization, that we needed to have a small number of metrics that everybody could agree on. So we landed on the DORA metrics, very happily. Going faster and breaking less: the number of changes and the number of incidents were the two most important ones to us.

We took changes one step further. We came up with this very beautiful acronym, PDPTPPY, which I am sure you have all guessed is production deployments per ten-person team per year. That allowed us a rudimentary way to measure maturity, essentially, from any part of the organization. We use that to this day. Incidents, I think everyone understands. You could look at a team's release frequency and its incidents and judge them against each other. More recently, we have added lead time to deploy, which is quite helpful, and change failure rate, more useful, I would suggest, for spotting anomalies than some of the others.
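As a rough illustration of the arithmetic behind that acronym, here is a minimal sketch in Python. The talk gives only the metric's name, so the normalization (deployments divided by the number of ten-person units) and every name and figure below, including `pdptppy` and `change_failure_rate`, are assumptions for illustration, not HSBC's actual tooling.

```python
def pdptppy(deployments_per_year: int, team_size: int) -> float:
    """Production deployments per ten-person team per year.

    Assumption: headcount is normalized to units of ten people, so
    groups of different sizes can be compared on one scale.
    """
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return deployments_per_year / (team_size / 10)


def change_failure_rate(failed_changes: int, total_changes: int) -> float:
    """Fraction of production changes that led to a failure."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return failed_changes / total_changes


# Hypothetical figures: a 25-person group shipping 500 times a year,
# with 5 of those changes causing a failure.
print(pdptppy(500, 25))             # 200.0
print(change_failure_rate(5, 500))  # 0.01
```

Normalizing by headcount is what would let a 25-person group and a 250-person department be read on the same scale.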

As important as agreeing what these were - and that is not simple in a place with as many opinions as silos - was to make these data points available really easily to everybody. Today we have got 50,000 people in HSBC, so roughly 5,000 pods. For every pod, we can look at these numbers, and everybody can see it entirely transparently. From a pod all the way up to enterprise, we can tell how people are doing.

We needed to move on to the investment bank. Our next task was: how do you grow from that small idea of one team to a whole department? From one application to a thousand applications, from 100 people to 6,000 people. Leadership really was a game changer.

We knew the mantra. We knew that "go faster, break less" was the idea to sell to people. We had proven that. But we got lots of pushback. There were lots of excuses. It is hard. I have got legacy systems. My business does not understand. It is going to rain tomorrow. Whatever it was, there were lots of excuses. So we sharpened down on "go faster, break less," and we made it "double and halve." You need to double your releases and halve your number of incidents every year, each year. Every year we got that message out.

We told people there were a few caveats that came with that, and they are kind of important. If you are in charge of a service-line department, you might have 50 systems, you might have 100 systems. We did not dictate that you needed to double them all. If you needed to quadruple one and flatline another, that was perfectly fine with us. You needed the outcome for your entire organization of double and halve.
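The portfolio-level caveat can be sketched as a simple check on totals. This is a hedged illustration only: the function name `meets_double_and_halve`, the data layout, and the figures are hypothetical, and the talk does not describe how the outcome was actually computed.

```python
def meets_double_and_halve(last_year: dict, this_year: dict) -> bool:
    """Check the outcome across a whole portfolio of systems.

    Each argument maps a system name to a (releases, incidents) pair.
    Only the totals are compared, so one system may quadruple its
    releases while another stays flat.
    """
    releases_before = sum(r for r, _ in last_year.values())
    incidents_before = sum(i for _, i in last_year.values())
    releases_after = sum(r for r, _ in this_year.values())
    incidents_after = sum(i for _, i in this_year.values())
    return (releases_after >= 2 * releases_before
            and 2 * incidents_after <= incidents_before)


# Hypothetical portfolio: system_a quadruples its releases,
# system_b stays flat on releases.
last_year = {"system_a": (20, 8), "system_b": (40, 4)}
this_year = {"system_a": (80, 4), "system_b": (40, 2)}
print(meets_double_and_halve(last_year, this_year))  # True
```

Here total releases go from 60 to 120 and total incidents from 12 to 6, so the department as a whole meets the outcome even though one system did not change its release count.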

Really importantly, also, it was multi-year. People are used to fads: the business or the management say you have to do this this year, but next year they will have forgotten it and moved on. We are in year six of this in the investment bank. People realizing this was not a fad, that it was going to stay, was a very important thing.

Finally, it is an OKR and not a target. That is really important because you want people to be ambitious. Doubling is ambitious. Going up by 10% does not change the dial. Anybody can change by 10% by working a little bit harder or cutting something out. If you double, and you do that every year, you have to tackle the hard architectural issues or whatever else it is. Having that ambition was really, really key.
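To make the gap between 10% and doubling concrete, here is a small compounding sketch. The baseline of 12 releases a year is a hypothetical starting point, and `compound` is an illustrative helper rather than anything from the talk; the six-year horizon echoes the "year six" mentioned for the investment bank.

```python
def compound(start: float, factor: float, years: int) -> float:
    """Value after applying the same yearly growth factor `years` times."""
    return start * factor ** years


baseline = 12  # hypothetical: one release a month
for year in (1, 3, 6):
    ten_percent = round(compound(baseline, 1.10, year), 1)
    doubling = compound(baseline, 2, year)
    print(f"year {year}: +10% gives {ten_percent}, doubling gives {doubling}")
```

Under these assumptions, six years of 10% growth takes 12 releases a year to about 21, while six years of doubling takes it to 768, which is the scale of change that forces the hard architectural work.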

This is a quote from one of our business leaders in the investment bank. They are big supporters, such big supporters they think they invented this now. We like to let them believe that.

There is a story that maybe illustrates this a little bit better. Giles and some of his team were visiting a client, a shipping client in Asia, I think in Thailand. Just as they were leaving, one of the guys said to them, "You know what? I would love if I could cut and paste that thing and stick it into my spreadsheet over here. That would really, really help me." He did not expect the answer. The answer was, "Yeah, we will do that and you will have it tomorrow." And we did. We delivered it the next day.

That is a really small thing. But we compete on two things, really, with all of our clients. One is our price point, our fees, and the other one is our client relationship. That client wrote back to us the next day and said, "Literally, I am stunned. I expected I would ask for this and you would say, well, we would add it to the backlog, and if lots of other people like it, we will do it next year in version 2020x. But to turn it around the next day, I am stunned." We do not have to work that hard with that client anymore. He is stickier. He is going to do more business with us. We have delighted our customer. It is a really simple story, but I think it tells the difference this can make for our business.

Having done the investment bank, we had to get on to do the entire bank. Now we are moving to 8,500 applications and 50,000 people. We know our mantra. We know "go faster, break less." We know "double and halve." We went back. There are nine different CIOs across the bank, all across their own departments, and we had the same thing: "We cannot do it here because the business does not like it," or, "We are not the investment bank. The business are more risk averse," and the weather, and all the usual things.

The first year, 2022, was the first time we got all of the CIOs to agree to a target, not an OKR, that they would increase release frequency by 10%. We pushed all the other things out, the metrics and the other enablers that we had given them. They surprised themselves. They hit 40%. Guess what? In 2023 we were going to double and halve. Double and halve is the mantra for everybody.

If you are trying to change an organization this size, you have to change the culture. You have to appeal to the 50,000 people that we have as engineers. In the program that I am running, we really treated the engineers as our customers. Putting feedback loops in with our customers was key. We listened to what their biggest bugbears were. For us, it was the IT controls. So we went after that big time.

We looked at the intersection of where you had the highest number of releases and the simplest control story, and we found this journey, which was the journey that we were trying to push towards anyway: the simplest software release that only had to do four controls in order to get out into production. We automated the hell out of that. We simplified the hell out of that. But we treated the controls as products. That was the really, really key differentiator.

I happened to be the controller for two of them, so it was a little bit easier for me to eat my own dog food. We treated the engineers as customers because they were the people consuming these processes, and they had never been listened to before. They had always been treated as the enemy, somebody to be suspicious of. That was a real game changer.

I think it has helped shift the dial for us. In 2022, because of the work that we did, we were able to eliminate 35,000 days of toil from the engineers' experience that year, and seven and a half centuries of wait time in HSBC. You love big numbers. Seven and a half centuries of wait time removed - that is just something that would please any engineer. I think we are winning the hearts and minds of that community too.

In March we had a Spotlight, an internal TV event, where we invited people to a talk about our new DevSecOps strategy. Maybe a fairly dry topic, but 11,000 people turned up to it. It is voluntary. You had to register for it and you had to take the time to either listen live or listen to it later on. I can only think that they were doing that because there was something in it for them: what was the next thing that was going to save them time and money?

Our group CIO recognizes this change. Again referring back to the State of DevOps Report from 2014, it showed that there is a close correlation between organizations that went faster and those that were more profitable. We have proved that at a small scale for eVolve. We have proved that at a medium scale, if you like, for the investment bank. What we are doing now is trying to prove it for the enterprise. That is where we have gotten to.

We have learned a lot along the way. I do not have time to list all the things that we did not do so well. What would I say that did work well? Transform from within. Get outside advice, get help, absolutely. But your teams must know they have to do this for themselves. Have a simple set of metrics, a simple message: double and halve. If you have to communicate with large numbers of people across multiple different countries, you cannot get to them all. You have got to tell them that leadership is key.

The experience with moving to product from project is important. The IT controls thing for us was a huge game changer.

What not to do? Do not tell smart people what to do. Agree with them what the outcomes should be and then get out of the way. Beware fake news. I put that in there because too often you have people celebrate a success story, but engineers will know it is not true. Or a manager will say, "We are done. We are 100% agile," or "82% DevOps," or whatever it is. People see that for what it is. Try and avoid those things.

Last thing I will say is: I think I am going to do a Q&A session at midday in Ballroom Four. If anybody has any further questions, I am happy to take them. I have a few questions of my own; be warned. I would love some help around how you can do tech-for-tech funding, and also moving from correlation to causation for ROI in a heterogeneous environment. That would be something that I would personally like some help on.

Gene, thank you all very much.