Barclays Cloud Experience Report: “From Racks and Cables to the Cloud”


London 2018


Vice President - Cloud Technology · Barclays

Nick Funnel leads development practices within CTO Application Hosting - an infrastructure function working to enable Public Cloud across the whole of Barclays.

I’m also responsible for ‘ways of working’ and DevOps within GTIS (Global Technology Infrastructure Services), as the organisation moves from a traditional physical infrastructure function to a modern software-driven organisation. I’ve been in role since 2016, involved in setting up what was then a new function - an 'accelerator' group to focus on enabling Public Cloud within Barclays.

In essence, as an infrastructure function, GTIS does not have a history or culture of enterprise software development - it manages the data centre estate, and where people are writing code, it’s largely scripting and provisioning.

As an experienced enterprise developer, I was asked to grow a culture around development standards and practices. My role has since expanded to encompass the promotion and adoption of new ways of working (agile/lean and DevOps) within the function, and helping the teams to be more cross-functional and customer-focused.


Full transcript

The complete talk, organized by section.

Nick Funnel

My name's Nick Funnel, and I work in infrastructure technology at Barclays.

Today, I'm going to tell you a little bit about the last two years, where we have essentially taken an infrastructure function, a group of fairly traditional infrastructure engineers, and evolved it into a software-driven team delivering and enabling cloud within Barclays.

The financial industry is changing all the time. But in the last 10 years particularly, margins have been squeezed, profits are reducing, regulations are tightening. It's harder and harder to do things. You have to do more and more with less and less.

Particularly, finance has always been underpinned by technology. You can't do anything financial without technology. And as I say, we're having to do more and more with less and less. It's getting harder and harder.

Additionally, the cost of entry is now a lot lower. With cloud technology, you can scale from nearly zero. You can get to the market very quickly. You can get an idea to market in weeks, if not days. It's very, very easy. To take advantage of this, you have to be fast, you have to be nimble, you have to be reactive.

This is Barclays. We've been around a long time, 328 years. We've got annual revenue of 20 billion pounds, 48 million customers, and we're a big organization too. We've got 80,000 employees in 40 countries, and 29,000 of those in technology. We are a technology company. As I say, all finance is underpinned by technology.

To support that technology, we have, I think, 18 data centers. Our expertise is in finance and finance technology. It's not in data centers, and yet we have a lot of infrastructure. We support all our technology ourselves, essentially.

You all know, or you can imagine, in an enterprise organization, if you've got an application, to get technology, to build technology, you need somewhere to put it. You need to fill out some forms. You need to scope what you're doing. You need to estimate what you're going to need, and you normally overestimate so you can scale. You may pay for it upfront. And then you wait. You wait weeks, if not months, to get your space on the physical infrastructure, somewhere to put your application.

So when we're talking about building applications and bringing them very quickly to market, you just can't do that, essentially. Very, very hard.

Additionally, we save money where we can. So if we're building an application which is load-balanced on two servers in production, maybe we just buy one for UAT, just to save money. So our environments aren't identical.

Our challenge is we're going all in on public cloud, Amazon Web Services. Essentially, we are building a foundational software cloud service for our app teams within the bank. We're not doing this in scattered areas. We're building a foundational platform.

What we want to do is get out of the way, essentially. We're not building a broker layer. We're giving our developers full access to Amazon, so the AWS console, the API. And that's really, really hard. That's really tough, to build in security.

Rather than having security guards on the door, we've got incognito security scattered throughout. If we've done our job correctly, then our developers won't be aware of what we're doing. We're sort of checking things in the background. We're keeping our developers safe, and we're keeping the bank safe. So that's what we're trying to do.
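
A minimal sketch of the "checking things in the background" idea: instead of blocking developers at the door, a background job scans resource configurations and reports anything that breaks a guardrail. The `Resource` fields, the two rules, and the function name are all hypothetical, chosen only for illustration; they are not Barclays' tooling and do not call any real AWS API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Resource:
    """Hypothetical snapshot of a cloud resource's configuration."""
    name: str
    encrypted: bool
    public: bool


@dataclass(frozen=True)
class Rule:
    """A guardrail: a description plus a predicate that must hold."""
    description: str
    is_compliant: Callable[[Resource], bool]


# Illustrative rules only (assumed for this sketch).
RULES: List[Rule] = [
    Rule("storage must be encrypted", lambda r: r.encrypted),
    Rule("resource must not be public", lambda r: not r.public),
]


def find_violations(resources: List[Resource], rules: List[Rule] = RULES) -> List[str]:
    """Return one message per (resource, rule) pair that fails."""
    return [
        f"{res.name}: {rule.description}"
        for res in resources
        for rule in rules
        if not rule.is_compliant(res)
    ]


if __name__ == "__main__":
    inventory = [
        Resource("team-a-bucket", encrypted=True, public=False),
        Resource("team-b-bucket", encrypted=False, public=True),
    ]
    for message in find_violations(inventory):
        print(message)
```

A compliant team never sees any output from a check like this, which is the point: the developer keeps full access, and the platform only speaks up when a rule is broken.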

This is GTIS, Global Technology Infrastructure Services, and this is where I work. This is 5,000 people, and somebody said to me the other day that this is the size of a not insignificant company, just infrastructure alone: 5,000 people, 18 data centers. GTIS supports all the infrastructure that the bank runs on: the networks, the data centers, the hosting, laptops, desktops, everything, essentially.

But it's, as I say, traditional infrastructure. So about two and a half years ago, we pulled out around about 35 people from across infrastructure. We had database, network, storage, virtualization, all these fairly traditional areas. Essentially, we were thrown together and basically said, "Right. AWS. Get it done. Get it live as soon as you can."

Which was an interesting challenge. We didn't even know how to do this, and it was really, "Work this out as you go. Take cloud, work it out, and get something live."

Just to tell you why I'm standing here, essentially, my experience is as a financial software developer. I've worked in banking my entire career. Banks have their own set of challenges, really interesting challenges. As I say, we're technology companies that do a bit of finance, basically. We do some really, really clever things, but it's also a very constrained environment. We're highly regulated. You have to be very careful with what you do and how you do it.

I joined infrastructure about three years ago, and my challenge here was essentially to help us evolve from a traditional infrastructure function to a software team, essentially, to build software skills and practices and deliver cloud.

The thing is, not only did we have to be a software team, we had to be a really good software team. If we're building a foundational platform, this has to be really, really good. We know that, traditionally, all applications die eventually. You start them off, you take some shortcuts, you rack up technical debt, and eventually there's so much technical debt that you're running just to stay still, basically, keeping it working.

At which point, somebody usually says, "Well, we should rewrite this. We can do this better. We can do it differently, and we'll get rid of the mistakes of the past," and you start again.

With cloud, I don't think we're going to be able to do that. It's foundational. If we are exiting our data centers, or at least that's the aspiration, then we've got nowhere else to go. This has to work, so it has to work well. So the challenge we were facing was considerable.

This is an experience report. It's a journey. It's the journey word. I'm going to take you along the timeline of the last couple of years, picking out a few things that we've learned along the way as we made this up.

Now, the Y-axis there, it's not entirely scientific. It's really my view of how things were going, how we were feeling as a group. I would expect that most of the people involved along the way would probably feel very similar to me. So it's an idea of where we are.

Around about March, April '16, that's when we kicked this off: thrown together, get cloud done. We were scattered over four locations: London, Cheshire, Lithuania, and we had one person in Singapore. Very different cultures, even in the same country. London and Cheshire, very, very different ways of doing things. Different areas of the bank we'd come from. All very different mindsets.

The first challenge we had was we were all busy, there was a lot getting done, but it was slightly chaotic. We had a lot of very smart, very talented people working on some really interesting things. But we were struggling with being pointed in the same direction, essentially.

So we had to find a way, A, to bring us together and, B, just to get the work under control.

We did two things. The first is something that Dan North coined: visualize, stabilize, optimize. Essentially, visualize the work, get it all into a system, turn the lights on, see where you are. And this is what we did. We used Jira. We threw everything into it.

I was going to say it was a positive effect. It wasn't an immediately positive effect because a lot of people were fairly unhappy. But long-term, it was a positive effect because it essentially showed that we were slightly kidding ourselves. We had product owners who felt all their products were being worked on, and yet, when we put it into a system like Jira, we could show that actually that's not the case. We're only working on this, and we're not working on this.

That created a lot of healthy tension. A lot of interesting conversations.

The second thing we did, and this is what you can see up on the screen, we did a lot of stuff with cameras. We set up webcams in all our locations, so essentially you could see into each other's offices. That was really, really powerful. Far more powerful than I was expecting it to be, actually.

For a start, I saw people on the camera that I didn't recognize, and I didn't recognize them because I only knew them from their phone book photo that was 10 years old. These are people who were working in the same team, and I didn't recognize them.

I think it made a huge difference to the way we worked together, to bring us together. We also did daily stand-ups around video boards. Again, all just the human interactions. Obviously, if you can, co-locating is the best thing you can do. But this is the next best thing, and it worked very, very effectively.

Advancing along the line. By this time, we had the work more under control, but we were still trying to do too much, essentially. Amongst 35 people, we had two testers, and they were basically overwhelmed. They were overwhelmed because we were treating them in a very traditional sense. The developers were doing the work, and then they were assigning it to the testers.

We thought, "Well, let's get that under control." We were using a Kanban board. We'll have an in-test column. It'll be fine. So we did that. That made things worse because essentially work appeared in the in-test column as if it was in test, but actually it wasn't, and our testers were complaining they'd never seen this before.

So what they suggested, in place of the model where development work is done and then handed to the tester, was something we called power of three. It's also known as the Three Amigos, and it's something that George Dinwiddie coined, I believe. It's very, very simple: you take a product owner, a developer, and a tester, and they're involved right at the start of the conversation for every requirement that comes in, every story.

So we did this, and we worked through our backlog. It took us about a month, having a meeting every morning, and we just plowed through them, essentially. I think a lot of the developers were pleasantly surprised by the changes that it brought to their designs. Making something more testable actually makes it a lot more consumable, particularly for APIs and that sort of thing.

Again, a very, very positive effect. We ended up here, essentially. So straight through. The outcomes are more positive. The quality is more positive. Particularly around cloud and cloud technology, where it's all software, so it can all be tested using software.

What we found ourselves doing, with a rapidly building test automation suite that gave us confidence to proceed and a stable platform, was we could start deploying regularly, we could start deploying frequently and make it boring, make it repeatable. And it worked very, very well.

Based on our findings there, from moving our testers earlier in the conversation, we'd stood up a couple of groups with testers embedded, where essentially we'd almost formed a couple of teams. And where we did that, it worked very well. We still had chaos outside of those teams, but the teams themselves worked very well.

So what we did is we decided to essentially split our entire organization into feature teams. The idea being that you would give a team a problem and empower them to solve it. We took a lot of detail out of the stories and just gave them to the teams and said, "Right, solve this problem." Each team was basically a group of developers with a tester assigned, not to do the testing, but to ensure the testing was done, to ensure the quality. Almost like a test coach, I suppose you'd call them.

This is something that we learned along the way. Anybody who has developed software in an enterprise will recognize this: the traditional waterfall model, I suppose, where you've got your siloed functions. Essentially, you have this mechanism where you throw things over the wall. Development builds it, they give it to the testers to test, maybe it comes back, you engage security late in the process, and then finally, you get to the ops wall, where you have an argument with ops, and they say, "Well, you haven't provided the documentation," and then there's a debate about that.

Again, it's not very lean. There's handoffs all the way along.

What we were trying to do is something more like this, where essentially we're aligning to the service and we're putting everybody behind the service. So rather than thinking about your function, you're thinking about bringing your function's expertise into the delivery of the product.

Essentially, as a security person, rather than thinking about being secure, you're thinking about delivering in a secure manner.

This is really key, actually. What we're seeing here is that the technology isn't really the hard bit. We had continuous delivery, we had test automation, we had monitoring and metrics, all these things. But that wasn't the hard bit. The hard bit is this, essentially.

Based on that, the next logical step then was to look at involving the security consultants.

Security had been a huge problem for us for certainly the first year. As you can imagine, banking, cloud, security, a lot of fear, a lot of uncertainty, and a lot of doubt. A lot of security consultants, experts, they're not willing to work in a pr-- No, I won't say that. I won't say they won't work in a pragmatic way, but there's a lot at risk, and the culture we have is very much one of safety, very much one of, "Well, this has to be secure," not, "This has to be secure in necessarily a practical way."

Of course, the problem is, in an organization the size of Barclays, we've got infrastructure, we've got security, two big parallel organizations. If you can't come to an accord with your security consultants, you have to escalate very high to get a decision made, if you like. So you do your very best to avoid that, to work more closely with them, I suppose.

Again, going back to the cross-functional model, in a startup or a small company, you get that for free because all of your people are sitting together. There's probably a small group of you, and you talk to each other and you share your problems.

Whereas in an enterprise organization, as you scale, as you get bigger and bigger, you form functions. If you take a small startup, maybe you've got 10, 20 people. You've maybe got one person who focuses on, I don't know, recruitment and people. Say that company then doubles in size. Maybe you've got two of those people. Say it then triples in size. You've got three of those people. And then you've got a department, basically.

So then they're starting to share their specialties, and then maybe they'll sit together. And then maybe you think, actually, we can cut back. You can do without. You've essentially created a department where, instead of the people being focused on the product you're delivering, they're focused on their role, on their function.

We see this all the time in a large organization where, anybody who's worked in technology, you must find this, where you raise tickets and you're waiting for the ticket to be processed. The person processing the ticket does not have the urgency that you do because they have absolutely no idea why you've asked for the ticket, and you need these escalation processes.

Whereas I can pretty much guarantee you, if the person was sitting next to you, they would get it. They would understand, and they would understand the urgency.

By bringing people close in together, we're getting, again, much more positive outcomes.

So you see here, this is the problem. What we tend to do in the traditional model, I suppose, is we engage security late. We talk to them at the end where we've nearly built the thing, and then we say, "Here you are," and we hand them this thing, which is basically a black box to them, and some documentation, if they're lucky. They look at it and they go, "Well, okay." They can't possibly understand all of the details. They haven't got time.

So they will raise a number of constraints, if you like, a number of risks. At which point, we have a discussion about it, a robust discussion. And then we accept the risks and we move on. In a company like Barclays, we have internal systems to register these risks because it happens so often that they will raise risks. We need to put them in a system and mark them as accepted so we can move on.

Again, by bringing them into the teams, we've moved it from a default, "No, I'm not comfortable," to a default, "Yes, I don't see why not." That's something I heard a lot, actually, is, "Well, I don't see why not. I have no objection."

I've seen these conversations going on over the desks where the security consultant is almost shrugging his shoulders and saying, "Well, okay, yeah. If you do this and you do this, then I'm happy." Because they've been involved in every story. It's almost a power of four now. You've got dev, test, product owner, and security all involved in the conversation. And again, very positive outcomes.

You know what's coming next as we advance along. We think, well, it's working well with security. We've got test. Well, let's bring the run guys in. Let's co-locate with them.

Again, we've got this model where rather than coming to the last stage and then having to negotiate as to whether we can get something live, we bring them in at the very beginning. They start to understand the technology that we're using.

I think we use Jenkins for continuous delivery. With change requests, we respect separation of concerns. Again, it's something you have to do in a bank. You can't just deploy straight into production. So we have change requests and ops press the button. But they're involved, and they understand the technology. Because they now trust what we're doing, it's a much easier conversation.
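
The separation described here (developers build, ops press the button) can be sketched as a simple deployment gate. This is an illustrative sketch only: the `ChangeRequest` fields and the `can_deploy` function are hypothetical names, not part of Jenkins or of any Barclays system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """Hypothetical change request for a production deployment."""
    change_id: str
    author: str                      # developer who built the change
    approver: Optional[str] = None   # ops person who approved it, if any
    tests_passed: bool = False       # result of the automated pipeline


def can_deploy(cr: ChangeRequest) -> bool:
    """Allow deployment only if tests passed and someone other
    than the author has approved the change."""
    if not cr.tests_passed:
        return False
    if cr.approver is None:
        return False
    return cr.approver != cr.author


if __name__ == "__main__":
    cr = ChangeRequest("CR-1", author="dev-alice", approver="ops-bob", tests_passed=True)
    print(can_deploy(cr))
```

The gate does not care who is faster or more senior; it only checks that the person approving is not the person who wrote the change, and that the pipeline was green first.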

Now, looking at that chart, you see that we brought ops in fairly late in the day, perhaps. But what we also had is we had one of my colleagues, he used to work in ops. So essentially, we had an ops proxy in the team. He knew the team very well, and he could wear that hat within the team. I think that was very effective because they trusted us, but they trusted him more, if you like, because they knew that he had done the job.

So I'd invite, in any situation like this, even if you cannot get your ops people sitting next to you, if you're offshore or whatever, if you can have someone in the team wearing that hat and representing their views, I think you get, again, much more successful outcomes, much more reliable.

To give you an example, early on, I remember building a dashboard. I knew that our ops team, they built something using Excel and a database behind it. Honestly, it slightly offended my developer sensibilities. I thought, we can do better than that. Grafana, Prometheus, look at it, it's wonderful.

I showed this to my colleague, and he said, "That looks really good. It's great. Doesn't do what they've got right now."

And I said, "Yeah. But it's this and it's that and it's brilliant."

And he said, "Yeah, it is. But it doesn't do what they have right now."

Again, this is perhaps also a mistake of you not knowing your customer, not really finding out. But I thought I knew what they were using tools for, and I'd built this thing, which I was quite happy to give them. But actually, it wouldn't have done the thing that they'd built this rather ugly process to take care of.

Again, having that person in the team, actually having that conversation up front, meant that we start to avoid that sort of outcome.

Again, it's an emerging theme here. We're moving people further and further to the left. The last stage of that, I suppose, is Barclays Lean Control. This is something that we adopted, I think, last year. It's a process that's been built within Barclays, essentially to move governance closer to the team to support iterative, agile ways of working and sort of DevOps ways of working.

It's focused around long-lived services and products. So rather than projects and initiatives, we have entries for each product, service, or platform. And then we have outcomes along the life cycle of that service. The value stream: you build a service and there is value coming out of that service over time, essentially.

It supports continuous delivery, continuous compliance, continuous engagement with our control professionals. We have control professionals forming groups across the bank in all these areas: financial crime, security, fraud, et cetera.

The way it works, essentially, is that every product, every service has an assigned control member. And we have what's called a control tribe. These people engage with the team, engage very closely, work very closely together.

Again, rather than leaving it to the last minute, where we almost perform a form of governance theater, ticking boxes and making sure things are in place, we're actually talking to these people very early in the process. They raise their concerns as stories in our backlog. So we have constraint types appearing in Jira that we have to show that we're dealing with, that we have to do something with.
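
One way to picture constraints living in the backlog is a check that lists any control constraint still open for a service. This is a sketch under stated assumptions: the `Issue` shape, the issue type name `"Constraint"`, and the status `"Done"` are hypothetical, and the code works on plain in-memory data rather than calling the Jira API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Issue:
    """Hypothetical backlog item."""
    key: str
    issue_type: str   # e.g. "Story" or "Constraint" (assumed type names)
    status: str       # e.g. "Open", "In Progress", "Done"


def open_constraints(backlog: List[Issue]) -> List[str]:
    """Return the keys of constraint issues that are not yet done."""
    return [
        issue.key
        for issue in backlog
        if issue.issue_type == "Constraint" and issue.status != "Done"
    ]


if __name__ == "__main__":
    backlog = [
        Issue("CLOUD-1", "Story", "Done"),
        Issue("CLOUD-2", "Constraint", "In Progress"),
        Issue("CLOUD-3", "Constraint", "Done"),
    ]
    print(open_constraints(backlog))
```

Because the constraints sit beside ordinary stories, the team can show at any point which control concerns are still outstanding, rather than discovering them at sign-off.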

The overall message, I suppose, in terms of what we've been talking about, is basically align everybody, push the conversation to the left. As I say, the technology is not the hard bit. It's not the challenge. It's hard, don't get me wrong. It's not easy stuff. But it's not the thing that's holding us up.

As I say, we had a lot of smart people, a lot of very technical people. But actually getting them, empowering them, getting them to own the outcome and to sort of understand what's going on, I think this is what we had to do. And it's worked very effectively.

In fact, I couldn't resist this. Obviously, we're talking all about DevOps. I think everybody talks about DevOps and what is DevOps. For me, DevOps, it's a cultural thing. It's dev and ops essentially sitting in close contact.

What we've actually got, and in fact, I think, was it last year, DevSecOps appeared, that became a thing. I don't know if DevTestSecOps is going to become a thing. But I think testing is one of the most important things we're doing in cloud. For something to be as reliable as we need it, testing is absolutely key. So I think it fully deserves its place in that conversation.

This is where we are today. We're now two years into this program. And honestly, it feels like we're just beginning to hit our stride. It's taken that long.

I think sometimes or often, this stuff, yeah, agility is good, DevOps is good, but it is hard. It's really, really hard. I think sometimes we all underestimate how hard it is. It has taken two years to really get to the point where we're sort of ready to run, if you like, and it's been two hard years.

You can see by my opinion of the emotions, you can see it sort of going up and down. You can see now we're beginning to climb. We've got the processes in place. We've got the technology in place. We've got the pipelines. We've got the test automation, and we've got the teams.

What's going to be interesting now as we start to approach regulatory sign-off is to see the bank starting to adopt these things and the teams. We've only really enabled the basics at the moment. Amazon's got 100 services. You can't just enable all of them.

So we've enabled the basics, and what's going to be interesting now is we're going to start to iterate, and we're going to start to see what people actually want. And we will start to react to that, and we'll start to react based on what they're doing and the usage.

I think the other interesting thing, for students of organizational psychology, is that it was pointed out to me a couple of days ago that this is basically the Kübler-Ross curve, which has essentially got the stages of organizational change, if you like: shock, denial, grief, et cetera, et cetera, acceptance.

I just think that's quite interesting. It's not exactly backward causality, but that is pretty much how I see things going. It has indeed followed that form, where we've sort of started off with a bit of excitement and then rapidly gone downhill and then got ourselves together and gone up, and now we're flying.

As you know, they like us to talk about, I think they use the word exothermic. She used the word exothermic, essentially, to energy. So these are the things I am still, or we are still, struggling with, I suppose.

A cross-functional model works very well for us, but what I'm trying to do now is sell it within our organization, particularly to leadership. If you are leading a function, by talking about cross-functionality, you are loosening those bonds somewhat. You're loosening the vertical, and a lot of changes come with that.

So it's how to sell that to senior management to say, "This is a good thing, and your function will be stronger because of this. We're not breaking up your function. We're actually allowing them to perform their function more effectively."

Secondly, we're in GTIS, 5,000 people. How do we get that message out across a large organization?

With infrastructure and physical data centers, there's always been a bit of a perception that agile is perhaps not really appropriate. You can't be agile with a data center. You can't build a data center in an agile way. It's a building, usually.

But as we move towards software-driven infrastructure and cloud and other forms of infrastructure, how do we get people to start to evolve?

And avoiding the cookie cutter. The thing we've probably learned over the last two years more than anything is this: we've read a lot of books, seen a lot of presentations, spent a lot of time on YouTube looking at this sort of thing, but you cannot take those things and just apply them and expect the magic to happen. You have to find something that works for your organization, and that's, I think, what we've done over the last two years.

So what worries me is that, in an organization like ours, people in infrastructure will look at what we've done and say, "Well, okay, we can take that and spread it across." Well, we've just spent two years doing this. You can't just apply it and the magic happens.

So how do we avoid that? How do we help teams to take the principles of what we've done and to apply them in a structured manner, if you like, in a way that works for them?

Thank you.