Improving Developer Experience with a Developer Platform at U.S. Bank
Ian Eslick
Thank you. All right, welcome, everyone, and thank you. And thank you, Gene.
So I want to talk a little bit about the bank. We're from U.S. Bank, which is the fifth-largest commercial bank in the United States. We've been around for a very, very long time, and we have more than 70,000 employees. We're headquartered in Minneapolis. We're a national bank; we don't operate in every state, but we're in many of them, and we also have strong international operations.
When you look at the functions of a commercial bank, we maintain and support almost half a trillion dollars in assets. We serve commercial customers and consumers, we do corporate banking, we have wealth management, and we have a very robust payments business. All of this obviously runs on IT. And as you know, banking has evolved dramatically over the last 40 years. As our CEO said recently, we've seen more change in the last five years than in the previous 40.
I just want to say a few words to set some context for the speakers who will talk about the fun stuff today. When we look at the evolution of U.S. Bank, we've come a long, long way.
Six years ago, we started our digital transformation, which grew into a much larger technology transformation that I've had the opportunity to participate in. Along the way, we've been building products that are earning accolades in the industry and supporting our customers. We've also been bringing in small companies to create innovative new software products for some of our market segments.
And looking back at our time here at the DevOps Forum, as Gene mentioned in his introduction, we started with a very design-centric view of how to serve our customer segments better. Our digital transformation was really the start of our journey. We created Experience Studios, which we talked about in an earlier iteration of the DevOps Forum.
I joined the bank just before 2020, so I got to spend a couple of cold months in Minneapolis, followed by a long period of working from home, as many of you did. COVID-19, though, was a flashpoint for us. There was a moment in 2020, I remember, when there was a sea change: the CEO decided that we really had to start thinking differently about how we, as a bank, approached technology. It was a wonderful opportunity for me to participate in a growing transformation.
So Werner Loots and I came to the DevOps Forum virtually in 2020, I believe, and we talked a little bit about some of the challenges that arise from the way companies are structured and how projects are funded, and about some of the opportunities that the challenges of 2020 opened up for us.
In 2022, we announced a very large cloud transformation effort, which kicked off an internal effort built on a number of really interesting pillars. That led us to this year and to some of the work we're going to talk about today.
So, for those of you in the audience who are on the technology side: as all of you know, it's not enough to fix the technology. I know there have been a lot of really great talks about all the different dimensions of change. But for us, what has really come together over the last few years is the bank's commitment to changing the way we think about building our products, with much more end-to-end ownership, which is a key part of the project-to-product transformation.
But in a regulated industry, you also have to tackle how to meet all of your obligations while still getting the best possible outcomes from your technology teams. The company has been fabulous in building partnerships across the bank, between risk and compliance, security, and finance, starting from the premise: how do we, end to end, get tech-company outcomes while meeting all the extra obligations we have? That's the challenge, and that's the opportunity.
So we engaged in a broad process re-engineering effort across the bank. I remember kicking it off by saying that if you'd told me two years earlier that process re-engineering would be the most exciting thing we'd be doing in 2022, I might have laughed. But in a regulated industry, it is the unlock for everything else. You're going to see some of the pieces of that today from Levi and Antonio.
One of the things I believe in strongly, as you might expect from my background at Amazon and from having spent most of my career at tech companies, is platforming. Platforms create standard ways of solving problems that abstract much of the complexity of the underlying technology away from the people who consume them. Creating and supporting that interface is a different kind of engineering than companies like U.S. Bank typically engage in.
The last piece, which we really have to work on a lot, is how you set up and reinforce a team operating model. I probably underestimated it, coming from a tech industry where a lot of people knew the things I knew into an industry that was still learning. If you think end to end, what does good look like? What does a typical day look like where we're going? Because today doesn't look like that. So you have to give people faith in tomorrow while you're bringing them along on that journey. You'll hear a little more about that from Levi and Antonio.
So thank you very much for being here today. And Levi, Antonio, please come on up.
Levi Geinert
Thank you, Ian.
So developer platforms are clearly a hot topic. IT Revolution has published a few papers on it, and there are presentations later today discussing it as well. I definitely recommend checking them out.
Everyone on LinkedIn and X is talking about developer platforms, down to the snarky comment that "DevOps is dead," which I don't think we all agree with. But it really comes down to running your platform like a business within a business, which is one of the topics in the paper. The other is enabling full-stack engineers to reach their fullest potential.
So we are on a journey. We've been building the U.S. Bank Shield platform. It's not Marvel's S.H.I.E.L.D.; the U.S. Bank logo is a shield. Shield is our internal developer platform.
It is our unified application development platform. It automates our infrastructure services, enabling rapid, repeatable deployments and secure development. It is the defined method we use to get onto the cloud.
As we mentioned earlier, and like many of you, we're moving to the cloud, and our existing on-prem capabilities didn't fit that need. Shield addresses teams' need for a fast, easy, extensible model that lets them take advantage of all the benefits of being on the cloud.
It's self-service and designed to be easy, with comprehensive documentation that reduces cognitive load and makes engineers happier. We're making it extensible so that teams can build on top of it; Antonio will share a little more about this shortly. And it's innersourced, so teams across the enterprise collaborate and contribute to the platform.
There are a lot of benefits, and many of you have experienced them. Reducing cognitive load has come up in a lot of talks here, this year and in the past. It's a big deal: we want our engineers to be happy when they're developing software for us. The platform reduces complexity overhead and automates the toil and manual tasks that engineers would otherwise have to work through.
Working together on the Shield platform allowed us to solve risk and security challenges as a normal part of our engineering processes. I'll let Antonio tell you more about it.
Antonio Beyah
All right, thanks, Levi. My name's Antonio Beyah. I've been at the bank for about two years now. I'm a platform engineer, trying to make all this stuff come together.
So they talked about the what and the why; I'm going to talk a little about the how. To give a high-level overview, there are a few components I'm going to touch on.
One is the Shield console, which we intend to be the one-stop shop for engineers at the bank. If you need to do something as an engineer, you should be able to go to the Shield console and figure it out. We're leaning on open source software for this: we're using Backstage and building a ton of integrations on top of it.
Platform services: these are primarily the things we need to build to make the experience great for engineers.
Declarative config: we want teams to be able to declare what they want, giving us a desired state we can reconcile against, so that once things are deployed, they stay that way. We try to make that easy for teams.
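To make the reconcile idea concrete, here is a minimal Python sketch of the planning half of a reconciliation loop. It is not Shield's implementation; it assumes that desired and actual state can both be represented as plain dictionaries keyed by a hypothetical resource name, and it only computes the actions needed to close the gap.

```python
def plan_reconciliation(
    desired: dict[str, dict], actual: dict[str, dict]
) -> list[tuple[str, str]]:
    """Return the (action, resource) pairs that would make actual match desired.

    Hypothetical model: each state maps a resource name to its settings.
    """
    actions: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            # Declared but not deployed yet.
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            # Drift: the deployed resource no longer matches its declaration.
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            # Deployed but no longer declared.
            actions.append(("delete", name))
    return actions
```

A real controller would apply these actions and re-run the comparison periodically, so that drift introduced after deployment is corrected automatically; an empty plan means the environment matches its declaration.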
And blueprints: a blueprint is effectively a collection of approved, pre-vetted patterns for deploying a given tech stack in a way that checks all the boxes without adding overhead for the engineering team. For the technically inclined, you can think of it as a simple directed graph; that's effectively what it boils down to, although we made it a bit more complicated than that.
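As a rough illustration of the "directed graph" framing, the Python sketch below models a blueprint as steps mapped to the steps they depend on and uses the standard library's `graphlib.TopologicalSorter` to find a valid execution order. The step names are hypothetical and are not the contents of any real Shield blueprint.

```python
from graphlib import TopologicalSorter

# Hypothetical blueprint: each step maps to the set of steps it depends on.
BLUEPRINT = {
    "build_image": set(),
    "scan_image": {"build_image"},
    "publish_image": {"scan_image"},
    "provision_namespace": set(),
    "deploy": {"publish_image", "provision_namespace"},
}


def execution_order(blueprint: dict[str, set[str]]) -> list[str]:
    """Return one order in which the blueprint's steps can safely run.

    static_order() raises graphlib.CycleError if the graph has a cycle,
    which is a useful validation when a blueprint is authored.
    """
    return list(TopologicalSorter(blueprint).static_order())
```

Independent steps such as `build_image` and `provision_namespace` may come out in either order; only the dependency edges are guaranteed.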
All right, so at a high level, I'm going to talk about a few foundational elements. I want to focus primarily on CI for this talk, because there's a lot here.
Looking at the foundational elements: when we got started, back in 2021 or early 2022, a few things were happening at once. We had a new platform, a new team, new frameworks, and a new language. So there were a lot of pieces we had to get in place to put all of this together.
Some of the foundational elements we started tackling right away, all of which feed into continuous integration, are things like authentication and authorization. Before, we had a fairly ad hoc process for getting access to different tools. We wanted that process to be seamless, and there were already some existing tools at the bank for this. So, as part of the platform re-engineering effort, we asked: how can we center on the concept of a team and have that team's authorization flow down through the different systems?
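A minimal Python sketch of team-centered authorization, assuming the team is the only unit that receives grants and every downstream system derives a user's rights from team membership. The team names, systems, and actions are hypothetical; the talk does not describe the bank's actual identity tooling.

```python
# Hypothetical data: who is on which team.
TEAM_MEMBERS = {
    "payments-api": {"alice", "bob"},
    "mobile-web": {"carol"},
}

# Hypothetical grants: team -> {system: allowed actions}.
# Individuals never receive grants directly.
TEAM_GRANTS = {
    "payments-api": {"gitlab": {"push", "merge"}, "registry": {"publish"}},
    "mobile-web": {"gitlab": {"push"}},
}


def is_authorized(user: str, system: str, action: str) -> bool:
    """A user may perform an action if any team they belong to grants it."""
    for team, members in TEAM_MEMBERS.items():
        granted = TEAM_GRANTS.get(team, {}).get(system, set())
        if user in members and action in granted:
            return True
    return False
```

Because access follows team membership, onboarding or offboarding a person is a single membership change rather than a ticket per tool.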
Other foundational elements are simpler things like secrets management. When you need access to a system or tool, whether from the CI system or from your local machine, we want that access to be consistent, standard, dynamic, and automated.
One important thing I want to talk about in the continuous integration space is console output and formatting. Something that's often overlooked when building a CI system is the developer experience of actually using it. So we built a set of tools to make sure we're providing the experience we want, and we go as far as testing the output that gets printed to a job's console, to ensure we maintain that experience.
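Testing console output can be as simple as pinning the format of each line. The Python sketch below assumes a hypothetical one-line-per-step format with fixed-width columns; the function and test names are illustrative, not taken from the bank's tooling.

```python
def format_step_result(step: str, passed: bool, duration_s: float) -> str:
    """Render one CI step result as a fixed-width line for the job log."""
    status = "PASS" if passed else "FAIL"
    # Step names are left-aligned in 20 columns and durations right-aligned
    # in 6, so results from different steps line up vertically.
    return f"[{status}] {step:<20} {duration_s:6.1f}s"


def test_step_lines_align() -> None:
    """Pytest-style check that guards the console experience."""
    lines = [
        format_step_result("lint", True, 1.2),
        format_step_result("build-image", False, 98.76),
    ]
    # Every line has the same width, so columns stay aligned.
    assert len({len(line) for line in lines}) == 1
    assert lines[1].startswith("[FAIL] build-image")
```

Running a test like this in the CLI's own pipeline means a formatting regression fails a merge request instead of surprising engineers in their job logs.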
So, the developer console: as I mentioned earlier, it's built on top of Backstage. This screenshot is a little outdated, but it gets the point across.
Some important pieces here are the catalog and the APIs on the side, and the documentation. Getting back to that one-stop shop, the hope is that you can come here and see everything you need to do. Training is a very important one at a bank: if you don't do your training, you lose internet access. Happened to me yesterday.
So how did we do this? We know we're using a lot of open source tools, so we built a wrapper on top of them. One question that often comes up is: why would you build a wrapper on top of an existing CLI?
An example of a CLI we're wrapping is the build tooling, where we're building or scanning container images. The reason we wrap it is illustrated here: everything you see in blue is something the bank requires, and the thing in red is the actual invocation of the tool itself.
We wanted a consistent experience from an authorization perspective: are you authorized to run this, and can you publish where you want to push? The off-the-shelf tool doesn't account for any of that. So our wrapper handles things like RBAC and secrets resolution. Then, after running the tool, it captures evidence and submits it to a store, and it also generates metrics so we can make sure our platform is operating correctly.
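The blue-and-red picture maps naturally onto a wrapper function. This Python sketch is an assumption-laden illustration, not the bank's pipeline CLI: the authorization check and secret resolver are passed in as hypothetical callables, evidence goes into an in-memory list, and the wrapped tool is run with the standard library's `subprocess.run`.

```python
import hashlib
import os
import subprocess
import time
from typing import Callable


def run_wrapped(
    tool_cmd: list[str],
    user: str,
    target: str,
    authorize: Callable[[str, str], bool],
    resolve_secrets: Callable[[], dict[str, str]],
    evidence_store: list[dict],
    metrics: dict[str, float],
) -> int:
    """Run an off-the-shelf CLI inside the required steps (hypothetical)."""
    # 1. RBAC: is this user allowed to publish to this target?
    if not authorize(user, target):
        raise PermissionError(f"{user} may not publish to {target}")

    # 2. Secrets resolution: fetch credentials at run time and expose them
    #    only through the child process environment, never the command line.
    env = {**os.environ, **resolve_secrets()}

    # 3. The actual tool invocation (the "red box").
    start = time.monotonic()
    result = subprocess.run(tool_cmd, env=env, capture_output=True, text=True)
    elapsed = time.monotonic() - start

    # 4. Evidence capture: who ran what, the outcome, and an output digest.
    evidence_store.append({
        "user": user,
        "target": target,
        "command": tool_cmd,
        "exit_code": result.returncode,
        "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
    })

    # 5. Metrics about the platform itself.
    metrics["runs"] = metrics.get("runs", 0) + 1
    metrics["last_duration_s"] = elapsed
    return result.returncode
```

Keeping the tool invocation in one place means an upgrade to the underlying CLI never bypasses the authorization, evidence, and metrics steps around it.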
So everything you see here, we built to make sure we're aligned with the way we want to engineer software.
This is what it looks like in practice. We're using GitLab. What you're seeing here are some GitLab templates, and this is how an engineer actually consumes these modules.
At the top, there's an include statement that an engineer copies and pastes into the GitLab CI file in their repository. If they have an existing repo, they can drop it in without disrupting their existing workflow.
One important goal was to let engineering teams keep control of their flow: they control the steps and the order in which jobs execute. What we care about from a platform perspective is that you ran certain steps. So we moved the gates to where they actually belong. The checks for going to prod are very different from the checks for going to dev, so we only enforce the prod checks when you're actually trying to go to prod. We try to keep it as low-friction as possible.
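The "gates where they belong" idea can be expressed as a per-environment set of required checks. In this Python sketch, the environment names come from the talk (dev, IT, stage, prod), but the check names and which environment requires which check are hypothetical.

```python
# Hypothetical gate definitions: evidence required before deploying to each
# environment. Lower environments require less; prod requires the full set.
REQUIRED_CHECKS = {
    "dev": {"build"},
    "it": {"build", "unit_tests"},
    "stage": {"build", "unit_tests", "image_scan"},
    "prod": {"build", "unit_tests", "image_scan", "change_approval"},
}


def missing_checks(target_env: str, completed: set[str]) -> set[str]:
    """Return the checks still required before deploying to target_env.

    Teams order their own jobs however they like; the platform only asks
    whether the required steps ran, and only at the gate that needs them.
    """
    return REQUIRED_CHECKS[target_env] - completed
```

A deployment to dev with only a build passes this gate, while the same pipeline targeting prod would be told exactly which steps are still missing.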
The second box down is an example of building an image. There are a lot more flags that are optional, but in all of our examples we show only what you're required to specify, and as you can see, it's very minimal.
We also build templates for the registry you're targeting. We have Artifactory, Azure Container Registry, and AWS's container registry, and we want you to be able to use a template for each of those without the mental overhead of working out what to actually do.
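Part of what a per-registry template hides is how the image reference is constructed. The Python sketch below uses the documented hostname patterns for Azure Container Registry (`<registry>.azurecr.io`) and Amazon ECR (`<account-id>.dkr.ecr.<region>.amazonaws.com`); because Artifactory hostnames depend on the installation, it takes that host as a parameter. The function itself and its parameter names are hypothetical.

```python
def image_reference(target: str, repo: str, tag: str, **params: str) -> str:
    """Build a fully qualified image reference for a registry target."""
    if target == "acr":
        # Azure Container Registry login servers are <registry-name>.azurecr.io
        host = f"{params['registry']}.azurecr.io"
    elif target == "ecr":
        # Amazon ECR private registries are
        # <account-id>.dkr.ecr.<region>.amazonaws.com
        host = f"{params['account']}.dkr.ecr.{params['region']}.amazonaws.com"
    elif target == "artifactory":
        # Artifactory Docker hostnames vary by installation; pass it explicitly.
        host = params["host"]
    else:
        raise ValueError(f"unknown registry target: {target}")
    return f"{host}/{repo}:{tag}"
```

An engineer picks the target and supplies the few required values; the naming rules for each registry stay inside the template.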
These screenshots are taken directly from our repo. The repo where we do all of this work is open to the company and open for innersourcing. As a platform team, we operate the same way we'd want any other team to operate: teams can fork the repository and submit merge requests.
One of the cool things about the list of plugins there is that it's generated from the actual source code. For all the descriptions, the technology support, and the plugins themselves, we have CI tests that validate that the documentation matches the code.
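One way to keep a plugin list honest is to render it from code and fail CI when the committed copy differs. This Python sketch assumes a hypothetical `Plugin` record and a Markdown table as the published format; it is not how the bank's generator is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plugin:
    name: str
    description: str
    technologies: tuple[str, ...]


# Hypothetical registry; a real tool would discover plugins from its source.
PLUGINS = [
    Plugin("scan-image", "Scan an image for vulnerabilities", ("OCI images",)),
    Plugin("build-image", "Build a container image", ("Dockerfile",)),
]


def render_plugin_table(plugins: list[Plugin]) -> str:
    """Render the plugin list as a Markdown table, sorted by name."""
    rows = ["| Plugin | Description | Technologies |", "| --- | --- | --- |"]
    for p in sorted(plugins, key=lambda p: p.name):
        rows.append(f"| {p.name} | {p.description} | {', '.join(p.technologies)} |")
    return "\n".join(rows) + "\n"


def docs_are_current(committed_doc: str, plugins: list[Plugin]) -> bool:
    """CI check: the committed docs must equal what the code would generate."""
    return committed_doc == render_plugin_table(plugins)
```

When someone adds a plugin without regenerating the docs, `docs_are_current` returns `False` and the merge request fails, which is the behavior described for the plugin list.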
This is an example of the CI testing for the pipeline CLI. Starting on the left, we have the build, and there's a step in there (it might be a little hard to see) for documentation linting and release notes. We push all of that through a CI/CD process as we develop.
On the right-hand side, you can see some of our functional tests. We have a fairly involved suite of functional tests that runs on every merge request to ensure everything still works.
This is an example of declarative configuration. For those who know, deploying something to Kubernetes usually takes a lot more than this. What we've tried to do is build out a few archetypes, and this is an example of deploying a web application to Kubernetes.
We define a web application as something consumed via a web browser, which means we have certain standards for it, and we've captured what a web application requires in this manifest.
Some things I want to point out here are health checks and DNS. We support both declaratively: you say what you want, and we make sure it stays that way.
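An archetype is only useful if the platform can check a manifest against it before deploying. This Python sketch validates a manifest represented as a dictionary; the required field names (`healthCheck`, `dns`, and so on) are hypothetical stand-ins, since the slide's actual schema isn't reproduced here.

```python
# Hypothetical "web application" archetype: fields the platform requires
# for anything served to a browser. Field names are illustrative only.
REQUIRED_WEBAPP_FIELDS = {"name", "image", "port", "healthCheck", "dns"}


def validate_webapp(manifest: dict) -> list[str]:
    """Return human-readable problems with a web-application manifest."""
    problems = [
        f"missing field: {field}"
        for field in sorted(REQUIRED_WEBAPP_FIELDS - set(manifest))
    ]
    health = manifest.get("healthCheck")
    # A health check must point at an HTTP path such as "/healthz".
    if health is not None and not str(health.get("path", "")).startswith("/"):
        problems.append("healthCheck.path must start with '/'")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets the pipeline report everything a team needs to fix in a single run.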
For the DNS section specifically, we have documentation describing what happens when you declare it. As you move across environments (dev, IT, stage, prod), the platform enforces the DNS naming convention. You just declare your base DNS name, and we make sure it exists. We take care of certificates, the actual DNS record creation, and all of that out of the box.
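Enforcing a naming convention from a single declared base name can be sketched as a pure function. The environments below are the ones named in the talk; the convention itself (bare name in prod, an environment suffix elsewhere) and the default zone `apps.example.com` are hypothetical, since the bank's actual scheme isn't described.

```python
ENVIRONMENTS = ("dev", "it", "stage", "prod")


def environment_hostname(base: str, env: str, zone: str = "apps.example.com") -> str:
    """Derive an environment-specific hostname from the declared base name.

    Hypothetical convention: prod gets the bare name, and lower environments
    get an environment suffix, so teams never hand-craft per-env names.
    """
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    label = base if env == "prod" else f"{base}-{env}"
    return f"{label}.{zone}"
```

Because every environment's name is computed rather than typed, certificates and DNS records can be created from the same function, and two teams cannot accidentally choose conflicting names.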
So when you want to add something, you just modify this manifest. The image on the right shows how you would add storage, as an example.
Next, automated documentation. I mentioned that we auto-generate documentation, which means we need a place to publish it, and that's where the Shield console comes into play. All the documentation we write in our repo gets packaged up and shipped to a centralized location inside the Shield console.
This is a screenshot of some of the documentation that walks people through using the platform. At the end of the day, if you can't read the documentation and do what you need to do, we have a gap in our documentation, and we need to fix it.
I want to talk a little about our current progress. We've focused heavily on the CI side to get that experience where we want it. We also have some support for declarative configuration for deploying both containerized and VM-based applications.
We have a secrets management strategy that lets teams easily onboard both secrets they provide and dynamic secrets. For non-human accounts, we have a policy requiring credentials to be rotated within a set number of days. We make that easy, and teams get it out of the box by default.
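A rotation policy for non-human accounts reduces to an age check plus issuing a new secret. In this Python sketch, the 90-day limit is a hypothetical value (the talk only says "a set number of days"), and the credential is a plain dictionary. A real platform would also push the new value to the target system and the secrets store.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy value; the actual limit isn't stated in the talk.
MAX_CREDENTIAL_AGE = timedelta(days=90)


def needs_rotation(
    created_at: datetime,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_CREDENTIAL_AGE,
) -> bool:
    """True if a credential is at or past its maximum allowed age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= max_age


def rotate_if_due(credential: dict, now: datetime) -> dict:
    """Return a fresh random credential if the current one is due, else it unchanged."""
    if needs_rotation(credential["created_at"], now):
        # Sketch only: generates a new value but does not propagate it anywhere.
        return {"value": secrets.token_urlsafe(32), "created_at": now}
    return credential
```

Running a check like this on a schedule is what turns a written policy into something teams get by default instead of tracking by hand.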
Regulatory evidence is collected by default: when the plugins run, they ship all the relevant information off to the evidence system so we can pass our audits.
So I'm going to give it back to Levi to wrap it up.
Levi Geinert
Thanks, Antonio. I hope you were all ready for that much engineering talk at 9:00 in the morning, because that was a lot.
So you can't run your platform like a business within a business if you're not asking for customer feedback and working to improve based on that.
Antonio shared some details about the automated documentation, and you can see that some of the feedback coming in described it as a very positive experience: the documentation was helpful, relevant, and up to date.
Ian mentioned process re-engineering, and our enterprise architecture group has been on a transformation of its own. We're seeing the benefits: many of the architecture and governance policies and processes have been improved and streamlined, allowing teams to get products into production sooner.
We have some key accomplishments, and Antonio shared them: we've done a lot of automation and design documentation, our infrastructure is much more robust than it was (as we mentioned regarding our on-prem hardware), and we've redesigned our pipeline.
The big thing for us is making it easier for teams to build with risk and security controls by default. As someone put it on screen earlier: reduce friction for the good paths. That's what we've accomplished.
We've faced a lot of challenges, and we have challenges remaining. What does it take to build a platform team? What does it take to turn an ops team into a platform team? Those are challenges.
Improving our engineering culture, with a bigger focus on testing and operations. Moving from scripting to software development. Building while operating is always a challenge; we all know this. And how much tech debt is too much?
We need to operationalize our tools, and we don't want to conflate a tool with a strategy. Think Terraform: adopting the tool is not, by itself, a strategy.
We must constantly focus on simplification. That includes simplifying our technology stack.
Upskilling the organization so it can benefit from all of this work is the challenge. We've set up champion programs: volunteers who help train and certify others in cloud usage and new ways of working.
And then you can't accomplish that without providing guidance and support to your teams.
Our next challenge is enabling adoption at scale. We mentioned innersourcing, and someone asked how teams find the time to do it. Those are challenges you'll likely face too.
And to achieve adoption at scale, the platform really does need to be extensible and easy to use.
So if any of this was of interest to you, please check us out; the QR code is on the screen. I hope you have a wonderful day. Thank you.