DevOps @ Fidelity - Investing in Inner Source and Engineering Excellence
Fidelity Investments is a privately owned company, one of the world’s largest and most diversified financial service providers. Starting in 1946, Fidelity Investments has been a company deeply invested in technology and using technology to deliver services to its customers. Fidelity embraced DevOps practices very early on around 2014. In 2016, Fidelity successfully deployed an application on the public cloud for the first time. Today, about 50% of Fidelity’s applications run on public cloud. The goal is to reach 80% cloud adoption by the end of 2024.
In this experience report, I will present Fidelity’s DevOps journey over the years, the challenges faced, and what is being done to address them. Over the past couple of years, there has been significant focus on increased collaboration, modernizing technology, and improving developer experience and engineering practices while remaining secure and compliant.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
[00:00:13.150] I met the next speaker, Dr. Tapabrata Pal, at the IBM Innovate conference in 2014. And holy cow, did he make an impression on me! He was the first distinguished engineer at Capital One. That was the most senior individual contributor position at that time. Everyone pointed to him as a person who was leading the technology modernization movement there.
[00:00:35.800] Topo was also one of our first speakers at DevOps Enterprise Summit in 2014. It was one of the most shocking presentations that year because it was an actual bank talking about doing DevOps-y things, which at the time, as Maya mentioned yesterday, seemed absolutely unthinkable.
[00:00:51.990] Over the years, Capital One later declared itself as a technology company with a strategy of being cloud first and open source first. In March 2021, Topo joined Fidelity Investments as their VP of architecture.
[00:01:05.590] Fidelity was founded in 1946 and recently celebrated its 75th birthday. It is one of the largest asset management firms, with over $9.9 trillion under management. According to Wikipedia, Fidelity also manages the largest non-indexed fund in the US, with over $100 billion in assets. In 2021 they had $24 billion in revenue and over 70,000 employees. Here's Topo to talk about his new adventure at Fidelity.
Dr. Tapabrata 'Topo' Pal
[00:01:39.060] Thank you, Gene. We are down to the last two, and I'm one of them. I am one of the two people who are standing between you and a beautiful Amsterdam evening or afternoon. For the next 20 minutes, I'll be talking about DevOps at Fidelity.
[00:02:01.640] This is a disclaimer. I'm going to pause for a moment and let you read that. All right.
[00:02:09.230] Besides what Gene told you about me, I still have something to say about myself. I'm Topo Pal at Fidelity Investments, one of the vice presidents in the Enterprise Architecture group. My focus is DevOps domain architecture, which is essentially looking at DevOps adoption and practices across the enterprise. I also run the open source program office.
[00:02:37.930] Essentially, I'm a developer, engineer, DevOps enthusiast, and I also wrote a book. You might have gotten a signed copy of that yesterday, so thank you very much. I joined Fidelity a little over two years ago; before that I was at Capital One for 10 years, and I was deeply engaged in Capital One's DevOps transformation, hands on keyboard, writing PowerPoint decks--no, just source code. A lot of coding.
[00:03:11.820] One fine morning I got a note from LinkedIn: hey, congratulations on your 10-year anniversary. I'm like, that's way too long for one particular company. So I decided to look. When I started looking out for new opportunities--not because I had to, I just wanted to because LinkedIn said so--I was looking for two things. One is it has to be culturally solid, open culture, and it has to be technically inclined.
[00:03:44.220] During my second interview with Fidelity, I asked HR, what is Fidelity culture? She sent me this. Culture portrays who we are and what we aspire to be. We know that happy people are more productive, creative, and produce better customer outcomes. We are here to make a difference by delivering quality to our customers daily. We believe in leadership at all levels, and that titles don't determine what you can and cannot do.
[00:04:22.910] Even today, when I'm in a meeting with a bunch of other people and some of them I don't know, I cannot tell their level looking at their name or the HR system. It is an open conversation. As long as you can debate, discuss, argue about things in a positive manner, everything goes, which is fantastic. The last thing--no roles, only role models--floored me. I decided to join Fidelity with that.
[00:04:53.400] Let me talk about Fidelity Investments a little bit. World's largest and most diversified financial service provider. Founded in 1946, and in 2022 we completed 75 years. During that time, we grew a lot. Today we support 25,000 business customers that trust us with their money and their investments, about 40 million individual investors, $9.9 trillion under administration, $3.7 trillion discretionary assets, and 2.88 million average daily trades. That's huge scale.
[00:05:39.070] Fidelity has always been a technology leader in its business domain, starting from 1979: Fidelity Information Phone, Fidelity Automated Service Telephone, online space, Fidelity homepage on the internet in 1996, when people were actually posting cat pictures on the internet. We had a homepage that allowed people to look up their 401(k) transactions and assets. Fidelity invests a lot of time and energy in developing new cool things around AI, ML, and digital assets.
[00:06:25.720] We have about 18,000 tech associates. These numbers are based on 2022 numbers. We are across nine countries, and total tech budget is $2.5 billion.
[00:06:43.240] Let me tell you where I belong in Fidelity. Fidelity has about 15 business units, and then there are about five shared services or enterprise units. One of them is enterprise architecture, and that's where I belong. It's a complex structure. What technology do we use? Everything. I used to ask people, do you use this? Some technology would be in front of me and, just looking at the name, I couldn't tell what it was, so I would ask, who uses that? I stopped asking that. Now I say, okay, show me the usage of it, and they'll actually show it.
[00:07:26.780] Fidelity is not new to DevOps and cloud. In 2016, Fidelity had its first cloud deployment. In 2017, the enterprise Agile kickoff happened. In 2022, 50% of applications are on cloud, and our goal is to have 70% of our applications on cloud by 2024. Fidelity's cloud infrastructure and strategy are complex enough. We are multi-cloud because of various business needs, and we have private cloud and data centers. That's not going to change in the very near future.
[00:08:06.180] The progress up to 2022: 50% of our applications are on cloud, and we have about 1,600 Agile squads. I already talked about 18,000 developers. Here's the improvement that has happened over the years through the DevOps and cloud journey: business-impacted incidents have gone down significantly; change-induced incidents have gone down significantly; MTTR has gone down significantly; at the same time, the number of changes went up significantly.
[00:08:44.110] So the question is, how do you go to the next level? What are the challenges, and what are we doing about that? During my first month at Fidelity, I interviewed all kinds of people across the enterprise: CIOs, architects, developers, risk officers, auditors, security folks. I asked them three questions. What is going good? What is not going good? What would you like to see changed?
[00:09:12.300] Based on those interviews over the period of one month, I created sticky notes. I stuck them on my wall in my home office, went through them, grouped them together, and created a set of challenges in these four groups.
[00:09:32.260] Number one is developer experience. Onboarding takes a long time. Documentation is a problem. Not that there's not enough documentation; actually, there is too much of it. Context switching, moving from one context to another, is a big problem for developers. There is a cloud onboarding, there is a DevOps tools onboarding, there are other onboarding processes.
[00:09:53.670] Tools sprawl: I think this is everybody's problem here. Raise your hand if it is not. There are too many tools and technologies and platforms, and there are variations. Modernization is a problem.
[00:10:06.740] Security, audit, compliance shifted right: late feedback that causes a lot of friction between developers and the rest of the enterprise. Ownership is a problem: ownership of security vulnerabilities, who owns them, who is responsible.
[00:10:22.250] Metrics: there are a lot of metrics around and a lot of techniques to create those metrics, and many, many maturity models. With this, I presented to the CIO councils and basically said we need to do something about it. The idea was to get everybody together, and that's what we did over the next six months.
[00:10:43.360] That was 2021, and we created something called the DevOps Council. The structure of the DevOps Council looks like this. There are business unit representations from every business unit. The DevOps tools platform engineers are there. The architecture software engineers, SREs, risk, audit, compliance officers, and cybersecurity team are there. There are CIO sponsors, and I am one of the co-chairs of the DevOps Council.
[00:11:12.010] The DevOps Council, depending upon what problem it tries to solve, creates these working groups underneath it: small working groups, small problem, get onto the problem, find a solution, come back with the solution. Then as the DevOps Council, we approve the solution or we debate more about it.
[00:11:30.200] With this, I go back to the challenges and we discuss these challenges in the DevOps Council to figure out how to solve this problem. These are the four big patterns that we created. Number one, we need unified developer experience. We need tools standardization. We need continuous compliance. We need contextual metrics.
[00:11:51.470] I'm going to talk about these things in my next slides except unified developer experience, because that itself has become a big program across the enterprise. It's not easy to solve unified developer experience all across. The things that I'm going to talk about are tool standardization, continuous compliance, and contextual metrics.
[00:12:15.900] The first is tools sprawl. As I said, it's a big problem for all of us. In my past life, I may have counted so many tools across the enterprise and had gone into such a friction-full tools discussion that I would like to stay away from it for the rest of my life. There are solutions that existed before we started. There are solutions that have gone into production during my time, and there are solutions that are coming on. There is friction all across.
[00:12:50.290] What we decided is that we are going to create standardized tools, standardized pipelines, and automated onboarding and setup. The reason for standardized pipelines is this: in my past life, I counted the pipelines at one company--I'm not going to name them--and it had 57, and they all did the same thing. Why do I need 57? If you have a Java application and you are doing a Maven or Gradle build and deploying to your Kubernetes cluster, I think there should be only one pipeline.
[00:13:20.570] Based on technology stacks, we are trying to create standardized pipelines for all these technology stacks. The question is, are the developers going to use it? We have to see. But when the developers push back, we ask one question: what different thing do you want other than this? If you have something good, try to contribute in the same pipeline so everybody else can use that. Automated onboarding and setup goes back to the developer experience point of view.
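The "one pipeline per technology stack" idea can be sketched as a small catalog lookup. This is a minimal illustration only, assuming a catalog keyed by language and deployment target; `Pipeline`, `CATALOG`, `BUILD_COMMANDS`, and `select_pipeline` are hypothetical names, not Fidelity's actual tooling. The build tool is treated as a parameter of the build stage rather than a reason to create another pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """A standardized pipeline: a name plus an ordered list of stages."""
    name: str
    stages: tuple


# Hypothetical catalog: exactly one standardized pipeline per (language, target).
CATALOG = {
    ("java", "kubernetes"): Pipeline(
        "java-kubernetes", ("build", "test", "scan", "deploy")
    ),
}

# The build tool only changes the command run inside the shared build stage.
BUILD_COMMANDS = {"maven": "mvn package", "gradle": "gradle build"}


def select_pipeline(language, target, build_tool):
    """Return the shared pipeline and the build command for an application."""
    key = (language.lower(), target.lower())
    if key not in CATALOG:
        # No pipeline yet for this stack: the answer is to contribute one,
        # not to fork a private copy.
        raise LookupError(f"no standardized pipeline for {key}")
    tool = build_tool.lower()
    if tool not in BUILD_COMMANDS:
        raise LookupError(f"unsupported build tool: {build_tool}")
    return CATALOG[key], BUILD_COMMANDS[tool]
```

A Maven team and a Gradle team deploying Java to Kubernetes both get the same `Pipeline` object back; only the build command differs. A team that needs something extra would add a stage to the shared entry so that everyone else can use it too.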
[00:13:49.250] To create a better developer experience all around, security, audit, compliance must be embedded. We aim for continuous compliance. We want to shift left security. Security scanning, whether SAST or SCA or open source scanning, is embedded into the pipeline and in the source code. As soon as a PR or pull request is opened against a Git repository, the scans run immediately and feed back to the developer so that they can make a decision as to whether to merge that pull request or not.
[00:14:26.800] Preconfigured, mandated gates in the standard pipeline: that is one purpose of standardizing the pipeline. We want to mandate certain gates within the pipeline to make or break the pipeline depending upon some criteria. We are going to talk about those criteria at a later stage. Pipeline execution data collection: we are streaming all this pipeline execution data into an evidence store so that we can track which pipeline is running which stage and what the results are for that particular stage.
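As a rough sketch of mandated gates plus pipeline execution data collection, the code below records every stage result in an evidence store and breaks the run at the first failing mandated gate. Everything here is an assumption for illustration: the gate names, the record fields, and the in-memory `EvidenceStore` stand in for whatever a real pipeline and evidence store would use.

```python
import json

# Hypothetical set of gates that every standard pipeline must run and pass.
MANDATED_GATES = ("sast", "open_source_scan")


class EvidenceStore:
    """In-memory stand-in for an evidence store: append-only JSON records."""

    def __init__(self):
        self._lines = []

    def append(self, record):
        self._lines.append(json.dumps(record, sort_keys=True))

    def for_run(self, run_id):
        records = [json.loads(line) for line in self._lines]
        return [r for r in records if r["run_id"] == run_id]


def run_pipeline(run_id, stages, store):
    """Run (name, check) stages in order, recording each result as evidence.

    Returns True only if every mandated gate ran and passed. A failing
    mandated gate breaks the pipeline immediately.
    """
    seen = set()
    for name, check in stages:
        passed = bool(check())
        store.append({
            "run_id": run_id,
            "stage": name,
            "passed": passed,
            "mandated": name in MANDATED_GATES,
        })
        seen.add(name)
        if name in MANDATED_GATES and not passed:
            return False
    # A pipeline that skipped a mandated gate does not count as passing.
    return set(MANDATED_GATES) <= seen
```

Because each stage writes its record before the make-or-break decision is taken, the store shows which pipeline ran which stage and what the result was, even for runs that were stopped by a gate.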
[00:15:02.390] Aiming for continuous compliance, these are the rules we are going to implement, and we are actually in the process of implementing them. Rule number one: no persistent production access to anyone, period. Not even ops. Changes in production only via a pipeline, which basically means everything is source code. What that is going to do is force every change to be peer reviewed before it gets to production, and give us a real-time audit of every change via the pipeline evidence store that I talked about before.
[00:15:41.930] The audit and compliance team and the developers had a friction-full life. Imagine yourself: you have a release button. When it is clicked, it releases software to production. The question from the audit and compliance side is, who clicks that button? Everybody is happy except the developers. The release and compliance teams are happy as long as the developer who actually coded that feature, designed that feature, architected that feature, and knows all about that feature, does not click that button. They're all happy except the person who clicks that button, because that person has no clue what that button does.
[00:16:26.490] In that scenario, the only way to solve the problem is actually not have anybody click that button. The button gets activated by the evidence that is collected from the pipeline execution. Once that button is activated, it doesn't matter who clicked it. That's the model that we are going after.
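One way to picture a release button that is activated by evidence rather than by a person's role is the sketch below. It assumes the evidence arrives as chronological records with a check name and a pass/fail result; the set of required checks and the function names are hypothetical, chosen only to illustrate the model.

```python
# Hypothetical checks whose evidence must be present before release is enabled.
REQUIRED_EVIDENCE = ("peer_review", "sast", "open_source_scan", "tests")


def latest_results(evidence):
    """Map each check to its most recent result (records are chronological)."""
    latest = {}
    for record in evidence:
        latest[record["check"]] = record["passed"]
    return latest


def missing_evidence(evidence):
    """List required checks that are absent or whose latest result failed."""
    latest = latest_results(evidence)
    return [check for check in REQUIRED_EVIDENCE if not latest.get(check, False)]


def release_enabled(evidence):
    """The button is active only when nothing required is missing or failing."""
    return not missing_evidence(evidence)
```

Note that nothing in `release_enabled` asks who is clicking. The decision depends only on what the pipeline recorded, which is the point of the model.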
[00:16:44.770] The next is contextual metrics: measure what matters. The evidence data is also going to give us the metrics, or the data behind the metrics, that we want to care about. Measurement: we want to go back to the basics of measurements instead of embracing DORA metrics or any other metrics. You can talk to me offline after and I can tell you my experience around all these different frameworks or metrics.
[00:17:13.780] We want to go back to basics and ask ourselves: why do you want to measure something? Who do you want to measure--an application team, an individual, an organization, a business unit, a particular product? When do you want to measure it? Do we want to measure a mature application or do you want to measure the application that is being constantly developed and changed? What are you going to measure? Are you going to measure code commit frequency, delivery frequency, deployment frequency, or what?
[00:17:43.220] How are we going to measure? This is important because if we want to do deployment frequency measurements, somebody needs to define what deployment means. Deployment of an application--next question is, what is an application? All these questions come up. Unless there's a solid definition of how the measurement is going to take place, the unit of measurement, there's no way to compare two teams or two products or two applications or two business units whatsoever.
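To show why the definition matters, here is a small sketch that computes deployment frequency only after pinning down what counts. The definition used is an assumption made for illustration: a "deployment" is a successful deployment event for a named application in a given environment, and the unit of measurement is the ISO week. The event fields are hypothetical as well.

```python
from collections import Counter
from datetime import date


def deployments_per_week(events, application, environment="production"):
    """Count successful deployments of one application per ISO (year, week).

    Assumed definition: an event counts only if it succeeded and matches
    both the application name and the environment.
    """
    counts = Counter()
    for event in events:
        if (
            event["application"] == application
            and event["environment"] == environment
            and event["succeeded"]
        ):
            iso = event["day"].isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


events = [
    {"application": "app-a", "environment": "production",
     "succeeded": True, "day": date(2023, 5, 15)},
    {"application": "app-a", "environment": "production",
     "succeeded": True, "day": date(2023, 5, 17)},
    {"application": "app-a", "environment": "production",
     "succeeded": False, "day": date(2023, 5, 18)},
    {"application": "app-a", "environment": "staging",
     "succeeded": True, "day": date(2023, 5, 19)},
    {"application": "app-a", "environment": "production",
     "succeeded": True, "day": date(2023, 5, 22)},
]
print(deployments_per_week(events, "app-a"))
```

Changing any part of the definition, for example counting failed attempts or staging deployments, changes the number. Two teams can only be compared if they share the same definition and the same unit.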
[00:18:12.610] We'll need to avoid the observer effect. By way of observing something, we sometimes impact the system so that the observation by itself is wrong. For example, if you want to measure commit frequency, as a developer I'm going to work around the system. I'm going to fool the system to get my numbers up. That's not a good thing. Measure to improve something.
[00:18:31.420] What we decided is we want to write down what we want to measure just like an Agile story: as something or someone, I want to measure something so that something. This is an example: as a product owner, I want to measure team's production access frequency so that I can understand the team's technical maturity and help improve it.
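The story format for metrics can be captured in a small structure that refuses to accept a metric without its who, what, and why. This is only a sketch; the `MetricStory` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricStory:
    """A metric written like an Agile story: who measures what, and why."""
    who: str
    what: str
    why: str

    def __post_init__(self):
        # A metric with no stated audience or purpose is rejected up front.
        for field_name in ("who", "what", "why"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"metric story is missing its '{field_name}'")

    def render(self):
        return f"As {self.who}, I want to measure {self.what} so that {self.why}."


story = MetricStory(
    who="a product owner",
    what="the team's production access frequency",
    why="I can understand the team's technical maturity and help improve it",
)
print(story.render())
```

Forcing the "so that" clause to be filled in is a cheap way to keep a metric tied to an improvement goal instead of being collected for its own sake.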
[00:18:58.710] We are not going to do it all by a central team or something. We are investing in inner source. We have developed an InnerSource model within Fidelity, and it looks just like any other good open source project: incubate, fund, govern, manage, promote, and operate.
[00:19:17.830] With this, in two years we have five total InnerSource projects. In one such InnerSource project, which is our pipeline libraries and pipeline catalog that we are creating, through 2022 we had 174 features delivered, 3,500 commits done by 12 business units, and about 50 active contributors.
[00:19:42.710] The help I need is: if you are using any DevOps metrics, I want to learn what you are using and how they are helping you. Number two, if you are invested in inner sourcing, I want to learn more about your model, because we do have some struggle in scaling that model up: any processes that you have or any challenges that you have faced with that. Thank you very much. Have a good rest of the evening.
[00:20:13.540] Thank you, Topo.