Creating Enterprise Data Ownership at Vanguard
Full transcript
The complete talk, organized by section.
Brian Charles — Open
Good afternoon and thank you for choosing to be in this breakout with us. We are very thrilled to see you, 'cause that means you're excited about data just like we are.
So, a brief introduction: my name's Brian Charles, and I'm a senior manager in our investment data space at Vanguard. And with me today…
Steve Brosius
I'm Steve Brosius, the Senior Solutions Architect on Vanguard's Investment Data and Analytics team.
Alexa Cain
And I'm Alexa Cain, an Investment Data Architect.
Brian Charles
Alright, so with everyone introduced, let's dance.
Alexa Cain
Whoa, whoa, Brian. I know we're in Vegas, but it's 2:00 PM. It's hard to tell without any clocks on the wall, but it's two. And we're at an Enterprise Technology Leadership Summit. Aren't you going to talk with us about creating enterprise data products?
So if that's what you're wondering out there, don't worry: you're still in the right breakout session. There is a reason, though, why music is key to how we think about data product management.
Speaking of great music and great shows — did anyone happen to see U2 when they opened the Sphere?
Brian Charles
Yeah, as a huge U2 fan I'm kind of bummed I didn't get to go to that, Alexa. But I did get to check out Taylor Swift's Eras tour and that was so good. I'm actually going to go again this fall. But don't take my word for how good that tour was — check this out.
Alexa Cain — 'Data is like a song'
Alright, now let's talk about data. Take a look at some of these Google reviews for your favorite performances. These are amazing performances with overwhelmingly positive reviews. U2 and Beyoncé — 97%. Taylor Swift — 99% positive reviews across thousands of reviewers.
Brian Charles
By the way, if you're one of the 1% that left the negative review for the Eras tour, please come and speak with Brian after the presentation.
Alexa Cain
But this is what we're all working towards, isn't it? To create something that leaves our audiences delighted. And that's exactly what got us thinking. Data is like a song. We can delight audiences too.
There's a lot we can learn from music. Just writing a song is hard work. Have you ever stared down a blank page and tried to create something? It can be hard just to get started. You have to write the lyrics, the melodies, the harmonies, the rhythm, to say nothing of coordinating across an entire band. It is a huge accomplishment just to finish writing a single song.
But even with all of that hard work and that huge accomplishment, it's not actually enough to get that song to an audience. You have to produce the song, make it consumable — maybe via streaming or physical merchandise. Most fans are going to be happy using their favorite streaming service, but we know that's never going to satisfy the dedicated vinyl collectors out there. You have to promote the music through social media and other advertising so people know it's even out there to begin with.
You don't have to be an international sensation selling out stadiums or opening the Sphere, but even thinking about basic music management can be overwhelming. And data shares that same story.
Just obtaining data is a lot of hard work. You have to acquire the data from a vendor or produce it via your business processes. You have to store it, maintain it, and make sure it suits your business needs, to say nothing of coordinating across product teams. It is a huge accomplishment just to onboard a single dataset to your company.
But even with all of that hard work and that huge accomplishment — and it is a huge accomplishment — it's still not enough to get that data to a wide audience of consumers. After you produce the data, you have to make it consumable in many different formats and access mechanisms to suit your consumer needs. Most clients are going to be pretty happy with the basic REST API, but we know for a fact that's not going to satisfy your most dedicated analytics power users. And you have to catalog that data, promote the data, to let consumers know it's even out there.
So if data is like a song, a data product is a lot like a musical career. You can have data without that data being a data product. You can have a song without having an entire musical career. But when you want to delight the widest audience, you need to be deliberate with how you manage what you create. You need to apply product thinking. That deliberate product thinking applied to a dataset is what turns a dataset into a data product. And we have audiences across Vanguard clamoring for them.
So how do we do it? How do we go from data to data product? How do we go from having a single song to being an international sensation? You need to be deliberate.
Let's go back to music for just a moment. Those popular international artists are deliberate in how they develop their songs and their brand. The success of U2, the Beatles, Beyoncé, and the Rolling Stones, reflected in those wildly positive metrics we showed earlier, isn't accidental. It's by design. The storytelling, the relatable themes, the emotional authenticity, the catchy melodies and hooks: those factors are woven into all of those artists' songs and delivered consistently, and people love it. That's what makes these artists appealing and accessible to a wide variety of audiences across the globe.
So where are we going with this? How do we make data products more appealing and accessible to our data consumers? How do we delight our audience? How do we unlock the power of one of our most important assets — data?
01 Success by design: the data product wheel
We have to be just as deliberate with how we design, develop, and maintain our data products as musical artists are with their music. The factors for a successful data product look a little different from those for a successful song, but I suspect they won't surprise anyone here: discoverable, high-quality, usable data. That's what clients want.
At Vanguard, we've introduced a Data Product Maturity Model to guide our data product teams. Teams retain the autonomy to build their products to their clients' needs, but with the benefit of a consistent set of standards to ensure we're moving from dataset to data product together.
The Data Product Maturity Model is just one tool in our data product owner's toolbox. The model captures multiple levels of data product maturity, so it can guide a team from onboarding a brand new, perhaps simple, data product to running one of the most widely used, complex data products in our organization. The model gives a team and its data product a framework to grow with its audience across its lifetime.
We're going to dive into a specific use case at Vanguard, but first, just a little bit more about our roles.
Brian Charles — Who is Vanguard? Who is GIDM?
In the other sessions this week you heard from Mike and Dalin about what Vanguard does. So we thought it'd be fun to just throw some numbers up on the screen that represent some data about Vanguard.
- 50M+ investors globally
- $9.5 trillion assets under management
- $200B cashflow in 2023
- 423 funds offered globally
- 20K global employees
- 18 locations across the globe
There are a lot of big ones up there, and a couple of them have been taken on a roller-coaster ride over the last two weeks with the markets. But the one that I want to focus on is that lineup of 423 global products you see. To offer a mutual fund or an ETF to our investors every day, we have to churn through a lot of data. And that's where Alexa, Steve, and I come in.
We work in the Global Investment Data Management function at Vanguard. That's a lot to say, so for the rest of this talk I'm just going to shorten that down to GIDM. GIDM's role at Vanguard is to ensure that the investment data that's needed to manage and support all of those products is available to our consumers every single day. It's our mission statement: make high-quality, fit-for-purpose data accessible and timely to our entire firm.
To manage those 423 products, we have to store millions of data attributes every day. We source over 70 million securities that contain the details about the assets in each one of those products. We also bring in hundreds of benchmarks from our global providers.
Our data fuels investment management, portfolio management and trading, data operations, global investors, data disseminators, and our regulatory bodies across the world. So we have to make sure that data is up and available 24×7. And we do that today across a very large mix of legacy and modern technology platforms. And that's where things start to really get challenging for us.
02 The case for data product maturity: looking back
When we look back at the last decade, Vanguard's growth was incredible. We had completed a global expansion. Our assets under management tripled to that $9.5 trillion number on the prior slide. So we needed to scale the investment data that was powering that growth — and fast — to help keep costs low and give our investors the best chance for investment success.
Any time you need to scale, you end up with some very large, targeted programs. And targeted programs bring tactical solutions, particularly for data.
If you aren't familiar with asset management, that's okay for this next part. Just understand that there are two major types of assets out there: equity and fixed income.
For one of these tactical solutions, GIDM built a data hub to do equity data management — to manage equity index and equity reference data. But this solution lacked a broader, longer-term data strategy. That data hub was a program objective, and it was designed to solve very specific equity business needs. We weren't really thinking about the next business need that would come along later.
Just a few years down the road, a new program emerges that requires fixed income data. And it would probably surprise none of you here that the equity data hub didn't have fixed income assets inside of it, or even a modern way to get data out of it at all. The equity hub simply wasn't ready for that. We had to go refactor and rebuild the entire on-prem vendor architecture to make it happen.
We also didn't understand at the time how many consumers would emerge from across Vanguard looking for managed investment data. Since that system was embedded so heavily in the equity workflow, onboarding new consumers was quite challenging without jeopardizing some of the day-to-day equity operations.
It's not that we enjoyed saying no and turning away people who needed data; it was simply too risky to the equity business. We couldn't jeopardize the savings that clients entrusted to us.
When we finally came up with creative ways of mitigating these risks, getting data to consumers the way they needed it, and in the formats they wanted, was challenging and never timely. So really, over the entirety of that decade, investment data was always treated as a byproduct of systems implementations.
03 Critic reviews for investment data
Unlike all those happy reviewers who attended those concerts, we were dealing with very frustrated consumers, and we as data producers felt inadequate. As data producers we felt we did a fantastic job with the day-to-day: our SLAs were often met and our data quality was very high. But over the last few years, as the demand for analytics expanded, we just weren't well positioned to meet those needs.
If you were a data consumer looking for analytics, these are some of the things you might have been caught saying:
- I can't find the data I'm looking for.
- How do I join this data together with other datasets? The identifiers don't match.
- The quality of this data isn't good.
- It took way too long for me to get access to that data. Brian, Steve: 11 months, come on.
- I need a different format than what you're offering.
For consumers looking to work with historical data, the problems got even worse. Historical data was often contained in our system tables where consumers couldn't get access to it, and in many cases the data was incomplete or insufficient for their use cases. Over the years we bolted on solutions that attempted to save off data for analytics, but none of those solutions really endured. We simply didn't foresee the volume and variety of data that on-demand insights would require. So in the eyes of every consumer, investment data became the impediment.
That was a lot of bad feedback for our group to stomach on a daily basis. So we needed to flip the script and build a better reputation for investment data.
04 Building a better reputation for investment data
In 2020 we began a modernization journey to retire those legacy platforms. Building on AWS allowed us to move a lot faster and make data available in many new ways and formats. We got all of our durable product teams started on modernizing our data pipelines.
But a few years into the effort, we started to see the same patterns from the past happening again. Only now, in AWS, we had made all of our data available in one very specific format, and we had built rigid APIs that consumers couldn't customize. So while the tech was modern, our data strategy was still lacking.
That led to a realization that we needed to hold data to a new standard. So GIDM is taking this as an opportunity to reset how we structure data — and that reset is treating data as a product and building that roadmap to data product maturity.
Steve, why don't you tell the audience how we're doing making progress on that journey?
Steve Brosius — The Data Product Maturity Model
Thank you, Brian. Fortunately for us, cloud development has freed us up from the on-prem constraints we once had. We went from data constrained by monoliths to a microservices architecture, leveraging fit-for-purpose data stores in a broader data lake.
And as we learned from the past, one size doesn't fit all. So our data products cater to both operational and analytic use cases. When you think about data as a product, just like a song, you make it available in multiple formats that suit the consumer's use case, no matter what they're trying to do. For example, we keep data in raw format as well as in more refined formats like Avro and Parquet. For high-performance analytical use cases we have Iceberg tables, and for data distribution we have REST and GraphQL APIs. We expect more formats and standards to emerge in the future.
And this is why we built the data maturity model. By defining how we want to mature our data products, we now have a consistent bar for measuring each product across its lifecycle, to ensure our products are findable, joinable, and accessible. The goal is highly composable data products.
This is because our data products are like building blocks — or should I say, musical instruments. You can use a single musical instrument, but to create the best listening experience you need to join those instruments together in complete harmony. And data is the same way. A portfolio manager at Vanguard requires security reference data, holdings data, corporate actions, prices, and index data — just to name a few — all working together. These data products are the foundation of what goes into giving you the industry's best-performing mutual funds and ETFs.
Our data model ensures that GIDM is providing ways for our consumers to put these building blocks together. Like every good artist's musical evolution, we expect this maturity model to evolve as we learn more.
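To make "joinable" concrete, here is a minimal Python sketch of two building blocks being combined. It assumes a holdings product keyed by an internal security ID and a price product keyed by a vendor ID, with a published crosswalk between the two; every identifier, field name, and value is hypothetical and does not reflect Vanguard's actual data model. Rows that cannot be matched are reported rather than silently dropped, since mismatched identifiers were one of the complaints consumers raised.

```python
# Hypothetical sample data: holdings use an internal ID, the price feed a vendor ID.
holdings = [
    {"fund": "FUND_A", "security_id": "SEC001", "shares": 100},
    {"fund": "FUND_A", "security_id": "SEC002", "shares": 50},
    {"fund": "FUND_A", "security_id": "SEC003", "shares": 10},  # no crosswalk entry
]
prices = [
    {"vendor_id": "V-AAA", "price": 10.0},
    {"vendor_id": "V-BBB", "price": 25.0},
]
# Hypothetical crosswalk published alongside the products so they can be joined.
crosswalk = {"SEC001": "V-AAA", "SEC002": "V-BBB"}


def market_values(
    holdings: list[dict], prices: list[dict], crosswalk: dict[str, str]
) -> tuple[list[dict], list[str]]:
    """Join holdings to prices through the crosswalk; return matched rows and unmatched IDs."""
    price_by_vendor = {p["vendor_id"]: p["price"] for p in prices}
    matched: list[dict] = []
    unmatched: list[str] = []
    for h in holdings:
        vendor_id = crosswalk.get(h["security_id"])
        price = price_by_vendor.get(vendor_id) if vendor_id is not None else None
        if price is None:
            # Surface the gap instead of dropping the row without a trace.
            unmatched.append(h["security_id"])
            continue
        matched.append({**h, "market_value": h["shares"] * price})
    return matched, unmatched


rows, missing = market_values(holdings, prices, crosswalk)
for row in rows:
    print(row["security_id"], row["market_value"])  # SEC001 1000.0, then SEC002 1250.0
print(missing)  # ['SEC003']
```

Publishing the crosswalk as part of the product, rather than leaving every consumer to build their own, is one way to move the join work from the consumer back to the producer.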
05 Why a custom maturity model?
So how do you go about creating a data maturity model? Or maybe you're asking yourself, why would I want to create a data maturity model when there are industry models already out there?
As we started building different types of products in our data lake, we identified the need for a consistent consumer experience. We referenced some of the industry data maturity models, but they didn't quite square with what we were trying to do with data as a product. We knew we needed a custom framework, so we started building one to help guide data product owners. We began socializing these ideas informally, and then formally, across the enterprise. As it turns out, other parts of the organization were thinking along the same lines. After much discussion, the pillars, the attributes, and the levels emerged.
There are pillars like ownership: deep domain expertise in the data product, with end-to-end stewardship and governance. There are pillars like provisioning: multimodal formats to meet operational and analytic use cases. We're reshaping the same data to improve the consumer experience; we'd like to put an end to data wrangling so that consumers can have the data they want in the shape and format they need it. And there are infrastructure attributes like observability and data management, scalability, and resiliency.
Before we knew it, the organization was talking about treating data as a product and how they could start to leverage the data maturity model.
06 Three levels of maturity: Foundational, Enhanced, Mature
Initially we decided on three levels of maturity to help think about continuous improvement, starting with foundational capabilities.
- Foundational: empowered product owner responsible for business metadata, data quality, and governance; basic formats and observability available; governed access; operational observability; raw data format.
- Enhanced: adds traceability, connected metadata, joinable products, and self-service capability; consumer data events and notifications; availability in a marketplace; analytic-friendly data formats.
- Mature: provides advanced capabilities like data streaming, historical time-series data, consolidated operational data management, and global resiliency; high-performance analytics.
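One way to read the three levels is as cumulative checklists. The Python sketch below scores a data product that way: a product reaches a level only when it meets that level's capabilities and those of every level beneath it. The capability labels are hypothetical paraphrases of the slide, not Vanguard's actual criteria, and the cumulative rule is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    NONE = 0
    FOUNDATIONAL = 1
    ENHANCED = 2
    MATURE = 3


# Hypothetical capability labels paraphrasing the slide; not an official checklist.
REQUIREMENTS: dict[Level, set[str]] = {
    Level.FOUNDATIONAL: {
        "empowered_product_owner", "business_metadata", "data_quality",
        "governed_access", "operational_observability", "raw_format",
    },
    Level.ENHANCED: {
        "traceability", "connected_metadata", "joinable", "self_service",
        "consumer_events", "marketplace_listing", "analytic_formats",
    },
    Level.MATURE: {
        "streaming", "historical_time_series",
        "consolidated_operational_data_management",
        "global_resiliency", "high_performance_analytics",
    },
}


def maturity_level(capabilities: set[str]) -> Level:
    """Highest level whose requirements, and all lower levels' requirements, are met."""
    achieved = Level.NONE
    for level in (Level.FOUNDATIONAL, Level.ENHANCED, Level.MATURE):
        if REQUIREMENTS[level] <= capabilities:
            achieved = level
        else:
            break  # assumed cumulative: stop at the first unmet level
    return achieved


def gaps(capabilities: set[str], target: Level) -> dict[Level, set[str]]:
    """Capabilities still missing at each level up to the target (levels with no gap omitted)."""
    return {
        level: REQUIREMENTS[level] - capabilities
        for level in Level
        if Level.NONE < level <= target and REQUIREMENTS[level] - capabilities
    }


# Example: a product with all foundational capabilities plus two enhanced ones.
product = REQUIREMENTS[Level.FOUNDATIONAL] | {"traceability", "joinable"}
print(maturity_level(product).name)  # FOUNDATIONAL
print(sorted(gaps(product, Level.ENHANCED)[Level.ENHANCED]))
# ['analytic_formats', 'connected_metadata', 'consumer_events', 'marketplace_listing', 'self_service']
```

Returning the gaps alongside the level gives a product owner a concrete backlog for the next level rather than just a score.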
With all of our data products in various stages of maturity, we are well on our way in this journey in GIDM.
07 Results: critic reviews for GIDM data products
As a result, we're making it easier for consumers to find the data they need. Initiatives like birthright access have made it faster for consumers to get their hands on data. And we're working really hard at delighting our consumers. For the more mature data products, we are seeing very positive consumer feedback:
- 'Weeks to minutes improvement on speed of data access' (Christine Chang, Portfolio Manager)
- 'Solves SQL problems and is significantly quicker!' (Amy Nichols, Data Analyst)
- 'Gold Star for Jupyter Notebook export' (Fred Patel, Data Scientist)
The scoreboard: producer score 90% (up from 50%) and consumer score 87% (up from 13%).
Brian Charles
All right, wait a minute — I need to interrupt you, Steve. There is absolutely no way that everyone at Vanguard is now enamored with our investment data, right?
Steve Brosius
Not yet, Brian. Our consumers still want more metadata. They want to know if the data's from a trusted source. And for data quality, they need to know what checks have been run on the data.
08 Where are we going next? The Data Explorer marketplace
So how do you bring all this together to unlock those business insights?
Well, we all understand the value of a great online experience. Think of your favorite music streaming service. Features that excite us as consumers are things like great search, helpful product ratings, recommendations, accurate descriptions and previews.
So we are building an internal data marketplace to make data more accessible and usable throughout the investment data community:
- A marketplace that allows consumers to search for and discover the location of a data product, and preview the dataset to ensure it's the data they need.
- A marketplace that allows data producers to catalog and connect business and technical metadata and publish their products to a central location.
- A marketplace that offers a self-service feature for consumers to request access, complete with automated approvals.
This marketplace we're building has huge ambition: a state-of-the-art experience for data products, with the goal of delighting consumers. The marketplace you see here is called Data Explorer.
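The three marketplace capabilities map naturally onto a small interface: producers publish, consumers search and preview, and access requests are either approved automatically or routed to the owner. The in-memory Python sketch below illustrates that flow under stated assumptions: the `Marketplace` class, its methods, the product and group names, and the rule that auto-approves requests from pre-approved groups are all hypothetical and are not taken from Data Explorer.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    owner: str
    description: str
    tags: set[str]
    sample_rows: list[dict]  # small preview so consumers can check fit before requesting
    auto_approve_groups: set[str] = field(default_factory=set)  # hypothetical policy


@dataclass
class AccessRequest:
    product: str
    requester: str
    group: str
    status: str = "pending"


class Marketplace:
    def __init__(self) -> None:
        self._catalog: dict[str, DataProduct] = {}
        self.requests: list[AccessRequest] = []

    def publish(self, product: DataProduct) -> None:
        """Producers register a product and its metadata in one central place."""
        self._catalog[product.name] = product

    def search(self, term: str) -> list[str]:
        """Case-insensitive match on name, description, or tags."""
        t = term.lower()
        return sorted(
            p.name
            for p in self._catalog.values()
            if t in p.name.lower()
            or t in p.description.lower()
            or any(t in tag.lower() for tag in p.tags)
        )

    def preview(self, name: str, n: int = 3) -> list[dict]:
        return self._catalog[name].sample_rows[:n]

    def request_access(self, name: str, requester: str, group: str) -> AccessRequest:
        """Self-service request: auto-approved for pre-approved groups, else sent to the owner."""
        product = self._catalog[name]
        req = AccessRequest(name, requester, group)
        req.status = "approved" if group in product.auto_approve_groups else "pending_owner_review"
        self.requests.append(req)
        return req


mp = Marketplace()
mp.publish(DataProduct(
    name="security-reference",
    owner="reference-data-team",
    description="Security master attributes for equity and fixed income",
    tags={"reference", "equity", "fixed-income"},
    sample_rows=[{"security_id": "SEC001", "asset_class": "equity"}],
    auto_approve_groups={"portfolio-management"},
))
print(mp.search("fixed"))  # ['security-reference']
print(mp.request_access("security-reference", "analyst1", "portfolio-management").status)  # approved
print(mp.request_access("security-reference", "analyst2", "marketing").status)  # pending_owner_review
```

In a real system the approval rule would come from governance policy rather than a per-product set; keeping it as data on the product simply makes the automated path easy to see.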
09 The help we're looking for: data mesh, knowledge graphs, semantic layers
We're also expanding into new technologies to help improve the client experience. Some of the areas we're actively working on include data mesh and the data marketplace. Our data mesh architecture is inspired by Zhamak Dehghani's book on data mesh. If you recall from that writing, the hardest part of building out that mesh architecture is creating an effective self-service data marketplace.
Treating data as a product is a concept that has emerged only in the last few years. We appreciate that some of you may also be on this journey, so we would love to hear your implementation stories about how you built a data mesh or a data marketplace for your organization.
Additionally, if you've built a knowledge graph or a semantic layer (think of a canonical data model with traceability and data usage), please come and talk with us after. And bonus points for a knowledge-graph-powered RAG system.
10 Close
So with our asks out there, we'd just like to close this breakout by saying a huge thank you to Gene for inviting us to come share our story with you. Since we do have a few minutes before the break, we'll hand in our mics, and Steve, Alexa, and I will hang around if anybody has questions they'd like to talk with us about. All right. Really appreciate it all. Thank you.