Adaptability by Design: Decoupling Dependencies to Drive Client Value at Vanguard
Building for re-use and scale can be challenging in complex systems. At Vanguard, we've been able to achieve those goals by insisting on adaptability from the outset. By anchoring to consumer-agnostic design in a distributed micro-service architecture, and by building collaborative relationships with business partners, we've succeeded in building the right products, with the right controls, that empower our partners to drive value for our investors.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
The next speakers embody one of the most powerful lessons I learned when I was co-authoring the book Wiring the Winning Organization with Dr. Steven Spear. The big aha moment for me, and the vocabulary it introduced, was the notion of independence of action. One of the best things leaders can do is enable teams to work independently of each other without a lot of communication, coordination, synchronization, deconfliction, etc.
We saw the experience report from Captain Andy Biehn and Jim Juster about how they used modularity to decouple elements in the AEGIS missile system, which enabled the components to evolve independently of each other. Austin Puckett from CLEAR talked about how he enabled independence of action for the business process owners, so that the lane managers in the airports could make decisions independently of the technology teams — and that's what enabled them to upgrade 11 million members over four months.
So it turns out microservices is only one way to create independence of action. These mechanisms can come in very surprising forms, which is the reason why I was so blown away when I saw a specific talk at the Vanguard Technology Conference. Colleen Evans is Senior Manager of Product Management and Denton Burnell is an IT department head. They'll be talking about decoupling dependencies in a very surprising area, which is in batch jobs.
One of their goals was to reduce the blast radius of large batch jobs, which can cause massive downstream impacts — cascading batch failures, wrong data in critical business processes. Anyone who enjoyed Mike Nygard's talk on data being the land that DevOps forgot, you are going to love this presentation. So here's Colleen and Denton talking about independence of action.
Denton Burnell
Thank you Gene. We too are incredibly excited to be here with you today to acquaint you with the work that we've been doing at Vanguard to help empower our internal partners at scale, enabling them to deliver for our clients. We'll touch on the nature of those partnerships and how they've led to strong collaboration and also adaptable designs that empower a scalable consumer-agnostic approach.
But introductions first. I'm Denton Burnell. At Vanguard, I lead a number of development teams that support our core advice-related products. Think of things like portfolio rebalance engines, glide path services, and a product you're going to hear quite a bit more about today: the suite of microservices that power our daily batch process.
Colleen Evans
And I'm Colleen Evans. I'm the senior product manager for that daily batch service Denton just mentioned. We call it daily portfolio management, or DPM. My team of product managers partners extremely closely with Denton's teams to bring our portfolio construction products to life and make sure we're meeting the needs of our clients and consumers.
Before we dive into the meat of our presentation, we thought it would be important once again to anchor you to what makes us tick. As you heard from our CTO Mike Carr on Tuesday and from a number of our other colleagues earlier this week, every decision that we make at Vanguard is informed and guided by our core mission, namely to take a stand for all investors, to treat them fairly, and to give them the best chance for investment success. The word that jumps out at me in our mission is the word all — because at Vanguard, we're not just concerned with the clients that invest directly with us. Hopefully we're building an industry-wide environment where all investors have a chance at investment success.
Now, as you think about product at Vanguard, you probably think about low-cost investing or our index funds. But if you caught Mike Carr's talk on Tuesday, you might've heard about another engine as well: advice. In order to provide investors the best chance for investment success, we offer trusted advice and perspectives to those clients who want or need help achieving their goals. We're making advice accessible to all clients, lowering its cost, enhancing the experience, and improving outcomes for more individual investors than ever before.
There are many different flavors of advice that a client can experience at Vanguard, and these range from all-digital to higher-touch hybrid offers and everywhere in between. The products and services that Denton and I own enable aspects of each.
Denton Burnell
Enterprise Advice, the organization that Colleen and I support at Vanguard, was formed with the express purpose of building scalable, consumer-agnostic, reusable services that enable us to achieve our advice goals across multiple partners and offers. Just as Vanguard has a core mission, Enterprise Advice also has a strategic purpose aligned to that mission, namely to empower millions of global investors with trusted advice. Combined with our values of collaboration, innovation, and client focus, we're working hard every day to build scalable, reusable services in support of that mission.
Colleen Evans
So what was the big problem we were trying to solve with DPM? Every day we need to make sure that hundreds of thousands of advised clients with millions of accounts and billions in assets under management are able to receive the advice they need to stay aligned to their financial plans and on track for success.
The purpose of daily portfolio management is to generate the investment recommendations for what to buy and sell in advised clients' portfolios. We do this by identifying the population of advised clients, then determining which of those clients are eligible for rebalance. This can be due to multiple reasons — for instance, we might see an opportunity to harvest losses for that client, or potentially we see that that client's portfolio has drifted away from their target asset allocation. We take this population of eligible clients, generate recommendations for them, validate those recommendations, and then submit the recommendations to our partners in trading to execute trades on.
We mentioned there are multiple advice offers, and those with advisors do have the ability to generate recommendations and submit trades throughout the day. But our big Super Bowl event is our morning batch run — except of course the Super Bowl happens once a year, and our batch run happens every day.
We're all trying to build reusable technology here. But when it comes to clients' investment recommendations, the bar's high. We have to be 100% accurate, 100% of the time, to make sure we're doing what's in the best interest of our clients. And it's not just our products we're concerned about — we work within a complex ecosystem of partners and teams and need to be able to work together effectively to make sure we're driving value for our clients.
There are really three pillars that occur every single morning. The first is data collection — making sure that we have the data we need to calculate accurate recommendations. That's owned by our partners and their teams. The second pillar is the investment recommendation generation. That's the scope of DPM, and that's owned by us. And then finally there's trading — again, that's owned by partners and their teams. If any one team supporting any of those pillars has a bad day, we all have a bad day. So we all make sure that we're focused on our collective ultimate goal: getting accurate trades out the door for our clients.
So knowing that we're working in this high-stakes, highly complex environment, how did we think differently about our workhorse of a batch to turn it into something innovative and adaptable that allows us to consistently meet the needs of our clients every day? We focused on three design principles: building consumer agnostically, giving our consumers control, and working in close collaboration with our partners to make sure we're meeting their needs.
Denton Burnell — Principle 1: Consumer Agnostic
Let's start with our first principle, building consumer agnostically. Ideally, we only build services that can realistically be reused by multiple offers at Vanguard, in the microservices world, in the cloud — in our case on AWS. How did we approach this?
When you think about that daily batch process Colleen just described and supporting that for multiple consumers, how do we know who's calling us, and how do we get the right data from the right places in support of those partners?
First, and perhaps most importantly, we employed the concept of an Authenticated Consumer ID, or ACID. This ACID token is unique to each individual consumer that might call us, and as the name implies, is authenticated — so that when a particular partner does call us, we know that they are in fact who they say they are.
The ACID is only part of the solution. As this diagram demonstrates, we have an authenticated consumer application calling one of our orchestrators — and an orchestrator for us is essentially a gatherer of data. It calls other services to gather data. But in calling that orchestrator, it has also provided that ACID. And from this simple bit of information, we know exactly who is calling us. We know which data they are entitled to on our platform, and we know which adapter services we need to call to obtain data that's not on our platform — that might be at Vanguard, or might be somewhere else entirely outside of our four walls.
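To make the routing idea concrete, here is a minimal sketch of how an orchestrator might turn an ACID into a data-gathering plan. It assumes authentication has already happened upstream, and every name in it (`ConsumerProfile`, `REGISTRY`, `plan_data_gathering`, the example ACID and URL) is hypothetical rather than anything from the actual DPM codebase.

```python
from dataclasses import dataclass


@dataclass
class ConsumerProfile:
    """What the platform knows about one authenticated consumer (hypothetical shape)."""
    name: str
    entitled_datasets: set   # data already held on the platform
    adapters: dict           # contract name -> adapter service endpoint


# Hypothetical registry keyed by ACID; a real system would not hard-code this.
REGISTRY = {
    "acid-offer-a": ConsumerProfile(
        name="offer-a",
        entitled_datasets={"goals", "risk_profile"},
        adapters={"accounts": "https://adapters.example/offer-a/accounts"},
    ),
}


class UnknownConsumer(Exception):
    """Raised when an ACID is not in the registry."""


def plan_data_gathering(acid, needed):
    """Return, for each needed dataset, where the orchestrator should fetch it."""
    profile = REGISTRY.get(acid)
    if profile is None:
        raise UnknownConsumer(acid)
    plan = {}
    for dataset in needed:
        if dataset in profile.entitled_datasets:
            plan[dataset] = "platform"                 # already on the platform
        elif dataset in profile.adapters:
            plan[dataset] = profile.adapters[dataset]  # fetch through an adapter
        else:
            raise PermissionError(f"{profile.name} has no source for {dataset!r}")
    return plan
```

Calling `plan_data_gathering("acid-offer-a", ["goals", "accounts"])` returns a plan that reads goals from the platform and accounts from that consumer's adapter, which is the "one small identifier drives everything" idea in miniature.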
Adapters are another key concept that make our architecture work, and they're really a key part of our partnership with our consumer partners. An adapter — an adapter service, that is — is really just a contract for data: a certain set of data to be provided in a certain way. At Enterprise Advice, we don't control or manage the adapter services that fulfill those contracts. However, we are heavily reliant on them to be accurate and performant to meet the needs of our clients. We have contracts for things like accounts and balances and transaction history and tax lots and so on. The use of the adapter pattern allows us to define a singular and consistent way of accessing key record-kept data without worrying about how or where that data is stored. As long as the service meets the obligations of the contract, we can do our job effectively.
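One way to picture an adapter contract is as an interface that any record keeper's service can satisfy. The sketch below uses Python's `typing.Protocol` for that purpose. The `TaxLot` fields, the `get_tax_lots` method, and the in-memory record keeper are all illustrative assumptions, not the real contracts.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


@dataclass
class TaxLot:
    """Hypothetical shape of one tax lot returned under the contract."""
    account_id: str
    symbol: str
    quantity: Decimal
    cost_basis: Decimal


class TaxLotAdapter(Protocol):
    """The contract: any adapter service must answer this call in this shape."""

    def get_tax_lots(self, client_id: str) -> list:
        ...


class InMemoryRecordKeeper:
    """Stand-in for one partner's adapter; how it stores data stays hidden."""

    def __init__(self, rows):
        self._rows = rows  # tuples of (client_id, account_id, symbol, qty, basis)

    def get_tax_lots(self, client_id):
        return [
            TaxLot(acct, sym, Decimal(qty), Decimal(basis))
            for (cid, acct, sym, qty, basis) in self._rows
            if cid == client_id
        ]


def total_cost_basis(adapter: TaxLotAdapter, client_id: str) -> Decimal:
    """Consumer-agnostic logic: depends only on the contract, not the storage."""
    lots = adapter.get_tax_lots(client_id)
    return sum((lot.cost_basis for lot in lots), Decimal("0"))
```

Because `total_cost_basis` only relies on the contract, swapping in a different record keeper means writing a new adapter class and leaving the calling code untouched.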
Colleen Evans — Principle 2: Consumer Control
So Denton's talked through our first principle, consumer-agnostic design, and that's really our foundation that allows us to deliver value from our second principle: giving our consumers control.
So 99% of the time, this is generally what our day looks like. Overnight, while we're all sleeping relatively soundly in our beds, the overnight predecessor jobs are hard at work, updating the data we need to calculate accurate investment recommendations. Once complete, our partners kick off their DPM runs so that all recommendations are generated before many people are even starting their day. We check to make sure nothing looks abnormal, and we submit those recommendations to our partners in trading who execute the trades.
But that day isn't every day. And as anyone in the world of technology will tell you, sometimes things, both in and out of your control, will go bump in the night. And when that happens, one of the things that makes our ecosystem of teams so strong is that we all band together to focus on meeting our collective ultimate goal: getting accurate trades out the door for our clients.
These mornings have another benefit as well: learning. Much like Rome, DPM wasn't built in a day. And our response to those issues has allowed us to enhance DPM in ways that empower our product managers and our partners to make decisions and act in real time.
First and foremost, our team has implemented straightforward event-driven functionality that gives our partners more control to start the batch when they're ready, by dropping a simple trigger file in AWS. Our clients place a simple startup file in an S3 bucket that DPM is patiently watching. The arrival of that file tells DPM to wake up and, based on the contents of the file, run whenever the client so desires.
Notably, and perhaps not surprisingly, that isn't the way it's always been. This is in direct contrast to the historical batch process that DPM replaced, which was based on a more traditional Control-M infrastructure where there were prescribed startup times and so on. We've obviously come a long way from those days.
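As an illustration of the trigger-file mechanism described above, the following sketch shows a Lambda-style handler that reacts to an S3 event notification. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the standard S3 event structure. Everything else is assumed for the example: the `.trigger.json` naming convention, the `consumer` and `run_date` fields in the file, and the injected `read_object` and `start_run` callables, which stand in for an S3 client and for whatever actually starts a run.

```python
import json
from urllib.parse import unquote_plus


def handle_s3_event(event, read_object, start_run):
    """Start one batch run for each trigger file named in the S3 event.

    read_object(bucket, key) returns the file contents as bytes.
    start_run(consumer=..., run_date=...) starts a run and returns an id.
    Both are injected so this sketch runs without any AWS dependency.
    """
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".trigger.json"):  # hypothetical naming convention
            continue
        params = json.loads(read_object(bucket, key))
        started.append(
            start_run(consumer=params["consumer"], run_date=params["run_date"])
        )
    return started
```

The point of the design is that the consumer, not a scheduler, decides when the run begins: no file, no run, and the file's contents carry whatever parameters that run needs.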
Denton Burnell — Thoughtful Control & Proactive Scaling
But it goes deeper than this. As our process matured and evolved, we worked to provide even greater control of processes through a centralized control function we affectionately call Control System. This has allowed us to implement actions like those you see on the screen that allow us to better respond to unexpected events like those Colleen mentioned. Under the covers, Control System leverages a combination of queues via SQS, SNS topics, and Lambdas to provide this level of control.
So what does this look like in practice? A common scenario is things we call heavy compute days. A simple example: quarter end, we have more calculations, and sometimes that can delay our underlying predecessor jobs. This used to be a more difficult situation for us. When the data that we need to calculate investment recommendations wasn't complete, we'd have to jump through a few hoops to make sure DPM didn't kick off with incomplete data, as that would cause inaccurate recommendations.
Because of the flexibility we've added, our consumers are now able to watch for the end of those predecessor jobs and then kick off DPM when they're ready. In addition, we've added functionality such as the ability to pause or stop and restart the run, so that we're giving our partners and our teams time to investigate issues, and so that we can make sure that every client that needs to be rebalanced that day is able to make it through the batch.
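A rough sketch of what pause, stop, and restart could mean for a batch run follows. It keeps the list of clients still to be processed separate from the run state, so that interrupting a run never loses anyone. The class, the command names (including `resume`), and the one-client-at-a-time `step` loop are assumptions made for illustration. The real Control System is described as using SQS, SNS, and Lambdas, which this in-memory version does not attempt to model.

```python
from collections import deque
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


class BatchRun:
    """Hypothetical batch run that checks its control state between clients."""

    # Command name -> resulting state (names are assumptions for this sketch).
    TRANSITIONS = {
        "start": RunState.RUNNING,
        "pause": RunState.PAUSED,
        "resume": RunState.RUNNING,
        "stop": RunState.STOPPED,
        "restart": RunState.RUNNING,
    }

    def __init__(self, client_ids):
        self.pending = deque(client_ids)  # clients still to be rebalanced
        self.done = []                    # clients already processed
        self.state = RunState.STOPPED

    def command(self, action):
        """Apply a control command; pending work is preserved across all of them."""
        self.state = self.TRANSITIONS[action]

    def step(self, rebalance):
        """Process one client if the run is active; return True if work was done."""
        if self.state is not RunState.RUNNING or not self.pending:
            return False
        client_id = self.pending.popleft()
        rebalance(client_id)
        self.done.append(client_id)
        return True
```

Because unprocessed clients stay in `pending` while the run is paused or stopped, a later `resume` or `restart` picks up exactly where the run left off.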
Now, some of you might be wondering, and rightfully so — by introducing all of this great event-driven control and allowing consumers to do all of this, we've also created a bit of a challenge: namely, how do our data-gathering services know if, when, and how much they should scale to meet the demands of the DPM batch that's about to run? Now, we could — and we certainly do — rely on AWS auto-scaling, but that's a bit of a reactionary approach. And the truth is, we know that the batch is about to start because our clients just told us.
So to solve for this, we introduced a simple scaling Lambda function. When DPM is about to run, it publishes that fact to a topic, and any service subscribed to that topic is informed that the run is about to start. Armed with a heads-up, these dependent services can take deliberate action to scale up to the level that they need to support the batch run. In the same way, DPM will publish that it's complete, and those services can similarly scale back if they so desire to their normal baseline levels.
By giving services this fine-grained visibility, this function has allowed us to save significant AWS dollars, because we're only scaling exactly when and how much we need to be. In addition, it also helps us mitigate the risk of running during normal business hours, because services are scaled appropriately as they need to be.
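The proactive scaling idea can be sketched with a tiny in-memory publish/subscribe topic. This is only an illustration of the pattern. The `Topic` class stands in for a real messaging topic, and the event names, the `client_count` field, and the sizing rule (a fixed number of extra instances per 10,000 clients) are all invented for the example.

```python
class Topic:
    """In-memory stand-in for a publish/subscribe topic."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class ScalableService:
    """A dependent service that sizes itself from run notifications."""

    def __init__(self, name, baseline, extra_per_10k_clients):
        self.name = name
        self.baseline = baseline
        self.extra_per_10k_clients = extra_per_10k_clients
        self.instances = baseline

    def on_run_event(self, message):
        if message["event"] == "RUN_STARTING":
            # Ceiling division: a partial block of 10,000 still needs capacity.
            blocks = -(-message["client_count"] // 10_000)
            self.instances = self.baseline + blocks * self.extra_per_10k_clients
        elif message["event"] == "RUN_COMPLETE":
            self.instances = self.baseline  # scale back to the normal level
```

A service subscribes once with `topic.subscribe(service.on_run_event)`. After that, publishing a `RUN_STARTING` message scales it up before the load arrives, and `RUN_COMPLETE` returns it to baseline, rather than waiting for reactive auto-scaling to notice the traffic.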
Colleen Evans — Principle 3: Consumer Collaboration
So this takes us to our third principle: consumer collaboration. We partner with our partners on the daily runs, but our partners are also an extremely valuable source of end-client feedback, as their teams and the advisor teams are the ones that are directly working with our clients.
An example of this is that one of our partners came to us because some of their advisors were having issues rebalancing clients that we affectionately call Hulk clients. Essentially, these are clients that over the years have amassed thousands of lots, and the sheer size of these clients means that sometimes they were falling out of that rebalancing process.
In supporting these Hulk clients, our engines needed to consume increasingly larger sets of data. As you might imagine, processing latency increases in direct correlation to the size of that data set when you're trying to optimize for cost. Our challenge in a microservice environment was to ensure that the batch timing did not increase as we handled these types of clients, and also that we didn't introduce any additional fallout that would require manual processes after the fact to resolve.
Enter our async processing model. The basic premise of this model is that our rebalance engines hand a Hulk client a ticket when they come and request a rebalance. This is a ticket like you would have at the DMV, only it gets processed way faster. Consumers that receive a ticket need to listen, and once that ticket is called, they go back, they ask for their recommendations, and they receive them.
Fortunately, because Hulk clients represent a fairly small portion of our portfolio, we were actually able to implement a hybrid version of this, where we use synchronous processing for the majority of our clients (because it's faster and cheaper) and we only use the async model when we need to, for Hulk clients — allowing us to save additional AWS dollars.
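Here is a small sketch of the hybrid model: ordinary requests are answered immediately, while very large ones get a ticket that is called later. The lot-count threshold, the class and method names, and the placeholder `_rebalance` calculation are assumptions for illustration only. The talk does not say how a Hulk client is actually identified or how tickets are delivered.

```python
import uuid

HULK_LOT_THRESHOLD = 1_000  # hypothetical cutoff for the async path


class RebalanceEngine:
    """Hybrid engine: synchronous for most clients, ticketed for Hulk clients."""

    def __init__(self):
        self._queued = {}   # ticket -> lots waiting to be processed
        self._results = {}  # ticket -> finished recommendation

    def _rebalance(self, lots):
        # Placeholder for the real (and much more involved) calculation.
        return {"lots_considered": len(lots)}

    def request(self, lots):
        """Answer small requests now; hand large ones a ticket."""
        if len(lots) < HULK_LOT_THRESHOLD:
            return {"status": "done", "recommendation": self._rebalance(lots)}
        ticket = str(uuid.uuid4())
        self._queued[ticket] = lots
        return {"status": "ticket", "ticket": ticket}

    def process_queue(self, notify):
        """Background worker: finish queued work, then call each ticket."""
        for ticket, lots in list(self._queued.items()):
            self._results[ticket] = self._rebalance(lots)
            del self._queued[ticket]
            notify(ticket)

    def collect(self, ticket):
        """The consumer returns with a called ticket to pick up the result."""
        return self._results.pop(ticket)
```

A consumer that gets `status == "ticket"` waits for `notify` to announce that ticket and then calls `collect`. Everyone else receives their recommendation in the original response, which keeps the common path fast and cheap.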
So this was an example of utilizing feedback from our partners and advisors to enhance DPM based on products that they were familiar with. But another benefit of us enabling multiple offers is that we're sometimes able to connect the dots when different partners are seeing similar problems.
Back a few years ago, one of our partners was launching a new offer, but they were running into scenarios where some of the recommendations coming out of the engine weren't meeting the minimum purchase amounts required by this new partner and their new record keeper. Because of this, the recommendations were failing in trading, and this was causing additional manual work to rectify. Our teams partnered together to come up with a solution that would look at the recommendation being generated, anticipate if it would fail in trading, and if it would, adjust the recommendation to meet those minimum requirements while remaining methodologically accurate. We added this product to the DPM process and named it Minimum Market Order.
At the time, the only one requesting this product was that partner. But true to our principle of reusable design, we still built Minimum Market Order to be flexible — to allow our consumers to actually define those minimum requirements without extensive coding changes — so that in the future if any other partner wanted to come and utilize this product, they would be able to integrate with Minimum Market Order and have it recognize their own unique parameters.
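To show what consumer-defined minimums might look like, the sketch below keeps each consumer's rule in configuration rather than in code. The rule fields, the two actions (`raise` and `drop`), the consumer names, and the amounts are all hypothetical. The talk does not describe how Minimum Market Order actually adjusts a recommendation, and a real implementation would also have to keep the overall recommendation methodologically sound, which this sketch does not attempt.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class MinOrderRule:
    """Per-consumer parameters (hypothetical shape)."""
    min_amount: Decimal
    below_min_action: str  # "raise" to the minimum, or "drop" the order


# Hypothetical configuration: onboarding a consumer means adding an entry here.
RULES = {
    "offer-a": MinOrderRule(Decimal("50.00"), "raise"),
    "offer-b": MinOrderRule(Decimal("1.00"), "drop"),
}


def apply_minimums(consumer, purchases):
    """Adjust (symbol, amount) purchases that would fail the consumer's minimum."""
    rule = RULES[consumer]
    adjusted = []
    for symbol, amount in purchases:
        if amount >= rule.min_amount:
            adjusted.append((symbol, amount))            # already acceptable
        elif rule.below_min_action == "raise":
            adjusted.append((symbol, rule.min_amount))   # lift to the minimum
        # "drop": leave the too-small purchase out entirely
    return adjusted
```

The same function serves both consumers because the differences live in `RULES`, which is the reuse property the principle is after.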
Fast forward a few years, and a separate partner came to us because they were seeing some fallout from de minimis recommendations, sometimes as low as a cent or two. These are different partners and actually different problems as well, but they were similar enough that we were able to reuse the underlying technology in Minimum Market Order that we built for the first consumer to meet the needs of that second consumer — and come to a solution in a shorter amount of time and for less cost than if we'd had to rebuild a new product.
Denton Burnell — Close
So we've enjoyed a lot of success through our focus on adaptability and by staying true to these core principles: by remaining consumer agnostic, by putting control in our partners' hands, and by collaborating closely with those partners to drive value for our clients — all investors.
Now, as you might imagine, this wasn't possible without the work of a very large cross-functional team back at Vanguard comprised of architects, product managers, product owners, and a bunch of autonomous full-stack development teams. And we owe them a big thanks for what they've been able to produce.
Our journey isn't complete here. We still have a lot of growing and enhancing to do in DPM and we're gonna continue to do that going into the future, to support the evolving needs of our clients.
To that end, when Colleen and I were thinking about what ask we might have for you today, we were reminded of the increasing complexity of managing the run side of a complex system like DPM, which is supported by multiple autonomous full-stack development teams, some of which exist within our organization, Enterprise Advice, and some of which are outside.
So importantly, we'd love to hear from anyone who has had success with different run models which integrate a number of individual teams who own pieces of a larger product ecosystem. If you have thoughts, please reach out. And with that, we are at time. Thank you very much, and we hope you enjoy the last few presentations.