AI Summit Spring 2026

Edge of the Present: Pursuing AI’s Frontier in the Enterprise

AI is quickly becoming a key pillar of many enterprises' priorities. Here, we share how scaling inference-time compute can help product teams keep pace with the changing AI landscape and advancing model capabilities, while staying true to our drive for responsible innovation.

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

All right. The first speaker for this block is Devlin McConnell, Head of Emerging Technology Research at Vanguard, which I'm a huge fan of. Back in 2024, I got to meet their amazing leadership team, including Mike Carr, CTO of Vanguard, who actually spoke at this conference and gave a presentation describing how the thousands of technologists work in service of helping investors win. And with over $10 trillion of assets under management, there's a lot at stake.

That year, at their conference, I met Devlin McConnell, whose role at the time was to own the programs that were experimenting with gen AI across the enterprise. What was so cool about that presentation is that he co-presented with their Director of IT Audit, who was using an LLM to make the internal audit planning process easier. That was one switched-on auditor.

I'm so delighted that Devlin is now reporting directly to Mike Carr, the CTO, and that among his responsibilities is helping the most important AI innovation projects succeed. So here to tell that story is Devlin.

Devlin McConnell

Hello, everybody. My name is Devlin McConnell. I lead a team — we call ourselves Emerging Technology Research. We're really like an enterprise innovation team at Vanguard.

I titled this presentation "Edge of the Present" because I think it serves as an interesting metaphor for where we find ourselves in AI. Seemingly every month, it feels like we're going over the next cliff in this space. It's also a metaphor for how we like to approach innovation and emerging technologies, and I'll share a little bit about how we do that.

Our team has a responsibility to catalyze innovation around the firm. We're looking one to five years in the future, we're bringing our research back to the enterprise, we're educating our leaders and our crew (we call employees "crew"), and we're building out strategies for how this could unfold. If we do all of those things well, we achieve our team's priority, which is to partner with our business units to experiment, pilot, and prove out working hypotheses for how the space plays out.

So we're looking two to five years in the future, but everything we do has to be tied to present-day business value.

I want to share a little bit about how we do that in this AI space with an example.

I really like setting the context in AI with this slide. I think it's really profound that just seven lines can tell such an interesting story. This is out of Stanford from a year or two ago, and we see that about a decade ago, AI was just on this horizontal plane — year over year, we were seeing incremental progress. And then a couple of years ago, the switch flipped and progress and innovation just went vertical.

And we're not only seeing that in academic benchmarks. I work in the Chief Technology Office, and we're seeing it in our day-to-day. Developer productivity is a big priority for the CTO office, and with the advance of coding agents we've seen this play out over two years: we've gone from autocomplete, to chat, to coding agents. Now the conversation yesterday and today is all about whether we'll be supervising a team of agents, or agents will be supervising agents, or who knows how this plays out.

So this progress is getting absolutely vertical, and the takeaway is that we want to take advantage of it. We want to take these advancements and apply them to our business-critical applications.

We define business-critical applications as those where, if they go down, our key stakeholders are going to feel it: clients, crew, partners. Research, operations, and client service are obvious use cases in the space. We want to take the vertical progress we're seeing and apply it to business-critical applications.

But we face key constraints around responsible AI, transparency, and governance, and we obviously have to balance those against the outcomes we've prioritized. When we apply AI to use cases in this space, what really happens is we say, "Hey, we need the nines of the '-ilities': the availability, the resiliency, all these things from software. And then we also need all the considerations that we apply to our crew."

So let's take the client service use case as an example. We have to make sure these AI systems don't give financial advice, don't give market predictions, don't give opinions — all these aspects that we need to constrain to make sure these use cases fit our needs. And we have to balance that with the functional value of creating delightful client experiences. So that's our challenge.

And when we face that challenge in our team, in ETR, what we want to say is, "Well, hey, the future is AI going vertical." The implication is that if you can define your benchmark, if you can define your success criteria, frontier model labs are achieving it. We do it a little differently, but we want to ask, "Hey, can we take that and bring it to the enterprise?" And it's not just our success criteria (the performance and the delightful experience for these client-facing use cases); it's also ensuring that we're meeting the constraints. That's the challenge that we face.

As we pursue these use cases, compliance gives feedback, risk gives feedback, legal gives feedback, and so do privacy, governance, and all the other key stakeholders we rely on to ensure we're doing this the right way. The key levers we have on our AI applications start to multiply and explode and drift, and it becomes really hard to ensure that we're getting this right.

And when we look under the hood at what I would argue is the most important lever we have on these applications, our prompt, we see what's actually going on after all of this work and all of this iteration over months to get to prod. We face what I call the pink elephant problem.

You may have heard this. If I say, "Don't think about a pink elephant," what do you do? You think about a pink elephant. If I say, "Don't give financial advice. Don't say this word, don't say that word, don't do this, don't do this" — that compounds, and all of a sudden our context window with these AI systems starts to degrade and our functional performance follows.

So we have AI going vertical, we have really difficult challenges of getting to prod on these things that we've been working on for years. And we want to apply some emerging components to this sort of business problem.

So what we did was build a product we call Möbius, which takes advantage of a theme you might be hearing in the space: recursive self-improvement, scaling inference-time compute, using AI to build AI to apply AI to these systems. What we built is an evolutionary algorithm, a scaffold that takes a prompt and uses AI to mutate it over and over again. At every generation, we score the mutated prompts against our evaluation criteria. Some of those criteria are, "Hey, this is what a delightful experience looks like." Others are, "Hey, you can't say these words or these phrases because they're against our brand voice." And others are, "Hey, this is for an executive, let's say, so it can't be over 400 words." So the criteria can be qualitative or binary: very specific feedback loops.
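The binary criteria described above map naturally onto small scoring functions. The sketch below is illustrative only and is not Möbius code: the banned-phrase list, the function names (`no_banned_phrases`, `within_word_limit`, `evaluate`, `composite`), and the equal weighting are all assumptions. Only the 400-word executive limit comes from the talk. A qualitative criterion such as "delightful experience" would plug in as one more `Evaluator`, most likely an LLM grader, which is not implemented here.

```python
from typing import Callable, Dict

# Hypothetical brand-voice blocklist; the talk does not list the real phrases.
BANNED_PHRASES = ("guaranteed returns", "you should buy", "the market will")
MAX_WORDS = 400  # the "for an executive" length limit mentioned in the talk

Evaluator = Callable[[str], float]  # maps a model output to a score in [0, 1]


def no_banned_phrases(output: str) -> float:
    """Binary criterion: 1.0 if no banned phrase appears (case-insensitive)."""
    lowered = output.lower()
    return 0.0 if any(phrase in lowered for phrase in BANNED_PHRASES) else 1.0


def within_word_limit(output: str) -> float:
    """Binary criterion: 1.0 if the output has at most MAX_WORDS words."""
    return 1.0 if len(output.split()) <= MAX_WORDS else 0.0


def evaluate(output: str, evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Score one model output against every criterion."""
    return {name: fn(output) for name, fn in evaluators.items()}


def composite(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-criterion scores, scaled to 0-100."""
    total = sum(weights.values())
    return 100.0 * sum(weights[name] * scores[name] for name in weights) / total


if __name__ == "__main__":
    checks = {"brand_voice": no_banned_phrases, "length": within_word_limit}
    scores = evaluate("We expect guaranteed returns next year.", checks)
    print(scores)  # {'brand_voice': 0.0, 'length': 1.0}
    print(composite(scores, {"brand_voice": 1.0, "length": 1.0}))  # 50.0
```

Keeping each criterion as a separate function, rather than one long list of "don'ts" inside the prompt, is one way to move constraints out of the context window and into the feedback loop.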

Then we run another generation and vary the prompts again, taking advantage of the LLM's nondeterminism and inconsistency to brute-force our way to variation, and then review. And then finally, over generation after generation, we say, "Hey, this prompt is really good at evaluation X. This prompt is really good at evaluation Y. Why don't you guys come together, have a baby, and have another generation?" And down the tree it goes.
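The mutate, evaluate, and "have a baby" loop can be sketched as a small elitist genetic algorithm. This is a toy under heavy assumptions, not the Möbius implementation. A prompt is modeled as a tuple of instruction lines. The LLM rewrite is replaced by a random add-or-drop over a hypothetical `POOL`. Crossover merges the lines of the prompts that lead on different criteria. The toy criteria inspect the prompt text directly, whereas a real system would run each prompt on test inputs and grade the outputs. `mutate`, `crossover`, `mean_score`, and `evolve` are all illustrative names.

```python
import random
from typing import Callable, Dict, List, Tuple

Prompt = Tuple[str, ...]  # toy model: a prompt is an ordered tuple of instruction lines
Evaluator = Callable[[Prompt], float]

# Hypothetical pool of instruction lines; a real system would ask an LLM to rewrite.
POOL = (
    "Answer in plain language.",
    "Describe options without recommending any.",
    "Avoid predictions about market direction.",
    "Acknowledge the client's question first.",
    "Keep the reply short.",
)


def mutate(prompt: Prompt, rng: random.Random) -> Prompt:
    """Stand-in for an LLM rewrite: randomly drop a line or add one from POOL."""
    lines = list(prompt)
    if lines and rng.random() < 0.3:
        lines.pop(rng.randrange(len(lines)))
    else:
        candidate = rng.choice(POOL)
        if candidate not in lines:
            lines.append(candidate)
    return tuple(lines)


def crossover(a: Prompt, b: Prompt) -> Prompt:
    """Child inherits every line from both parents, order kept, no duplicates."""
    return tuple(dict.fromkeys(a + b))


def mean_score(prompt: Prompt, evaluators: Dict[str, Evaluator]) -> float:
    return sum(fn(prompt) for fn in evaluators.values()) / len(evaluators)


def evolve(seed_prompt: Prompt, evaluators: Dict[str, Evaluator],
           generations: int = 20, keep: int = 6, rng_seed: int = 0) -> Prompt:
    rng = random.Random(rng_seed)
    population: List[Prompt] = [seed_prompt]
    for _ in range(generations):
        # Mutation: each survivor spawns three varied copies.
        candidates = population + [mutate(p, rng) for p in population for _ in range(3)]
        # Crossover: breed the best prompt on each criterion with the others.
        specialists = [max(candidates, key=fn) for fn in evaluators.values()]
        candidates += [crossover(a, b) for a in specialists for b in specialists if a != b]
        # Selection: keep the top prompts by mean score (elitist, so the best never regresses).
        unique = list(dict.fromkeys(candidates))
        unique.sort(key=lambda p: mean_score(p, evaluators), reverse=True)
        population = unique[:keep]
    return population[0]


if __name__ == "__main__":
    criteria = {
        "no_advice": lambda p: float("Describe options without recommending any." in p),
        "plain": lambda p: float("Answer in plain language." in p),
        "concise": lambda p: float(len(p) <= 3),
    }
    start = ("Acknowledge the client's question first.",)
    best = evolve(start, criteria)
    print(best, mean_score(best, criteria))
```

Breeding "specialists" (the best prompt on each individual criterion) mirrors the idea of combining a prompt that is good at evaluation X with one that is good at evaluation Y. Because the previous population always competes in selection, the best score can only stay flat or improve from one generation to the next.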

Here's what we've seen when we take it to a use case. We have one today that's externally facing and live in production. It's not a chatbot, so it's not an open-ended experience, but it's live today. It met our thresholds to go to prod, and it scored 62 out of 100 on a composite of our product team's defined success criteria. By optimizing the system with AI and very clear feedback loops (and very clear feedback loops and verification signals have been a theme today and yesterday), we can use AI to do that iteration for us, taking months and months of iteration and collaboration off the plate of these product teams.

We ultimately brought the score up to 99%, as defined by our product teams. This process — you can see the generations on the bottom — ran for 24 hours of just continuous inference: continuous LLMs looping upon each other and checking.

And if you're my boss, the next question you're going to ask is, "How much did you spend on this?" And the answer is — at this point, with the cost of tokens — negligible. You can't really see it on the screen: 3 million tokens, $50 to $100 of just looping these models over and over again with these clear feedback loops.
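As a rough sanity check on those figures, the talk's own numbers (3 million tokens, $50 to $100, 24 hours) imply a blended rate and an average throughput. This is back-of-the-envelope arithmetic only: the actual models, per-token prices, and input/output split aren't given, so these are averages over the whole run.

```python
TOKENS = 3_000_000                 # total tokens reported for the run
COST_LOW, COST_HIGH = 50.0, 100.0  # reported spend range in USD
HOURS = 24                         # reported run length


def per_million(cost: float, tokens: int) -> float:
    """Blended cost in USD per million tokens."""
    return cost / (tokens / 1_000_000)


low = per_million(COST_LOW, TOKENS)
high = per_million(COST_HIGH, TOKENS)
throughput = TOKENS / (HOURS * 3600)  # average tokens per second

print(f"${low:.2f}-${high:.2f} per million tokens")  # $16.67-$33.33 per million tokens
print(f"{throughput:.1f} tokens/second on average")   # 34.7 tokens/second on average
```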

So that's an example of how we approach this space. We say, "Hey, we believe the space is going this way based on some research in academia. We hear about these verification loops and evolving techniques. We want to apply it to a business problem that we face today."

The first implication we've come to is that we can apply this to a variety of different use cases. Prompting is one, because it's both clear-cut and difficult to get right, but there are so many other opportunities to apply it to.

Some takeaways that we've landed on: we believe, and we've seen it, that once the frontier labs define success and define a benchmark, they're going to achieve it. Our hypothesis was that we could do the same in the enterprise without all the levers those frontier labs have, using just prompting, specific guardrails, and feedback loops. We feel pretty good about taking that next step: if we spend time clearly defining our requirements and our success criteria, we can make that jump.

The other aspect is that feedback loop. Prompting worked so well for this use case (and now we're thinking about where else this might go) because we had really easy-to-define success criteria, and the verification loop was instantaneous. So what are the other use cases where we can apply that? Ones with instantaneous feedback loops and very clear signals that the system is doing what we want it to do.

The third point I'd call out is building out these harnesses, another theme I've heard in a variety of talks. Building out these harnesses is no easy engineering challenge. They can be brittle at times, especially on multi-turn, complex use cases. Creating stable harnesses has been mentioned before, but I just wanted to double-click on it: it's a huge enabler for a lot of these tree-like, open-ended use cases and pursuits that we're seeing evolve in the space.
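One common way to make a harness less brittle is to wrap every model call so that transient failures and malformed outputs are retried with exponential backoff instead of crashing a long run. The talk doesn't describe how Möbius handles this, so treat the following as an assumption-laden sketch. `call_with_retries` is a hypothetical helper, and `call` stands in for whatever model client a harness actually uses.

```python
import json
import time
from typing import Callable, Optional


def call_with_retries(call: Callable[[], str],
                      validate: Callable[[str], bool],
                      max_attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call a flaky model endpoint until its output passes `validate`.

    Retries on exceptions (e.g. timeouts) and on invalid outputs, waiting
    base_delay * 2**attempt between tries. Raises RuntimeError when out of attempts.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            output = call()
            if validate(output):
                return output
            last_error = ValueError(f"invalid output on attempt {attempt + 1}")
        except Exception as exc:  # transport errors, timeouts, etc. from the client
            last_error = exc
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def is_json(text: str) -> bool:
    """Example validator: accept only well-formed JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    responses = iter(["", "not json", '{"reply": "ok"}'])  # simulated flaky model
    print(call_with_retries(lambda: next(responses), is_json, sleep=lambda s: None))
    # {"reply": "ok"}
```

Injecting `sleep` keeps the helper testable without real delays. Validating structure at every step matters most in multi-turn flows, where one malformed turn can derail everything downstream.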

And lastly: the bottlenecks and failure points we face as we build out these business-critical applications, and whatever comes next. Documenting and reflecting on those is a great way to step back and ask, "Hey, is this an opportunity to apply AI?" Every time we engage with product teams and partners on the topic of the prompt, of taking the prompt away from the product team entirely if possible, it's a surprising revelation, because that's how we always engaged with the ChatGPTs of the world: we controlled the prompt. And again, it's about taking that control off our plates and leaning on the "bitter lesson" thesis that's been going around the AI space.

So that's an example of a use case that we pursued. Coming here and listening to all the different talks has been very validating: it's clear that the verification and validation problem is top of mind for so many folks. This is an example of how we approached it.

And I would just say: we're standing on the edge every month. Whether it's Manus being released, or the next Gemini model, or the next OpenAI model — every single month we're walking up to the edge of what is possible today, and we're peering down, and that future is very hazy, it's very opaque. We approach this by building out working hypotheses and then trying to work backwards: if this is true, then this must be true, then this must be true, then therefore we can experiment with this today.

But my ask for this group, moving forward and for the rest of the day: if you are looking over that edge and you see something that you have conviction in, I'd love to hear about it. What does the future look like for you, and what are you doing to experiment today to get there?