MCPs at Scale: Unlocking Seamless AI-Driven Innovation


Las Vegas 2025


Senior Vice President, Product · Sonatype

The Model Context Protocol (MCP) is doing for AI-driven software development what APIs did for DevOps automation: enabling AI coding assistants to interact with tools and go beyond basic code generation to automate ticket creation, tweak deployment configurations, and do deep security research at design time. The best MCP tools not only expand AI agent capabilities but also provide these agents with deep contextual data that can't be distilled from the public data sources they were trained on. In this session, we will share firsthand lessons on how development teams can get the most from MCP, advise on best practices for integrating these technologies safely, and dive into how MCP is the key to taking vibe coding to the next level.


Full transcript


Tyler Warden

Afternoon, y'all. This is the after-lunch, maximize-nap-time slot, so nobody will be upset if you put your head down. Steven and I promise not to go long. In fact, we're going to end early. We won't do a big Q&A, but we'll be over here for questions afterwards. Hope you find something interesting in this.

My name's Tyler. I look after product at Sonatype. Steven?

Steven McGill

Steven McGill. I've been working in technology, and particularly software security, for 15-plus years. Most recently, like so many in this community, I've been getting more into AI and looking at what large language models can do to change the game.

Tyler Warden

Yeah. So we want to talk about MCP today. It's something we're working on pretty heavily at Sonatype, and we want to share some learnings we have found useful.

I'll warn you all that there's some 101-level stuff in here, because it's hard to know what people already know. So if you already know it, feel free to take that aforementioned nap, right?

So we see MCP as a standard that, a year ago, we were probably iffy about, and that has now really solidified into something we can use as the glue that lets models and agents talk to different tools and data. We see this standard, especially in the AI-assisted coding space, as a way for those models to get access to information. I like to think of it as access to more answers than the model was trained on.

Okay, so Steven is going to talk to us about why MCP, and why we've settled on it, at least in our research and work.

Steven McGill

Yeah, great. So we'll say more about exactly what MCP servers are. Like Tyler said, it's a way of connecting LLMs to other capabilities. It's an open standard, so there's been a lot of development and a lot of progress in this area, with new MCP server options coming online.

But why is MCP a thing? Why can't you just use an AI coding assistant on its own? Why does it have to connect to these other things?

One thing we've noticed, and a lot of people notice as they start working with these models, is that they really excel at certain things, in particular language-oriented tasks. They're language models, right? So things like translation between human languages, translation between programming languages, summarization, and of course code generation from requirements. That's what a lot of folks here are using them for, and we've heard a lot of experience reports this morning about how effective that is.

But they struggle with some things that don't fit into that language-oriented bucket. In particular, reasoning, especially lengthy chains of logical reasoning, can be a struggle. So can inference: going from a collection of data on one topic to making the inferential jump to what it means in another space. Some of that is very domain specific, so we're going to talk about the role of domain knowledge in getting the most out of these models.

On the right here, I asked an AI what AI is not good at, so this is a self-reported take on the situation. It notes that, yes, these models perform reasoning and inference through pattern recognition on vast data sets. So if a conclusion is present in the training data, meaning these concepts are associated with that conclusion in the training data, the model will pick up on it. But deep common-sense or causal reasoning that isn't in the data set can be an issue.

Also, I went to university with people who struggled with deep common-sense and causal reasoning, so I thought that was funny. We won't hold it against them.

But we do want to find ways to work around these gaps, and MCP really gives us an opportunity to do that. I want to give a little more insight into the gap, and then Tyler's going to talk about an example of how MCP servers can help close it.

Tyler Warden

So, a little bit of background about Sonatype. We are an open source software-focused company. Has anybody written or used Maven before? All right, so our founders created Maven, and we are still the stewards of Maven Central. That's how we got our start. Today we work on open source software supply chain governance and dependency management.

So dependency selection, choosing your open source components, is a big area of research for us. That's why we're going to take that example through end to end: not as an advertisement, but because this is our area of expertise and we're deep into this problem space.

Dependency selection, the coder's question of "what library should I use?", is a pain-in-the-butt problem when you're trying to maintain and modernize software. We're trying to solve that as a company, and it's the example I'll take us through. But this could be expanded to almost any use case where expertise or knowledge is needed beyond what's in the corpus of the internet.

Steven McGill

Yeah, great. So to start this example, let's say that we're asking some AI coding assistant to help us choose a JavaScript PDF library. So this will be a library for working with PDF files. It needs to support both import and export, and it needs to be compatible with PDFs generated by Acrobat and the Mac Preview app, right?

So if we give this prompt to an AI assistant, it will be able to provide good suggestions for this, right? Because things like, does this library support import and export of PDFs, that's right there in the description of the project, right? And certainly if you dig into the documentation, there's plenty of information there on what capabilities this library has. And so we have an excerpt from the project description for pdf-lib there at the lower left. And pdf-lib is in fact one of the libraries that the AI recommended when I was generating this example.

Things like compatibility with other applications like Acrobat here, again, you can get from these large training data sets, right? So this is a thread from some developer discussion board saying, I'm new to this field, I need to do this with PDF files, I'm creating them in Acrobat, right? And someone suggests this library is a solution, right? So that's a signal that this library fulfills that requirement of compatibility with Acrobat.

But what if we add onto this, right? We don't just need a library that's sort of fit for purpose in this sense of being able to manipulate these Acrobat files and import and export them, but we also want it to be actively maintained, right?

So asking ChatGPT-5 about this, pdf-lib comes up as an option. It notes that it's moderately active. It maintains a presence with documentation that's current, but there's an open issue asking, is this thing still alive? So that's a point of concern, but it's sort of not raising any bright red flags, right? It's a note of caution.

But if you actually go to npm and look it up, the last release of this library was published four years ago. So that's not an accurate depiction, certainly not the kind of data you'd want to base a decision to use this library on, and not something you could imagine building additional automation on top of. This response would work if you have a human in the loop at this point in the process. But if you want dependency selection to just happen automatically, with the agent continuing with whatever it thinks, this is clearly not something you can lean on.

So it's not hard to see why this might be difficult for a large language model to infer directly. There are a lot of factors to consider when it comes to the question of whether a library is at end of life. End-of-life status often isn't actually announced. A project just goes dormant, development peters out, and the page stays live. pdf-lib is still up there. The documentation looks current; the documentation website works and everything. But if you start digging into the GitHub repo, the issues, and how responsive the maintainers are, you'd certainly start to find warning flags for a large-scale enterprise deployment that you want to support for years and years.

So you need to consider a lot of factors when determining end-of-life status: development activity, issue tracker activity, discussion forum chatter, all of this stuff. From this multitude of information you have to infer one bit: is it end of life or not? And different signals should be weighted differently. There's a lot of domain knowledge and domain expertise that goes into this determination.
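A minimal sketch of that weighting, assuming a handful of hypothetical activity signals, might look like the following. The signal names, normalization windows, weights, and the 0.5 threshold are illustrative guesses, not Sonatype's method; in practice they would come from domain expertise and curated, human-reviewed data.

```python
from dataclasses import dataclass


@dataclass
class ProjectSignals:
    """Hypothetical activity signals gathered for one open source project."""
    days_since_last_release: int
    days_since_last_commit: int
    median_issue_response_days: float
    forum_posts_last_year: int


# Illustrative weights only (they sum to 1.0); each signal yields a value in [0, 1].
WEIGHTS = {"release": 0.4, "commit": 0.3, "issues": 0.2, "forum": 0.1}


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def dormancy_score(s: ProjectSignals) -> float:
    """Weighted score in [0, 1]; higher means the project looks more dormant."""
    parts = {
        # Assumed windows: two years without a release saturates that signal, etc.
        "release": _clamp01(s.days_since_last_release / 730),
        "commit": _clamp01(s.days_since_last_commit / 365),
        "issues": _clamp01(s.median_issue_response_days / 90),
        # Fewer than 50 forum posts a year reads as quieter; zero posts gives 1.0.
        "forum": _clamp01(1 - s.forum_posts_last_year / 50),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def likely_end_of_life(s: ProjectSignals, threshold: float = 0.5) -> bool:
    """Collapse the weighted evidence into the single bit the talk describes."""
    return dormancy_score(s) >= threshold
```

Because release inactivity carries the most weight here, a project with no release in years scores high even if its documentation site still looks current, which is exactly the case a description-only view misses.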

And there aren't that many projects potentially going dormant each year. So you could review them all, put human reviewers in the loop, and end up with curated data you can really rely on when you're making these determinations.

So that's an example of a case where a different kind of process outside the LLM could be the right solution: maybe with humans in the loop, maybe with additional technology beyond large language models that does some other statistical analysis of release-activity patterns.

So here I'm listing the things we want to incorporate when we use AI coding assistants in their full generality, trying to automate as much as possible and getting them to do this dependency selection task. And across the top, I'm marking which of these LLMs are good at on their own, in isolation, versus where they may struggle.

Project descriptions and documentation we covered; those get a check mark. They're doing great there. Metadata analysis and curated data are generally not something they have access to outside of MCP servers, and we'll talk about how those fix this in a second.

But we're not even done yet. We also want a library that is not malicious. There have been a lot of open source malware releases: packages that are intentionally malicious. These aren't accidental bugs that some developer introduced into the code without realizing it; these are packages intended to cause harm. And they're designed to sound legitimate and look appealing. That's the whole point: developers think they're for real and incorporate them. So the same things that trick human developers who pull a package based on what they read on the internet are highly likely to also trick AI coding assistants. We can't just rely on the project descriptions here. Again, we need a separate analysis that's tailored to the problem of detecting malware.

Reasoning and optimization is another gap here. We might also want to add: please prefer more popular libraries. With something popular, used widely by the community, you're more likely to get answers to your questions and to get bug fixes. Maintenance will be faster, and releases generally come faster in those popular projects.

But what does "prefer" mean? How should we weight it against things like whether the library is well maintained or fit for purpose? If one PDF library is more full featured but a little less popular, should we suggest that one, or should we prefer popularity? Then there are trade-offs between constraints, and deciding which are hard constraints. Malicious is one; we don't want to give on that. We definitely want a non-malicious library. But which other constraints are soft, and how do we balance them? That's another place where an LLM on its own generally won't provide a great solution out of the box, so we want to bring in other capabilities. So yeah, go ahead.
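One common way to express this hard/soft split is to filter on hard constraints first and only then rank the survivors by a weighted sum of soft constraints. The sketch below assumes each candidate already carries scores in [0, 1] for fit, maintenance, and popularity, plus a malicious flag from some separate malware analysis; the library names and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    is_malicious: bool      # hard constraint: anything flagged is never suggested
    fit_for_purpose: float  # 0..1, e.g. feature coverage for the task (assumed input)
    maintenance: float      # 0..1, e.g. 1 - a dormancy score (assumed input)
    popularity: float       # 0..1, e.g. a normalized usage rank (assumed input)


# Hypothetical soft-constraint weights; raising "popularity" is one concrete
# reading of "please prefer more popular libraries".
SOFT_WEIGHTS = {"fit_for_purpose": 0.5, "maintenance": 0.3, "popularity": 0.2}


def soft_score(c: Candidate) -> float:
    """Weighted sum of the soft constraints."""
    return sum(weight * getattr(c, attr) for attr, weight in SOFT_WEIGHTS.items())


def rank(candidates: list) -> list:
    """Drop hard-constraint violations, then order the rest best-first."""
    allowed = [c for c in candidates if not c.is_malicious]
    return sorted(allowed, key=soft_score, reverse=True)


if __name__ == "__main__":
    options = [
        Candidate("lib-a", False, 0.9, 0.4, 0.5),  # more full featured, less maintained
        Candidate("lib-b", False, 0.7, 0.9, 0.9),  # popular and well maintained
        Candidate("lib-m", True, 1.0, 1.0, 1.0),   # looks perfect, but flagged malicious
    ]
    for c in rank(options):
        print(c.name, round(soft_score(c), 2))
```

The design choice is that no soft-score weighting can ever bring back a malicious package, because it is removed before any scoring happens.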

Tyler Warden

And when we look at choosing MCP servers and the tools in them, this is our recommended checklist for deciding which ones to use and how to implement them. The more of these problems a server solves, the better. Does it address a general knowledge limitation of the LLM? Does it bring domain-specific knowledge? Is there specialized data behind it? Does it help close reasoning gaps? Does it provide integrations?

That goes double when you're choosing MCP servers for your toolchains, especially if you're curating toolchains for your company: the more bespoke servers that solve these problems are the better ones.

Dr. Vic Pence tells this story about a boy who was nine or 10 and surfing in Hawaii. He loved to surf, but one day he got into a horrible accident. Long story short, they had to amputate his left arm at the hospital. Very tragic, right? The boy is lying in the hospital, his mom is there trying to cheer him up, and she just can't. And when you're in the hospital a long time, you watch a lot of TV. So he started watching UFC, watching them fight. This was the one thing that woke this boy up. He was really excited about UFC.

So he says to his mom, lying there after a horrible accident that cost him his left arm: Mom, I want to do UFC. This is what I want to do. Now, I don't know any mom who looks at her 10-year-old and says, yes, my son should do combat sports. But it's the first time her son had come alive in years. So she finds a Brazilian jiu-jitsu guy in Hawaii and says, my son wants to do this kind of fighting.

And the jiu-jitsu guy looks at him and says, son, are you really sure you want to do this? Yes. Are you really sure? Okay. It's going to be hard. Yeah, we want to do it. Okay. So every day this boy comes in and the jiu-jitsu master trains him in the same move. One move, every day, all day, two hours a day, four days a week. Same move, same move, same move.

Fast forward six months: the boy goes to his first tournament, wins the first match, wins the second match, wins the entire tournament. First time out. Mom is ecstatic; she's crying. They go up to the master afterwards and ask, how did you do that? He says, it's very simple. I taught him the most complicated move in jiu-jitsu, whose only defense is grabbing the left arm.

So there was domain knowledge, a knowledge gap, that they had to go to the master to fill.

When I talk about this with my executive team, who don't attend shows like this, I use a restaurant metaphor. Our MCP server here is the kitchen that's serving up cool stuff, and every recipe on offer is an individual tool. So an MCP server has multiple tools. More tools don't make a better solution for you, I'll just let you know that. We also find that once you get much above about 10 tools, the larger models and agents calling them start to get a little paralysis by analysis.

So you've got the agent here, the woman who's taking the order, and the client she's talking to. I think we can all attest that the more knowledgeable our waiter or waitress is, the more they know about what's being prepared, the better experience we're going to have.
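The kitchen-and-recipes picture maps fairly directly onto the protocol: a client asks a server which tools it offers (`tools/list`) and then invokes one by name (`tools/call`), both as JSON-RPC 2.0 messages. The sketch below imitates those two requests with the standard library only, just to show the shape of the exchange. It is not the official MCP SDK and omits the initialization handshake and transports; the `recommend_version` tool and the warning past 10 tools are hypothetical illustrations.

```python
import json
import warnings
from typing import Callable

# Rough ceiling mentioned in the talk; past this, agents may hesitate between tools.
MAX_TOOLS = 10


class ToyMcpServer:
    """A toy 'kitchen' in which each registered tool is one 'recipe'.

    Mimics the JSON-RPC 2.0 shape of MCP's tools/list and tools/call requests
    using only the standard library. NOT the official MCP SDK: there is no
    initialization handshake, transport, or error handling.
    """

    def __init__(self) -> None:
        # name -> (description, JSON Schema for the arguments, handler)
        self._tools: dict = {}

    def tool(self, name: str, description: str, input_schema: dict):
        """Decorator that registers a handler as a named tool."""
        def register(fn: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = (description, input_schema, fn)
            if len(self._tools) > MAX_TOOLS:
                warnings.warn(f"{len(self._tools)} tools registered; consider trimming")
            return fn
        return register

    def handle(self, raw_request: str) -> str:
        """Answer one JSON-RPC request string with a JSON-RPC response string."""
        req = json.loads(raw_request)
        method = req["method"]
        if method == "tools/list":
            result = {"tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _) in self._tools.items()
            ]}
        elif method == "tools/call":
            params = req["params"]
            _, _, fn = self._tools[params["name"]]
            text = fn(**params.get("arguments", {}))
            result = {"content": [{"type": "text", "text": text}]}
        else:
            raise ValueError(f"unsupported method: {method}")
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


server = ToyMcpServer()


@server.tool(
    "recommend_version",
    "Suggest a version of a package to use (hypothetical tool)",
    {"type": "object", "properties": {"package": {"type": "string"}}, "required": ["package"]},
)
def recommend_version(package: str) -> str:
    # A real server would consult curated data here; this placeholder does not.
    return f"no curated data for {package} in this toy example"
```

A real server would be built on an official MCP SDK rather than hand-rolling the protocol. The point here is only that each "recipe" is a named, described, schema-typed entry on one server, and the "waiter" (the client) can read the menu before ordering.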

So here's an example from my demo. I don't know how the internet's going to be, so I'm going to walk you through the demo I would give if the connection were more reliable. Here we are creating a new Python project. You can read the prompt; I'm not going to read it to you. I've highlighted that we're using a requirements.txt file to manage the dependencies, just so we can see what's happening.

So: build me this cool thing. You'll see that it thought about it and picked this dependency, requests 2.31.0. Okay. Now we tell it to use our MCP server instead, since this is the problem we're working on solving. And look, we got a more recent version, and it's a better version. It's better maintained, right?

This works because the MCP server we've been building is designed to give the model an opinion on the best version to use. It's like the client who's ordering, here the model, asking the waitress, what do you recommend? And she says, oh, you've got to try the fish. So we give it the fish. But if you don't like the fish, you could try the pasta, because it's the agent that makes the decision. It might hand that decision to a human, but this is an example of an MCP server acting as a domain expert, and that really is a good way to think about it.

Either it provides an integration that you, or your LLM, don't otherwise have access to, or it provides different reasoning, or access to data that isn't on the public internet. So when you're curating servers and thinking about the problems to solve, we've found that putting this collection of experts into a toolchain can lead to a lot of success.
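The client side of that demo can be sketched like this: read the exact pins from requirements.txt, ask a version-recommendation tool about each one, and collect the disagreements for the agent (or a human) to decide on. In this sketch, `placeholder_recommend` is a hypothetical stand-in for the MCP tool call, and `X.Y.Z` is a placeholder, not a real recommendation. Only simple `name==version` pins are handled.

```python
import re
from typing import Callable, Optional

# Exact pins only, e.g. "requests==2.31.0"; ranges, comments, and options are skipped.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def read_pins(requirements_text: str) -> dict:
    """Map lowercased package names to their pinned versions."""
    pins = {}
    for line in requirements_text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def suggest_updates(
    requirements_text: str,
    recommend: Callable[[str, str], Optional[str]],
) -> dict:
    """Return {package: (pinned, recommended)} wherever the recommender disagrees.

    `recommend` stands in for a call to a version-recommendation tool and
    returns None when it has no opinion. The agent still decides what to apply.
    """
    changes = {}
    for name, pinned in read_pins(requirements_text).items():
        suggested = recommend(name, pinned)
        if suggested is not None and suggested != pinned:
            changes[name] = (pinned, suggested)
    return changes


def placeholder_recommend(name: str, pinned: str) -> Optional[str]:
    # Hypothetical stand-in for the MCP tool; "X.Y.Z" is a placeholder, not real advice.
    return {"requests": "X.Y.Z"}.get(name)
```

Returning suggestions rather than rewriting the file keeps the "fish or pasta" decision with the agent.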

Steven McGill

Right. So that shows an example of an MCP server filling in a knowledge gap that an LLM would otherwise have. But there are other reasons to use MCP servers, so I want to break it down: what are the major types of MCP server, and how do they differ? Because there's a whole lot of MCP out there.

I would say the majority of MCP servers you'll find right now are oriented around interfaces or actions, basically providing an API that's easier for an AI tool to use. These are things like the GitHub MCP server, which lets you create a pull request or do other things the GitHub API would let you do. A Jira server lets you create and interact with issues, again much as the API would. With Jenkins, you can trigger a build.

What we've seen today in the demo Tyler just went through is a way to provide additional data to the model through an MCP server, including, in this case, data that incorporates a lot of domain knowledge: a curated data set built up specifically for a particular domain. Some other MCP servers provide access to a general data store. With Elasticsearch, for example, anything you've indexed can be provided to the model via its MCP connection.

But I think there's an additional layer of value to unlock when you think about how to pull in data using MCP. I've listed these all flat, as if they all just provide data, but there are distinctions. There's unformatted, unvetted data, just "hey, here's some extra data," versus a curated collection of data that we have some confidence in and know something about. Maybe humans were involved in generating it. It's well structured. There are advantages to all of that. So not all data is the same.

And then I think the next frontier is potentially MCP servers as a way to deliver additional reasoning to these agents. Tyler hinted at some of this with "what's the best version?" There's reasoning behind "best." There isn't one clear answer; it's situation dependent. So being able to delegate questions like that via MCP to a reasoning engine that isn't just doing LLM-style inference, but is doing additional domain-specific inference, is another place where I think we'll see a lot of potential, a lot of development, and hopefully a lot of focus in the near term in the MCP space.
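To make "best is situation dependent" concrete, here is a sketch of one hypothetical policy a reasoning tool might apply: pick the newest version with no known issues, never downgrade, and don't cross a major version unless the caller says breaking changes are acceptable. The version list and known-bad set are placeholder inputs (real ones would come from curated data), and only plain dotted numeric versions are parsed.

```python
from typing import Optional


def parse_version(version: str) -> tuple:
    """Parse plain dotted numeric versions such as '2.31.0' (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def best_version(
    current: str,
    available: list,
    known_bad: set,
    allow_major_upgrade: bool = False,
) -> Optional[str]:
    """Pick the newest acceptable version; 'acceptable' depends on the situation.

    Hypothetical policy: never a known-bad version, never a downgrade, and no
    major-version jump unless the caller accepts breaking changes.
    """
    current_key = parse_version(current)
    acceptable = []
    for version in available:
        key = parse_version(version)
        if version in known_bad or key < current_key:
            continue
        if not allow_major_upgrade and key[0] != current_key[0]:
            continue  # a major bump may break callers, so skip it unless allowed
        acceptable.append((key, version))
    if not acceptable:
        return None
    return max(acceptable)[1]  # highest parsed version wins; keep the original string
```

The same inputs can yield different answers depending on `allow_major_upgrade`, which is the point: the "best" version is a judgment about the caller's situation, not a lookup.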

Tyler Warden

Yeah, I think we'll probably leave it here and give you some time back. But as we wrap up: we see MCP as an open source protocol that seems to be getting adopted. And we see that the most successful implementations are the ones that solve a problem the LLM isn't designed to solve. That might be an integration, so an LLM can use a REST API more easily while you keep control over it, or access to data or intelligence the model wasn't trained on, curated through individual knowledge or reasoning.

Right now we see that finding servers with a smaller, well-curated, well-loved set of tools designed to solve real, specific problems leads to, I think, the easiest business case for getting into your standard toolchain, and real, differentiated acceleration for your businesses.

Yeah. So I think with that, we'll leave it here. Steve and I will be over here for questions. Thank y'all for coming. Hope you enjoy the show. Thank you.