When Vibe Coding Doesn’t Vibe: Hard Truths in Enterprise AI

Las Vegas 2025

Director - Engineering Effectiveness · Grainger

The path from GenAI proof of concept to enterprise scale is filled with unseen complexity. In this session, Grainger technology leaders share the “hard truths” of enabling GenAI at scale, from navigating vendor onboarding and evolving tooling to managing change fatigue and measuring value beyond the hype.

You’ll learn how Grainger is applying immersive enablement practices, prompt engineering, and a human-in-the-loop approach to customize GenAI tooling for its sociotechnical environment. The session also introduces a pragmatic adoption model—moving from PoC to pilot to production—to scale GenAI responsibly across the organization.

Designed for technical leaders aiming to translate AI’s promise into meaningful outcomes, this session offers practical frameworks, cultural insights, and lessons learned from the front lines of enterprise AI transformation.

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

Okay, up next are Eric Chapman and Philip Sears from Grainger. Eric is Director of Engineering Effectiveness, and Philip is their GenAI enablement lead, who was previously the quality engineering practice lead.

I remember my conversation with him so well. I was in an airport somewhere, laughing so hard, because that Friday a leading AI coding vendor got acquired. Then over the weekend, another chunk of it got sold somewhere else, and on Monday they were left asking, "Who should we be having the negotiations with?" These are some of the absurdities you face when dealing with tools that move as quickly as this field is moving right now.

They will share their experiences elevating the state of AI-assisted development practice at Grainger. Here are Eric and Philip.

Eric Chapman

Thank you, Gene. Philip and I are super excited to chat with you today and share the story of Grainger's journey. Before we do that, we'd be remiss not to talk about the fact that we're truly standing on the shoulders of giants. On Tuesday we heard from Emily and Lucas about the five-plus-year modernization journey we've been on. This talk would not be possible without the hard work of the Austin engineers we have at Grainger, so we're just fortunate we get to represent them today.

As Gene said, I'm Eric Chapman. I run our engineering effectiveness organization at Grainger, and our customers are internal product delivery teams. Let's do a quick test. Lucas kind of quizzed us the other day: what's Grainger's slogan? Does anybody remember? "For the ones who get it done," right? Because our customers are internal, our goal is to get it done for the ones who get it done. That's how we orient ourselves.

About this time last year, we added a new team to engineering effectiveness: our generative AI enablement team. That's where Philip comes in.

Philip Sears

Hi, I'm Philip Sears. I'm the GenAI enablement lead, and I help engineering teams improve their engineering flow and SDLC with GenAI.

Eric Chapman

Awesome. Quick recap: you've heard a lot about Grainger already. On Tuesday we heard from Lucas and Emily about our five-plus-year journey and the ebbs and flows of building high-performing product teams. Yesterday you heard from Johnny, our CTO, about how we keep the world running by removing complexity for our customers. We have to be careful, though, not to absorb all that complexity ourselves, so he talked about simplicity. Johnny also gave us some bonus material, with some dad jokes in there to help. He touched on how we've simplified our architecture and even our AI approach. In this talk, we're going to double-click on what we've done with generative AI.

Let's talk about vibe coding. For me, it's an awesome time to be alive. On the first day, I think we heard someone say this is a once-in-a-career opportunity, and I believe that. I think we're really fortunate as engineering leaders to be going through this right now. Personally, I've pushed more code in the last 12 months than I have in the last decade. To echo what Topo talked about the other day, family and friends are starting to worry about me a little bit because I'm spending way more time pushing code. I'm able to express my ideas as a technology leader with prototypes and pull requests rather than PowerPoints and whiteboard sessions. That's been awesome.

But what happens when I vibe code, when it's just me and the IDE staring at a problem? I can easily get into the vibe and come to work Monday morning with 300 new files and 30,000 or 40,000 lines of code. I made a Mermaid diagram, I made a README, so I tell the team, "This is awesome, and I've got all the documentation. Let's get aligned and go." It turns out that doesn't work very well. If you try to push that on your team, you have a challenge.

What happens if we take that team-level challenge and widen the aperture? Now multiple teams are vibe coding while we're trying to build one coherent experience for our end users. And what happens when you bring vibe coding to a nearly 100-year-old organization where governance, risk, and compliance matter? That's exactly what we'll cover today: the four hard truths we've encountered with generative AI enablement, and some of our solutions.

Before we do that, there's one thing Johnny touched on that I think every organization should consider, because it's going to be critical to your generative AI journey: our commitment to continuous delivery. We've been laser-focused on continuous delivery for five-plus years. It's been our North Star. In addition, we've made deliberate decisions about how we build and deploy software at Grainger. We call those sensible defaults. There are ten of them, covering things like trunk-based development, team ownership of quality, and built-in security. They're the underpinning of how engineering happens at Grainger. As Johnny said, we rely on simple metrics: we use the DORA framework and metrics beyond DORA. Philip is going to take us through a bit more about our GenAI principles.

Without further ado, let's jump into hard truth number one.

Philip Sears

Hard truth number one: not everyone is ready to embrace AI. There are diverse views, opinions, and even concerns, and they're mostly valid. Let's go on a quick journey together.

Who here is having the time of their life? This is the best time in 20 years. Great, great, great. It's fast, it's magic. Who here is worried about the quality of the code? I know I was, as a quality practice lead. Maybe you think it'll take our jobs. You don't know, and you're afraid. Anybody? Nobody in the back? Well, I was.

Maybe you're here: you're going up the slope of enlightenment to the plateau of productivity. Maybe you're having fun, but you're still unsure. You're more productive and ambitious, but maybe worried about the changes in store.

We understand that excitement and fear can coexist, especially in a company of our size, and we take a more balanced approach. We take the middle ground. What that means is we do not force AI on teams. We don't force AI on individuals. It's an opt-in approach.

For our pilots, which we'll talk about more in a bit, we include a range of viewpoints, seniority levels, roles, and parts of the organization. At Grainger, we also position AI as a competitive advantage. By not being forceful, which would produce the opposite effect, we catch more bees with honey, and we get a pretty high adoption rate that way.
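
The talk doesn't describe how pilot participants are chosen, only that they opt in and span viewpoints, seniority, roles, and parts of the organization. As a purely hypothetical illustration, a greedy "coverage" heuristic can assemble a varied cohort from volunteers: each pick is the person who adds the most attribute values the cohort doesn't have yet. The `Volunteer` fields, attribute values, and `pick_cohort` name are all assumptions, not Grainger's process.

```python
from dataclasses import dataclass

# Hypothetical attributes mirroring the dimensions named in the talk.
ATTRS = ("role", "seniority", "org", "stance")


@dataclass(frozen=True)
class Volunteer:
    name: str
    role: str       # e.g. "engineer", "QE", "product"
    seniority: str  # e.g. "junior", "senior", "staff"
    org: str        # part of the organization
    stance: str     # view on AI, e.g. "skeptic", "curious", "power user"


def pick_cohort(volunteers: list[Volunteer], size: int) -> list[Volunteer]:
    """Greedily pick whoever adds the most not-yet-covered attribute values.

    Only people who opted in are passed in; nobody is drafted.
    """
    chosen: list[Volunteer] = []
    covered: set[tuple[str, str]] = set()
    pool = list(volunteers)

    def new_values(v: Volunteer) -> int:
        # Count attributes of v whose value the cohort doesn't have yet.
        return sum((a, getattr(v, a)) not in covered for a in ATTRS)

    while pool and len(chosen) < size:
        best = max(pool, key=new_values)  # ties go to the earliest volunteer
        pool.remove(best)
        chosen.append(best)
        covered.update((a, getattr(best, a)) for a in ATTRS)
    return chosen


volunteers = [
    Volunteer("a", "engineer", "senior", "search", "power user"),
    Volunteer("b", "engineer", "senior", "search", "power user"),
    Volunteer("c", "QE", "junior", "supply chain", "skeptic"),
    Volunteer("d", "product", "staff", "checkout", "curious"),
]
cohort = pick_cohort(volunteers, 3)
print([v.name for v in cohort])  # ['a', 'c', 'd']
```

Volunteer "b" duplicates "a" on every attribute, so the heuristic skips them in favor of the skeptic and the product person, which is the kind of spread the pilots aim for.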

Moving on to hard truth number two: the fast-moving, ever-changing market. This is what Gene mentioned earlier. I don't know if you've read the news, but AI vendors are changing. Maybe you're not sure what to make of the market. Maybe you started working with one vendor and it got acqui-hired by another. Maybe it pivoted to something else. Or maybe you just piloted or adopted a tool, then went to a conference where everybody's using something better, and now you're not sure which horse to bet on.

But if you're like us, you just plunge ahead, because we've got to move forward. As soon as we started down this path from POC to production, we realized it was really complicated and took a long time. Our leaders said, "We need to do this a lot more, and much faster."

So we learned to navigate this process. We knew a lot of stakeholders were involved, so we applied flow engineering concepts. We put the whole process together in one value stream map, which facilitated collaboration, learning, and trust. In the end, we cut the time the process took by at least 60%, so we were able to go both safe and fast.
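
A value stream map boils down to each step's hands-on work time and the wait time in front of it. The sketch below uses made-up step names and durations (not Grainger's actual map) to show how lead time, flow efficiency, and a percentage reduction fall out of that data. Note that in this hypothetical "after" map, the gains come from cutting waits and merging reviews, while the total hands-on work stays the same.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_days: float  # hands-on work time in this step
    wait_days: float     # time spent queued or waiting on a handoff


def lead_time(steps: list[Step]) -> float:
    """Elapsed time end to end: every step's wait plus its work."""
    return sum(s.wait_days + s.process_days for s in steps)


def flow_efficiency(steps: list[Step]) -> float:
    """Fraction of lead time spent working rather than waiting."""
    return sum(s.process_days for s in steps) / lead_time(steps)


# Hypothetical tool-onboarding steps and durations, for illustration only.
before = [
    Step("intake request", 1, 10),
    Step("security review", 3, 15),
    Step("legal and procurement", 2, 20),
    Step("sandbox setup", 2, 7),
]
after = [
    Step("intake request", 1, 2),
    Step("joint security, legal, and procurement review", 5, 8),
    Step("sandbox setup", 2, 2),
]

reduction = 1 - lead_time(after) / lead_time(before)
print(f"lead time {lead_time(before):.0f} -> {lead_time(after):.0f} days, "
      f"{reduction:.0%} shorter")
print(f"flow efficiency {flow_efficiency(before):.0%} -> {flow_efficiency(after):.0%}")
```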

This resulted in a much simpler phased lifecycle: a four-phase framework to go from POC to production. Why do we have it? Because we want to do this many times, not just once. We have incremental rollouts and fast feedback loops. In each phase, we embrace failure as a learning opportunity, and a tool may not move forward. We believe this funnels through the tools that provide the most value. It also gives us optionality. We say there's no one tool to rule them all. As we heard before, it could change anyway.

The first phase is a decision: we run the tool through our AI review process, where we can say, yes, this is good. The second phase is for experiments and learning. We're not making a proposal yet; we need to learn and experiment, and feel comfortable doing that, in a sandbox environment.

Then we move on to real use cases. I think this is a very important point: we work with a diverse and inclusive group from across the organization. We may bring in detractors and get them involved early. We may bring in power users, because we want to make them even more powerful. We gather feedback and metrics on these real use cases in product engineering teams. With these participants and all these use cases, it's very easy to make a proposal to move forward.

The last phase is production: the whole setup, integration, and adoption. After that come two more things, customization and ongoing measurement, which we'll talk about more.
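
One way to picture the four phases is a pipeline of gates where stopping early is an expected, logged outcome rather than an error. A minimal sketch under that assumption: the `Phase` names paraphrase the talk, while the gate functions and tool names are hypothetical stand-ins for real review decisions.

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    DECISION = 1    # AI review: is this tool worth pursuing at all?
    EXPERIMENT = 2  # sandboxed learning; no proposal yet
    PILOT = 3       # real use cases with a diverse group of product teams
    PRODUCTION = 4  # setup, integration, adoption; then customize and measure


# A gate inspects a tool and decides whether it may leave the current phase.
Gate = Callable[[str], bool]


def run_lifecycle(tool: str, gates: dict[Phase, Gate]) -> tuple[Phase, list[str]]:
    """Advance a tool phase by phase, stopping at the first gate it fails.

    Stopping early is a normal outcome; the log records what was learned.
    """
    log: list[str] = []
    for phase in (Phase.DECISION, Phase.EXPERIMENT, Phase.PILOT):
        log.append(f"{tool}: entered {phase.name}")
        if not gates[phase](tool):
            log.append(f"{tool}: stopped at {phase.name}; capture the learning")
            return phase, log
    log.append(f"{tool}: entered {Phase.PRODUCTION.name}")
    return Phase.PRODUCTION, log


def passes(tool: str) -> bool:
    return True


def fails(tool: str) -> bool:
    return False


# Two hypothetical tools: one clears every gate, the other fails its pilot.
gates_a = {Phase.DECISION: passes, Phase.EXPERIMENT: passes, Phase.PILOT: passes}
gates_b = {**gates_a, Phase.PILOT: fails}
print(run_lifecycle("tool-a", gates_a)[0])  # Phase.PRODUCTION
print(run_lifecycle("tool-b", gates_b)[0])  # Phase.PILOT
```

Because each tool runs through the same pipeline independently, several tools can be in flight at once, which fits the "no one tool to rule them all" stance.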

Hard truth number three: when vibe coding doesn't vibe in the enterprise. What do I mean by this? We take a hands-on approach. We have a lot of complex enterprise systems that are heavily monolithic or not modular. We have standards that govern our software, and we have to understand our risk as a nearly 100-year-old supply company. Also, these GenAI tools lack our Grainger context. They don't know about our sensible-default practices like TDD, which I think is even more important with GenAI. They don't know our styles (our API style, for instance) or anything about our products or domains.

Here's our approach. We figure getting from POC to production is only 70% of the work. The hard work and the real value begin after those phases, in three areas: prompt engineering (skills and learning), context, and measurement.

Prompt engineering is all about learning new prompting skills. We know agentic AI workflows are game changers: multiple agents working asynchronously (maybe not 15 agents for everybody yet), breaking large work down into smaller tasks, and recognizing that this changes teams and ways of working. We think this is all learning, so we focus on immersive learning.

If somebody isn't using a tool, we don't lecture them or browbeat them with it. We say, just use it and join one of our workshops. They learn better that way. We're building a community of practice, so we have all kinds of ways to share, and we try to get ourselves out of the way so people can connect.

Context, we think, is huge. It's not just about context windows, where we've learned that more is not always better. Custom rules are our answer to all the missing context I mentioned before. So far, they're just simple Markdown files that describe how we do our practices.

For instance, I said the tools don't know about the API styles our architects defined. Now we just put an AGENTS.md, or really any Markdown file, in our API starter kit, and the tool adheres to it. I think we're converging on AGENTS.md, which will hopefully become a standard across the industry. Beyond rules, our MCP integrations are probably too many to list, and prompt libraries give us a way to share prompts.
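
As we understand the AGENTS.md convention, a tool working on a file uses the AGENTS.md closest to that file in the directory tree, so a starter kit can carry its own rules. Each coding tool loads these files in its own way. The sketch below only imitates the lookup-and-prepend idea. The `api-starter-kit` layout, the rule text, and the `build_prompt` helper are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def nearest_agents_md(start: Path, root: Path) -> Optional[Path]:
    """Walk up from `start` toward `root`; return the first AGENTS.md found."""
    root = root.resolve()
    current = start.resolve()
    if not current.is_dir():
        current = current.parent
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate
        if current == root or current == current.parent:
            return None  # reached the repo root (or filesystem root) without a match
        current = current.parent


def build_prompt(task: str, edited_file: Path, repo_root: Path) -> str:
    """Prepend the closest Markdown rules file, if any, to the task prompt."""
    rules_path = nearest_agents_md(edited_file, repo_root)
    if rules_path is None:
        return task
    rules = rules_path.read_text(encoding="utf-8")
    return f"{rules}\n## Task\n{task}"


# Hypothetical repo layout: rules live at the root of an API starter kit.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    kit = repo / "api-starter-kit"
    (kit / "src").mkdir(parents=True)
    (kit / "AGENTS.md").write_text("- Follow the architects' API style guide.\n", encoding="utf-8")
    prompt = build_prompt("Add a GET /orders endpoint.", kit / "src" / "orders.py", repo)

print(prompt)
```

Keeping the rules next to the code they govern means a team updates its conventions with a normal pull request, the same way it changes the code itself.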

The third area is measurement. Utilization: who's using it? Feedback: what do they think? Outputs: what is it generating, and what does it look like? And value: what is the real business impact? Eric's going to talk more about this in a bit.

"Wait a minute, that's not mathing, Philip." Eric caught me. Here it is: it only adds up to 95. Is something missing? Actually, it's never going to add up to 100%. The remaining 5%, we think, is continuous improvement, and it could really be more like 50%. There will be new agents, new waves, new models, new tools and features; I don't even know what. We know this is a fast-changing market, and we need to keep learning. This is the worst AI we're ever going to use.

Eric Chapman

That takes us to our fourth and final hard truth: how do we measure AI outcomes? Boy, wouldn't we like to know, right? I'll share where we are and where we're headed. But I'll give you a preview: this is an area where we need help. I would love to work with this community as we collectively find the golden signals, the DORA equivalent for GenAI.

Let's frame the problem first. This coin has two sides. On one side, we have news of tons of tools and all these hype stories. On the other, we see the potential for a huge unlock and upside for our business. It's like the honeymoon phase of an investment. But the board's going to come asking, "What's the ROI?" So this is how we're reasoning about return on investment and how we're measuring it.

As we started thinking about how to measure, we ran into some common traps and pitfalls. The first is the GenAI hype cycle. I've probably heard it a hundred times here: you hear about 10x engineers, even 100x engineers. I want to quote Laura Tacho from the very first talk of the conference, and I'm going to read it directly: "Data beats hype every time." Don't get caught up in the hype; let data drive. I love that. It's one of my takeaways from the conference.

The second is the cost-center trap: treating AI as a headcount reduction. That's the easy trap to fall into. I like to challenge us as technology leaders to think far beyond that and be strategic with it.

The bottom two are signals. First, utilization: you typically get some form of it out of the box from your AI vendors. That's okay. It answers the question: we've paid for a tool, so are people using it? It turns out to be an interesting signal to watch, because there's a bit of internal hype: you'll see usage scale up, and then people tail off and come back on. We'll talk more about utilization. But the problem with utilization alone is that it doesn't give you the "so what."

Finally, surveys. We all love a good survey, and we run surveys ourselves, but qualitative data alone doesn't have enough staying power, and surveys aren't comprehensive enough.

So how do we think about this? Again, this is a work in progress, the journey we're on right now. I'll take you through it from left to right.

As Philip touched on, utilization asks: are the tools being used? We're looking at the number of people leaning into the tools. What we found is that we have to roll up the data. Just knowing that some population is using the tool turns out not to be that helpful, so we roll it up into team-level metrics. A core principle, a first principle, of metrics at Grainger is that this is a team sport, not an individual sport. We try to understand how teams are using the tools, and then we can start correlating with the metrics further to the right.
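
Vendor dashboards typically report activity per user. Rolling that up to "what share of each team used the tool this period" is a small join against a team roster. A minimal sketch, assuming a hypothetical roster and set of active usernames exported from a vendor tool:

```python
from collections import defaultdict


def team_adoption(active_users: set[str], team_of: dict[str, str]) -> dict[str, float]:
    """Roll per-user activity up to the share of each team that used the tool.

    `team_of` maps every engineer to a team; active users not on the roster
    (e.g. contractors or stale accounts) are ignored.
    """
    members: dict[str, int] = defaultdict(int)
    active: dict[str, int] = defaultdict(int)
    for user, team in team_of.items():
        members[team] += 1
        if user in active_users:
            active[team] += 1
    return {team: active[team] / members[team] for team in members}


# Hypothetical roster and one period's active users from a vendor usage export.
team_of = {"ana": "search", "bo": "search", "cy": "search", "di": "checkout", "ed": "checkout"}
active_this_week = {"ana", "bo", "di", "zed"}
print(team_adoption(active_this_week, team_of))
```

Reporting a share per team, rather than a list of named individuals, keeps the metric consistent with the "team sport" principle and gives the later correlation step one number per team.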

Next, we think about the overall human experience. People are using the tools, but what do they think about them? Here we're talking about sentiment analysis, and we're interested in time and context. We've heard a couple of times at this conference that engineers spend only about 30% of their time actually writing code. I don't know what your reaction was, but it kind of hurt my feelings, if I'm being honest with you. So what do people believe? What does it feel like to use a tool, and where do they think their time and context go?

We also think it's important to capture perceived value: how much time are you saving? Philip ran some interesting surveys. In an earlier one (six or maybe 12 months ago), we asked how much time people were saving, with a follow-up question: what do you intend to do with that time? Back then, the answer was, "We're going to spend it doing more PRs." In the most recent survey, the answer was, "We're saving X amount of time, and we intend to use it to further our product outcomes and missions." It was pretty awesome to see how our community evolved in such a short time.

So we have utilization: people using the tools. We have the experience: how it feels to use them. Now let's lean into the "so what," which is where our engineering effectiveness metrics come in. We're correlating team adoption metrics with how adoption is impacting our teams.

I mentioned earlier that we use DORA, and we love DORA. It's been 10-plus years of DORA, but we feel its metrics are a bit of a lagging indicator. They're a great signal, but what are your leading indicators? DORA calls those capabilities. For us, they're the signals for our sensible defaults. What are the signals for trunk-based development? What are the signals for team ownership of quality?
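
As an example of a leading signal for trunk-based development, DORA's guidance on this capability points toward short-lived branches merged to trunk at least daily. A minimal sketch computing two such signals from hypothetical (created, merged) timestamps; which signals Grainger actually tracks isn't stated in the talk.

```python
from datetime import datetime, timedelta
from statistics import median


def branch_signals(branches: list[tuple[datetime, datetime]]) -> dict[str, float]:
    """Leading-indicator signals for trunk-based development.

    Each tuple is (branch created, merged to trunk). Shorter-lived branches
    and a higher share merged within a day both point toward trunk-based work.
    """
    hours = [(merged - created) / timedelta(hours=1) for created, merged in branches]
    return {
        "median_branch_hours": median(hours),
        "share_merged_within_day": sum(h <= 24 for h in hours) / len(hours),
    }


# Hypothetical branch history for one team.
t0 = datetime(2025, 9, 1, 9, 0)
history = [
    (t0, t0 + timedelta(hours=3)),
    (t0, t0 + timedelta(hours=20)),
    (t0, t0 + timedelta(hours=50)),
]
print(branch_signals(history))
```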

This matters because we're trying to correlate how hard teams lean into generative AI with these signals. Is a team's velocity increasing while its quality and safety decrease? Is velocity decreasing while quality stays the same? This starts to reveal the team impact, so we're getting a little closer to the "so what."
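
The questions above amount to correlating team adoption with a velocity signal and with a quality signal, then reading each team's change in those terms. A minimal sketch with hypothetical per-team numbers, using deploys per week for velocity and change failure rate for quality as assumed stand-ins:

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pattern(velocity_delta: float, failure_rate_delta: float) -> str:
    """Label one team's change over a period in the talk's terms."""
    speed = "faster" if velocity_delta > 0 else "not faster"
    safety = "less safe" if failure_rate_delta > 0 else "quality holding"
    return f"{speed}, {safety}"


# Hypothetical per-team data for one quarter (one entry per team).
adoption = [0.2, 0.5, 0.8, 0.9]                 # share of team actively using the tools
deploys_per_week = [3, 4, 6, 7]                 # velocity signal
change_failure_rate = [0.10, 0.12, 0.11, 0.15]  # quality/safety signal

print(f"adoption vs deploys:      {pearson(adoption, deploys_per_week):+.2f}")
print(f"adoption vs failure rate: {pearson(adoption, change_failure_rate):+.2f}")
print(pattern(velocity_delta=1.5, failure_rate_delta=0.02))  # faster, less safe
```

With a handful of teams, a coefficient like this is a prompt for a conversation, not proof. Correlation also says nothing about whether adoption caused the change, which is why the business-value step still matters.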

Finally, the most elusive but highest-impact category is business value. This is really what you're trying to get at, but you have to build up to it. Business value asks: how is AI impacting your strategic goals? Are the teams leaning hard into AI achieving their goals faster? Are they hitting their financial targets, or beating them? That's where we are. Again, we'd love to compare notes with you all.

As we started this journey, we put together a really simple graphic. The horizontal axis shows how easy these metrics are to gather: moving from left to right, they get harder to gather, but their impact goes up. Keep that in mind as you go.

We think about this at two levels. First, we look holistically across the organization: is the rising tide lifting all ships? Second, as Philip described with POCs and pilots, our pilots involve actual product-facing teams with real use cases. When you reason about those use cases and plan a pilot with a team, think about all four of these categories up front. Make sure you track utilization, make sure you have the development signals, and know the actual business outcome that matters to that team. When we bring those to our pilots, we write internal one-page case studies that help us frame how a tool might extrapolate across the organization. That's how we think about these metrics locally and globally.
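
"Think about all four up front" can be turned into a simple pre-kickoff check: a pilot plan lists what it will track in each category, and any empty category gets flagged before the pilot starts. A minimal sketch. The category names follow the talk, but the `PilotMetricsPlan` structure and the example metrics are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PilotMetricsPlan:
    """What one pilot team agrees to track before kickoff, per category."""
    team: str
    utilization: list[str] = field(default_factory=list)
    human_experience: list[str] = field(default_factory=list)
    engineering_effectiveness: list[str] = field(default_factory=list)
    business_value: list[str] = field(default_factory=list)


def missing_categories(plan: PilotMetricsPlan) -> list[str]:
    """Metric categories the plan leaves empty; the team name is not a category."""
    return [f.name for f in fields(plan) if f.name != "team" and not getattr(plan, f.name)]


# Hypothetical plan for a pilot team.
plan = PilotMetricsPlan(
    team="search",
    utilization=["weekly active members / team size"],
    human_experience=["pilot sentiment survey", "self-reported time saved"],
    engineering_effectiveness=["median branch lifetime", "change failure rate"],
)
print(missing_categories(plan))  # ['business_value']
```

Flagging the empty business-value slot before kickoff is the point: it's the hardest category to fill, and the easiest to skip.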

That rounds us out. Here are our final takeaways. Embrace continuous delivery: experimentation, small batches, fast feedback. Boy, that sounds really familiar, huh? This conference has been running for 11 years, and until just two years ago it was the DevOps Enterprise Summit, right? If you've glossed over DevOps practices and principles, you're going to have a hard time with generative AI.

Context is a huge key to unlocking generative AI. The corollary, the footnote, is that if onboarding new engineers is difficult, onboarding GenAI will be even harder. Take the time to declare documentation bankruptcy where it needs to happen.

The next takeaway: execute inclusive pilots. Philip talked about this one. For us, that means finding the most opinionated, prickly cactus you can find who isn't leaning into AI, and meeting them where they are.

With that, I'll put them all up here in case you want a picture. Vendors are going to come and go. There's no AI procurement playbook, it turns out. Failure is part of the process. Speed and adaptability are your strategy.

That covers the takeaways. Thank you, Gene, and thanks for the opportunity.