Architecting AI-Native Organizations: How to Redesign Work at Scale
Joe Beutler, Head of Solutions Engineering for Strategics at OpenAI, draws on firsthand experience inside OpenAI and across its largest enterprise customers to explain what structural changes actually enable AI-native organizations. He examines why most companies are stuck between broad workforce tools and a handful of top-down strategic initiatives—and what it takes to unlock automation at the team and department level. In this talk, you'll learn how to separate governance from transformation, why embedding engineering directly inside business functions is the key to scaling AI adoption, and how to use the Ask-Assist-Automate framework to move from early experiments to production-grade agentic workflows responsibly.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
To set the stage for the next talk, I had mentioned this morning that we're trying to figure out what are the conditions that must exist to get genuine 10X gains in productivity, as opposed to 20% gains — which maybe in another generation would've been something to brag about, but it seems like that's just not what the potential really is.
And so here to explain what I think a big piece of the puzzle is: Joe Beutler. He is head of solutions engineering for OpenAI's largest enterprise customers. I met him over two years ago, and I've learned so much from him. It helps that obviously he's a super switched-on guy, but also the fact that he's in a foundation AI lab — I think it shows us what is happening within their organization is going to predict what's going to be happening in our organization.
So when we last talked, Joe had shared with me a stat that blew me away. PwC benchmarked OpenAI's finance team and noted that it was only 20% of the size it should be. And so I'm sure they calculated and factored in revenue complexity, transaction volume, entity count, regulatory burden. But there's something interesting about that.
Joe was explaining what he thinks the structural reason is that enables that. It's because the CFO has an engineering manager working for her, working on integrating AI and engineering. So does the Chief Revenue Officer. In fact, they've embedded engineering managers throughout the company. And I think this phenomenon is actually going to be replicated across all organizations — in fact, I think you've even heard hints of that in the presentations over the last two days.
So here to talk about what he calls moving from centers of excellence to embedded innovation: Joe Beutler.
Joe Beutler
Well, Gene just did my whole talk for me, so I can just go ahead and sit down.
Nice to meet you all — those that I haven't met. I'm Joe Beutler. As Gene mentioned, I lead the solutions engineering function at OpenAI. I used to be across all of our strategic accounts, now focusing more on financial services. And one of the things that I wanted to share today is how we are structurally changing inside OpenAI. A lot of that is what we've learned from working with customers, and that's the advantage that I have being across a lot of our largest customers — we're seeing what everyone is trying to do as they're trying to adapt to the new realities that we have with AI.
So to kick off: there have basically been two approaches that we've seen for AI adoption. There's been this bottom-up approach where people are providing workforce tools like ChatGPT to their employees to help them be more productive in their day-to-day work. And then from the top down, there are these large strategic initiatives, maybe one to three that companies are picking to go after, where they're going to try to build agentic systems that can automate significant workflows. So that might be automating your call center, or different things along those lines.
But what's been missing is what is in the middle, which is this concept of team agents. And so we're trying to get to where you can have more automation at the team and department level, instead of just these major AI initiatives — or "I'll just give everybody these tools and they can figure out how to be more productive themselves."
So that's the gap that I think a lot of the agent products that we're seeing come to market, and a lot of the stuff that we're working on, are trying to fill — to make sure that we're able to distribute the benefits of AI through all of the different organizations. And I'm super excited about this. I think that will address the big enterprise value gap that everyone keeps talking about.
So quickly: we've seen this firsthand inside OpenAI. One example is in our finance team. Here's one specific example where they identified a large opportunity to eliminate manual work across contract workflows. They now use the OpenAI API to extract and structure data and feed it into their downstream systems. And the result is that they're starting to see millions of dollars of impact across these workflows.
This is just one example from our finance team, which Gene mentioned. Our CFO is probably one of the most AI-forward executives in the company. More recently, she shared with us that her team built a GPT to help with due diligence questions during the massive fundraise that we just completed. That GPT was able to answer thousands of questions that came in from investors. And so that's how we're starting to see those efficiencies where, as Gene mentioned, PwC benchmarked the finance team at only 20% of the size that it should be for an org doing the type of business that they're doing. And that's not just against legacy finance orgs; that's also against comps in the market that are more digitally native.
Another example is from our go-to-market team — so that's where I sit, all of our customer-facing employees. We identified a core constraint: our products were outpacing our ability to scale the sales team. The bottleneck was all this top-of-funnel work, which was qualifying leads, engaging prospects, and gathering the information needed to move deals forward. So we redesigned that workflow.
This is one of the things that I've been most excited about. As you can imagine, everybody wants to talk to us — that was very difficult to do, especially a couple of years ago when you could kind of count the sales team. The sales team could definitely fit in this room. And now obviously we're scaling that sales team, but part of the way that we've been able to scale our business, especially for more scaled parts of the market like SMB, is by building a lot of this automation. So we've automated everything from how we're analyzing leads to qualify them, figuring out if they should be routed to a salesperson or if it should be more of a self-service motion, to how we can actually collect purchase details to prepare a quote. For that scaled part of the market that doesn't qualify for talking to a salesperson, can we just automate that entire sales process? And that's effectively what we've done. We'll have an agent send back a quote, and then they decide whether or not they want to buy, and then it'll route those opportunities to sales if needed.
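As a rough illustration of the routing decision described here, the following Python sketch sends each lead either to a salesperson or to an automated self-service quote. The `Lead` fields, the seat threshold, and the route names are hypothetical and invented for illustration; the talk does not describe the actual qualification criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SALESPERSON = "salesperson"
    SELF_SERVICE_QUOTE = "self_service_quote"


@dataclass
class Lead:
    company: str
    seats_requested: int      # hypothetical qualification signal
    needs_custom_terms: bool  # hypothetical qualification signal


# Hypothetical cutoff; a real system would use whatever criteria the sales team defines.
SALES_SEAT_THRESHOLD = 500


def route_lead(lead: Lead) -> Route:
    """Send large or complex leads to a person; everything else gets an automated quote."""
    if lead.needs_custom_terms or lead.seats_requested >= SALES_SEAT_THRESHOLD:
        return Route.SALESPERSON
    return Route.SELF_SERVICE_QUOTE
```

In the flow described in the talk, a lead on the self-service path would then receive an agent-prepared quote, with the opportunity still routed to sales later if needed.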
So we deploy our own technology across every function in the business, and it started with a clear mandate. Each functional leader was responsible for identifying the highest-value opportunities to drive cost savings and growth within their organization. And so what we found is that this can't just be one or two use cases across the whole company, or even per department — it's going to need to be dozens of use cases that people are surfacing as they recognize the potential impact for their org. Because the people closest to the work are the best positioned to see where AI creates leverage. Part of that is an enablement problem — people need to know how to use the tools so that they can start to recognize how they can potentially automate their own workflows. But then also, we need to combine that with domain ownership, where you have the right technical support and you can unlock better decisions on what to build and where agents can drive value within that org.
So what this has led to is three main takeaways that I'll share up front, and then we'll go into each one.
As we've embedded AI into these functions like finance and go-to-market, the lessons have been consistent. The bottleneck is rarely the model capability. We're finding that it is org design, ownership, and workflow clarity.
If you want to architect an AI-native organization, there are three things that you have to get right. The first is that business leaders need to own the outcomes. The second is that engineering needs to sit within that business function so that they're close enough to the people doing that work to understand how to actually automate the workflow. And the third is a framework I shared in Vegas — I'll share it again, it's evolved a little bit — which is to use the idea of an ask, assist, and automate framework to decide where to start and how to scale responsibly.
So we'll walk through each of those.
To start, one of the clearest lessons that we've seen is that you can't have the same person own transformation and governance, because one or the other tends to win out. We've seen department transformations succeed when the business leaders own the outcomes for their own organization — not central IT. If the head of a business unit is not accountable for the result, you probably don't have an agent; you have a demo.
Central IT should own the foundations — all the firm-wide governance and tooling, and the biggest cross-company initiatives — but the business units need to own the transformation in their own domain, because they know the work, they know the bottlenecks, and they know where AI can create the real value.
And so what we've seen with organizations like our CRO's org and our CFO's org is that they've actually started hiring dedicated innovation teams. One of the things that's really interesting about how we got here — somebody asked me about this the other day — is, how do you know as a CFO how to go out and hire an engineering leader to come in? It's something that happened a little bit more organically. I'll speak to what's happened in our sales org.
We had somebody who was a seller on our sales team who was just obsessed with AI and started building tooling for themselves, and then it started being used by their colleagues. And then we realized over time that we needed somebody who could actually own that tooling for the job function — because if you're just doing that on the side of your desk, it's not getting the attention it needs when somebody still has a full-time job. So we took that person, who was a domain expert, and started shifting them into a role where they could start to own these tools that were being built, full-time.
And if you follow that pattern, what we really see you need to be successful is these three roles. First, you have a domain expert who defines the requirements, the quality, and the edge cases for the agent or workflow automation that you're building. Second, you need an AI expert who advises on project selection, sets up the evals, and establishes the system behavior. Of course, fortunately for us internally, we have a lot of AI experts. But this is where my team spends the most time with our customers: we step in to fill that role while we enable those customers to have their own AI experts who can start to identify and work on these workflows. And then the third is that you need a software engineer who can connect the context, the identity, and the telemetry.
So that's where you go from having the domain expert who knows how to do the job — they figure out how to use different tools, whether that's ChatGPT, or Codex, or any number of other tools on the market, to start to automate those workflows — but then when you actually want to establish this and bring it to something that is production-ready, you end up needing somebody from engineering to step in and help there.
What we ended up doing internally: our engineering teams were not happy that the CRO or the CFO was going to go and hire an engineer or an engineering leader directly into their organization. So it had to happen a little bit more organically. When we hired the first engineer into the sales org, they had to be interviewed by our engineering org so that they could make sure that person would be on the same comp ladder as the engineers — they wanted to make sure they were at the same quality bar.
So that team started as that domain expert who was tinkering and building tools for the team. Then we paired them with an engineer who could productionize those and start to build real automation. And then we brought in a head of innovation — that's the engineering leader who's now scaling out that team. It's been a really interesting evolution over the last couple of years.
So I started by talking about the org and team structure that we're seeing take shape in order to drive this innovation across the business. Now I'm going to go a level deeper and talk about what it takes to actually build these AI systems or agents once you have those teams in place.
When people hear "agents," they usually want to jump straight to automation. That's the big prize. That's where everyone's trying to go. But in my experience, I haven't really seen that work. It's really hard to go from zero to full automation, especially if you're in regulated environments or enterprise environments where you have all of those compliance things you need to worry about as well.
So you have to start with a strong foundation — with evals and governance — and then you can build toward automation. The way we've seen that work is that you start with ask, which is basically: if you're building any kind of agentic system, the first actions you want it to take are to reach into your data and do read-only operations. Have it go out, pull in information, answer questions. Then you'll be able to validate that it's able to answer the questions, find the sources correctly, do that the way that you want it to, before you start to give it any kind of access to write actions or anything that could be potentially destructive.
Then once you get really confident in those ask-level capabilities, you can move to the assist function, where users are able to start to complete tasks. You would have the agent recommend what action it thinks it should take, and then you have the human in the loop to validate — "Yes, this is the right thing, go ahead and execute that," or "No, this isn't quite right" — and that allows you to recognize where the agent's falling short before you actually let it go off and work on its own. What happens naturally is you realize that there are certain types of things the agent is always getting right — it figures out the right tool to call, the right action to take. And so you can just stop putting those in front of humans, and that's where you naturally end up at the automate phase.
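One way to picture the ask, assist, and automate progression is as a permission gate in front of every action the agent proposes. The Python sketch below is a minimal illustration under stated assumptions, not a description of any OpenAI product: the names `Phase`, `Action`, `decide`, and `ApprovalLog`, and the promotion thresholds, are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Phase(Enum):
    ASK = 1       # read-only: retrieve information and answer questions
    ASSIST = 2    # agent proposes actions, a human approves each one
    AUTOMATE = 3  # agent executes trusted action types on its own


@dataclass(frozen=True)
class Action:
    kind: str       # hypothetical action name, e.g. "lookup_policy"
    is_write: bool  # True if the action changes state anywhere


def decide(phase: Phase, action: Action, trusted_kinds: Set[str]) -> str:
    """Return 'execute', 'needs_approval', or 'blocked' for a proposed action."""
    if not action.is_write:
        return "execute"         # reads are allowed in every phase
    if phase is Phase.ASK:
        return "blocked"         # no write access during ask
    if phase is Phase.AUTOMATE and action.kind in trusted_kinds:
        return "execute"         # proven action types skip human review
    return "needs_approval"      # human in the loop for everything else


class ApprovalLog:
    """Tracks human verdicts per action kind to see which kinds can be promoted."""

    def __init__(self, min_reviews: int = 50, min_approval_rate: float = 0.99):
        # Both thresholds are placeholders, not figures from the talk.
        self.min_reviews = min_reviews
        self.min_approval_rate = min_approval_rate
        self._verdicts = defaultdict(list)

    def record(self, kind: str, approved: bool) -> None:
        self._verdicts[kind].append(approved)

    def trusted_kinds(self) -> Set[str]:
        trusted = set()
        for kind, verdicts in self._verdicts.items():
            rate = sum(verdicts) / len(verdicts)
            if len(verdicts) >= self.min_reviews and rate >= self.min_approval_rate:
                trusted.add(kind)
        return trusted
```

The design choice worth noting is that promotion happens per action kind rather than for the agent as a whole, which matches the observation that some kinds of actions become reliably correct before others do.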
I learned this framework working with T-Mobile, starting a couple of years ago, where they had really big ambitions for AI transformation. Their goal was to automate their entire call center — or at least 75% of it. We got them to come back a little bit. They had already automated about 60% of those customer interactions, and so this was targeting 75% of the remaining 40% — and that was a $3 billion call center for them. So this was a massive prize for them to go after.
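The percentages in this example compound, so it may help to work them through. The short Python sketch below uses only the approximate figures quoted in the talk (about 60% already automated, a target of 75% of the remainder); the variable names are illustrative.

```python
from fractions import Fraction

already_automated = Fraction(60, 100)    # share of interactions already automated
target_of_remainder = Fraction(75, 100)  # share of the remaining interactions targeted

remaining = 1 - already_automated                      # 40% still handled by people
newly_automated = target_of_remainder * remaining      # 30% of all interactions
total_automated = already_automated + newly_automated  # 90% of all interactions

print(f"{float(newly_automated):.0%} more, {float(total_automated):.0%} overall")
```

In other words, the target covers a further 30% of all customer interactions, which would bring the overall automated share to roughly 90%.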
They wanted to go, of course, straight to automate, as everyone does. And we kind of walked them through: "Hey, we have to build foundations. We need to build evals. You're going to need people who are the experts doing the job to validate the output and validate the next actions it's going to take." And so we were able to show them that it was actually best to start with ask — so that you could make sure you're pulling from the right data sources — and then starting to have that human in the loop before you start executing any write actions.
So with this foundation and framework, we've seen success with really any use case where you want to automate critical but defined workflows that are being done by people today. The biggest value is obviously at automate, but that is also where you have the most integration, the highest quality bar, the strongest evaluations, and the most operational ownership from the people doing the job today.
Getting support from the business users so that you can start to automate these things is a challenge in itself, and it happens much more naturally when you start by building tools that help them do their job. When you go back to ask, if you're creating some kind of copilot, or whatever you want to call it, that helps them know how to do their job and answers questions for them, then you can start to hill climb toward that full automation. And once they see the value in the tool, they'll provide you feedback on how to improve it. So eventually you get this natural flywheel as you hill climb toward automation. Plus, you start to get value quickly, and then that value grows exponentially as you unlock each new level.
So I want to make this concrete and share a couple of examples. To start, I'll talk about what we did with a large insurer to build an auto claims agent.
In the ask phase, they built a read-only chatbot that helps their team answer FAQs and pull together policy information, customer records — basically everything across the firm that they would need to answer questions when they're evaluating one of these cases. And once this is proven, you can make this chatbot customer-facing. So it can start as an internal tool for client-facing teams, and then eventually you could just make that external so it helps people directly.
Then once you have high confidence in that, you move to assist, where it starts to draft the analyst's work. It's able to summarize issues, flag gaps, and prepare recommendations for each next step — but still requires human approval before taking action.
And then in automate, you reach the prize: once the workflow is stable and the quality bar is cleared, it can begin handling standard low-risk cases end to end, and is only routing exceptions to humans for review when you get there.
One more example. Here's the same progression that we've seen with another customer in wealth management.
There are millions of people going to ChatGPT for financial advice today, and these are clear opportunities for businesses that have direct consumer exposure to build these types of automations. At ask, the agent helps advisors answer questions quickly — they can pull product details, account history, see prior interactions, and even get valuable market research context. This creates real value for the advisors at the firm very quickly.
At assist, it starts doing real work: it will draft prep notes, create follow-ups, and draft portfolio review materials that the advisors can review and approve before they're sent to clients.
And then once proven, you get to automate, where you can start to automate a lot of the routine work and client interactions that these advisors are doing today. Obviously we have to be careful with things like actual investment advice — I feel like I need a disclaimer here — but the reality is that millions of consumers are using ChatGPT to do tasks like this today. And so there's a real opportunity to start pushing into building products for consumers, even in more sensitive areas like wealth management.
If you're still not convinced, we've also done this across many different workflows with BBVA. BBVA is one of the fastest-moving financial institutions I've seen, which is obviously very difficult in a regulated industry. And they're building these types of processes across multiple departments in parallel — everything from customer service to sales agents that they call their AI banker, to risk automation and operations automation. And that's all following the same pattern of ask, assist, and automate.
So now today, building these high-quality agents takes a lot of work. You need that domain expert who knows how to do the job, the AI expert who understands what's possible with the tools today, and the software engineer who can actually build it if it requires tying into back-end systems. It's a lot of work, and it takes time.
And as I mentioned earlier, one of the biggest prizes I see right now in enterprise AI is no-code agent builders where you can start to build agents to automate your own workflows. This is one of the main focus areas for us right now.
On the "tomorrow" side, I think that's the end state that we want to get to — instead of having to have a software engineer to tie into APIs and your different data, and build custom evals and guardrails for each of these tools, we want to be able to productize that. And so that's one of the differences in how we approach a lot of these problems. We don't have a massive professional services business where we're trying to go in and get lucrative contracts that will last 10 years. We're trying to enable each individual to figure out how to use AI on their own and start to build these things for themselves. And so that could look like an agent-building platform that's no code, has continuous evals built in, has all the governance you need to get through your IT security, but also has connectors that tie into the different data sources that matter and provides skills so that you have repeatable things you can share across the business.
Now that you've seen some of these examples, the next question is usually: where do you start? This is how we apply the model internally with our own teams and with customers. And this is kind of an eyesore — I made it myself — but this is what I did with my team on our solutions engineering function. We went through and identified all of the different workflows that make up the job we do today. We tracked how we're using AI against those, how often we're doing those tasks, how much time we're spending on them, where we're using AI today, and where we think there's an opportunity to get to more assisted AI functions or full automation. And so that's how we identified which use cases could be most useful.
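The inventory exercise described here can be sketched as a small table of workflows that is then ranked. The Python below is only an illustration: the `Workflow` fields mirror the things the talk says were tracked, but the scoring rule (hours per month multiplied by the number of levels still to climb) and the sample rows are assumptions invented for this sketch, not data or methodology from the talk.

```python
from dataclasses import dataclass
from typing import List

LEVELS = ["none", "ask", "assist", "automate"]


@dataclass
class Workflow:
    name: str
    runs_per_month: int   # how often the team does the task
    minutes_per_run: int  # how long it takes each time
    current_level: str    # where AI is used today (one of LEVELS)
    target_level: str     # where the team thinks it could get to

    @property
    def hours_per_month(self) -> float:
        return self.runs_per_month * self.minutes_per_run / 60


def headroom(w: Workflow) -> int:
    """Number of levels between where the workflow is and where it could be."""
    return LEVELS.index(w.target_level) - LEVELS.index(w.current_level)


def rank(workflows: List[Workflow]) -> List[Workflow]:
    """Order candidates by time spent, weighted by how far they can still move."""
    candidates = [w for w in workflows if headroom(w) > 0]
    return sorted(candidates, key=lambda w: w.hours_per_month * headroom(w), reverse=True)


# Placeholder rows for illustration only; the numbers are not from the talk.
inventory = [
    Workflow("security questionnaires", 40, 90, "ask", "automate"),
    Workflow("demo environment setup", 20, 30, "none", "assist"),
    Workflow("meeting notes", 100, 10, "automate", "automate"),
]

for w in rank(inventory):
    print(w.name, w.hours_per_month, headroom(w))
```

Workflows that are already at their target level drop out of the ranking, which leaves a short list of bets to pick from.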
One of the difficult things is you can't build all of these at once, so you have to be able to pick your bets. You start with the experts on your team that are already building and using these agents, and then you can start to identify which of those bets you should elevate and move from the individuals doing the work to that centralized team that's then going to own getting those up to production value.
Once the agent is built and clears that performance benchmark, the real fun begins: everyone in the business function, or even cross-functional teams, can start using that agent. Say I build an agent for my own team that handles a function other people typically reach out to my team to do. A good example is our security questionnaire agent that someone on my team built. Typically, we'd have salespeople, customers, or customer support reps asking the solutions engineers who have that security expertise to answer those questionnaires. We now have an agent that they can go use directly, so they don't have to come to us.
But of course, these types of tools need management. We keep introducing new products, there are new security concerns that people will have. So we're starting to move toward a world where part of someone's job is as an AI agent manager, and we could see this becoming a full-time job in the near future. You can imagine that if you have somebody who built an agent and they're that domain expert, you're then going to need to manage that agent, make sure the output quality is high, and that it's staying up to date on the job function — the same way that you would need training and enablement for people on your team.
So I want to leave you with these three key insights to make sure that you're set up for success.
First: keep governance and transformation separate. Central IT should own the platform, guardrails, and shared tooling, while business leaders own the outcomes for their own organization.
Second: scale happens when engineering is embedded directly into the business function. They're close to the experts and the workflows that are selected for automation.
Third: every agent needs a domain owner. Someone has to be accountable for the quality bar, understand the edge cases, and know when the system is ready to scale.
If the last decade was about digitizing workflows, the next one will be about building and managing agentic workflows. And this is the real architecture question that is in front of all enterprises right now.
Gene always asks for these "help I'm looking for" slides. These are two areas that I'm currently working on a lot that I'm really obsessed with. Obviously, the second one — re-architecting teams — is what we talked about today. I'm also super interested in what's happening in the product development lifecycle, so outside just the core software development lifecycle. If you're working on that area, if you're already working on this, I'd love to hear what you're doing. If you're interested in experimenting with some of these concepts, I'd love to connect and see how we can collaborate.
Thanks, everyone.