Beyond the Hype: Lessons Learned on the Path to Enterprise AI Adoption at OpenAI

Log in to watch

Las Vegas 2024

Beyond the Hype: Lessons Learned on the Path to Enterprise AI Adoption at OpenAI

Joe Beutler

Solutions Engineer · OpenAI

Beyond the Hype: Lessons Learned on the Path to Enterprise AI Adoption at OpenAI

Chapters

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

I met the next speaker, Joe Beutler, who is a Solutions Engineer at OpenAI. I met him at the LaunchDarkly conference a couple months ago, and he presented this incredible survey of how specific enterprises were solving their problems, often with the help of very talented people from OpenAI. It was inspiring and mind-expanding. So I was so delighted that he was willing to share his learnings here. This talk would've been great on the ultimate Gen AI Learning Day yesterday, but some stuff came up and I'm so happy we could accommodate him today. So here's Joe.

Joe Beutler

Alright. I'm the difficult one that has to use my own machine. So here we are. Hello everyone, I'm excited to be here with you today. I'm Joe Beutler, Lead Solutions Engineer at OpenAI.

We're going to try to do something today that has never before been done by an AI company. I'm going to try to take you beyond the hype and talk about actual lessons learned on the path to AI adoption in the enterprise. I'll talk through some examples from work that my team has done with a lot of our enterprise customers.

A little over a year ago when I started looking into working at OpenAI, I thought it was mostly just research and ChatGPT. I found that's a pretty common view in the enterprise — but OpenAI is also a deployment company. I'm going to walk you through what that means and share some examples of how my team has helped enterprises craft their AI strategy and execute on that, and move toward finding solutions that are impactful for their business at scale.

01Agenda

For a quick walkthrough of what we'll do today: - AI strategic vision - Multimodal AI (with customer examples) - Live demo combining capabilities — if the demo gods allow

02AI Strategic Vision — three ways enterprises derive value

We see companies deriving value from partnering with OpenAI in three ways.

1. Enabling your workforce with AI. This often starts by using tools like ChatGPT to assist a business unit like the finance department with forecasting, or the marketing department with creating new content. 2. Enhancing your operations — for example, automating tasks that are currently manual processes, which is very common in customer service. 3. Infusing AI into your products. This is really the emerging space — something we haven't seen a lot of successful implementations of. We work with customers using our APIs to help create a tailored experience for their customers, including high levels of personalization, that they can launch directly in their app or service.

03Framework — five levels of impact

A framework for how I think about deploying AI across your organization: - Individual impact — increasing employee productivity. - Team impact — using AI across your team so the efficiencies of individual impact compound into team-level productivity gains. - Org impact — automating operations across the organization with custom solutions using our APIs (or other APIs). This is what most companies should really be driving for. - Network impact — using AI to go outside your organization, impacting customers or partners. - Industry impact — revolutionary things that are trying to change entire industries.

04Healthcare progression example

With a healthcare organization we've worked with, the staircase looks like this: - Start by streamlining admin tasks to help create an AI-enabled workforce. This is where a lot of large enterprises are focused today, because this is the first step and it's really a change-management exercise. - Move into AI-enabled departments — automate operations at the next level, reach AI efficiencies that start to impact the organization. - Then to AI-infused products — in healthcare, looking at hyper-personalization of healthcare plans or health education provided to customers. - The holy grail is contextual AI — customized models or different approaches where you can do things that are truly innovative. We have a lot of healthcare customers trying to change their R&D processes, accelerate R&D for things like drug discovery.

05Prioritizing use cases

Enterprises we speak to sometimes come to us with hundreds of use cases. This is great, because the people that have these use cases are often the ones closest to the business and understand the business problems. But it's also overwhelming. As the previous presenter [Jason Yip] talked about — if you're doing AI transformation in your organization, you have to focus in on the problems that are going to have the most impact.

My team spends a lot of our time with customers helping them qualify and prioritize their use cases. We see the most success when companies have already started to build out their gen-AI muscles with smaller use cases — lower impact, but helps them understand the capabilities of the technology. Then we usually spend our time in the upper right of the qualification matrix, helping with the major projects where we think there can be value driven across their organization or across their industry.

06Multimodal AI

Multimodal AI is a new emerging capability we're seeing this year, and it's what I'm really most excited about — a lot of the use cases I'm focused on with my customers.

We're working with customers to develop solutions that combine more than one modality of the LLMs: text, voice, Vision (for image recognition), DALL·E (for image generation), and soon hopefully Sora for video generation. Bringing these together in a single model has opened up a new set of use cases that weren't viable before, when you had to make multiple calls to a model.

My team has helped deploy gen-AI solutions across core pillars: customer service; knowledge assistance (helping internal employees with tasks); audio services (using audio capabilities to create new products or services that weren't possible before); content generation and recommendation engines (where you get into personalization).

When we talk about multimodal solutions, we're referring to using the models to ingest or output data in formats other than text. Our multimodal models can do text + image, text + image + audio in, and also text + image + audio out. The most interesting multimodal solutions I've worked with have come in the audio services and content generation pillars.

07Customer Example 1 — Moderna (AI-Enabled Workforce)

The team at Moderna provides a great example of how leadership can drive impact with AI. When they rolled out ChatGPT, they tested it ruthlessly and tied usage to real-world business results.

After rigorous testing against a homegrown solution and other products, Moderna decided to roll it out to their entire base of knowledge workers. As is natural to ask anytime you're rolling out new software — does anyone actually use it? Their team has built over 700 custom GPTs in just the first two months after they launched. Those custom GPTs are essentially no-code mini-apps that enabled their workforce to create processes that weren't possible before.

They've also created their own GPT store — their own directory to help employees with discoverability of solutions that had already been created across the org. An example of one of their really popular GPTs is the Dose ID GPT, which is widely used by their researchers to advise on treatment doses.

08Customer Example 2 — Klarna (AI-Automated Operations)

Working with customers, we found that it's relatively easy for them to get to an 80% solution. Driving that last mile is what gets you from a prototype to a solution that realizes value in production.

Our work with Klarna was one of the first real proof points of gen AI in the enterprise. It's been referenced a lot, but the public narrative doesn't really tell the full story. Klarna isn't just automating the easy stuff — that's usually the first thing people say when we talk to them about customer service automation: 'oh, they probably just automated all the easy stuff, I've already done that.'

They're a digital-native FinTech, so they had already reached 50% automation on their customer service tickets. They worked with our teams to automate two-thirds of the remaining tickets that were still requiring human intervention. This required: 1. First, understanding the processes their support agents performed when interacting with multiple internal systems. 2. Second, rewriting those policies and procedures in a way the LLM can understand. 3. Third, connecting to their internal systems via their own API to read and write data.

The result: they were able to reach 85% automation across their customer service processes. This is one proof point, but we're seeing similar results working with customers at much larger scale. I'm hoping we'll have more to share about some of those customers soon.

09Customer Example 3 — Color Health (AI-Infused Product)

Color Health is using our Vision capabilities with GPT-4o to analyze medical records and create personalized care plans for cancer patients. They came to us with the ambitious goal of wanting to use LLMs to cure cancer. When they asked for our help to explore this unproven solution, I told them I'd be happy to do anything that could one-up my two doctor siblings and make my parents proud. (Sounds like there might be some other black sheep in the audience who went into tech instead of medicine.)

The current state of cancer care planning in lower-income or rural areas was really concerning. Patients in areas like California's Central Valley did not have access to specialized oncologists. Their primary care physicians were often having to send their case to the specialists at major hospitals in larger cities. The backlog for reviewing these cases could be weeks long, and often resulted in sending it back without a treatment plan because they had missing labs or other tests that needed to be run.

When Color rolled out their cancer co-pilot, it enabled the clinicians to review records in only five minutes by having it in a standard format where it pulled in all of the data. It also helped flag for the physicians creating those cases anything that was missing that they needed to include. We're really excited by the early results, and hopefully we can help them get closer to curing cancer as the models become even more intelligent.

10Customer Example 4 — Spotify (Multimodal AI in a Consumer Product)

Spotify is another example of multimodal AI infused in a consumer product. My team helped them build a system to translate their top podcasts into different languages. Audio translation is very expensive and time-consuming and very manual. For a free service like podcasts, it's hard to justify the investment to translate all of their popular podcasts — especially into the long tail of languages spoken by their international audiences.

Spotify used our Whisper model to transcribe their podcast audio, then passed that transcription to GPT-4 to translate into the desired language, then passed that translated transcript to our text-to-speech model to generate audio in the target language. Today, this is a three-step process. We're now working with customers using our new models with GPT-4o that have the multimodal capabilities to do all of this in one step.

We gave Spotify access to our private Voice Engine feature — essentially the ability to clone a voice. So they could use that for their podcast hosts and their guests. Now listeners can listen to popular podcasters in their own language. If you do speak another language, you should try listening to Lex Fridman's recent interview with Sam Altman — I think it's hilarious listening to Lex and Sam converse in Spanish.

11Live Demo — Nature Documentary

Now I have a demo that I put together to show how we can combine these multimodal capabilities to create something truly special. We're going to have some fun and create a nature documentary together.

To start: the prompt you see on the screen was a prompt that was used to generate a Sora video. We're not actually creating the Sora video in real time, but this was a video that was generated completely with Sora. It wasn't enhanced other than to overlay the audio track you'll see.

[Sora-generated video of a tree frog on a branch plays.]

All right, I think we get the idea — a tree frog on a branch.

Now I'm going to click Extract Frames. This is just going to use normal Python code to pull six different frames from that video. Now I'm going to click Analyze and Narrate. We're going to pass these six frames with a prompt asking the model to generate a narrator's script for a nature documentary. This is happening live in real time. Now we have our new script that we've never seen before.

For the next step, we're going to create a voiceover. I'm going to use the Voice Engine capability to clone someone's voice. I did ask for a Guinea pig, so I'm going to ask Gene to come out and help us. Give him a round of applause for being my Guinea pig.

Joe Beutler (to Gene Kim): Once I click this, I just need you to talk into the microphone on the laptop for about 15 seconds. You can say anything you want — talk about the day.

Gene Kim: I'm having a great day here. My favorite animals are pandas. Pandas have a reputation of being gentle, but they can be quite ferocious. I watch a lot of videos on YouTube of pandas attacking.

All right, so now I'm going to click Narrate Video. We're going to see how it does capturing his voice, and turning Gene into our narrator for our nature documentary.

[Voice-clone narration plays:] 'In the lush greenery of the forest, a vibrant frog with eyes like shimmering jewels ventures along a branch. Each careful step is a dance of life in this rich and diverse habitat.'

I'm gonna try that again — I think we have time. I don't know where it got that voice sample. Okay, live demos — gotta love 'em.

I actually did that on purpose so that you know that this wasn't all pre-done. Try it one more time.

Joe Beutler (to Gene Kim): Can you try leaning in a little more?

Gene Kim: Sure. I've so much enjoyed the day. It doesn't actually matter what I say, right? Because the purpose of this is to capture my voice, of which then it will be cloned ideally, right? And then be reused. So be aware of pandas. Okay.

All right. We'll try that. I wonder if we're getting some feedback from the speakers.

[Voice-clone narration in Gene's voice:] 'Amongst the verdant foliage, a vibrant frog moves gracefully along a branch overhanging a serene stream. Its striking colors are a dazzling spectacle, a brilliant signature of its species. Carefully, it navigates the bark. Each movement measured and deliberate. The wild is a stage in this amphibian, a skilled acrobat readying for its next performance.'

All right, now we'll try it with a surge of energy, translating it to Spanish.

[Spanish narration in Gene's voice plays.]

Do you have a preference for Japanese?

[Japanese narration in Gene's voice plays.]

All right, thanks for some fun feedback. Thank you.

Thank you, everyone, for bearing with me. It's always fun with A/V at these presentations.

12Closing — what help I'm looking for

They asked me to say what help I'm looking for. I'm curious what people are doing with gen AI. If anyone wants to reach out, my LinkedIn and X profile are on the slide. I'm curious what's working, what isn't, and what opportunities you see in the different areas you're working in. I also threw on some personal interests in case people just want to grab me and chat. Thank you.