Log in to watch

Log in or create a free account to watch this video.

Log in
Las Vegas 2023
Share
Download slides

Frontiers of Generative AI

Frontiers of Generative AI

Chapters

Full transcript

The complete talk, organized by section.

Joseph Enochs

Thank you, Gene. Thank you, Gene.

All right. Tray tables up, seats in their upright and locked position, seat belts securely fastened. Are you ready to embark on a journey into the frontiers of generative AI? Let's go. Let's go.

I've known Gene, like he said, for a decade, and we were doing an event earlier this year. When I brought up some of the things I've been doing around generative AI, that's all we talked about the entire day. We started talking sometimes every week about cool things that you can do in AI.

To start with, I can't overstate the interest everyone has in generative AI. Here's my boss, Joe Rumsey. When ChatGPT came out, he had a million questions for me, and he reached out to me and he was like, "Joe, hey, can you explain this to me like an old-school storage guy?" And I was like, "Absolutely I can," right? And so he's excited about it. I'm excited about it. The people talking to him are excited about it, and I bet that people around you are interested in it too.

Can you have him scroll the speaker notes?

As a quick introduction, again, I'm Joseph Enochs, managing director of EVT. I spent eight years serving in the Marine Corps, and afterwards I worked in various special projects, creating data products for many flight tests and a few space flights. At EVT, we spend a lot of time being of service and understanding our customers while helping them operationalize the emerging technologies.

You good, Gene?

Gene asked me to give everyone a primer, again, on generative AI, just in case you haven't spent hundreds of hours watching videos and reading papers and doing code experiments with LLMs, like I've been so fortunate to do. I'll talk about what Gene and I worked on and some of what we learned.

I'd like to start by sharing a little bit about the bewildering pace of AI. This slide represents trends from arXiv.org, which provides early-release scientific papers. And these are not papers that are like basic papers. These things are not like tweets. They're very serious. They take months to write, and you really have to spend some time focusing on them.

So again, starting here at the bottom left, you can see ML and AI, and you can also see language models. Then on the right-hand side, you can see large language models at the top. This is some of the things that a lot of people have been talking about with agents. I'll leave the reference to this paper that you can see here. This paper is all of the agents that have been released, and you can see some of the small ones here. People mentioned coding agents. People have mentioned some other agents, but every one of them is cataloged in this paper that we'll give you for reference.

One of my favorites is this one here on the right from Hugging Face. A little bit more about Hugging Face later, but they're really like the GitHub of AI right now. This particular model is one similar to what Mick talked about, where in the center you have the planning, which could be a smart LLM, like GPT-4, and then a Hugging Face LLM, like a LLaMA 7B or a Falcon model, that's really purpose-built for your particular use case.

But I think the most important thing here is to look at the timing you see on the right-hand side. You can see the time of these things coming out. Basically, decades of work are being done in weeks.

Again, things are moving so fast that even these big players are realizing it. This is a leaked memo from Google. It was validated, right? You can see here it's labeled, "We Have No Moat." Demis Hassabis, CEO of Google DeepMind, verified the memo, but he doesn't necessarily agree with all the points.

But for our purposes, I want to focus on a couple things, starting with the release of LLaMA, where you can see the dates down here. March 3rd, it was released. Then 16 days later, the community had fine-tuned a 13 billion parameter model to be in parity with Google Bard, which they spent a tremendous amount of money to build.

At the bottom right-hand side, I want you to pay close attention to training time. These experiments are happening within an hour, meaning that they want to try something new with a model, they can fine-tune it, and an hour later they're getting feedback.

This reminds me of early days of DevOps when we were talking about releases per day. This is the moment that we're doing that for AI.

Just on a personal note, me and some of my coworkers were like, "Hey, what would it take for us to build an agent?" We looked at this tool, perplexity.ai. If you haven't seen perplexity.ai, basically what it does is to avoid hallucination, you type in something that you're looking for, it will search the internet and pull back all the latest and related documents, look through those, and synthesize those into a response.

You can see this here on the right-hand side. At the top there, Perplexity, and on the right, you can see basically our version. So you've got a company that spent millions of dollars, and then you've got the community, us, about 48 hours and some elbow grease, and we can spit out very similar results.

Now, I'll leave these results here for you to see afterwards, but literally for me, I spend three or four hours a day just using this tool to save time finding valid references on the internet.

Talking about cost, Sam Altman, CEO of ChatGPT, estimated that there were a hundred million dollars spent in training GPT-4. You can see highlights of some other costs here associated with the cost of these other LLMs.

There's some other things on this chart, which I will actually share as well. It's a paper about all of the LLMs as they're released. This one has just been updated a few weeks back, but pay close attention to the years of the releases. You can see 2019, 2021, all the way up to 2023. Look at the trend. Look at the trend in blue. These are the open-source models that are peaking.

Look at the trend in the closed-source models, right? It looks like some sort of bell curve, but 2023 has become the year of the open-source model.

And don't think the big players are out of the game just yet. Notice at the top we have Google's DeepMind Gemini, and it's got 8 trillion parameters that it's planning on being trained on when it's at release. Basically, if you look at the token, 65 trillion tokens, that's basically every word ever written and spoken in any language from all time. So when that comes out, it's going to be probably indiscernible from magic.

So I want you to go and take a look at Hugging Face a little bit deeper. Dive into Hugging Face if you haven't. It keeps track of the leaderboards, benchmarks, models, and spaces. You can see here the LLM leaderboard. This is for the open source. And you can also see the Chatbot Arena, where the open-source and closed-source models sort of fight it out to see which one's on top.

I also wanted you to pay close attention, if you're a builder, to look at TheBloke and others. As we say when we're grabbing these models, TheBloke brings it again, because what TheBloke does is he will take a model that's been released, and he will quantize these models.

When they're released, let's just say a 7 billion parameter model is huge. Well, he quantizes that and compresses it down so that you can prototype with it quickly. He recently got some funding to continue doing that. He was really doing it on his own for a while.

And you can see here these Hugging Face Spaces. So you can go straight to the Spaces, and if you want to see Falcon 180B, you don't have the GPUs, you can use it here in the community and interact with the community and post your own models and own fine-tuning models.

So Gene asked me to give some sort of navigation associated with some of the things that you might see. This is a look at really kind of an ensemble or a mixture of models. This can also be an agent framework.

This is a tool that we built so that if people are doing retrieval augmented generation, you can have one of the blocks as a retrieval augmented generation. If you have fine-tuning plus retrieval augmented generation, or you can have GPT-3.5 versus GPT-4 versus Anthropic, whatever set of tools you want to validate, you can put those in there and see them all side by side in real time.

I want you to pay close attention to some of these other elements out here: your cloud providers, the frameworks, vector databases, which we talked about, orchestration, inferencing at the data center and edge, test and evaluation, and debugging. And here's a list of these players that you'll hear: obviously Pinecone, Weaviate, Chroma, and LangChain, you've heard of those on the left-hand side.

And a lot of people don't know this, that the Apple hardware for inferencing is actually one of the most cost-effective ways to do inferencing right now, is to use the M2s.

Now I want to take just a minute to sort of demystify tokens, encoding, embedding, and transformers. Basically, we have these documents that are broken down into paragraphs. Paragraphs are broken to sentences, sentences to words, and words are encoded into these tokens.

You can see these sentences here: "My favorite color is space, lowercase red," right? And the next one, "space, uppercase red." So look at this. Notice that the unique tokens for each word are the same up until that, right? The unique token for lowercase red and uppercase red, and then with red with no space. You can see how these are encoded.

So I want you to think of these tokens and everything we're talking about here, the documents, as really like puzzles. The tokens are the puzzle pieces. The embeddings are the relationship to the pieces and how those pieces fit together. And the transformers, these are just compressing these puzzles into language understanding.

So underneath the hood of this, and you can look at this on your own time, but underneath the hood, I will build this out so you can see how these relationships work. But the most important thing to see here is that transformers are a unifying architecture, which means that they don't only read and write, they can also see and hear.

So some strategic decisions and critical resources that you'll need. This is from the AI architecture, and this list is extensive, but for purposes of generative AI, you really want to be focusing on the right-hand side, which is really the things that we've been talking about today. If you want a more comprehensive view of this, we can definitely talk about this later.

But some critical resources: obviously your cloud providers, your platform providers, Hugging Face, because again, this is where all the models are at, and it's a center of gravity for those things, LangChain. I just can't stress enough that LangChain is really where things from the open-source community are being integrated, and arXiv.org.

Now, I'm not saying that you have to go out like Gene and I have been doing and reading the latest and greatest papers that come out. But if you have an AI project, somebody in your team needs to be doing this, because these things are moving very, very, very fast.

So a few things I'll read through and then hand it over to Gene. This is a rapidly changing landscape. There are really dangers of outdated understanding. You've got to embrace this advancement, right?

None of us are natives here. This is all coming. It's a quest, right? Nobody knows where it's going. Treat your data as treasure. That's really the only moat that people have right now, is their data. And these ethical components, be mindful of the development that you have.

And I want everybody to really prepare for expanded use of personal devices. I don't mean just cell phones and tablets. Have you guys seen this one? This is coming out from Meta here really soon in Ray-Ban. These large language models will be able to see and hear and speak to you in your ears. It's coming soon. You're going to need to prepare for it.

And with that, Gene.

Gene Kim

Yes. It was so invigorating to learn about all these things from Joseph.

One of the things I articulated to him some months ago was this problem I wanted to solve for years. Well, no, no, for months. We have all these 1,200-plus talks in the DevOps Enterprise video library. I think I've watched more than most, but I haven't watched all of them.

One of the things I want to do is find these amazing breakout talks, right? So that, you know, the ones who can bring the best to the primary stage. And so I was asking Joseph, wouldn't it be amazing to generate a one-page summary for the experience reports? Let's extract the business problem, the metrics, and the shared testimonials, right? As Paul was saying, it's about the value that was created.

For years, we've been generating transcripts using Rev AI. We've done that for all the talks, and each talk is typically around 6,000 tokens. And so when Joseph was saying how life moves fast, Joseph built all these tools to chunk up these talks so they would fit into a 2K token window. And you know that Patrick talked about this, it's called retrieval augmented generation, yesterday.

But then in August, OpenAI expanded GPT-3.5 to 16,000 tokens, so we didn't need that anymore. And Joseph was like, "Yep. Like I said, life moves fast." Patrick also mentioned the feeling where you build up a whole bunch of custom tooling and then another vendor comes in and actually provides a solution. So we felt that as well.

What's amazing is that if you go to the DevOps Enterprise Library now, for a couple selected talks, here's David Keane from HSBC. If you scroll down, there's a summary that we've delivered. And so it will say, "Shown below are the business problems, outcome of the metrics, present testimonials. These are prepared by me. I'll take all ownership for inaccuracies, assisted by AI."

What I found was that LLMs can be so good extracting these features from talks, especially since experience reports are given in such a standard format. It can pull out the metrics. It can pull out testimonials. I mean, it's just really, really great.

So learnings. These were all generated using GPT-3.5 Turbo. I used few-shot prompts, and I often used Claude 2 to generate the first-cut answer to give examples of what good summaries look like.

I guess the real big learning here is I couldn't have done this without Joseph's help. I'd ask a question, and he'd say, "Oh, yeah, I just read a paper on that." And I have not read any of the papers on arXiv, but Joseph has.

I think it just reinforces the notion that to go on interesting quests, you have to have interesting friends, right? It's hard to solve level-30 quests if you're a level-three caveman. All the more reason to hang out with really smart people.

Let me just summarize one thing that also blew me away. We're thinking about writing some supplemental materials for Wiring the Winning Organization. So I put into Claude AI from Anthropic, "Here's our definition of slowification, simplification, amplification. Here's a case study. Extract all of the examples of these three mechanisms." And I was blown away by how good it is.

Claude 2 is so good at extracting things around language and concepts. It was just so... hopefully you'll see more stuff that we're generating around that.

So with that, Joseph, want to close with help you're looking for?

Joseph Enochs

Yeah, sure. As we, there's a map behind here that we'll share around with foundational models and that journey, but I would love to connect with anybody.

These things, again, are near and dear to my heart about being able to bring these large language models to everyone, right? Not just to the major players, but to all of us. I want to find interesting ways. And if you want to talk more about how we can do that together, then come find me.

Gene Kim

And how will people find you, Joseph?

Joseph Enochs

At the networking session tomorrow.

Gene Kim

Very good. Thank you so much.

Joseph Enochs

Thank you.