Q&A with Dr. Ethan Mollick, Author of "Co-Intelligence"
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
All right. Fantastic. So on the topic of generative AI, if you have any interest in AI, you have probably followed the work of Dr. Ethan Mollick, currently associate professor at the Wharton School of the University of Pennsylvania.
I was such an avid reader of his work, either on Twitter or on his Substack, One Useful Thing. I fell in love with his writing because it was so clear that he was on the frontier of exploring how so many different parts of society should or could adopt these seeming miracles afforded by AI.
First off, my heartiest congratulations to you, Dr. Mollick, for releasing your book _Co-Intelligence_, which instantly became a New York Times bestseller. I absolutely loved the book, and I think it should be required reading for everyone, especially our business counterparts.
I got to meet Dr. Mollick when I was invited by my friend James Cham to an event before the book was published, and it just blew me away. Given that this community is being asked to lead AI and engineering initiatives for their organizations, I thought it would be so incredible to have him share his perspectives with this community. I'm so delighted that he said yes.
So, I've introduced you in my own words. Dr. Mollick, can you introduce yourself and what you've been working on that you've been having the most fun with these days?
Ethan Mollick
Oh, it's exciting. Thank you for introducing me, and thanks for the kind words about my book, which is right here.
Actually, the funny thing with writing a book like this is that I had to wrap most of it a year before it came out, or six months or so. So I feel pretty good that people are like, "It's a really good introduction to AI," and I'm like, if I imagine now that it's a good introduction, that means I was pretty far ahead seven months ago, before it came out. So, a good sign.
I'm a professor at Wharton. I study innovation and entrepreneurship, but I've been working for a very long time on using AI for things, since working with Marvin Minsky at the Media Lab. But I've never been the technical person, so I've always been the business-application-use and education person, which is why I think I've become useful: because I think about how this stuff is applied and why it matters.
What I'm having fun with right now is, I think the same three things are the really interesting trends in AI: specialized devices -- this is actually an AI in a box that I've been playing with -- large context windows, and agents. Those are the interesting things for what's coming up next, I think.
Q&A
01 Hidden user innovation and incentives
Gene Kim: Oh, super, super. You said something in your book that struck me as really important, not only for technology leaders but all the leaders that technology leaders interact with. You wrote, "People are figuring out ways to use AI to make their jobs easier and better. The results are often breakthrough inventions, ways that AI could transform a business entirely, but the inventors aren't telling their companies about their discoveries. Instead, they're keeping them in secret." So they're hiding it from people.
Can you talk about this phenomenon, why you think it happens, and what should leaders be doing to encourage and celebrate these types of innovations so they can actually take advantage of it?
Ethan Mollick: Yeah. It's a really interesting problem. We've always known that innovation comes most from users of technology, because they're the ones with the need, right?
If you are the CTO, you want to see your company be more efficient, be better. If you are someone on the line who has to write an email message every day to hundreds of people, it's very cheap and easy for you to experiment about how to write a better email message, and you're highly incentivized to do that. The CTO is incentivized to try and figure out a general solution, but not to solve the problem.
So people experiment all the time with their own work, which is just universal. Everyone's using LLMs to do their work, and they're just not telling you about it. One of the most universal things I see is that everybody's using LLMs to do all their performance reviews of their employees, which is obviously exactly what you don't want to see them doing, but absolutely everybody is doing that.
One of the things that I've been noticing is this: when I talk to people privately, they tell me they're using AI for everything, especially multimodal -- to take a picture of the screen or whatever and then ask the AI to do the work. I spoke to someone at a large bank who wrote the policy to ban ChatGPT use, and she used ChatGPT to do it.
So there's really a hierarchy of reasons. The first reason is unclear rules, especially rules where people don't know whether they'll be punished or not. A lot of companies are either not embracing it, or if they're embracing it, they're embracing it with all sorts of caveats: if you use it incorrectly, you'll be punished. No one knows what correct or incorrect is.
The second reason is Reddit is full of people who talk about how they're now viewed as wizards because they can get ten times the amount of work done. Why would they want to show you that that work is being done by AI?
Third is, if they do show you, are they going to get fired because you realize they don't need as many workers? We just caught the tail end of that really interesting thing on customer service. If you're a customer service agent, why would you tell anyone? Because all you're going to do is show that your job might be not useful.
Or even if you're not worried about that, maybe you'll end up in a situation where you don't get recognized for the extra work, or you just get assigned extra work to do, or you're just better off launching a startup anyway. All of those overlapping reasons make it really hard for people to be willing to talk about what they're doing.
Gene Kim: What would your advice be to break those terrible conditions, where we're getting the opposite outcomes of what we want?
Ethan Mollick: Part of this is about your company culture that you already had. A company that has a sharing culture, where you trust the administration is not going to fire you, is going to be very different than one where it's a competitive culture and people get laid off all the time. You have to live with the culture you've got, first of all. Everything we know about building a good culture matters.
Then I think you have to radically rethink incentives. What is the incentive not to show? I've talked with companies that have done some fairly radical things, from giving $10,000 rewards at the end of every week to whoever comes with the best prompt, to thinking about, do we promise we'll never fire anyone for the next year because of generative AI? I think you have to be realistic about the incentives people have.
Gene Kim: One of the mind-expanding things that you put in your book was some of the bonuses being up to a year's salary, which I thought was awesome and very mind-expanding.
Ethan Mollick: It's funny. There's an old science fiction author named Robert Heinlein, and that was one of the things he said in one of his books. The way the U.S. got an economic boom from people building robots was that if you replaced yourself at your job, you were paid for that job for life, and so people did the work.
02 RAG, hallucination, and changing model economics
Gene Kim: So good. There was another comment that you had made at James's event that sort of jolted me. You said something along the lines about your fears about the use of retrieval-augmented generation. In fact, that was one of the techniques used in the previous session that you caught the tail end of.
This is absolutely one of the most widely used techniques in AI, where we feed source documents to create things like chatbots. But you mentioned how you were skeptical about this practice because of how inherently AI makes stuff up all the time, and it's difficult to detect these confabulations effectively.
Can you talk about why -- I'm not sure if the word is skeptical -- or what your concerns are, despite the software industry putting massive bets on this technology, hoping to make it viable? And some of your surprises of RAG gone wrong?
Ethan Mollick: I think there are a few things that I worry about with RAG. The big-picture view is that there's also a huge number of bets being placed on the current technology, which I see even companies with really smart people making. It's like, let's bet on these LLMs the way they are right now. Let's bet on a cost structure. We have to minimize things and throw stuff off to a Llama 2 instance running locally. How do we minimize costs when costs are still dropping exponentially and ability is still increasing exponentially?
We're still seeing those changes. The Groq-hosted version of Llama 3 is insanely cheap at this point. There are a lot of people building infrastructure around the limitations of a technology where they're going to release the product in six or eight months, and all of the decisions you made and constraints are already going to be an issue.
There's also a mindset issue. A lot of companies are used to thinking about scale as something where cost is the number one thing, as opposed to thinking about now you have a cost-ability tradeoff that's very direct.
But leaving aside those sets of issues, I think the bigger issue with RAG is that a lot of technologists I talk to don't seem to understand or don't fully absorb the fact that you can build the world's best RAG pipeline, but once stuff is handed off to the LLM, it does weird things. Those weird things get compounded with how users use the system.
For example, a really reasonable thing that a user might ask the RAG system to do is, "Tell me what's important about this project I'm working on." I will not name it, but I've been using one of the best systems that works with RAG in your documents. When I asked for the most important thing, its RAG search ended up pulling back a few documents. I don't know why, but in the case of my organization, it pulled back a Salesforce installation guide, a document I wrote about a paper, which was great, and then a memo from the dean of our school.
The AI then made an incredibly plausible argument about why these three things fit together as the most important thing that I need to be worrying about. Like, we obviously need to be focusing on the Salesforce installation, and this is a priority for the dean, and of course Ethan's work indicates why this is such a big deal. It was completely plausible. There were some quotes made up from my documents that seemed really reasonable, that I had to read my document to remember I didn't write them.
So the problem is you have an absolutely convincing machine at the other end. You could deliver the perfect documents to it, but it might still not only hallucinate but also make up useful-sounding information that isn't valuable. I don't see people's RAG pipelines testing that end-user piece nearly as much as they test all the elements going into it. I think that weirdness makes RAG harder to use.
The last piece, because I've talked at great length about this, is you get much less hallucination and much smarter AI reasoning over large context windows. One other option is, when is it just worth loading everything into the context window? And everyone's like, well, it's very, very expensive to do that right now. Yes, for now it's very expensive to do that. Is it expensive to do it 12 months from now? Is this what you want to build a product around, a solution that is built around today's limitations for technology that, as far as we could tell, is still doubling capability every five to 15 months?
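One way to exercise the end-user piece described in this answer is to check that anything a generated answer presents as a quotation really appears in the retrieved documents. The following is a minimal sketch under stated assumptions, not something presented in the talk: the function names, the sample documents, and the use of straight double quotes to mark quotations are all hypothetical choices for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_quotes(answer: str) -> list[str]:
    """Return every span the answer places inside straight double quotes."""
    return re.findall(r'"([^"]+)"', answer)


def unsupported_quotes(answer: str, retrieved_docs: list[str]) -> list[str]:
    """List quoted spans that do not appear verbatim in any retrieved document."""
    corpus = [normalize(doc) for doc in retrieved_docs]
    return [
        quote
        for quote in extract_quotes(answer)
        if not any(normalize(quote) in doc for doc in corpus)
    ]


# Hypothetical retrieved documents and model answer, for illustration only.
docs = [
    "Install the connector, then restart the sync service.",
    "Our paper studies how teams adopt new tools.",
]
answer = (
    'The guide says "restart the sync service" and the paper argues that '
    '"adoption is the top priority for the dean".'
)

flagged = unsupported_quotes(answer, docs)
print(flagged)  # quotes that no retrieved document actually contains
```

A check like this only catches fabricated verbatim quotes. It does not catch a plausible but wrong argument built from real documents, which still needs human or model-based review.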
Gene Kim: Fantastic. I realize as I was choosing the word skeptical that by no means am I trying to diminish the work of our friends at Parloa, et cetera.
Ethan Mollick: No, it's great. I totally get it. I think RAG will have value, but I worry that as a general solution for solving every problem -- first of all, the core idea that your own documents matter so much for most organizations is an open question. I don't see people testing what's already in the GPT against what retrieval or the context-window approach adds. There are definitely use cases for RAG. The customer service thing may be a perfect answer because it's bounded and they have the right kinds of issues, but as a general solution to everything, where RAG will solve the problem, I have my doubts.
Gene Kim: It's so interesting. I wanted to bring up that question because I think it is of value to anyone who is influencing or implementing bets on AI. I guess I actually relish the engineering problem of how do you ensure the responses are correct?
As an engineer -- and by the way, one of the things that came up in that James Cham session was the notion that you said humans are good at detecting type one errors, when the computer doesn't find something we're looking for, but we're much worse at detecting type two errors, when a computer returns erroneous information, which I thought was such an interesting insight.
Ethan Mollick: Absolutely. We're just not used to it either, right? If your search box says "answer not found," you're kind of annoyed, but you're not like, "That answer can't exist anywhere." But if you get a competent answer every time -- and again, the AI is like a person following instructions. It's very hard to bound what a person does. In the same way, it's going to be hard to bound what an LLM does when it's most useful.
That doesn't mean you can't force it down a narrow pathway like the customer service example we were seeing, right? But the real power of these AIs for transformation inside organizations often comes when they're treated like a person who does analysis and extra work. That is not as easy to do.
03 Advice for technology leaders
Gene Kim: I love that. In your book you called it cyborgs and centaurs. In fact, let's go there. This is an audience of technology leaders, often leading tens, hundreds, or in some cases thousands of software professionals working on some of the most important initiatives in their organization.
Let's zoom way out. What advice would you give to this community to help them and their teams take maximal advantage of this incredible technology that's impacting all of us, to help them thrive and win in the marketplace?
Ethan Mollick: The first thing is, for goodness' sake, just use it. The thing that worries me most is when I meet organizations where people are not using AI or they're only using it for coding, because I think that's a very narrow viewpoint. There is basically a parallel world of AI as coding tool, and that coexists very loosely with the larger AI use by everybody else who isn't a coder.
So you have to use it for everything. That's one of the principles of the book: invite AI to everything you do. I think you absolutely have to do that. You have to put in your 10 hours with whatever frontier LLM you're using. By the way, if you're using LLMs in any part of your work, you should be using that LLM too, to understand what it's good or bad at, when it gets mad at you -- that set of stuff.
The second thing to realize is that you have to build for the future. Even if GPT-4 is the best we get -- and I strongly suspect from things I'm seeing that we are not done, but who knows when this will plateau -- there's so much stuff left to do. You heard a little bit in the last presentation about that. We don't even know how to prompt really well.
The best way to get Llama 2 to do a 100-question math prompt is to pretend you're in Star Trek. That increases the accuracy. If you say, "We're approaching anomaly; answer in the form of stardate equals; we've escaped it, Captain." And the best way to have it answer a 200-question math prompt is to say, "The president's advisors need your help. The country is in danger." Those increase the accuracy of outcomes for reasons that are absolutely unclear.
So there's a lot of room for improvement. All that means is that you have to plan for a future. You have to be thinking about what happens when these systems get better. IT folks are not used to thinking about a fast-moving world this way.
By the way, part of that means that everybody else on the planet has access to at least as good an LLM as you have access to. Again, if you're a large IT leader with a thousand software professionals, you're used to having capabilities that no one else has. For LLMs, you probably have less capabilities because you have more constraints on your use than the average kid or anyone else who has access to an LLM that's as good as yours.
Gene Kim: Briefly before we get to the last question, I'm fascinated by so many of these experience reports where people on the bleeding edge are losing money on every transaction -- again, laughed at by everybody -- but they're counting on the exponential decline in prices that the frontier models are pushing them toward. Does that resonate with your own experiences, that that's a good bet?
Ethan Mollick: Yeah. You're doing R&D expenses, right? The question of when you stop and build is a big deal. I've been talking about the wait equation, this idea from space travel that if you want to go to Alpha Centauri, actually the fastest way to do this is to wait 150 years, because the speed of spaceships is increasing so quickly that if you left now, you'd be passed by somebody on the way.
There is an advantage to experimenting and waiting there. You can be waiting to build this out while viewing it as the biggest deal in the world. I like to have a whole bunch of tests that are just failed by GPT-4 right now. As soon as a new model comes out, I'm going to test those things to see if GPT-5 can do them successfully.
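The habit described here, keeping a set of tests the current model fails and re-running them when a new model ships, can be sketched as a small harness. This is an illustrative sketch only: `Task`, `newly_passing`, and the two stub models are hypothetical names, and the stubs stand in for real model API calls.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption for this sketch: a model is any function from prompt text to reply text.
Model = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # returns True when the reply is acceptable


def newly_passing(tasks: list[Task], old_model: Model, new_model: Model) -> list[str]:
    """Names of tasks that the old model fails but the new model passes."""
    return [
        task.name
        for task in tasks
        if not task.passes(old_model(task.prompt))
        and task.passes(new_model(task.prompt))
    ]


# Stub models standing in for real API calls (hypothetical behavior).
def old_stub(prompt: str) -> str:
    return "I am not sure."


def new_stub(prompt: str) -> str:
    return "408" if "17 * 24" in prompt else "I am not sure."


tasks = [
    Task(
        "arithmetic",
        "What is 17 * 24? Reply with the number only.",
        lambda reply: reply.strip() == "408",
    ),
    Task(
        "citation",
        "Quote the second sentence of the memo.",
        lambda reply: "memo" in reply.lower(),
    ),
]

improved = newly_passing(tasks, old_stub, new_stub)
print(improved)  # tasks worth revisiting now that the new model handles them
```

Keeping the pass criteria as plain functions makes the suite independent of any one vendor, so the same tasks can be pointed at whichever model comes out next.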
I think a focus on cost first is going to be a problem, because if the cost savings are as big as they could be from this set of stuff, if you're trying to build the rational, cost-effective system right now and you're not following the trend lines, you're kind of making a mistake.
So if you're under pressure from your organization to build something now, fine. Build them something that uses LangChain and Llama 3 or Llama 2, whatever you want. Do that. Fine. But you should be having an eye to the future. How do we hotswap out the brains of the system for a better system as soon as they come along?
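One common way to keep the "brains" swappable is to have application code depend on a small interface instead of a specific model. The sketch below assumes nothing about LangChain or any vendor SDK: `ChatBackend`, `BACKENDS`, `Assistant`, and the two toy backends are hypothetical names used only to show the shape of the idea.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Anything that can turn a prompt into a reply (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


class CurrentBackend:
    """Toy stand-in for the model in use today."""

    def complete(self, prompt: str) -> str:
        return f"[v1] {prompt}"


class NextBackend:
    """Toy stand-in for a better model that arrives later."""

    def complete(self, prompt: str) -> str:
        return f"[v2] {prompt.upper()}"


# A single registry is the only place that knows which concrete models exist.
BACKENDS: dict[str, ChatBackend] = {
    "current": CurrentBackend(),
    "next": NextBackend(),
}


class Assistant:
    """Application logic written against the interface, not a specific model."""

    def __init__(self, backend_name: str) -> None:
        self.backend = BACKENDS[backend_name]

    def summarize(self, text: str) -> str:
        return self.backend.complete(f"Summarize: {text}")


print(Assistant("current").summarize("quarterly report"))
print(Assistant("next").summarize("quarterly report"))
```

With this layout, moving to a new model is a change to the registry entry or a configuration value, while prompts and application logic stay where they are. Real systems usually also need per-model prompt adjustments, which this sketch does not cover.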
Gene Kim: Fantastic. My friend Brian Scott and his colleague from Adobe are responsible for the rollout and governing and gating of AI to tens of thousands of Adobe engineers who want to use this technology. It's such a great story. My buddy Dr. Kersten will talk about how we can do things as developers that couple us to the frontier LLM, which makes switching costs very high, which is such a great insight.
So my last question is, for the last 10 years I've asked everyone who speaks at this conference one question: what is the help that you are looking for? Are there things that this community can do to help you in anything that you care about?
04 Help wanted: public examples and better benchmarks
Ethan Mollick: I think one thing we need to do, as a group, is establish some agency. One of the startling things about my position inside this industry, which I was not expecting, is that I have influence. People use the words I use to describe things. Stuff that I write matters. Part of that is I'm early to share about what's going on, and there aren't that many public examples.
So you get to help shape where this is going. I think being public about what you're doing matters a lot. I'm always interested in talking to people about research and how they're using it in their organization. Success stories are really important. Failure stories are important. But I also think helping decide where things are going by showing examples of success that are positive uses of AI, that increase the happiness and thriving of your workforce and don't destroy it -- that's really important to share.
I also think another thing we need to be doing is seizing control of some of the benchmarks and approaches. I am shocked that everyone still uses -- everyone's measure of how good an AI is, is the MMLU, which is just a random set of mostly math problems. The people who make this are coders; they're obsessed with math problems and coding problems.
The truth is they're not obsessed with the stuff we just saw. How good is it in conversation? How good is it in solving day-to-day problems? How ethically does it act? I'd like to see more benchmarking and public benchmarking from organizations: this is how good these systems are at different things, which could actually change the entire direction of where AI is heading.
So part of this is about keeping some collective conversation going over a very important evolving technology. I'm always happy to talk to people who want to share stories or research and are willing to do that. People always ask you about success stories, and I'm always happy to talk about those. But I think the thing is, share with each other. It's early days. We can bend the curve of what this thing does, but we need to be talking about it.
Gene Kim: Fantastic. When you mentioned benchmarks, that's something that the technology industry is very familiar with: how they can be weaponized or used for good, and the power of good benchmarks. So if people want to help with this and they have stories to share, how do you want to be reached?
Ethan Mollick: Either Twitter direct message tends to be good, or else you can email me at [inaudible]@upenn.edu. I'm an overwhelmed academic, so I cannot promise I will respond right away, but I will do my best to get back to you.
Gene Kim: Fantastic. Dr. Mollick, thank you so much for sharing your insights, and again, congratulations on the fantastic book. I look forward to more adventures to come.
Ethan Mollick: Thank you so much. Bye-bye. Thank you.