Beyond the IDE: Toward Multi-Agent Orchestration

Log in to watch

Las Vegas 2025

Beyond the IDE: Toward Multi-Agent Orchestration

Coding agents like Claude Code and Sourcegraph Amp are the powerful new AI tools of 2025. But using them requires great skill -- so much that Gene Kim and I wrote a book about it, called Vibe Coding! Despite their pitfalls, the future is clear: Using multiple agents in parallel is how engineers will work. What will orchestrating them look like? Current IDEs don't fit the bill. In this talk I'll explore the next big evolution in agentic coding.

Chapters

Full transcript

The complete talk, organized by section.

Steve Yegge

Thanks, Gene. Hey everybody.

It's good to be back. We've got a pretty full house.

I'm not really as fancy or polished as some of the other speakers, and I probably didn't practice this as much as I should have, so I'll be winging it a little bit. But on the plus side, I'm only gonna go for 20 minutes, so we'll get us closer to lunch.

So let's see where we are on the slide. Ah, yes, there we are. So today I'm gonna be talking to you about a transformation in tech that's absolutely unprecedented. And that transformation is me.

That's me in 2022, peak COVID. And like Gene said, I had just given up. And when you give up on programming and your identity is tied up to programming, then you give up on everything. It's because programming had become too difficult, right? And I say this as somebody who's pretty decent at it — the ROI just wasn't there anymore. And it was sad because I had so many things that I wanted to do. I had so much ambition. I had this computer game I've been working on for 30 years that still — I think it's gonna have its day in the sun — a gazillion other projects. And all of them just killed me. I was like, ah, I can't do it anymore.

And it's funny because it's actually AI that's brought me out of this funk. Because when GPT came out and it wrote code, I was so excited. And everyone else was like, "No, it's bad code." And I was like, I don't think that's really the point. Things proceeded, and we wound up on this track. By the way, I picked up running last summer at age 55. I've never run before. Yeah, you can tell I'm a runner because I'm telling you about it. But now I run 25 miles every week. So this has been quite the transformation.

So I'm gonna move right along. You've seen this a lot. People have said, well, 2022–2023 was all about code completions, right? And then it really picked up in 2024 with Cody, Cursor. Everybody's been about a year behind where I want them to be.

When chat became viable with GPT-4, that was the moment the models were smart enough to edit thousand-line files reliably. And that was a tipping point where you could now work on real codebases. So I was running around yelling at everyone, "You should use chat to code." And people looked at me like I was nuts. They're like, "No — completions, code completions." Completion Acceptance Rate, CAR. You guys remember that metric, right? They would use chat for like the occasional wiki lookup or whatever. So I was mad, but I continued to push it really, really hard to see how much I could get out of chat. There's a 5x boost if you knew how to do it well.

And then, just as chat was really starting to peak, we started hearing rumors coming out of Anthropic that their engineers were not using their IDEs anymore. They were using a command-line tool to write code. Of course, all of us senior engineers went, "No, that's not what's happening. That's stupid and that's not gonna happen." And sure enough, those rumors were absolutely correct. And Claude Code came out and completely changed the world for everybody — except for 99% of programmers out there. Fewer than 1% are actually using it. It came out in March. And all your engineers right now are like, "Oh yeah, I use Cursor." It's like, no, no — it's moved on. The form factor has moved on.

Completions were like a 30% boost. Chat is like a 3 to 5x boost, right? Agents — coding agents — by the way, who here was in our workshop on Monday? Ah, pretty eye-opening, right? We had like 50 or 60 people come in and get to use coding agents for the first time. And it's a very different experience.

Because with chat, you're in the middle of the loop. The AI bosses you around: "Okay, go run this tool. Tell me what the answer was." And you're just slinging answers back and forth. I mean, why do people even like that? Don't get me started.

So what the brilliant people at Anthropic did was they put chat in a loop. They said, let's just put that chat in a loop and have it handle all that manual slinging for you and see where it gets us. And it killed RAG — well, it didn't kill RAG, RAG is really important, but it certainly killed RAG for coding assistance overnight. Because the agent was able to just go look. RAG is retrieval-augmented generation — it's all this fancy front-end we put in front of our data stores so the AI can access it. And in Claude Code it was like, "Nah, gimme grep. I'm just gonna look." And it would just go look around on your hard disk like a person would and say, "Okay, I think this is what's going on." Absolutely amazing.

So all year, Gene and I had to rewrite our book after Claude Code came out, because this was obviously the next form factor, obviously better than chat. Everybody should have switched to it instantly. Why didn't they? Well, because it's hard. It's incredibly productive, but even when I'm working on a system that I'm building and I know the architecture inside and out and I know exactly what I want, the AI will do any number of really, really bad things at the drop of a hat.

With vibe coding, you pretty much just say, "I want it to look like this," and it comes up with a blurry version. And then you refine it — which sounds horrifying to people, and yet that's what we do for a living: incremental refinement. That's how software is built. This is just speeding it up. But it turns out to be so difficult that Gene and I wound up writing a book about it. Our blog posts were all getting to 35 pages. There's a book missing here. So we wrote this book. It was an incredible adventure, an incredible journey. We had to rewrite the entire book when Claude Code came out. We started in December, wrote about chat. And Claude Code actually helped us see the evergreen principles as the form factor changes, because there are things that don't change.

Boy, I was really, really worried that this book was going to be completely obsolete by today — as I stood on stage talking to you. I wasn't even telling people to read it. I was so embarrassed that we were gonna write something that everybody would already know or wouldn't even be relevant anymore. Because that's how fast this stuff is moving. Well, it turns out — I don't know, we got incredibly lucky, or were prescient, or whatever — but nobody's using agentic coding yet. It's probably 1–2% of the world's developers. And this book is about agentic coding. So it's landing at exactly the right time, which I think is pretty cool.

This is what the book's about. They lie, they cheat, they steal. Okay — you have a new team of robotic sous chefs and you're the head chef. You're no longer cooking. You're responsible for the whole dinner for all of your guests. And your robotic chefs are really talented, but they will also do crazy things. We write about all of them in the book. And that's why people haven't switched yet. It's got a learning curve.

You give them Claude Code, you give them Sourcegraph Amp, you give them OpenAI Codex. If it's not on that list, it's a bad tool. Maybe Gemini CLI, maybe the Amazon one, but it has to be an agentic in-terminal coding experience. And people just can't do it. That's why we wrote the book. And if you do read the book and get that mindset and you practice and you build that muscle, then you join the camp of people who can't sleep. Topo was telling us last night — his wife said, "What's wrong with you?" Because he codes all the time now. It's just beautiful. But to get over that hump is really, really hard. It's proving really hard for developers.

And here's the number one reason. Every company has a monolith and a bunch of other stuff — microservices, little things. And AI really struggles with monoliths. They can only, you know, their context window is yay big — maybe a megabyte — and your codebase is gigabytes or bigger. So in order for them to find their way around that codebase, they'll go look around using grep, fill up about half their context window, and then start thinking, "Oh no, I need some context left for thinking. I'm done. I know everything I need to know about this codebase. Let's go. I'm gonna build you a new visibility system and a new logging system." And it's gonna go off and do a bunch of really bad things.

So how do you solve it? We talked to Andrew Glover at OpenAI, and he said a bunch of the developers there — low single to low double digits — have started using Codex. And the switch is so stark because I haven't opened an IDE on my computer in months, and I've written more code this year than probably a quarter of what I did in my entire career.

The developers at OpenAI who picked Codex as opposed to Cursor or Windsurf are so much more productive than the ones who aren't using it that they're now starting to have real concerns at performance review time. Because how do you compare somebody who is obviously 10 times as productive as their peers by whatever measure you want? He told us that more of the AI-generated PRs get turned back at code review time — which makes sense — but the ones that make it through are dwarfing the contributions of the people doing it the old way with chat.

And so they said, "Okay, how come you're not using it?" to the ChatGPT developers. ChatGPT is arguably the world's biggest monolith, right? It grew super fast. You got a billion users in a week or something like that. So yeah, it's not the prettiest code, no doubt. And the AI struggles with it. So you've got this uneven distribution of benefits, and it's really annoying because the people who are enjoying those benefits are all saying, "You gotta refactor it into microservices — that's how you get AI modularity." Which is true. And the ChatGPT people are like, "We're busy, can't you see?" And it's a huge chicken-and-egg problem. I mean, how do you even get a project together to refactor your big monolith?

There are fortunately solutions that don't require refactoring your monolith. You can get the AI to understand it, but it's so new as a discipline that I would say probably in this room there are only three or four people who have a solution here. One of them is Bruno Passos from Booking.com — I was just chatting with him before this talk.

The signposting thing: imagine your codebase is a mountain, and your coding agent is like a little fire truck team trying to figure out what to do on this mountain. It's gonna help a lot if you have some fire roads for them to get around. Just carve it up for them. Bruno was telling me that they used LLMs to go into their old legacy system and build a system model that was queryable. There's another customer of ours at Sourcegraph that I met in Britain — I can't tell you who they are, but they're doing something similar. So there are multiple companies doing this. You have an old legacy codebase that people don't understand very well. You have the model go in and analyze it and produce all the documentation, signposts, and directions that coding agents — and other kinds of agents — will need in order to find their way around your monolith. You augment that with some good search engines, because the agents can use any tool, they can use a search engine, and it's great. Code search engines have complicated syntax — well, the AI knows the syntax, so that problem is solved. And then your senior engineers are the ones who'll ultimately nudge the AI to do the right thing in this big context. They're the ones who'll say, "This logging system you're building us — go use that one." So it's solvable, but very few people are doing it.

As a result, the people benefiting from AI are the ones in a few domains: new projects, really well-factored microservices, certain languages.

So I want to dive in just a little bit, one level down, into how vibe coding is different from regular coding, and help understand why developers are having trouble adopting it. And then I'm gonna make some predictions for you that you can bank on. That's actually my job now — predicting the future. And I'm actually getting pretty good at it. I can see around corners about eight or nine months, I think. It's because I'm on the ground pushing it harder than any sane person would do.

This doesn't look like regular coding to me. There ain't no mention of an IDE. No syntax, no language. You have a plan, you onboard your agents, and then you babysit them. It's a completely new workflow. It has a really high cognitive overhead. What we found is that if you're running one agent, you're immediately gonna get bored waiting for it, so you're gonna fire up another agent. That game lasts until you've got four of them running and then your head explodes, because the context switching is incredibly difficult. But if you put them all on the same project, I found you can go all the way up to 15 or 20 agents, because for some reason the context switching is lower. It can be a little bit calmer. But look — these are all muscles you have to build as a developer in order to get value out of these things.

I have vibe coding projects that last minutes, and I have vibe coding projects that last months. I have one that finished up a couple of weeks ago that I started while Gene and I were midway through our book. In the book I was saying, "Oh, this thing will be done in a week." And it took months and months and months to finish — porting all of my tests from my video game, a thousand test suites, from one language to another. So these things can take a long time. It's not easy.

And especially if you're trying to do it with a swarm — which is kind of what you need to do, because a lot of these problems are parallelizable. Adrian Cockcroft was the first one talking to us about swarming agents back in June. And a month or two later I picked it up: you can actually spin off a bunch of agents and they can do a bunch of work for you. But then there's a problem — they can't see what each other are doing. So they build perfectly good systems that are not merged together. I found that this has kind of a map-reduce workflow, where you swarm to go off and do a bunch of stuff but then you gotta merge all that work back together again.

In the new world where agents are moving way faster than a regular programmer, let's say you give the web portal to one agent and the event system to another. Well, by the time the event system one finishes, they may have changed the system so much that the web portal one — whose changes are coming in later — now has to literally re-implement their entire thing on this new system. And so what we have is a merge queue problem. This is a problem that all of you will be dealing with soon — when coding is no longer the bottleneck, merging becomes the bottleneck. There are products emerging to address this kind of thing.

We talk about these loops in the book. And you should give your developers the book. Give them training, give them whatever they need, because they need to pick up this agentic coding. Even when we get to the next form factor — which I'll tell you about in a minute — you're still gonna need all these skills. The LLMs still lie, they still cheat, they still steal. By "lie, cheat, and steal" I mean: they'll lie about being finished — "Party time, it's production ready!" — and it doesn't even compile yet. They cheat — they'll just hack your tests, give you cardboard muffins instead of real muffins. We've talked about all kinds of scenarios in the book. And they steal — they'll take your database table from you just like that and there's no backup. So yeah, this is a hard, hard, hard space. And we're asking all of your developers to just jump in and do it in monoliths. I think it's almost impossible.

So what I decided, after watching my own work this year, is that this form factor today — that I love so much and have been walking around all year telling everyone to use — coding agents — this ain't it. This isn't the final form factor. Completions wasn't the final form factor. Chat wasn't the final form factor. Agents, I think it stands to reason, are too hard today and we need to move to — well, probably back to a UI would be my guess.

So what does it look like? I know exactly what it looks like because I just wrote a book about it. Gene and I put in a bunch of workflows that, if you do them, will get you the best results out of coding agents. And many of them are quite mechanical: please do a code review; please fix the bugs from the code review; go fix the tests; now do another review. It's the same thing over and over again. I mean, how many of the things you're typing into a coding agent could have been handled by a model? "I don't care which of those two things you do next, just do one of them." "Continue." So seriously, there — we are missing model supervision.

I predicted this back in March with "The Revenge of the Junior Developer." I predicted that supervisor models would come along and run coding agents, and we would start doing fleets of agents, swarms of agents — which is gonna be incredibly expensive. I did a lot of financials around how screwed all the companies are who didn't budget for AI and agentic coding assistance. And it caught the attention of Dario, who invited me up to his office in San Francisco, and we had a nice long chat about all this.

Oh my gosh — there it is. Okay, this thing is spot on. It's still happening.

So if coding agents are not the final form factor, what is? I'm gonna finish my talk by telling you what I think it looks like.

A couple of people have already come out with these orchestrators. You can't just have Claude Code run Claude Code — it doesn't really work that way. You need something much more rich and complex. These things are already causing cost overruns, people are already freaking out about overruns. But they're pointing in the right direction directionally.

The one I'm building — this is what my desktop looks like right now. Up there, no IDE, just terminals. This is not for everyone. What I'm building is I'm automating my own workflow. I shopped it around all year. I went to a bunch of different people and said, "Here's what we need to build. This is the next form factor." And they're all like, "Eh, someone else could build that" — or they didn't know what I was talking about, because they're not in it pushing it harder than any human being.

Who else here runs 12 coding agents at the same time? I may be the only person who's stupid enough to do that. Wait, I saw a hand up — are you stupid like me? Because it's stupid. But I push it that hard to see what the boundaries are and what tooling we're missing. And I learned a lot, and now I'm building it.

I was hoping to unveil it on stage today, but I only started three weeks ago. I had this huge revelation that there's only one way to solve this problem. I'm automating workflows — the workflows that Gene and I documented in our book, the ones I use every day, all day long. Anybody here know how you automate workflows? There's only one way to do it. Come on, someone shout it out. Where's Cornelia?

Temporal.

Everybody who's trying to solve this problem without Temporal — which is a third-party software package that came from Uber, actually; Max, who invented it, invented it when he was at Google with me, and Google was like, "Nah, we don't want it, we already got 50 workflow engines, what do we need another one for?" So he went to Uber, and now Uber runs on Temporal and Netflix runs on Temporal, and everybody who has workflows that actually work runs on Temporal — everybody trying to do this without Temporal is going to struggle.

So I said, all right, Temporal is going to be the substrate for the next agent orchestration. And I've been coding literally 12 to 15 hours a day. My agents are still running in the background right now; I can't let them stop.

And I'm building a system called Vibe Coder. What else would it be called? What it's gonna do is it's going to be just like a coding agent, except it's going to do all of the stuff that you currently have to tell your engineers to do. Do a code review now. Make sure it still compiles. Go check that you didn't break the security stuff. Fix the tests. All this routine work that you gotta deal with with coding agents — it's gonna be handled for you. So by the time you see it, it's gonna be beautiful. And Bruno has volunteered to be my guinea pig.

So with that — that's the future. You gotta learn coding agents now. You gotta keep your eye open for when coding agents get better with systems like the orchestrators — like Repo Agent, Conductor, my Vibe Coder that's coming, and any others. Only use the ones that use Temporal. And then — the call to action: now's the time. Anybody in your organization who's still using chat, you need to have that talk with them. You have to say, "That was really cool like six months ago." Okay, so have the chat.

And lastly, the workshop was amazing. It really went really, really well. People are still coding on their projects from the workshop — who's still coding on their project from the workshop? Look at that, hands up all over the place. This really was a lot of people's first experience with one of these coding agents. It will expand your horizons. So if you liked it, if you're interested in the workshop, if you want to see more of this — DM me or put in a little emoji on the message that Gene's gonna put into the channel, and we'll get in touch with you and get the workshop to you. Because this is the best way to get started.

And with that, thank you for listening.