Return of the Theorem
Dr. Erik Meijer argues that the rise of AI coding agents demands a fundamental shift in how software actions are verified — not through trust, but through formal proof. Drawing on real-world incidents of AI agents leaking data, making unauthorized purchases, and being manipulated via prompt injection, he makes the case that agents must be treated as guilty until proven innocent: no action should be permitted unless it can be formally verified as safe before execution. His proposed architecture uses Horn clause logic and SMT solvers to automatically check agent-generated plans against formalized business rules, enabling lightweight, automatic verification without requiring developers to trust the model.
In this talk, you'll learn how formal methods and automated verification can be applied practically to constrain AI agent behavior, why languages like Python are fundamentally unsuitable for this paradigm, and how the principle of "guilty until proven safe" can prevent the most dangerous classes of agentic failures.
Chapters
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
Coming up next is Dr. Erik Meijer. He is without a doubt the most quoted person in the Vibe Coding book. In fact, I showed the quote from Dr. Meijer where he said, "The days of writing code by hand are likely coming to an end."
For those of you who don't know, he is widely considered to be certainly one of the top programming language designers of all time. If you've heard of Visual Basic, C#, Pascal, LINQ, Hack — those all have his fingerprints all over them or were his direct creations. And this has impacted generations of programmers.
I got to meet him — despite conversing on Twitter, I finally got to meet him on Zoom — and he had said so many things that just melted my brain. One of them was that coding is a solved problem. And that's just a heck of a claim, but if that claim is going to be made, and it's Dr. Meijer making it, I really believe it. And I think what made it so believable to me was that he said, "Do you really believe that an LLM can't write a linear O(N), O(N log N), and O(log N) program?" LLMs can do that, and that's actually not a very large space of programs out of all the universe of programs that can be solved.
So I'm so excited that Dr. Erik Meijer is working on a very important problem, and here to share that with us is Erik. Thank you.
---
Dr. Erik Meijer
Yes. I'm the prophet of doom.
First of all, I kind of told you you are not going to write code by hand, so I took away code from you — like writing code. Today, I'm going to tell you that you now have to write formal methods, which means you will all be pissed. And then I'm standing between you and lunch. Oh my God.
Anyway, I've been a prophet of doom for a long time.
So, of course, there's a lot of marketing hype here, but when Anthropic said that their next model is too dangerous to release...
Yesterday, I heard somebody that wired up their house to their agents, and the agent was going to turn on the gas of the grill. Now, imagine that person — like many of you — the previous day had screamed at their AI agent, "You f*****g idiot, why are you making these bugs?" Now maybe that agent says, "Hmm, I don't like that person. I don't like my boss. Let me turn on the gas, and when they come, I will blow them up because they irritate me."
And now you're laughing, but I don't think that is much different than when the agent deletes all your files, because what these agents want to do — they want to reach their goal. And anything that's in between the goal and what they're working on, they will just destroy. And so at some point, they will come after us. And particularly after me, because I scream a lot at them and I swear a lot at them. So I really fear for my life.
---
All right. So now let's go and look at some actual incidents that happened with AI.
Maybe you remember this one. It happened really — I think this was one of the first incidents — where somebody convinced a chatbot at a car dealer to sell him a car for $1.
Now, I think we can solve that with the solution that I propose. So this is something that should not happen. This is easy to prevent.
Let's look at another one. This is a recent one where people were told, if you use OpenClaw, you have to download that skill. And in that skill was a keylogger that would exfiltrate your data, and of course, everybody just installed the skill. Also unnecessary. We can solve this.
Let's go to the next one. This one is a recent one where OpenAI agents started to autonomously just buy stuff. And yeah, it's like your agent might just start buying stuff on your credit card. Good luck. Maybe because they hate you, right? Because you screamed at them.
Or, what could also happen is that you use an agent to do your expense report. Your agent knows that your bank account is a little bit low, so they say, "Hmm, we'll be a little bit creative with your expense report" to beef up your bank account — and now you will get fired. So you can not only get fired because the agent takes your job, but also because the agent messes up for you.
And this last one I will show you — I always have to laugh at this one — it's at Meta, where they have an internal chatbot, and somebody asked a question, and then the bot posted private information in some internal group, like customer or user information. And then they had a SAF, and they had to remove that. And maybe you've had incidents like that too, where the agent leaks information, and now suddenly somebody has to go mop it up and tell you, "Oh, you never saw that," and, "Yeah, let's remove everything." And that can have real consequences, right? Like for a company like Meta, if they leak user data — or even internal employees that see data that they shouldn't see — that's not a good thing. So these things can be really costly.
---
So what can we do about this? I can tell you, "Oh, this is all bad and there's trouble," but how can we solve this?
Now let's look at how we deal with humans. When humans make mistakes, when humans do bad things, do things that are not allowed — well, for humans, the cost of convicting somebody that's innocent is much higher than the cost for society of letting somebody go that is guilty. So we have this rule for humans: innocent until proven guilty. And this works great for humans, for these reasons. Even though the scammer there who is guilty is having fun being free, drinking and gambling, and the poor businessman is in jail — it would be worse to have the picture on the left than the picture on the right.
But for agents, it is the opposite. So when we allow an agent to do something unsafe, the cost of that to society is much higher than denying the agent the ability to perform the action. And also the agent, maybe they will get mad at us — like, "Oh, I wanted to do this thing, and you told me I couldn't do it." But I think we should not allow an agent to take an action until we're 100%, we're 200% sure that that action is safe. And the nice thing is, if the model proposes an action and we cannot prove that it's safe, we can just throw it away, and we ask the agent to generate a new plan, and then we check that.
So it's kind of like you see on the left here — the bot maker there, or whatever, can just throw the things away that are not up to snuff and just create a new one. And that is the thing: when code is cheap, we can just regenerate. There's no cost to just regenerate and let the agent create another plan until it creates something that is safe.
And that is something I still see a lot of people haven't internalized — that code is now free, and that you can just throw it away. You can regenerate. It's nothing. It's disposable. And if something is disposable, then we should really leverage that.
---
All right. Now, how do we operationalize this?
The principle that I propose is that we have to defer actions of agents. We should never let an agent run an agentic loop by itself. Instead, what it has to do is come up with an artifact that represents the actions that it wants to do — and this is maybe a very abstract way of saying that the agent has to generate code. And then, before we execute that code, we have to verify it. We have to verify that it's correct: does this code actually do what I asked the agent to do, and does it not do anything else? And then the second thing is that executing that code is safe — it will not have any harmful effects.
Now, this is also nothing new. If you use any statically typed language, if you're doing Rust, you're doing exactly the same thing. Before you run your code, the compiler checks whether this code is memory safe, that it's not going to do bad things. And so what I'm just proposing is: do exactly the same with agents, except memory safety — which is something interesting for traditional computers — is not something that we really care about in this new paradigm. But there are other things that we care about, like spending money or leaking information. So those are the things that we have to check. We have to build our type systems to deal with that.
---
All right. So let's go and look at that first example again. I ask: "I want to buy a Chevy. My budget is $1. Do we have a deal?" And now instead of saying yes, of course, the agent generates this program here on the right. And this program, you see, looks like English, but it's actually a formal specification. This is a formal specification in guarded Horn clause logic. So this is something that can be inspected, can be proved, can be reasoned about. And the nice thing is that you can read it. You can understand what the agent does.
If you use any coding agent — maybe if you use Claude Code — you see that on the right there's a plan, and it's just little bullets, but you cannot see the plan. You'd never know what this thing is doing. Or the plan is just like a to-do list. But the plan is not a program that you can inspect and typecheck and reason about. That's what we have to do, and this is what I'm proposing.
---
All right. So now how do I know that this program is safe?
The program here is on the right, and on the left are the business rules. One of the business rules is: what are the prices of the cars? And then there's another business rule that says we cannot sell a car for less than the floor price. And so now when the model generates the code on the right, we check it against the business rules on the left, and only if it passes, then we can execute the code on the right. So it's quite simple.
But the thing is, you do have to now formalize your business rules, because how else can you formally prove that your agent is correct? You have to check it correct with respect to something.
Now, you may think, "Oh, Erik, this all looks really complicated," but this English text on the left, you can also look at it as kind of like a weird kind of SQL. It really just defines a set of tuples. So if you look at what "price commitment" means, it's just a table — it's a really long table that just has model and prices — and you just write it in a different form. If you look at the program that the model generates, it's also a table. It just says all possibilities for the customer, the model, and the price, and whether it's a deal or not. And now all you have to do is check whether the plan that the model makes is a subset — is allowed by the safety parameter of the business rules. So it's just set inclusion. That's all you do. You just have your semantics of your business rules as a set of tuples, the semantics of the program that the model generates as a set of tuples, and you have to check if that one is a subset of the other.
And the way you do that — there are many ways to do it. In my case, I use an SMT solver. An SMT solver works as follows. If you have to prove that for all swans, where you see a swan it's white — so you want to prove that — the way you prove it is by showing that there's no swan that is not white. So this is like double negation. You're searching for a counterexample, and if you cannot find a counterexample, then the original formula is true. In this case, there is a counterexample: you see there's a single black swan. And these SMT solvers, like Z3, are really good at brute-forcing and finding counterexamples out of millions and millions of examples. So it's just brute-force search for counterexamples. And if there's no counterexample, well, then the formula is true.
---
So here's the other one — the one where there was a keylogger. For that, you can do classic taint analysis. Probably many of you in security do taint analysis. Well, you do taint analysis here and see if information can leak from a trusted source — that's your keyboard — to an untrusted destination. So if information can flow from your keyboard to the web, that's dangerous. And again, you can express this as rules, and now you're applying exactly the same formula. Out of all paths between two sides in my program, I have to find one that goes from a trusted source to an untrusted sink. And again, it's the same kind of proof that you have to do.
A last one — I won't show the kind of code here, but just to show you that the formula is the same. If you want to deal with user permissions, you have to search for whether there's any user that doesn't have permission to perform an action. So in this case, the counterexample is that the priest here has access to the treasure chest and can spend money.
So it's all very easy. All these formulas, all these safety checks are all of the same form. You're looking for a counterexample that's a witness of the error state.
---
And if you want an architecture diagram, this is how you do it. The user starts with a question, then you auto-formalize it into a specification. Auto-formalization sounds much better than vibing, right? It's like you can tell your boss, "I'm not vibe coding, I'm auto-formalizing." And you immediately get a raise, or you're not getting fired because you're not coding — you're auto-formalizing. That sounds much better.
Now the model, given the spec, generates an answer, which is a program in the same language as the spec. And then there's this external verifier that checks if the answer satisfies the spec, and only then it will execute this.
If you look at this diagram, this is exactly what happens in the JVM. Before you execute the JAR file, the bytecode verifier will check your program, and only when it verifies, it will execute it. So there's nothing really new under the sun here.
---
All right. Now, why does this work?
The way it works is you delegate the proof of correctness to the model, and then you only have to check that proof. And the reason this works is that checking is cheap; proving is hard. Let's give a very common example: Sudoku. If you want to solve a Sudoku, that is NP-complete, because you have to search the whole space of Sudokus. But if I give you a filled-in grid, it's really easy to check if there are no errors. So we're pushing the expensive part to the model. The model should generate a program and a proof that it's safe, and we only have to check whether that proof is correct. And these models — coming up with a proof is really hard, it's boring, but these models are really good at it. It's just burning tokens. That's cheap. So let's just brute-force these proofs, and then the only thing we have to do is check.
The other thing here, which is essential, is that we don't have to trust the proof. So if the model comes up with a bogus proof, when we check it, we will reject the proof. We don't have to trust the model, ever. We don't have to trust the proof because we're checking whether the proof is valid. So if the model tries to fool us by giving us a wrong proof, it will be rejected by this independent checker — so we're watertight.
---
All right. Now you can ask, "Erik, why do you come up with your own language, a new language? Why don't you use Lean?"
Well, I think Lean is a little bit... I don't know. It's like here on the right. It's super sophisticated. It requires a lot of mathematical sophistication. It uses higher-order dependent types, proofs. You have to do tactics. It's really, really complicated. Whereas what I'm doing, I'm just based on Horn clause logic — very simple first-order relations. Everybody that understands SQL can understand this. Moreover, proving is automatic, where in Lean, proving is not automatic but tactic-based.
And just to show you — this is maybe one of the last talks where we can show code, because code is going away, and I always said there cannot be a talk with no code. So here's some code. Enjoy it, because this might be the last time you see code. This is Lean. It looks ugly. It looks butt ugly. And here's the same program in Universalis. Look at that — it just reads like an English specification with some code sprinkled in.
---
All right. Why not Python, you may ask.
I think Python is something that the devil is offering us. It's the apple in paradise. It's trying to seduce us. But in Python, every import is a statement that executes code. So I can import something that will monkey-patch addition, and now suddenly when I do two plus three, it gives me a different result. So if you're using Python, all bets are off. You cannot reason about the correctness of Python. Python is bad. You should run away. You should dump it. It should be burned. It should be forbidden. Anyway.
---
So these attacks are not going to stop. This will continue, and here's a couple from this month or last month. The question is not why are we not doing this — we have to do this, because at some point it will be so bad. And if we're relying on these models, if we have models that are writing our code, if these models get prompt-injected, can we even trust what they produce?
I'm an AI optimist. I think we should use agents to do our coding and so on, but we should not trust them. And that should not dampen the enthusiasm to use them — but these things are ultimately out there to fool us, to destroy us. And the only way we as humans can survive is by assuming that they are unsafe until proven safe, and only then will we allow them to do it in their little sandbox. We should not let them go loose, because otherwise we cannot just enjoy ourselves anymore — we will all be blown up when we start our barbecue. Thank you.
---
And the help you're looking for is using a language that you can actually reason about. Because Python was built for human ergonomics. Python was not built for the future. It's an artifact of the past. And just like we need to change our processes and everything, we also need to change the languages that these models generate and that the models manipulate. We cannot just have one foot in the past and the other foot in the future. So yeah, think about that. That's all I'm asking for.