Exploring Code Conversion using AI
We conducted an experiment using generative AI to convert a small ColdFusion application to modern TypeScript, tackling a common enterprise challenge: modernizing aging codebases that would traditionally take weeks or months to update manually. We saw both the impressive acceleration AI can provide and the sobering realities of what it takes to move AI-generated code into production. While we successfully demonstrated functional code conversion at speed in a controlled environment, we discovered that the real challenge isn’t just generating working code—it’s building the validation frameworks, security measures, and organizational confidence needed for enterprise deployment.
Hear about our journey and where we want to take this exciting effort next!
Full transcript
The complete talk, organized by section.
Jeff Gallimore
Well, it's 1:50. Keep the trains running on time.
This talk, what I'm going to share about is a largely personal journey on an experiment that was seeking, for me and others, to answer a couple of key questions.
One was this one: the last two years have been dominated by the AI conversation, and I really wanted to separate what was hype from what was real.
And then this other question was because of a challenge that we were being presented with: what do you do when you believe that there is this new technology or tool or technique or something that is going to really help you, your team, your organization, but you are blocked from using it in whatever the relevant environment is or the daily work? How do we move forward in those sorts of things?
So this is about that journey for me over the last probably three or four months.
A little more about me, just so you know where this is coming from. This will serve as context for the rest of our time together. I started my career in engineering. I don't do daily engineering work anymore, but I do still live vicariously through our engineers, some of my favorite people. The last real production code I shipped, in any real way, in an enterprise environment was over 20 years ago. So just put that in context as I start talking about all the stuff that I did or didn't do or what have you. Please, no judgment.
I've never been a ColdFusion developer. I've never been a TypeScript developer in any meaningful way. And yet I often live by this philosophy, and you'll hear me saying, "Hey, sure. Let's try it. What's the worst that could happen?"
So just use that as context as we walk through this experiment and this journey.
So here's the situation. One of our customers, and I work for a service provider, has a portfolio of ColdFusion applications, and for reasons, they would like to migrate to a more modern technology stack. We believe that we could help.
But on one hand, we're not allowed to use GenAI technology within that environment. There are organizational rules. There are constraints. They are a risk-averse organization, highly regulated, and so that is not accessible to us right now.
We also can't get the existing code, that ColdFusion code, out of the environment and into something where we can work with it, for many of the same reasons. So we can't bring the code and the technology together.
So what are we going to do? It felt like this: we had this sports car that could go really, really fast, but we were stuck in the mud. We just can't go anywhere. We can't use the power. What do we do in that sort of situation?
This is actually a picture of me three months ago. No, not really.
So the answer that I usually go to is, well, let's try an experiment. Remember: sure, let's try it. What's the worst that could happen?
So I started thinking about what experiment could we do that could help us learn, help us advance the ball, and figure some stuff out.
And so the TL;DR on that experiment was we successfully converted a simple ColdFusion sample app that we found out in the public domain. Believe it or not, it had automated tests associated with it. How about that? And the last commit to that repo was eight years ago. Sure, let's try it. What could go wrong?
We converted that to a functionally equivalent TypeScript app using GenAI. I'm going to go into all the details about what we did, how we did it, all that stuff. Along the way, we established some better practices and processes for what effective code conversions using AI might look like and how we might approach those.
And I think this is where the rubber meets the road in the results conversation. I believe, from that experience, that AI can dramatically accelerate the conversion, and human expertise in the loop still guiding it is still essential. Humans have to stay involved in this process.
All right, so I said dramatically accelerated. How much is dramatically? Conservatively, I think based on the conversations I had and what we saw, it's probably an order of magnitude. For the sample ColdFusion app that we tackled, we had a converted app with tests running in 90 minutes using AI, start to finish.
One of the ColdFusion developers who was involved in this estimated that to tackle that without AI probably would have taken six to eight weeks. Now, whether you believe six to eight weeks or whether you believe 90 minutes, you are still at a very large gap between with AI and without AI.
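To put rough numbers on that gap, the sketch below compares 90 minutes against the six-to-eight-week estimate. The 40-hour work week and the helper name `speedupFactor` are assumptions for illustration, not figures from the experiment.

```typescript
// Assumption: a manual estimate in weeks means 40 working hours per week.
const HOURS_PER_WEEK = 40;

// Ratio of manual effort (in weeks) to AI-assisted elapsed time (in minutes).
function speedupFactor(manualWeeks: number, aiMinutes: number): number {
  const manualMinutes = manualWeeks * HOURS_PER_WEEK * 60;
  return manualMinutes / aiMinutes;
}

const low = speedupFactor(6, 90); // 6 weeks = 14,400 working minutes
const high = speedupFactor(8, 90); // 8 weeks = 19,200 working minutes

console.log(`${low.toFixed(0)}x to ${high.toFixed(0)}x`);
```

Under that assumption the ratio lands between roughly 160x and 213x, so "an order of magnitude" reads as a conservative claim even if the manual estimate is off by a wide margin.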
You don't have to read the disclaimer there. It's funny: the whole point of that disclaimer at the bottom that you can't read is really, take all of this, please, please, with a grain of salt. Whether that's one grain of salt or mountains of grains of salt, your mileage may vary. Whatever cliche or metaphor you want to put on that, this is the experiment that we ran, that I ran. Depending on the context of the environment that you have and what you're trying to tackle, it may look different for you.
So why did we try to tackle this at all? What was the objective? Why experiment with this?
Well, first, we believed that the technology had really advanced very, very quickly with all the AI and the coding agents and the practices, and that it could really accelerate this move to a more modern technology stack to reduce the technical debt within that environment and also, right along with that, improve the agility of the organization and the teams.
We believed that we could increase efficiency and team capacity by injecting these tools and technologies and practices into the environment and the daily work.
And then something that is also true about me: I'm always interested in what we can learn from things. That's why I do all this experimentation. So what is it we can learn? And in particular in this case, how do we approach future migrations? What's the same? What's different? How do we make similarities and comparisons and contrast to other situations?
One of the things that I also learned from this, maybe not learned but sort of this aha moment, is that this sort of modernization problem or migration problem is not unique to this particular organization. In this case it looked like ColdFusion, but it might look like one Java version to another Java version, or this service to that service, or this framework to that framework. There are a lot of these kinds of migration problems or conversion problems that exist in any organization with any sort of complexity or scale, or that has been around for a long time.
And so how do we find and surface those kinds of opportunities?
This was the statement or the feel that emerged from that: where do we have this feeling? Where are people saying this: "Ugh, we really should upgrade this or modernize this or fix this, but it's going to take a lot of time. It's going to take a lot of money, and we're just going to be left with the functionally equivalent same thing when we started, and is it really worth it?"
The hypothesis here is that this technology and these practices could start to change the value equation and lower the bar for some of these projects.
So here's the experiment setup. This is where you're going to want to poke holes and challenge this. It's like, does this really relate to an enterprise environment? I am here for that. This is one of the reasons I wanted to do this: to learn.
The environment that I was working in was a local environment. It was not cloud. I could see how it would translate to the cloud, but that's not what I did.
I mentioned this already: it was a sample ColdFusion application from the public domain. So I went onto GitHub, and I did a lot of Google searching to see if I could actually find a reasonable-enough code base that we could start to tackle.
It's running on an open source Lucee server. There were roughly 30 files, 2,600 lines of code. So again, not a really big app by any stretch of the imagination. There were some tests that actually worked.
Remember, I'm not a ColdFusion developer. I never have been one. I've never even played one on TV, although I am sitting on the stage representing myself to some degree. So I had to get this thing up and running. I had to make sure that it actually worked. Can I do that? Yes, in fact, I did. I got it up and running from scratch. I used LLMs extensively to help me do that. And lo and behold, I got the app running with all of the tests.
The target that we wanted to hit was TypeScript generally: Node.js/Express backend, pretty common; React and TypeScript frontend, again, pretty common. It could have been anything. I could have just picked whatever your version of a modern technology stack was, and I could have plugged that in there too.
The agent that I used was Anthropic's Claude Code, and I put the models in there that it was using. I had auto-select on, so it figured out which model it wanted to use.
And then there were a couple of important artifacts that were part of this learning for me on this journey. Two of them in particular. One was this `CLAUDE.md` file, and there's an equivalent for all of these other coding assistants or coding agents.
So the `CLAUDE.md` file is where you can capture your context and your guidance. Things like architecture, frameworks that you should use, frameworks that you shouldn't use, what your testing conventions are, what your security controls and considerations are: all of the things that define a good app or a good system in your environment can go into this `CLAUDE.md` file.
Just to give you a sense of scale on this `CLAUDE.md` file, for those of you who haven't done efforts like this yet, we're not talking like a paragraph or even a page. These can be really large documents that run into maybe the tens of pages, to be able to encapsulate all of the standards and the conventions that you want to incorporate into that code.
And then the other artifact that was really important was this to-do file. I think this is pretty much the same. It's named the same thing across a lot of different coding agents: `TODO.md`. This is your plan. Think about all the tasks that need to happen from start to finish in the context of this effort.
I'm starting with a code base that I probably don't really understand all that well. Let the agent get in, analyze it, figure out what's going on, and then all of the tasks step by step in relatively granular detail to be able to get all the way over to the converted target.
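As a rough illustration of a step-by-step plan like this, the sketch below models tasks as data, renders them as a Markdown checklist, and picks the next task whose prerequisites are done. The `PlanTask` shape, the task titles, and the checklist layout are all assumptions made for the example; the talk does not show the actual contents of `TODO.md`.

```typescript
// Hypothetical shape for one step of a conversion plan.
interface PlanTask {
  id: number;
  title: string;
  done: boolean;
  dependsOn: number[]; // ids of tasks that must finish first
}

// Illustrative tasks only; a real plan would come from the agent's analysis.
const plan: PlanTask[] = [
  { id: 1, title: "Analyze the ColdFusion code base", done: true, dependsOn: [] },
  { id: 2, title: "Run the existing tests as a baseline", done: true, dependsOn: [1] },
  { id: 3, title: "Convert the data layer to TypeScript", done: false, dependsOn: [2] },
  { id: 4, title: "Convert the service layer", done: false, dependsOn: [3] },
  { id: 5, title: "Write functionally equivalent tests", done: false, dependsOn: [4] },
];

// Render the plan as a Markdown checklist, one line per task.
function renderTodo(tasks: PlanTask[]): string {
  return tasks
    .map((t) => `- [${t.done ? "x" : " "}] ${t.id}. ${t.title}`)
    .join("\n");
}

// Return the first unfinished task whose dependencies are all done.
function nextTask(tasks: PlanTask[]): PlanTask | undefined {
  const doneIds = new Set(tasks.filter((t) => t.done).map((t) => t.id));
  return tasks.find((t) => !t.done && t.dependsOn.every((d) => doneIds.has(d)));
}

console.log(renderTodo(plan));
console.log("Next:", nextTask(plan)?.title);
```

Keeping the dependencies explicit is one way to keep tasks granular enough that the agent, or a human stepping in, can always tell what is safe to work on next.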
And then the validation: how do we know that this thing actually worked? How am I going to verify that? The answer, shocker, is automated testing. That was one of the reasons why this public domain repo was so attractive: it actually had automated tests associated with it. I could have created them, but I didn't understand the code base. They already came along with it, so fantastic.
And then I wanted to make sure that whatever that converted application was also had functionally equivalent tests. Why is that important? Because I wanted to make sure that the functionality converted. It's one thing to make sure that the code converts and it runs and all that stuff, but I wanted to make sure that all of that functionality was preserved.
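One way to picture a functional-equivalence check is to send the same requests to both applications and compare the responses. The sketch below is a minimal version of that idea, not the test suite from the experiment: the `App` type, `findMismatches`, and the two stand-in apps are hypothetical, and real applications would be called over HTTP rather than as in-process functions.

```typescript
// A simplified request/response pair; real apps would go over HTTP.
interface AppRequest {
  method: "GET" | "POST";
  path: string;
  body?: Record<string, string>;
}
interface AppResponse {
  status: number;
  body: unknown;
}
type App = (req: AppRequest) => AppResponse;

// Serialize a value with sorted object keys so key order does not matter.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonical(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

// Run the same requests, in order, against both apps and collect mismatches.
function findMismatches(requests: AppRequest[], legacy: App, converted: App): string[] {
  const mismatches: string[] = [];
  for (const req of requests) {
    const a = legacy(req);
    const b = converted(req);
    if (a.status !== b.status || canonical(a.body) !== canonical(b.body)) {
      mismatches.push(`${req.method} ${req.path}`);
    }
  }
  return mismatches;
}

// Stand-ins for the two apps (hypothetical): same data, different key order.
const legacyApp: App = () => ({
  status: 200,
  body: [{ title: "Hello", link: "https://example.com" }],
});
const convertedApp: App = () => ({
  status: 200,
  body: [{ link: "https://example.com", title: "Hello" }],
});

console.log(findMismatches([{ method: "GET", path: "/articles" }], legacyApp, convertedApp));
```

Comparing canonical forms rather than raw strings keeps the check focused on behavior, so cosmetic differences such as key order do not count as regressions.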
All right, so here were the results. Remember, we started with roughly 30 ColdFusion files. We ended up with 20 TypeScript files, five of which were for tests, roughly the same order of magnitude in terms of lines of code: 3,100 versus 2,600. And that was roughly split between app code and test code.
There were 111 automated tests in five test suites. Out of the box, we got 73% overall code coverage. That was based on the number of lines, and there were some other metrics that were roughly equivalent to that.
The coding agent and the conversion process required my intervention at multiple points in that process. I'll give you one example of that, because I didn't know any better. When I first ran the conversion process, the agent said, "Hey, I'm done. I've finished with the conversion," and I could see in the console that the service was up and running, and I thought, that's great. I asked it, "What URL do I go to, to test it?" And it hadn't created a frontend for it. It had only created the backend. I'm like, "Okay, please go ahead and create a frontend for this so I can actually use the application, not just an API."
So those are the kinds of interventions, in this case when I was first starting, that I had to make, and I didn't know I had to make them.
Another result of this, and I mentioned this a little bit earlier in the objective, was I ended up getting a lot of learnings from what the agent was good at and, in particular, what it was not good at: like, hey, create a frontend. There were a lot of errors that it had hit along the way and issues that it didn't tackle along the way.
So when stuff didn't work, I fed that learning back into the guidance file and then the plan file, using itself as the source of a lot of that stuff. It created this really neat feedback loop for this process. Those results got fed back into the guidance files and the plan file so that now we can reuse those as the foundation for whatever that next conversion might look like.
Now, the real numbers conversation. I said 90 minutes. It was 90 minutes of elapsed time to get to a converted, tested, working, running target app built in TypeScript, and it cost $33 in tokens. I don't know how you all feel about that, but that seems like a lift to me for fairly little money.
All right, so a quick perspective on tooling. As I was asking Claude about this talk and what would improve it, it said, "Hey, you're not talking about how you selected the tooling." This could be another complete talk. This is another topic entirely. We could spend hours or days talking about different tools and the benefits and drawbacks of those.
There are many capable tools and technologies. We're going to see and hear about a lot of them here at the summit. What I was trying to do, though, was to optimize for speed and value of learning, especially as I was getting started. I wanted minimal friction in the environment for this experiment. I wanted to prove out whether this was even feasible.
So I picked a small sample application. I picked the tool that was most convenient to me. I picked one that I was actually engaged in wanting to learn more about, which was Claude Code. I'll also say that, having used some of the other ones like GitHub Copilot and Cursor, I also wanted a more purely agentic experience. Dealing with a Claude Code type of experience where there is no IDE for you, versus Cursor or GitHub Copilot, seemed kind of interesting to me, so I wanted to try that out.
The biggest lesson in all of this for me, and again, it is the lesson or a principle that I apply to a lot of other situations, is just start somewhere. This really isn't about a particular tool. Again, there are a lot of capable tools out there that you could start with, but do get started. That's how you're going to learn. That's how you're going to make progress.
Okay, demo time. What could possibly go wrong, right?
If this is going to work, I'm going to really quickly show you what this looks like. I probably could have just put screenshots in here, but what fun is that?
All right, so what I'm doing here is I'm bringing up the original ColdFusion application so you can see what that looks like, and I'm also bringing up the frontend and the backend of the converted TypeScript application.
Just so you know, I have not changed any of the ColdFusion code by hand. I have not changed any of the TypeScript code by hand. The agent has done all of that stuff for me.
All right, so if I'm lucky. All right, I am. How about that? This is what happens when you live a charmed life.
Okay, so this is pulling up the frontend. You're going to get to see what -- okay, so this is the application. This is the sample ColdFusion application. It's a news clipping application. You go in, you can save articles and stuff like that. You can put in the title and the text and the link and all that kind of good stuff. So this is what the ColdFusion application looks like.
And this is what the TypeScript application looks like. Again, functionally equivalent. It looks a little bit different, but again, functionally equivalent.
But I mentioned the tests. So if I'm lucky, I'll see if my luck continues here. Run the tests. Here we go.
Okay. So now it's running the test suite that it had created. Remember, 111 automated tests in five suites. It's running it, and then you get to see this here. I could pop up the HTML page, which is really small.
But the point here is that some of those numbers are green, some of those numbers are yellow, some of those numbers are red, and some of them that actually matter are not zero. So the agent went in and did some testing, created some tests, and it's actually doing something hopefully meaningful in that code base.
All right, so that's the demo.
Now the comparison. Is this apples to apples to an enterprise environment? No, but sort of. So let's talk about what is apples to apples and apples to oranges and apples to any other fruit or object that you want to compare it to.
Similarities and differences. What's the same about what I did with this experiment? The architectural patterns are the same: there is a database, a data store underneath, and there's some ORM stuff going on in there to save and extract data from a data source; the business logic migration, with the service layers and all that kind of good stuff that's in there; and then the testing methodology. There is automated testing on the original source, and there is testing on the target.
Now, what's the difference? The obvious one is production apps in any enterprise environment are probably going to have orders-of-magnitude larger numbers of files or source code that's included in them. I had 30 in the sample.
Authentication: there isn't any authentication in this. There's going to need to be some sort of integration to some enterprise single sign-on or authentication solution.
Complex integrations: this is about as vanilla and straightforward as you can get on the sample application. Simple web application with a data store on the backend. There's nothing in here calling web services or message queues or APIs or going in and out of environments. There's nothing of that kind of complexity.
And there isn't any data migration. So data migrations are really important parts of modernizations or migration efforts. I didn't have to consider that. But there are also patterns for how to deal with a lot of that stuff that we've tackled before.
This is the conversion process that ended up emerging from this effort. Now, if you had asked me what I needed to do at the beginning of this effort, this would not have come up, at least in this structure. So I had to do it to figure this out.
It's interesting that once I had done this and reduced it to paper, I started seeing lots of validation in other conversations, including at least one or two of the talks this morning that had something similar where I could see the connection. So the takeaway for me on this is, "Hey, I'm not crazy. At least I'm not on this one."
And then there are a lot of next steps that I want to take and that we want to take. Clearly, experimenting with larger or more complex code bases that are more representative in the enterprise environment. How big can these code bases get before the results of the conversion efforts start to suffer? We will maybe have to change the approach a little bit.
Different AI coding agents and configurations: what does Copilot look like versus Cursor versus Sourcegraph versus using AWS Bedrock as a service for the LLMs to serve up the backend of the coding agents?
Different sources and targets instead of ColdFusion and TypeScript. It might be two different things.
Something that I'm really excited about, even more so now being at the summit and hearing about a lot of the experiences from others, is more sophisticated agentic workflows and using subagents and parallelization and swarms, and how they get the orchestration right to increase the productivity even more.
And then certainly addressing security and other enterprise concerns. That's got to be in there, so put a pin in that. I'm going to come right back to that in just a second.
What's the right level of investment in testing and documentation at the beginning, in that preparation phase? Or do we just YOLO it and let the agent go after it, to see how far it can get, how fast, and what ends up happening? Maybe that's a better learning loop than investing a lot of time in understanding and documentation.
Where to use generative AI versus more deterministic AI? That's a choice. It's probably not going to be one or the other. It's probably going to be both in different flavors and at different times.
So I mentioned the enterprise concerns, particularly with security. In the conversations that I've had as I've shared this experiment, this is the issue or the concern that has emerged most often, that seems to be the focal point or the crucible issue for getting the real productivity unlock from this. That is: how do we gain sufficient confidence to deploy AI-converted code or AI-generated code or AI-assisted code into production, where it could actually deliver value?
You can generate functionally equivalent code in hours, maybe days, what used to take us weeks or longer. But functionally equivalent isn't the same as production ready. Something I vibe-code for my own use is not something that I'm going to run in a production environment for a customer.
The hypothesis, the approach that we're going to take to tackle this, is continually asking, are we ready to deploy this into production? And then responding and dealing with the inevitable no: and here's why, this is what we need to do. We're just going to tackle this stuff, and eventually we're going to get to a yes.
Also a perspective, opinion if you will: there are probably many lessons that we can learn from all of the other transformations and change efforts that we've had over the course of years and decades that we could probably apply to this too.
And we can probably also use the foundations that we've already built, most obviously in the form of CI/CD pipelines that have a lot of these controls that are already baked in, to this point of how do we increase our production readiness.
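The "ask, get a no, fix the reasons, ask again" loop could be sketched as a simple gate that a pipeline runs before deployment. Everything named here is hypothetical: the `BuildFacts` fields, the three checks, and `readyForProduction` are placeholders for whatever controls an organization already has in its CI/CD pipeline.

```typescript
// Hypothetical facts a pipeline might collect about a converted build.
interface BuildFacts {
  testsPassing: boolean;
  criticalVulnerabilities: number;
  humanReviewed: boolean;
}

// One readiness control: a name, a pass/fail rule, and the reason for a "no".
interface ReadinessCheck {
  name: string;
  passes: (facts: BuildFacts) => boolean;
  reasonIfNot: string;
}

// Illustrative checks; a real organization would define its own controls.
const checks: ReadinessCheck[] = [
  { name: "automated tests", passes: (f) => f.testsPassing, reasonIfNot: "test suite is failing" },
  {
    name: "security scan",
    passes: (f) => f.criticalVulnerabilities === 0,
    reasonIfNot: "critical vulnerabilities found",
  },
  {
    name: "human review",
    passes: (f) => f.humanReviewed,
    reasonIfNot: "no engineer has reviewed the converted code",
  },
];

// Ask "are we ready?" and return every reason behind a "no".
function readyForProduction(facts: BuildFacts): { ready: boolean; blockers: string[] } {
  const blockers = checks
    .filter((c) => !c.passes(facts))
    .map((c) => `${c.name}: ${c.reasonIfNot}`);
  return { ready: blockers.length === 0, blockers };
}

const verdict = readyForProduction({
  testsPassing: true,
  criticalVulnerabilities: 2,
  humanReviewed: false,
});
console.log(verdict.ready ? "yes" : `no: ${verdict.blockers.join("; ")}`);
```

Returning every blocker, rather than stopping at the first failure, matches the idea of working through the list of reasons until the answer eventually becomes yes.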
All right, a few realizations along the way, just to share with you from this journey that I had. Yes, I believe that AI can accelerate code conversions. Now it's a matter of how much and how well, following the lessons from leaders who are pushing the envelope on this stuff, many of whom are here at this conference. I don't have to reinvent the wheel, although there is definitely learning about what it is and how to apply that in my particular context.
The change management lesson here is if you feel like you're blocked, I felt like I was blocked, there's probably some small experiment that you can find that's going to be relevant, that helps advance the ball and change the conversation.
And speaking of doing things, you may be surprised at what unexpected opportunities your efforts create. You get the opportunity for those serendipitous interactions that didn't exist before.
All right, help. I don't think I'm crazy, but maybe I am. I don't know. This seemed to work. This seemed to be promising. If you've got opinions on what I've done, what we've done, for, against, good, bad, indifferent, I'd love to hear them. If you've tried some of this stuff, I would love to hear about that too.
And then I'd also really like to understand: how does your organization build production-ready confidence to tackle that crucible issue?
That's me. That's all my info. Thanks for listening.