AI in a Sustainable Way

Las Vegas 2025

IBM Fellow, CTO for CIO Technology Platform · IBM

AI is everywhere, but so is the drive to build more data centers, which in turn drives the need for more power and water to run them. We need a more sustainable solution. When DeepSeek was first announced, it took the industry by storm, but what it really did was raise the question: can we do this in a more sustainable way?

This session will discuss the use of specialized smaller models, the ability to train those models cost-effectively with your own enterprise data without exposing that data externally, and the industry trends around specialized chips for inferencing. These smaller models can be more effective while using significantly less energy and water.

Full transcript

The complete talk, organized by section.

Rosalind Radcliffe

Well, it is two o'clock, so I'll go ahead and get started. They do not have my presentation, so I'm going to do this without my presentation until my presentation shows up, and then we will see where I am in this talk. This is going to be fun, but I can always talk about this subject.

We're talking about sustainability and AI. Let me introduce myself to start out with, for those of you who don't know me: Rosalind Radcliffe. I'm currently IBM Fellow, CTO for the Z ecosystem organization. My prior role was CTO for the IBM CIO office, so I was responsible for all of IBM's internal systems and our support that we provide, which is 200-plus thousand employees, that kind of scope.

We are a rather large company. We do a lot of different things. IBM no longer makes PCs, but we still make mainframes. We make a lot of different systems, and so we're in a lot of areas. When I think about AI and the use of AI, we have lots of different challenges that many companies face, but we all have to look at something that's really important. I think about my grandchildren when I think about this sentence: we're building nuclear power plants to build data centers. We're turning on Three Mile Island. If you're old enough, you know what the reaction to that should be.

Not that I have anything against nuclear power. I have it sitting in my neighborhood, in my backyard virtually. It provides a lot of power to North Carolina for our businesses, for humans, for all of us to do our jobs. But building new nuclear power plants just for data centers because AI is using so much power means we have to think about and recognize that every time we're using these GPUs, we're burning power and water. A lot of data centers are in Texas. They don't have a lot of water in Texas. So we have this really big problem with what we're doing. We need to look at AI and the value it provides and ensure that we're doing the right thing. We're using the right AI at the right time.

IBM GA'd this year the Z17, the next generation of our Z boxes. We also have the LinuxONE, which is a Linux-only version of the Z box. The reason I bring it up: with the Z17, we produced a new hardware generation that uses less energy than the prior generation, and it can have a fully loaded AI drawer. I can reduce my energy footprint and still do AI inferencing in the machine, which is just one example.

NVIDIA is working with storage companies to ensure that maybe we can use storage to cache information so I don't have to redo all of my calculations. I can store a set of them so I don't have to regenerate. I can save power by doing this.

I didn't want to talk about this one without this chart, so I kind of skipped this. When I think about AI, we have to make sure we're planning all of the aspects around AI: the sustainability aspects from the standpoint of energy consumption, but also the ethical aspects and how we're running it. I used IBM and our governance program as an example. We have an AI ethics board. We don't do things that don't meet the ethics board criteria. We don't even start them. That's another way to make sure you're not spending energy, CPU cycles, GPU cycles on something you're never going to be able to do.

Think about it when we look at our data: make sure we use only the data that should be allowed with AI, and bring AI to the data so you're not moving data around. We control the lifecycle of the AI models to make sure they're not changing over time and are still doing what we expect. By closely managing and monitoring them, we can help make sure we are providing the right capabilities at the right time.

This is a key piece of a process. This is an internal IBM process. This is how we build things. It tells us how we internally use things and it tells us what we build for our clients and for externals. One of the things Watson said years ago: computers cannot be held responsible; humans are. So computers can't make decisions. That's one of the founding principles of our AI ethics board. They can't make decisions that affect real things.

It doesn't mean we aren't using it for automation. Let's be very careful here. Yes, AI and agentic AI are going to do things for us. It is going to collect data for us. It's going to process data for us. It's going to automate systems. Yes, but that's not making a decision. Decisions are made by humans. That also helps in this process. If we make sure we have this from the very beginning, we can have the right plan.

These are the facts. This is a Z or LinuxONE, either system, but 38% less power than an equivalent Intel processor. Think about that. If I'm building nuclear power plants, if I can choose to run even a percentage of work on something that is specialized and doesn't burn as much power, then I am more efficient for my grandchildren, everyone's grandchildren, and for the world as we go forward.

Everything's not going to run on a LinuxONE or Z box. I'm not saying that. But when you have 70% of the world's structured data sitting on Z already, bringing AI to the data makes a lot of sense. There are other manufacturers, other people looking at ways to be more efficient. What we have to do as an industry is push for this. This is IBM at its core: we're going to be more energy efficient. That's what we want to do. But we need to push the industry to say, why am I continuing to build chips that burn more power? Can't I be more efficient? If our AIU can be added to our system and we still decrease power utilization, then this is not impossible. Any other chip manufacturer could look at power utilization if told that it matters.

So we need to think about that. We need to think about this processing and making sure we're doing the right thing.

I'm going to use examples throughout a lot of this, but for this example it does not matter where you're running. If you're doing AI, are you using the right kind of AI for what you're doing? You don't need a large language model for everything. In fact, a large language model for everything will get you bad, bad results. So not only is it not energy efficient; it's not a good idea. There are lots of things that traditional AI and machine learning models work so much better for. Rules engines are still great. They're wonderful for the purpose that they provide. If we do this consistently, we use less power, we improve the environment, we make sure we're using the right AI at the right time, and we can be more sustainable.

We need to make sure that as we're building our solutions, we're building it as part of the platform. This happens to be the IBM CIO platform in architecture form. Yes, I understand this is not a true architecture picture, but it's realistic-ish. It shows we have lots of systems. We're in cloud and we're on prem. We have Z, we have Power, we have everything on the planet. You notice in there AI/ML, AI foundation, is part of the overall platform provided to the organization.

If you are doing something with AI, you are not going out and getting whatever you want. You get the function provided to you. If you want to do a new AI project, you go through path to production in your application and data pipelines. You get an environment provisioned for you as you request it. It's automated. It's there for you. You have the infrastructure necessary to do that POC, and therefore we know it will run in the environment and will get to production because it started using the same pre-approved tools, models, et cetera.

Most people have limits on models. You can't use every model under the sun. You get to pick the ones the company has approved. The ones we have approved are available in the platform. It's not just one model; it is a set of models, but it's available there for you as part of the platform. So you are not, as a user, as a developer, as a system, provisioning. That means we can control things better. That means for the GPUs or the Z systems we have, we can make sure we're optimizing that use. We don't have systems sitting idle because Joe bought the GPU, so only Joe gets to use it, or that department or that organization bought the GPU. No, we can make optimal use of the processing in the system by bringing it into the overall processing.

As we think about this capability, making sure within your own companies you've set up the same kind of process makes it more sustainable. As things change, the platform provider can fix it. You can move it to a more efficient location if that's appropriate. You can choose where the workload needs to run so that you can be efficient with your hardware.

In this picture, we can run AI in multiple places. Where's the right place to run this? Should I be running something on the Z? Should I be running it on a GPU? Does it even need a specialty processor? The other piece of this is we work very carefully so users run models on their laptop as appropriate.

How many of you in your organizations have said these use cases actually run optimally on the local developer's machine? I can use my MacBook. Well, it would be better if I had a better chip than an M1. I can do some stuff on my M1, but if you have a current developer machine, you can actually run a set of the models locally and be very efficient with that. I'm not then burning more processing. I'm not burning more energy, and it's more cost efficient. Am I going to run a really large model on my laptop? No, but there are lots of use cases that small models work very well for.

When we think about it, we should use the smallest model possible for any use case. Smallest model possible, because it uses the least energy and actually gives you the best result. I did say smallest model possible. I didn't say smallest model available in the world, because there are some very small models that don't work for every use case. You've got to pick the right model.
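One way to read "smallest model possible" is as a selection rule: among the approved models, take the smallest one that still meets the quality bar for the use case. The sketch below is only an illustration under that reading. The model names, sizes, and evaluation scores are hypothetical and do not come from the talk, and parameter count is used as a rough proxy for energy use.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical model name
    params_billion: float  # rough size, used here as a proxy for energy cost
    eval_score: float      # score on your own use-case test set, 0..1

def smallest_adequate(candidates: List[Candidate], min_score: float) -> Optional[Candidate]:
    """Pick the smallest candidate whose score meets the bar, or None."""
    adequate = [c for c in candidates if c.eval_score >= min_score]
    return min(adequate, key=lambda c: c.params_billion, default=None)

# Hypothetical approved-model list with made-up numbers.
approved = [
    Candidate("tiny-model", 0.5, 0.62),
    Candidate("small-model", 2.0, 0.88),
    Candidate("large-model", 70.0, 0.91),
]
choice = smallest_adequate(approved, min_score=0.85)
print(choice.name if choice else "no approved model is good enough")
```

With these made-up numbers the tiny model is rejected because it misses the bar, which mirrors the point that the smallest model in the world is not always the smallest model that works.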

I've got some use case examples here of things we've done internally with AI that also drive this point home. When we think about anything we're doing, the first thing you want to do is eliminate and simplify. You don't want to AI-enable a process that really shouldn't exist in the first place. Maybe we can get rid of those. Don't even automate it. Just get rid of it. What can we eliminate? IBM has been around a long time, so we probably have more processes than some people, but every company has things that maybe we don't need to do anymore. Eliminate is the first key area.

How do I make this easy when it comes to IT support? Lots of people say, well, you just didn't want to have the people there. No, wait a minute. I personally don't have a lot of luck when I call any help desk. Do I really want to talk to a help desk? Anybody like calling help desks? No. So how about let's make it easy. Instead of having to call a help desk, let's have the Q&A available. Let's have chat available backed by RAG and backed by actual answers. Let's have automation there.

The favorite one is when someone can't get into their Mac because the password doesn't work. Why this happens, I don't know, but they got the password. No, it still screws up. It can send you a reset key and you're back and working. You don't have to talk to a person. It's available as part of the process. I really don't know why we need that, but it does happen. It's happened to me, so I know it happens. There are a bunch of little things like that that are simplified through this process: an easier and a better experience for users.

Then we have chat. If I use a phone, which we used to do, my people on the other end have to speak the language of the person that's having the problem. If I have chat, the person gets to chat in their language and I get to respond in my language. The translation can happen in between. Large language models, some have gotten very good at various languages, and they aren't that large. They're smaller large language models that can do the translation between the various different languages. So I don't have to worry about what's on the other end of this discussion. We do have multilingual people, but we don't necessarily have enough at the right period of time for all the requests. I am sure we don't have every language for every country we work in. We are in too many countries for that.

English is our business language, but we do want people. So models, and the smallest model possible. In some cases this is actually running on their laptop, doing the translation. I know that's not for everybody because everybody doesn't have the laptop that's going to run this, but some of them do. So we can offload that power as we move forward.

This is just one example of what we have done looking at processing, but it helps tell you the story of what should I do, where, and how should I move forward. In each case, what we do is look at what's the most energy efficient, reasonable, cost effective, that meets our ethical and business guidelines for what we're going to do. Looking at the smallest model possible.

When this started, we were using very small Granite models, and we could do it, or the language model from Granite, because it had the capability and we could train it on the particular things we did. But it also was highly backed by RAG because we really don't want the hallucination that happens, especially with help desk. We really don't want hallucination in the middle of that. So we want to look at what's the best way to do this and the efficient way to do this with the smallest amount of data possible.

The other example that we have started, as I like to say, long before the AI buzz. AskHR existed not based on large language models, not based on that kind of thing. It was using more traditional AI and more work to try and help provide access to systems. In fact, in many cases, I don't think it really was AI at all, but you have to use buzzwords. It was a set of services that would allow us to automate and have a conversation for solving problems. By adding large language models into the conversation part, it got a whole lot better, yes, but all the rest of the stuff doesn't have to be. It can use the most efficient process possible to make sure that it's providing these use cases to users.

This is an interesting internal HR example. It's actually one we talk about the most because we changed HR systems January 1st. Underneath the covers of the tool, we went from one HR system to another. Our managers still type the same thing in their chat, and in the background it worked in the new tool. So in many ways we had to do almost zero training to move from one tool to another because the way they experience the work process for HR was through their chat experience.

As an employee, I don't find this as successful. I'm not a manager, though, so I don't do most of the HR processes. I don't have to do all the complex things about salaries and hiring and all that stuff. I still use the new tool to find my paychecks. I figured out how to do that. But from a manager standpoint, significant help in this process. It's an example of using a small model to be as efficient as possible.

Another example, and this is what happens when you let people free with technology: they like to play. Our developer experience team wanted to figure out how we could be more efficient with our code. How do we do the things that developers hate to do? How many developers like to write documentation? Raise your hand if you're a developer and you like to write documentation. Really wonderful. Thank you. You're the one in a million, but I appreciate it, because I really do.

So it would generate documentation. We built a pipeline that used the tools efficiently to create documentation, create unit tests, create summaries, index the code, and give it back to a developer as a pull request. So here you go. Here's your work: the stuff you don't want to do, that doesn't require the intelligent imagination of building something new. Creating a unit test, it can do that. It can read the code and give you documentation. I need my developers focused on that creativity of building something new.

But it also would chunk and vectorize the data from the code so that when someone came in and said, I need to call a Box API, bad example, but okay, what it would do is give you back the implementation that's already been done for calling a Box API with all the security in it. So instead of having an LLM generate the code for you, it's giving you examples that you've already coded, that you already have in production. You're not having to have the LLM go do all that work and burn energy. You are getting back code that you know works, that you know has the security checks in it. By the way, that might not be there if you generated it. It'll have all your standards in it because it's already running there.
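The retrieve-instead-of-generate idea can be sketched as follows. This is not the pipeline described in the talk: it swaps in plain word-count vectors and cosine similarity where a real system would use a learned embedding model and a vector store, and the file names and code chunks are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(chunks: dict) -> dict:
    """Map chunk id -> vector, computed once up front."""
    return {cid: vectorize(src) for cid, src in chunks.items()}

def search(index: dict, query: str, top_k: int = 1) -> list:
    """Return the ids of the top_k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(index, key=lambda cid: cosine(q, index[cid]), reverse=True)
    return ranked[:top_k]

# Hypothetical chunks of already-reviewed production code.
chunks = {
    "storage/box_client.py": "def upload_to_box(file, token): call box api upload with oauth token check",
    "hr/payroll.py": "def fetch_paycheck(employee_id): query payroll system for paycheck",
}
index = build_index(chunks)
print(search(index, "I need to call a Box API"))
```

The lookup itself involves no model inference at query time in this toy version; a real deployment would still pay for embedding the query, but that is far cheaper than generating new code.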

So it helped reduce the effort of building because people could reuse. We do have a large code base, and probably have written lots of code examples that can be used for this. It makes it easier for teams to be more effective. I won't say this is highly deployed because we do have a problem in the chunk and vectorized space with our CISO. Our code is very proprietary, and the CISO doesn't like somebody else, some other team, seeing that other team's code. If you have more open code, this is easier to do. If you don't, yeah.

One of the things that I told a story about last time I was here was our watsonx challenge. Internally, we have done a challenge every year for the last three years. The first year, we gave everybody access to all sorts of tools and processes and burned lots of energy. Lots of people played, lots of people learned. The second year we did a very similar thing. This year we applied the lessons of the last two years. We had lots of people building POCs of things that would never go live. We had hundreds of IBMers, well, thousands, 167,000 this year participating. Imagine all of them building POCs, all doing things that are never going to go to production. That doesn't really make people happy. It also burns a lot of power.

This year, what we did was change the model. We said, you're going to build a business case. You're going to get all the training. You're going to get your learning about AI, but you're not necessarily going to build something until you go to phase two. In phase one we had 15,550 proposals submitted, and only 116 of them got selected as something that we would actually invest in. They built 116 MVPs. Of those 116, 12 were winning solutions which will go forward and be invested in by the business.

So we had many fewer people building things that are just going to get thrown away. That's the problem with hackathons and challenges. You build a lot of things, and then people get disappointed because their thing's getting thrown away. This time there are fewer people disappointed that their stuff's going to get thrown away. To be honest, there are probably a few more of those 116 that advanced that will end up being done. Because we did build an MVP, it was to the point of not just a POC but an MVP. We had coaches with each one of these teams so we could be more efficient in how we worked. We didn't have everybody playing with LLMs in a way that we could just burn through our GPUs. We allowed only a small subset of teams to actually build the process, and therefore it was much more energy efficient. It was actually more efficient for the organization. I hope it will end up being a more positive experience for many people because the last few years we've had people say, I built something and now it didn't go anywhere. This time we have fewer people who get to say that, and I think it will be a lot fewer in the end.

This is an example of how we went from way too much power consumption to understanding what's the impact of what we're causing to happen. I think this is partially through the maturity of using AI. We need to think about how we're doing it. Our new version of Watson Code Assistant that is used inside IBM has a little meter on it: how much have you burned? How much money have you burned so far? So it's clear to developers what they're doing, and we need to add that to more of the things that we're doing. But it gives people an insight into what they're doing in the environment.
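A meter like that could be as simple as a running total shown next to the tool. The sketch below is hypothetical: the talk gives no detail on how the meter works, and the cost and energy rates are placeholder assumptions, not real prices or measurements.

```python
from dataclasses import dataclass

@dataclass
class UsageMeter:
    """Running total a tool could show a developer. Rates are placeholders."""
    cost_per_1k_tokens: float = 0.002  # assumed currency units per 1,000 tokens
    wh_per_1k_tokens: float = 0.3      # assumed watt-hours per 1,000 tokens
    tokens: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's token usage to the running total."""
        self.tokens += prompt_tokens + completion_tokens

    @property
    def cost(self) -> float:
        return self.tokens / 1000 * self.cost_per_1k_tokens

    @property
    def energy_wh(self) -> float:
        return self.tokens / 1000 * self.wh_per_1k_tokens

    def summary(self) -> str:
        return f"{self.tokens} tokens, ~{self.cost:.4f} cost units, ~{self.energy_wh:.2f} Wh"

meter = UsageMeter()
meter.record(prompt_tokens=800, completion_tokens=200)
meter.record(prompt_tokens=1500, completion_tokens=500)
print(meter.summary())
```

Token counts are only a proxy; a more faithful meter would need measured energy per request from the serving platform.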

One of the things that we were asked is, what's the problem that still remains? This is my ask: when you go back, think about the power and the water that you're using as a company in your AI processes. What's the right way as an industry to help us recognize that this is a problem that we have to look at? How can we improve our utilization? Maybe showing the real cost will help, or I'm open to other suggestions.

I think we all need help in this area because we all need to figure out how we don't build a power plant next to every data center so that we can run AI models to create recipes or take your choice of random something. I'm not saying we shouldn't use AI, but we need to be responsible in what we do.

Thank you for coming. Hopefully this was helpful.