Every AI Engineer Deserves an AI Platform, and Other Lessons Learned

Las Vegas 2024

Patrick Debois

Dev(Sec)Ops Advisor & Author

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

In the DevOps community, one of the central figures is Patrick Debois. I met him in 2010 when he ran the first DevOpsDays conference in the United States. And from the very first moment that I was there, I knew that I had found my tribe. So we know him as the godfather of DevOps. He coined the word in 2009. And over the last two years, many of us have watched with amazement as he continues his adventures in genAI, and he was actually responsible for bringing AI features to market back when he was the VP of Engineering at Showpad. And he continues his pioneering work, understanding how technology leaders build and run these services in production. He's a co-author of The DevOps Handbook. He's always on the frontier of something important, and so it shouldn't surprise us that he's, you know, in this space right now. So I'm so excited that up next is Patrick Debois.

Patrick Debois

Hey Gene. An AI track, who would've thought last year? You know, four presentations now, a whole day.

So what's important, you know, is that I got bored after 15 years of DevOps. This kind of thing reignited me, and I figured out why. There is something about this automation intelligence. That's how I look at this AI: helping people automate things. And that's something we've done for years and we worked around.

So Gene mentioned it. I had the opportunity to be early on at Showpad delivering this to customers. Like a year and a half ago we were at the sweet spot. Showpad was a content management system for sales and marketing. So you can imagine the competitive pressure was on.

But then you see, you start with one team, as one does with new technology. Then you have a few teams, maybe you're at that stage, and then our CEO said: we want every team to use that new technology. It's not just the data science team that I want to give it to. Every team needs to be enabled to do that.

And with every change, like DevOps, the people in the new field want to have an identity, right? It's always words, and we'll hear about the AI engineer as a term where you find the stories under this. This is kind of like a label. DevOps was a label to get the stories out, like in this conference. AI engineer is one of the others. You can find good people advancing the field in there.

You know, maybe this is also bringing a technology shift and an organizational shift. We've noticed with DevOps there was an organizational shift. What I see now is that the data folks who were somewhere in their data dungeon, or data lake as they call it, have had to come closer to production. We'll hear about this later today. That is kind of shifting, right? We had shift-left for security, but now we're shifting the data folks, right? Because we want it to be used in production. Finally, it will bring the value that they all promised us.

And to scale things out: we saw Team Topologies yesterday. It's known in this community. Funny enough, I did the same talk in the AI engineer community. Only 5% of the people knew what Team Topologies was. So definitely we're in two worlds of silos again. So I'm trying to do my best to kind of bridge it.

So we talked about: where do we put this? And it's a question I get like, where do you put the AI team? Is it more like the mobile team where we kind of put them in one team? So more like the feature team or the stream-aligned team? Or do we actually want to scale this out to the whole organization as a component to use?

So what I'm going to talk about is doing it in a platform team. But I'll leave it up to you whether that's something you want to do. But I do believe that this technology is like a cloud technology and will eventually end up in all the teams that are providing value to the business.

So how do I see this? I've never seen a platform team as disjoint from also doing enablement. You run the platform as a product, you do the enablement, and then you put governance in there. So the three things, for me, always hang together.

So what goes into a genAI platform? It's been talked about by John: you know, access to the models. We don't want each of these groups to figure out on their own how to access the models. So that central piece: who owns the governance, who puts in the models, who gets the access? Definitely one of the first things that team gets questions about.

Then we want to have all that data indexed. So things like vector databases — another infrastructure component they can bring into their remit.

And RAG as a service — previous talk, we're all RAG people, right? So maybe there's RAG ops, right? We're now babysitting the RAG pipelines, but it is just another piece that we need to have in the infrastructure that's going to be used by multiple teams.
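As an illustration of the indexing and RAG pieces mentioned above, here is a minimal sketch of what a shared retrieval service might do under the hood. Everything in it is an assumption made for illustration: `embed` is a toy word-count stand-in for a real embedding model, and `VectorIndex` and `build_prompt` are hypothetical names, not any vendor's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """Hypothetical in-memory stand-in for a vector database shared by teams."""

    def __init__(self) -> None:
        self._docs: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._docs.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, index: VectorIndex, k: int = 2) -> str:
    """Retrieve the top-k passages and place them in front of the question."""
    context = "\n".join(f"- {p}" for p in index.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


index = VectorIndex()
index.add("Refunds are processed within five business days.")
index.add("The sales deck template lives in the marketing library.")
index.add("Support is available on weekdays from nine to five.")

prompt = build_prompt("How long do refunds take?", index, k=1)
```

The point of putting this behind one service is that every team gets the same indexing and retrieval behavior, rather than each team babysitting its own pipeline.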

The models — we will have version control, registry if we're doing some fine tuning. So nothing new there, but just a new set of capabilities. And you might not have had AI before because that's not your dataset, but now with the whole fine tuning, you might have to do versioning on your own stuff as well.
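A rough sketch of what versioning your own fine-tuned models could look like follows. The `ModelRegistry` class, its fields, and the idea of fingerprinting the training data with a hash are assumptions for illustration, not something described in the talk.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    base_model: str
    dataset_hash: str  # fingerprint of the fine-tuning data (assumption)


class ModelRegistry:
    """Hypothetical registry: one append-only version list per model name."""

    def __init__(self) -> None:
        self._versions: dict[str, list[ModelVersion]] = {}

    def register(self, name: str, base_model: str, dataset: bytes) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(versions) + 1,
            base_model=base_model,
            dataset_hash=hashlib.sha256(dataset).hexdigest()[:12],
        )
        versions.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> ModelVersion:
        return self._versions[name][version - 1]


registry = ModelRegistry()
v1 = registry.register("sales-summarizer", "base-model-a", b"training set, first cut")
v2 = registry.register("sales-summarizer", "base-model-a", b"training set, with fixes")
```

Recording the base model and a data fingerprint next to each version makes it possible to answer later which data produced the model that is running in production.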

The unified proxies — who gets access, what rules do we put on — another infrastructure component if you want to scale this out to multiple teams that that team is providing functionality for.
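The proxy idea can be sketched as a single entry point that checks who may call which model before forwarding the request. This is only an illustration: `ModelProxy`, `TeamPolicy`, the per-team request budget, and the model names are all hypothetical, and the backends are stubbed functions rather than real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


class AccessDenied(Exception):
    """Raised when a team breaks one of the proxy's rules."""


@dataclass
class TeamPolicy:
    allowed_models: set[str]
    max_requests: int  # hypothetical per-team budget
    used: int = 0


@dataclass
class ModelProxy:
    backends: dict[str, Callable[[str], str]]
    policies: dict[str, TeamPolicy] = field(default_factory=dict)

    def complete(self, team: str, model: str, prompt: str) -> str:
        policy = self.policies.get(team)
        if policy is None or model not in policy.allowed_models:
            raise AccessDenied(f"{team} may not use {model}")
        if policy.used >= policy.max_requests:
            raise AccessDenied(f"{team} exceeded its request budget")
        policy.used += 1
        return self.backends[model](prompt)


proxy = ModelProxy(
    backends={
        # Stub backends (assumption): a real proxy would call model providers here.
        "small-model": lambda p: f"[small] {p}",
        "large-model": lambda p: f"[large] {p}",
    },
    policies={
        "search-team": TeamPolicy(allowed_models={"small-model"}, max_requests=2),
    },
)

reply = proxy.complete("search-team", "small-model", "hello")
```

Because every call goes through `complete`, the rules live in one place and can change without touching the feature teams' code.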

And then much as we did with kind of the monitoring and observability — whatever we're doing with the prompts, there is also kind of this traceability and debugging that we want to do. We don't want each team to do this differently, because imagine it goes from one team and like the API calls go to the other teams. That's very similar — we want to have this tracing functionality. And I know the traditional monitoring vendors, observability vendors are building this in, but it's another piece of the component when you go to production.
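A minimal sketch of shared prompt tracing, assuming a hypothetical `PromptTracer` that every team calls through: it records each prompt and output and carries one trace id across team boundaries, much like request tracing across API calls. The class, field names, and team names are illustrative only.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Span:
    trace_id: str
    team: str
    model: str
    prompt: str
    output: str
    latency_s: float


class PromptTracer:
    """Hypothetical shared tracer: one record per LLM call, grouped by trace id."""

    def __init__(self) -> None:
        self.spans: list[Span] = []

    def call(
        self,
        team: str,
        model: str,
        prompt: str,
        llm: Callable[[str], str],
        trace_id: Optional[str] = None,
    ) -> tuple[str, str]:
        trace_id = trace_id or uuid.uuid4().hex  # start a new trace if none is passed in
        start = time.perf_counter()
        output = llm(prompt)
        latency = time.perf_counter() - start
        self.spans.append(Span(trace_id, team, model, prompt, output, latency))
        return output, trace_id

    def trace(self, trace_id: str) -> list[Span]:
        return [s for s in self.spans if s.trace_id == trace_id]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption).
    return prompt.upper()


tracer = PromptTracer()
summary, tid = tracer.call("content-team", "small-model", "summarize the deck", fake_llm)
tracer.call("email-team", "small-model", f"write an email about: {summary}", fake_llm, trace_id=tid)

teams_in_trace = [s.team for s in tracer.trace(tid)]
```

Passing the trace id along with the request is what lets you debug a result that went through two teams' prompts as one chain.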

And then much like you had health checks to run in production for APIs, we now want to run health checks for these LLM functionalities, because maybe the models get changed. You got an update from OpenAI, right? What's happening in production? So you have these probes and kind of see what happens. It's one of the things that that team can provide.
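The probe idea can be sketched as a small set of fixed prompts with cheap checks, run on a schedule against whatever model is live. The `Probe` structure, the two example probes, and the stubbed models below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Probe:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output still looks healthy


def run_probes(llm: Callable[[str], str], probes: list[Probe]) -> dict[str, bool]:
    """Run every probe; an exception from the model counts as a failed probe."""
    results: dict[str, bool] = {}
    for probe in probes:
        try:
            results[probe.name] = bool(probe.check(llm(probe.prompt)))
        except Exception:
            results[probe.name] = False
    return results


probes = [
    Probe(
        name="returns_json",
        prompt='Reply with the JSON object {"ok": true}',
        check=lambda out: out.strip().startswith("{"),
    ),
    Probe(
        name="stays_short",
        prompt="Answer in one word: what color is the sky?",
        check=lambda out: len(out.split()) <= 3,
    ),
]


def stub_llm(prompt: str) -> str:
    # Stand-in for the production model (assumption).
    return '{"ok": true}' if "JSON" in prompt else "Blue"


def drifted_llm(prompt: str) -> str:
    # Simulates a model update that became chattier.
    return "Sure! Here is a long answer for you today"


status = run_probes(stub_llm, probes)
status_after_update = run_probes(drifted_llm, probes)
```

The checks are deliberately cheap and structural, so they can run as often as an API health check and flag a changed model before users do.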

Feedback as a service: it was mentioned this morning, thumbs up, thumbs down. In our experience, people have a variety of reasons to put a thumbs down: because they had a bad coffee, like, you know, whatever. So we started looking for other signals. And you'll see that it's very subtle. For example, in the OpenAI interface, when you copy something, that's an indicator that you find the result useful. If you say try again, you find it not useful. So people might not be inclined to do thumbs up, thumbs down, but these are other signals. And then if you want to get even better feedback, you say: here's an editor, change the generated text the way you like it. So you actually get to see which parts they liked and which they didn't. So you don't want each team to build a feedback service like that. This is, again, another shared component that you want that platform team to provide to your developer teams.
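A minimal sketch of such a feedback service, combining explicit signals (thumbs) with implicit ones (copy, retry) and measuring how much of a generated text survives a user's edit. The signal weights and the `FeedbackService` interface are hypothetical; the similarity measure uses Python's standard `difflib`.

```python
import difflib
from collections import defaultdict

# Hypothetical weights: implicit signals count for less than explicit ones.
SIGNAL_WEIGHTS = {"thumbs_up": 1.0, "copy": 0.5, "retry": -0.5, "thumbs_down": -1.0}


class FeedbackService:
    """Illustrative shared feedback collector keyed by response id."""

    def __init__(self) -> None:
        self._scores: dict[str, float] = defaultdict(float)
        self._kept: dict[str, float] = {}

    def record_signal(self, response_id: str, signal: str) -> None:
        self._scores[response_id] += SIGNAL_WEIGHTS[signal]

    def record_edit(self, response_id: str, generated: str, edited: str) -> float:
        """Store and return a 0..1 similarity between generated and edited text."""
        ratio = difflib.SequenceMatcher(None, generated, edited).ratio()
        self._kept[response_id] = ratio
        return ratio

    def score(self, response_id: str) -> float:
        return self._scores[response_id]


feedback = FeedbackService()
feedback.record_signal("r1", "copy")    # user copied the answer
feedback.record_signal("r2", "retry")   # user asked to try again

unchanged = feedback.record_edit("r1", "Dear customer, thanks.", "Dear customer, thanks.")
rewritten = feedback.record_edit("r2", "Dear customer, thanks.", "Hi Sam, see attached.")
```

A similarity near 1 means the user kept the generated text almost as is; a low value means most of it was rewritten, which is a richer signal than a thumbs down.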

And like, you know, with the cloud, there is an explosion of tools, right? So keeping on top of that is just part of the platform infrastructure piece that I mentioned. But it definitely makes sense to centralize it and not have every feature team kind of do this on their own. And yes, I know it's "you build it, you run it," but, you know, within guardrails, within standards. It's something we heard about yesterday as well.

Second part: enablement. So we built this whole infrastructure layer, and we think about this as a platform as a product, but we want our teams to actually use this technology.

So one thing we did was encourage prototyping, right? They don't have to go to OpenAI; we provide them the environments where they can play around, because those are the models they're getting from that team. Those are the pieces they're going to play with. And this could include non-devs, also product owners, to make this accessible. So that's kind of enabling things early in the journey, so they get a feeling for the power of these things.

You might want to help them and standardize on a framework. Although the caveat is: after six months everybody hates frameworks. At least, you know, that's my data. And the reason is that the frameworks are brittle and things are moving so fast. They're often a thin veneer of new capabilities and abstractions, right? I assume that's probably going to settle in the future. But right now there is a tension between using a framework and not using one. In the beginning it definitely helps you, but once you get a little bit under the hood, you're kind of able to do it yourself.

Then you want to have them set up a local dev environment or a testing environment. There are many tools now, and local models are getting up to par, good enough in quality that they can be used, you know, not only for coding but for a variety of tasks.

And it is also important — as it was mentioned in the kind of talk about Adobe — if you don't have a good business use case about doing the AI for the sake of AI… I always say: up until kind of now, maybe you run your project on the marketing budget. But if it becomes production budget, you better have customers paying for this, right?

The other thing I saw is that when you go into a company and they're kind of like, "Hey, what do we do? Fine tuning, yes/no, what's your genAI strategy?" — it's like, okay, listen to what the customer wants, right? That's also an indicator when you go in: don't prematurely optimize on cost. This is chaotic times. You don't know. This is an investment. This is learning, right? This is your education budget while you're doing things. So of course, if you're not going to pay more — well, you're going to learn less in this phase. That's the state we're in.

And as a developer — you know, in this community we talk a lot about developer experience for coding and building stuff — but the developer experience, like I mentioned, the frameworks and all the things to build genAI applications, they are really bad, right? They're brittle, you know, they're not matured. But that's the state that we're in. This was the same with cloud early days — everything changed from week to week. That is just happening. So a lot of problems on getting access to the data, the frameworks, outdated information on the internet because things move so fast. So you have to overcome that kind of pain from the developer experience.

The biggest pain we've found, and it has happened a few times now with a new technology, whether that's mobile, serverless, and so on: the first half year everybody's like, "let's get it working," and then the next six months it's like, "s**t, we need tests." I don't know why it is that we can never do this in the beginning. But it is very specific here: how do you test genAI? It's something that the developers are not used to. They test deterministic things, so for them it is really something they have to learn how to deal with.

And it becomes really important because that landscape is moving so fast that if you change the model, your output will be different. If you change a prompt, you add an extra semicolon, it will give a different result. So it's that brittle. So you do need to test.

So we said: last year was LLMs; what I'm hearing this year is about getting the testing right, the evals. That's definitely a pain. But you have to help the developers overcome this pain by explaining it to them.

And the way I typically explain this is: there's exact testing, which is what we're used to. There's pattern testing. We can ask a model for a specific sentiment, and we can see whether the question words are related to the answer. Or we can ask another LLM. And that's kind of what you see: if there's a problem with AI, we're just going to use more AI, right? So it's a little bit weird, but that's the state we're in.
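The testing styles described above (exact match, pattern match, and asking another LLM) can be sketched side by side. This is only an illustration: the function names are made up, and `stub_judge` is a keyword check standing in for a real second model, so the example runs without any API access.

```python
import re
from typing import Callable


def exact_test(output: str, expected: str) -> bool:
    """Classic deterministic assertion: the output must match exactly."""
    return output.strip() == expected


def pattern_test(output: str, pattern: str) -> bool:
    """Looser check: the output only has to contain a matching shape."""
    return re.search(pattern, output) is not None


def judge_test(output: str, criterion: str, judge: Callable[[str], str]) -> bool:
    """Ask another model whether the output meets a criterion stated in words."""
    verdict = judge(
        f"Does the following text satisfy: {criterion}? Answer YES or NO.\n\n{output}"
    )
    return verdict.strip().upper().startswith("YES")


def stub_judge(prompt: str) -> str:
    # Placeholder for a second LLM (assumption): a crude politeness keyword check.
    return "YES" if "thank" in prompt.lower() else "NO"


answer = "Thank you for reaching out. Your order 1234 ships on Monday."

results = {
    "exact": exact_test("4", "4"),
    "pattern": pattern_test(answer, r"order \d+ ships"),
    "judge": judge_test(answer, "the tone is polite", stub_judge),
}
```

The further down the list you go, the more tolerant the test is of wording changes, and the less deterministic the verdict becomes, which is why the judge itself needs to be checked from time to time.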

Now, we saw this in the previous talk, and it is definitely one of the things you'll have to overcome. Part of this is the productivity of engineering. We've had a lot of cases presented now where there's more productivity. This one explicitly says that review times go up, because the code generated by the models is actually bigger.

So it shows one of the points: that the role of the developer will be changing. We'll be less producing, creating — we'll be more managing. Ironically, we'll be more of an ops person than a dev person again, right? The revenge is coming hard.

Um, so in the beginning of DevOps, there was this paper called "The Ironies of Automation." And for those who remember that, right, there's a certain thing: the more you automate, the harder it gets to understand when things fail. And this is a paper you can read that is actually transposing this to the genAI things. And a few things: for example, when you're doing the automation, when you're doing the code generation, it has to adapt to the mental model of the user. So if you're a junior, it has to explain things as to a junior. If it's an end user, it has to fit that understanding. But if you're not producing anymore, you aren't trained at producing anymore; you're only reviewing. But if you never produce, how will you ever review? So that's why it's really hard, as mentioned in the previous talk: a junior will have that problem.

So in DevOps, the journey was: automate. That's the state we're in with genAI. Automate, right? We figured out CI/CD; that's the evals. But then you saw a whole continuation: hey, we need to have better monitoring, we need to do resilience engineering, we need to do observability. And then, because we aren't trained anymore to deal with systems when they fail, let's do chaos engineering and introduce failures so we can get better at dealing with failures again. So that learning is probably where we're going to spend the time that we saved in producing. So there's no escaping the time. But some companies will just skip the learning and only produce, right? That's kind of the tricky part.

Governance: I'll go over that quickly. That's the last part. You know, the tools are coming, and they're just watching your desktop. It's not just what you put in your IDE; it's recording your desktop. So your employees need to be aware of these safety risks. You want to opt out. It's not always that easy with those tools, right? You spend a lot of time on: where do you actually press the button to opt out on these models? Some will enable it. Again, it took OpenAI many months to actually have an opt-out button. Really strange. But licensing will be talked about later as well. You know, what's the provenance of your models and so on.

And then the risk levels, right? So all these things you want to kind of instill with that whole team, so that you explain them to your developers, to your whole organization. It is not just about the coding and the producing. Guardrails are part of making failure safe.

PII, security, metrics: those tools all exist. But if you think, "Oh, we're happy that the chatbot worked," you know, the other 80%, that's all this work.
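As one small example of that remaining work, here is a sketch of a PII guardrail that redacts text before it is sent to a model. The two regular expressions are deliberately simple assumptions for illustration and would miss many real-world formats; existing tools cover far more cases.

```python
import re

# Illustrative patterns only (assumption): real PII detection needs much more than this.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a label and count how many were found per label."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, found = pattern.subn(f"[{label}]", text)
        counts[label] = found
    return text, counts


clean, counts = redact("Mail jane.doe@example.com or call +1 555 010 9999 today.")
```

The counts double as a metric: a shared guardrail can report how often each kind of data shows up in prompts, per team, without storing the data itself.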

So the steps to scale this: use the platform, have it enabled, do the governance in there. And if you want to go beyond platform teams, there's something called the Unfix model. And you can go beyond just saying, "well, it's the infrastructure platform, we're providing this to all the tools." Maybe you need a unified way of presenting this to the customer. So the experience team aligns this across all your products in the same way — uses the same color, uses the same sparkles, uses the same things to do that.

So in a nutshell, I want you to put your genAI team next to all the others. And that's probably how I think you can scale this.

And I hear you: hiring people is going to be hard, right? We've seen the same ratio with DevSecOps and the like. But I think you can hire just the full-stack developer. You don't need the machine learning data scientists right now. They're happy to do this. We're good at integration, we've done this before. And we have a workshop coming up for that. If you want to bootstrap your engineers in that, talk to me later. And thanks for that.