Building (and Keeping) GenAI Teams
Michelle Gill, Sr Director of Engineering at GitLab, argues that building a GenAI team isn't just about hiring brilliant people — it's about managing the explosive tension between expert opinion and the unforgiving pace of AI innovation. Drawing on two years leading all AI, ML, and data science functions at GitLab, she diagnoses two root causes of GenAI team dysfunction: the "brilliant mind dilemma," where teams of experts generate competing solutions faster than they can ship, and the pressure of an industry that renders months of work obsolete overnight. Her practical framework covers how to structure a center of excellence, when to move people out of high-pressure roles, and how to keep elite engineers engaged once you've built them into a tiger team.
In this talk, you'll learn how to identify the right talent for GenAI teams, flatten organizational structures to accelerate decisions, apply tools like DRIs and time-boxed experimentation to cut through expert disagreement, and retain top AI engineers in a market where Anthropic and OpenAI are always hiring.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
All right. The first talk of the morning is Michelle Gill. She's Senior Director of Engineering at GitLab. She's currently responsible for advancing Core DevOps, CI/CD pipelines, source code management, planning, and more. Previously she was responsible for data science, ML, AI, and platform engineering teams.
So in short, she's helped lead many of her organization's highest-priority AI projects. As a result, she was able to assemble some of those experienced and passionate people from across the company into her organization.
I love this experience report because it shows us what life is like when you're on the frontier of these AI projects, which I suspect will be very handy for you to know as many of you get inspired or get chosen to lead similar efforts in your organization.
She will share some of the surprising situations that she found herself in, including what you do when the technology is changing so quickly, the dynamics that emerge when everyone's so senior and opinionated, and what it's like when the pace of marketplace competition is this insane. So here's Michelle.
Michelle Gill
Thank you. Hi everyone. I'm so excited to be here. Actually, I'm so nervous to be here, but Jeff Gallimore backstage and Kent Beck a few days ago said, channel that energy into excitement. So I'm very excited to be here.
I'm going to talk to you today about building and keeping GenAI teams. You're going to need your best people, but that's just the easy part.
So what is it like leading teams from the bleeding edge? I describe it as: it's like conducting an orchestra where everyone wants to play their own version of the virtuoso. And so you need to balance their individual brilliant contributions and force them to align on a direction for the symphony. And in the background, the tempo keeps picking up pace. This is the pace of the industry. And of course, your competitors are just waiting for you to not act or make the wrong decision.
And who am I to say? Gene mentioned some of this, but from April 2023 — this is sort of like the rise of the AI engineer — until about the same time this year, April 2025, I managed all aspects of AI, ML, data science, you name it, at GitLab. So that includes AI engineers. They're actually building the features that are taking advantage of GenAI, but also model validation teams. This isn't spoken of as much. They're evaluating the outputs of either frontier models or the features that have been built.
ML engineers or MLOps engineering, they're laying the foundation for all of this technology. They're supporting the infrastructure that it's all running on. And then of course, data science and research teams who are just scouring really new advancements to understand: are there any applicable use cases to our company? So this is really the whole gamut of AI, ML, all the different personas, all the different minds.
And Gene kind of asked me, what did I do so right or wrong to get this opportunity? Let me tell you, it felt like I did something wrong.
[Unclear name] said in one of the day-one talks something to the effect of, technology isn't the barrier. People are the barrier. And I think in general, people are my sweet spot. I love to build cultures of adaptability, cultures of challenging one another. I think that really made me the right fit for a high-pressure, tight-deadline kind of situation like this. But not everyone is cut out for that. That's really, I think, what I'm going to talk to you today about: how to cut ourselves out for that.
So that was about me. This is about you. This is going to apply to you if you have or will have an internal tooling team or a platform team or a framework team or a foundation team. We all know these words. So this is — oh, my slide's a little messed up up there — but this is something like a centralized team at your company that's powering other feature development teams. Maybe they provide access to LLMs. Maybe they're responsible for your OpenAI membership. Or the second reason might be, if you have developed products at your company that have now reached a level of maturity where you need a special role. Maybe a hiring manager has come running to you and they've said, we have no idea what we're doing anymore. We must hire an ML engineer or an AI engineer, or whatever the case is. Some of the challenges I'm going to go through today would resonate with you if that's the case.
01 Step one: identify the talent
Okay, what do you do first? Step one, identify the talent that you need to put on this team. Like I said, you're going to put your best people on the case. So I describe them as having three attributes.
The first would be natural curiosity. These are people who are absorbing advancements like a sponge. Any new advancement that comes out, they want to know about it, they want to experiment with it. You don't have to tell them to do that. They're just naturally curious.
The next would be grit. Grit is described by Angela Duckworth as having passion and perseverance. These team members are not going to be able to Google the answers to some of these problems. Solutions don't exist for them on Stack Overflow, and they're going to get faced with a bunch of brick walls every time they think of a really cool idea that's never going to make it to production. So they must have the tenacity to continuously keep going no matter what stands in their way.
Finally, technical versatility across ML, AI, and/or software engineering. You really don't want an expert in any of these things, but you're not looking for a jack of all trades either. You need someone who's flexible, they're adaptable, maybe they're language-agnostic, they're open to possibilities.
02 Step two: form the center of excellence
Step two: DevOps 101. We know how to do this. You form your center of excellence. This is like I was describing a minute ago. This team now will be responsible for providing programmatic access to models. We've had a few talks already about this. This is like an API gateway that is maybe branching out to all different kinds of providers. But because of that job mission or charter for them, they're going to become subject matter experts on LLM nuance.
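As a rough illustration of the gateway idea described here (this is not GitLab's implementation; the class name `ModelGateway` and the two fake vendor functions are hypothetical stand-ins), a center of excellence might give feature teams one entry point that routes to whichever provider is configured:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A provider is modeled as a function from prompt text to completion text.
# Real providers would wrap an HTTP client; these are hypothetical stand-ins.
Provider = Callable[[str], str]


@dataclass
class ModelGateway:
    """Single entry point that feature teams call instead of a vendor SDK."""

    providers: Dict[str, Provider]
    default: str

    def complete(self, prompt: str, provider: Optional[str] = None) -> str:
        # Fall back to the default provider when the caller does not choose one.
        name = provider or self.default
        if name not in self.providers:
            raise KeyError(f"unknown provider: {name}")
        return self.providers[name](prompt)


def fake_vendor_a(prompt: str) -> str:
    return f"[vendor-a] {prompt}"


def fake_vendor_b(prompt: str) -> str:
    return f"[vendor-b] {prompt}"


gateway = ModelGateway(
    providers={"vendor-a": fake_vendor_a, "vendor-b": fake_vendor_b},
    default="vendor-a",
)

print(gateway.complete("Summarize this merge request"))
print(gateway.complete("Summarize this merge request", provider="vendor-b"))
```

With a shape like this, swapping or adding a provider is a change in one team's code rather than in every feature team's code, which is one reason the expertise tends to concentrate there.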
So they'll know things like prompt engineering techniques, evaluation best practices. They can tell you the difference between Claude and OpenAI. They can tell you the difference between Claude 3 and Claude 3.4. By the nature of their job, this means they're required to be up to date on all of the latest trends and advancements. They're your go-to for all of those things.
So organizationally, this is something like I'm describing that it might look like in the center. You've got your center of excellence. This is one team. Maybe you've got 10 engineers on this team, but it's branching out and powering all of these other feature teams. Those feature teams could be building a chat bot, they could be building code assistance, one-off feature functionality. Maybe they're traditional feature development teams that are just experimenting with AI. But the fact is, all of it is being bottlenecked to this one center of excellence, this one super-expert team.
Finally, the team is ready. If you have done everything right, why are there so many problems now?
03 Why the problems start
What do I mean by that? This is, I think, where [unclear name] and I agree: problems like achieving consensus. That's because with these different perspectives across AI, ML, and software engineering, those are different perspectives and solutions too. So there's an increase in competing solutions. But with AI, there's also an increase in experimentation, which also leads to an increase in competing solutions.
Or what about political reasons, like someone wants to be extra visible because they're up for promotion? AI is very exciting. You could make huge advancements in it. Or what if you have — and we have several examples of this — a self-taught ML engineer and they're on the same team as an ML engineer who just received their PhD from university, and now they're working on the same project with the same charter?
Next, what about accelerating a project? Last year, three things happened in very quick succession to one another that we needed to take advantage of at GitLab. The first was chat bots really took off. The second was context windows dramatically expanded, and so this meant we could stuff more and more context into an LLM to get a better output. And the third was AI agents started to make their way into production.
So all three times, if we wanted to really take advantage of those advancements and maybe be first to market in some of those cases, we needed to pull from our center of excellence because that's where all of our subject matter experts were.
Now, all three times — and like I said, it was just very back-to-back to one another — we did go ahead and plug those holes. We backfilled. So we would take somebody off the team, we put somebody on the team, and we kept that rotation of knowledge going. But a lot of people don't do that. And if we hadn't done that, I don't think we'd be on stable ground today. It's that center of excellence or that platform that's really most important for you.
Next, what about architectural proposals? I talked a minute ago about the competing solutions. The thing with architecture is it could last for a decade. There was one week that I fielded seven architectural proposals for the exact same problem. It was all context retrieval methods. By the way, we were already working on an architecture for context retrieval methods when these seven came in. So as there's an increase in all of these experimentations, there's also an increase in: what does the next decade look like, and what kind of architecture do we need to sustain that?
Finally, what about the supporting roles like your PM or your EM? First of all, they're on a shifting landscape too. How would you like to solve for a business problem that has never existed in the first place? But also, how are they supposed to manage this team that keeps flying right past them?
So there are all of these problems that I've listed. I bet you all may be thinking of more in your head right now, but I'm boiling it down to maybe two root causes.
04 The brilliant mind dilemma
The first is something I call the brilliant mind dilemma. This is something like, if there are 10 experts on the team, that means I'm going to get 10 really awesome solutions out of that team, which means I'm going to have 10 discussions, which means now I have to make 10 decisions, which means now I have to tell nine people no, which means they're going to disagree with me. And now I have to argue about that also, and it's just going to keep going.
So the irony about this is if I had a team of 10 average engineers, they would deliver way faster because they would have clear guidance and direction as opposed to this group of brilliant minds that can't seem to get along in the first place.
So these are definitely the people you should have picked to be on your team. That's because of their depth of experience, how passionate they are to innovate, their ability to challenge each other a lot — way too much all the time. But it's going to come with strong opinions. There are going to be disagreement loops. There will be parallel solutions.
Let me give you an example of the impact that a brilliant mind can have. Your customer support team is drowning in tickets. They're working nights and weekends trying to keep up with the pace, but they really need a solution that's going to last long term. One of them is technically savvy and they start experimenting with embeddings stored in a vector database using the automatically generated documentation that your engineers already provide. It works. They are able to automatically close a significant number of issues, and this could be life-changing for them. On Friday, they high-five each other because they've got a working prototype and they hand it off to your ML team for implementation.
Monday rolls around, your ML team takes a look. It's not too far off. We are currently talking about something called RAG, retrieval-augmented generation, and it will do what they need it to do for the customer support use case, but it'll be more stable and reliable. We've been talking about it for a while. We think we can get it there.
But week three rolls around. If we're going to be talking about something like RAG, we need to be talking about something called knowledge graph instead, because RAG and embeddings is only going to apply to that customer support use case. And if we use something called knowledge graph, we can stitch context together from all across the stack and it would apply to way more use cases than just customer support.
But five weeks later, someone told infrastructure and they give us a real reality check: it's not going to scale. It isn't highly available. And did you even ask how we're going to distribute this to our customers?
Meanwhile, your competitor already published a blog post that says they can close 60% of customer support tickets automatically. That's not even the point. Your customer support team really needed a solution to their problem, but you spent two months talking about it and there's still not a solution for them in place.
So there's this tension between being able to trust your experts to guide you into the right direction and the unforgiving pace of innovation.
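For readers unfamiliar with the prototype in that story, embeddings over existing documentation with a similarity lookup, here is a minimal sketch of the idea. It is an assumption-heavy toy: `embed` uses word counts instead of a real embedding model, `DocIndex` is a hypothetical in-memory stand-in for a vector database, and the documents are invented.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DocIndex:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        self._rows.append((doc, embed(doc)))

    def search(self, query: str, k: int = 1) -> List[Tuple[float, str]]:
        # Score every stored document against the query, highest first.
        q = embed(query)
        scored = [(cosine(q, vec), doc) for doc, vec in self._rows]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


index = DocIndex()
index.add("reset your password from the account settings page")
index.add("pipelines fail when the runner is offline")
index.add("invoices are emailed on the first day of the month")

best_score, best_doc = index.search("how do I reset my password")[0]
print(best_doc)
```

A retrieval-augmented setup would pass the retrieved document to an LLM as context. The point of the story is that even a prototype this simple was already useful to the support team while the debate about better architectures continued.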
05 The unforgiving pace of innovation
And that's the second problem: the unforgiving pace of innovation. I have something that I call the two-month rule, and you can ask anybody about this. This is true. I tell my teams, if you have an idea for a blueprint, a re-architecture, an experiment, a feature, an architecture, any of these things, none of them are worth doing if it's going to take you longer than two months to accomplish.
If you give me an estimate that says it's going to take you eight months to deliver, why would I be supportive of that if in eight months' time an LLM could come about that automatically serves that use case on its own?
Most of all, that actually applies to people too. Even people are subject to the two-month rule. If you can't stay updated, stay aligned, or stay on par with direction or some of these advancements, the space isn't for everybody. And sometimes they need to move out of the way too.
And I'll give you an example of that. Jonathan was a leader that I worked with who took on an underperforming team. This team stood in the way of engineering delivering, and that's because they were meant to provide engineering with guidance on what to deliver and measurements on how we would measure the success of our delivery. So it was really critical that this team was performing.
Jonathan did everything right and I watched him do it. He set clear expectations with his team. He gave them frequent feedback. He did it in real time, and he was genuinely investing in their career development. They were improving. I was so relieved to see it because we really needed this to happen. But here's the truth: they improved incrementally and the industry was moving exponentially. So their need for coaching was still actively causing us delays. Our features were still being blocked. Engineers were still waiting for decisions, and our competitors were still shipping.
So in any other industry or instance, I think we would've said, we see you're trying. We see you're improving, so we know that you can get there. So we're going to give you more time. But AI doesn't care about more time. The brutal reality is it only cares about results and timing.
So it doesn't have to be that way. It's not all doom and gloom. And by the way, moving people out of the way doesn't mean you have to fire them. I've already heard a lot of times at the conference so far, like, what are you going to do with your team? Fire them. You don't have to fire them. You can provide safe off-ramps into less-pressure areas of your company.
Some other things that we have seen work: first, flatten your organizational structures to promote faster decision making. That's going to help eliminate some of the bureaucracy that the teams face when they struggle to decide on a direction.
Next, just like with the two-month rule, shorten your timelines alongside whatever that pace of innovation is for you and your company. For me, it was two months. For you, it could be a quarter, it could be a year, whatever the case. But you'll use that to determine: when should I uplevel my talent? When should I fail fast? When should I move on?
Set unapologetically high standards for this team because, as I was describing earlier, they're going to face endless rewrite opportunities, endless brick walls. They'll debate with their peers, they'll debate with their management team, and they'll debate with other departments to get their jobs done.
Finally, manage those multidisciplinary experts who have the strong opinions with tools. Give them tools to succeed, like, for example, decision frameworks. I'll give you some examples of that. But most of all, support them in achieving that consensus with their peers.
06 Flattening the hierarchy and making decisions
So this is that same org chart that I showed you earlier, only now this is what I'm talking about in terms of flattening that hierarchy. Before this, on the left, you can see each feature team here, branching off the center of excellence, has their own EM, PM, and leader. And actually, you can keep the same EM, PM, and leader. That's fine. And actually, you can have dozens of centers of excellence and dozens of feature teams.
The difference here on the right, though, is that it's all wrapped up under one leader, or maybe it's a group of leaders. But you don't want too many layers between who's making that decision and who's escalating that decision. And the important piece here is that it's all wrapped under a single AI portfolio. So there are no disparate leaders. They're all on the same team.
All right, more specific ways that I was talking about. First, for promoting faster decision making, going back to that context retrieval example where I had to field seven architectural proposals in the same week for the same problem. Think about something like a DRI or an SME, so a directly responsible individual or a subject matter expert. I'm going to get seven context retrieval method blueprints, and I'm going to send them to the context retrieval method blueprint expert. And they're responsible now for making the decision, setting the success criteria, and informing all of the stakeholders. And I'm going to do this with every single decision that I need to make, because weekly there are dozens.
Next, once they pick a direction to go in, stick to the direction to go in. A lot of the times with this space, AI theory is competing with AI theory. We don't always know which results are going to be better. I don't know the difference between RAG results and knowledge graph results other than the number of use cases. I can give you evaluation outputs to say which one is better, but all of it is anecdotal. Most of it is running on synthetic data.
One more thing on this: when you stick to that approach, time-box it. Say, you know, we're just going to go with this for a month. I know you're going to have more ideas in that month, but we're not going to hear them. We're just going to wait until the end of the month. That's going to be our baseline for results. Then we can move forward.
Now this next one: moving forward using that evidence. Again, AI theory is competing with AI theory. Sometimes that evidence needs to come down a notch. For example, last year we were working on our chat bot getting to general availability, and arbitrarily we set the bar at 80%. That meant 80% of the time, we wanted it to respond accurately to whatever question the customer asked. And we worked for weeks and nights and weekends and we couldn't get it past 65%. And then finally, when we were going to pull the plug for general availability, we ran an internal audit and we found humans answer correctly 40% of the time. We had met that quality bar in terms of better than a human weeks before.
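The arithmetic in that anecdote can be made explicit with a small sketch. The numbers are the ones from the talk (80% target, 65% achieved, 40% human baseline); the `QualityCheck` class and its method names are hypothetical and only illustrate comparing a model against both an arbitrary target and a measured baseline.

```python
from dataclasses import dataclass


@dataclass
class QualityCheck:
    model_accuracy: float   # fraction of answers judged correct
    target: float           # the bar the team set for launch
    human_baseline: float   # measured accuracy of human answers

    def meets_target(self) -> bool:
        return self.model_accuracy >= self.target

    def beats_human(self) -> bool:
        return self.model_accuracy > self.human_baseline

    def margin_over_human(self) -> float:
        # Rounded to two decimals to avoid floating-point noise in the output.
        return round(self.model_accuracy - self.human_baseline, 2)


check = QualityCheck(model_accuracy=0.65, target=0.80, human_baseline=0.40)

print(check.meets_target())       # False: 65% is below the 80% bar
print(check.beats_human())        # True: 65% is above the 40% human baseline
print(check.margin_over_human())  # 0.25
```

Measuring the baseline first would have shown weeks earlier that the arbitrary target was the thing blocking the launch, not the chat bot's quality relative to people.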
Finally, meet those experts where they're at. Talk in their language. When a PM comes running into your team talking about what a great new idea, can we please work on this next, ask them, who is that idea going to serve? What's the use case for that idea?
Or when the software engineer is panicking about, what if this happens in production, what about this edge case, ask them, what has happened in production? What is that edge case so that we can solve for it? Or with ML experts, ask them to quantify their concerns in terms of accuracy points, similarity scores, metrics.
07 Retaining the team
Finally, if you've done all that right, you actually have to retain them. You've built a tiger team who could just go work for Anthropic or OpenAI. Now you have to keep them engaged. Barriers that will slow them down, like a lack of vision, an unappealing product idea, nonstop debates — that could disconnect them and they could leave.
Next, have you thought about their pay or career progression? Because they're under constant pressure to deliver, and they have increased complexity now that they've had to learn.
Finally, continuous learning. I'm so glad we're all here dedicating time to our continued education, but I bet you don't do it regularly. And I wonder if you're prioritizing that on your team's behalf too. The fact is, that's what landed them on the team in the first place. So while you're putting pressure on them to deliver, you can't let them lose sight of the fact that they must continue staying up to date.
Now, as much attention as AI has garnered, I hope this shows it's really, I think, the people who deserve that attention the most. It's the people who continue to blossom and surprise me and expand.
So that's all I got. Here's the help I'm looking for. If you've been successful integrating GenAI teams with software engineering teams, or if you have strong opinions about how DevOps will evolve in the industry, come see me. I want to talk about it. We're at Booth 8. Thank you. Thank you.