Forget Vibe Coding, Vibe Leadership is the Real Risk

Las Vegas 2025

AI-powered coding tools are fundamentally shifting the way developers write software. Junior developers can ship code faster than ever and senior developers are being overwhelmed with code reviews. AI suggestions are delivered with such confidence that even official docs get dismissed. On the surface, everything seems productive. But dig a little deeper, and you’ll find fragile codebases, missed fundamentals, and tech debt piling up faster than teams can track.

This isn’t a problem with AI, it’s a problem with leadership. Too many teams are running on vibe leadership: a false sense of progress without the oversight and accountability to match. If vibe coding is risky, vibe leadership is catastrophic.

In this talk, we’ll explore:

- How AI tools are changing the dynamics of software development for better and worse

- Why high-performing teams need comprehension and critical thinking more than ever

- Leadership pitfalls that can turn AI acceleration into tech debt

- Practical strategies for engineering leaders to foster responsible AI adoption, deeper code reviews, and sustainable development practices

AI isn’t going away. But if we don’t shift from vibe leadership to intentional leadership, we risk turning today’s speed into tomorrow’s slowdown.

Full transcript

The complete talk, organized by section.

Ben Lloyd Pearson

All right, looks like everyone has made their way in here today, so I think we will go ahead and get started. I want to ask: who vibe coded for the first time this week? Just by a show of hands, probably 10% of you. That is really awesome. I love seeing that, because it has probably been a little over a year since I first did that, and the positivity around it this week has been awesome. If many of you had done it for the first time a year ago, you may have had a very different perspective on it based on where the technology was.

We are not going to talk about vibe coding today. This is about how we lead teams that are going to be vibe coding underneath you, and how going into that world without a strategy is what is really going to create the risk for your organization. If you have not vibe coded yet, it is cool, but you should probably try it sometime soon. Pick a tool and just play around with it. It is a lot of fun. But in the same way that vibe coding can go off the rails, vibe strategies can also go off the rails. We are going to talk about those strategies today.

There are three big sections of my talk today: first, how AI is going to change your SDLC over the coming years; second, where you are going to encounter new challenges; and third, how to set yourself up today for success. We are going to talk about pragmatic things that you can do today.

I forgot to introduce myself. My name is Ben Lloyd Pearson. I do DevX AI enablement for a company called LinearB. We are an AI productivity platform for engineering leaders. The more fun part of my job is that I get to run a podcast and a Substack called Dev Interrupted. We do AI news for engineering leaders, and we bring in really awesome people for full-length interviews. This week we had Ken Kocienda, who invented autocorrect, and had a really wonderful discussion with him. If you are looking for a new podcast or Substack to follow, check us out at devinterrupted.substack.com.

As part of this work between Dev Interrupted and LinearB, I speak to a lot of engineering experts every single week about AI. It has basically become the only thing that anyone wants to talk about from a leadership perspective. We are hearing common themes. Leaders want to measure the effectiveness of AI. They want to know how AI is affecting DevX and productivity. They want to maintain software quality while implementing AI across their organization. They want to make sure they are adopting the best tools. They want to prove the strategic value to their executives. There are a lot of questions, and I have learned a lot of unique approaches through these conversations. There are also common themes emerging, and that is what has made this past year exciting for me and what we are going to talk about today.

The TL;DR is: automation is the name of the game. If your organization today is set up to automate processes and workflows, chances are you are going to be in a good position to adopt AI within your organization as well. If you caught the presentation on day one from Toyota, they mentioned jidoka, or the concept of automation with a human touch. That is my personal word for 2025. I started reading the Toyota Production System book for the first time this year because I love it. These are automated workflows that have human-in-the-loop mechanisms, and that is what is happening with AI for organizations that have figured out how to adopt it within their workflows.

I have to include a meme: the agents are coming. If you saw the talk from the gentleman at OpenAI today, this word is overused, so of course the buzzwords are also coming. But I want to help decipher some of this noise today, and it is not actually all that scary.

I want to start with the 2025 Gen AI Impact Report. This was a partnership between Dev Interrupted and LinearB where we did research to study how AI is impacting the typical engineering organization SDLC. We surveyed engineering leaders from more than 350 organizations about how they were using AI in their software delivery workflows. We broke the software delivery process into 14 discrete steps spanning planning, design, coding, and release, and we asked a simple question: who managed that step of the process? Was it AI, was it a human, or did AI and humans work together?

From that, we created a chart. If you want the report, there is a link at the top, linearb.io/resources, where we have this report and other research reports. What I am trying to show is the holistic picture of where AI is taking over your SDLC right now. It should not surprise anyone that code generation sees the most AI activity. For writing code and tests, respondents report that about two-thirds of the activity is either AI-generated or mixed AI-human. Other use cases, like writing PR descriptions, are starting to be fully automated by AI. For reviewing code, there are a lot of AI code review tools on the market; LinearB built our own as well. Validating releases also scored highly for being managed completely by AI at this point, so these use cases are rapidly being adopted.

The key takeaway is that this agent transformation is happening, but it is uneven across the SDLC. Everything to the right and left of writing code still has a lot of greenfield and potential for AI adoption. Project planning, requirements, and architecture still have humans heavily involved. Until that changes, you are going to have new and interesting bottlenecks.

Many of us are already accustomed to having some form of rudimentary agent working on our code bases. We encounter basic maintenance bots like Dependabot or security services from Snyk or Mend. We have also encountered large enterprises that built custom bots that merge thousands of PRs every month. That is an extreme case, but we are now seeing AI start to become those bots that affect code bases.

One example I love is a story we covered at Dev Interrupted. Google published research recently about using AI to assist with a large 32-bit migration project. It is not exactly the most exciting work that every developer wants to get out of bed and focus on every day. They built a multi-step workflow that identifies references across the code base, categorizes code changes, writes the migration code, and eventually merges the changes. Within that workflow, there were human-in-the-loop moments where the LLMs simply were not capable of making the right decisions, or they needed human assistance. They looped in a human when necessary.

They migrated 600 code changes. Developers accepted 75% of the changes from their AI workflow, and when surveyed about how much time they thought they saved, they said about 50%. When you hear tech executives making bold statements like, "50% of our code is generated by AI," if that is true, I believe this is the type of stuff they are talking about. They are not building new product innovations and rapidly iterating new features. They are doing mundane, toilsome work with AI in these types of workflows.

What is emerging is a timeline. Back in 2023, we started adopting coding assistants like Copilot, Cursor, and ChatGPT. These provided about a 10% to 30% productivity improvement by making it easier to generate code. Today, an increasing number of agentic systems are emerging, and they are starting to push productivity beyond the 50% mark, such as in the Google use case. Those things will mature into products over the next couple of years. Right now, most examples are bespoke custom things that organizations are building internally. The next wave is when we introduce higher degrees of autonomy to these agentic systems. That is where promises like 2x to 3x, or maybe even 10x in some situations, start to come to fruition, but we are probably still at least a couple of years from that being true.

The problem is that a conventional delivery pipeline probably is not ready for this change. New risks and bottlenecks are introduced, and existing ones are magnified. If you caught Nathen Harvey's presentation about the DORA report, you heard how AI magnifies everything within your organization: if you have existing bottlenecks and risks, AI is going to magnify those challenges. That is what I want to dive into now.

AI differs from a typical human-powered software delivery pipeline in a few basic ways. First, LLMs lack the context of your quality standards. Think about the question: two plus two equals what? Most people would say four, and they would be correct 99% of the time. But if you ask what two plus two equals in the context of the book 1984, the answer is actually five in some situations, because it is an allegory for propaganda and how reality can be shaped. When you think about "two plus two equals" as tokens, there is a chance the output is not what you expect. From an engineering perspective, you either need to spend time up front providing that context to the model or spend time afterward reworking the output to fix it.

Second, LLMs create code at a pace that we have never experienced before. I encourage people to get familiar with disposable code: have a process for rapid ideation, validation, incorporation, and destruction. Get used to new code being created potentially from non-engineering resources like marketing or product people. Have a way to quickly validate it, incorporate it into your main platform, and destroy it when it is no longer necessary. Even if it does not get incorporated into your platform, you can still use disposable code internally to solve challenges rapidly.

Third, AI is going to create new and unexpected delivery bottlenecks. Maybe you can generate code or new features faster, but can you generate vision for your product faster? If the answer is no, you may not actually be able to move faster as an organization. There is also a lot we are still learning. Questions remain around who owns the process and code generated by these tools, how we upskill junior developers while maintaining the value and integrity of senior developers and accommodating their different needs, what delivery gates we need, and the unknown unknowns we will encounter as these technologies develop.

I want to highlight three big risks that we encounter time and time again, leaning on people from the Dev Interrupted community. The first is data silos. Brandon Jung, VP of ecosystem at Tabnine, talked about how agentic AI often fails without clean, centralized knowledge bases. Fragmented documentation reduces effectiveness and reliability. Data hygiene, high-quality data, and training resources are all important. He promoted the idea of a golden repo: think about your idealized code base or piece of code and use that to help models understand your organization's standards.

The second risk is tooling and infrastructure debt. If you are on outdated or non-modern tooling and infrastructure, or you have manual operations and ticket-based workflows with many human interactions, those things do not scale for AI-driven development. A common pattern is platform engineering being leaned on to help solve this: give them the directive to mature your pipelines, abstract complexity away, and enable safe and scalable AI adoption. That advice came from Cory O'Daniel, CEO of Massdriver.

The last tip came from Brooke Hartley Moy, CEO of Infactory, who talked about oversight, governance, and trust. Governance, when used correctly, enables trust across your organization. The more systems you put in place to validate AI models and help the human in the loop ensure they are making the right decisions, the more trust you build with AI workflows and the more effective they will be.

There are big risks involved, and there are three common reactions that are not productive. The first is to freeze: things are uncertain, so wait for leaders to emerge in the AI space, halt investments until things clear up. The risk is that your competitors are not going to do that, and maybe you are giving them an advantage. The second reaction is to ignore it: teams may say they do not need AI, AI does not work for their use case because of their tech stack or operating model, or they are not allowed to use AI because of compliance or security requirements. These are teams that continue with business as usual. The third reaction is to do everything everywhere all at once: buy all the tools, AI all the products, vibe code everything, and create a bunch of noise. None of these will help you be more productive.

For the rest of the session, I want to focus on building a pragmatic AI strategy, keeping it simple and focusing on two things: automation and visibility. If your organization is built for automation, you are built for AI. Visibility means making the right decisions, having data to validate that you are doing the right thing, and getting buy-in from executive stakeholders.

Starting with visibility: everyone should have a baseline understanding of software delivery quality and efficiency, however you define that. Engineering metrics are commoditized at this point. You could be vibe coding dashboards in your free time, but in my opinion you should just have an MCP into your data. Everything is just MCPs at this point. Even in the uncertain world of AI, conventional frameworks still apply: DORA, SPACE, and newer AI measurement frameworks. You do not need to spend a lot on fancy dashboards and visualizations. Get a baseline in place today so that as you adopt new technologies, you have something to reference and track. Your goal should always be value delivered, not hours eliminated. Reducing toil is great, but only if it helps the business have a greater impact.

We have too many measurement frameworks, and naturally I have my own as well, or one I helped contribute to. I want to keep it simple because this is evolving rapidly. The industry should focus on two big things in the AI journey: adoption and impact. Adoption asks whether your developers are actually using the AI tools you give them. Impact asks whether that AI provides value to your organization.

For adoption, there is quantitative and qualitative analysis. Quantitatively, look at tool adoption and usage data. Qualitatively, get developer feedback about their perception of those tools. The basic question is: are developers using the AI tools, and are they happy about it? If so, what are they doing successfully so you can surface wins to the rest of the organization? If not, why are they struggling, and what can you do to unblock them? Keep it simple.

For impact, conventional quantitative metrics frameworks come in. Benchmarking can be helpful internally through before-versus-after analysis, cohort analysis, and comparison against industry peers. Benchmarks set a stake in the ground so you know where you are relative to others or a point in time. Qualitatively, use survey insights because we are in an experimental phase of AI adoption. It is about surfacing wins and failures quickly to help improve your organization and iterate on ideas.
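To make the before-versus-after idea concrete, here is a minimal sketch in Python. The cycle-time numbers are invented for illustration, and using the median of PR cycle time is an assumption rather than something the talk prescribes; any agreed-upon delivery metric could take its place.

```python
from statistics import median


def percent_change(before, after):
    """Percent change in the median from the 'before' sample to the 'after' sample."""
    base = median(before)
    return (median(after) - base) / base * 100


# Hypothetical PR cycle times in hours, before and after an AI tool rollout.
before_rollout = [30, 42, 26, 55, 38]
after_rollout = [24, 31, 20, 44, 29]

change = percent_change(before_rollout, after_rollout)
print(f"Median cycle time changed by {change:.1f}%")
```

A negative number means the median went down. The same function works for a cohort comparison if the two lists hold data from teams with and without the tool instead of two points in time.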

For AI adoption and usage, most AI providers now have APIs and digital signatures in Git history that make it easy to track when they add code to your code base. Most track active users and suggestion acceptance rates. These are great for surface-level usage analysis, but I highly discourage setting goals around them. When an executive says, "We produced 50% of our code with AI this month, and we want to do 60% next month," that is a very dangerous goal without many qualifiers. Really, all you want to know is whether developers find the tool useful enough to use regularly.

Surveys should always accompany a new tool rollout. You want to capture an aggregate score representing the overall success rate, like comparing tool A at 4.2 out of five against tool B at 2.7 out of five. That gives a relative comparison of overall satisfaction. You also want specific feedback that uncovers discrepancies between teams, roles, and projects. Every developer experiences AI differently, and direct qualitative questions are the quickest way to surface that information.
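As a rough sketch of how such survey results might be summarized, the Python below computes one aggregate score per tool and then breaks the same data down by team to surface discrepancies. The tool names, team names, and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (tool, team, satisfaction score from 1 to 5).
responses = [
    ("tool_a", "platform", 5), ("tool_a", "platform", 4),
    ("tool_a", "mobile", 4), ("tool_a", "mobile", 2),
    ("tool_b", "platform", 3), ("tool_b", "mobile", 2),
]


def average_scores(rows, key):
    """Average the score for each group produced by key(tool, team)."""
    groups = defaultdict(list)
    for tool, team, score in rows:
        groups[key(tool, team)].append(score)
    return {group: round(mean(scores), 2) for group, scores in groups.items()}


# One aggregate score per tool for a relative comparison.
by_tool = average_scores(responses, lambda tool, team: tool)

# The same scores split by team, to show where experiences differ.
by_tool_and_team = average_scores(responses, lambda tool, team: (tool, team))

print(by_tool)
print(by_tool_and_team)
```

In this made-up data the aggregate hides a gap: one team rates the first tool much higher than the other does, which is the kind of discrepancy the follow-up questions are meant to explain.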

For quantitative metrics, there are already well-established frameworks like DORA and SPACE, and newer frameworks like DevEx and Core Four. At the end of the day, what matters is that metrics have an agreed-upon definition, tie developer effort to strategic executive value, and are actionable to make improvements. Developers need to feel that improving the metric will improve the organization.

MCP seems to be the name of the game. We view MCP as an artifact generator. Once you have data that gives the context you need, along with tools to access it, MCP becomes your way to generate whatever artifact you need, whether it is a visualization, analysis, or other context. It is becoming a super layer for interfaces and user experiences.

Benchmarks are important because you need to know where you stand. Every organization is different and every team is different. Setting the stake in the ground is important. LinearB publishes an engineering benchmarks report every year; use it to get a relative benchmark. If you are not in the elite category for everything, do not feel like a failure. There is always context behind these things. It is a good way to know where you are relative to the world around you.

That was visibility. Now let us talk about automation. For most teams today, the pull request has become the new major bottleneck in software delivery, and human-driven workflows are not scaling to meet the need. We like the Ship / Show / Ask framework proposed by Rouan Wilsenach as a great way to approach automation in the AI era. I will provide specific examples.

Start with cues that tell you when a process can be automated. These are words like always, these, only after, and never. For example: new code always needs to be reviewed by a certain team; I never ask a certain person to review code; pull requests always need to reference certain information, like project management information.
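Rules phrased with "always" and "never" translate naturally into deterministic checks. The sketch below is one possible shape in Python; the ticket key format (such as "PROJ-123") and the excluded reviewer name are assumptions made up for the example, not details from the talk.

```python
import re

# Assumed ticket key format, e.g. "PROJ-123". Adjust to your own tracker.
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

# Hypothetical "never ask this person to review" list.
EXCLUDED_REVIEWERS = {"alex"}


def check_pull_request(title, reviewers):
    """Return a list of rule violations for a pull request (empty if none)."""
    problems = []
    # "Pull requests always need to reference project management information."
    if not TICKET_PATTERN.search(title):
        problems.append("missing ticket reference")
    # "I never ask a certain person to review code."
    blocked = EXCLUDED_REVIEWERS & set(reviewers)
    if blocked:
        problems.append("excluded reviewer requested: " + ", ".join(sorted(blocked)))
    return problems


print(check_pull_request("PROJ-42 Fix login redirect", ["sam"]))
print(check_pull_request("Fix login redirect", ["alex", "sam"]))
```

Checks like these need no AI at all, which makes them a cheap first step before layering model-based decisions on top.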

One easy, low-effort way to get into this is that most organizations have safe changes that do not need much scrutiny before they are merged. For these, make it as simple as automating hygiene processes. Provide context for historical reference and make it easy to understand why code exists at a glance. The goal is to make it so humans do not have to click green buttons for things that do not create risk for your organization. We have talked to organizations that spend hundreds of hours of development time a month with developers going into GitHub to click a couple of green buttons to merge a PR that poses no risk. That is the basic idea of "ship": something can be shipped without a human in the loop to review anything.

More services are going to operate according to the semi-autonomous bot pattern as AI rolls out. Whenever these services have to await human approval, that becomes the new bottleneck. LLMs can help automate decision making for what is safe versus what needs human attention before review. For a dependency management policy, for example, all security patches might get automatically merged into the code base, while a minor update gets automatically approved but still requires a human to merge it.
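The dependency policy described above can be sketched as a small decision function. This is an illustrative Python example, not any particular tool's configuration: the `DependencyUpdate` fields are hypothetical, and treating non-security patch bumps the same as minor bumps, and sending everything else to human review, are assumptions added to make the function complete.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_MERGE = "merge automatically"
    AUTO_APPROVE = "approve automatically, human merges"
    HUMAN_REVIEW = "wait for human review"


@dataclass
class DependencyUpdate:
    package: str
    bump: str  # "patch", "minor", or "major"
    security_fix: bool


def decide(update: DependencyUpdate) -> Action:
    """Map a dependency update to the action the policy allows."""
    if update.security_fix:
        # Security patches are merged without waiting on a human.
        return Action.AUTO_MERGE
    if update.bump in ("patch", "minor"):
        # Minor updates are approved automatically, but a human still merges.
        # Handling "patch" the same way is an assumption for this sketch.
        return Action.AUTO_APPROVE
    # Assumed default: anything else (for example a major bump) needs review.
    return Action.HUMAN_REVIEW


print(decide(DependencyUpdate("requests", "patch", security_fix=True)).value)
print(decide(DependencyUpdate("requests", "minor", security_fix=False)).value)
print(decide(DependencyUpdate("requests", "major", security_fix=False)).value)
```

Keeping the policy in one function makes it easy to audit which categories of change can bypass a human and which cannot.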

You can use AI to reduce the toil of providing context to the rest of the team. Every organization should be using AI for PR descriptions, generating documentation, and improving knowledge sharing. Automatic pull request descriptions are arguably the first ubiquitous use of AI in the typical engineering organization, even more so than code generation. Nobody wants to write pull request descriptions. They should be automated because the tools can do it. This improves context for both human and AI reviewers and reduces toil whenever a human has to review a pull request.

As you automate fundamental context sharing, it opens the door for higher-level operations. Our team fully automates post-sprint reports. There is never a question about what is being shipped into our code base because PR summaries are automated, so now we can automate the summary of what work gets done every week. Immediately we save every developer potentially 20 to 30 minutes every week, sometimes more.
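Once every merged PR carries a summary, rolling those summaries up into a report is mostly bookkeeping. The Python sketch below shows one simple way to do it; the PR records, field names, and grouping by product area are hypothetical and are not a description of how our team's reports are actually built.

```python
from collections import defaultdict

# Hypothetical merged PRs for one sprint, each with an automated summary.
merged_prs = [
    {"number": 101, "area": "billing", "summary": "Add retry to invoice sync"},
    {"number": 102, "area": "auth", "summary": "Rotate session keys on logout"},
    {"number": 103, "area": "billing", "summary": "Fix rounding in tax totals"},
]


def build_report(prs):
    """Group PR summaries by area and render a plain-text report."""
    by_area = defaultdict(list)
    for pr in prs:
        by_area[pr["area"]].append(pr)
    lines = []
    for area in sorted(by_area):
        lines.append(f"{area}: {len(by_area[area])} merged")
        for pr in by_area[area]:
            lines.append(f"  #{pr['number']}: {pr['summary']}")
    return "\n".join(lines)


report = build_report(merged_prs)
print(report)
```

The grouping step is deterministic; an LLM is only needed earlier, to write the individual summaries, or later, if you want a narrative overview on top of the list.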

Then we get to "ask," which increasingly starts with asking AI. The biggest benefit of an AI code review tool is the shortened feedback loop that preserves human attention. It also creates the opportunity to fully automate some PR approvals. For example, for prompt changes within our product, if AI is okay with it, we allow a human to merge it without another human review. For code changes that affect more important things, we still require a second human reviewer.
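The gating rule in that example can be written down as a short function. This is a minimal sketch in Python under stated assumptions: prompt files are assumed to live under a `prompts/` directory, and the AI reviewer's verdict is assumed to arrive as a simple boolean. Neither detail comes from the talk.

```python
# Hypothetical location of prompt files in the repository.
PROMPT_DIR = "prompts/"


def required_review(changed_files, ai_approved):
    """Decide what review a PR still needs after the AI review has run."""
    prompt_only = bool(changed_files) and all(
        path.startswith(PROMPT_DIR) for path in changed_files
    )
    if prompt_only and ai_approved:
        # Prompt-only change that the AI reviewer accepted:
        # a human may merge without another human review.
        return "human may merge without a second reviewer"
    # Everything else still requires a second human reviewer.
    return "second human reviewer required"


print(required_review(["prompts/summary.txt"], ai_approved=True))
print(required_review(["prompts/summary.txt", "src/api.py"], ai_approved=True))
print(required_review(["prompts/summary.txt"], ai_approved=False))
```

Note that the rule fails closed: an empty file list, a mixed change, or a rejected AI review all fall through to the stricter path.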

If AI code review is not enough, you want to bring in human attention, but before that you need to provide all the context humans need so they know quickly what they are looking at. One of our favorite automations tells you how long a PR takes to review, which helps both the author and reviewer deduce the mental load of reviewing that PR.

The last step is when you loop in the human. We use gitStream to build custom rules that determine who should review all of our PRs, and it manages the process for us. The important thing is that teams can configure policies for their unique practices while centralized policies are enforced when necessary. Many tools are too general purpose to meet everyone's needs, so there has to be flexibility: centralized control plus customization for each team.

I will leave you with a few final tips for automating in the AI era. Use deterministic automations when possible. Even though we have AI, do not solve every problem with it; some problems are deterministic and need regular automation. Give your teams room to experiment; now is the time to do it. When they find wins, surface them so others can learn. Measure your success, know what is working and what is not, and when you fail, audit it to learn and improve. Finally, remember the human in the loop and design these systems to leverage the strengths of the people who work for you. I do not think AI is replacing people, and I do not even like the statement that people using AI will replace people who do not use AI. I think this will become a tool we all leverage to take knowledge work a lot further.

There are disruptions coming fast. I have free affirmations if you would like an affirmation after the talk. Thank you for coming to my talk.