Scaling AI Adoption Across 3000+ Developers at Booking
When Booking.com introduced AI coding tools to over 3,000 developers, adoption was slower than expected due to organizational as well as technical challenges. With a strategy focused on experimentation, real business use cases, and close measurement of impact, we turned that early experience into a foundation for widespread adoption at scale. Now, two thirds of our developers use AI tools consistently, resulting in accelerated time to market, more time for innovation, and fast-tracked modernization projects. This talk will share the practical lessons we learned about driving AI adoption as an organizational capability, not just an individual productivity tool.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
To start us off, we have two presenters who I greatly admire for their incredible cerebral and deliberate approach to measuring the impact of AI and dev productivity.
Bruno Passos is Group Product Manager for Developer Experience at Booking.com. They're the world's largest travel agency with over 3,000 developers. Bruno's mission is to eliminate developer roadblocks so they can do their best work. Over the last year, Bruno has been heavily involved in their GenAI innovation efforts within engineering, and I'm so grateful for the case studies that he has shared, which informed so much of my own thinking, which went into the Vibe Coding book, as well as the 2025 DORA research effort, which will be announced tomorrow.
Bruno will share some of the problems they're attempting to solve, the reasons they're pushing for the adoption of AI tools to developers, their strategy, challenges, and the rewards. He'll be co-presenting with the amazing Laura Tacho, CTO of DX, the engineering platform led by awesome productivity researchers. Here is Bruno and Laura.
Bruno Passos
Good morning, everyone. Gene, thank you so much for the introduction. It's very kind, but I don't believe we could be here today if it wasn't for the work that this incredible woman and her team do on a daily basis.
Laura Tacho
Thanks, Bruno. We are here today to talk about our story and our vision toward scaling AI across an enterprise with more than 3,000 developers.
Bruno Passos
Before I go into that, I would like to talk a little bit more about scale. At Booking.com, we have about 1.5 million room nights sold every single day, with 500 million monthly visitors to our website, and around 24,000 employees across the globe, with roughly $23 billion revenue as of 2024.
From an engineering perspective, and that's what we are here to talk about, Booking.com has more than 3,000 developers, and these are the people my team and I call customers. We handle 250,000 merge requests in a given year, with 2.8 million CI pipelines running in that same year, and a very healthy and balanced legacy codebase. That's a joke, and that's why we are here to talk about how we are trying to advance on this effort.
Laura Tacho
Bruno and I go back quite a while. DX and Booking.com joined forces in 2023 because Bruno and his team are what I can only describe as obsessed with developer experience, and they were very committed to understanding developer productivity, finding the bottlenecks, and removing them. 2023 was also the year that Bruno and his team started experimenting with very early AI tools. This was at the very beginning of the explosion of AI-assisted development.
Bruno Passos
Like I mentioned in the beginning, the framework and the platform that DX provides has been incredibly powerful for us to be able to move forward on what we are trying to do.
Laura Tacho
If you haven't heard of DX, we're a developer intelligence platform. We are at the forefront of developer productivity research and AI impact. We have the honor and privilege, truly, of working with teams like Bruno's at Booking.com, Tesco, ADP, Vanguard, so many of you today in the audience are proud customers, and it's really our pleasure to be able to work with you and partner with you on solving these very difficult problems.
Our story today is the story of Booking.com, and how an organization with 3,000 developers and a 25-plus-year-old codebase was able to scale AI adoption so that right now, this very day, over 75% of the developers in Bruno's organization use AI. This is really tremendous, and they did it in two years. We're going to talk about how they got there, the laser focus on business problems that they had at the very beginning, their early experiments, the early data that they gathered, and how they turned measurement into a core strategic pillar. We'll talk about their path toward widespread adoption, finally get into some of the great results and impact so Bruno can brag a little bit, and talk about what's next for Booking.
Bruno Passos
Let me start by introducing our approach that we are true to even today. We focused on three main pillars.
We wanted to accelerate our tech modernization, so removing that 25-plus years of legacy codebase. Our metric here was: can we reduce the time to complete the project? We had a rough estimate that it would take about 10 years to replatform our codebase, and so our goal from an AI enablement perspective is to roughly halve that time.
The second pillar is removing the toil from the day-to-day of our developers. Today they focus way too much on maintenance and fixing bugs rather than innovating. Our metric here is innovation rate, and our target is that about 80% of the time developers spend coding goes to innovating instead of fixing.
Last but not least, we want to ship code faster and with better quality. We are aiming to reduce change failure rate long term, and we want to increase our PR throughput.
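The three pillar metrics named above can be made concrete with a small sketch. This is a minimal illustration, not Booking.com's or DX's actual definitions: the `TeamPeriod` record, its field names, and the exact formulas (time share for innovation rate, merged PRs per developer for throughput, failed deployments over total deployments for change failure rate) are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TeamPeriod:
    """Hypothetical per-team totals for one reporting period."""
    hours_new_capabilities: float  # time spent building new things
    hours_maintenance: float       # time spent on bugs, toil, upkeep
    merged_prs: int
    developers: int
    deployments: int
    failed_deployments: int


def innovation_rate(p: TeamPeriod) -> float:
    """Share of tracked time spent on new capabilities (assumed definition)."""
    total = p.hours_new_capabilities + p.hours_maintenance
    return p.hours_new_capabilities / total if total else 0.0


def pr_throughput(p: TeamPeriod) -> float:
    """Merged PRs per developer in the period (assumed definition)."""
    return p.merged_prs / p.developers if p.developers else 0.0


def change_failure_rate(p: TeamPeriod) -> float:
    """Share of deployments that caused a failure."""
    return p.failed_deployments / p.deployments if p.deployments else 0.0


# Made-up numbers, for illustration only.
period = TeamPeriod(
    hours_new_capabilities=60, hours_maintenance=40,
    merged_prs=120, developers=10,
    deployments=50, failed_deployments=5,
)
print(innovation_rate(period), pr_throughput(period), change_failure_rate(period))
```

Tracking all three together matters because a gain in one (for example throughput) is only meaningful if the others do not get worse.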
Everything that we do at Booking, as anybody who has researched how we develop things will know, is heavily based on experimentation and setting hypotheses. AI would be no different, and that was really important for us to be able to talk about what we're talking about here, because it kept us true to what we were trying to do and moved us farther and farther away from the hype. It grounded us in clear hypotheses and allowed us to monitor the progress we were making, as well as to compare non-users with heavy users of AI within the organization.
Like Laura mentioned, about two years ago we started with a tool called Sourcegraph Cody. Why we started there: Sourcegraph has a really powerful search tool that we use within the company that allowed us to move faster searching code within our legacy codebase. Cody took advantage of the code that was already indexed in our codebase to help us experiment with AI within Booking.com.
We had a few barriers. Our engineers were subscribing to the tool but stopping using it almost immediately, and we didn't know why that was. The second one is we needed to find a clear framework to measure the impact of what we were trying to do.
Laura Tacho
There were some early indications that measurements like time saved per developer were going to be something important to track ROI, but there were so many other questions. For an organization as committed to measurement and experimentation and scientific approaches, hours saved just didn't cut it. We had to get deeper into other impact measurements like PR throughput, quality, security, code maintainability, and developer experience. There were strong signals that hours saved was a good place to start, but we knew that we had to go further.
I want to fast forward for all of your benefit two years into the future to this very day. We have worked with and studied Bruno and his team, along with other teams like Dropbox, and we put all of that information into a recommendation for how you can also measure AI impact in your own organizations, so that you don't have to go through the grueling two years of work and discovery that Bruno and his team did. You can find the full white paper underneath this QR code here, or stop by the DX booth and we'll be happy to talk to you about it.
Learning from Bruno's experience at Booking.com, we wanted to know: how are developers using AI? How are they adopting it? What is the actual impact on engineering performance? And then what does this all cost, so we can put together a strong ROI case?
One of the things that Bruno and his organization, and I think so many of you here today, find yourselves up against is a lot of hype out in the media, and then results on the ground that maybe don't live up to what you're reading about. Data beats hype every single time, and this is something that Bruno and his team really took to heart early. With all of these high expectations and sensational headlines, the only way to make the right choice for your organization is to stay grounded in the data, which means that you need to measure and be able to observe exactly how AI is impacting your organization and your practices.
Bruno Passos
Let's talk a little bit about the path that we took in order to make sure that the majority of our developers use the tool.
Like we mentioned from the beginning, stay focused on data whenever possible. Partnering with DX has been incredible. We were able to delegate that to them and really focus on our developers rather than trying to reinvent how we measure the impact of it.
Uncover the barriers that your developers are running into, so that you can overcome them and developers ultimately adopt the tool. Then enablement: whatever we discovered from our developers, we were able to feed into enablement and training, and into building a community of developers who teach each other the new tool.
We had a few organizational challenges. As you can imagine from our scale and the profile of our company, we have very heavy compliance, security, and procurement processes. That was a real bottleneck for us to be able to try different tools. We started with Sourcegraph Cody, but from the beginning we had the charter to be able to try as many tools as we wanted, to give our developers the chance to use whatever they wanted to use. At that time, we had a timeline of about six months to introduce a new tool within the company. We knew that didn't cut it.
We also had some developer hesitations. When I mentioned at the beginning that developers were starting to use the tool but stopping almost immediately, we started double-clicking on that and we realized that there was a real fear of losing their jobs. I'm sure we've all heard about that. There was uncertainty about where in the codebase they should be or could be using GenAI. Last but not least, we had a lot of copyright and licensing questions that our developers were asking. That was really the beginning of us being able to uncover some of those barriers for our developers, and we saw immediate adoption past that.
From a developer perspective, we also saw that they didn't understand the technology. Teaching them the basic concepts, like prompt engineering, how to give the right context to an LLM, especially when we went into some of our legacy codebase uncovering, was really important for us to be able to give the LLM upfront some of the context so we could get better results on the other end. Also, teaching them how LLMs work was really important for them to have a better chance of success. Last but not least, we were really clear about where they could and couldn't be using GenAI within Booking.com, and so that made them more creative and less fearful toward experimenting.
Laura Tacho
One of the things that I've always admired about Bruno and his team throughout this whole process is they stayed really curious. They took a multilayered approach with looking at data, doing interviews, talking to developers on the ground to figure out what those barriers were so they could form a really good strategy about overcoming them.
Bruno Passos
We had to start addressing some of those organizational challenges. We started measuring some of the impact of AI, but also sharing really broadly with our leaders and our developers. I'm going to go a little bit more specific on leadership here because it's really important that we understand as leaders not only how to use it, but also what's the impact of what's going on within the organization.
I'm sure you've all been to conferences and talks with your peers where you brag about how many hours saved your developers have had because of AI. It was important to teach leadership that that wasn't the only metric to look at. We started educating leadership not only from a metric perspective, but also getting their hands dirty with some vibe coding and GenAI. That was really, really powerful.
We started centralizing some of the efforts. We're split into several different business units, and what we didn't want to happen is that all of these business units individually were going through the barriers that we discovered in the beginning. So we centralized how we did procurement, legal, compliance, security, so we could fast-track how they used the technology.
Last but not least, as we were evaluating different partners, we started building our framework to be able to evaluate them equally and fairly, so we could really stay true to our values, our GenAI values.
We started reimagining some of that training. We started getting developers to fix real business problems. At that point, it was very easy for them to start coding or vibe coding on a greenfield project. For us, it was really important to train them and get them to start solving a real business problem. We started pairing with them in two-day workshops: one where we taught the basics of GenAI, but in the second one, they brought a business problem that they thought could be possible to fix with GenAI. That was really interesting for us to get the ball rolling.
We adopted a concept from AWS called Experience-Based Accelerators, or EBAs, and that was really powerful. I want to emphasize here that this came about because of GenAI, but involves very little use of GenAI. That's one of the positive things I see: GenAI is enabling us to think and work differently from an organizational perspective. The EBAs were three- to five-day experiences where we put as many developers as possible to focus on one business problem.
A little caveat here that is important: if you try to go down that route, also bring the provider that is providing your AI tools. They will be super helpful to bridge the gap that you don't have from a knowledge perspective there. That has been really powerful and that's helping us build that community. We are building a huge community on Slack where the providers are also there. We are open and honest with all our providers that we are trying different providers, and so we mix them into that community as well.
Then leadership by example: once we started grabbing leadership by the hand and actually getting them to vibe code and understand what we were doing, doing BRs only on measuring the impact of AI, they were able to distill the message to the organization really, really well. We saw that adoption started going higher.
Laura Tacho
One of the things that had always been a backbone of the approach that Booking.com has taken is keeping developers and developer experience at the center of everything. Building trust was really important because an early hesitation was concerns about job security or maybe not understanding exactly what the role of AI was playing in the organization.
Throughout this learning and enablement update and this strategy rollout, it was really important that they kept developers and leaders in the same room to talk openly about the real problems that developers were having. Not just about the tool, not just about the technology, not just about the hype or about the expected results, but really what they were experiencing on the ground in their day-to-day work. As much as AI is a huge business lever, it has real impact on the day-to-day work of individual developers, and it was very important to keep that discourse open to build trust with the development community.
Data was a big part of building this trust because, as Bruno said, they were able to show the real results to the communities and to the leaders that mattered and needed to see it.
Bruno's organization and Booking.com in general had already invested so much time in understanding what engineering excellence looked like. They used the DX Core 4 framework that measures across speed, effectiveness, quality, and impact to define engineering performance. With that really strong foundation, they could see how AI was changing all four of those dimensions, and it made it much easier to see the real impact on a business level of what these new tools were bringing to their organization.
On top of that, they needed to have some very specific AI measurement insights about adoption, or about percentage of code being written, or number of pull requests or merge requests being written with the assistance of AI, to really understand the penetration of the tool, who's using it, and who's benefiting the most. With this really strong foundation, and then the AI metrics on top of it, they had a very comprehensive view into not only exactly what was happening, but also whether or not it was working.
If you don't already understand what matters to your organization when it comes to engineering performance, it is very difficult to move beyond vanity metrics like lines of code generated by AI, or adoption rates that might look really nice on a dashboard but actually don't move the needle for your business at all. Because they had such a strong foundation, it was easy for them to see the impact, and it's a necessary investment to measure the impact of AI.
Bruno Passos
What have been the results and impact so far that we've seen?
One thing that I want to emphasize here is that what we've seen up until now has very little to do with the travel industry and very much to do with tech. I think that the lessons here can be applied by every one of you.
We started with people adopting and giving up very, very quickly. Less than 10% of our workforce had adopted GenAI. Today we have north of 75% of our engineers using GenAI, and 65% of those are using it on a daily basis. They start their day opening the IDE and using GenAI and finish their day doing the same, which is awesome to see. We see a strong correlation between the number of times that they use it and the number of PRs that they push through, or what we call throughput or PR throughput metrics.
Laura Tacho
Not to brag about your results, Bruno, but I get to, which is really nice. The daily AI users are shipping at least 30% more merge requests than non-AI users. The way that we understand this is that we can do cohort or comparative analysis. We can look at users who are using AI and users who are not using AI and also break it down based on their adoption level. We can also look at an individual developer and see their behavior and patterns before they adopted AI and then track them throughout the journey and do some really nice comparative analysis, also breaking it down by other attributes like their role, their tenure, where they're located, so that we can really understand who's benefiting the most. Outstanding results with that key metric of PR throughput: 30% boost.
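The cohort comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DX's methodology: the cohort labels, the `(cohort, merged_prs)` record shape, and the function names are hypothetical, and the numbers are made up rather than taken from Booking.com's data.

```python
from collections import defaultdict
from statistics import mean


def throughput_by_cohort(records):
    """Average merged PRs per developer for each adoption cohort.

    records: iterable of (cohort_label, merged_prs_in_period) pairs,
    one pair per developer (hypothetical input shape).
    """
    groups = defaultdict(list)
    for cohort, merged_prs in records:
        groups[cohort].append(merged_prs)
    return {cohort: mean(values) for cohort, values in groups.items()}


def uplift_vs_baseline(averages, baseline="none"):
    """Percent difference of each cohort's average against the baseline cohort."""
    base = averages[baseline]
    if base == 0:
        raise ValueError("baseline cohort has zero average throughput")
    return {
        cohort: 100 * (avg - base) / base
        for cohort, avg in averages.items()
        if cohort != baseline
    }


# Made-up sample: one (cohort, merged PRs) pair per developer.
sample = [
    ("none", 8), ("none", 12),
    ("weekly", 10), ("weekly", 12),
    ("daily", 11), ("daily", 14),
]
averages = throughput_by_cohort(sample)
print(uplift_vs_baseline(averages))
```

A comparison like this shows correlation only. Breaking the cohorts down further by role, tenure, or location, and tracking the same developer before and after adoption, helps rule out the possibility that the heaviest AI users were simply the most productive developers to begin with.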
One of the concerns and hesitations a lot of you have, and that I also have, is about code maintainability with AI. We have an increase in throughput, but also users who use AI have a better codebase experience, meaning that it's easier for them to understand and modify code, making code easier to maintain.
Another thing that was important was innovation rate. 80% innovation rate is extremely ambitious. It's beyond the top 10% of performers industry-wide. The users at Booking who are using AI are spending more time on innovation, so this hypothesis and the targeted action toward innovation rate is paying off: 15.8% higher innovation rate than users who are not using AI.
Finally, quality was also very, very important. The group that is using AI actually has increased quality. They have a lower change fail percentage than the users who are not using AI.
Any one of these numbers taken alone means nothing, and I want to be really clear about that. One of these numbers doesn't mean anything, but when we put together PR throughput plus code maintainability plus innovation rate plus quality, and we see a strong signal from all of them together moving in the same direction, we have some really tremendous results that Bruno and his team are incredibly proud of.
The other thing I want to call out here is this is not an overnight story. This is not a success that happened quickly. If we look back to this graph, this is two years of incredibly hard work, and we don't see these numbers breaking apart until quite recently, honestly. This dark line here is heavy AI users in their PR throughput, and it goes down so satisfyingly: heavy, moderate, light users, no users, and we can see the leverage that these AI users have.
Bruno Passos
The business impact, summarizing here: we believe that there is about 70% improvement in the speed we do our tech modernization. We see a lot of really positive signs on automating some of our tech debt cleanup. Because of the experimentation background that we have, there are a lot of experiment flags and feature flags within the codebase that are dormant, that just got left there. We believe that we can automate a lot of the removal of that tech debt.
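As an illustration of what the first step of automated flag cleanup might look like, here is a minimal sketch that locates references to dormant flags in source text. Everything specific in it is an assumption for the example: the `is_enabled("flag_name")` call pattern, the function names, and the idea that a list of dormant flag names is already available. It does not describe Booking.com's experimentation platform or tooling.

```python
import re

# Hypothetical flag-check syntax: is_enabled("flag_name") or is_enabled('flag_name').
FLAG_CALL = re.compile(r"""is_enabled\(\s*["']([\w.-]+)["']\s*\)""")


def find_flag_references(source: str):
    """Return (line_number, flag_name) for every flag check found in source."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in FLAG_CALL.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits


def dormant_references(source: str, dormant_flags: set):
    """Keep only the references whose flag is in the known-dormant set."""
    return [
        (line_number, flag)
        for line_number, flag in find_flag_references(source)
        if flag in dormant_flags
    ]


example_source = '''\
if is_enabled("new_checkout"):
    render_new_checkout()
if is_enabled('old_banner'):
    render_banner()
'''
print(dormant_references(example_source, {"old_banner"}))
```

Finding the references is the easy part. Deciding which branch to keep and safely deleting the other is where review or an AI-assisted refactoring step would come in.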
We see a lot of enhanced team collaboration. We are collaborating cross-brand, which is something I've seen very little of in my history at Booking. Because of AI, we are able to really change the culture of development within the company and outside of the company, and we're improving developer satisfaction. Through DX, we can see that our developers are 42% happier with the use of GenAI within the company.
I want to go through very quickly some of the lessons learned. Access to the tool does not equal adoption, and it's very important for you to understand why folks are not adopting. Education must be practical, focused on a business problem that will enable developers to understand and move faster. Measurement is foundational. I truly recommend pairing with Laura and her team to see some of the stuff that they're doing. It's incredible. Then think organizational, not individual. The moment you think about the individual, you're forgetting about the bigger picture. For us, that was very clear.
What is left to do? We have not touched the SDLC. We've picked very sporadic examples to go and try GenAI, and we believe there's now a real gap in the SDLC. What if today's code is no longer tomorrow's legacy? That's something that we are looking into. Our target still holds: 80% of our developers' time should go to innovation rather than fixing things that they shouldn't be fixing.
Laura Tacho
One of the things that we both love about this community is everyone's willingness to help each other out. If you have built agentic workflows to reduce toil in a legacy codebase, if you're dealing with things like feature flag cleanup, context building is a big area of interest, automated refactoring, we would love to hear from you. Bruno and I are going to be at the DX booth at the first break. I'm also doing a workshop tomorrow at 2:30 PM about measuring AI that's in Royal 13. I'd love to see you there. Otherwise, please connect with us on the internet, and we'll see you around. Thank you.