Anyone Can Cook Anywhere: DORA's Recipe for Improvement in Software Delivery Performance

Las Vegas 2022

Eileen Flood

Associate Product Owner (Consumer Experience) · Ford Motor Credit

Jeremy Karevich

Full Stack Engineer · Ford Motor Credit

This is a case study of how a small high-trust team of rookies in a large enterprise bucked the conventional wisdom of release trains, long-lived feature branches, and pull requests. They instead followed the DORA recommendations to gain radical improvements to their software delivery performance and bottom-line impacts over only 2 consecutive quarters.

Full transcript

Eileen Flood and Jeremy Karevich

Eileen Flood: Well, hello, everyone. My name is Eileen Flood, and I'm the associate product owner on the Personal Lifetime Communications team at Ford Credit.

Jeremy Karevich: And I'm Jeremy. I'm the full-stack software engineer on that PLC team.

Eileen Flood: And so welcome to our presentation, "Anyone Can Cook Anywhere: DORA's Recipe for Improvement in Software Delivery Performance."

In our presentation today, we're going to walk you through our team's DevOps transformation in the context of DORA's Four Keys and 24 capabilities. We're going to go over where we were before, the themes we focused on, the practices we implemented and how they related to the capabilities, and, of course, our metrics and results that we saw throughout our transformation.

What we aim for you to walk away with today is that by adhering to the DORA recommendations, even a team new to DevOps can increase velocity, improve reliability, and build a culture of shared responsibility and ownership.

So, who are we? Ford Credit is the financial services arm of Ford Motor Company. Predominantly, we're focused on financing Ford and Lincoln vehicles and supporting Ford and Lincoln dealerships.

Within that, the Personal Lifetime Communications team, who we have pictured here, PLC for short, focuses on web infrastructure that improves the experience of a Ford Credit customer throughout the lifetime of their contract, providing the right information at just the right time for our customers.

In technical terms, this consists primarily of building and maintaining the main account information page where current lease and retail customers can view information about their loan or lease, but also a variety of web components that serve similar or related functions for users.

To point out a few of our features, we have the mileage calculator for lease customers in the top left. It's our most popular feature, and it addresses customers who say their major problem as a lease customer is mileage anxiety. So asking that question, am I driving my car so much that I'm going to go over my allotment and be way over at the end of my contract, or can I drive a bit more and not go over the limits?

And so it's definitely a major problem, and our mileage calculator directly addresses that problem by using live modem data to give customers the latest update on where they are within their mileage.
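The talk does not say how the mileage calculator works internally. As a rough sketch of the question it answers, and assuming a simple linear projection (an assumption, not the team's implementation), the arithmetic could look like this. The function and parameter names are hypothetical.

```python
def projected_end_mileage(miles_driven, months_elapsed, term_months):
    """Linear projection: assume the customer keeps driving at the same monthly rate.

    All parameter names are hypothetical; the talk does not describe the real model.
    """
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    return miles_driven / months_elapsed * term_months


def projected_overage(miles_driven, months_elapsed, term_months, allotment):
    """Miles over the allotment at the end of the contract (0.0 if on track)."""
    projected = projected_end_mileage(miles_driven, months_elapsed, term_months)
    return max(0.0, projected - allotment)
```

For example, a customer who has driven 12,000 miles in the first 12 months of a 36-month lease with a 30,000-mile allotment would be projected to finish 6,000 miles over under this assumption.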

And then in the top right corner, we have a digital contract, which is a bit more straightforward. A retail or lease customer can pull and review the details of their contract at any given time.

Jeremy Karevich: So this is a DevOps transformation. Where did we start? Well, the thing in red there is that we were delivering monthly. We were working through pull requests and code reviews. If something went wrong, either a bug or a defect came up, we would need a fair bit of time to recontextualize what was actually in prod. Even if we knew the code was the same across environments, we would often need a bit of time.

We knew we were missing out on the agility to quickly receive and react to feedback that would come with better-integrated DevOps. We also had to recontextualize whenever we were working in a section of code that we didn't have strong collective code ownership over, which happened because we were working in silos a little bit. We were pairing, but we had a tendency to stay in separate sections of the code after initial delegation. This was exacerbated somewhat by large gaps in experience level at various points in the course of this transformation.

Communication would often incidentally happen in direct messages, or we wouldn't necessarily seek help because we assumed that the time we would need to bring people up into the context of what we were working on would not be worth the time investment. And lastly, and this is more in context with where our heads are at now, but we did not actively strive for adaptivity, particularly. We were happy to change when it was obvious or handed down, but we hadn't ingrained a culture of regularly being forward with potential improvements.

So this leads into some of the preliminary concepts and models we were thinking about and learning about, especially going into this transformation. I'm sure these will be familiar to many of you, but they were new to us at the time.

We're in this place where we're delivering monthly, we know we can do better, and we have some inclination of the improvements that we want to make and the DevOps practices we want to implement. But we also were fortunate to have the benefit of our agile coach, Wil Pannell, who was really proselytizing a lot of source material, a lot of what I've come to realize is the DevOps canon that drove the why behind the instincts for DevOps improvement that we were building, and that reinforced why and how we should work on changing our practices.

The first of these was cockpit resource management, or crew resource management. Ron Westrum talked about it in his talk this morning, but it's the idea that in environments, industries, professions where errors are very dangerous or even fatal, such as an airplane, focusing on cognitive and interpersonal skills rather than technical skills is really effective at increasing safety. So particularly focusing in on practicing situational awareness, problem solving, and communication skills is really effective at increasing safety, problem solving, and decision making.

This works because it guards against the mistakes that can happen when someone in a position of higher authority is assumed by default to always make the correct decision, the sort of assumption that we associate with power-oriented organizations, the pathological organizations in the Westrum typology model.

And so the second really major framework that we were learning about and came to understand was those Westrum typology models. We never felt like we were in a pathological organization, but certainly we inherited some bureaucratic tendencies on that chart, if you've seen the breakdown at a team level, just from working in a large organization.

And so, with the framework of the different types, we really homed in to keep our eye on the prize in terms of what it means to have and maintain a generative culture: to collaborate and bridge actively, have failures be nothing but learning opportunities, build psychological safety, focus on outcome, and make novel ideas about how we work come to the forefront and be actively engaged with and benefited from.

And then finally, also complex adaptive systems. Our coach really liked telling the NUMMI car plant story. Again, I'm sure many of you know it. If any of you don't, basically GM fires a bunch of folks from an underperforming plant because they were putting bottles in car doors so they would make noise and otherwise having a lot of problems in cars that come off the line. But then what they do is they hire back the same folks later on that they let go, under an overhauled system in partnership with Toyota.

The line adjusts to problems that occur as cars go down it. The system of work is such that when something goes wrong or breaks, it's reflected on, and then the system is changed or hardened so it won't break again. And it works. The line, with the same people working at it, is much more productive, and it drives home this point that when it comes to performance, the system predicts success more than the people in the system. And so the best systems are the ones where those who have the best sense of how the system works or doesn't are the ones who have the most power to inform and change and adapt it.

If you haven't heard this story, This American Life has a great version of it. I know I personally heard it from a talk from Jez Humble. So there's lots of great versions around.

All three of these concepts that we learned about have one sort of obvious common thread, and that's that they focus on culture, but particularly they focus on culture that actively encourages change. Improving our performance is inherently related to changing our culture, but such that we don't change our practices once, but continuously.

So this becomes a feedback loop: our culture affects how we determine our practices, but there's also this "we are what we do," identity-comes-from-habit reality. Any time you're building a habit, you're working with feedback loops. And so we keep informing that feedback loop between our practices and our culture through the lens of transformational leadership, which is a big part of DORA. We really like the breakdown in the State of DevOps Report from 2017. That's where that icon in the middle comes from. It's really sustained and helpful.

And so we key on that: supportive communication, leadership, stimulation, recognition, those leadership principles, to push ourselves in a more generative direction and inform those feedback loops. And lastly, you can't improve what you don't measure. I think we're all big metrics fans.

So how do we measure our progress? Well, this is a DORA presentation, so it's the DORA Four Keys framework. These are four metrics that DORA has identified as strong predictors of organizational performance: deployment frequency; lead time for changes, or the time from a commit to when it's running in prod; change failure rate; and mean time to restore service after an incident. Although if any of you attended Courtney Nash's presentation, maybe not that last one, which honestly is great for us because I don't have a pretty graph to show you about it anyway.
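To make the first three of these metrics concrete, here is a minimal sketch of how they could be computed from a log of production deployments. It is an illustration only: the `Deployment` record and its field names are hypothetical, and the talk does not describe the team's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record of one production deployment.
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when the change reached prod
    failed: bool            # whether it caused an incident or needed remediation


def deployment_frequency(deployments, period_days):
    """Average number of production deployments per day over the period."""
    return len(deployments) / period_days


def median_lead_time(deployments):
    """Median time from commit to running in prod."""
    seconds = median(
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    )
    return timedelta(seconds=seconds)


def change_failure_rate(deployments):
    """Fraction of deployments that resulted in a failure in prod."""
    return sum(d.failed for d in deployments) / len(deployments)
```

Mean time to restore would need a separate incident log (start and restore timestamps), which this sketch leaves out.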

What we really did though, while we were delivering monthly, was focus on speed and deployment frequency. We knew and we'd read about the importance of deployment frequency as a key metric, and we also knew that stability-minded DevOps integration of those capabilities related to it would really be more potent once we had the agility to move and react quickly and better infrastructure in place that we could start integrating checks and monitoring into. So with that in mind, how do we speed up?

DORA capabilities. So we started with the 24 key capabilities that are in the Accelerate book, but especially in the process of building this presentation later on, I came to really like the DevOps Research dashboard breakdown of it. It covers the same content. I think it's in 27 capabilities, but it's really helpful in informing what changes you can make and where you should be striving to be with regard to the capabilities and why they're going to affect your software delivery performance and then ultimately your organizational performance.

And so, because there's a lot of these, what we started doing is we started coming up with the practices that we were going to implement and keeping in mind the context of which DORA capabilities were informing those practices.

For instance, the first thing we did is we started mob programming. We were pairing, as I said, but we were sort of siloed, and we started mobbing because we had a lot of new team members join our team throughout the latter half of 2021. In fact, Eileen and I are two of those members. We had a lot of new members without a lot of experience, and we had a lot of technologies and practices that the team was actively trying to upskill on at the same time as those new people were coming in.

Our coach affectionately called the new team the Tiger Team. Both the Tiger Team and the more experienced developers on the team would mob as a way to pull everyone up, get us out of task-specific silos, get everyone on the same page with the architecture we were working with, get lots of simultaneous eyes on the code, start building collective ownership as we're getting away from the silos, and especially, almost above all else, upskill.

We were at this point where we were doing a ton of upskilling. We made sure that we weren't just rotating drivers, but we were rotating navigators. So every five to seven minutes, we would change who was coding, but we would also change who was telling them what to do. Sometimes this doesn't work if you have a domain expert, but what we were doing was naturally bringing out the friction in what people didn't know.
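One way to picture that rotation is as a simple round-robin schedule. The sketch below assumes that the navigator of one turn becomes the driver of the next, which is a common mob convention but not something the talk specifies; the function name and the five-minute default are illustrative.

```python
def mob_rotation(members, interval_minutes=5):
    """Yield (start_minute, driver, navigator) forever, rotating both roles.

    Assumption: the navigator of one turn becomes the driver of the next.
    """
    if len(members) < 2:
        raise ValueError("a mob needs at least two people")
    turn = 0
    while True:
        driver = members[turn % len(members)]
        navigator = members[(turn + 1) % len(members)]
        yield turn * interval_minutes, driver, navigator
        turn += 1
```

Because the generator never ends, a caller would take only as many turns as the session needs, for example with `itertools.islice`.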

Sometimes I think in evolved situations you use mob programming to tackle difficult architectural builds or something like that, but we were using it in this case to bring out friction in the skills that people didn't know within the team. So if you're following the principle that if you really understand something you can teach it, we were also building a space that was safe to fail in terms of ignorance and encouraging a generative environment where opinions were brought forward fast and really strongly supporting learning so we can get people up as fast as possible.

It also helped us pretty naturally start working on the integration of trunk-based development. We were pretty much all in the same room or two rooms, so we weren't really working around each other in branches, and we slowly began migrating all of the projects in which we were mobbing to be trunk-based, relying on our mobs and pairs for synchronous eyes on code and eschewing code review.

We made sure we had a way to commit in mob so that we were making small commits and avoiding merge headaches. In general, all the while, we're working on beefing up our continuous integration so that we can actually take these commits and get them to verified builds that we trust, and we'll talk through more of that complementary CI work throughout our presentation.

Eileen Flood: Our team spends the last 15 minutes of every day reflecting. We want to foster a culture where anyone can surface problems at any time, stemming from what Nicole Forsgren has said previously, and it allows our team to identify what worked today that we want to acknowledge, that we want to continue, so pluses, and what we want to change, so deltas.

Really, to sum it up, it's let failure or success lead to inquiry. So why did that work so well? Asking why do we think that didn't work? Rather than waiting for biweekly retros and potentially forgetting over the course of that time, we could more actively incorporate them into improving our system of work.

So something that's branched directly from our daily reflections are smaller practices that our team has experimented with. We want the team itself to feel empowered to implement new practices that work for them, develop that sense of team ownership.

Right here, some items that have directly come from daily reflections: using a Pomodoro timer, experimenting with mob size, open door policy. Our team works throughout the day on the same WebEx call, and our different mobs are in different breakout rooms. So it allows the different mobs to pop into each other's rooms, ask questions. And even those that aren't on our team, we encourage you to just pop in so that we can talk it out.

And then lastly, dedicating time for housekeeping items so they don't fall by the wayside.

This one's a bit more straightforward, but our organization puts aside time specifically to allow the team to pursue learnings. In our case, it's Friday afternoon, so they can share with teammates what they learned on Monday, and safe to say this one directly addresses the support learning capability.

Jeremy Karevich: Test-driven development is one of those things on which we were really upskilling strongly. As I said, we were bringing in some team members who were sort of new to some of the practices that we were doing or less experienced, and we really focused on skilling up that red-green-refactor so that as we were continuing to build out greenfield projects, but also going back and working on our existing code bases, we were building up the testing capacity where it was missing and making sure it was fully there in our new project.

So as we were migrating to trunk-based development, we're really able to start relying on CI/CD to ensure that we're going to get to that place of having those builds. So red-green-refactor, we have skilled on a ton. We instituted commit hooks across the board, making sure to find any place where they're missing, so that now you can't commit without your tests passing.
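As an illustration of a commit hook that refuses a commit when tests fail, here is a minimal sketch. Git aborts a commit when its `pre-commit` hook exits with a non-zero status; everything else here is an assumption, including the test command, since the talk does not say which test runner or hook tooling the team uses.

```python
import subprocess
import sys

# Hypothetical test command; replace with the project's real test runner.
TEST_COMMAND = [sys.executable, "-m", "unittest", "discover"]


def run_checks(command=TEST_COMMAND):
    """Run the test command and return its exit code (0 means success)."""
    return subprocess.run(command).returncode


def main(command=TEST_COMMAND):
    """Return the exit code the hook should finish with."""
    code = run_checks(command)
    if code != 0:
        print("Tests failed; commit aborted.", file=sys.stderr)
    return code

# To use this as a hook, save it as an executable .git/hooks/pre-commit
# script and end the file with: sys.exit(main())
```

Keeping the exit-code logic in `main` makes the hook itself easy to test without creating a real commit.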

We added contract testing so that we're not just testing up to the network edge, but over the network edge, and we further bolstered our integration tests with service virtualization. We're still actively working on using it to improve our testing rigor, and all this in support of that CI/CD.
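The idea behind contract testing can be sketched as follows: the consumer writes down the fields and types it relies on, and the provider's response is checked against that expectation. This is a hand-rolled illustration, not the team's setup; the talk does not name a tool, and the contract fields below are hypothetical.

```python
# Hypothetical contract: the fields a consumer expects from a provider response.
MILEAGE_CONTRACT = {
    "contract_id": str,
    "miles_driven": int,
    "miles_allowed": int,
}


def contract_violations(response, contract):
    """Return a list of ways the response breaks the consumer's contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

A provider-side test would then fail the build whenever `contract_violations` returns a non-empty list. Extra fields in the response are allowed, so the provider can add data without breaking consumers.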

Eileen Flood: Every two weeks our team presents what we've been working on to our branch of the Ford Credit product org, and having it every two weeks pushes us to have new working software each and every time. It keeps us accountable to always ask what's next and keeps our eyes on the value stream. Our goal for this year, 2022, was to make sure that we had something to demo all 26 biweekly demos this year, and so far we've been successful in that endeavor.

Jeremy Karevich: Simplifying pipelines is something we did. At one point we had to migrate our Jenkins, and we seized that opportunity as a big push to drastically simplify our pipelines in accordance with all of the CI/CD work we're doing. We deduplicated them across environments, went from somewhere in the 25-30 range down to 12, one per project.

The pipelines spin up on commit, and commits are happening much more frequently now that we're using trunk-based development. They now go through to both environments. We're using feature flags for the first time, and we're preventing unfinished features from being exposed in prod, but our pipeline is able to go through to both.

The feature flag is also allowing very fast incident response, so new features can be turned off at the flip of a switch if something actually does go wrong, and the problem can then be investigated and dealt with in lower environments. We even experimented with continuous deployment, where when you committed it was going straight through to prod, guarded by that feature flag. We pulled back from that for now while we're working through different pieces of our pipeline infrastructure, but we're still at that point of having continuous delivery, where every commit, assuming it makes it through the pipeline green, results in a deployable build that's ready to go. And that was really big for us coming from where we were.
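A minimal sketch of that feature-flag pattern follows: unfinished code ships to prod but stays hidden until its flag is on, and turning the flag off acts as the switch during an incident. The class, flag name, and page function are hypothetical; the team's actual flagging software is not named in the talk.

```python
class FeatureFlags:
    """In-memory flag store; a real system would read flags from a service."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        # Unknown flags default to off, so unfinished work stays hidden.
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


def render_account_page(flags):
    """Return the page sections a customer would see (hypothetical example)."""
    sections = ["account summary"]
    if flags.is_enabled("state_to_state_self_service"):
        sections.append("state-to-state self-service")
    return sections
```

Defaulting unknown flags to off is a deliberate choice here: a missing or misspelled flag hides a feature rather than exposing it.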

We also started working in web components in a foray into establishing a more loosely coupled cross-team architecture, as well as more cross-team collaboration, which are big DORA points particularly. We actually piloted a lot of the new dev practices that we were doing by first doing them in a greenfield web component project.

We've packaged up some of the most-used functionality of our homepage, and we set it up in such a way that we were empowered to choose our tools. That was the mileage calculator, and then we got to choose our framework that we were using to implement it. We used feature flags and experimentation software for the first time in this project. We used a new CSS framework, and we did a ton of upskilling, and we built self-contained web components that were deployable to multiple other channels.

We also instituted Kanban norms. This is not a picture of our old Kanban board, but it might as well have been. The main point with the Kanban was that we were basically not using it as an illustration of the actual status of work in a stream. We had cards, but if it turned out that the scope of those cards was mismatched or uneven, we would just block the card where it was while we waited for whatever it was that we were stuck on before we could continue work.

This is not an accurate reflection of the state of the work that was being done, because that card didn't actually have anything that was being done on it. So it's messing up our metrics, but also it meant that when we looked at our board it was not helping us see the state of work going through this stream.

So we changed it so that the cards now are almost universally something that we think we can get done that day. They're much smaller. They go through the work and accurately reflect the status of the work that goes through the stream. And on the occasions that we do have something that gets blocked while it's in progress, often we will change the name of that card to reflect the work that we did and create a new blocked card in the backlog, and that card goes through and accurately reflects basically what is the team doing? What is the team actively working on?

We also instituted work-in-progress limits. So if we're working in two mobs on a given day, we would have, say, two cards in progress. It's not always one-to-one, but it's rarely more than two-to-one. So we try to keep these work-in-progress limits.
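The work-in-progress rule described here can be expressed as a small check. This is a sketch of the rule only, with hypothetical names: the limit is the number of active mobs times an allowed number of cards per mob, one by default and at most two per the talk.

```python
def can_start_card(in_progress, active_mobs, cards_per_mob=1):
    """True if pulling one more card keeps the board within its WIP limit.

    The limit is active_mobs * cards_per_mob (hypothetical formulation).
    """
    limit = active_mobs * cards_per_mob
    return in_progress + 1 <= limit
```

With two mobs and one card each, a third card cannot be started until one finishes; relaxing to two cards per mob raises the limit to four.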

Eileen Flood: Our team strives to maintain a customer-first mindset. We're consistently reviewing customer behavior and feedback when we're determining our priorities and next steps, whether it's click-through data via Adobe Analytics, reviewing customer-service feedback in Medallia, or contrasting customer behavior in A/B tests using Optimizely.

And it's not just me, the PO, sitting there reviewing all these on my own. We review these as a team, so that devs can have insight on the impact of the features they're working so diligently on and have input on the direction we want to take it based off of what we're looking at.

With every launch, we make sure we have a plan in place to have a pulse on the customer. In an instance recently, which I'll speak to a bit more later, we were able to immediately get customer feedback upon launching, and our team was able to deploy the necessary changes to production the very next day. This was definitely a point of pride for us because we're coming from a point where we were deploying monthly.

We strive for transparency between all the branches of our team. For example, we want to have dev presence when UX is creating a design so that they can speak to the technical viability and vice versa. We would like to have UX pop into our breakout rooms so that they can speak to how and why they need something to behave a certain way.

Next up, everyone has full rights in non-prod, risks are shared, and then lastly, opportunities for recognition. Every morning in our stand-ups, we have failures to celebrate and shout-outs, and then in the biweekly demos that we mentioned earlier, we have a section for bragging points.

Jeremy Karevich: Now that we've covered all the practices that we've implemented, here's a quick visual of how many of the 24 DORA capabilities we improved upon since we started. And as you can see, all of these practices have really been able to make an impact on a majority of the capabilities.

But there were a couple that were still gray. So what's left? Well, stability. A lot of more stability-minded DORA capabilities are the ones that our team is actively working on incorporating now, things like shifting security left. We established a security gate early in our pipeline. Test data management, practicing failure notification, monitoring systems to inform business decisions. These are the DevOps improvements our team currently has their mind on, but we're glad to be in a place now where we have the deployment agility to really empower these infrastructural stability efforts that we're working on to have really effective, impactful change on our products.

So results. In Q2 2021, the earlier slide, we were deploying once a month. And by Q4 2021, so just two quarters later, we were deploying daily. Our median lead time for changes also improved: the time decreased dramatically, because our CI/CD was empowering our changes to go through to prod much faster.

Our throughput over the entire course of 2021 trended up, our cycle time trended down, and though it can't continue forever, since you can't finish a card you didn't start, over the entire course of 2021 our cards finished versus started trended up.
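For readers who want to track the same flow metrics, here is a minimal sketch of how throughput, cycle time, and the finished-versus-started ratio could be derived from card dates. The `Card` record and function names are hypothetical, and the talk does not say how the team's own numbers were produced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Card:
    # Hypothetical Kanban card with the dates it entered and left "in progress".
    started: date
    finished: Optional[date] = None


def throughput(cards, start, end):
    """Number of cards finished within the period [start, end]."""
    return sum(1 for c in cards if c.finished and start <= c.finished <= end)


def average_cycle_time_days(cards):
    """Mean days from start to finish; assumes at least one finished card."""
    done = [c for c in cards if c.finished]
    return sum((c.finished - c.started).days for c in done) / len(done)


def finished_to_started_ratio(cards, start, end):
    """Cards finished divided by cards started in the period.

    Assumes at least one card was started in the period.
    """
    started = sum(1 for c in cards if start <= c.started <= end)
    return throughput(cards, start, end) / started
```

Over a long enough window the ratio is capped near 1.0, which matches the remark that the upward trend cannot continue forever.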

We also over this time measured our work distribution across some basic categories. In Q2 2021, we were doing three-quarters compliance work according to the metrics that we tracked. By Q4 2021, we were doing significantly more feature work. So the devs feel like the work that they're doing is more meaningful, it's going through to value, and it also means that we're accomplishing more.

We're also doing more spike and rework, so we're learning a lot more. We understand the concept of a spike more intimately. We're ready to be generative with creating those ideas and then being able to throw them away. We're tracking our incidents a little bit more closely. I'm sure we probably had some in Q2 2021 that maybe just didn't show up in those metrics.

And then lastly, that compliance piece of the puzzle: we're still doing almost all the same compliance tasks. They're just integrated into our DevOps more effectively and happening automatically. We're no longer spending as much time on them.

Eileen Flood: So what about the features that we delivered in this time? We have the mileage calculator widget, which you've seen appear in the visuals throughout our entire presentation, and that's because this was the first feature we implemented all these practices with.

This was taking the feature on PLC that these customers love so much and making it portable thanks to our practice of building as a web component. As a result, it's built in a way that can be used in any other digital platform, in any other part of Ford that wants to implement it to further that customer value.

The other feature, state-to-state, we can speak to the impact even more. State-to-state self-service basically addresses what happens when a customer moves to a new state. They have a slew of documents that they need to request from us to bring to the DMV, and that usually requires multiple calls to our centers to figure out what they are going to request and to make the request itself.

In the first four months since we launched, it's gone from zero to 72% of all state-to-state requests being conducted through our self-service. And when customers were asked if this was helpful or not, 93% said yes. This was the feature that I mentioned earlier, where we got feedback from customers the same day, and then the following day we were able to implement changes to production.

So now, taking a look at our overall product, the Personal Lifetime Communications page. All this work, all these great features that we delivered, how has it impacted our team metric for the product? Well, since the end of Q4 2021, it has stayed consistently over 80%, having improved just like all of our other metrics.

Jeremy Karevich: So to finish, it's not just about the data. Our biggest success, in our eyes, is that we really did arrive at a more generative culture. Our team has better confidence in the quality of our code and a greater sense of ownership of the code. Anyone feels comfortable making changes in any part of the code. And in terms of team agency, they feel safe to fail, and, what we've really been striving for, our team feels empowered to change and adapt.

Eileen Flood: So these are LinkedIn codes. Come find us after if you have any questions or reach out to us on LinkedIn, since we are up on time. But thank you all very much for coming today.