Las Vegas 2025

Moving to Meaningful Metrics: Measuring our Digital Transformation

To truly drive digital transformation and achieve organizational alignment, we moved beyond vanity metrics to embrace a system of meaningful, objective measurements on system health and organizational maturity.

Full transcript

The complete talk, organized by section.

Nick Krosschell

So I'm Nick Krosschell. I am the VP of Enterprise Applications Engineering at WEX. I've been doing this role for a number of years, have come through the ranks and know the engineering plus management plus the architectural aspect of things. And then this is my colleague Haley.

Haley Reddington

Hi, everybody. Good morning. There's a podcast I listen to that starts with, "What gives you the right to be here?" And what today gives me the right is I am lucky enough to head up our technology innovation and strategic operations group for WEX. And if you've never heard of WEX, we'll give you a little bit more context in a second.

We are accountable for a variety of different components of our tech transformation over the last three to five years. I've been lucky enough to help build out groups who have been focused on things like flow engineering and value stream architecture, and a lot of the metrics and instrumentation work we're about to walk through today is a direct reflection of the teams and the contributions there. So, super excited to share more.

But first, what is WEX? If anybody has been in Portland, Maine, we are one of the few companies headquartered there, and that's partially because we started as a relatively small fuel card company about 40 years ago. And since then, we have transformed into a global commerce platform that simplifies complex payments ecosystems for customers across the globe in over 200 countries. We primarily focus on solutions in the fleet, benefits, and corporate payments space.

We've definitely experienced a lot of growing pains in some of those transformations, especially in the last 10 years, where we have really taken on a lot of innovation, a lot of changes in our leadership team, expanded into new industries, and some of that growth has been through acquisition, which introduces challenges like tech debt.

But before we go too deep into some of the challenges that we were trying to solve and how we got there, I'm going to borrow a page from one of my favorite English professors, and we're going to tell you what we're going to tell you before we tell you, and talk a little bit about what we're going to highlight for you guys today.

In my time at WEX, we have undergone a lot of different versions of value stream architecture. We had a scrum transformation five or six years ago. And ultimately, I'm very, very proud of the work that we did to better instrument and understand and document even what our value streams were, the flow of work through those.

But ultimately, where we kind of fell short up until about a year ago was in translating that into actual insights and building it into governance and process. Ultimately, that meant that how we were applying our measurements was a little bit fragmented. It wasn't the full story. And even flow metrics themselves were not able to tackle challenges of how quickly we're moving with the advent of technology like AI. And ultimately, a lot of it was retroactive. So it still felt a little bit like a scorecard or a report card, not something that we are necessarily applying in a forward-facing way.

So we'll talk to you a little bit about what we did to solve those challenges in a second. But let me tell you where we ended up, where we are today. Ultimately, we're really, really proud of how we're able to transform that conversation from output to outcome, as so many of us have talked about over the last couple of days.

We also did that by translating very complex, maybe a little bit niche depending on which group we were working with, metrics into a universal language that could be understood whether you're a developer working on a specific feature all the way up to the board. And then ultimately, we've applied it at every layer of our governance and are starting to use it in a diagnostic way. Some of the case studies that we'll walk through are examples of how we started really small and built on it to the point we're now able to measure AI in our products.

Nick Krosschell

So I'm going to tell you a little bit more about WEX, mostly about how we grew as a company. There's a lot of mergers and acquisitions. This means we have lots of software that did similar things that were built by different groups. They have different cultures, they have different technology stacks, they have different ways of measuring things.

And so this created an ecosystem where there's an awful lot of opinions. It was really hard to get a view of how is this business team doing compared to this business team compared to this business team. Haley and I are both in the center of the technology organization, and we're uniquely positioned to be able to solve this problem.

We were tasked by our new CTO with coming up with a way to create a single dashboard that helps us understand what's most important for us to work on. This is across a fair number of engineers: 2,100 engineers or so and 300 scrum teams. So the scope isn't enormous, but it's big enough that there's a lot of opportunity for fragmentation.

So how is it that we took this landscape and its challenges and came up with data to help us get some clarity? Clearly we needed to start with the foundation of fundamentals that are necessary for every business. DORA is table stakes. We recognize this. We've been tracking DORA for a very long time.

What we find, though, is that this focused on individual components. It focused on individual teams, and it was fantastic at getting us understanding of how those teams are doing. But we felt like we needed something a little bit higher that helped us understand what these larger products are doing.
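As a rough illustration of the team-level view described here, the four commonly cited DORA measures (deployment frequency, lead time for changes, change failure rate, and time to restore) can be computed from a team's deployment records. This is a minimal sketch, not WEX's tooling: the `Deployment` record, its field names, and the sample data are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from CI/CD and incident tools.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarize one team's deployments over a window of `window_days` days."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    lead_times = [hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # 0.0 here simply means "no restored failures in this window".
        "median_restore_hours": median(restore_times) if restore_times else 0.0,
    }


# Made-up sample data for a single team over one week.
start = datetime(2025, 1, 6, 9, 0)
team_deployments = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(
        start + timedelta(days=1),
        start + timedelta(days=1, hours=8),
        failed=True,
        restored_at=start + timedelta(days=1, hours=10),
    ),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=6)),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=2)),
]
summary = dora_summary(team_deployments, window_days=7)
print(summary)
```

Every number in this summary describes one team's pipeline, which is the limitation the speakers point to: it says how the team is delivering, not how a larger product is doing.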

We needed something at a higher altitude that would unify what was being told, that all of the teams, both at the executive level and at the management level and at the individual contributor level, could understand, rather than just each individual team focusing on their own progress. And so what we came up with, one of the measurements that we wanted to share with the entire organization, we call it product innovation velocity. It is at that higher level, at the line-of-business level, helping us share that information across a broader group.

So we knew how fast we could deliver, but was it the right initiative? That is the difference between the DORA metrics and the product innovation velocity.

There were some principles that we needed to follow that we established and followed as we came up with what measurements we wanted to include. We wanted, as mentioned before, DORA as an industry standard. There were other industry standards that we took into account and anchored in those industry standards as we established these measurements.

We also wanted to make sure that these were simple. They were things that could be easily understood, could be easily measured. They are not perfect in our estimation. It was very important for us to come up with vectors or measurements that don't have to be perfect. I don't want to be mired down in the details of every aspect, but directionally it can point us in the correct direction of how do you get better.

These need to be universal. These needed to be things that apply to each and every part of business so that when we're measuring in one place, we can measure it and apply that same measurement across other lines of business as well. And that goes for consistency as well.

So these are the measures that we came up with. I think that more important than the measures that we came up with is what I just mentioned, what the principles were. These are the ones that are right for us. One of the ones in here is AI maturity. We think that's important because of where we are today. If we had come up with these measurements two years ago, three years ago, that wouldn't have shown up. And maybe there was something else that would have been put in place.

I would also say that as we're measuring these, the product innovation velocity, maturity, reliability, and security, they are not set in stone. And we reevaluate these on a regular basis and reserve the right to adjust how they're measured to make sure that we are meeting our current needs. So these are the measurements that we came up with. Feel free to steal them or create your own.

Haley Reddington

So one of the functions of my role, and I'm sure folks in the room have some empathy for this, we have a relatively large technology organization of over 2,000 folks. They are dispersed globally. We have multiple lines of business with multiple products. We have commercial partners, we have business partners.

And we were trying to, for I think pretty much the first time at WEX, translate something that had previously been thought of as a technology thing, a technology problem, a technology thing to measure, into something that was universally applicable for our FinTech company. And that meant getting a lot of clear alignment, clear consensus, starting very first with the definitions that Nick just laid out.

Once we were pretty confident from both our own internal research as well as benchmarking it with some industry experts, consultants, that these were the areas we wanted to tackle, we then said, okay, who, how, what, when, where, why, how are we going to decompose this into something that we can measure and operationalize?

And that required partnership with a variety of different groups across WEX depending on what the vector was. So for something like reliability, how did we leverage the work of our SRE group, the information they were already capturing in tools like Datadog? How did we then weight it based on the importance that we were seeing of certain things we wanted to influence in our org?
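One way to picture the weighting step described here is a weighted average of normalized signals. The sketch below is only an assumption about the general shape of such a calculation: the signal names, the weights, and the 0 to 100 scale are hypothetical and are not taken from the talk or from Datadog.

```python
def weighted_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine normalized signals (each between 0.0 and 1.0) into one 0-100 score."""
    if set(signals) != set(weights):
        raise ValueError("signals and weights must cover the same names")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name!r} must be normalized to 0.0-1.0")

    weighted_sum = sum(signals[name] * weights[name] for name in signals)
    return round(100 * weighted_sum / total_weight, 1)


# Hypothetical inputs: signals an SRE group might already capture,
# each pre-normalized so that 1.0 is the best possible value.
reliability_signals = {
    "availability": 0.9,
    "incident_free_days": 0.8,
    "slo_attainment": 0.7,
}
# Hypothetical weights expressing which behaviors the org wants to influence most.
reliability_weights = {
    "availability": 5,
    "incident_free_days": 3,
    "slo_attainment": 2,
}

reliability_score = weighted_score(reliability_signals, reliability_weights)
print(reliability_score)
```

Keeping the weights in a separate mapping makes them easy to revisit, which fits the speakers' point that how each vector is measured is not set in stone.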

For something like AI, how did we leverage everything from our understanding of how we were leveraging AI in our SDLC, to the complexity of how we are applying AI, how we're using generative AI to improve our customer experience, to how we are using AI in our products, all into a single score that would make sense to anybody?

Ultimately, it required a lot of working sessions. We sat down with owners in each space. We defined, we tested, we brought to multiple altitudes of the organization these measures. We compared our scores against what we considered a baseline of what was good in the industry, where we were today. We refined, we tried again, and ultimately came up with a single visual like what you see on the right hand of the screen.

That is something that we use in every altitude of our governance, whether it is locally at the portfolio level, whether it is up to our QPRs, our board-level presentations. And how you use a simplified visual like this is obviously a different altitude than how you use other data, but it can help drive really interesting conversations and insights. We'll get into a little bit of that in a second.

The teal bar you see is, I think we called it Product Y. If you see really high product IV scores, really high security scores, low reliability scores, what does that mean about your quality? Why hasn't it impacted the other vectors? These are the kinds of questions that it drives.
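The kind of question raised for Product Y could be prompted automatically by a simple rule over the vector scores. This is a hedged sketch only: the `VectorScores` type, the 0 to 100 scale, and the `high` and `low` thresholds are assumptions for illustration, not values from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorScores:
    # Hypothetical shape: one 0-100 score per vector for a single product.
    product: str
    product_iv: float
    reliability: float


def diagnostic_questions(
    scores: VectorScores, high: float = 75.0, low: float = 50.0
) -> list[str]:
    """Return conversation prompts when vectors point in different directions."""
    questions: list[str] = []
    if scores.product_iv >= high and scores.reliability <= low:
        questions.append(
            f"{scores.product}: product innovation velocity is high but "
            "reliability is low. Is quality keeping pace with delivery?"
        )
    return questions


# Made-up scores: Product Y mirrors the pattern described in the talk.
for item in [
    VectorScores("Product X", product_iv=82.0, reliability=88.0),
    VectorScores("Product Y", product_iv=90.0, reliability=40.0),
]:
    for question in diagnostic_questions(item):
        print(question)
```

The rule produces a question rather than a verdict, which matches how the speakers describe the visual: it starts a conversation at that altitude and leaves the detailed diagnosis to the underlying data.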

And actually, I had one final thought before I moved on. I think one of the things I haven't hit on that is probably one of the most important things that we learned in this journey is how important it is to goal not just your technology organization on the outcomes of improving these vectors, but pretty much every ELT member and then cascade that through the rest of the organization.

So for 2025, our CTO had goals related to improving reliability, product IV, SA maturity, all of these vectors by a certain percentage. And he shared those goals with his counterparts in digital and commercial. And then ultimately, those goals were cascaded down to every level of the organization, which means that reliability is everybody's problem, accelerating our product innovation velocity and delivering meaningful outcome for customers is everybody's problem. And it really made it something that our performance management, our compensation is all tied to, and that gives it traction and stickiness.
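A goal of the form "improve this vector by a certain percentage" reduces to a small piece of arithmetic, and the same target can then be checked at every level it is cascaded to. The sketch below assumes a 0 to 100 score and uses made-up baselines and a made-up 10% goal; the talk does not state the actual percentages.

```python
def target_score(baseline: float, improvement_pct: float) -> float:
    """Score needed to improve on `baseline` by `improvement_pct` percent."""
    return baseline + baseline * improvement_pct / 100


def goal_met(baseline: float, current: float, improvement_pct: float) -> bool:
    return current >= target_score(baseline, improvement_pct)


# Hypothetical numbers: (baseline, current) per vector, with a shared 10% goal.
shared_goal_pct = 10.0
vectors = {
    "reliability": (60.0, 67.0),
    "product_iv": (50.0, 53.0),
}

status = {
    name: goal_met(baseline, current, shared_goal_pct)
    for name, (baseline, current) in vectors.items()
}
print(status)
```

Because the check depends only on the baseline, the current score, and the shared percentage, a leader and a team looking at the same vector reach the same answer about whether the goal was met.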

So we're going to walk through an example of how we would apply this in process. And we're examining these scores on a regular basis. Even though they are a slightly higher altitude, they can still drive some decision making. One of them is the cross section of our AI maturity and our product IV scores.

This is a series of examples, and it starts at the smallest level of how are we applying AI in our SDLC or not, and what does that insight mean as far as the action we take? Once we build that momentum, how do we apply it to our internal products? Excuse me. And then ultimately, how do you apply it to your external products? So Nick, do you want to kick us off?

Nick Krosschell

Yeah, absolutely. Our strategy here was, when we knew that AI would be a useful component in our tool belt, we thought, how do we take this small? We knew that we were immature because of the measurement that we had. So how do we take this small and build upon it?

We found a project. It was a UI rewrite, which is a perfect opportunity for an AI agent to help out with. And so we did a pilot of that. We let one person go away, take a stab at this, and see if they could make it work. And it turns out that they did, and in a relatively short period of time. We're talking weeks for what was originally thought to take months.

This is a story you've heard over and over again, but the important part of this was this went into production. So we knew that it was successful. It increased our vector measurement for both product innovation velocity and for AI. And so we went to the executive team and said, hey, this is working. Can we do a bigger project? So we did.

And the second case that we did is we knew we have our own home-built platform as a service. And there were lots of questions going on about how do I apply this? What do I do when I get this error? And this was all going to internal team members, and that took an awful lot of time.

And we thought, all right, well, we've got a knowledge base. Why don't we create a chat interface that allows people to interact with that knowledge base in a different way? And we can then build upon that. We were very successful in answering an awful lot of these questions. Again, this was a fairly low-risk but fairly big project. The investment was greater, and we showed that it paid off. And so that gave us the ability to go to the executive team and say, hey, I think we can do something bigger.

Haley Reddington

And at this point, we've talked a lot about how metrics can be an input into a decision, but I also think they can help celebrate success. And this particular case study is an example of both.

I don't know if anybody saw Jennifer and Prashant, who were also WEXers, talk about Claims AI yesterday, so I'll try to do it justice. We understood heading into 2025 that in general, our AI scores were relatively low in most of our various lines of business. We were coming off a great year of AI adoption in our accelerators program. We were starting to apply it into our process and products. And that was where we were starting.

We also were continuing to improve our ability and the volume of what we were deploying. We had introduced AI agents to help audit our work, so making sure that each of the epics that we were working on represented some sort of customer value. And that was starting to ensure that as we increased our product innovation velocity, we were also working on the right things.

But at the intersection of that, we had this problem in our benefits space where the claims reimbursement process for our flexible spending accounts just wasn't a delightful experience. It was pretty manual. Customers would upload receipts. It would take two-ish business days for them to get feedback on whether or not their claim had been accepted. And it just seemed so ripe to try to up the ante on how we were using AI in our products to solve that for them, as well as using AI as we built some of that capability.

Ultimately, the short story is we were able to reduce the time to get approved down from two days to two minutes, I think, with 97% success rate. And it's just a much better experience. And not only was this something that our data supported us doing and prioritizing, probably a no brainer, but it also then was reflected in the scores after the fact. We were able to bring an improved product IV, an improved AI score to our leadership team as one of the many ways we celebrated this being released.

Ultimately, that is one of the value propositions of this type of measurement. It is that we are creating a level of confidence and trust in our technology and product orgs at every altitude of the organization. This is a real quote that I did write down verbatim in a board meeting about a month ago from our CEO, about how she just has never had this level of transparency into how technology is performing, how we are owning the places we still want to improve, how we are celebrating the places that we have partnered to deliver outcomes. And it, again, goes to how we're starting to see all of these measurements show up, not just within the technology part of our enterprise, but more broadly.

There we go. Yeah. But as with all things, not without its learnings. So, Nick, do you want to share some of your thoughts?

Nick Krosschell

So I had mentioned before that we're not done with this journey. We have these measurements. We use them on a daily basis. It's used at many different altitudes. The executive team is looking at that on a daily basis. This means our product team is looking at it. It means our engineering teams are looking at it. It means our software engineers are looking at it.

We mentioned an awful lot here about AI because that is what's in the water right now, and so I feel like I have to talk about it, but there are many other use cases as well. We have used these measurements to improve our reliability, our security, and the other metrics as well, through concerted, focused efforts.

However, we know that the world is always changing, and so we reserve the right to adjust that. And as we are looking at how we should refine these, we continue to look at at least three things. First, a clear, limited set of vectors. We don't want to have 300 different vectors, or even 10. Five feels like enough for us to get our teeth into.

Second, that stakeholder engagement, making sure that it's at every single level. And third, delivering practical, visible improvements over time, and making sure that that resonates with both the teams and our teams.

Haley Reddington

The one thing I'd add is these could feel very directional, especially for folks who may be more used to something that's a little bit more, you know, as Nick said, we built on DORA, and there is just so much value in having a deeper level of insight or a metric that hasn't been aggregated. And so we very much want to support the idea that this is complementary. It does not replace those insights. It is simply a way for it to reach a broader audience.

And it really does allow us to step away from a habit that sometimes at least we had, I don't want to speak for anybody else, of spending so much time interrogating the data that we missed what it was trying to tell us. So when you have something that's aggregated at that level, it helps elevate the conversation as well.

And I think just to build on Nick's point, this is something that we've done in the last year, year and a half. So we are still very much making sure that the vectors are the right ones. We've toyed with introducing new ones. We've toyed with taking ones away. I think it's something we'll definitely examine heading into next year, and we would be super interested in any insights we take away from this week, whether there are other things that we can and should be focusing on, or even how we compose our own vector scores, with things like AI being very top of mind.

And then Nick, this next one is very close to your heart.

Nick Krosschell

How do we measure architecture maturity? It's such a complex topic. How do you boil that down into a single number that is easily represented and consistent across all lines of business? I think that's probably one of the hardest vectors that we've got. But I would be very interested in talking to any of you who have ideas on that and collaborating, because that is a story that will never be perfect.

Haley Reddington

Yeah, and I think those are all of our thoughts. Thank you guys for letting us share them with you. Hopefully some of this resonated with journeys that you're also on, and how you're using metrics to drive meaningful conversations. Thank you.