A Platform Value Model for IDPs

Las Vegas 2023

Ajay Chankramath

Head of Platform Engineering · ThoughtWorks

Sridhar Kotagiri

Platform Product Principal · ThoughtWorks

This session will introduce the concept of the Platform Value Model (PVM). As organizations embark on their DevOps journey by building technical capability platforms, whether Delivery Infrastructure (DI) platforms or internal developer platforms (IDPs), one of the biggest challenges is to follow value-based prioritization: to figure out the priorities and articulate the tangible value of building these platform capabilities to your stakeholders, including leadership, product management, finance and, most importantly, your internal clients.

In this talk, we will unveil a model that organizations both small and large can use to figure out a measurable return on investment for their technical platform products and capabilities. We will use a real-life use case to demonstrate how to forecast the value of each capability. We will further demonstrate how to measure the actual value against the forecasted projections.

Specifically, we will focus on:

1. Modeling the costs and value of your platform engineering investments over a timeline

2. Assessing optimistic and pessimistic scenarios

3. Accommodating changes during execution

4. Modeling realized vs. projected value

5. Accommodating various success metrics, including Flow metrics and DORA

Full transcript

The complete talk, organized by section.

Ajay Chankramath

Alright. So this conference has been fantastic from a platform's point of view, right? I know that we had some really interesting conversations on the platforms. We had U.S. Bank talking about the platform implementation, I think just a couple of days back. Then we had ANZ talking about their platform implementation. We also had several other conversations around platform.

So what we wanted to do today was talk about making sure that when you're building a platform, it is a little bit more quantifiable from the point of view of value. What is the value that you can bring in, right? That's essentially the question that we want to ask.

So I am Ajay Chankramath. I head the platform engineering team at Thoughtworks.

Sridhar Kotagiri

I'm Sridhar Kotagiri. I'm the platform product principal at Thoughtworks.

Ajay Chankramath

So as you think about the platforms here, one thing that we basically heard from Paul and Courtney, I guess, yesterday morning was, "Let's really start thinking about more effectiveness and less about efficiency," because we are all trying to be more efficient. We are doing a lot of things that are efficiency-based. So thinking about it from an effectiveness point of view is what we are really going to focus on.

Before we actually get into our topic, I think it'll be probably good for you to know what Thoughtworks is. I'm assuming that most of you have heard of Thoughtworks. If you haven't, might be good for you to just take a quick look at this.

Some of the things that you might have heard about Thoughtworks: the Agile Manifesto was sort of created out of some of the work that we did, and we have several signatories on the Agile Manifesto. You have probably heard that the first CI/CD server, CruiseControl, came out of Thoughtworks. You have probably heard of Continuous Delivery; I'm sure all of us have. Jez Humble was part of Thoughtworks when he wrote it.

So we have microservices. I know I had somebody here, I believe the SBS team, talking about microservices. Building Microservices by Sam Newman came out of Thoughtworks. So as we can see over the years, there are a lot of technology innovations that have come out of Thoughtworks.

What do we do today? It's primarily on the consulting side of things. We actually create the thought leadership. We actually consult, and we work with several different clients as part of really figuring out what are the challenges in building platforms.

Specifically, what my team does is we actually go in and try and work on the transformational journeys. Again, it's a journey, right? So we work on the transformational journey, try and build the platforms that can help solve those transformational problems.

As part of doing that, what we really see is that the biggest question that people ask us is, "How can I really make sure that I'm investing in the right way?" And that is the context in which we are really going to be talking about how you actually measure the value of building platforms.

Because if you are building platforms in your organization, the biggest challenge is not going to be getting started, right? But the moment you get started, the question becomes, "Where's the value? Should I continue to invest in it? And what's the way in which I invest in that?"

So what we have come up with, and this has been going on for a couple of years now, is a model. We first built the model that you're going to see about two to three years back, when we started showing actual quantifiable value that organizations can see with respect to their investments and what they get out of them.

Anyway, just to finish the conversation on Thoughtworks: if you have not really come across Thoughtworks, check out any of these books. We have written more than 100, maybe 125 books, something like that. We are writing a book on platform engineering, which will be published by Manning and should come out next year. So you should be looking out for some of the work that we are doing in this space.

Let's talk a little bit more about what triggered this whole problem. What we are really seeing is that when we work with organizations that have multiple engineering teams, every team has some level of commonality in what they do, and they're all building some sort of reusable internal capabilities. So when you're building these, there are two things that come to mind.

Again, on the context here: I don't know if you attended the talk from the KPMG folks today about Team Topologies. The whole concept of Team Topologies, the work done by Matthew Skelton and Manuel Pais, really applies a lot to the platform engineering space. If you haven't checked it out, definitely check it out.

What that really tells you is that if you are able to abstract out those capabilities, those common capabilities across all these teams, then you can actually be a lot more efficient if you can come up with a working model with that.

So what we encounter is that when we build these capabilities, or when the organization builds these capabilities, the challenge becomes: how do you make sure that you're building the right things at the right time, and are you really measuring the outcome? So that's the whole trigger for this problem.

Talking about platforms themselves, there were talks earlier today about some of the IDPs and all that. Our definition of IDP is fairly simple. So when we are talking about an internal developer platform, what we are really talking about is: how do you abstract out those common infrastructure capabilities into a layer that is shared by multiple teams?

And you're going to have some business platforms there. You're going to have experience platforms, as well as some of the data platforms that you look at. On our definition of IDPs: I had a lot of people ask me this question over the past couple of days: "How do you define platform engineering? How do you define the IDPs?"

So this is something that you can see from our website. We have some published articles on this one, but you can see that the way we look at it is that we actually split the IDPs across five different planes. And it's important for you to know that because it's never a necessity for an organization to build all these five planes, right? So you have to really think about it from the point of view of thin-slicing. What is it that the organization needs, and how can you actually support that?

One key thing here that you might want to look at is on the left side, that platform product management. This is the challenge that most organizations have. Last night, I was actually talking to some of the folks on the book lines, and one of the things that we heard was, "Oh, these platform teams, they're building all these capabilities and throwing them over the wall, nobody wants to use them, and I don't know how to actually make that work."

The answer to that is precisely that platform product management. This is where, again, I want to give a shout-out to Mik, right? I'm sure a lot of you have met Mik this week. Mik Kersten's work on Project to Product, I think that has helped a lot in the whole platform thinking evolution, really starting to think of your platform capabilities as a product and making sure that it has got a product lifecycle that you can work with. And that is an essential component of all these planes.

So you might look at this and say, "Okay, great, all these planes and everything that you see there sort of makes sense." It sort of combines a lot of the learnings and advances, pretty much everything that we have talked about at this conference. You can see that represented here.

But the part that is most important is that platform product thinking. And if you don't get that right, you are really not going to get everything else right: the adoption, building the right capabilities, making sure that the business value is realized.

Sridhar Kotagiri

Good. So what is the problem that we are talking about here, right?

When we go about building an internal developer platform, you see that not all of your stakeholders are aligned. Your technical stakeholders might be completely on board with it, and they want to build it fast. They want a bigger team to build it to realize the value sooner, and they want more and more teams to use it so that they can realize the value. And they understand the purpose. They understand the benefits of the platform.

On the other hand, you have your business stakeholders who are signing the checks asking, "Where is the impact? When the teams can function without it, why should we build them? And why are we spending so much money on platforms? Shouldn't we prioritize business features instead of platforms?"

So these are all the questions that might come up. I'm sure this sounds familiar to most of you, right?

So how can we address this challenge? One thing that everybody understands, regardless of whether they are technical stakeholders or business stakeholders, is value, right? When you start measuring the value, you are able to visualize the ROI from the platform products, and you are able to use it for prioritization.

Without value, you may be using your prior knowledge and experience, and maybe your leadership is pushing those priorities for you. But when you switch to value, then you can use the value for prioritization and have a better chance of realizing the value.

And you can also quantify the impact that the platforms are making on the end customer, as well as your strategic business goals. When you switch to the value-based discussion, aligning with the stakeholders is easy, right? Whether they're internal stakeholders or your investors.

So that's great. Can we just go ahead and start measuring the value for platforms? Well, it's not that easy, right? Take a customer-facing product: for example, if you want to add recommendations to a product details page on an e-commerce website, you can start measuring, or you are already measuring, the average cart value. And it's that easy, right?

But is it that easy for platforms? No, it's not. That's because platforms do not have a direct impact on the business outcomes, and the value from platforms is not static. The platforms evolve over time. So you won't have a completely mature platform one year down the line. They keep evolving. The value is also not static because it depends on the engineering team, the leadership buy-in, and the level of enablement that the development teams have.

If you're building a CI/CD platform based on CircleCI, and the teams are not enabled enough to use CircleCI, then you don't see that value.

It's also hard to quantify or articulate the value of these platforms that we're talking about, and they need a significant investment. You don't want to run production workloads on a container orchestration platform unless you're baking in enough observability, security, and scalability, and all those things that you would require before running the production workloads. That needs a significant investment upfront, and that could be a barrier to get them started.

So how can you overcome that challenge? Make it a two-step process. First, define the value, for decision making, to make sure that you're building the right thing. And then measure the value to ensure that it's actually delivering on the objective. And that's essentially making sure that you are building the thing right, right?

Repeat it for every platform capability that you're building to ensure that the platform stays a strategic asset to the organization. We just talked about how defining value is very difficult, right? So that's where we want to introduce something called the platform value model. We'll talk about it in a bit. And for measuring value, we can use platform value metrics. Again, we'll talk about those in a bit.

So what's the platform value model? No, it's not value stream mapping. Value stream mapping, as you all know, is used to identify bottlenecks, inefficiencies, and friction in the value stream.

But the value model is for modeling your cost and value over a timeline, and it can assess different scenarios, like pessimistic and optimistic scenarios. It can accommodate change during execution. And it can also help you compare the realized value with the projected value. It can accommodate various success metrics, and it is extensible.

So who is this for? Your product managers and product owners can use it for prioritization, for decision making, to ensure that we are prioritizing the right platform based on the value. And then your leadership, whether it is business leadership or engineering leadership, can use this value model to visualize the value and ensure that they're investing in the right priority, right? Making the right investments, basically.

So how can we use it? Build a platform value model for every platform capability that you're building, and then use it for prioritization and decision making. While doing that, make the right assumptions. A lot of the data that goes into this model may not be readily available, so make the right assumptions, extend and customize as you need to, and make it a working model.

So you may be coming up with a platform architecture, and you may run some spikes and POCs and find out something is different. So you want to go back and update this model to reflect that reality.

So what goes into building one? You would need the cost baseline: team costs, infrastructure costs, and a number of other costs that go into building a platform capability. You would need the architecture before you start one. And you would also want to identify the timeline that you want to model for.

It's a three-step process. Essentially, in the first step, you're calculating the cost of building a platform capability. In the second, you're calculating the savings that you would realize from this platform capability, and in the third, projecting the value over a timeline. As an outcome, you can see a value summary and the graphs, and you can use them to connect with the business outcomes.

So let's take a demo use case and see how we can build one. A fictitious corporation, ABC Corporation, is experiencing exponential growth, and some teams are already doing microservices. Some teams are containerizing their workloads. Some teams are not. Some teams are relying on the DevOps team or the operations team.

What they've decided is that they want to start building a container orchestration platform on top of AKS, so that developers can rely on the container orchestration platform and completely focus on feature development, and to abstract away all of the complexity that's involved in orchestrating the containers.

They want to invest one to three million. They want to see three times the value, and they have all the prerequisites that we just talked about already, right?

So in step one, what we're doing is we are calculating the cost that goes into building one. You have several different costs. All of these components could vary. This is just an example. So development team cost, whether it is contractors or FTE, and infrastructure cost, licensing cost. And we are only showing six months here, but otherwise we are modeling for three years. This is only for readability.
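
Step one can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' spreadsheet: the `MonthlyCost` fields mirror the cost components named above, while the dollar figures, the nine-month build phase, and the function name `cost_baseline` are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """Cost components for one month (all figures hypothetical)."""
    team: float            # contractors or FTEs
    infrastructure: float
    licensing: float

    def total(self) -> float:
        return self.team + self.infrastructure + self.licensing


def cost_baseline(months: int, build_months: int,
                  build: MonthlyCost, run: MonthlyCost) -> list[float]:
    """Return one total per month: the build rate first, then the run rate."""
    return [(build if m < build_months else run).total() for m in range(months)]


# Assumed figures: 9 months of build, then steady state, over a 3-year model.
costs = cost_baseline(
    months=36,
    build_months=9,
    build=MonthlyCost(team=80_000, infrastructure=10_000, licensing=5_000),
    run=MonthlyCost(team=30_000, infrastructure=12_000, licensing=5_000),
)
total_cost = sum(costs)
```

Keeping the series monthly, as the talk does, is what later makes it possible to plot cost against value and find a break-even month.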

So we are doing it on a monthly basis so that we can visualize the value later. In step two, we are calculating the savings, right? We are taking one workload at a time: what does the cost look like if you're not building the container orchestration platform? And what does the cost look like if you are building one and using one, right?

So there could be different cost components, like upfront development cost and maintenance cost, and then the downtime cost, opportunity cost, a number of other things. Again, you can customize it based on your needs.

And then we are coming up with the monthly savings per workload, right? So here in this use case, we are using the workload and the number of workloads as the measurement criterion because on a container orchestration platform, the value comes when you deploy a workload, right?
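
As a rough sketch of step two, the per-workload saving is the difference between the monthly cost of running a workload without the platform and with it. The cost components follow the ones listed above; the amounts, the choice to spread the upfront cost evenly over 36 months, and the function name are assumptions for illustration only.

```python
def monthly_cost_per_workload(upfront_dev: float, maintenance: float,
                              downtime: float, opportunity: float,
                              amortize_months: int = 36) -> float:
    """Monthly cost of one workload, with upfront cost spread over the timeline."""
    return upfront_dev / amortize_months + maintenance + downtime + opportunity


# Hypothetical figures for a single workload.
without_platform = monthly_cost_per_workload(
    upfront_dev=72_000, maintenance=3_000, downtime=1_500, opportunity=2_500)
with_platform = monthly_cost_per_workload(
    upfront_dev=18_000, maintenance=1_000, downtime=500, opportunity=500)

savings_per_workload = without_platform - with_platform
```

With these made-up inputs the saving works out to 6,500 per workload per month; a real model would swap in whichever cost components matter to the organization.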

So in step three, we are taking the outcomes from one and two and then modeling the value. There are three different scenarios that we are modeling for: best-case scenario, average case, and worst-case scenario. And if you look at month 10, we are expecting just one workload to be deployed, right? So as you go, you want to see more and more workloads deployed on this platform.

And we understand that the value will not come out in the first year. So we are discounting that. We are saying that we will only see 50% of the value in year one, and then 75% in year two, and 100% in year three.
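
A minimal sketch of step three, under stated assumptions: the 50%, 75%, and 100% realization factors come from the talk, but the workload ramps for each scenario, the 6,500 monthly saving per workload, and all function names are hypothetical. The average case is set up so that the first workload lands in month 10, as in the example above.

```python
# Share of the modeled value assumed to be realized in each year (from the talk).
REALIZATION = {1: 0.50, 2: 0.75, 3: 1.00}

# Hypothetical ramps: (month of first workload, workloads added per month after).
SCENARIOS = {"best": (8, 3), "average": (10, 2), "worst": (12, 1)}


def workloads_deployed(month: int, first_month: int, added_per_month: int) -> int:
    """Cumulative workloads on the platform in a given 1-indexed month."""
    if month < first_month:
        return 0
    return 1 + (month - first_month) * added_per_month


def projected_value(months: int, savings_per_workload: float,
                    first_month: int, added_per_month: int) -> list[float]:
    """Monthly value: workloads x saving per workload x yearly realization factor."""
    values = []
    for month in range(1, months + 1):
        year = (month - 1) // 12 + 1
        workloads = workloads_deployed(month, first_month, added_per_month)
        values.append(workloads * savings_per_workload * REALIZATION[year])
    return values


projections = {name: projected_value(36, 6_500, *ramp)
               for name, ramp in SCENARIOS.items()}
totals = {name: sum(series) for name, series in projections.items()}
```

Because each scenario is just a different ramp fed through the same function, updating an assumption mid-execution means changing one tuple and re-running.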

And you can use the same model to come back and plug in the realized numbers. So 12 months down the line, you expected 10 workloads to be deployed, but in fact, you have 15 workloads deployed. Then you can compare, right? So your realized value is much better than projected value, right? So you can use this model for comparison as well.
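
The comparison in that example can be sketched as follows. The 10 projected and 15 realized workloads at month 12 are from the talk; the saving per workload, the assumption that the realized count is plugged into the same monthly formula, and the function name `compare` are illustrative.

```python
def compare(projected_workloads: int, realized_workloads: int,
            savings_per_workload: float, realization: float) -> dict[str, float]:
    """Compare projected and realized monthly value for one point in time."""
    projected = projected_workloads * savings_per_workload * realization
    realized = realized_workloads * savings_per_workload * realization
    return {
        "projected": projected,
        "realized": realized,
        "variance": realized - projected,
        "variance_pct": 100 * (realized - projected) / projected,
    }


# Month 12 falls in year one, so the 50% realization factor applies.
month_12 = compare(projected_workloads=10, realized_workloads=15,
                   savings_per_workload=6_500, realization=0.50)
```

Here the realized value comes out 50% above the projection, which is the kind of gap worth feeding back into the ramp assumptions for later months.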

As an outcome, you can get this kind of summary, right? Again, this is just an example. You don't see positive value until year three in the average and worst-case scenarios, but in the best-case scenario, you would already see the value in year two. So this can be a useful tool, a useful outcome for your discussions with your stakeholders, right?
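
One way to produce a summary like this from monthly series is to take the cumulative net value at the end of each year. The sketch below uses small made-up numbers, chosen so that net value only turns positive in year three; the function name and inputs are assumptions, not the speakers' figures.

```python
from itertools import accumulate


def yearly_net_value(monthly_costs: list[float],
                     monthly_values: list[float]) -> dict[int, float]:
    """Cumulative net value (value minus cost) at the end of each 12-month year."""
    net = [value - cost for cost, value in zip(monthly_costs, monthly_values)]
    cumulative = list(accumulate(net))
    years = len(cumulative) // 12
    return {year: cumulative[year * 12 - 1] for year in range(1, years + 1)}


# Hypothetical series: heavy cost in year one, value ramping up afterwards.
example_costs = [100] * 12 + [40] * 24
example_values = [0] * 12 + [60] * 12 + [150] * 12

summary = yearly_net_value(example_costs, example_values)
```

Running the same function once per scenario gives the rows of a value summary table.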

Then it would be an easy discussion. It would be an easy decision-making tool when you're trying to prioritize between multiple platform capabilities.

And this is another view of the same outcome. You can see the trend. The cost is going up and it's plateauing later. And then the value keeps going up as you use it and as you deploy more and more workloads. And you can also see the break-even point. This could be another visual that you can use in your discussions with your stakeholders.
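
The break-even point on such a chart is the first month in which cumulative value catches up with cumulative cost. A minimal sketch, with a hypothetical function name and made-up monthly figures:

```python
from itertools import accumulate
from typing import Optional


def break_even_month(monthly_costs: list[float],
                     monthly_values: list[float]) -> Optional[int]:
    """Return the first 1-indexed month where cumulative value >= cumulative cost."""
    running = zip(accumulate(monthly_costs), accumulate(monthly_values))
    for month, (cost_so_far, value_so_far) in enumerate(running, start=1):
        if value_so_far >= cost_so_far:
            return month
    return None  # no break-even within the modeled timeline


# Hypothetical six-month series: cost plateaus while value keeps growing.
demo_costs = [100, 100, 50, 50, 50, 50]
demo_values = [0, 0, 100, 150, 200, 250]

month = break_even_month(demo_costs, demo_values)
```

Returning `None` rather than a number when the lines never cross keeps a worst-case scenario from being mistaken for a late break-even.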

So once you have built the value model and decided to go ahead and build a platform, then you want to make sure that it is getting adopted, right? That's been a major challenge, and everybody understands that's a major challenge.

How can you help that? You want to connect the platform capability with the business outcomes. This is one way that you can connect with the business outcomes. So you can see here, for example, that CI/CD enables faster pipeline development, and faster pipeline development contributes to innovation. And innovation drives revenue, right? It increases revenue.

So if you're able to connect the dots, connect the platform capabilities with the business outcomes, then the teams and the leadership will be more encouraged to use the platform capabilities. And then you'll automatically see the value coming out of the platform.

So once you have built it and started using it, then comes the measurement. That's where we can use platform value metrics.

Ajay Chankramath

Thanks, Sridhar.

So I think the model makes sense as is, right? But sometimes just having a model, taking a spreadsheet to somebody, is not enough. Going back to Paul's discussion from yesterday, if your leadership thinks of spreadsheets as evil, taking them a spreadsheet is the worst thing that you could actually do. That's not going to really resonate with them, right?

So what you really need are some metrics. We have been talking about DORA metrics, and we've been talking about metrics all through this conference, as well as for the past few years. But one of the challenges with things like DORA metrics is that they're lagging. What you're really doing here are some of the leading activities.

So what you are really looking for is: can you actually come up with some leading metrics that are associated with building this platform so that you can justify that easily?

Three things that we want to really try and address with this, right? There can be any number of metrics, but the three key things that we typically use are, number one, improved cost efficiency. Did the investment in platform give me more cost efficiency?

Second thing, are you able to drive growth of your business with faster experimentation? And that's something that everybody's looking for. So can that be an objective you measure there?

The third one being, can you actually improve your developer experience? So this actually ties into the DevEx space.

So you can see that there are three key leading metrics that we use. The first one is the value-to-cost ratio (VCR). Fairly simple to understand. You invest, you get something back. What's the ratio? And what we say is that a sensible default that you should be looking for is about two. And over a span of three years, we expect that to be around 10. Your mileage might vary, but these are the sensible defaults that we actually see, right?
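
The value-to-cost ratio is a single division, sketched below. The targets of about 2 and about 10 over three years are the sensible defaults quoted above; the input figures are hypothetical, and computing the ratio on cumulative totals is an assumption, since the talk does not say which period the ratio is taken over.

```python
DEFAULT_TARGET = 2.0       # sensible default quoted in the talk
THREE_YEAR_TARGET = 10.0   # expected to be around 10 over three years


def value_to_cost_ratio(total_value: float, total_cost: float) -> float:
    """Value returned per unit of cost invested (assumed cumulative totals)."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return total_value / total_cost


# Hypothetical cumulative figures for one platform capability.
vcr = value_to_cost_ratio(total_value=4_200_000, total_cost=2_100_000)
meets_default = vcr >= DEFAULT_TARGET
meets_three_year = vcr >= THREE_YEAR_TARGET
```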

The second one, to address the growth aspect of things, is the innovation adoption rate metric. We feel a sensible default for this, based on our experimentation and on practicing this, is about 10%. And you might actually find that over a period of three years, you get to about 30%. But that is something for you to really implement and see if that makes sense.

Third one, this is again going back to the DevEx side of things: what's your developer-toil ratio? Are you actually reducing the toil of your developers? Are you actually making their life easier? Are you really reducing that cognitive load?

And that is the third one to look for. But with these metrics associated with that model, you would be able to make that case fairly well with all parts of your organization, not just with the leadership, but with the engineering teams too.
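
The talk does not give a formula for the developer-toil ratio, so the following is only one possible reading: the share of a developer's time spent on toil, compared before and after a platform capability is adopted. The definition, the hours, and the function name are all assumptions.

```python
def developer_toil_ratio(toil_hours: float, total_hours: float) -> float:
    """Assumed definition: fraction of working time spent on toil."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours


# Hypothetical weekly figures for one developer.
before = developer_toil_ratio(toil_hours=12, total_hours=40)
after = developer_toil_ratio(toil_hours=6, total_hours=40)
toil_reduced = after < before
```

Under this reading, a falling ratio over time is the leading signal that the platform is reducing cognitive load.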

So how does this tie to DORA plus one, right? Nowadays, we don't just talk about four key metrics. We talk about five, right? We include the reliability too. As you can see, those are actually the lagging metrics, but there is a clear association between these leading metrics and very specific parts of the DORA metrics as you actually look at it.

For example, your value-to-cost ratio actually has a direct impact on your lead time for changes. The same thing goes for your flow metrics. If you're really applying flow metrics, take a look at them from the point of view of your leading and lagging indicators, and try to see how these metrics apply.

And again, this is needed if you are really investing in your platforms as part of your accelerator, as part of your DevOps journey.

So what do we need from you? First thing is that if you are interested, reach out to us. We'll give you the contact. We are happy to share the model with you. We are happy to sit down with you and work with you on some of these models, just to understand how you guys work on this.

And then also take a look at the sensible defaults and the changing targets. Essentially, do those numbers work for you? Would there be a different number that might work for you?

We said we are talking about three different metrics here. Are there more? I'm pretty sure there are more, and we have actually talked internally about so many different metrics now. But it's always good to have a clear set of metrics that work for you.

The next question we have for you is that you saw that we are using spreadsheet modeling, not a tool, and that is fairly intentional. We didn't want to actually go out and codify this completely, because the moment you codify it, the big challenge is that everybody thinks that's the limit of the model. We don't want to limit the model at that point.

We are internally trying to codify it, but we want to leave that out there to see how we can evolve this to a point where we either continue to use the spreadsheets or, depending on what your requirements are, you might decide to start codifying that. But we'd love to get some feedback on it.

The last thing is fairly interesting, right? We said the sensible default for the VCR was about two, but if you think about it, as long as you make a penny more than what you're investing, it sort of makes sense, right?

So are there anomalies like that? Would you really want to continue to invest in a platform even if the return is less than two from a VCR point of view? Or would it actually make sense for you to not invest in it even if your return is three? I think those are the kind of things that we would love to hear from you.

But with that, you'll see some contacts. If you want to reach out to us, please do. We will be happy to share this model with you. We are happy to set up some time with you over Zoom, and we can actually have a chat and walk you through all of this.