Connecting Feature Flags and DORA Metrics

Las Vegas 2022

Mark Allen

Feature Flag Evangelist · DevCycle

Niroshan Shanmugarajan

Director of Development for RBC Digital Investment · RBC

How Royal Bank of Canada measured the impact of Feature Flags on their development performance using DORA metrics.

Measuring the success and impact of new development processes is crucial to ensuring your team is always well-equipped with tools and processes that actually drive growth. Any team looking to adopt feature flags in their workflow should also adopt a way to properly measure the success and effectiveness of feature flags. DORA metrics, specifically Deployment Frequency and Mean Time to Recovery, are two criteria that engineering managers and team leads can use to both motivate their team members to use feature flags, and provide rationale to their organizations on why feature flags are effective.

Even for teams not yet using DORA metrics, evaluating your processes against these two criteria is a great way to get started. Deployment Frequency is easy to automate and measure: add tracking to the CI/CD pipeline that records each completed release to production. Mean Time to Recovery can be calculated from issues in the support management system that are tagged as production outages.

RBC’s implementation of feature flags, and their use of DORA metrics to measure the success of those flags, is a prime example of this. As part of adopting feature flags, the Digital Investing Development team wanted to be able to measure their effectiveness. The team had been implementing DORA metrics, and this seemed like the right place to start in understanding the impact feature flags would have on their software development performance.

Full transcript

The complete talk, organized by section.

Mark Allen and Niroshan Shanmugarajan

Mark Allen: All right. Well, thanks everybody for taking the time today to come and listen to our talk on how to connect feature flags to DORA metrics. Who out there uses feature flags? Some people. All right. Hopefully we can convince some more of you to use them by the end of this talk. And who out there collects DORA metrics? That's even less. Okay.

Niroshan Shanmugarajan: All right. So, Mark, what do you use feature flags for?

Mark Allen: I use feature flags to help me deploy my code more often. By hiding code behind a feature flag, I can merge it into main, deploy to production, and I know it's not going to get run until I turn the feature flag on.

So, Niroshan, what do you use DORA metrics for?

Niroshan Shanmugarajan: I use DORA metrics to keep tabs on my deployments.

Mark Allen: All right. So I'm Mark Allen. I'm a developer evangelist with DevCycle. We do feature flag tooling for developers and product managers. I've been using feature flags for about eight years now. And in my job, I get to talk to people every day about how we can use feature flags to improve developer productivity and developer performance.

Niroshan Shanmugarajan: Hi guys, I'm Niroshan. I'm the director of development at Royal Bank of Canada. I help lead the retail trading development team working on a product called RBC Direct Investing. For those who don't know Royal Bank, it's a large bank in Canada with over 89,000 employees worldwide.

Mark Allen: All right, so where did we start in all this? We've been working with Royal Bank for a number of years through a number of different products, and more recently with DevCycle, with our feature flag product. I think that we can all agree that we wanted to deploy software more often so that our users can see the features that we're working on and that we can fix those annoying little bugs that we put in the last release.

Feature flags have a great way of being able to improve a couple of different areas. One is getting code into production. It allows us to move to a trunk-based version control strategy. We can debug issues, and for product managers it allows them to release features to specific groups of users, giving us early feedback on how that particular feature is working.

So Niroshan, how do you guys at RBC deliver software?

Niroshan Shanmugarajan: First, let me give you guys some context. Our software allows retail clients to trade securities and options in real time. Considering the nature of our software, we are quite risk averse, with real financial repercussions if we get something wrong.

The cadence we've been historically comfortable with is a monthly release cycle with a significant regression exercise right before release. Our business also does not sit still, and we are continually evolving to support more features, improve user experience, and enhance security. This takes extra time and effort with our current deployment model. There's a real need to increase the frequency of deployments while keeping steady and improving application stability. So tell me, Mark, how can feature flags help?

Mark Allen: There are a couple of ways that feature flags can help with deployment frequency, but the simplest case is when you're developing a new feature, just hide that code behind a feature flag. Then you know that you can merge that code into main, you can put it in production, and it's not going to get run. It also gives you the ability to do a very much piecewise approach to develop different pieces of software.
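The simplest case Mark describes can be sketched in a few lines of Python. This is a minimal illustration, not DevCycle's SDK: the in-memory `FLAGS` dictionary, the `new-order-ticket` key, and the function names are all hypothetical stand-ins for whatever flag service and feature a team actually uses.

```python
# Hypothetical in-memory flag store. A real team would query its
# feature flag service here instead of a module-level dictionary.
FLAGS = {"new-order-ticket": False}


def is_enabled(flag_key, flags=FLAGS):
    """Return the flag state; unknown flags default to off so unfinished code stays dark."""
    return flags.get(flag_key, False)


def legacy_order_ticket():
    return "legacy order ticket"


def new_order_ticket():
    # Work in progress: merged to main and deployed, but never run while the flag is off.
    return "new order ticket"


def render_order_ticket(flags=FLAGS):
    """Route between the old and new code paths based on the flag."""
    if is_enabled("new-order-ticket", flags):
        return new_order_ticket()
    return legacy_order_ticket()
```

With the flag off, `render_order_ticket()` keeps returning the legacy path even though the new code is already deployed. Flipping the flag is the release step, separate from the deployment.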

My personal experience in doing this is in my previous role. I was head of engineering at Taplytics, the parent company of DevCycle. When I joined, they were releasing software every couple of months. First thing I did was I implemented feature flags on a new component they were releasing, and right away we were able to release software every week. I like to look at this really as kind of the genesis of DevCycle within Taplytics.

Niroshan Shanmugarajan: That sounds good in theory. But how do we actually measure the impact that feature flags have on our development process?

Mark Allen: So let's look at DORA metrics. We've seen a few talks already this week about DORA metrics, what they are. I'll go through them quickly just so that we're all on the same page. We have deployment frequency: how often we get code into production. Lead time for change: how long it takes that little piece of code I just wrote to actually get into production. Mean time to recovery: when we have an oops on the site, how long it takes us to recover back to a good running state for our users. And then change failure rate: how often a change that I make to production causes a problem.
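As a rough illustration of the four definitions above, the sketch below computes them from a list of deployment records. The `Deployment` record and its field names are assumptions made for this example, and it assumes a failure begins at the moment of deployment; real teams define these boundaries differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; the field names are illustrative only.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_summary(deployments, period_days):
    """Compute the four DORA metrics over a reporting period of `period_days` days."""
    n = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: recovery time runs from the failing deployment to restoration.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": n / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / n if n else None,
        "change_failure_rate": len(failures) / n if n else None,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```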

So Niroshan, I know that RBC has spent a bunch of investment into collecting DORA metrics. How did you guys get started, and how do you do it?

Niroshan Shanmugarajan: The past couple of years has seen an explosion in retail trading activity, and we had days where our infrastructure struggled to keep up. So the management team spun up the Endurance program to ensure that our platform is able to handle these loads. As part of this program, our SRE team started collecting DORA metrics, and we've been collecting them since 2021.

Our team collects the core metrics that Mark has mentioned. In addition, our change failure rate is derived from custom SLO metrics we've set up in our monitoring services. And the mean time to restore is also derived from SLO by using the total duration of SLO violations, basically from breach time to restore time. We also have a formalized agreement between business, dev, and SRE teams to prioritize issues that affect SLO over features.
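One way to read the "breach time to restore time" calculation is sketched below. It assumes each SLO violation is available as a pair of timestamps and that the mean is the total violation duration divided by the number of violations. The talk does not spell out RBC's exact formula, so treat this as an illustration only.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(violations):
    """Average duration of SLO violations.

    `violations` is a list of (breach_time, restore_time) datetime pairs.
    Returns None when there were no violations in the period.
    """
    if not violations:
        return None
    total = sum(
        (restored - breached for breached, restored in violations),
        timedelta(),
    )
    return total / len(violations)
```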

Mark Allen: You can see how this particular initiative is across multiple teams and different departments, all working together to provide a common set of metrics, which inside a large organization is quite an accomplishment.

Metrics are great, but metrics rely on data. Where does this data come from? One of the things I really like about DORA metrics is I'm collecting data from systems I already use. CI/CD gives me deployment frequency. Version control can give me lead time. Monitoring and observability can give me my MTTR and failure rate, and I can take this data and put it all together in a nice, easy-to-use dashboard so people can see it.

So Niroshan, how are you collecting this data at RBC?

Niroshan Shanmugarajan: As you mentioned, lead time is collected through Git. RBC uses GitHub, and then we keep track of when features are merged to our master branch. Deployment frequency, we get it from our pipelines. We use Jenkins, so we pull the stats from there. Mean time to recovery and change failure, what I spoke about earlier, we pull that from our Dynatrace monitoring, where we set up custom SLOs, and we also get it from logs as well.
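Once production deployment timestamps have been exported from a pipeline, turning them into a deployment frequency is mostly a counting exercise. The sketch below assumes the timestamps are already available as Python `datetime` values; it does not call Jenkins or any other tool, and the function name is made up for this example.

```python
from collections import Counter
from datetime import datetime


def deployments_per_week(deploy_times):
    """Count production deployments per ISO week.

    Returns a dict keyed by (iso_year, iso_week).
    """
    counts = Counter()
    for deployed_at in deploy_times:
        iso = deployed_at.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)
```

Grouping by ISO week keeps the numbers comparable from week to week, which suits the kind of deployment frequency graph discussed later in the talk.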

Mark Allen: RBC is leveraging existing tooling in order to build DORA metrics. This is one of the key selling features inside an organization. You don't need to spin up a new system in order to collect these metrics. You're already in there. But there are challenges. What is a failure? Niroshan just mentioned they use SLO violations in order to say that is a failure in the system. Different organizations have different ideas of this, and that's one of the challenges that we'll talk about in a little bit.

One of the key features is that we can take DORA metrics, put them in graphs, and people like to see graphs, so it's easy for people to understand exactly what's going on.

So now the question is: we're collecting DORA metrics. How can we implement feature flags in order to impact those metrics inside an organization like RBC and a project and a team like Niroshan's?

I've talked already about the simplest case, which is to take a new feature and hide it behind a feature flag. This allows me to deploy code in production faster because I know that code is not going to get run until I turn the feature flag on. The way that I work with feature flags on a new component is I build it, and then I turn on the feature flag for myself. I test to make sure that it works. I turn it on for my team, make sure that works with the parts of that feature that they're working on as well. I turn it on for my product manager to make sure that it's what they were looking for. Then I hand it over to them, and they can roll that feature out to the users as they see fit.
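The staged hand-off described here (yourself, then your team, then the product manager, then users) amounts to widening a targeting list over time. A minimal sketch, assuming a hypothetical `TargetedFlag` class and made-up user IDs rather than any particular flag product's targeting rules:

```python
from dataclasses import dataclass, field


@dataclass
class TargetedFlag:
    # Hypothetical targeting model: an allow-list plus a global switch.
    allowed_users: set = field(default_factory=set)
    enabled_for_everyone: bool = False

    def allow(self, *user_ids):
        """Widen the audience by adding specific users."""
        self.allowed_users.update(user_ids)

    def is_enabled_for(self, user_id):
        return self.enabled_for_everyone or user_id in self.allowed_users


flag = TargetedFlag()
flag.allow("developer")                 # stage 1: test it yourself
flag.allow("teammate-1", "teammate-2")  # stage 2: the rest of the team
flag.allow("product-manager")           # stage 3: product sign-off
# Stage 4 is the product manager's call: set enabled_for_everyone = True.
```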

One side benefit of this workflow is that it also helps improve my lead time for change. Before, I'd have to do a whole series of tasks before I could commit my code in and merge it into production. Now each one of those tasks I can individually merge into main, deploy to production, therefore decreasing the lead time for getting the code into production. I also don't have to rely upon other people having their work done in order for me to merge mine in. If someone else's code isn't ready yet, that's okay. My code is behind a feature flag. No one's going to run it until we go ahead and turn it on.

I'm a huge believer in the best place to test is in production. That's a scary statement. But one thing that feature flags can do is if I turn a new feature on and it doesn't work, I can turn it off. So my mean time to recovery then is measured in minutes, where before I would have to look at the problem, diagnose what it was, do a patch, deploy that patch, which can take more than a few minutes.

Also, feature flags have a positive impact upon change failure rate. One of the things that we can do with feature flags is slowly ramp up traffic onto that new feature. This gives our infrastructure time to scale up. Nobody's infrastructure scales up right away. Some machines have to get started, nodes have to come onto the Kubernetes cluster, et cetera. It also gives our monitoring and observability platforms time to look for bottlenecks and potential failures before they actually become a true failure.

If everything's going well, we can continue to keep ramping up the traffic onto that new feature. But if everything goes pear-shaped, we can just turn off the feature and then diagnose what was going on without having to deal with an outage at the same time.
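A common way to ramp traffic gradually is to hash each user into a stable bucket and enable the feature for buckets below the current percentage. The sketch below is one such scheme, written for illustration; the function names and the `kill_switch` parameter are assumptions, not a description of how DevCycle or RBC implement rollouts.

```python
import hashlib


def bucket(flag_key, user_id):
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key, user_id, rollout_percent, kill_switch=False):
    """Enable the feature for roughly `rollout_percent` of users.

    The kill switch turns the feature off for everyone without a deployment.
    """
    if kill_switch:
        return False
    return bucket(flag_key, user_id) < rollout_percent
```

Because a user's bucket never changes, anyone who had the feature at 10% still has it at 50%, so raising the percentage only adds users. Setting `kill_switch=True` is the "just turn off the feature" path Mark mentions.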

So Niroshan, I know that you guys have implemented feature flags. What are some of the things that you've learned along the way?

Niroshan Shanmugarajan: We're still in the early stages of implementing feature flags in our stack, but we're already seeing some benefits. Historically, we've had projects with different delivery dates, each with separate branches and independent pipelines deploying to various environments. We also had to contend with a large merge exercise once the feature was ready and a regression to make sure we hadn't broken anything after.

With these dynamic feature toggles, we can collapse all these environments, branches, and pipelines, and drastically lower our lead time for changes out to production. Since we're able to stagger the rollout of features, we also ensure that our changes are vetted, and any deviation from our established SLO would slow our rollout as we address server capacity or code efficiency. This would have the nice impact of improving, like Mark mentioned, the mean time to recovery and change failure rates.
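The idea of an SLO deviation slowing a staggered rollout can be sketched as a simple gate between rollout steps. Everything here is an assumption for illustration: the success-ratio SLO, the 99.9% target, the 10-point step, and the choice to hold the rollout (rather than reduce it) when the SLO is missed.

```python
def slo_met(good_requests, total_requests, target=0.999):
    """Check a success-ratio SLO. With no traffic, nothing has been violated."""
    if total_requests == 0:
        return True
    return good_requests / total_requests >= target


def next_rollout_percent(current, good_requests, total_requests, step=10, target=0.999):
    """Advance the rollout one step if the SLO holds; otherwise hold at the current level."""
    if not slo_met(good_requests, total_requests, target):
        return current
    return min(100, current + step)
```

For example, `next_rollout_percent(10, good_requests=9995, total_requests=10000)` moves the rollout to 20, while a window with only 9,900 good requests out of 10,000 keeps it at 10 until capacity or code efficiency has been addressed.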

Mark Allen: One thing about implementing feature flags and DORA metrics is it's a journey. You're going to start off with one feature flag. You're going to implement that, you're going to see a teeny tiny little impact, but over time the metrics that you're collecting will start to show that the utilization of feature flags has a positive impact.

Here's some things that we learned when we went through this process. Have something to work towards. Anybody who's implementing feature flags, you need to measure how that impacts your teams, and DORA metrics provide you great metrics in order to do that. Deployment frequency and lead time for change come really easy in the DORA world, and there are real positive impacts that feature flags can have on that.

When I implemented a feature flag early on in Taplytics, I was immediately able to show everybody: look, here's the deployment frequency graph. You can see how we're now able to deploy much quicker. Not only could I show my team that there was a positive impact, I got to also show the whole organization how that was actually helping out.

Results can take a long time. Implementing feature flags is a journey. It's not something you just turn on and the next day it's all running. DORA metrics give you an ability to view and see how things are going, but also DORA metrics are things that take time to collect. At RBC, they didn't just come up with it overnight in order to collect these data. There was a whole team that implemented it.

Not all organizations have gotten their heads wrapped around DORA. As Niroshan mentioned, how do they define an outage? What is an issue? In their case, it's a violation of SLO. For other organizations, it could be something else. It could be an API returning a 500 or something like that. So you need to work with the rest of your organization in order to understand exactly how to put them together.

So Niroshan, what about you? What did your team learn?

Niroshan Shanmugarajan: DORA has challenged us to identify any inefficiencies in our workflow, think through what constitutes failures and recovery, and come up with options to improve our metrics. Feature flags have been a key piece to facilitate this and provide us a pathway to become more efficient, deliver faster, and get closer to that DORA elite team status.

Developer buy-in was crucial for us. Our team was already using simple config-based feature flags extensively, so adopting this tool to control the flags dynamically was a rather easy proposition. Many saw the immense power it provided and loved the simplicity it brought to branching and deployments.

As easy as it was for us, your mileage may vary on your respective teams. I had a conversation with a few conference attendees, and the uptake might be a challenge on certain teams. We're also working through which use case we would use feature flags for, because it doesn't make sense in all situations.

The other thing is it's really early days for us, and we haven't seen the impacts of feature toggles in our SRE reporting yet. But at least as a thought experiment, we can definitely see where the improvements will come from. This is a required stepping stone for us to get to CI/CD in production and really ramp up those metrics.

Mark Allen: All right. Those are our lessons, and we still have some problems that remain, and we're really interested in how other people might have solved these problems or some other ideas.

Niroshan Shanmugarajan: External buy-in from the gatekeepers. And trust me, there are many in a bank. We have to ensure that risk, compliance, and security teams are all on board. So I would love to find out how you guys get these approvals and what the process looks like in your organization to get changes out to production using CI/CD and feature flags.

Mark Allen: There has to be continuous investment in DORA metrics. How are people inside their organizations going to continue to collect these metrics? And MTTR is hard. We're using SLO metrics. It'll be good to find out what everyone else is using.

All right. Well, thanks everybody. I hope that you've come away with a little bit from our talk today about how feature flags can directly impact DORA metrics. You can increase deployments by hiding code, decrease lead time by staggering out releases, decrease MTTR by turning off broken features, and reduce change failure rates by slowly scaling up the release so the infrastructure has time to catch up. All right. Got eight minutes.