Dynamic Decision Making at HMRC - Moving to Data as a Product During a Pandemic
In March 2020, the UK’s tax department HMRC faced a complex problem: how to improve the accuracy, reliability, and truth of information about customer activities, in order to counter fraud and adapt to the speed the situation required, ensuring that every claim was reviewed within 72 hours?
As the UK government announced statutory sick pay, a job retention scheme, and self-employment income support, the Customer Insights team at HMRC began to transform the Customer Insights Platform (CIP).
CIP is an internal platform built on top of AWS, which provides highly available transaction monitoring and auditing to internal customers across HMRC. This session tells the story of how the Customer Insights team transformed CIP from data storage to data product capabilities and built an intelligent risking service to bake automated risking decisions into customer activities.
Millions of claims were submitted by taxpayers, and every single one was automatically assessed for fraud detection, supporting the work that HMRC fraud investigators were doing in very difficult circumstances. This was a revolution in the speed of data analysis within HMRC. Meanwhile, the thinking of the Customer Insights team was transformed, from data storage to data as a product at scale.
This session will include details of CIP architecture, the delivery mindset of the Customer Insights team, and the working practices that delivered outcomes so successful that the team received a personal note of gratitude from the Prime Minister, and CIP became a multi-award-winning platform at the 2021 UK IT awards.
Full transcript
The complete talk, organized by section.
Host Intro (Gene Kim)
Thank you, Christina and Robbie.
Some of my favorite presentations from DevOps Enterprise Summit have come from UK HMRC, Her Majesty's Revenue and Customs department. In 2016, Anthony Collard and Lindsay Prower talked about how they modernized one of the largest IT estates in the UK, modernizing the tax filing system so that a single parent could finish their filing on their phone while on the bus ride home from work.
Last year, Ben Conrad and Matt Hyatt presented on how HMRC were able to distribute hundreds of billions of pounds to UK citizens and businesses, an unprecedented financial support package that would eventually see around 25% of the entire UK workforce being supported by public money. And they heroically built the technology to do so in four weeks, under conditions of incredible pressure and uncertainty.
They joked that they went from the most despised government agency to the most beloved, as citizens were able to get the money they needed when most of the economy was starting to shut down. But they had also mentioned something about a parallel effort to theirs that was working to prevent fraud from happening in this process, because they knew from the beginning that these COVID economic interventions would be obvious targets for fraud and that customers could make mistakes.
And so this is that story from Andy Letherby, service owner of the Customer Insight Platform from HMRC, and Caitlin Smith, delivery lead from Equal Experts. They describe how they were able to use data that they had and capabilities they had built to detect and even prevent certain types of fraud long before any payments were made, to ensure that economic aid went to those who were entitled to it.
They were able to do this by working closely with their leadership teams and their government ministers. Instead of policy being created and modified in a way that took years, they worked hand-in-hand, making policy changes in minutes, enabling true decentralized execution. They will share the story of how they did this and how this will likely change how technology works with policymakers for years to come. Here's Andrew and Caitlin.
Andrew Letherby
Hi, my name's Andy Letherby. I'm the service owner for the Customer Insight Platform in HMRC. Today, Kate Smith and I are going to talk to you. Kate, if you'd like to say hello.
Caitlin Smith
Hi, I'm Caitlin Smith. I'm an Agile Delivery Consultant with Equal Experts. I've been working with Andy for the last couple of years, working on the Customer Insight Platform.
Andrew Letherby
Thanks, Kate. Today what we're going to be talking to you about is some of the developments we've been doing around dynamic decision-making in HMRC, what the pandemic taught us, and some of the experience we had in the pandemic around how we innovated our platform and our products to really help out in delivery of those services.
Who are HMRC and who is the Customer Insight Platform? HMRC are the UK's tax and revenue service. We collect things like income tax, self-assessment, VAT, and corporation tax for customers. But actually, HMRC is much bigger than that. We are a very large government organization in the UK. We handle significant public services. Alongside tax, we also manage the Customs Declaration Service, handling import and export of goods into the UK. We manage the Government Banking Service, which delivers financial services to government organizations, helping to simplify payments and repayments across government.
Customer Insight Platform, my area, sits alongside all of those different products and services in HMRC, looking at how customers, taxpayers, traders, and companies interact with HMRC's digital services: the information they give us, the devices they use, how they interact, when, and what information they share with us. We gather all of that data together and use that data in a range of different decision-making across the organization.
Our old approach: what did we do? HMRC is a fairly traditional organization. We've been around for a while. Our analysts in the organization tended to work like lots of analysts do. They like to gather data together, collate it, match it, make some decisions, model it, and then do whatever their specific role or task was. This is a really manual process. It's grown up over many, many years, and they like to maintain that process because it gave them control and confidence in what they were doing. But it had some issues with it, and the biggest issue is that it's not scalable. It's not sustainable, particularly when the volume of transactions dials up. As business gets busier and you've got the same amount of analysts available, it becomes really challenging for them to process all of that information manually for the volume of work that's coming through the door.
From a Customer Insight Platform perspective, we saw ourselves really as fulfilling one of those roles: collecting data in persistent event streams and storing that data in large data storage. We were quite distant from people making policy decisions and making decisions about services. We tended to be a bit of a thorn in people's sides because we'd say, "You've got to make sure you log information. You've got to provide us data so that we've got that available for different use cases." What we didn't do was share our views on that data.
Our structure was really quite rigid. It was established around fairly traditional models of a project asking us to deliver a particular view on data and a capability. We would then build that thing and make that available. So it favored predictability and predictable structures over flexibility, but all on that baseline view of collecting as much data as we can about these things so that we've got a bit of flexibility in how we can respond.
Then the pandemic happened, and we all know about that. That was a really interesting challenge for everyone. All of a sudden, as organizations, we were told, "You can't come into the office anymore. You've got to stay at home. You've got to protect yourselves and other people. If you can work from home, you should. But if you can't work from home, then really you shouldn't for a while." We saw large sectors of business being asked to stay at home and not to trade. Obviously, that had a dramatic impact on individuals and businesses.
The ask from HMRC was, "Hey, HMRC, you've got really good flexible platforms. Can you help with the pandemic? Can you build services to get financial help to our citizens and businesses in the UK as fast as possible?" That was really critical because that was about making sure that people had money to feed their families at a time where we didn't want them to go to work and earn money to feed their families, because that was a greater risk. So it was a really difficult time.
We knew that if we were going to do this, we were going to be offering up substantial amounts of money that were going to be available in support packages. We knew, because there's big money involved, that that would be attractive to organized crime. So we knew we had to build some control measures in place. At the same time, those changing work patterns and changing conditions actually meant, as an organization, we had our own challenges, just the same as every other business in the UK.
Our working pattern moved from predominantly face-to-face working, particularly in the digital space where we were used to standing around whiteboards and talking to each other. All of a sudden, we were being asked to work from home and work online. So there were real challenges around how we started to do those things, not least really simple things like organizational VPN, which was structured so that we could support typically about a third of our employees working remotely. All of a sudden, we wanted 100% of our employees working remotely. Equally, working practices like how we collaborate on whiteboards and things like that all of a sudden changed dramatically.
What did we do? First of all, we took a step back. We reassessed our role and our offerings and thought about what our analysts actually needed. We knew that they were going to have a lot of data coming in, so we knew they were unlikely to be able to do a lot of the analysis they needed to do manually. We knew they'd need to have the data available to them readily and quickly. We also knew that if we were going to be successful, we'd need to overlay our view on that data, our insights.
We knew that we had to prepare for uncertainty. We'd designed already for flexibility, and we had some flexibility in our services. They were adaptable, scalable, and reusable. But we needed to change those because the one certainty we had with the pandemic was nobody knew what was coming next. So when we looked at what services we were going to build and how we were going to build them, we deliberately built them to be very modular, very portable, so they could be reused across multiple services and have repeated value, so we wouldn't have to keep rebuilding for a new thing.
We knew that we needed an infrastructure that could support those changes, which meant changing the size of our infrastructure, making it much bigger, scaling it to support the potential volume of customers who we were going to need to support with those services, and putting in place a much bigger team, changing the size of the team from a relatively small team to more than double its size in really short time periods. So we knew we would have to do some of these things.
One of the most important things to recognize was that our solution wasn't new. When we were looking at data models, we'd been looking at data models, insights, and how we develop them for the business for a number of years, alongside what we were doing in delivering the specific needs of the users we had within the organization. We knew there was mileage in insights. We knew that in the future, decision-making based on the analysis of data was going to be valuable. So we'd already been testing and investing time and effort into building up those types of capabilities.
All of a sudden, what the pandemic provided us with was a catalyst to really industrialize them and provide them at scale and speed to our internal analysts, because it was the solution they now needed: that ability to see automated decisions and collation of data into meaningful insight, into meaningful information that they could then use either in the filtering of transactions, so they could say, "This is good, bad," or, "We need to review it," or when they're performing that more detailed review of the things that did need review, so they could really dive into that detail.
We also understood that if we were going to be successful, we had to focus on well-understood problems. We chose to focus on looking at specific attributes and specific objects that were used by the business. We chose to do things that were really transferable, again, because we knew they needed to be reused. So we looked at things like addresses and bank accounts, where we had good sources of information. We could readily check that information for validity. We had good sources of intelligence, both from within HMRC and from wider communities and third parties, around how specific accounts or addresses had been used or are being used within the taxpayer community or the wider outside world.
What that allowed us to do was focus on those familiar concepts: we check a thing, we check it against our own intelligence and third-party intelligence, we decide if it's good, bad, or indifferent, and then we decide what the next step to take is. Those are really familiar concepts to the business, which meant that when we introduced them, the business had a really good view of how they could use them, and they already understood the concepts behind them.
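Editor's illustration (not part of the talk): the check-and-decide pattern described above can be sketched roughly as follows. Everything here is an assumption made for illustration, including the names `check_attribute`, `CheckResult`, and `NEXT_STEP`, the flag sets, and the decision rule itself; the talk does not describe HMRC's actual rules or code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set, Tuple


class Outcome(Enum):
    GOOD = "good"
    BAD = "bad"
    REVIEW = "review"


# Hypothetical next steps; the talk does not specify the real actions taken.
NEXT_STEP = {
    Outcome.GOOD: "continue through accelerated processing",
    Outcome.REVIEW: "refer to an analyst",
    Outcome.BAD: "hold for investigation",
}


@dataclass(frozen=True)
class CheckResult:
    value: str
    outcome: Outcome
    # Reasons are kept so an analyst can see how the decision was reached.
    reasons: Tuple[str, ...]

    @property
    def next_step(self) -> str:
        return NEXT_STEP[self.outcome]


def check_attribute(
    value: str,
    is_valid: Callable[[str], bool],
    internal_flags: Set[str],
    third_party_flags: Set[str],
) -> CheckResult:
    """Check one attribute (for example a bank account) and decide an outcome."""
    reasons = []
    if not is_valid(value):
        reasons.append("failed validity check")
    if value in internal_flags:
        reasons.append("flagged by internal intelligence")
    if value in third_party_flags:
        reasons.append("flagged by third-party intelligence")

    # Assumed rule: both intelligence sources agreeing means "bad",
    # any single concern means "review", and no concerns means "good".
    if value in internal_flags and value in third_party_flags:
        outcome = Outcome.BAD
    elif reasons:
        outcome = Outcome.REVIEW
    else:
        outcome = Outcome.GOOD
    return CheckResult(value, outcome, tuple(reasons))


result = check_attribute("12345678", str.isdigit, {"12345678"}, set())
print(result.outcome.value, "->", result.next_step)
```

Because the function takes the attribute value, a validity check, and the intelligence sources as parameters, the same sketch could be reused for addresses or bank accounts, which mirrors the reuse the speakers describe.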
Our solution: what did we actually do? One of the biggest developments we did was delivery of our insight service. Prior to the pandemic, it was really a bit unthinkable to say we would provide insights, we would provide that collated view, and actually we might make some decisions and triage activity up front. Because of the scope, scale, and volume of the pandemic, we started looking at whether we could assess claims upfront as taxpayers were making them at that point in time.
For customers that were low risk, it meant that we could push those through and they would go through an accelerated process. For ones that looked particularly risky, we could pay more attention to them. That prioritization effort became really quite important to us. New third-party data became available to us. That helped us to have a really refined view of those risks and helped us inform our judgments and detect, with more fidelity, good or bad things.
Ultimately, it was about using those ideas of human-based decision-making, but actually moving them into a world where we could make those decisions in a more automated way, whilst letting our people within the analyst community maintain some of that control over that, so they could have a view. What I'm going to do now is hand over to Kate, who's going to talk a little bit more about our solution and how we approached it.
Caitlin Smith
Thanks, Andy. I'm going to tell you a little bit about how we approached the design element.
From a data capture and data pipeline perspective, we already had a very robust pipeline. For all the digital services on the digital platform, we collect those events, process them, and surface them in various tools or various means to analysts across the organization. So we looked at building on that design. From each new COVID scheme that came on, we reused that design pattern, which made it very easy to be able to collect that data, do some testing, some processing, and some surfacing. That was fairly straightforward.
Where the more technical challenges came was around the modeling and the insights, and also the integration with the third-party data sources. From the data pipeline perspective, the team members who already knew that pipeline were already there to help. But we needed to scale when we looked at building out the insights and those additional attribute third-party services.
To scale in the pandemic, we were already helping people to move from co-located to remote, so there were those challenges, but then we also needed to onboard new members. We looked at reaching out to people who had already worked on the digital platform and had that experience, and we found a lot of people really wanted to come back and help. They'd had such a great experience at HMRC, so were actually really delighted to come back at that time. We helped that growth by reaching out to people.
The design patterns that we built, not only for the data pipeline perspective but also for the insights, had reusability in mind. As you can see, it was eight weeks to do the initial build. But then later that design could be applied to different schemes and done within approximately two weeks.
We recognized that building up the right team was crucial to being a success in this situation. For us, it was about recognizing that people had a huge amount of pressure at home, but also were aware of the importance of building out this service. So we set out that it was important that we looked after people to make sure that they weren't doing overtime. We discouraged that. Overtime was only done, I think, about once or twice, so it was an exception rather than the norm.
We made sure that we took care of people. We brought in a burnout specialist to help support the team. We had social events. We knew that some people were at home on their own during the pandemic, and their work colleagues were a lifeline. We also knew that people had families and were trying to do homeschooling. So take a moment; if a child comes in, we say hello. It was very much that we needed to look after the team. It needed to be sustainable because this wasn't something that was going to be a couple of weeks and then it was over. We knew it was going to go on longer than that, so it was really important that we took care of the team.
The scale of what we achieved: I've talked about our data pipeline, and that's been built for all the digital services. So as you can imagine, that can cope with the SA peak, the self-assessment peak that comes in once a year for HMRC, and other peaks throughout the year. So it was already robust and scalable. But you can see here some of the scale that we experienced: 3,960 claims per minute. The system had to cope with that, and we had to have mechanisms to make sure that the traffic was going through, that data was being processed, and that we were not losing any data as we hit those peaks.
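Editor's illustration (not part of the talk): to put the 3,960 claims per minute figure in perspective, the short calculation below converts it to a per-second rate and estimates how many assessments would be in flight at once. The 0.2 second assessment latency is a made-up assumption, and the function names are hypothetical; only the 3,960 figure comes from the talk.

```python
import math


def claims_per_second(claims_per_minute: float) -> float:
    """Convert a per-minute arrival rate to a per-second rate."""
    return claims_per_minute / 60


def concurrent_assessments(claims_per_minute: float, latency_seconds: float) -> int:
    """Estimate in-flight assessments using Little's law (rate x latency), rounded up."""
    return math.ceil(claims_per_second(claims_per_minute) * latency_seconds)


PEAK_CLAIMS_PER_MINUTE = 3960   # figure quoted in the talk
ASSUMED_LATENCY_SECONDS = 0.2   # assumption for illustration only

print(claims_per_second(PEAK_CLAIMS_PER_MINUTE))
print(concurrent_assessments(PEAK_CLAIMS_PER_MINUTE, ASSUMED_LATENCY_SECONDS))
```

At the quoted peak this works out to 66 claims per second, so under the assumed latency roughly 14 assessments would need to run concurrently to keep up without a backlog.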
For the first time, 100% of the claims were being assessed upfront, then passed down to the analyst to be processed further downstream. That was a first for the organization. And 22% of the claims that were submitted for statutory sick pay were flagged up front as well, helping with that prioritization and helping to avoid that manual process.
The scope and reach: we had 10 times our normal usage. The users that were already using the service were using it even more than they normally would. Because that data was being collected, it was easy for them. They knew the tools. They were able to access the data. For example, we had 2.3 million searches, and we also had 150% more users added onto the service.
You can imagine, in normal circumstances, onboarding users needs time and focus. We had to do that remotely with new people who were being onboarded into HMRC or moving around in the organization. So that was done in a collaborative way, working with subject matter experts across the organization and supporting them through that.
Where are we today? Where has all of this taken us? Now it's about packaging it into data products. Remember how we said pre-pandemic our analysts were doing all the hard work of connecting the data and making sense of it? We are now doing some of that hard work for our customers and our analysts by packaging business objects into digital services.
What does that mean? We look at data objects such as an address or a bank account, and we pull out that data away from the digital service or the journey it was going on. We look at that address in isolation, and we're able to make the connections across our data set. We provide the analyst that view or access to that data around those data objects.
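Editor's illustration (not part of the talk): the idea of looking at an address in isolation and making connections across the data set can be sketched as grouping events from different services by a normalised attribute value. The event fields, scheme names, sample records, and the functions `normalise_address` and `link_by_attribute` are all hypothetical, and real address matching would need far more than this simple normalisation.

```python
from collections import defaultdict


def normalise_address(raw: str) -> str:
    """Crude normalisation: upper-case, drop commas, collapse whitespace."""
    return " ".join(raw.upper().replace(",", " ").split())


def link_by_attribute(events, attribute, normalise=lambda value: value):
    """Index events by one attribute, independent of the service they came from."""
    index = defaultdict(list)
    for event in events:
        value = event.get(attribute)
        if value is not None:
            index[normalise(value)].append(event)
    return dict(index)


# Made-up events from two hypothetical digital services.
events = [
    {"service": "scheme-a", "customer": "C1", "address": "1 High St, Leeds"},
    {"service": "scheme-b", "customer": "C2", "address": "1 high st  leeds"},
    {"service": "scheme-a", "customer": "C3", "address": "9 Mill Lane, York"},
]

linked = link_by_attribute(events, "address", normalise_address)

# Addresses that appear in more than one event, with the customers involved.
shared = {
    address: {event["customer"] for event in matches}
    for address, matches in linked.items()
    if len(matches) > 1
}
print(shared)
```

The same `link_by_attribute` call could be pointed at a bank account field instead, which is the sense in which a data object becomes a reusable product rather than part of one service's journey.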
Since the pandemic, we have seen more and more people across the organization, or services, requesting access to the data. We also know that we're having requests coming in from other government departments as well. For us, it's about how we go on that journey to enable access to the data in a safe and secure way.
What's next for us? As we've said, for now we're excited to be starting to explore the data mesh architecture framework. This is to take us to the next level of sustainable data production, and so that's about scaling our data products, giving us the flexibility to produce data products at scale when we need them or stop them when we don't need them.
The data mesh takes the threads of what we've learned in the last few years. For example, we now see data as our commodity, where previously the tools or the pipeline were our commodity or our product. But we now know data is the product. Decentralizing the ownership and the governance enables the organization to get wider value faster and improves that data accessibility. We hope to come back next year and tell you how it went on this journey. We hope it's going to be successful, but we'll tell you about those learnings. That's where we're going next.
For me, if there is one thing that I've valued or learned through this journey, as I said, it's about building those teams. Without the people and the teams, we wouldn't have been able to build these services across HMRC. That's what I really value.
Andrew Letherby
Thanks, Kate. In summary, what have we learned? We thought really deeply about what we needed to do and what our role was within the pandemic. Our role was really clear. It was about helping HMRC to deliver critical services, but at the same time ensuring that, because they were likely to be attractive to criminals, we could see fraud happening, prevent it, detect it, and if it did happen, have the information available to pursue people who took money they weren't entitled to.
We recognized that by looking at data as a commodity rather than just as a flat source of information that people wanted to access, we could produce better value for our customers, our analysts. By understanding how we could deliver that, it started to get us into this place of a mental model where we started to look at data objects and items, attributes, as commodities in their own right, and then overlaying that idea of insights. What does the data tell us? So that our analysts didn't need to do the heavy lifting. Those analytical models were pre-built and pre-served to them as outcomes with the associated information so they could see how we'd reached decisions.
It was a fairly unique time, there's no doubt about it. For HMRC, that meant that the whole of HMRC had a single focus for a period of time, and that single focus was around delivery of these critical services. That meant we had a unique line of communication direct from delivery teams at the front end, all the way through to policymakers in central government who were making those decisions and pulling the levers, setting the levels of risk appetite very clearly within the services so that we could balance the delivery of services versus the management of risk.
Those really clear lines of communication meant that we could have those simple decision-making and detailed decision-making conversations directly. That's something that we're trying to maintain post-pandemic within HMRC, really fostering the idea that the people who have the policy goals should be directly involved in the delivery alongside the delivery teams so that we can foster that community structure.
Our ability to change and scale was absolutely underpinned by our infrastructure. You heard from Ben last year about the amazing things that our managed digital tax platform has been able to achieve through having flexible cloud-based microservice software-defined infrastructure. But there were also some less obvious things. For example, we had decided we needed bring your own device so developers could develop alongside HMRC secure systems without having to have full access. At the time, that wasn't a decision related to disaster recovery, but it turned out to be incredibly valuable when the worst happened, because it meant that our developers could access development services without putting strain on the rest of HMRC, so we could adapt. That adaptability became essential in being able to respond to the disruption that the pandemic caused.
That disruption wasn't just changes in work practices. It was new threats, different technologies that we needed to adopt rapidly, and ways in which we could work with them.
This is titled "I'm looking for help." What we're trying to get across here is that at HMRC, we're trying to be collaborative about how we do development. We're really interested in knowledge exchange with others working with similar data challenges. We're interested in understanding how other organizations approach some of these problems of how you convert data science and data management into innovative solutions and how you integrate those in large organizations. Linked to that are models for how to fund innovation that help to decentralize that and then empower agile delivery to innovate across live services and into new services.
The bottom line is we'd really love to talk. I'd like to thank you for listening, and I hope you found this informative and useful. I know Kate would like to thank you, too. Kate?
Caitlin Smith
Thanks, everyone. Thanks for listening to our story, and we do hope to come back next year and tell you about how we got on with the data mesh.
Andrew Letherby
With that, I'll say thank you very much for your attention. Hope you found it useful, and goodbye.