Saving the Economy From Ruin (with a hyperscale PaaS)

Europe 2021

Ben Conrad

Head of Agile Delivery · HMRC

Matt Hyatt

Technical Delivery Manager · Equal Experts

Government IT projects are infamous for being lengthy, costly and delivering unstable services that crash under load. This is an experience report of how all these norms were reversed, resulting in the rare event of positive press coverage for a normally heavily-criticised public sector department.

In March 2020 the United Kingdom went into lockdown, causing the most brutal recession in living memory. The Government had to react quickly, with policies to help citizens and businesses cope. These included:

The Coronavirus Job Retention Scheme

The Self Employment Income Support Scheme

The Eat Out to Help Out Scheme.

Once these initiatives were announced, the UK’s tax department (HMRC) needed to encode these policies as digital services capable of dealing with huge spikes in traffic. And those services had to be designed, implemented and delivered in a matter of weeks.

This session shares the story of how such a rapid and effective IT response was achieved. It will include the foundations that made it possible, and the architecture, delivery principles, working practices and organisational structures that enabled hugely successful outcomes.

At its centre is HMRC’s Multi-channel Digital Tax Platform (MDTP), which began development in 2014 and employs Continuous Delivery at scale. It is a cloud platform for over 130 user-facing applications, powered by over 1000 microservices that have been built as part of HMRC’s ‘making tax digital’ strategy. It provides an easy way for teams to build and deploy applications that can scale to handle millions of requests.

The platform and services needed to deliver financial support to more than 12 million employed and self-employed workers via the three schemes outlined above. In just a matter of weeks, the team was ready and the platform withstood 67,000 job claims within half an hour of the Job Retention Scheme going live. It also handled 440,000 applications for government grants via the Self Employment Income Support Scheme on the first day of its operation.

This is a powerful story of how the right combination of culture, technology and focus can empower a large organisation to pivot fast and efficiently, resulting in the rapid delivery of digital services that are user-centric, maintainable, performant and resilient.

Full transcript

The complete talk, organized by section.

Host Intro (Gene Kim)

[00:00:12.970] Thank you Fernando, Vikalp, and Andrea.

[00:00:16.760] So without a doubt, the COVID-19 global pandemic was one of the most disastrous health crises in a century, and also one of the worst economic crises because suddenly hundreds of millions of people were either unable to work or their jobs had disappeared.

[00:00:32.480] However, in many countries, the worst effects of these crises were ameliorated due to the massive government programs to ensure that their citizens, often the most vulnerable, had sufficient funds to feed their families, as well as stimulate the broader economy.

[00:00:48.160] So one of my favorite presentations from DevOps Enterprise was in 2016 from the UK HMRC, Her Majesty's Revenue and Customs, their tax collection agency for the UK government. They described how they made it easier than ever for citizens to do their personal tax assessment, enabling, say, the single parent to file their taxes with a click of the button on their bus ride home.

[00:01:11.320] And they did this despite being embedded in a massively and famously complex IT estate.

[00:01:17.440] So up next is another amazing story from HMRC.

[00:01:22.020] This is a story about how last year they were able to distribute hundreds of billions of pounds to UK citizens and businesses, an unprecedented financial support package that would eventually see around 25% of the entire UK workforce being supported by public money. And they heroically built this technology to do this in four weeks under conditions of incredible pressure and uncertainty.

[00:01:46.580] This story is told by Ben Conrad, who is their head of agile delivery, responsible for the multi-channel digital tax platform on which the success of this entire program hinged.

[00:01:59.120] And he will be presenting with Matt Hyatt, technical delivery manager at Equal Experts. They will describe the incredible challenges that they had to overcome and how they achieved their amazing outcomes. Here's Ben and Matt.

Ben Conrad

[00:02:15.780] Hello, I'm Ben Conrad. I joined the civil service four years ago in order to come and work on the digital platform that we're going to be talking about. I've had a few job titles in that time, and currently the line at the bottom of my emails reads Head of Agile Delivery.

Matt Hyatt

[00:02:31.380] Hi folks, I'm Matt Hyatt and I'm an agile delivery consultant with Equal Experts. I've spent the last two years working together with Ben and a team of about 70 people who build and maintain HMRC's primary digital platform.

[00:02:45.280] Today we're going to talk about that platform, why we think it's successful, and then share some stories about delivering services to save the economy in the midst of the pandemic.

Ben Conrad

[00:02:56.480] Let's start with some context, although I imagine many of you are already familiar with HMRC.

[00:03:04.260] We are the UK's tax collector. We collect taxes from individuals and businesses in the United Kingdom, and as you may have heard, there's a slightly expanded role in the area of customs due to some small rule changes recently.

[00:03:18.040] The civil service is apolitical, and we are largely an operational department who have relatively little policy work.

[00:03:26.400] HM Treasury, headed by the Chancellor of the Exchequer, sets the economic policy for the UK government.

[00:03:32.580] And what HMRC does is make the policies of the government a reality, which can be quite challenging in itself.

[00:03:39.640] There are around 60,000 people working in the department as a whole, with about 2,000 working within HMRC Digital.

[00:03:47.800] And we are the big player in the UK government when it comes to digital.

[00:03:51.920] HMRC is responsible for around 70% of all government transactions with the public that happen over the internet.

[00:03:59.500] And it's because of that that when COVID-19 hit, HMRC played a key role in delivering the UK government's financial response to the economic crisis. On the 23rd of March last year, the Prime Minister made an announcement in which he gave the British people a very simple instruction.

[00:04:17.260] You must stay at home.

[00:04:19.740] In the first lockdown, the British people were only allowed to leave their homes to go shopping for necessities, take exercise, or if absolutely necessary, to travel to work, or check their ability to drive by visiting Barnard Castle, obviously.

[00:04:34.540] Everyone working on building digital services was, of course, able to work from home. Indeed, we'd been doing so for some time prior to this instruction. We have the technology and there was no negative impact to our productivity. We were the lucky ones.

[00:04:48.640] The lockdowns had a whopping great impact on the UK economy.

[00:04:54.280] Every country had its own experience of the pandemic.

[00:04:57.060] For the UK, like many European countries, it was extreme, and that is without the health crisis itself.

[00:05:04.680] We experienced the most severe economic contraction for over 300 years. It made the global financial crisis look like a blip. This chart shows the impact of the first national lockdown, but there have been three so far, during which we were ordered to stay at home.

[00:05:20.640] Thousands of businesses paused or ceased trading and millions of citizens lost their income.

[00:05:27.660] Industries were completely decimated as people simply couldn't work. Hospitality, travel, the arts were all closed.

[00:05:37.400] The government responded by announcing an unprecedented financial support package that would eventually see around 25% of the entire UK workforce being supported directly by public money.

[00:05:51.060] This support had to be accessed somehow.

[00:05:54.880] The Chancellor announced in a live television broadcast that HMRC would provide access by building four new digital services and doing so fast. From the time of the broadcast, which is more or less when our teams found out, they were given 20 working days to deliver the first service.

[00:06:12.536] The spike on this chart is from the announcement of the Self-Employed Income Support Scheme. Before we'd built any new services at all, people were logging into their tax accounts to calculate for themselves what they might be eligible for.

[00:06:25.496] A normal new digital service might take nine to 12 months to deliver from inception through a discovery phase, building an MVP, an alpha, private beta, before being launched as a public beta. But for the COVID services, we just didn't have this luxury.

[00:06:43.556] But the challenges went way beyond ludicrous timescales.

[00:06:47.996] We knew we'd have millions of users, but nobody could actually tell us how many.

[00:06:52.936] Whatever we built had to be accessible to everyone and had to be capable of paying out billions into bank accounts within hours of launch, and it needed to be secure, with checks being conducted before money was paid out.

[00:07:06.576] So, we had four new services. I'll go through the acronyms there.

[00:07:09.596] There's the Job Retention Scheme, which introduced us to the word furlough.

[00:07:14.016] There is the Self-Employed Income Support Scheme, Statutory Sick Pay, and the one we've slightly mixed feelings about, Eat Out to Help Out.

[00:07:22.396] For that last one, the government probably saved thousands of jobs in the hospitality sector, but it did so by subsidizing meals and encouraging people to sit inside restaurants.

[00:07:35.016] So, how did we do?

[00:07:37.256] Thankfully, we nailed it.

[00:07:40.736] We went from being the least popular of all government departments to the people you can rely on to help out.

[00:07:46.836] All the services launched on time, most a week or two ahead of expectations without any issues, and we achieved a 94% user satisfaction rating. The Job Retention Scheme paid out over a billion in claims on its first day at an average rate of three claims per second.

[00:08:03.846] The current value of claims across those four schemes is around £80 billion.

[00:08:09.226] This created some publicity, although perhaps not as much publicity as it would have done if they'd all crashed on launch.

[00:08:15.976] So why was there this excitement?

[00:08:18.536] IT projects in the public sector have a reputation for being terrible.

[00:08:24.416] At the same time, back in the first lockdown, many established national brands in the private sector were struggling to cope with their increased traffic.

[00:08:33.536] Major websites crashed or had to implement some really nasty queuing solutions, just to book a delivery from a supermarket.

[00:08:41.297] But our services held up, and the industry wondered how we'd done it.

[00:08:45.736] The answer is that we leveraged a mature digital platform, one that has evolved over the last seven years and which allows HMRC to rapidly build digital services and then deliver them to the public at hyperscale.

[00:09:00.716] But what exactly is a platform and why is it useful? So, our platform is the Multi-Channel Digital Tax Platform or MDTP.

[00:09:11.536] It's a collection of infrastructure technologies that enables HMRC to serve content to users over the internet.

[00:09:18.256] It's useful because business domains within HMRC can expose tax services to the public by funding a small cross-functional team to build a microservice or a set of microservices on our platform.

[00:09:29.926] The microservice architecture is another talk, but it really does enable a great deal of what we offer.

[00:09:36.196] But that's not so different from any hosting service.

[00:09:40.536] What makes our platform so useful is that it removes much of the pain and complexity of getting a digital service in front of a user.

[00:09:48.356] We achieve that by building, customizing, configuring a suite of common components that are necessary to develop and run high-quality digital products, and we offer these to our tenants for free.

[00:10:01.996] We always struggle to find an image to use on slides that represents MDTP. Indeed, one year, we ran a competition across HMRC Digital for someone to draw MDTP.

[00:10:15.376] This was the winning entry.

[00:10:18.576] As you can see, it demonstrates a certain degree of hand-eye coordination.

[00:10:23.396] It won by dint of being the only entry, and it doesn't exactly convey much about the platform itself.

[00:10:30.556] So, let's go back to the logos.

[00:10:33.645] I'm not going to go through all of these, but you may notice that most of our tooling is open source, which is not the norm in a traditional government department, where there is sometimes a comfort taken from an expensive licensing arrangement.

[00:10:48.216] MDTP and the people that have worked on it have successfully transitioned a large-scale public organization into open source on the public cloud, and even to coding in the open.

[00:11:00.356] Now, here's Matt.

Matt Hyatt

[00:11:03.356] Thanks, Ben. So, an important part of this talk is scale.

[00:11:07.716] So, I guess you're wondering, how big are we?

[00:11:10.376] And the answer is pretty big. So, we're probably a level down from a planet-scale operation like an Amazon or Facebook, but we're bigger than many tech organizations.

[00:11:21.136] And we're the largest digital platform in UK government, and due to the sheer number of services that we host, we're probably one of the largest platforms in the UK as a whole. We host about 1,200 microservices built by more than 2,000 people, and they're split into 70 teams across eight geographic locations.

[00:11:43.256] Now, those teams make about 100 deployments or changes every day in our production environment, and many, many more than that in our lower environments, like staging and QA.

[00:11:54.196] All of the teams use agile methods with deliberately lightweight governance, and they're trusted to make changes themselves whenever and however they see fit.

[00:12:04.656] It only takes a few seconds to push changes through our infrastructure, so getting products and services in front of users happens really fast. But the platform hasn't always been this big or busy. Development began with a single team of engineers nearly a decade ago. So a key part of that story is the Government Digital Service, but again, that's another talk in its own right.

[00:12:28.724] Pivotal to the success of the platform, aside from GDS, has been a constant focus on a few really important things.

[00:12:36.104] So culture, tooling, and practices, and the goal of making it easy to add teams, build services, and deliver value quickly.

[00:12:47.084] Our teams couldn't have done that without understanding what's involved in getting a digital service in front of a user, and doing so rapidly, reliably, and repeatedly.

[00:12:57.924] We've evolved the platform around these goals so that a cross-functional team can really quickly come together and make use of our common tooling to design, develop, and operate their public-facing service.

[00:13:10.044] I guess a few of you might be wondering what this actually means in practice.

[00:13:13.264] Well, quite simply, a bunch of developers, user researchers, content designers, and product people can form what we call a service team, and then everything you see on the right-hand side of the screen, that service team will get for free when they use our platform.

[00:13:28.584] So we provide somewhere for the code to live, like GitHub.

[00:13:32.344] We provide automated pipelines for the code to get built and deployed into production environments where the teams can get rapid user feedback.

[00:13:41.264] We then supply telemetry tooling to enable a team to monitor its services with automated dashboards and alerting mechanisms so they always know what's going on.

[00:13:51.024] And finally, we provide collaboration tools to help all the teams communicate both internally and between each other, so that they can work effectively both remotely and in person.

[00:14:02.504] A key part is that all of this is available more or less instantly with minimal configuration or manual steps required.

[00:14:10.704] The result, we hope, is that the engineers can focus solely on solving the business problems rather than anything else.

[00:14:19.784] Now, one of the key principles that we think enables us to work in this way and at this scale is the concept of an opinionated platform.

[00:14:28.224] You might have heard this being referred to as a paved road or the golden path, or quite simply as guardrails.

[00:14:35.664] And the key point is, with 2,000 actors making changes, potentially several times a day on our platform, things could get very messy quite quickly.

[00:14:45.374] And our answer to that is to bake some governance into the platform itself.

[00:14:51.024] So the basic rules are, if you build a microservice, it must be written in Scala, and it must use the Play Framework.

[00:14:58.164] If your service needs persistence, it must use Mongo.

[00:15:01.764] And if your user needs to perform a common action like uploading a file, you must use a common platform service to do that when there is one.

[00:15:10.124] The benefit of this is most obviously that if you stick to the rails, you can go really, really fast when delivering your services. But there are further benefits too.

[00:15:20.844] So by limiting the technology used on the platform, the platform is simpler to support, and we can provide common services, reusable components that we know will work with all the services. It also allows people to move between services and indeed the services to move to new teams without worrying about whether our people have the required skills to do the job. They should all at least know Scala.

[00:15:47.064] And the opinions are designed to prevent waste.

[00:15:50.084] So by mandating common components, the idea is that we prevent all service teams having to spend time rolling their own solutions to problems that we've already solved.

[00:16:00.324] Now, obviously, not every team follows the rules all the time, but in general, we find that most teams see the benefit of doing so.
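As a rough illustration of what baking governance into a platform could look like, here is a minimal sketch of a guardrail check. The descriptor fields, the `check_service` function and the set of common services are hypothetical and are not MDTP's actual tooling; only the rules themselves (Scala, Play, Mongo, shared components) come from the talk.

```python
# Hypothetical guardrail check; field names and structure are assumptions.
ALLOWED_LANGUAGE = "scala"
ALLOWED_FRAMEWORK = "play"
ALLOWED_PERSISTENCE = {None, "mongo"}  # None means the service stores nothing
COMMON_SERVICES = {"file-upload"}      # actions the platform already solves


def check_service(descriptor: dict) -> list:
    """Return a list of rule violations for a service descriptor."""
    violations = []
    if descriptor.get("language") != ALLOWED_LANGUAGE:
        violations.append("services must be written in Scala")
    if descriptor.get("framework") != ALLOWED_FRAMEWORK:
        violations.append("services must use the Play Framework")
    if descriptor.get("persistence") not in ALLOWED_PERSISTENCE:
        violations.append("persistence must use Mongo")
    # A team rolling its own version of a solved problem is flagged.
    for action in descriptor.get("custom_implementations", []):
        if action in COMMON_SERVICES:
            violations.append(f"use the common platform service for {action}")
    return violations


on_the_rails = {"language": "scala", "framework": "play", "persistence": "mongo"}
off_the_rails = {
    "language": "python",
    "framework": "flask",
    "persistence": "postgres",
    "custom_implementations": ["file-upload"],
}

print(check_service(on_the_rails))
print(check_service(off_the_rails))
```

A check like this could run in a build pipeline so that teams find out early when they have left the paved road, rather than at review time.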

[00:16:08.064] Now, crucially, the need to care about any infrastructure is abstracted away from service teams, so they can focus solely on their apps. They can still observe the infrastructure through tools like Kibana and Grafana, but none of the service teams have access to AWS accounts themselves.

[00:16:26.884] So you're probably wondering about these opinions. They're a bit out there, right?

[00:16:30.444] There's Scala, Play, Mongo. They're hardly ubiquitous elsewhere in the industry.

[00:16:35.524] Our opinions can and do change according to user needs and demand for more features. So when teams start demonstrating a justifiable need for something new, we'll work to provide it in a repeatable way, enabling self-service.

[00:16:50.184] Now, the self-service part is critical.

[00:16:52.524] A service can be created, developed, and deployed on our platform without any direct involvement from platform teams at all.

[00:17:02.184] Ben, over to you.

Ben Conrad

[00:17:04.204] Thank you.

[00:17:06.124] We've talked a bit about platform and why it's good, but delivering during the pandemic wasn't all plain sailing.

[00:17:12.904] The UK economy desperately needed this to work, and we desperately wanted to avoid any users seeing messages like the one at the top of this slide. The first part of our problem was precedent. To have lived through the last global pandemic, you'd have to be 102 years old. And although tax was definitely a thing in 1918, it was very much a paper-based system.

[00:17:35.184] For our annual key business events, we tend to have years of data so we can forecast to hourly granularity how many people we can expect to use any given service at any one time.

[00:17:46.884] But for these new services, we lacked data.

[00:17:49.084] We lacked traffic profiles, and we lacked models.

[00:17:52.064] All we had were ballpark estimates of the eligible population and a hunch that offering people thousands of pounds when they had no other income would be popular.

[00:18:02.204] The estimates produced big, scary numbers.

[00:18:06.024] We could performance test our own services and scale them, but where we had dependencies with third parties who were not able to scale, we had to do a lot of work to make those API calls asynchronous wherever possible. We also quite deliberately prioritized getting money to people who needed it over preventing fraud.

[00:18:24.784] Now, that doesn't mean that we'd make it easy for fraudsters, but it does mean that we were racing to complete the development of our new counter-fraud measures right up until launch.

[00:18:37.144] So how do we tackle the problem of scale? There's a simple answer.

[00:18:41.864] By making everything really big.

[00:18:44.704] The huge advantage of having a platform composed of immutable infrastructure defined as code is that even the parts of the platform that need to be manually scaled can easily be.

[00:18:57.924] COVID-19 teams were working hand in glove with our platform teams.

[00:19:01.724] Between us, we created worst case traffic profiles, which were based on overall eligibility for the schemes, combined with observed behavior, like traffic spikes from previous business events and even spikes we'd seen after recent TV announcements. Although we tried to ensure everything is self-service, the platform needed to be responsive to these new requirements.

[00:19:23.584] To take a few of those, the load testing required was at a level that a single instance of Gatling was not able to provide, so our build and deploy team added features to enable load testing in parallel.

[00:19:37.444] This increased load then broke our logging pipeline. So telemetry then needed to be scaled up in both staging and production to handle the increase in logs being generated.
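To make the load-testing arithmetic concrete, here is a small sketch of turning a worst-case traffic assumption into a target request rate and then splitting that rate across parallel load generators. The function names and every input number are illustrative assumptions; they are not HMRC's figures, and nothing here uses Gatling's API.

```python
import math


def worst_case_rps(eligible_users, share_in_window, window_seconds):
    """Average arrival rate if a share of eligible users arrives in one window."""
    return eligible_users * share_in_window / window_seconds


def split_load(target_rps, max_rps_per_injector):
    """Return (injector_count, rps_per_injector) covering the target rate."""
    injectors = math.ceil(target_rps / max_rps_per_injector)
    return injectors, target_rps / injectors


# Illustrative inputs only: 1,000,000 eligible users, 10% arriving in one hour,
# and load generators that can each sustain 10 requests per second.
target = worst_case_rps(1_000_000, 0.10, 3600)
injectors, per_injector = split_load(target, max_rps_per_injector=10)
print(round(target, 1), injectors, round(per_injector, 2))
```

The point of the split is that no single load generator has to produce the whole profile; each instance runs the same scenario at a fraction of the target rate.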

[00:19:52.084] MDTP itself is relatively new and shiny, certainly in UK government terms. It was chosen to deliver the COVID-19 services because it offered the best chance of meeting the ambitious development deadlines and then scaling to support the tsunami of expected traffic.

[00:20:09.864] However, as much as we believe in the power of hyperscale cloud providers, wider HMRC is much more conservative about where and how it holds citizen data, and that means that information about people and their taxes doesn't live for long on MDTP.

[00:20:27.123] HMRC does hold a great deal of confidential information about every company and taxpayer in the UK. We protect data on MDTP by ensuring that data can only be accessed by authorized microservices. But most of the systems of record exist downstream of MDTP in the HMRC corporate tier.

[00:20:48.644] It's mostly stored on a mixture of mainframes and old physical hardware, which is impossible to scale to cope with the level of traffic that we expected.

[00:20:58.244] So before a line of code was written, we'd realized we needed to remove any synchronous reliance on these data stores and host everything on MDTP.

[00:21:08.224] We achieved that in collaboration with the service teams.

[00:21:10.824] The user journeys were cleverly designed so that we could gather some information ahead of launch and avoid unnecessary load at peak times from ineligible customers.

[00:21:21.064] We then migrated the core eligibility and financial claim data from the legacy data centers into temporary stores on MDTP using a combination of Amazon S3, MongoDB, and some fairly crude data transfer methods, manually copying things up to S3 from a secure laptop.

[00:21:42.744] Despite all of this, we had some nervous moments when things could have gone quite badly wrong.

[00:21:48.424] There were a lot of late nights, there were early mornings, and there were a lot of very tired developers.

[00:21:55.304] With one of the services, there was an enormous peak of traffic at midday that HMRC were entirely responsible for generating ourselves.

[00:22:04.124] MDTP was actually able to handle this traffic, but the third-party systems on which it relies were not.

[00:22:10.564] These dependencies would constrain parts of the COVID-19 services to something like 30% of what the platform itself could handle.

[00:22:18.384] We moved these to be asynchronous calls where possible, but logging in had to be part of the user journey.
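One way to picture the move from synchronous to asynchronous calls is a queue between the user-facing service and the slower dependency. The sketch below is a simplified, single-threaded simulation with assumed names and rates; it is not how the services on MDTP are implemented.

```python
from collections import deque


class BufferedDependency:
    """Accept work immediately and forward it downstream at a capped rate."""

    def __init__(self, downstream_capacity_per_tick):
        self.capacity = downstream_capacity_per_tick
        self.pending = deque()
        self.forwarded_per_tick = []

    def submit(self, claim_id):
        # The user journey ends here: the claim is acknowledged, not yet processed.
        self.pending.append(claim_id)
        return "accepted"

    def tick(self):
        # Forward at most `capacity` queued claims to the slow dependency.
        batch_size = min(self.capacity, len(self.pending))
        batch = [self.pending.popleft() for _ in range(batch_size)]
        self.forwarded_per_tick.append(len(batch))
        return batch


dependency = BufferedDependency(downstream_capacity_per_tick=30)
# A burst of 100 claims arrives at once, far above the downstream capacity.
acks = [dependency.submit(claim_id) for claim_id in range(100)]
while dependency.pending:
    dependency.tick()
print(len(acks), dependency.forwarded_per_tick)
```

Every claim is acknowledged straight away while the dependency never receives more than its capacity in one tick. The cost is that results arrive later, which is why a step the user has to wait on, such as logging in, cannot be handled this way.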

[00:22:25.824] The reason for the peak is that HMRC had decided to stagger the traffic by notifying people of a specific day on which they could claim.

[00:22:35.564] This was, in itself, entirely sensible.

[00:22:37.944] However, splitting that traffic in two by giving people a time from which they could claim, either 8:00 a.m. or 12:00 p.m., was entirely self-defeating.

[00:22:47.944] It turns out that 8:00 a.m. is too early for a lot of people to log on to an HMRC website, even if it's to claim money.

[00:22:54.184] But midday is ideal, and so hundreds of thousands of people set reminders on their phones and tried to log in all at the same time.

[00:23:03.584] Government Gateway is what we use for logging in, and we knew it could cope with around 200 logins per second before it started to creak.

[00:23:11.324] We anticipated many more than 200 logins per second, so we needed a break glass and decided on using Akamai Visitor Prioritization. This is a fairly crude manual tool that offers the ability to throttle traffic by holding users in a waiting room and allowing a certain percentage through every 30 seconds.

[00:23:33.004] The peak of our peakiest peak saw well over 1,000 login requests per second, and a swarm of around 50 engineers from across both the platform and the service teams monitored the event live and worked together to manage that traffic, trying to estimate the percentage to allow through, although we had very limited visibility of the numbers attempting to log in.

[00:23:56.564] And despite initially having close to 100,000 users in that waiting room, we were able to let them through into the service relatively quickly.

[00:24:03.824] So much so that we didn't receive a single complaint.
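The waiting-room arithmetic can be sketched in a few lines. The inputs below are the rounded figures from the talk (about 200 safe logins per second, over 1,000 login requests per second at peak, close to 100,000 users waiting, releases every 30 seconds); the functions are hypothetical and do not represent Akamai's algorithm or API.

```python
import math


def admit_percentage(arrival_rps, safe_rps):
    """Share of arriving users to let through so admissions stay at safe_rps."""
    if arrival_rps <= safe_rps:
        return 100.0
    return 100.0 * safe_rps / arrival_rps


def intervals_to_drain(waiting_users, safe_rps, interval_seconds=30):
    """Intervals needed to empty the room, assuming no new arrivals."""
    admitted_per_interval = safe_rps * interval_seconds
    return math.ceil(waiting_users / admitted_per_interval)


pct = admit_percentage(arrival_rps=1000, safe_rps=200)
intervals = intervals_to_drain(waiting_users=100_000, safe_rps=200)
print(pct, intervals, intervals * 30)
```

Under these simplified assumptions, roughly a fifth of arrivals could be admitted at the peak, and a full room would take 17 release intervals to empty. In practice the engineers had to estimate the arrival rate, which is what made choosing the percentage hard.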

[00:24:08.184] And Matt.

Matt Hyatt

[00:24:13.204] We mentioned earlier that our priority was getting money into the hands of people who really needed it. But HMRC was also acutely aware that this would be the largest giveaway of public money in living memory, and therefore an irresistible target for fraud.

[00:24:28.932] Now, whilst the service teams were flat out building new user journeys, there were other teams across HMRC that were busy beefing up or building entirely new anti-fraud measures.

[00:24:39.002] These varied from plugging into third-party systems that check for dodgy bank accounts to extending HMRC's entire internal fraud risking system, which is a massive task just on its own.

[00:24:51.652] Now, with just four days left before the public launch, the integration between the first COVID-19 service and the new fraud risking system, which is hosted on an entirely different platform, still hadn't been built.

[00:25:05.632] What's more, the service team didn't have the capacity to finish it.

[00:25:09.732] So instead, one of our platform teams picked up the baton and found a way to get the data flowing between these two new services and platforms.

[00:25:18.732] Now, frankly, this involved breaking most of the rules which we normally insist on. There were two microservices sharing a single database, and we had to hack our way through our own database authentication to get it to work.

[00:25:32.072] However, the result was that the fraud risking capability went live together with the service, which I can tell you personally seemed super unlikely at 4:30 AM on the morning of the launch.

[00:25:44.492] And how much fraud did it catch?

[00:25:46.972] Well, the truth is, we just don't know yet.

[00:25:50.232] Officially, HMRC has assumed that between 5% and 10% of all claims will either be fraudulent or incorrect.

[00:25:58.872] But what we've seen so far is only about 70,000 cases that have been marked for investigation. And proving the fraud takes time, so it will be quite a while before we understand how much fraud we actually stopped.

[00:26:12.112] Now, stories like this last-minute integration were abundant.

[00:26:16.152] Everywhere you looked, there was innovation, improvisation, and a Herculean people effort. And it didn't stop with just the initial service launch. These services were meant to be temporary, but they still continue to be developed today as the teams iterate and add the features which they didn't have time to release last year.

[00:26:36.352] And now there are even more services being built as the government continues to announce further initiatives to help the public, and they trust HMRC to be able to get them out on time.

[00:26:46.192] In fact, the number of services in our production environment has grown by 30% in just the last year.

[00:26:52.052] So in many ways, COVID-19 has forced us to redraw what we thought was possible. It's kind of become the new normal.

[00:27:00.232] But we've discovered that that's not entirely a good thing.

[00:27:04.732] So we've learnt that, number one, it's not good to integrate a country-scale fraud risking system in four days.

[00:27:13.732] Secondly, you need to be careful with unhelpful precedents.

[00:27:17.832] It's not advisable to compromise your fundamental design standards to get a product shipped. And it's almost always best to avoid doing anything manually, particularly large-scale dumps of citizen data.

[00:27:30.892] Finally, it's never good to ask an engineer to work more days in April than there are days in April.

[00:27:37.552] We have to remember that what was achieved was done under incredible pressure by admittedly willing and determined people, but they had the knowledge that they were helping their families and friends and neighbors who were quite literally just trying to survive.

[00:27:55.872] Now, it's very true that there wasn't a lot going on socially at the time, and anyone with children would have killed for an excuse to get out of homeschooling, but that doesn't make it okay.

[00:28:06.632] At points, we had to tell engineers to stop working and to go to sleep.

[00:28:11.712] It was clear that working as our teams did in 2020 just isn't sustainable. So this is a message that we need to keep reinforcing now with our leadership community and with other government departments who are understandably interested in learning how we got things done so quickly in an effort to recreate that.

[00:28:31.312] So COVID is strange for our teams in a number of ways.

[00:28:36.432] The legacy, alongside the pride and the sense of achievement at doing an incredible job, is a kind of collective burnout and a fatigue with being 100% remote.

[00:28:47.472] So much of what we're doing at the moment is focusing on how we make our ways of working sustainable in whatever turns out to be the new normal, with a commitment to maintaining the flexibility that our people want, but trying to revive the camaraderie and the human contact that we took for granted when we were together every day.

[00:29:08.692] Ben.

Ben Conrad

[00:29:11.592] We shouldn't leave you with the impression that we were only working on the COVID-19 response in the last 12 months.

[00:29:19.172] In that time, we have actually migrated the Tax Platform away from our homegrown legacy deployment tooling to run on AWS Elastic Container Service.

[00:29:29.192] This project was delivered by utilizing a tiger team made up of engineers from the different platform teams, drawing on that experience available to us while also ensuring that other work continued.

[00:29:41.152] The result is a more modern platform, which is simpler to operate and will scale elastically in response to demand.

[00:29:49.092] I'm proud to say we even won an industry award for it, and even more proud that we completed it with zero downtime as we hosted hundreds of important services.

[00:30:00.832] We believe that we have this amazing platform, and we have shown that it can enable services to be built and deployed at breakneck speed and scale to cope with more traffic than a government website should ever expect to see.

[00:30:15.752] We believe we are secure, and we aim to build on the reputation we have grown over the last decade.

[00:30:22.032] Digital sections of departments were set up years ago, and although they've succeeded in transforming the experience of using government digital services, they've mostly operated in a silo and failed to deliver the revolution in how technology is delivered across government. I hope that can change.

[00:30:43.532] As we look to possibly host additional services, we'll always need to keep the opinions we hold under review.

[00:30:49.872] There's a balance between the consistency that enables rapid delivery and stifling innovation by restricting people in the technology that they would most like to use.

[00:31:00.852] And one day soon, we might even be able to meet up again in person. And that's that.

[00:31:07.012] Our story of building some digital stuff to save an economy during a pandemic. If you have any experience with achieving large-scale digital transformations in huge monolithic organizations, overcoming the vested interests that come from monopoly suppliers, we would love to compare notes with you.

[00:31:28.772] Thank you so much for listening. It really is a privilege to have shared this with you today. Hopefully, none of us will ever need to build anything to fight a pandemic again, but if you're interested in finding out more about any of the things we've touched on, we'll happily answer questions and share some links to things we've published. Thank you.

Ben Conrad and Matt Hyatt

[00:31:48.352] Thanks, folks.

[00:31:49.412] Thank you.