ITV's Common Platform v2: Better, Faster, Cheaper, Happier

London 2019

In 2015 ITV thought they'd built the last hosting platform they'd ever need. In 2019 they're replacing half of it. In this frank retrospective ITV share what they've learned from v1 and what steps they're taking so that v2 really will be the last hosting platform they'll ever need.

Tom has 18 years working with technology across a number of sectors - from manufacturing with Jaguar Cars to media with the BBC, Global Radio and most recently ITV. During this time he's held senior positions in several disciplines, including engineering, development and architecture.

He is currently responsible for ITV's development and hosting platform as well as the twenty-strong team of engineers behind it.

Tom Clark, Head of Common Platform, ITV

Full transcript

Host Intro (Gene Kim)

All right. The first speaker of the day is Tom Clark. He is the head of Common Platform at ITV. He has spoken every year here at DevOps Enterprise Summit London. He has been on a tireless mission to help elevate developer productivity for one of the most recognized brands in the UK that touches approximately 40 million people per day. And so when I told my boss and my wife, Marguerite, about Tom's work, she said, "Oh, my gosh, it's the 'Downton Abbey' people." And I'll admit, we binge-watched all seasons on the ITV platform. So this morning, Tom will share the continuing evolution of the Common Platform at ITV and how it's enabled developers to get the work they need done, and how it's played a key role in enabling their application modernization program, which touches some of the most core business processes in the ITV enterprise. And it also is enabling BritBox, the joint venture with the BBC that combines the best of British content in one place, digitally delivered on almost every platform. So with that, come on out, Tom.

Tom Clark

Thank you to Gene for that wonderful intro. As Gene said, this is my fourth year speaking at the DevOps Enterprise Summit, and I'm honored to be on the stage in front of you now doing this keynote. So I'm Tom Clark, and I've been working in technology now for almost 20 years, straight out of school at 17 into my first job as a Windows sysadmin for Jaguar Cars. Now, one of those things I like to think is quite cool, one of them less so. I'll let you decide which is which. I soon got bored of looking at beautiful cars and less beautiful Windows NT4 logon screens, and I became a Unix sysadmin for the BBC. At the BBC, I discovered Perl, which took me into the next phase of my career, where I was a Perl developer for various companies, including three startup companies. But of course, being startup companies, they failed. That led me into Linux system administration, which continued the next phase of my career before I became a Linux sysadmin at ITV back in 2011. You can see my Twitter handle on the screen there. Please feel free to tweet me. It makes my mother very, very proud.

So a little bit about ITV. So we are the UK's largest commercial producer-broadcaster, and effectively, that means we make stuff, and we distribute it as well. Originally founded in 1955, we produce around 10,000 hours of content every year. And of course, being a commercial broadcaster, it has to span the entire spectrum, from high-quality scripted dramas like "Downton Abbey," Gene, to the other end of the spectrum. But that makes us about 3 billion pounds in revenue combined with all the airtime sales we have. But we do a lot with very little. We've got about 6,000 staff, a load of freelancers and contractors on top, and about 300 or so in technology. So where do I sit in the organization? So I'm the head of Common Platform.

Sounds a lot more fancy than it actually is. I report to the director of infrastructure, who reports to the director of group technology, who reports to the CTO, who's on the board, and then to the CEO. And you can see it's quite a flat management structure, which generally means if you have a good idea or the right incriminating photographs, you can get a lot done.

So what is the Common Platform? Before I go into V2, what is it in the first place? So I'm going to take you back to 2015. Some would say a happier time. And we were going through a large modernization program because a load of the things that made ITV, ITV hadn't been changed for years. They were stuck. They were calcified, and these were the lifeblood of our organization. So airtime sales. Airtime is what we call the adverts that fit in the schedule. That's 1.5 billion pounds of revenue going through that system, but it was impossible to change and dangerous to change, and it broke. Not good. Content sales. Again, that's the other half of our revenue. That's 1.5 billion pounds, but it was really difficult to add new partners onto the system because every change took at least eight weeks if you were lucky.

And then finally, talent payments. The system we use to pay our on-screen talent and celebrities. That was running on a virtualized ICL mainframe, and we had to virtualize it when we couldn't find spare parts on eBay anymore. And it was running COBOL. Hands up, anyone here who can program COBOL? Brilliant. I'm going to be in touch with you afterwards because our problem was our developers were literally dying out. You want to talk about business risk? That's it.

So the modernization program. A load of stuff on the left. I like to call them swear words. Stuff on the right is where we wanted to get to. More swear words, more good things as well. So a load of transition. I'm sure many of you who've been talking about transformation, been hearing about transformation, will recognize those on the screen there. But this is not that story. We've already done this. I talked about that at a DevOps Enterprise Summit when I first met Gene back in 2016. This is a story about what happened next when you get onto the right side of that.

But back to a bit more explanation about Common Platform. So a load of change was required at ITV, and we thought that actually, if we made all these brilliant new product teams, and they were forming and storming and doing all that kind of stuff, that they'd have lots of decisions to make. We worried it would be the Wild West. We worried that they'd be going out reinventing the wheel unnecessarily and reinventing boring bits of plumbing that they shouldn't really be having to think about. So logging and alerting and monitoring and CI and CD and all that kind of jazz. That's all the gray stuff they've all had to reproduce. And you can see they've had a go at inventing some wheels. So the blue team there, three sides. Not so great. The yellow team there, they've done twice as good. Six sides, but still not the best shape for a wheel.

And obviously, all those teams would have to deal with other teams. And because everything they were coming up with was unique and artisanal and bespoke, it would work like this. So we've got our audit teams, our cyber teams, all the people they interact with. Team one, fine. Okay, they have these point-to-point relationships. Very unique. Okay, that's not so bad. We can handle that. Team two, team three, team four, up to team... Suddenly, all these point-to-point relationships are adding massive overhead to your organization. Not good.

How do we fix that? Well, enter Common Platform. The idea being that we'd actually take care of the plumbing, so the team shouldn't have to. So the infrastructure, the logging, the metrics, the audit compliance, the security compliance, do the work once and do it brilliantly. So here's Common Platform. Here she is. Doing it once, doing it brilliantly, sharing that perfectly round wheel with all the other teams. And you can see there that actually all that gray stuff has disappeared. They have more time to be delivering business value for our organization. The other teams I mentioned before, this lovely cobweb diagram, it turns into this. There's always that team on the other side who doesn't want to play ball. You have to accommodate that, but that's fine. But you can see the overhead on the other teams has massively reduced.

And so obviously to run this, we needed some people, a very special set of people. And we looked for two really important qualities in the people on that team. So number one, we say they've got to be smart. So technology moves really, really quickly, and you've got to be pretty smart to keep up with it. Okay? A few nods in the audience. Good, I'm glad you agree. If you didn't agree, I'd be-- okay, anyway. You've also got to be kind. Okay? The ability to fit into the team.

Essentially, don't be a difficult person. I was told not to swear on stage. A difficult person to work with. We want high IQ and high EQ on the teams. And platform engineers, they sit at the intersection of those two qualities. But I'll come back to that kind point in a little bit. What do we do with these lovely, smart, and kind platform engineers we've got on the team? Well, the first set are our core platform engineers. These are the people that build the toolkit that is the Common Platform. And so they have a number of different responsibilities. So they curate the platform, they develop it. So we were using Puppet and Terraform. They build the standard patterns that other engineers use, because there's a million and one ways to configure them, and these are the team that work on those guardrails. Okay? Makes sense. Incubating new hires. New people join the team, and for the first week, they spend it with the core team. They spend it learning the ropes, asking those silly questions, and getting set up.

That's fine. Okay, good. Second opinion as a service. So you've all heard of rubber duck debugging. These are your professional rubber ducks. You go and ask them a question, you can bounce ideas off them, and obviously, they are the experts in the Common Platform. So, if you want to bounce ideas, they are the engineer's engineer. Flex. So if you go to the gym, they'll help you spot. No, it was basically because the Common Platform is so common, this team can be parachuted in at a moment's notice to save the day. So if someone's ill or if someone's going on holiday for an extended period, they can come in and help.

And then finally, research and development, making sure the platform is evergreen, looking at the new technology that's coming down the line, and making sure we're integrating it and incorporating it into the platform. Great. So that's the core team. We also have our field platform engineers who are embedded in these lovely product development teams we introduced.

So I'm sure you're familiar with the product-based development model, but the idea is they operate as like a mini startup inside your company. So you have a product owner, obviously wearing a top hat, delivery manager probably waving a flag, a number of developers angelically wearing their halos, they're probably Scala developers, and then some platform engineers from my team as well, and they operate as a mini unit. The classic build it, run it end-to-end, they own it, and an instance of the platform too. And those engineers, similar skill sets, but they have three different responsibilities. So an operations responsibility, and it's really to coach the teams because a lot of our developers hadn't seen production before, and they're a little bit afraid of it. And so the idea was they would be the first responders if something went wrong, but they wouldn't be the only responders. They were there to take the developer to, "Look, your code's done this. Let's make sure it doesn't do that again." Okay, fine. So they would be embedded in the team, helping with operations.

Force multiplication. They are there to make the team more effective and efficient. So they see a test pipeline that's running sequentially, and it could run in parallel and that would increase the productivity of the team. They're there to help with that. And then finally, quality influence. The idea being that all the way on day zero, all the way to the left, we're whispering in the ears of the developers as they're working on these services. The operational, non-functional inputs, the bits that always used to get left out at the end. So we're saying things like, "Think about exponential backoff," and like, "Maybe logging and monitoring could be good," and making sure those are put in before you even put your keys on the keyboard. Okay, so all good. And so that worked. That model was perfect. Taking it through to today, everything was rosy. And that's the end of the talk.

But it's not. Because what happened next? Well, problem number one, lonely engineers. So I mentioned those field platform engineers embedded in those product development teams. 95% of their time, they were not spending with fellow platform engineers, and that can be quite lonely. There was less osmosis. We had Slack and various ways of meeting up as a community, but it doesn't quite replace working with your fellow platform engineers every day. And this was especially true in our remote sites in Leeds and Manchester, when some of our engineers could literally be the only platform engineer in the building. Okay, so not great.
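For readers unfamiliar with the "think about exponential backoff" advice that the field engineers were giving developers, here is a minimal Python sketch of the idea. The helper name `call_with_retries` and its parameters (`attempts`, `base`, `cap`) are illustrative assumptions, not anything from ITV's code, and the random jitter is a common refinement that the talk does not mention.

```python
import random
import time


def call_with_retries(operation, attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random):
    """Call operation(); on failure wait, doubling the wait ceiling each time.

    Hypothetical helper for illustration only.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Ceiling grows 0.5s, 1s, 2s, ... but never beyond `cap`.
            ceiling = min(cap, base * 2 ** attempt)
            # Random wait below the ceiling so clients don't retry in lockstep.
            sleep(rng.uniform(0, ceiling))
```

Passing `sleep` and `rng` in as parameters is only there to make the sketch easy to exercise without real waiting; a production client would more likely catch a narrower exception type than `Exception`.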

We also had a tech problem, because we really believed in the Unix philosophy of a tool should do one thing well. Okay? It kind of makes sense. And so we selected a load of best-of-breed, open-source components and wanted to integrate them. So at the time, these were great components we wanted to work. So we put them together with an ITV glue. But it was always intended to be self-serve. Developers and the people on the product teams were always meant to go into the Puppet and the Hiera and the YAML and make changes themselves. But it was a bespoke platform. It did sit there because there was a steep learning curve. Some, I think unfairly, refer to it as a learning wall. But this is where kind can go wrong. Because people got stuck, developers got stuck. They couldn't figure out how to make a secret change in YAML or something like that. And so we're helpful. So we helped, and we helped again, and we helped again.

And an overplayed strength can become a weakness because we were too kind. We were too helpful, so we helped again and again. And over time it changed from, "I can do that" to, "The platform engineer does that." We became the blocker. We became the only way to actually make changes onto the platform, and that wasn't good. And we realized we'd created platform engineer as a service rather than platform as a service. And full credit to Tim on my team for this. We were doing too much ops work, too much turn it off and on again. We didn't have much time left for force multiplication or the quality influence, the really important bits that are rewarding for us and high value for the organization. So PE as a service.

CI/CD cycle times. This was a problem. It was too long. 15 minutes to get a change from commit into production per environment. That was too long in this new world.

And again, like I said at the beginning, it used to be eight weeks. 15 minutes was now too slow. We realized we'd built a great hosting platform. It was incredibly reliable. It hardly ever broke. Four nines availability. There were a handful of failed deployments, but we'd forgotten that most of the change actually happened at the input side, at the development side, not the production side. So we had poor developer experience, not ideal.

Now meanwhile, back on the core team, here they are. Back in 2015 when I first started, we had six teams, and again, the core weren't responsible, but they were consulted for those second opinion as a service, the incubation, all those kind of things. And there was no one on it at the time, so that was fine. The year after, the platform was quite popular, so suddenly we more than doubled the number of product teams that were existing at ITV.

And again, all those consultation bits happening, but we added someone to the core. We added two more teams in 2017, and we added another member to the core. Again, another three teams in 2018. Through to today, we have 20 product teams at ITV and four people sitting in the core asking those questions.

And so going back to these responsibilities of those core engineers, these were all the musts. And due to time constraints, these are the things we had to focus on. And what got left behind? It was research and development, because that was a should. There'd always be more time for it later on. We could push it a little bit further down the line.

So I'm sure you're all familiar with this adoption life cycle curve. ITV like to consider themselves to be fast followers of technology, and you can see there we are, the early adopter section. But an important thing to note about this is that it takes constant effort to stay still. Constant effort to stay still. The cost of agility is constant iteration and integration of new technology. It's constant gardening. It's constant weeding. It's constant paying off tech debt just to stay still. And if you stop, if you take your foot off the gas, you slip backwards, and we did. So we were running yesterday's technology. We were running a well-maintained classic car of an estate. But the problem was everyone else was driving Teslas. We still thought Sensu was pretty cool, and everyone else was using Datadog or something.

So a really important lesson we learned, and I try and click this, and hopefully it'll keep up, is do something when you can afford to, not when you can't afford not to. Essentially, do something when you can afford to, not when you can't afford not to. What that means is that shoulds and coulds are until they're not, when suddenly they become musts, and then you must react, and then it's almost too late. So do something when you can afford to, not when you can't afford not to. And we realized actually the Common Platform was the platform we needed, but not the platform we wanted, because ultimately the platform we wanted was no platform at all. So problem recap. Lonely platform engineers, platform engineer as a service, not platform as a service, poor developer experience, and yesterday's technology.

Red alert. We'd been hearing, not listening. We'd been too busy being busy. The signal had been lost in the noise. The platform engineers weren't happy, the developers weren't happy. And so in January of this year, we staged an intervention and said, "We need to talk about Common Platform." Because the first step is admitting you have a problem. It was a big mea culpa to my community saying, "We don't like it either. Contrary to popular belief, we don't like turning it off and on again. We don't like running stuff. We'd rather not." So we went out into our communities with a roadshow. We visited our development sites in Leeds, in London, Manchester, and effectively it was like, "I'm a platform engineer, get me out of here," which is new on ITV in the summer. And we operated that with three main things. We had a classic retrospective, start, stop, continue. We did a net promoter score. Would you recommend this platform to a friend?

A weird question to ask your friends, I know, but obviously we're all geeks.

And then feature requests. What would you like to see in this platform? And so out of the retrospective, we got more transparency and visibility, please, because they knew we were doing good stuff, and we were, but sometimes they weren't quite sure exactly what it was we were doing. So could we talk a little bit more about what we're doing? Okay, fine. Noted. More empowerment and self-service, please, which was ironic because that had always been the intention, but it obviously wasn't cutting it. Net promoter score came out as a 3.1, which basically means you would neither recommend it nor not recommend it, which I was actually a bit surprised by. There was one person who rated it a five, which is, "I would strongly recommend this to a friend." I never found out actually where he was previously to actually make ours so good compared to that.

Feature request, faster deployments. People were saying 15 minutes per environment wasn't really cutting it anymore. Okay, point taken.

And then auto-scaling. So at that time, we were deploying onto instances, EC2 instances. We had an AMI and Puppet ran on the top. It would take 15 minutes for a new thing to come up. So we couldn't really auto-scale. We had to pre-warm to handle spikes. And which team at ITV really would care about a large traffic spike?

So earlier I said we need to talk about Common Platform.

Now we need to talk about "Love Island".

So quick show of hands in the audience, who here has heard of "Love Island"? Okay, and who here are fans of "Love Island"? That's... Okay, you're lying.

I'm now going to reveal its secret formula, and the producers really did ask me not to do this, but I don't really do authority.

The secret formula for "Love Island" is effectively hot people minus their clothes. Multiplied by drama, divided by voting from the great British public, equals eyeballs, people watching it, which obviously equals profit. So there you go. You can go and make your own "Love Island" now if you wish.

And it is quite popular. So to give you an example, in the last 12 months, the two most popular programs on our video on-demand platform have been the two England World Cup games. You can guess what number three was, the premiere of "Love Island" a few weeks ago. These are the audience figures. When we first launched it back in 2015, half a million, and it's basically give or take-ish doubled since then. It was a massive, viral breakout hit, and obviously 2019 is going to be even bigger.

And that means that quite regularly, we have the best part of a million people watching it on simulcast, and simulcast is basically IP streaming of what's going out on air. And that generates a lot of traffic.

So this is a traffic graph for one of our services, and you have to remember that the ITV Hub, our video on-demand platform, has tens of millions of monthly active users. So it's a really popular service anyway. Can you spot when "Love Island" started? Yeah, so it was there. It's really popular with the 16 to 24-year-old demographic. Do we have any 16 to 24-year-olds in the audience? Good. I'm going to be looking at you for confirmation. So do you love your phone? There's a nod there as well. And would you say you have a short attention span, and are you sometimes a little bit impatient? Yeah. Okay, good.

So to demonstrate that, with the World Cup, people join half an hour before, maybe even an hour before, to watch the preamble. This is what happens with "Love Island." This is the anatomy of a night. So they join, that's 10 minutes before it starts, and that's the baseline. Come and find me later if you want to know the actual numbers. This is five minutes before. This is four minutes before, three minutes before, two minutes before, one minute before, and then obviously people joining at the very second. 20 times the load happened in 10 minutes. So quite impatient, from the nods over there.

So back to the auto-scaling point. This was the new normal for us at ITV. We wanted "Love Island" to be business as usual. We didn't want to have to do any pre-warming or pre-planning. We wanted to be able to handle this load whenever it happened. And so our partners in online, they set an OKR, an objective and a key result. They wanted one infinitely auto-scalable service running in production by the end of March. And obviously this was back in January we were having this conversation. And so I aligned the Common Platform V2 MVP to that OKR. Lots of acronyms.
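To put rough numbers on why pre-warming was the only option on V1, here is a small back-of-the-envelope sketch in Python. The "20 times the load in 10 minutes" ramp and the 15-minute instance start-up are the figures quoted in the talk; the smooth exponential growth curve and the function names are assumptions made purely for illustration.

```python
def per_minute_growth(total_factor=20.0, ramp_minutes=10.0):
    """Constant per-minute multiplier that yields total_factor over the ramp."""
    return total_factor ** (1.0 / ramp_minutes)


def load_multiple(minutes_elapsed, total_factor=20.0, ramp_minutes=10.0):
    """Load relative to baseline after minutes_elapsed of the ramp.

    Assumes smooth exponential growth that flattens once the ramp ends.
    """
    t = max(0.0, min(float(minutes_elapsed), ramp_minutes))
    return total_factor ** (t / ramp_minutes)


if __name__ == "__main__":
    print(f"growth per minute: x{per_minute_growth():.2f}")
    for lag in (15, 5, 1):
        print(f"capacity requested at the start arrives after {lag} min; "
              f"load is already x{load_multiple(lag):.1f}")
```

Under these assumptions the load grows by roughly 35% every minute, so capacity that takes 15 minutes to arrive only shows up after the whole 20x ramp has already happened.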

Dave Smith, who's the principal developer for online, he offered some assistance. Says, "Tom, what do you need?" And I said, "Give me some of your developers." And so he did. So he gave me Yuri and Luke. Here they are. They look a bit different in real life, but bear with me. I added a couple of platform engineers from my team, Anastasios and Tom. John, my scrum master, joined as well. We pumped them full of coffee, and then we tempted them with what I like to call baked motivation, which you might refer to as cake. And so that was our MVP development team for CPV2. But we wanted to be transparent, so we set up a public roadmap and backlog, which ironically had been there the entire time, we'd just forgotten to tell everyone about it. We created a Slack channel just for V2 so everyone could join. And we created a contributors group, basically to get representatives from the various stakeholders of V2, because we were too cool for working groups and steering groups.

And one of the things that contributors group did was set a vision, because you need a vision. Provide a brilliant hosting and development platform. And development platform. That was the bit we'd missed out the first time.

But there wasn't much time. We had three months to do this. So we had to focus on evolution over revolution. We had to upgrade, not re-platform. We had to be value-driven and maximize the return on investment. We had to consider the minimum viable change. What's the smallest amount of change we could do for the maximum benefit? And we knew we were changing lots of things... Well, we knew we were going to make a lot of change, so if you follow the minimum viable change, there's going to be loads of known unknowns and unknown unknowns coming out of it, and the minimum viable change helps you limit the risk.

So we decided the biggest win was changing our runtime and scheduler. Rather than deploying stuff onto EC2 instances, we should switch to containers, because they were quite cool at that point, and we could bake the config in, less stuff for us to maintain, and it was easier for our developers to self-serve.

And so we run on AWS, so the choice is really between these two things, Fargate or EKS, their Elastic Kubernetes Service. So we came up with a weighted scorecard, as you have to do, defined a number of properties we would assess them on, like capability, usability, operability, sexiness, very important when selecting software. And EKS, just by a nose, came out in front. Great.
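A weighted scorecard of the kind described here can be sketched in a few lines of Python. The criteria names are the ones mentioned in the talk; the weights, the 1 to 5 scores, and the neutral candidate names `option_a` and `option_b` are placeholders invented for illustration and are not ITV's actual assessment of Fargate or EKS.

```python
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores (same scale as the inputs)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


def rank(candidates, weights):
    """Return (name, score) pairs, best first."""
    scored = {name: weighted_score(s, weights) for name, s in candidates.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Criteria names come from the talk; weights and 1-5 scores are placeholders.
WEIGHTS = {"capability": 4, "usability": 3, "operability": 3, "sexiness": 1}
CANDIDATES = {
    "option_a": {"capability": 4, "usability": 5, "operability": 3, "sexiness": 2},
    "option_b": {"capability": 4, "usability": 4, "operability": 4, "sexiness": 4},
}

if __name__ == "__main__":
    for name, score in rank(CANDIDATES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Agreeing the weights before scoring the candidates is what keeps a close result honest: a narrow win depends more on the weights than on any single score.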

But again, we wanted to nail it before we scaled it. So for the MVP, we would work with one product team. For the alpha phase, which came next, we'd maybe work with two. Beta, it would be invite only, bring more people on. And when we went gold, we would actually open it up to absolutely everyone.

Great. But we also came up with another one of our lessons learned, another one of our rules. Optimize for the common case.

Optimize for the common case. Don't accommodate every single edge and corner case that possibly comes up. Aim for convention over configuration, because every time you add a new configuration parameter into a file, that incurs cognitive load on anyone who ever has to look at that config file ever again. It's a lot of responsibilities. And so thinking about this, we came up with some personas for our developers, because we spoke to them, and actually most of them didn't really care, amazingly, about Kubernetes. They didn't really care how the sausages were made. They just wanted sausages. They cared about the outcomes, not the implementation. And so we thought, well, if that's the common case, we should optimize for them. So we came up with easy mode. We reckoned there was an 80/20 split between our developers who just wanted to get stuff done. They wanted containers or services running in production quickly, reliably, securely, and when they got to production, they wanted them to stay running quickly, reliably, and securely.

And then, of course, there was hard mode available for the remaining 20% who did care about Helm charts and did care about YAML and more YAML and a lot more YAML. Speaking about YAML, this is basically how you generally try to configure Kubernetes, is a bit of a... Yeah. Anyway. So we were thinking about the developer experience. How would we optimize this for the common case?

And I spoke to Anastasios on my team, who's Greek, and I said, "What's the Greek word for simple?" And he said, "Aplo." And so aplo was born. Aplo is our easy mode, simple mode tooling to make our developers' lives easier. And so it goes from this YAML, YAML, YAML, to this. So effectively, that will do all that YAML for you in the background. You pass in a service definition file, some environment variables, and a tag, and it does it all for you. And you can look under the hood if you want to, but you don't have to.
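The "easy mode" idea, a small service definition plus environment variables and a tag expanded into the full Kubernetes YAML, can be illustrated with a minimal Python sketch. This is not aplo's real interface, which the talk does not show: the function `render_deployment`, the shape of the `service` dict, the default of two replicas, and the example registry name are all assumptions. Only the output structure follows the standard Kubernetes `apps/v1` Deployment layout.

```python
import json


def render_deployment(service, env, tag):
    """Expand a small service definition into a Kubernetes Deployment manifest.

    `service` is a hypothetical dict with "name", "image", "port" and an
    optional "replicas"; this is not aplo's real input format.
    """
    labels = {"app": service["name"]}
    container = {
        "name": service["name"],
        "image": f'{service["image"]}:{tag}',
        "ports": [{"containerPort": service["port"]}],
        "env": [{"name": k, "value": str(v)} for k, v in sorted(env.items())],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": service["name"], "labels": labels},
        "spec": {
            "replicas": service.get("replicas", 2),  # convention, not config
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    service = {"name": "hello-service",
               "image": "registry.example/hello-service",
               "port": 8080}
    manifest = render_deployment(service, {"LOG_LEVEL": "info"}, tag="1.2.3")
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

The developer supplies three values and conventions fill in the rest, which is the "convention over configuration" rule from earlier in the talk; anyone who wants hard mode can still inspect or override the generated manifest.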

And again, going back to that point about the minimum viable change, this is what a service deployment looks like on V1 of the Common Platform. DNS pointing to a load balancer, pointing to instances on an ASG. All very boring, and we like boring, actually. The principle of least astonishment. That's another talk. V2, not much changed. DNS pointing to a load balancer, but now pointing to a service running on Kubernetes. Again, the minimum viable change.

That takes us through to today.

Literally today, right now. Because when I submitted this talk to Gene and the IT Revolution team in February, we were in the middle of the MVP. We didn't actually know whether or not we'd make it. So I said, "Detailed progress to date. Assume MVP live in March." Assume MVP live in March. Hmm. So the MVP. Do you think we made it? Well, we should ask Tom, one of my platform engineers who was working on it. Tom, did we make the MVP at the end of March? Yeah. Yeah, of course we did. We absolutely did. Tom was very happy. And actually, it's very important to take photos of important stages of the work you do. So everyone got fantastic mission patches to stick to their laptops. But in three months from a standing start, we got a service running in production. Brilliant. And I'd promised the team baked motivation, and I delivered. And I managed to get that mission patch put onto a cake. But I can tell you, custom cakes cost a lot.

I think it cost more to make the cake than the development of the platform.

Here's the development team. We have Luke, Tom, Anastasios, and Yuri there. They look really happy. I like to think it's because of the MVP they've just successfully delivered. I think it's probably because of the delicious cake they're about to eat. So some of the benefits we realized as well. Developers can now self-serve more actions. It's more BAU for them because we don't need to deal with Puppet so much anymore. The cycle time is now 10 times faster, from 15 minutes down to one minute 30 seconds. There are fewer failed deploys. We can lean on Kubernetes' orchestration, easy for me to say, rather than our own homegrown magic. It's 30% more efficient to run, and I don't say cheaper because I'm worried that people will think the bill will go down, but I think people will be using it more, so I'm saying it's more efficient. And then auto-scaling. We can finally auto-scale on this platform in milliseconds.

And so the greatest praise, I guess, came from Dave Smith, our principal developer, who basically sent out this email to his team instructing them to use V2. Get everything onto it ready for the summer. And I'll highlight this passage here. "The performance is better, it's cheaper to run, the config is nicer, the deployment times are delightful, and the scaling is sublime." So thank you, Dave. The check's in the post for that.

But his team did. So we went from March to April to May to June, and we moved as many services as we could in online over ready for "Love Island,"

which takes us through to the "Love Island" premiere night.

This was it. The big day we've been waiting for. Monday, the 3rd of June, 2019, 8:01.

Slack. Danno, principal developer in test. "Was anything changed around 15:30 today? User Auth has been returning a steady stream of 500 errors."

So we dug into what had happened, and actually a change had been made which reclassified a 400 series error as a 500 series error. So actually, it was a bit of a false positive. There was no real user impact, but it was causing unwanted noise on graphs that we wanted to be silent heading into the biggest event of the year. So in Slack, we discussed it. We knew it would literally be a one-line change. And so the duty manager said, "I'm happy. Do you think we should roll forward? Do you think we should put this change into production minutes before the biggest event of the year?" And we all did a thumbs up in Slack, thumbs up emoji, and we pressed the button. And that was it. Within about half an hour or so from actually spotting the issue, making the change, getting it out into production, and not a single blip.

So it was a trial by fire for the platform, but really it's an example of the engineering maturity we have in our teams now, when that was the obvious thing to do. It wasn't scary. There was no change freeze regardless of it. And so auto-scaling, requests per second there, it did it. Very boring now. And actually in the Slack channel when all the stats come through, people aren't watching anymore because it has become kind of BAU.
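To make the false positive described above concrete, here is a minimal sketch of how reporting a client-side failure with a 500 series code inflates a server-error graph, and how a one-line change to the mapping removes the noise. The error names, the mapping dictionaries, and the helper functions are all hypothetical; the talk does not describe the actual User Auth code.

```python
from collections import Counter


def status_class(status: int) -> str:
    """Return the HTTP status class, for example 404 -> '4xx'."""
    return f"{status // 100}xx"


def count_server_errors(errors, status_for_error) -> int:
    """Count how many errors would be graphed as 5xx responses."""
    classes = Counter(status_class(status_for_error[e]) for e in errors)
    return classes["5xx"]


# Hypothetical mappings from application errors to HTTP statuses.
# In the "before" mapping a client-side failure is reported as 500.
BEFORE = {"invalid_token": 500, "upstream_timeout": 502}
# The one-line fix: report the client-side failure as a 4xx again.
AFTER = {"invalid_token": 401, "upstream_timeout": 502}

if __name__ == "__main__":
    # Hypothetical traffic: three bad tokens and one real upstream failure.
    observed = ["invalid_token"] * 3 + ["upstream_timeout"]
    print(count_server_errors(observed, BEFORE))
    print(count_server_errors(observed, AFTER))
```

Users see a failed request either way, which is why there was no real user impact; only the status class changes, and with it what appears on the server-error graph.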

So the next phase for us actually is to enter beta. We're going to bring in more people than just online. Other areas of the business are going to come on board, which is great.

And then, well, it's like, what's next? So it could be workflow. So is it time to take Jenkins out back and look at tools like GitLab or CircleCI or maybe Jenkins X, its kind of successor? Maybe. Or is it observability? Is it getting rid of Sensu and replacing it with this Datadog thing everyone keeps going on about? But we don't know yet, because it's going to be driven by the contributors group. We'll be asking our community, "What is the biggest pain point for you?" and optimizing for that. So again, the recap of some of the problems we spoke about. Lonely platform engineers. It's not fixed yet. But by taking out all this kind of custom stuff, we're massively reducing the operations overhead. And we think in time our platform engineers can move from being dedicated to a team to being designated and maybe take a step out of the product teams and become just a core platform engineering team.

And when I mentioned this to a team, they said, "Won't you become a silo? Won't we actually not speak to you anymore?" And I said, "Actually, I think you'll speak to us more because you'll want to speak to us instead of having to speak to us, and you'll be speaking to us about really high-value stuff like quality influence and force multiplication rather than the operations stuff like you do today, and it'll be more rewarding for everyone." Again, the platform engineering as a service bit gets covered by that. And the developer experience has improved massively as well. The developers can now do more things themselves without being blocked. Yesterday's technology, obviously gone because now we're moving into Kubernetes and we've got a whole load of runway ahead of us.

So I'm going to leave you with a few final thoughts.

The first one, optimize for the common case. Try to focus on convention over configuration. Simplify everything down.

Then focus on the minimum viable change. If you're making something risky, if you're stepping into the unknown, don't change everything at once. Try to find the single thing you change first and go from there. And then you're never too late to do the right thing. Staging that intervention, going to my development community and saying, "I've screwed up," that was a really powerful step, and they saw that actually we wanted to change stuff. And then finally, do something when you can afford to, not when you can't afford not to. Because the shoulds and the coulds are often the delighters for your team and for your customers, and ignore them at your peril. Go. Thank you.

Help you're looking for? Yep. The help you're looking for.

So one more thing, actually. Gene always asks us to finish with saying what help we're looking for. And so if you're at a similar stage in your journey, I'd love to compare notes with you. So, we've built a fantastic platform. It's used internally. It's actually quite loved by some of our developers now, and it's actually something, it's running a service that many of you have probably used day to day. So I'm going to be around. I'm going to be in the speakers' corner later on. I'm going to be around for the next couple of days, and I'll be in Slack, so please come and find me and have a chat. So thank you again.