ITV's Common Platform

London 2016

An introduction to the people, process and technology behind the cloud platform that underpins all of ITV's key applications - from the system that pays Ant & Dec to the ITV Hub. Touches on hiring, building a culture, devops at scale, $everything as code, and more.

Full transcript

Tom Clark

Thank you, everyone. I just want to apologize in advance. Unlike Jason Cox from Disney, I don't have a preview of tonight's ITV content. So if you want to watch, you're just going to have to tune in, I'm afraid. Sorry.

Okay. I'm Tom Clark, Head of Common Platform at ITV. A little bit about me. I've been in the industry for about 15 years now, working as a contract sysadmin, architect, and developer. A Perl developer. Admittedly, I call myself a recovering Perl developer, and it's been about two weeks since my last one-liner.

I've worked across many orgs, large and small: Jaguar Cars, Siemens, the BBC, ITV, Global Radio, two of my own startups that failed, and one when I first came to London that created WhatsApp back in 2007. I only wish they'd waited a little longer.

A couple of years ago, I went traveling. I grew a big beard. I got a motorbike. I rode around Asia, found myself, came back, wanted a big new challenge. So I shaved off my beard, and I went permanent with ITV.

At ITV, I report to the Director of Infrastructure, who reports to the CTO, who reports to the board of ITV. So actually, we're quite a flat structure, and that gets us a lot of buy-in for what we want to do. And please tweet me. It makes my mother very proud. Thank you.

So ITV. ITV is not the latest Apple product, as Paul pointed out. We are an integrated producer broadcaster. What it means is we make stuff as well as having the ability to distribute it over our own channels.

Originally founded in 1955. If there are any British people in the room, they will know it as Channel 3 from when they were growing up. The ITV we have today was born in 2004 when all the regions, Central, Granada, Carlton, LWT, they all merged together to become the company we know now.

We're a member of the FTSE 100, which means we're one of the 100 largest companies in the country. We turned over about 3 billion pounds last year.

Some facts and figures, because the marketing team wanted me to point this out. In 2015, we had the most watched entertainment show, drama, soap, and sporting event. We reached 75% of the ABC1 demographic, and that's the one the advertisers really care about because they're the ones with the money.

This is the one which is a bit niche: 98% of commercial shows with audiences over 5 million viewers were on ITV last year. That's quite a niche one as well.

But we do all of that with very little. We've only got 5,500 employees around the world. Obviously, lots of freelancers and contractors helping out on top of that, but all of that with very little.

So some more details about ITV. We've got five divisions. There's Studios, who make stuff. We make stuff for ITV. So if you ever watch Jeremy Kyle, we make that. But we also make stuff for other broadcasters. So the British people in the room will be familiar with University Challenge. You'd imagine that's a BBC show. We make that for the BBC.

We also own studios around the world. So if there are any Americans in the room who ever watched Duck Dynasty, we own Gurney Productions, the people who make Duck Dynasty. So there you go, some classic, high-quality television there.

There's Commercial, who sell stuff. They sell the adverts you see in the programs, but they also sell the rights to formats that we own. So if you watch Come Dine with Me in the UK, you watch that on Channel 4, but it's our format which we sold to them. So if you watch it in Germany, it's Das Perfekte Dinner, and I apologize for my German accent.

There's Broadcast, who distribute stuff over the air. It's literally where the transmission technology for ITV lives as well.

Online, who distribute stuff online. It's where the ITV Hub, or the Player as it was, lives.

And finally, Shared Services, the exciting place where everything else lives, like HR, legal, finance, and so on.

So that's some useful context for the rest of the presentation. You can understand where Conway's Law is going to come into this.

I'm going to talk about the journey, as everyone else has today. Let's roll back to 2010. ITV as a company was about to go bust. Our share price had dropped to 18p a share, and we were really this close to going bust as an organization.

Adam Crozier came in. He's been a hotshot CEO. He did a massive cost-cutting exercise across the board. One of the things he did was outsource most of our IT stuff to a managed service provider. I shan't name them. It was the right decision at the time. It saved us a lot of money and helped save the company.

You had to fill in a form. You got a VM within weeks, maybe four, and it was fine because we were mostly waterfall, and the product managers, project managers, could put it on their Gantt chart, and it was all great.

The installs were scripted, as in a paper script that you really hoped they followed. And there was still this big dev and ops divide.

But on the ops side that was there, we added Puppet to basically clean up the VMs we got from the MSP. So it did things like uninstall X Windows from our server and disable the R* services in the 21st century. But hey, no, it's fine.

We then started using it to configure the applications. We added CI with Jenkins, and we added monitoring. We stopped crowdsourcing it from Twitter. It did actually work, but actually, it's probably not the most professional way of knowing that our site is down.

Rolling forward slightly to 2014. We wanted to put the ITV Player, as it was then, onto the Samsung connected TVs. We called it our 10-foot system because you sit 10 feet away from the interface, and the UI has to be big and pretty.

There's a chap at ITV called Rob Taylor, and he's a pioneer in the Wardley sense. Simon Wardley talked about pioneers, settlers, and town planners. Rob's a pioneer. He has a big idea, and he says, "I want to try this," and he convinces people.

So he wanted to try a thin-slice kind of DevOps team, a product team. So he went to Paul Clark, who's actually in the audience at the moment, the Controller of Online, and said, "Can we try this?" And Paul said, "Sure. Let's do it."

So Rob got some ops guys, some dev guys, put them all together in a mix, and he requested these VMs, and he waited, and he waited some more, and he waited a little bit longer, and they still didn't come.

So essentially, they went rogue, and with a little help from our friends at Basho and Scale Factory, we went to AWS in about four weeks, the whole stack up and running end to end, just in time for the VMs to actually arrive from the MSP.

Now, I get told off when I say it's quite as gung-ho as that, but that's essentially what happened.

So that told ITV that DevOps, or product teams, could work, and it also showed that cloud could work. We'd never really done cloud before. We were a little bit scared of it.

Rolling forward to March 2015, March last year. Back in 2010, when we had to save money, we pressed pause and chilled out all of our internal development because it was costing a lot of money, and we needed to save money. We realized we were slipping behind our competitors, and we realized we actually needed to modernize our applications. We needed a modernization program.

In that time, we'd moved away from waterfall. We'd moved towards agile. And we looked at the on-prem infrastructure managed by our MSP, and we thought, "No, waterfall and agile, that's not going to mix." And we looked at the cloud stuff that we had done with Common Platform, and we thought, "Actually, maybe we could do something with that."

So I was asked to take Rob's great work and actually make it a thing. If Rob is a pioneer in a Wardley sense, then I'm a settler. I come in after the pioneer, I take their great idea, and I industrialize it, I commercialize it. I make it fit for purpose across the whole company. So I was asked to do that, after they offered to shave my beard off.

Rolling forward to today, it's a thing. We've got 14 instances of the Common Platform, and I'll explain what I mean by that in a moment. About 13 engineers, up from about three when I started, and it now hosts internal and external systems.

So the ITV Hub, which is our VOD platform, our talent payment systems that pay all the celebrities that work for us, our playout scheduling, our content delivery system, our sales system. Basically, broadcast and production critical systems are running on this platform.

And we're looking at using it for the off-the-shelf apps as well in 2017. So this is going to be the story about how we did that.

So what are we being asked to do at ITV? We're FTSE 100, we answer to the shareholders, we're always being asked to do more with less. The thing we're always thinking about is, does this save money or does this make money? That's the question we're always being asked to answer. The term at ITV is that we're lean, and we're very lean as an organization.

So how do we do that with a Common Platform? How does it help?

Number one, really obvious to people in the room, automation. You automate the boring, repetitive stuff, and you concentrate on the fun, interesting stuff. And if you've got smart people working for you, they get bored of repetitive, boring work very, very quickly, and they leave. So it's really critical.

The great thing about automation is the more you automate, the more time you have. So you have more time to automate, so you have more time, so you can automate more, and you get into a virtuous infinite loop. It's great.

So automation.

Standardization. This is absolutely critical. Standardization, to me, means you can make assumptions safely. If it looks like this here, and it looks like this there, it means you can swap and change between the two different things. If it's totally incompatible and you're having to look under the hood every single time, it absolutely slows you down.

The other benefit of standardization is that when this team's working on this here, you know, because you've all agreed to use round pegs of a certain specification, that you can swap and change between them. And if someone makes an improvement there, the other teams can inherit from it too. So it's like a rising tide lifts all boats. It's a really, really good, important one. Standardization.

Loosely coupled but highly aligned. The people who heard me talk about standardization just then will realize there are some risks. If this person upgrades and everyone else has to upgrade at exactly the same moment, that's a big risk. Actually, it'll slow you down more than it speeds you up because everyone's having to down tools.

So we're always asking, how can we become more loosely coupled, so actually you can do it at your own leisure, but highly aligned, so we're always heading the same direction together? That's one of the key terms: loosely coupled, highly aligned.

Blast radius reduction. I think this is one of my personal favorites. Blast radius is the term we use to describe the area of effect of something going catastrophically wrong.

You can imagine that at ITV there was a time with lots of shared infrastructure, and there must have been a change that happened, and there must have been a catastrophic failure. And you know someone said, "Hey, I know, more process. That'll fix this problem."

So they would've introduced a CAB, a Change Approval Board. So you go into the CAB and you say, "Please, sir, may I do my release?" And they say, "Yes, yes, you may do your release." And you go, "Thank you. Thank you." And you back out of the CAB and hopefully do it within the window.

But is that the right way? Taking it up, up, up to the top and getting someone to sign off on something they probably don't even understand in the first place.

But if you can convince them that you can draw a ring fence around your particular product, and the area of effect, the blast radius, of your change going wrong is totally limited to your product, then you can devolve that down to the product owner. And if they have an objective, a target, say, your application must be available 99.9% of the time, that's great. That's actually their objective, and it's up to them to make that decision.
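To put a rough number on that kind of objective, here is a small worked calculation of how much downtime a 99.9% availability target leaves a product owner to spend. It is only a sketch: the 30-day measurement window and the function name are assumptions for illustration, not something from the talk.

```python
from fractions import Fraction


def downtime_budget_minutes(availability_percent: str, window_days: int = 30) -> float:
    """Minutes of downtime allowed in the window for a given availability target.

    The target is passed as a string (for example "99.9") so it can be
    converted to an exact fraction instead of a rounded float.
    """
    allowed_fraction = 1 - Fraction(availability_percent) / 100
    window_minutes = window_days * 24 * 60
    return float(allowed_fraction * window_minutes)


# 99.9% is the example objective from the talk; the 30-day window is assumed.
budget = downtime_budget_minutes("99.9")
print(f"A 99.9% target over 30 days allows about {budget:.1f} minutes of downtime")
```

A product owner who holds that budget can decide for themselves whether a risky release is worth spending some of it on, which is the point of devolving the decision.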

So you're allowed to devolve the responsibility, which means everything's faster.

The principle of least astonishment. Who's ever run a command that did something they didn't expect? Who's ever had to push on a door that has a handle on it on the outside? And now imagine a green button marked stop.

They've all violated the principle of least astonishment. They've all broken the standard behavior norms that you expect, and they've surprised us, and they've astonished us, and that's bad.

It goes back to the assumptions I was saying before. It means you have to second guess. You can't trust anymore. It breaks down trust.

You want to think of the most obvious behavior and do that. You want boring, repetitive systems. You really want high borality. Yeah, okay, we'll go with borality. You want high-borality systems. Principle of least astonishment.

You build it, you run it. I think it's probably the meme of the entire conference, but it's absolutely critical and true. If you give people responsibility, they'll want to make it work, and you get quality through psychology rather than process.

Rather than saying, "Make it good," good people want to make it good, and if they don't, they shouldn't be on the team.

So you get your devs, you get your testers, you get your platform engineers, you put them all together, dev, stage, production, end to end. They have responsibility for it. No more operations teams.

So those are some of the key tenets of the Common Platform, and now I'm going to explain how we apply it to the classic three of people, process, and technology.

Talking about Puppet, for example. We use Puppet for configuration management at ITV. It came in like 2012 or so. We used to have one common repo, Linux Puppet it was called, and everyone who used Puppet at ITV would put all their changes into that single repo. All the different teams for all the divisions I mentioned before.

It was incredibly brittle. Massive blast radius, because if any change went wrong, it would affect loads of teams. And it happened regularly where someone would make a change in this team over here that'd knock out development for another team on the other side of the business without even realizing it.

So how do we fix that? Blast radius reduction. We now have an infra repo for every product that we run, and it's actually ring-fenced. So we limit the blast radius. The only stuff that goes in there, by definition, has to relate to that product.

The other side benefit of that is the signal-to-noise ratio in that repo is really, really high. If you care about your product, the only things going in there relate to that product, so every commit is relevant.

We adopted the roles and profiles pattern, and that allows us to express the ITV way of working through the Puppet modules. And we also treat our internal modules just like our external modules. So we have one repo per module. We have semantically versioned stuff. We manage stuff through pull requests. We have change logs, and we're really, really strict on ourselves.

We then use Puppetfile to manage them, so you have a very, very simple list of all the things that a particular product uses, so very clear.

So Puppet and configuration management.

Terraform, for those that don't know, is from HashiCorp. It's the equivalent of Puppet, but for your infrastructure tier. We were very early adopters of it, as of this time last year, and we do exactly the same as we do with Puppet. We use the Terraform modules to express the ITV way of working. So actually what comes out of it at the end, we know is standard because you just put the standard parameters into it.

So we have the one repo per module like we had before. There's a weird kind of Terraformism that when you use it, you specify all over the shop what particular module you want to pull in, and it's quite messy. So I thought there must be a better way of doing this. Surely we can have a file that just lists what we want at the top of the directory.

And I mentioned this to Anastasios on my team, and he goes away, and the next day he'd invented Terrafile. And Terrafile is the equivalent of Puppetfile, but in the root of your directory you say, "These modules, these versions, go."

So it's a perfect example of loosely coupled, but highly aligned. Because they're all individually versioned, all the people that are using those modules can use whatever version of them they want at a particular time. They're not forced to upgrade at exactly the same moment. And in that file, it's very obvious there's a single source of truth for what's going on.

Loosely coupled, highly aligned.
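The idea of each product pinning its own module versions in one file can be sketched as follows. This is not the real Terrafile or Puppetfile format; the product names, module names, and versions are invented for illustration, and the `version_drift` helper is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical pin lists: one per product, mapping module name to pinned version.
# Each product's list is the single source of truth for what that product uses.
HUB_PINS: Dict[str, str] = {"vpc": "1.4.0", "logging": "2.0.1"}
PAYMENTS_PINS: Dict[str, str] = {"vpc": "1.2.3", "logging": "2.0.1"}


def version_drift(
    pins_a: Dict[str, str], pins_b: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return the shared modules that the two products pin at different versions."""
    shared = sorted(pins_a.keys() & pins_b.keys())
    return {
        name: (pins_a[name], pins_b[name])
        for name in shared
        if pins_a[name] != pins_b[name]
    }


# Both products use the same modules (aligned) but upgrade on their own
# schedule (loosely coupled), so drift is allowed and easy to see.
print(version_drift(HUB_PINS, PAYMENTS_PINS))
```

Because the pins live in one place per product, a little tooling like this can report who is behind on which module without forcing anyone to upgrade at the same moment.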

But we've got probably 100 module repos now. Managing that manually would be an absolutely impossible task, but through a little bit of tooling and automation, we've made it possible. And so that kind of goes back to what I said before about automation being critical. Lots of tools being written to actually accelerate people up.

I talked about standardization. We actually standardize our terminology. So you'll be familiar with the term environment, and product we're all familiar with as well. We have the concept of ecosystems at ITV. So an ecosystem is a collection of environments. Your production ecosystem just contains your production environment. Your development ecosystem contains dev, stage, SRT, perf, all the different development environments you have.

What we found is a lot of config that we have actually applies at the ecosystem level better than it does at the environment level. So it's really reduced duplication.
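One way to picture config living at the ecosystem level is a lookup where each environment inherits its ecosystem's settings and only overrides what differs. This is a minimal sketch under assumptions: the setting names, values, and the `resolve_config` function are made up for illustration and are not ITV's actual configuration.

```python
from typing import Any, Dict

# Hypothetical settings defined once per ecosystem.
ECOSYSTEM_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "development": {"alert_destination": "infra-dev", "log_retention_days": 7},
    "production": {"alert_destination": "infra-prod", "log_retention_days": 90},
}

# Each environment names its ecosystem and lists only what it changes.
ENVIRONMENTS: Dict[str, Dict[str, Any]] = {
    "dev": {"ecosystem": "development", "overrides": {}},
    "stage": {"ecosystem": "development", "overrides": {"log_retention_days": 14}},
    "production": {"ecosystem": "production", "overrides": {}},
}


def resolve_config(environment: str) -> Dict[str, Any]:
    """Merge ecosystem defaults with the environment's own overrides."""
    entry = ENVIRONMENTS[environment]
    config = dict(ECOSYSTEM_DEFAULTS[entry["ecosystem"]])  # copy, do not mutate
    config.update(entry["overrides"])
    return config


print(resolve_config("stage"))
```

Adding another development environment then means one new entry with an empty override list rather than another full copy of the settings, which is where the reduced duplication comes from.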

In AWS, we actually apply the same thing. We've got two accounts per product that we host in AWS. We've got a prod account (I'm using a laser pointer, sorry, it's very '80s) for the production system, and a dev account.

So obvious benefit there for blast radius reduction. If something goes horribly wrong in dev, they're two totally separate AWS accounts, and we're relying on Amazon to keep them separate. So obvious blast radius benefit.

It also helps with billing. Because they are both joined to a consolidated billing account, we know that those are the development costs for the product, and those are the run costs of the product. It's very obvious, and the accountants love it because now we can cross-charge back to the business, which we've never been able to do before.

It's really useful from a security perspective because IAM in AWS applies per account. API rate limits, they again apply per account. So let's say dev goes crazy and uses up all your credit, it can't hurt production. It's ring-fenced. The blast radius is reduced.

And also from a support cost perspective, because you understand the lean influence at ITV, it reduces the support costs. We use Business support on prod and just Developer support on dev. So we aren't inflating the bill.

Also, moving forward today, we've got some environments coming in now. We have these infra VPCs, which is where all the kind of common technology lives. So we use Sentry for operational monitoring, ELK for logging, Grafana, etc. for metrics. But those apply at the ecosystem level.

So all the staging and all the dev alerts, they go into that kind of infra stack on the side. But the production one is isolated to just be production. So again, signal-to-noise ratio through the roof. If it comes from one of the systems on the left, you know by its very definition that it must be really important because it comes from production for one of your applications. So again, it really improves the signal-to-noise ratio.

And as a nod to the developers, I'll put some applications in there as well.

So I mentioned instances of the Common Platform. This is kind of what it looks like. Because we're using the same tooling and the same best practice and the same modules, there are actually 14 instances of the Common Platform, all slightly different, different versions, all kind of moving forward at the same time.

But because we're loosely coupled and highly aligned and we're all relying on the same stuff, we're moving forward at some kind of pace. And you can see there's been a catastrophic failure in the content delivery system, but it's fine. It's their development ecosystem, and it is ring-fenced. It's blast radius reduced, so it doesn't actually hurt anyone else.

So AWS. That was it for technology. We're going to move now on to people, because the technology would be useless without the people to run it.

So a small number of brilliant people. Not a surprise to the people in the room. But versus literally hundreds with a managed service provider before. So we throw brains at the problem rather than bodies. It's the whole Jeff Bezos two-pizza thing. Very obvious stuff.

So what kind of people do we look for? There are two qualities that are musts when I hire: smart and kind. That's it. Anything else is a bonus.

So I say smart is the ability to adapt to change, because the technology we use today won't be the technology we use tomorrow, and you've got to be pretty smart to keep up. Obvious.

So kind, the ability to fit into the team. Essentially, don't be a dick. At ITV, we don't make room for brilliant jerks. They're not worth the hassle.

You give people a chance, and they'll surprise you. A perfect example is Cameron on my team. Young lad, he replied to a job post I put on Hacker News. He messaged me and said, "Hey, Tom, I'm really interested in working for ITV," and he was coming over on a mobility visa.

We had a Skype. Really bright guy. Only been working for 18 months. I thought, "Okay, well, I don't know if I've got the capacity to bring up a junior. I don't know if it'll be fair on him, or on us."

But he came over anyway, kind of came in for a coffee. He complained the coffee wasn't as good as New Zealand, but fine. Okay, we worked with that.

Really great to chat over coffee, but I was still not sure about the whole junior thing. Said, "Okay, we'll bring you in for an interview." Put him in front of the team, and he smashed it out of the park. He just blew us away.

We did our standard whiteboard architecture question, and I kid you not, he invented sticky load balancing from first principles in the interview. Amazing.

So I always say, you throw them in the deep end, you give them armbands and a lifeguard, but they'll probably surprise you.

So smart and kind. Once you've got those, what next? Daniel Pink talks about a theory of motivation, which I really believe in: autonomy, mastery, and purpose.

Autonomy, the freedom to make your own decisions. So you've hired smart people. Let them do smart-person stuff. You should trust but set high standards. The way I put it, you give them a map and a compass, but not a set of directions.

Mastery, the ability to become brilliant at something through training and practice. ITV has internal training, external training, conferences, but you're surrounded by smart people, and smart people want to get smarter through osmosis, and they do. It's fantastic. So that's another reason to hire smart people.

And then purpose, the belief that what we do actually matters. Now, I'll be totally honest, I don't think Celebrity Love Island has ever saved a life, but it entertains them, and as far as I'm concerned, that's the next best thing.

So once we've given these smart and kind people autonomy, mastery, and purpose, what do we do with them at ITV?

There's two kind of classes of... I shouldn't say classes, sets. Mathematical. Two sets of engineers at ITV. We've got embedded engineers and the core team, who I'll come onto later.

Rolling back, this is the kind of structure of ITV, and we actually embed one of these platform engineers in every division of ITV. And actually, their day-to-day isn't really decided by me. The technology director, the scrum masters, the product owner, they actually set the work they do. They do report to me, but actually, their day-to-day is controlled by someone else.

And really, this has helped reduce the blast radius and the contention because there used to be one central pool that everyone was fighting over, but now we've devolved it down to the divisions, and they have full control.

So what do they do on the team? Number one, they're the first responders for anything going wrong, generally. Some teams actually have the devs being the first responders, but generally, they're the first responders to incidents.

But more importantly, they're the force multipliers on the team. They're there to make the team more effective, more efficient. They're improving the speed of deployment. They're improving the rate at which the tests can run. They're writing tools to make the team more efficient.

The other absolutely key one is they're influencing operational quality from day one, when it's cheap. The old thing where dev built it, threw it over the wall, production exploded, and ops dusted their hands off and said, "If only you'd asked us what we would've wanted, we would've told you to do it this way." Well, it's too late now and production's exploded.

But now they're there from day one saying, "Make these little changes. It'll be easier to run, which means it's cheaper to run." So very, very important.

So they're the embedded engineers you've got at ITV.

The core team. The core team are responsible for the concept of the Common Platform itself. So the common Puppet modules, the common Terraform modules, the tooling, the best practice. What they try to do is meant to be kind of batteries included. It's meant to kind of work out of the box.

They're also responsible for the heavy R&D. Back in the day, when we wanted to experiment with something new on a project timeline, you'd go to the project manager and say, "Hey, I want to experiment with this." They'd say, "Oh, cool. How long will it take?" "Week, maybe two, maybe four. Ha ha." And the project manager, pfft, because they couldn't fit it onto their Gantt chart.

So now the heavy R&D goes into the core team as well.

It's also where we incubate new hires. They go into the core team. We bring them up to speed with our tool sets, and then we put them into one of the embedded product teams, so that they don't slow the team down too much.

They're also there to be second opinion as a service. If one of the embedded engineers needs some advice, needs someone to pair with them, needs someone to bounce an idea off, they get onto Slack, bring someone over to their desk, and that person just sits there with them for a little while.

It's really important to note as well that they are not dictators, although some may disagree, but they aren't dictators. They're custodians of the stuff. So generally, what they do is accept PRs from the rest of the team, check they meet the quality gates, and actually merge them in as well.

The other thing they do, they look around, and they're stealing stuff all the time. There's so much innovation going on at ITV. They say, "Hey, that's cool. We can use that everywhere." They bring it in, they polish it up, they scrub it off, and they share it with everyone else.

So this is how it used to look. These are the four product teams. The gray stuff is the boring, universal stuff. It's the logging, the monitoring, the alerting, the deployment pipelines.

Each team's reinvented the wheel, but because they haven't had time to do it brilliantly, because they're in a rush, they've done a bad job. So you've got a triangular wheel, not going to work. Square wheel. Pentagonal, it's terrible. It's an absolutely terrible idea. Waste of effort.

So what we do, we add the core team. Here she is, kind of coming along. It's her responsibility to do the work once and do it brilliantly. She makes a perfect round wheel and then shares it with everyone else.

And you can see they've actually got much more time in their day to concentrate on business value. Our CEO loves business value. It's fantastic. And it means you context switch less. Context switching is an absolute killer.

So we've got a cybersecurity team. I'm expecting more of a laugh when I say cybersecurity team in kind of 2016, but they are literally called the Cybersecurity Team.

Anyway, the security team, if they want to make a change to the Common Platform, they'll say, "Hey, we want to comply with the CIS Benchmark v2." Great. They'll put that requirement into the core team. They'll figure out what change they need to make to the various modules, and they'll update them, and then everyone will upgrade.

So hang on. I've just talked about 14 instances of the Common Platform. How do you upgrade all of that? So now we come onto the really exciting bit of the talk: process.

Underpinning the whole of the Common Platform is what we call the specification. So it's a versioned standard. It defines the platform, and we treat the platform just like software. So it's semantically versioned. If we make a major change, we bump the major version number. It's in a GitHub repo. Everyone can see it. And we plan to release monthly eventually.
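The version bumping described here can be sketched in a few lines. This follows the usual semantic versioning convention of MAJOR.MINOR.PATCH; the three-part format and the `bump` function are assumptions for illustration, since the talk only says that a major change bumps the major version number.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change.

    Follows the common semantic versioning convention: a major bump resets
    minor and patch to zero, and a minor bump resets patch to zero.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


# Hypothetical starting point for a specification that is still in beta.
print(bump("0.1.0", "minor"))
print(bump("0.1.0", "major"))
```

The value for the teams consuming the specification is that the size of the number change tells them how carefully they need to read the release before adopting it.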

We're still in beta at the moment on version 0.1, but essentially, it's the owner's manual for the platform. So let's drill into it a little bit.

At the very top is the charter. It's the high-level goals. It's why we're doing this. And we say to the engineers that even if it says nothing else anywhere else, what you do should be in service of a charter goal.

So we've got quality, simplicity, agility, security, longevity, value, and portability. If someone can think of a better one for value, I'm all ears.

But it's quality, like doing the right thing once rather than the wrong thing 10 times. Simplicity, as simple as possible, but no more. Value, as small as possible, but as large as necessary. If you're trying to run an 8XL instance for your tiny little microservice, you know you're kind of violating the value charter goal, so I can point people at it.

So those are the really high-level goals, and they rarely change. They're pretty set in stone.

Drilling down slightly, the policies. And they're mostly common sense, but the problem with common sense, I've found, is actually it's not that common. So unless you write it down and say, "It says here not to do that," then people will kind of think, "Well, you didn't say not to."

So it's the musts, it's the must nots, it's the shoulds, it's the should nots.

Here's an example. Platform components must be managed by a configuration management tool. So I hope you'd all agree that's a sensible thing to have written down. But if you don't actually write it down and someone doesn't do it, you don't have a leg to stand on when you actually say, "Hey, you didn't do this." "You didn't tell me."

But also notice it doesn't mention Puppet. It doesn't actually mention a particular implementation. The policies are meant to be quite slow-moving, too. They're meant to be quite high level. So it's abstracted from the implementation detail.

So let's drill down a little further. Standards, practices, and principles. These are the day-to-day detail. These are the fun ones. These define what an ecosystem is. These define how AWS should look. These define the DNS standards and the alerting standards. And this is the document people use when they actually build the Puppet modules and they build the Terraform modules.

So an example of how we would use this. My development server just sneezed at 2:00 a.m. Should I page the on-call? No, obviously. But here it says every alert that interrupts someone must be urgent, important, and actionable. Development server at 2:00 a.m. is none of those things.

But the ITV Hub, our VOD platform, is down at prime time. Urgent, yes. Important, yes, we're losing money. And actionable, most likely. Like I said, my team are generally the first responders to the alerts that come out of the system.
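The paging rule from the standard, that an interrupting alert must be urgent, important, and actionable, reduces to a single check. The sketch below assumes a simple `Alert` record with three flags; the class, field names, and example alerts are hypothetical and only mirror the two cases described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A hypothetical alert with the three qualities the standard asks about."""

    description: str
    urgent: bool
    important: bool
    actionable: bool


def should_page(alert: Alert) -> bool:
    """Interrupt someone only when all three conditions hold."""
    return alert.urgent and alert.important and alert.actionable


dev_sneeze = Alert("development server blip at 2:00 a.m.", False, False, False)
hub_down = Alert("ITV Hub down at prime time", True, True, True)

for alert in (dev_sneeze, hub_down):
    print(f"{alert.description}: page on-call = {should_page(alert)}")
```

Anything that fails the check can still be recorded and reviewed during the day; it just should not wake anyone up.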

So moving on, component design and implementation. But that's not really part of the spec. It's basically a requirement to write clean code, add clear comments, get it tested, and peer-reviewed, and that's enough. I don't want my team spending days and days and hours and hours writing documentation because they're all smart people, going back to the smart-person thing, and good code is good documentation.

So here's an example of one of our Puppet modules. You can zoom in with your eyes. It's not very interesting at all, but it's nicely indented. It's commented. We've got sensible parameters with defaults that are sensible as well, easily overridden, and lots of validations. They're easy to work with.

So the result? Well, happy people. We've got more change power in ITV than ever before, and safely. We've got VMs now in minutes rather than weeks, and initial environments in weeks rather than six months like it was previously. And I think weeks is still too slow, but we're working on that.

Performance has improved. Through that standardized monitoring that we have, we've got more eyes on the problem, more people looking at the stats, more people looking at the graph, more people caring. And I always say sunlight is the best disinfectant, so it really helps.

Same with reliability as well. It's improved. The teams suffer when stuff goes wrong because they own the whole thing end to end, so they want to make it better.

So Gene asked me, what do I need help with? So I'm going to say finding talent. It's an incredibly difficult time at the moment. There's very low supply, and I think actually I went into management at the wrong time because this is my thing. Incredibly low supply and incredibly high demand.

So the question I'll pose, I suppose, is does the community need to do something about this? Do we need to start growing our own DevOps or platform engineers? Do we need a DevOps academy to build the next generation? Because the supply's not coming from anywhere else.

So that's my question out to the audience. Is that something we should be thinking about?

Thank you very much.

Excellent. Thanks.