Breaking the 2 Pizza Paradox with Platform Applications
In my experience many large enterprises would love the adoption of DevOps to be as simple as bringing Development closer to Operations. In practice they need to consider many development teams, multiple suppliers, multiple service providers, not to mention multiple business divisions.
I will describe my experiences of implementing Continuous Delivery in large enterprises with heterogeneous technology stacks and I will share my belief that Platform Applications will be the savior of enterprise DevOps.
Mark Rendell
Hi, everyone. My name is Mark Rendell, also known as Marcos for reasons of disambiguation, and I'm going to talk about treating your platform as an application.
I work for Accenture, and I really do have the absolute pleasure of running a team of around 180 people who are heavily into DevOps. We work on tens of engagements, which means hundreds of environments, thousands of Git repos, tens of thousands of Jenkins jobs and monitoring checks. More about this later when I start talking about what the team actually do and putting the ideas into practice.
I'm going to start briefly with what I think we're actually trying to achieve with DevOps. The simplest way I can think about our objective is as follows: someone has an idea, they need to get it implemented and in front of customers as soon as possible, so that if the idea is good, we start getting some value for the business. And if not, well, at least we learned quickly, and we got fast feedback.
Now, if that change involves an IT system, then we're interested, with DevOps, in helping make this as easy as possible.
I think it's earned indulgence, if you're presenting about DevOps, to just chuck another definition in the mix. So here's mine: we value enhancing IT systems to meet the business's needs over IT operating efficiency.
I'm not saying we aren't at all conscious of cost. Of course we are. But we need to be very upfront that DevOps is not about replacing the IT folk with automation and building some kind of factory of robots. It does seem to have been a pretty popular idea lately, at least in the UK, that robots are coming for our jobs.
Well, no. DevOps is about making IT within an organization, and hence the business, much better.
So why might improving throughput be so hard?
We all know how to draw DevOps, right? Draw two circles and you're halfway there. I've drawn a few typical enterprise systems on here, an ERP system, a CRM, standard stuff. Both teams interact with these same systems in different ways, and neither team interacts well with each other.
Is this really what DevOps looks like in an enterprise? Actually, no. Absolutely not. This is nothing at all like what it looks like. I've never seen anything this simple. This would be lovely.
This. This is a far more typical example, unfortunately: some kind of nightmare of teams everywhere. This was a real example from somewhere I worked a few years back, and I drew the slide a couple of weeks ago, and I still keep on remembering more teams and having to draw them on.
Far from one development organization, there were at least five teams developing things, some working on the same code, just targeting different releases. And far from one IT operations team, there was a server team, there was a storage team, there were DBAs, there were sysadmins. Actually, there was one team for Windows sysadmins and one for Linux. Don't mix them up.
And I believe this is far closer to the organizational structure that many enterprises are going to have to consider when looking to increase throughput.
So why don't we just do this? Build a massive team. Well, I mean, obviously, this is just unrealistic, right? Inevitably, a team of this scale will be unmanageable, and you'd get some kind of organizational muscle memory just basically subdividing and going back to exactly how things were.
Besides which, it's not only giant teams: it turns out large teams are generally considered pretty bad. Jeff Bezos from Amazon is reported to have recommended that a good team can be fed with two pizzas. That's a good team size. And when I bring this up, there's always a pretty good discussion about how many people that actually means. And there's always going to be someone that reckons they could eat two pizzas themselves. Actually, I think it's intended to mean six to eight, so no more than about eight people.
So how, in a big enterprise, do you create end-to-end teams to deliver high-throughput, complex IT without the side effect of large teams? And this is what I'm calling the two-pizza paradox.
Einstein is reported to have said that if he had to solve a problem and he had an hour, he'd spend 55 minutes thinking about the problem and only five minutes thinking about the solution. Well, I'm not claiming to be Einstein, and we don't have an hour, so let's just spend a couple of minutes just visualizing the problem space.
Like I say, we've got to get from idea to value, if the idea has any, as soon as possible.
Now, this is a pretty simple abstraction of the end-to-end software lifecycle. I'm not saying waterfall or agile, but these concerns are going to happen. And let's not forget that we've got a lot of systems to deal with as well, that may all be impacted by the change. And this is definitely a very short list for an enterprise.
Now, traditionally, what do we do with our people? Especially in IT operations, we group them by skill set. So let's put the database experts over there. Let's group the systems admins over there. Don't mix up Windows and Linux. Let's put the Java developers over there. Let's put the testers over there.
It's actually a lot more like we're trying to organize a zoo. And really, a zoo, when you design a zoo, you're not thinking about trying to get the animals to collaborate towards some kind of common goal. No, you're just trying to stop them from eating each other, and worse, the animals eating people.
The zoo metaphor, I like it, and it goes even further for me because it turns out that zoos aren't necessarily a great environment for nature to thrive in. Whereas if you think about a rainforest, you've got this incredible autonomous ecosystem filled with these symbiotic relationships. Zoos just take a lot of money and micromanagement just to keep running.
So let's think back to the traditional teams. They're literally scattered all over the process that we're trying to optimize. Haven't even tried to draw the lines of communication. Conway would have a field day with this.
So what can we do? Well, I think we need to rethink the lines of division, and we need to go back to prioritizing the products. These are the things that we need to optimize for, and these are the things that we need high throughput for, end-to-end.
So we can create end-to-end teams to develop and operate and be accountable for the products. Each product now just has one team primarily concerned with its welfare, and hence the ability to evolve it as fast as the business needs. And to keep team sizes down, we may need to decompose products, and I guess a logical conclusion of this is microservices, which is pretty trendy.
I think I need another DevOps definition.
DevOps could mean organizing for throughput, forming teams that both develop and operate products end-to-end. So even just reinterpreting the name, I think, puts us on a pretty good path.
This approach has got a lot of promise for applications. But what about these guys?
These are typical operations concerns, very technical. What do we do with them? We could try embedding them into the teams, but there's a side effect there that the teams are getting bigger again. And if we do that, it's going to lead to duplication and potentially unnecessary complexity and fragmentation.
So let's recognize the platform as a first-class product. It will also benefit from an end-to-end team and ownership, accountability, and potentially we've got a new approach, a new mindset, perhaps, for something that's traditionally been pretty hard to define the ownership for, shared by lots of teams. And I think if we can tackle this and solve this, then we really are probably getting to one of the biggest constraints against throughput.
So what might a platform application look like? What do I mean? Essentially, it's everything that you need on which everything else can run. So we could call everything else the business applications. I think, platform-centrically, I like to think of them as the guest applications on the platform.
But what does it need to do to provide this service?
Well, it's an exercise of abstracting away the unnecessary detail, the unnecessary concerns, and encapsulating complexity. We need something for the hardware management. We need to turn the physical into logical. And we need a programmable API for dealing with this. We need to be able to create things with automation. And, of course, if you're using cloud, and by that I mean IaaS, Infrastructure as a Service, then you're abstracted from that already. You're just getting that as a service to you, which I think is a very good way of doing things.
The next thing we have to deal with is build some logic into this platform to basically deal with uncertainty. We've got uncertainty coming up from the bottom, and that's flaky infrastructure, just the nature of cloud and infrastructure being fragile, so needing to compensate and create high availability and auto-failover. And we've got unpredictability coming down from the top as well. So an unpredictable load, an unpredictable number of business applications, an unpredictable demand profile, and potentially the need for things like auto-scaling. So we need to build logic into this application.
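As a rough illustration of the kind of logic this implies, here is a minimal Python sketch of an auto-scaling decision. The function name, the capacity figure, and the minimum and maximum bounds are all hypothetical and not from the talk. The floor of two instances stands in for the high-availability concern, and the ceiling guards against unbounded growth.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      minimum: int = 2,
                      maximum: int = 20) -> int:
    """Return how many instances to run for the observed load.

    All parameters are illustrative assumptions:
    - capacity_per_instance: load one instance can serve comfortably
    - minimum: kept at 2 or more so that losing one instance is survivable
    - maximum: upper bound to cap cost
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp the computed value between the floor and the ceiling.
    return max(minimum, min(maximum, needed))
```

A real platform would also smooth the load signal over time so that it does not scale up and down on every spike.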
We need to handle whatever installations we need. If you're using something like Docker, then that's nice. We're building some of those dependencies into the business application packages, into the containers. But still, we probably need to think about installing something and configuring the OS.
Next, we need logical environment separation. What good is a platform if it's not easy to get environments either through it or in it? And, of course, here we're talking about things like self-service, the ability to easily create them and recreate them on demand.
I think that if we're going to create a platform, it needs to be pretty easy to use, and we need to basically pride ourselves on it being easy for business applications to get developed and to be hosted on the platform. So we need to have a deployment architecture, and I think that's a concern of the platform, to make it as easy as possible just to work with.
And finally, just like any other application, of course, we need to think about security, and we need at least a role-based access model for handling who can do what with this incredibly powerful application that we've got.
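To make the role-based access idea concrete, here is a minimal Python sketch. The role names and actions are hypothetical examples chosen for illustration. The talk only says that the platform needs at least a role-based model for who can do what.

```python
from enum import Enum


class Action(Enum):
    DEPLOY_APP = "deploy_app"
    CREATE_ENVIRONMENT = "create_environment"
    DESTROY_ENVIRONMENT = "destroy_environment"
    UPGRADE_PLATFORM = "upgrade_platform"


# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "app_developer": {Action.DEPLOY_APP, Action.CREATE_ENVIRONMENT},
    "platform_engineer": set(Action),  # every action
    "auditor": set(),                  # read-only, no actions here
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True only if the role is known and explicitly grants the action."""
    return action in ROLE_PERMISSIONS.get(role, set())
```

Unknown roles get an empty permission set, so the default is to deny.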
One of the nice things about defining the platform this way is suddenly it's got a lot more in common with the other applications. So now all the teams are actually more similar. They're all developing and supporting and being accountable for, end-to-end, their product. So they're essentially doing the same things: controlling scope, developing, testing, managing dependencies, releasing, ideally doing continuous delivery with pipelines.
So could you use a public PaaS? Well, yeah, of course you could. If you can find one with the features and the terms and conditions and the commercial constructs that you're happy with, then absolutely go. It's absolutely a great way of doing this, and that's what they are doing. Personally, I haven't seen that many enterprises doing this. Increasingly SaaS, increasingly IaaS, fantastic. But public PaaS overall, not too much.
And I think the important question really is: is your platform part of your business differentiation? And if it is, clearly you're going to have to build it. And as a believer in platforms, I do think it's worth self-managing.
Next thing: build or buy. This is just like any business application. For example, a CRM system. You can build it or you can buy it off the shelf. I've spent many years battling COTS products, trying to get them to do automated builds and automated deployments. So I've got a bit of a learned aversion to COTS products. But that said, we did build a platform for a UK public service using Community Cloud Foundry, and it was a great experience.
So if we explore the idea of having our platform like this a little further, we think about some of the things we're going to do with a good platform.
First, we need to engineer it like any other application. This means building continuous delivery pipelines. The normal rules apply. Start with everything in version control. Have something to automate your CI. Do some static code analysis. Write some unit tests. Deploy it. Do some runtime testing. Of course, we're kind of used to this with platforms because we call it monitoring, but get as much as possible into that, checking that your platform is doing what you expect.
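The pipeline steps listed above can be sketched as an ordered list of stages that stops at the first failure. This is a minimal Python illustration and not how any particular CI tool works. The stage names follow the talk, and the check functions are placeholders you would replace with real commands.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, check in stages:
        if not check():
            break
        passed.append(name)
    return passed


# Placeholder checks; each would call out to real tooling.
example_stages: List[Stage] = [
    ("static analysis", lambda: True),
    ("unit tests", lambda: True),
    ("deploy", lambda: True),
    ("runtime tests", lambda: True),
]
```

The same shape applies whether the thing being built is a business application or the platform itself.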
And it needs to be recreatable from scratch, and you need to do it regularly, right? So to use Martin Fowler's Phoenix metaphor, we need to do this regularly. And we kind of use it like a verb in my team, so we're like, "When did we last Phoenix that platform instance?" Hopefully, we're doing it every night if it's non-production. And this really should mean everything: networks, servers, those third-party installations. And getting the application back on there is kind of popular if you destroy the platform overnight.
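The question "When did we last Phoenix that platform instance?" can be turned into an automated check. Here is a minimal Python sketch, assuming a hypothetical inventory of platform instances that records when each was last rebuilt from scratch. The `PlatformInstance` type and the 24-hour threshold are illustrative assumptions based on the nightly rebuild mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class PlatformInstance:
    name: str
    production: bool
    last_rebuilt: datetime  # when it was last recreated from scratch


def overdue_for_phoenix(instances: List[PlatformInstance],
                        now: datetime,
                        max_age: timedelta = timedelta(hours=24)) -> List[str]:
    """Return names of non-production instances not rebuilt within max_age."""
    return [
        instance.name
        for instance in instances
        if not instance.production and now - instance.last_rebuilt > max_age
    ]
```

Production instances are skipped here only because the talk sets the nightly expectation for non-production.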
So just like any application, the platform needs an interface to be defined to let other applications know how to work with it. These quite often are called platform opinions. And in the case of the platform, the interface tells the business applications what they need to do to be hosted.
Typical opinions might be how and where to log, what package format you need to create to get it deployed, how you're going to get the configuration from your environment when you deploy an application. The Twelve-Factor App is a brilliant example of this. I think it's very clearly written and very accessible, very easy to find as well.
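Two of these opinions, where to log and how to get configuration, can be sketched from the guest application's side. This minimal Python example follows the Twelve-Factor App guidance of reading config from environment variables and writing logs to standard output. The setting names `DATABASE_URL` and `PORT` and the logger name are hypothetical.

```python
import logging
import os
import sys
from typing import Dict, Mapping

# Hypothetical settings the platform promises to inject at deploy time.
REQUIRED_SETTINGS = ("DATABASE_URL", "PORT")


def load_config(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read required settings from the environment and fail fast if any are missing."""
    missing = [key for key in REQUIRED_SETTINGS if key not in environ]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {key: environ[key] for key in REQUIRED_SETTINGS}


def configure_logging() -> logging.Logger:
    """Log to stdout and leave collection and routing to the platform."""
    logger = logging.getLogger("guest_app")
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        logger.addHandler(logging.StreamHandler(sys.stdout))
    logger.setLevel(logging.INFO)
    return logger
```

Failing fast on missing settings means a guest application that ignores the opinion is caught at deploy time and not later at runtime.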
I think the adage of good boundaries make good neighbors really fits here. We've got a much easier premise for a good relationship between the platform and the business applications than some of the relationships we've seen in the past.
So if we've defined the set of opinions, we've got to recognize that change is inevitable, and we need a clear approach for releasing it. And semantic versioning, or SemVer, I think is a really good standard for this. So if you've not heard of it before, basically it means that the numbers denote compatibility relative to the previous release.
You typically have three levels of numbering: major, minor, and patch. The most minor one, the patch number, means minimal testing. It's pretty much working the same as it did before. The middle one, the minor number, might mean that you're definitely going to need to test something and manage that integration before you release. A major means application rework is probably required. I think it's really great to have a very clear, standardized, visible way of demonstrating that something in the underlying platform or infrastructure has changed.
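As a small illustration, here is a Python sketch that compares two platform versions and reports which of the three levels changed, so a guest application team can judge how much testing an upgrade needs. It assumes plain `MAJOR.MINOR.PATCH` strings and does not handle pre-release or build metadata suffixes. The function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_impact(current: str, new: str) -> str:
    """Classify an upgrade by the most significant number that changed.

    Returns 'none' when the new version is not newer than the current one.
    """
    cur, nxt = parse_version(current), parse_version(new)
    if nxt <= cur:
        return "none"
    if nxt[0] != cur[0]:
        return "major"  # application rework is probably required
    if nxt[1] != cur[1]:
        return "minor"  # test the integration before release
    return "patch"      # minimal testing
```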
So once we recognize that our platform changes, we need to support the testing of integrations with new versions of the platform and the app.
This is an example that we were working on. They were kind of like microservices on the right, so we didn't really have to think about integration testing them. But we did have to think about integrating them with a new version of the platform, which is on the left.
So we started building our platform development instance of the platform, and we built pipelines. And once we were confident that we had a platform that was going to make people productive, we could build an instance of that platform application for people to develop and test their applications in.
By then the versions had probably moved on quite substantially, but when we were all happy with that, we could then build an instance for production, and potentially for performance testing and pre-production as well, and deploy into production.
Now, if we made a change, we had to recognize that we were going to need to agree and coordinate the integration of that change with the rest of the applications. So we then could upgrade the non-production platform application instance. And we couldn't release to production until we'd finished testing and were ready to make that transaction of upgrading the production platform and deploying the applications. And then we're back into the kind of dependency-free mode where the business applications could release continuously.
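The coordination rule described here, no production platform upgrade until the applications have been tested against the new version, can be sketched as a simple gate. This is a minimal Python illustration. The function names and the shape of the `tested_against` record are hypothetical.

```python
from typing import Dict, List


def blocking_apps(tested_against: Dict[str, str],
                  target_platform_version: str) -> List[str]:
    """Return the guest applications not yet tested against the target version.

    tested_against maps application name to the platform version it last
    passed integration testing on in non-production.
    """
    return sorted(
        app
        for app, version in tested_against.items()
        if version != target_platform_version
    )


def can_upgrade_production(tested_against: Dict[str, str],
                           target_platform_version: str) -> bool:
    """Allow the production upgrade only when no application is blocking it."""
    return not blocking_apps(tested_against, target_platform_version)
```

Once the gate passes and the upgrade is done, the applications go back to releasing independently.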
So in practice, I mentioned earlier that I run a team of people into DevOps. That wasn't very specific. What we consider ourselves to do is to basically operate like a very accommodating in-house PaaS provider.
So we develop and operate the tools and automated environments as a service, with a focus on agility and predictability of delivery. And I think this approach will work for other people. If they're not ready to use public PaaS, I think this idea is transferable.
By accommodating, I mean that within reason, we'll produce the platform to support whatever's required. So someone wants to use Hybris, we say yes. Java, yes. Adobe Experience Manager, yes. Some COTS product, yes. We adapt the platform as we need.
And enterprise scale, to me, looks a bit like this. I've got tens of engagements to support, each wanting a platform where they can release rapidly. So a fairly heterogeneous landscape, but then I still think not wildly more complicated than what many enterprises have to contend with.
So I need to keep things small and simple. We need to divide up the problem, and we need to produce a team that can own a platform for a particular platform instance end-to-end. They're accountable for it. They get their own pizzas. That's their concern. And they need to make sure that the platform is optimized for throughput and that they are maximizing the performance of the business applications that are running on it.
So we just scale up from there. Basically, we just keep repeating this pattern.
Sometimes we get into problems with a really big platform. It's pretty common. And we start getting to a point where we've got big teams running a big platform. People are not getting enough pizza. They're getting hungry, right? So we divide things up again.
Now, we're still exploring this, but what we mainly do is break up by groups of business applications. So one platform supports a set of similar business applications, and we break that out from another platform to keep the team sizes down.
So sharing over shared. We do have room for some shared services. For example, we have a tools platform that serves some engagements, but not all, and actually not the majority. And our emphasis is not on promoting shared things. It's actually on promoting the act of sharing ideas, techniques, and software, so tools and scripts, over trying to get everyone into a multi-tenancy solution to try and save costs or whatever, and having to enforce choices and constraints on people that they don't need by saying, "Use the standard service."
Starting over studying. I think, for me, one of the biggest keys to establishing continuous delivery when you're getting started with a new engagement is lowering the barrier to adoption of the tools and automation as far as possible. So if getting started takes a lengthy tool assessment process, procurement, installation, configuration, engagements will probably start to run away and be forging manual habits before you know it.
And we've tackled this by basically automating to the extreme, spinning up the tools and reference implementations. In fact, we've got it down to 10 minutes for creating all of this from nothing at all. And it's also pre-configured and usable for a particular technology.
We call this our DevOps platform, partly because, with a name like that, who wouldn't want to use it? And we are hoping to open source this fairly soon. Obviously, it's wrapping a load of open source tools, and we want to give the capability of creating that back.
And when we get started for a particular technology, for example, I don't know, Mule ESB, Node.js, Hybris, whatever, we have this concept of bootstraps. And actually, we do sometimes call them blueprints, but I do regret that name because blueprint sounds like a reference architecture or design, and this is not that. It's actually deploying a plugin onto the platform, or an implementation that's actually usable.
That means the ability to create environments, so the bottom corner, and example code, and pre-configured pipelines, and even example tests potentially, so that people can get started as soon as possible. And we really like using this.
Usually, if it doesn't exist for a particular technology, people will just be very keen to create it anyway. And the beauty of doing this is it's far easier for people to experience what good looks like rather than trying to read it off a page.
It's very key to us that we make everything as easy to reuse as possible. And no one is forced to use, for example, Git or Jenkins. I found the hard way, actually, that people are far more motivated if you empower them to make the right choices for themselves. And they're even more motivated if you give them this kind of vehicle for sharing, and they know that others are going to get the benefit from the things they're doing.
And that takes me back to the bootstraps that I mentioned. So there is a kind of effect of unofficial, or rather unenforced, standards that emerge. But we get to them by allowing people to continuously improve organically when they're building platform applications.
So my five takeaways. Thanks, Gene, for the pun on pizzas.
As follows: optimize for throughput, not efficiency. I do still hear people talking about DevOps and saying that we're going to save IT costs. And I think, as a movement, we need to stand up and point to the benefits to businesses and tell them not to look at trying to reduce the IT team size.
We need to build end-to-end teams. I think it is a really good interpretation of the word DevOps to say develop, operate, and be accountable for products.
Treat the platform as an application. I've said a lot about that.
The next one is kind of eating your own dog food. So implement continuous delivery for your platform and then make sure it's possible for everyone to do that with their business applications via your platform.
The last one: lower the barrier to adoption as much as possible. Defer decisions. Just try and change the problem and start small.
And this is where I'm looking for help from others. So please, let's have a chat.
Firstly, I know this is a little early for me to ask, but when we get this open source project out there, please do take a look, and please consider talking to us about it and collaborating on it. I do think it really has helped people to understand what continuous delivery and DevOps are about and experience them firsthand. And then it's very useful as a way of starting.
Secondly, if you're working in this way at all and you're using platform applications, then please be more vocal about it and share it with the community. For example, I'd like to see more people documenting the platform opinions. As I said, The Twelve-Factor App, which is actually for Heroku, that's really good, but let's see some more and see how they vary, and maybe talk about standards for writing them.
And documenting how you've decomposed big platforms into smaller logical components within the platform. Interested to know more about that.
And finally, just show off. If you're doing this, then don't take it for granted. I think a lot of people are. Spread the word.
So this takes me to the end. I hope I've inspired you to think a little more, a little differently about platform applications, and some of you will be interested to try this at home. Thanks a lot.