SAP’s DevOps Journey: From Building an App to Building a Cloud


London 2016


SAP has been using a DevOps & Continuous Delivery approach for building its web and mobile apps for several years, and is now building and running a global cloud at the scale needed to support the digital transformation needs of its customers. This talk recaps the story of how SAP originally adopted DevOps practices before moving on to describe how the Cloud Infrastructure Services team is building and operating its 3rd generation cloud automation system using microservices, containers and open-source software.

Darren talks about the important tools and technologies and what is most important: the people.

SAP is the world leader in Enterprise Applications with over 300,000 customers in 190 countries. Nearly half of its 78,000 employees are in technology.

In 2010, SAP started to pivot to DevOps with a small project and has since grown to cloud scale.



Darren Hague

Good morning, everyone. I'm Darren Hague from SAP.

What I'm going to talk to you about today is SAP's DevOps journey: how we went from building a simple pilot application, learning how to do DevOps on that, and then moving on to where we are today, where we're running basically an entire cloud on our DevOps platform.

The tools and the technologies, you're going to hear all about those, and they're important. But I think what you've seen from other presentations as well is what's really important is the people. It's how you work with people.

So I guess a lot of you are thinking right now, "Who the hell's SAP?"

Well, we like to think we're the world leader in enterprise applications. We were founded 44 years ago, so we've been writing software for a while now. And our mission, what we really want to do, we want to help the world run better.

And it blows my mind every time I put one of these slides together. It's about once a year, I have to completely redo all of the numbers on the right-hand side. Basically, over the last 44 years, we've had double-digit growth pretty much every year.

So now we're at a stage where we've got over 300,000 customers, nearly 200 countries, almost 80,000 employees in 130 countries, and I think that's a similar scale to some of the other people you've heard from over the last couple of days.

What maybe makes us a little bit unique is that of those employees, about half of them, because we're a software company, are in technology. So we've got a big technology base. Quite a few users of our software as well, over 110 million at the last count.

And what's really nice is we used to talk about how much of the world's beer and how much of the world's chocolate was produced by our customers. We're getting the bar set a bit higher now. So our customers produce nearly 80% of the world's food and over 80% of the world's medical devices.

Over three-quarters of the entire world's transaction revenue touches an SAP system at some point. So we're really well on our way to helping the world run better.

So from that grand scale, down to myself. I'm not even going to show you an org chart because I'm right at the bottom of it. I'm an architect, so I've got a bit of experience, but nobody reports to me.

I was part of a team a few years ago doing a lot of SAP's websites, including our developer portal. And then we did a pivot, which you'll hear about a bit later in the talk, and now I'm one of the architects on the cloud platform team, where we're producing, as I say, this cloud for SAP's whole portfolio to run on.

So I'm going to mostly be talking about a lot of web and Java-type technologies, but I'll touch a little bit on how this also can apply to SAP on-premise stuff.

As Gene said, this is the talk I gave at the FlowCon conference a few years ago. He asked me to put this slide up, I think because it had a bit of a cool title at the time. It shows that you can go from this lumbering dinosaur of a company to get the agility and the speed of a spaceship.

So back in dinosaur times, we've always thought that we were agile in my team. You can tell from the picture there, that's not quite true back in 2010. But there were some good things around. We had version control, and before I joined SAP, I'd worked on projects where we had to fight hard even for a version control system. Project managers didn't see the point.

We had good issue tracking in Jira, Bamboo for build automation, so we could take a bunch of code and produce the JARs, and we released once a month. So not too bad.

But if we were starting a new project in our department, then you needed to know when you were going to go live with a six-month lead time. There was no, "Hey, let's do a quick project and see how it works." You had to buy the hardware in, schedule someone to come and plug it into the data center, get a budget for the hardware, all the rest of it.

And we didn't really have any kind of culture of automated testing. So each release cycle, we had someone doing manual testing for a couple of weeks, and that takes a lot of time. And it's soul-destroying work, manually running the same tests every month to see if anything broke.

And I'm sure this next point is quite familiar to a lot of you as well: to release that code, we announce a weekend downtime because no one's using the website over the weekend. The ops guys come in. They don't mind working on a weekend, whatever.

So they take the machine down. They pick out the Word document that the developers have prepared with all of the sequence of steps on, all of the bits of configuration to do, and of course they're going to do that absolutely perfectly. They're clever human beings.

At the time, of course, we had the development organization over here, then the ops organization over here, and then the hardware guys over here. Each of them has their own different customers and different priorities. So even trying to get everyone lined up to do things at the right time was a challenge.

And as I mentioned, the actual configuration data, so what makes production different to QA, what servers each is talking to, that's not controlled. That's just in this Word document that's produced by the developers.

So yeah, we had some challenges there.

But as I say, we were agile. We were delivering every four weeks. As you can see from that, four weeks of each iteration, six weeks from initial request to development, QA, deployment. That's not a bad cadence, in theory.

In practice, it looks a bit like this. The first iteration's great. You do your four weeks of development, throw it over the wall to QA, they do their testing.

But hang on. While you're starting your next iteration of development, QA found some bugs. So actually, we found that of this four weeks of development, two weeks of that is spent chasing down and fixing bugs.

Then we get to the weekend where the bugs are gone, the ops guys are going to do the deployment, they're following the instructions. It's all deployed. Monday morning, hang on, something's not quite working right. Did someone put a typo in the configuration over there? And we need to figure this out.

So you have this purple bit there that's basically 48 hours of mass panic, working late, tracking down that bug, finally correcting it, and then everything's fine.

But suddenly you find yourself with a bunch of really tired developers who've got one and a half weeks to do four weeks of work. And that cycle repeats itself every month.

So what you're probably thinking now is, "Well, why would we consider DevOps?"

Well, it was forced upon us. We were between a rock and a hard place. We got to 2010 and were planning out the year's effort. We realized that, with 100 staff in our entire unit, that's roughly 20,000 person-days of effort available.

And then we looked at what the business wanted us to implement, and we figured out that was coming up to 60,000 person-days. And there's no way we could hire another 200 staff. Even if we did, they'd never get ramped up in enough time to deliver that kind of effort.

But the clue is in that tripling of effort. If we go back, we're wasting two-thirds of our effort anyway. So if we can somehow get that back, we can get on track.
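The arithmetic behind that "tripling" can be checked with the figures quoted in the talk. This is a small Python sketch, not anything SAP ran; the variable names are made up for illustration, and it assumes the "one and a half weeks" of useful development per four-week iteration mentioned earlier is representative.

```python
from fractions import Fraction

# Figures quoted in the talk.
staff_capacity_days = 20_000   # roughly 100 staff for a year
requested_days = 60_000        # what the business wanted implemented

# Assumption: 1.5 productive weeks out of each 4-week iteration.
iteration_weeks = Fraction(4)
productive_weeks = Fraction(3, 2)

demand_ratio = Fraction(requested_days, staff_capacity_days)
productive_share = productive_weeks / iteration_weeks
wasted_share = 1 - productive_share
speedup_if_waste_removed = 1 / productive_share

print(f"demand is {demand_ratio}x capacity")
print(f"wasted share of each iteration: {float(wasted_share):.1%}")
print(f"throughput gain if the waste is removed: {float(speedup_if_waste_removed):.2f}x")
```

On those numbers a little over 60% of each iteration is lost, close to the "two-thirds" quoted, and recovering it would multiply throughput by about 2.7, which is in the same region as the threefold gap between demand and capacity.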

So you're never going to get any of this done without people giving you cover, and that's probably our biggest piece of good fortune that really enabled this.

My boss, the chief architect, he'd read this book, Continuous Delivery by Jez Humble and Dave Farley. Highly recommended. It essentially boils down to two things: you automate everything, and you version control everything.

And the spark of genius that my boss had is he decided for everyone who reported to him, part of their bonus would be dependent on whether they could demonstrate that they'd read this book. So that created a very strong incentive for people to spend a couple of days just sitting down, reading through, getting the lessons.

And rather than risking everything, we chose one small project to do. So we started with the SAP Identity Service, which is a new product I'll describe on the next slide.

It's not just him being able to do that, though, because you need a bit of money and a bit of time to put these things into place. So he managed to convince his boss as well, and his boss was in a position where he trusted what Marcus, my boss, could do. And he agreed that of all of the project budget that he controlled, without necessarily telling the rest of the business what we were doing, he would take 10% of that and devote that to our DevOps initiative.

So that meant that we had this project. Very briefly, it's a project where at that time, if you went to SAP's website, we're all about one version of the truth: a big system with all the information in it, controlling everything. You just go to one place.

And yet, because of all of our different business units all wanting to have their different websites, if you wanted to actually go to SAP's website and go to a developer portal or look at something for sales, you had to have, at one point, I think, 12 different accounts with our various bits of websites to be able to log in and use it all. Kind of embarrassing for a big ERP company.

So this project was really to put a product in place that allowed a single account and then a single sign-on to work between all these websites. So they could all work in as they wanted to, but there was just one identity for the end user.

Now, not in the initial iteration of that project, but eventually this spun off as its own cloud product within SAP that's now part of what we sell as the HANA Cloud Platform. So we have millions of users not only using it to access our website, but also using what we call SAP Cloud Identity product to access their own web apps as part of their enterprise solution.

So we've gone from after a year or two, when we went live with the big version of it, where this needs to stay up or essentially SAP's website is down, to where we are now, where this needs to stay up or SAP's website is down and a big load of our customers suddenly can't use their cloud software.

So yeah, from small acorns, massive oak trees, and big, reliable, scalable oak trees can grow.

But as I said, it's about the people. So the key thing that we did was we created a cross-functional team. We put everyone together. We adopted Agile. Well, we adopted what we thought was Agile from reading articles on the internet at the time.

So we had a product owner, scrum master, the blend of skills that you see there. So everything we needed within one team to create the software and deploy it.

And the challenge we had as well, being SAP, this is a geographically distributed team, from various acquisitions over the years of small companies. Our team was spread over the world. So this one project team, we were in Germany, Bulgaria, UK, Russia, and Israel.

The advantage is that we're all in more or less the same time zone. We had to do an awful lot over Lync and Skype, but other than that, not too bad.

So what's the first thing you need to do when you're setting up a DevOps initiative?

Well, if you can't check that everything is going to work, you don't have any confidence to deploy. So we went with a tool called Cucumber. It's a very high-level tool. It's not even that accurate to call it a technology. It's just more of a structured language for describing the various use cases in your business.

So in this case, there's a simple scenario. It uses this given, when, then structure, and it's completely done in English or whatever other language you want to choose. And it's written by a combination of the QA team and the product owner sitting together and deciding exactly what it is that this thing should do.

And the idea is that the product owner can then agree it with the business. There are many claims throughout technology of a language that the business can read, which the business actually views as way too technical. What we found with Cucumber is that it produces specifications which the business views as technical but understandable, and which the development team views as human language.

That's a pretty good compromise that works quite well.

So where the magic comes with Cucumber is this Gherkin structure, this given, when, then. Each line of that can map to a piece of Java code or a piece of Ruby code.

Now, that doesn't mean that your system you're testing has to be written in Java or Ruby. This is just the language that your tests are written in. So essentially, that drills down to the actual technical nitty-gritty of what you're testing.

In our case, we were using Selenium quite a bit underneath the Java to drive web browsers, so we could actually test clicking through and things did what they were meant to do.

But this can also apply to any on-premise system. It's particularly easy if there's a web interface; you can use Selenium. But it's not too difficult in Java or Ruby to do an adapter to control a system through other means.

So for on-premise SAP, for example, you might be calling remote function calls, or there's an automation tool for triggering the user interface and that sort of thing. So it's challenging for sure, but certainly not impossible to use this stuff for doing on-premise ERP.

And essentially what that gives you is an executable specification, which is quite a powerful thing to have.
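To make the "each line maps to a piece of code" idea concrete, here is a toy sketch in Python. The team used Cucumber with Java and Selenium; this is not Cucumber and does not use its API. The scenario text, the `step` decorator and the step functions are all hypothetical, and the "system under test" is just a dictionary.

```python
import re

STEPS = []  # (compiled pattern, function) pairs, filled in by @step


def step(pattern):
    """Register a function as the implementation of one line of the scenario."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"a registered user (\w+)")
def given_user(ctx, name):
    ctx["users"] = {name}


@step(r"(\w+) logs in")
def when_login(ctx, name):
    ctx["logged_in"] = name in ctx["users"]


@step(r"the login succeeds")
def then_success(ctx):
    assert ctx["logged_in"], "expected a successful login"


SCENARIO = """
Given a registered user alice
When alice logs in
Then the login succeeds
"""


def run_scenario(text):
    """Run each line against the first matching step; return the step names used."""
    ctx, executed = {}, []
    for line in text.strip().splitlines():
        _keyword, _, body = line.strip().partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(body)
            if match:
                func(ctx, *match.groups())
                executed.append(func.__name__)
                break
        else:
            raise LookupError(f"no step definition for: {line!r}")
    return executed


print(run_scenario(SCENARIO))
```

The plain-English scenario stays readable for the product owner, while the step functions are where a real suite would drive a browser or call the system through some other adapter.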

So given that, if it still takes you ages to run your tests, it's still an issue. And what we found is that trying to reduce that cycle time really made a difference to productivity.

So in other words, if a developer commits a change and the build takes three hours to run, to them it's almost half a day before they know whether that change even passed the test suite. And by the time that half a day has gone on, they're working on a different task completely. So the task switch overhead destroys it.

So we came up essentially with some benchmarks that said, "Well, for an individual developer running the tests just on their own developer machine, that to run all of the tests and build all of the code should always take about 15 minutes."

Any more than that, and the developer is going to go off and do something else. In 15 minutes, they can go and grab a coffee, have a quick look at Facebook, and then carry on with their work.

And then within 40 minutes, the benchmark is that the change should have gone on to the main integration system, run the integration tests, and been deployed to QA. So if it caused some sort of disaster at the integration test stage, at least they know within about the time it takes to go to lunch and come back.

Now, what's key to this is parallelization, and especially if you are going to be tackling anything like an on-premise ERP system. We got to a point fairly quickly where we had over 1,000 scenarios with 10,000 test steps, and each of those may only take milliseconds to run. But that soon adds up.

So by running our QA system in the cloud, we were able to just add nodes as needed to keep that running quickly. And the lesson here for on-premise systems is, even if your production system is on premise, it's worth considering putting your QA system in the cloud.

Because there are two big benefits to that. One is that every night when everyone goes home, you can just turn it off and it's not costing you anything.

The other benefit is when you're running your automated tests, you can have as many tests as you like that can take quite a long time, so long as each individual test is relatively short, and you can just scale out elastically in the cloud.

So in other words, you start your test suite running, you scale out, you're running this huge, great system in the cloud that runs your tests really quickly, and then you scale it right back down again.

So it's only costing you money while you're running the test, and that money it's costing you is paying itself back and giving you really rapid feedback.
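The scale-out idea can be sketched as splitting the scenarios into shards and running the shards side by side. In this minimal Python sketch, threads on one machine stand in for cloud nodes; `shard`, `run_shard` and `run_suite` are hypothetical names, and the scenarios are trivial callables rather than real tests.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(items, node_count):
    """Split items round-robin into node_count roughly equal shards."""
    return [items[i::node_count] for i in range(node_count)]


def run_shard(scenarios):
    """Run every (name, check) pair in one shard; map name to pass/fail."""
    return {name: check() for name, check in scenarios}


def run_suite(scenarios, node_count):
    """Run all shards concurrently and merge their results."""
    results = {}
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        for partial in pool.map(run_shard, shard(scenarios, node_count)):
            results.update(partial)
    return results


# 1,000 stand-in scenarios; every 250th one is made to fail for illustration.
scenarios = [(f"scenario-{n}", lambda n=n: n % 250 != 0) for n in range(1, 1001)]
results = run_suite(scenarios, node_count=8)
failed = [name for name, passed in results.items() if not passed]
print(f"{len(results)} scenarios run, {len(failed)} failed")
```

Because each scenario is short and independent, adding nodes shortens the wall-clock time without changing the tests themselves, which is what makes an elastic QA environment attractive.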

So it's something people don't often talk about in the cloud. It's mostly, "Oh, we're just going to lift and shift. We're going to run our servers. They run it for us." But it enables a different kind of working that's really powerful.

But it's not about the technology, it's about the people.

So it's one thing to say, "Yeah, we need 1,000 automated tests." It's another thing to say to a group of developers that don't have a culture of automated testing, "Hey, guys, here's what you're going to be doing for the next few months." They're suddenly faced with not really writing a lot of functionality. They're writing tests instead.

So we kept it as simple as possible. We said, "Well, we're not going to try and explore every corner of the system. Start off by saying just what the system should do, checking that it does that, and as long as it works as it should in that case, we're not going to worry about every nook and cranny, every exception case. If you can think of an exception case and write a test for it, great, that helps, but it's not essential."

So then you carry on with the manual QA testing at that stage. That's still taking two weeks. But the rule in the team is every time QA finds a bug, the developers are responsible with QA for writing a test to expose that bug.

What that means is as the months roll on, QA guys are doing less and less regression testing. All of those regressions are being caught automatically.

Even so, you may still need to give a little bit of encouragement to the team just to get them to really stay on track and focus on writing the tests. But the point is that once you've got a good test suite and everything is passing, now you've got the confidence that system is running as it should, and you can then think about deploying it automatically.

So that's the next step.

Now in our case, we used Chef. You can use Puppet, Ansible, Salt, all good tools. The reason we chose Chef is it was the first tool we tried. It worked for us. There was no particular reason to go on and try anything else.

What it gave us is the ability to take all of this work that the ops guys were doing on a weekend and just script it. So we could get rid of the Word documents the developers put together. Essentially, it just became some Ruby code instead. That got checked into source control. It gets reviewed by a couple of people to make sure it's not stupid.

And one of the nice things about Chef as well is if you get your recipes right, you have idempotence, which means that if you run the same thing over and over, you still end up in the same state. So you're not running something that's saying adding one, adding one, adding one, and you end up with three. You're doing something that says set it to one, set it to one, set it to one, ends up with one every time, repeatably.

So that's the thing that gives you real consistency in deployment.
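The "adding one" versus "set it to one" contrast is easy to show in a few lines. This is a plain Python illustration of idempotence, not Chef code; the `workers` setting and the function names are invented for the example.

```python
def add_one(state):
    """Not idempotent: every run changes the result."""
    state["workers"] = state.get("workers", 0) + 1


def set_to_one(state):
    """Idempotent: every run converges on the same desired state."""
    state["workers"] = 1


def converge(action, runs=3):
    """Apply the same action repeatedly to a fresh state, as a re-run deployment would."""
    state = {}
    for _ in range(runs):
        action(state)
    return state


print(converge(add_one))     # {'workers': 3}
print(converge(set_to_one))  # {'workers': 1}
```

Describing the desired end state, rather than the steps to get there, is what lets the same script be run again safely after a partial failure.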

And then we had this concept of blue/green deployment as well. So we had an alternate pool of servers. Any new release got deployed to blue or to green, depending on whether it was an odd or an even month.

And what that did, that took away that big purple exclamation point in the flow diagram, the weekend panic. Because even if there was a mistake and production wasn't quite working properly, all we actually had to do was to tell the load balancer to put stuff back to last month's work.

So the limit of an emergency was going, click, back to blue. No one's up all night. We've got a problem, fine, but we come back to work in the morning. We work the problem like normal, rational people because last month's system is live and still running.

That really makes a huge difference. You can solve problems with a brain instead of with your animal brain awake at 2:00 a.m.
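The blue/green switch described above can be modelled in a few lines. This is a hedged Python sketch of the idea only: `BlueGreenRouter` and the version labels are hypothetical, and a real setup would be reconfiguring a load balancer rather than flipping a field.

```python
class BlueGreenRouter:
    """Two server pools; only one receives live traffic at a time."""

    def __init__(self, live_version):
        self.pools = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def release(self, version):
        """Deploy to the idle pool, then point traffic at it."""
        target = self.idle
        self.pools[target] = version
        self.live = target
        return target

    def rollback(self):
        """Point traffic back at the other pool, which still holds the last release."""
        self.live = self.idle
        return self.live

    def serving(self):
        return self.pools[self.live]


router = BlueGreenRouter(live_version="last-month")
router.release("this-month")   # goes to green, traffic follows
print(router.serving())        # this-month
router.rollback()              # click, back to blue
print(router.serving())        # last-month
```

The previous release is never touched during a deployment, so rolling back is a routing change rather than a redeployment.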

So along with the automated testing and the automated deployment, now we've got the confidence to do an automatic deployment.

But it's not about the tools. It's about the people.

So to get this working, you have to have the right skills. People have to learn Chef. Ops guys have to learn enough coding in Ruby to be able to write recipes along with the developers.

There's actually a really good opportunity there for developers to pair with ops guys to get this done, to get that cross-fertilization of what it is to do coding.

So you have these recipes like install database and so on. Any configuration values, you're not storing in Excel sheets or Word documents anymore. They're actually version controlled. So you might have a structured file in some format like JSON or YAML, and that's in your Git system or whatever. You've got an audit trail.

Things are put in correctly, they're reviewed, and they automatically drive the deployment. So there's no room for human error.
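As an illustration of configuration living in a structured, version-controlled file, here is a minimal Python sketch. The JSON layout, the key names and the host names are all assumptions made up for this example; the talk does not describe the actual file format beyond "something like JSON or YAML".

```python
import json

# Hypothetical contents of a config file kept in Git alongside the recipes.
CONFIG_JSON = """
{
  "defaults": {"db_port": 5432, "app_servers": 1},
  "environments": {
    "qa": {"db_host": "db.qa.example.internal"},
    "production": {"db_host": "db.prod.example.internal", "app_servers": 3}
  }
}
"""

REQUIRED_KEYS = {"db_host", "db_port", "app_servers"}


def settings_for(environment, raw=CONFIG_JSON):
    """Merge defaults with one environment's overrides and check completeness."""
    config = json.loads(raw)
    merged = {**config["defaults"], **config["environments"][environment]}
    missing = REQUIRED_KEYS - merged.keys()
    if missing:
        raise ValueError(f"{environment}: missing settings {sorted(missing)}")
    return merged


print(settings_for("qa"))
print(settings_for("production"))
```

What makes production different from QA is then a reviewable diff in the repository rather than a paragraph in a document.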

And a final bonus point, it took us a while to get around to it, but it does add an extra layer of quality to the whole thing, is actually these recipes and cookbooks that you've done in Ruby and Chef, you can do unit tests for those as well. So you can actually check the quality of the DevOps code.

So that's how we did the first cycle.

But there's a real need in this to just keep on getting better and better and better. Other people have talked about this as well, the virtuous circle that you get into. You can go faster and faster.

So this is our real journey over the last few years. In 2010, this is our pilot project doing the ID service. Everyone's doing DevOps within the team. We're building the tools within the team. And we're using VMware vSphere. We're creating a few VMs.

But it's still at the level where you're having to email someone in the infrastructure team to say, "Please create me a VM," and you'll get it in a day or so. Not too bad. A lot better than the six months, but maybe not as quick as it could be.

So we ended up writing a tool to automate some of these things we were doing. This tool was called Cocktail, and it's just a simple set of scripts to do things like to automatically run Chef, to automatically call a vSphere API to deploy a VM, that sort of thing.

So at this point, with our first pilot iteration, we took that theoretical four-week cycle, and we made it real. We eliminated those waste points. So we were genuinely doing four weeks of development every month.

Of course, it's about the people. You getting the message yet?

We thought we were Agile. It turns out that getting a bunch of people together in the morning for a phone conversation isn't necessarily what you'd call a scrum. What it can be is one or two people have particular issues, and you spend half an hour talking about those particular issues rather than 15 minutes with everyone saying what they did yesterday, what they're going to do tomorrow, and if they have any impediments.

So to us, what made a big difference there was actually getting an Agile coaching team in. We had a guy embedded with the team, essentially, and it started with a week of training, the whole team going to a room, games with Post-its, balls, all of that stuff.

And it's fun. There's lots of bright colors, there's games, but it's learning fun and it's deep learning fun. It's not just sort of reading how to do things and then trying to do it parrot fashion. You get a deep appreciation of how waste builds up and how teams can work together and how you can solve things.

In our case, we also used it for how to work well as a distributed team. So although almost everybody in the team gathered in Berlin to do this training, there was one guy who had a visa issue who couldn't make it from the UK, so he was stuck back in London.

And that gave us a great opportunity to practice how to do Agile in a distributed team because for all of that course, that guy was Skyped in. He was on an iPad on the table as a virtual team member, and through some of the exercises, we made him the product owner. In some of the other exercises, we did pairing with him.

So it really got us into working properly as a distributed team with Agile.

And the coaches, they made us grow up. The coaches weren't afraid to say, "You sound like you're behaving a bit like children here," which is another common theme, I think, among some development teams.

So yeah, we really kind of said, "Yeah, it does look a bit like that. We are squabbling a bit over this minor point. We're intelligent adults. Let's behave like that."

Even so, we did have a point where there were a few personality clashes, and that was nobody's fault. People work in different ways. People have different styles. And so we did a little bit of shuffling around and we got a big productivity boost just by putting people in the right teams that could work well together.

It really is about the people.

So as a result of all of that, we've now got a team culture of continuous improvement. We're doing retrospectives at the end of each sprint, picking the top three things to work on. And that's not focused on what we're going to build, it's about how we do it.

So in the next year, we'd done really well on this project. We'd spun off this other bunch of guys into a subteam to just work on the DevOps stuff while the main team carried on doing the ID service.

And they produced quite a cool user interface to this Cocktail tool, and it developed into its own kind of web-based platform that talks to VMware on the back end and runs Chef. And we actually got it to a point where now everyone in the IT department can come to this platform. They arrange for some VMs to be deployed. Their project is in this Barkeeper tool. And they can go there and they can just click to deploy.

So we've got the basics of actually rolling a DevOps platform out across a whole group of teams. And also, we've got knowledge sharing, brown bag lunches, and people working in other teams for a couple of weeks to spread this knowledge that we'd got from the pilot project.

So then we actually got to a point with the ID service, now we're scheduling a deployment every two weeks. We took our sprint length to two weeks. But even within that, sometimes we'd have a hotfix or someone wanted something delivered quite quickly that was an important feature.

So we were averaging two or three times a week for our deployments at that point. And that was on a service where, by then, SAP's website would've gone down if anything went wrong.

But of course, it's about the people.

You have to spread the knowledge across the teams. You have to invest in the Agile or Scrum coaching. You have to make it normal for ops guys to be a part of each development team and have shared ownership, and for teams to share what they learn.

So now we're at the point where, hey, we've got a DevOps platform that's being used by a few teams in this department. Someone ramped up the challenge and said, "Well, the whole company should be doing DevOps. We're SAP. We write software for a living. This is something we should be doing."

So in 2013, we started a new project called Monsoon, which is about taking DevOps to the company scale.

What we'd done before with VMware, we sort of scaled that up to a bunch of different data centers. We had this infrastructure-as-a-service layer, so more or less equivalent to what you get in OpenStack now, and using Chef and MCollective, had this automation framework so that VMs could be deployed automatically.

People could just describe what they wanted in their project. It would automatically bring up those VMs. And in a typical turnaround time of an hour or two, you could have your dev, QA, and production landscapes all up and running. All you needed to do then was write the app.

So that was done using mostly Ruby on Rails. We found, I think in common with a lot of people, that Ruby enabled us to get going very quickly. Eventually, after a few years of doing Ruby, you can end up at a point where the initial speed causes issues later on. But I think that's common across the industry.

But the idea of having those Ruby microservices enabled us to isolate the problems there.

Another really important point here for the testing as well, especially when you're talking about on-premise stuff, we used a library in Ruby called VCR, so like tape recorders. There's a similar equivalent in Java called Betamax, and you could write something for other protocols.

But that basically allows you to record the interactions with another system so that when you run the integration test the next time around, it's just there and it behaves like a bot, essentially. So you don't have to have the system, and it's much quicker.
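The record-and-replay idea can be sketched without any library. This Python toy is not the VCR or Betamax API (those record real HTTP traffic to files); `Cassette` and the fake backend are hypothetical and only show the mechanism: call the real system once, then answer from the recording.

```python
import json


class Cassette:
    """Stores responses keyed by request so later runs need no live system."""

    def __init__(self, recordings=None):
        self.recordings = dict(recordings or {})

    def call(self, request, live_backend=None):
        key = json.dumps(request, sort_keys=True)
        if key in self.recordings:
            return self.recordings[key]          # replay
        if live_backend is None:
            raise LookupError(f"no recording for {key} and no live backend")
        response = live_backend(request)         # record
        self.recordings[key] = response
        return response


calls = []


def backend(request):
    """Stand-in for a slow remote system; counts how often it is really hit."""
    calls.append(request)
    return {"status": "ok", "user": request["user"]}


cassette = Cassette()
first = cassette.call({"user": "alice"}, live_backend=backend)
second = cassette.call({"user": "alice"})        # answered from the recording
print(first == second, len(calls))
```

The second call never reaches the backend, which is why replayed integration tests run faster and can run when the other system is unavailable.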

So now on that platform, we're still on a two-week cycle, but we are deploying every single morning.

So as of the end of last year, hundreds of internal and external apps and cloud services are running on this platform. There's a small list there of just some of our products in the cloud, all running on the DevOps platform that we're building.

Thousands of developers using it, tens of thousands of virtual machines and storage volumes. It's running across six regions and 12 availability zones.

Now, I could pretty much stop there. It's about the people.

So there's a core team, 15 to 20, so not that much bigger than the original ID service team. QA, you'll notice, is now down to half a person. That's all it needs to keep all of our QAs up and running. We use IRC, we use Lync, a mixture of home and office working.

Every three to six months, this is really important in any distributed team, we get together ostensibly for a workshop to plan what we're doing next, but really, it's about just sitting together, breaking bread, refreshing the human relationships that can decay over virtual connections.

And the second-level support is important there. We have a "you build it, you run it" philosophy. So at any one point in time, there's someone on ops duty from the team. So they've got the pager.

Finally, we haven't even rested on our laurels at that point. We're now building the next-generation platform.

So, so far, although everyone's doing DevOps, they're still at the point where each project has its three web servers, its three app servers, and its two database servers, and they're all lovingly cared for. So they're deployed automatically, but people are still making sure they're up.

As you heard in the Disney talk, everything's got its personality. If there's a problem with a server, someone's going to log into it and see what they can do to fix it.

Really, in the cloud era, when we're turning to tens of thousands of virtual machines, millions of containers, we're farmers. These are cattle. We don't have the time to look after each individual instance. So we need to be just saying, "Well, if one of these things dies, that's sad, but we just get another one as quickly as possible and keep running."

So credit for this slide, by the way: this is not my slide. The pictures are mine, for copyright reasons, but the slide came from Gavin McCance at CERN, and he nabbed it from Randy Bias at Cloudscaling. So there's a bit of reuse.

So that took us then to using Kubernetes. There's a whole long laundry list of great stuff there that you can do with container management and Kubernetes. We don't really have time now to go into the detail of it. We can do all the nice things that you want at a container level. You can treat containers as cattle and not pets.
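As an illustration of the cattle-not-pets idea described above, here is a minimal Python sketch of a reconcile loop: declare how many instances you want, discard any that have died, and launch replacements rather than repairing them. The `Pool` and `Instance` classes and the `worker-N` naming are hypothetical, made up for this example; this is not SAP's code or the Kubernetes API, only the general pattern that a container orchestrator automates.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One interchangeable unit of capacity (hypothetical stand-in for a VM or container)."""
    name: str
    healthy: bool = True


@dataclass
class Pool:
    """Keeps `desired` healthy instances running; nobody logs in to fix a sick one."""
    desired: int
    instances: list = field(default_factory=list)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def launch(self) -> Instance:
        # Names are generated, never hand-picked: no instance is special.
        inst = Instance(f"worker-{next(self._ids)}")
        self.instances.append(inst)
        return inst

    def reconcile(self):
        """Drop unhealthy instances, then launch until the pool is back at `desired`.

        Returns (names of removed instances, names of newly launched instances).
        """
        dead = [i.name for i in self.instances if not i.healthy]
        self.instances = [i for i in self.instances if i.healthy]
        launched = []
        while len(self.instances) < self.desired:
            launched.append(self.launch().name)
        return dead, launched


if __name__ == "__main__":
    pool = Pool(desired=3)
    pool.reconcile()                    # initial fill
    pool.instances[1].healthy = False   # simulate one instance dying
    print(pool.reconcile())             # the dead one is replaced, not repaired
```

The design point is that the only state anyone cares about is the desired count; individual instances carry no identity worth preserving.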

So that takes us to where we are now. As I said, everything we had was more or less equivalent to OpenStack. At pretty much any point in time on the Monsoon project, we'd realize that we'd built something that OpenStack then came along and did three months later.

So we were always a bit ahead of the curve, but we ended up after a couple of years saying, "Well, we've basically cloned OpenStack. There's absolutely no point in us having an in-house maintained version of something that does the same as OpenStack."

So now what we're doing is we're using Kubernetes for deployment on the container level. We're using OpenStack. We're deploying the OpenStack services in containers. We're doing all of the networking, load balancing. All of these components are now essentially open source, except for the final two: a new dashboard for OpenStack and a new automation agent to enable Chef stuff, and those are things that we're putting into the open source community.

So yeah, long story short, big OpenStack-based cloud is being built.

We're now back up to a four-week sprint cycle, and what that tells you is that sprint length is actually not that critical. Somewhere between two and four weeks is about right. It's whatever works for you at the time.

Because we're now adopting OpenStack and Kubernetes and everything else, less than 10% of our coding is now on our own stuff. And even that's mostly on stuff we're planning to open source soon anyway.

90% of our daily work is now working on open source projects. As a big side benefit of that as well, actually, that's really motivating for developers, it turns out. They love to hack on open source. There are challenges as well that we'll come to in the final slide.

So you may have seen this book, you may hear about it from other people at the conference. Google people have come together and written this book about site reliability engineering. It's kind of the next generation on from continuous delivery and DevOps. It's how Google do it, basically.

Jez Humble, as you can see, has written some really nice words about the book. My boss was a bit more direct.

But it's about the people.

So now it's still more or less the same team that did the previous platform doing the new one. We do have a complexity issue, in that we're adopting so many new tools, so many open source components, that we do have a bit of a cognitive overload issue.

We've had to split into three subteams just to contain some of that, so each team can then know what it is they're doing without having to worry too much about what the other teams are doing. But we still have sharing sessions to keep high-level knowledge.

The ops shifts, we now have two people on shift at any point in time. That's largely because we've just grown so much within the company. We have so many customers. We just need two people to handle the volume of tickets, which is not huge. It's typically, in any given week, maybe 10 or 20 tickets, but just resolving them takes time.

And I'm really sorry, Gene, I've overrun horrendously.

Gene Kim

That's all good.

Darren Hague

So almost at the end.

That's where we've been. Here's where we are now. Technology explosion. XebiaLabs guys downstairs, they just published this DevOps tool periodic table. Really useful tool to see what there is out there, but it's also terrifying in the sense that you've got to potentially learn all of those technologies, or at least pick a few of them that work for you.

So yeah, that's it. Thank you very much. In case you hadn't got the message.

Gene Kim

Thank you, Darren. That's great right there.

So the fact that you can do this on SAP systems, going from nine months to a week to daily deploys to deployments on demand. There was a point when I was reviewing your presentation when I just started laughing.

Darren Hague

Yeah.

Gene Kim

So if you can do it for SAP, if you can do it for Siebel CRM, and for the mainframe systems that Ross Clanton talked about yesterday, you really can do this for anything. So thank you so much, Darren. That's awesome.

Darren Hague

Pleasure. Thank you.