DevOps at Jaguar Land Rover

London 2018

Head of Systems Engineering, Infotainment · Jaguar Land Rover

In 2017, more enterprises came to embrace the benefits of DevOps in order to deploy higher-quality software at a faster pace. Although more enterprises are embracing this approach, it can be difficult for development and sysadmin teams to put it into practice.

In this talk, Chris Hill, Head of Systems Engineering at Jaguar Land Rover, will explain how enterprises can successfully achieve complete DevOps by using the appropriate development tools that will encourage greater collaboration amongst developers and sys admins.

Hill believes that practices like continuous integration and continuous delivery are vital to the DevOps process and, if used from the beginning, can reduce cycle time and prevent code from failing on someone else’s machine. As the Head of Systems Engineering at Jaguar, Hill led a development team of 18+ world-class engineers spanning DevOps, front-end, back-end, architecture, and Linux development, focused on in-vehicle infotainment software.

Drawing on his expertise, Hill will outline the benefits of CI and the best ways developers can use it in a DevOps environment. He will detail the tools teams need to use CI successfully, identify common pitfalls, and describe which other aspects of DevOps work well with CI (such as version control).

Full transcript

Chris Hill

All right. Thank you for that. This is actually my fourth DevOps Enterprise Summit, and for some reason Gene keeps inviting me back. So thanks, Gene.

That video actually described our I-PACE. And the I-PACE that you saw there in the foyer is actually our first battery electric vehicle. With our battery electric vehicle, we actually launch that next month. Can I get the slides on here, please?

I'm really excited to talk about not only the I-PACE, but DevOps and the DevOps methodologies that we've been following within infotainment, which is that center screen that you saw there right on the vehicle. And the reason I'm so excited is this gives me an opportunity to reflect not only on where we started, but how far we've come, and how much more we have to go.

With a quick show of hands, how many of you have built a software product before? Okay, I made it to the right room. That's helpful.

How many of those products are built for an embedded device? If you could raise your hands again.

And leave your hands up if those products worry about user distraction in your user interface. Only a few of you. You guys are actually all hired. Come see me after the talk and I'll sort you out. I'm just kidding. I am being serious.

These are some of the constraints that we put on developing for an infotainment system. You have to worry about these things every single day.

Now, an infotainment system is really your technology experience in the vehicle. Historically, our vehicles have been about your driving experience. And as you guys may know, as autonomous driving technology advances and our cars get more modern, technology becomes the main focus, and it will become more about your technology experience over time.

I'm going to walk you through our journey of adopting a DevOps methodology for infotainment. But before I get started with that, I'm going to tell you a little bit about who we are as a business.

Go to the next screen. There we go.

We're roughly 40,000 employees globally. We make approximately 24 billion pounds a year in revenue, and we sold 604,000 vehicles last year. We have about 5,000 software personnel, and out of those 604,000 vehicles that we sold last year, the infotainment system is one of 40 embedded devices that we have in each vehicle. They all talk to each other on a vehicle network in real time.

Now, what you saw in the video and you saw in the foyer is our first battery electric vehicle. We've never done this before. The car goes from zero to 60 in 4.5 seconds. It has a range of 298 miles, and the car looks sexy.

What I've come to realize is that Dev and Ops isn't always so sexy. The people are sexy, don't get me wrong. But the idea that we are transforming or causing a change within our enterprise takes hard work, and there are some qualities that I've begun to respect because of how much hard work is involved.

Those qualities are things like inspiration, persistence, and an attitude of continuous improvement. And without these qualities, creating change in a traditional organization such as Jaguar Land Rover is very difficult, and I no longer underestimate the effort behind being the change agent.

So I'm going to start our journey off here. We started, as Gene mentioned, in Portland, Oregon. We started just with two of us who knew that our output wasn't quite meeting our potential. And we had read about higher-performing teams, and we had read about this DevOps thing, and we spent six months convincing my boss to allow us to go to our first DevOps Enterprise Summit in San Francisco in 2016.

Here, I feel we were handed the inspiration we needed. We were given books like The Phoenix Project. You may have heard of that one. We were given books like The Goal by Eliyahu Goldratt, which talks about the theory of constraints. And we were given the fuel and the inspiration required for us to begin to transform our organization.

And so we made it back to our place of work after the conference, and we did probably what everyone does at the end of the conference, and that is have all these ideas in your head that you're going to change the business today, and you are going to take over how we develop software in our business today.

But the unfortunate reality is nobody else went to the same conference. Nobody else is just as energized as you are. And you kind of fall flat on your face for a second and realize, all right, this is going to be a lot harder than I thought.

And what we really knew, and I've noticed this a few times in my career, is we needed to get something working. And what I've noticed is working software trumps everything. So we knew we had to at least prove out on a small scale that what we were talking about and what we had learned actually made sense.

So we set off to find a server. The server we found wasn't this dusty, nor actually was it a Dell computer, and it was actually a rack mount. It didn't look anything like this. However, the software that we ran could run on anything. We ran free and open-source software. We ran Linux because we were very familiar with the Linux operating system. We ran GitLab. It was free and open source, and we knew we wanted to migrate to Git.

GitLab also contains continuous integration, GitLab CI. So we decided that, you know what, this feels like a package deal. This feels like the right thing. This will be our start.

So we started with one, two, three projects that all had a bus factor of one. They all had one person on their team that knew how the build worked or knew how the automated testing worked. And we slowly brought each of them into the idea of continuous integration.

Unfortunately, some of the larger projects took up so many of this server's resources that we'd actually, at some points, lose our revision control system, because our build slave and our revision control system were on the same machine.

So we actually did what any software team does when they find themselves in a pickle, and that is buy more hardware. We bought three additional servers and three additional build slaves. We've got GitLab running as our revision control, and we've got GitLab CI using three additional build slaves.

Unfortunately, what came with three build servers were three additional personalities. And if your commit's continuous integration pipeline ran on runner two or three, you were waiting four to eight times as long as if it ran on runner one. But it was the same machine, same OS, same configuration, right?

We eventually began to add more projects to this. And when we added more volume to our three-runner set, we started seeing complaints. Now, I don't really think of them as complaints. I like to think of them as gentle and little positive feedback packages written in an explosive email with choice words.

However, this was our only method of continuous feedback to make our course corrections. But just as important as the complaint itself, if not more so, is the response or the reaction to the complaint. And it's this idea of psychological safety.

Can I bring a complaint knowing that my voice is heard, that it is valid, and that somebody cares about solving my issue? And I really feel that in an environment without psychological safety, only the assumptions prevail.

So some of the complaints we received: if my job landed on two or three, it was always really slow, and we understood that one.

We had complaints like, at 12:00 noon, when all of the developers committed at the same time, my jobs were always in a queue. I always had to wait forever for my jobs to be able to finish. Right?

And one of my favorite complaints here, which led to a big course correction, is, "I asked the ops team three weeks ago to add a build dependency on the build servers, and it still hasn't been added yet. I'm just going to go back to building on my own." Right?

Now, this one obviously is a knife right to the heart because you feel like you've started to regress. But what I really like about this complaint is it led to a behavioral change as well as a technical change.

And so we decided, instead of continuing in the same direction, to move to ephemeral Docker containers to run all of our builds. And with ephemeral Docker containers, we defined every piece of build infrastructure as code. We used Packer recipes to define a Docker container, and every application developer could now change the underlying infrastructure that built their application.

They were empowered. They now had the self-service to do their life cycle on their own, and you're never going to receive the ops complaint because you've handed over the keys. Does that make sense?

We still had complaints, and we still had other problems to solve. We added more projects. We adopted more projects. We actually solved some of the configuration drift that we had prior because we ran in an ephemeral pattern, and it was an on-the-fly continuous build, and we would essentially delete the container at the end of it.
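As a rough illustration of the ephemeral pattern described here, the following minimal Python sketch assembles a `docker run` command that mounts a checked-out commit, runs a build, and deletes the container when the build exits. The function name, image name, paths, and build command are hypothetical and are not taken from the talk; only the `--rm`, `-v`, and `-w` flags of the Docker CLI are relied on.

```python
from typing import List


def ephemeral_build_command(image: str, source_dir: str, build_cmd: str) -> List[str]:
    """Return the argv for a one-shot build in a throwaway container.

    All names here are illustrative assumptions, not the team's actual setup.
    """
    return [
        "docker", "run",
        "--rm",                      # remove the container as soon as the build exits
        "-v", f"{source_dir}:/src",  # mount the checked-out commit into the container
        "-w", "/src",                # run the build from the mounted source tree
        image,                       # build image defined as code by the app team
        "sh", "-c", build_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical usage: print the command instead of executing it.
    print(" ".join(ephemeral_build_command("builder:latest", "/tmp/checkout", "make all")))
```

Because nothing survives between runs, any dependency a build needs has to be declared in the image definition, which is what removes the configuration drift between long-lived build servers.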

But we still had issues. We still had capacity issues as we increased the volume. We had issues that come with maintaining hardware, as you guys may know: power outages, internet service provider outages, CPU fans overheating, switch outages, you name it. Maintaining bare metal is really not sustainable.

With these latest complaints, we actually... Real quick, I had somebody show up to a stand-up once and say, "All of our bare metal servers don't do anything at night. Why don't we mine Bitcoin at night?"

While I'm a huge fan of cryptocurrency, I did feel like this was another time for us to have a course correction. And instead of mining Bitcoin at night, we decided to move to the cloud. And this was a huge technology move for us because instead of just having ephemeral Docker containers, we now had ephemeral EC2 instances.

And now every time we had a commit and we needed to spin up a build server, it was a fresh EC2 instance and a fresh Docker container. So we still kept the ephemeral Docker container piece, but now we've eliminated the lack of capacity.

And when we moved to the cloud, at this point was when we started to adopt many more projects. We were at about 50 or 60 developers, at which point the bigger projects started to become interested.

And at this point, I was asked to come over to England. Not quite as violently as Scorpion does to Sub-Zero here, and I don't really look good in green either, so it's definitely not me. But my family and I were asked to move over to England to continue the transformation.

And instead of just touring castles and eating fish and chips, we wanted to ensure that we could create value for other bigger projects as well. So I was implanted in a new city and a new culture, and many more sheep on my way to work than I ever thought possible.

But I began to realize that the maturity and growth had to come from the bigger numbers within our organization, and that it wasn't just about the technology, it was about the people.

And I had a coworker recently bring me a revelation, and that is: when somebody is promoting a tool, and packed into that tool is 10 years of software engineering experience that went into its massive feature set, does my team actually acquire a free 10 years of experience?

I like to think of it as like the RPG of life. You've acquired a tool. All of a sudden, plus 10 software engineer power?

The answer is, that's not actually the case. The team of engineers behind the tool has felt the pain for the last 10 years and knows why these features make sense and why these features resolved their pain. But we miss out on all of that growth experience, and I've actually had to tell tool vendors, "We're not mature enough yet for your product."

Unfortunately, sometimes I know that is the right answer for us in the longer term, but my team simply doesn't have the maturity to be able to understand why the tool exists in the first place, and we don't get that immediate effect.

I had the realization I did the same thing when I came to England, and I realized that I imposed a toolset without transferring the pain as well. So I had to transfer that pain, and I had to get them to understand why things are the way they are and why we've created this toolset.

Around this time is when we started to adopt infotainment. And with infotainment, we looked at some of the key indicators on the infotainment developer environment, and we knew we had a lot of room for improvement.

The first indicator, in fact, there's a quote by Gary Gruver: "If feedback takes days or weeks to get to them, it is of limited value to the developer's learning."

And really the problem statement here was our feedback loops were four to six weeks. Could you imagine writing code today and six weeks from now being told whether or not it works or is broken? I don't remember the shirt I wore yesterday, let alone what I had for breakfast this morning, let alone what I wrote six weeks ago.

And chances are I've been working on features for the last six weeks, and for me to try to unpick what I was thinking at that point could be a huge context switch penalty.

Infotainment also had a significantly higher number of contributors, up to 1,000 contributors. And what we noticed is that contributions don't come linearly, they come in bursts. We actually found that Thursdays were the day that most of our developers committed on. And when we had manual code reviews, if we didn't have reviewers ready on Thursday, we would create our own backlog.

We also had a tremendous amount of complexity built into our Linux distribution. And there were only three people in the entire world that knew how our build system worked end to end.

I remember being on a phone call with 12 other people discussing essentially how we were going to make a change to the build system, and every one of them had a different idea and a different way of actually doing it. It was massively complex, and none of the application teams actually understood how their application fit into the rest of the ecosystem.

This was our time to shine.

We looked at some of the complaints that had previously been made, and unfortunately, we were at a point where complaints had stopped. The people handing in the complaints never had resolution. And this is a point I feel within engineering where we've kind of lost the battle.

However, this was our time to shine, and we knew instead of following the current direction that we had for infotainment, we were going to adopt the same toolset.

We improved on the indicators. We went from four to six weeks to 30 minutes to receive your build back and your feedback. We automated all of the processes around these burst merge requests. We also changed it so that it's no longer just three people on the entire planet who know how the build system works. Everyone does. And we changed and refactored the entire build system to be extremely simple.

And I'm talking if you're a JavaScript developer making one application in our infotainment system, and you have no prior knowledge of how Linux works, you could know how to incorporate your app into our build system within 30 minutes.

Making things simpler allowed us to increase the throughput.

Since then, we've delivered an infotainment system for nine different vehicles, and each of these vehicles takes the same infotainment system and takes the same Linux distribution.

To give you a little visual indicator of what an infotainment system looks like in one of our latest vehicles, this is in the Range Rover Velar. We've started to expand to more than one screen now. If you notice on here, we've got a lower screen, and we have an upper screen. We power both of these from the infotainment system, which, as I mentioned, is one of the 40 embedded devices in the vehicle.

Now, infotainment isn't always thought of as a safety-critical device on the vehicle. There are, however, some safety-critical features. One is your reverse camera. Pretty important, right? Another one is your park aid beeps. Also pretty important. And another one is your battery usage, and how much range you have left, and your efficiency of your battery.

Which leads me to the next infotainment system that we delivered, and that is the I-PACE. And that's what you see actually in the foyer.

Let me get the next one.

And this was a game changer for the business because they, well, dropped the combustion engine. They moved to electric only, but they also took something that was normally just a prototype and turned it into something we could mass produce.

In fact, our vehicle line director, Ian Hoban, says, "It was important for us to come out with the I-PACE and an electric vehicle first before the established market to ensure that Jaguar engineering could show its dominance."

We asked ourselves, how could we change the game from an infotainment perspective for the I-PACE? And instead of ditching the combustion engine, we ditched the dealership visits. And we implemented software over the air.

And this huge Linux distribution that we build upwards of 700 times per day now in a continuous integration pattern on a dev branch, or a master branch, or a release branch, we can now deliver to every vehicle in the form of small incremental deltas.

We can also deliver it to the vehicle while you're driving and not interrupt your daily life. In fact, I showed Gene yesterday, we started a download and an install while I was driving, and the entire thing happened in the background. Jeff even made the comment, "This is blue-green deployment for vehicles."

We've got a dual banking strategy that allows us to essentially eliminate any risk to your active bank and eliminate any risk to your infotainment system, and install directly in a passive bank and switch it over on your next startup.
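The dual banking idea can be sketched as a toy Python model: the update is written only to the passive bank while the active bank keeps running, and the switch happens on the next startup. The class, field names, and version strings are hypothetical, and the health check that keeps the old bank on failure is an assumption added for illustration rather than something stated in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DualBankDevice:
    """Toy model of a device with two software banks, "A" and "B"."""

    banks: Dict[str, str] = field(default_factory=lambda: {"A": "1.0", "B": "1.0"})
    active: str = "A"
    pending_switch: bool = False

    @property
    def passive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running_version(self) -> str:
        return self.banks[self.active]

    def install(self, version: str) -> None:
        """Write the new version to the passive bank; the active bank is untouched."""
        self.banks[self.passive] = version
        self.pending_switch = True

    def startup(self, new_bank_healthy: bool = True) -> None:
        """On the next startup, switch banks only if an install is pending.

        The health check is an illustrative assumption: if the freshly
        written bank is not healthy, the device stays on the old bank.
        """
        if self.pending_switch and new_bank_healthy:
            self.active = self.passive
        self.pending_switch = False


if __name__ == "__main__":
    device = DualBankDevice()
    device.install("2.0")            # happens in the background while driving
    print(device.running_version())  # still the old version until the next startup
    device.startup()
    print(device.running_version())
```

The point of the design is that a failed or interrupted install can only damage the bank that is not in use, so the running system is never put at risk.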

This was my secret weapon to not only provide a tremendous amount of value to the customer, but also to our development environment. And this is my favorite part about this feature. There aren't many features that you can give directly to a customer that are also going to help your development.

But instead of just deploying to our 600,000 vehicles in the market worldwide, we can now iterate extremely fast on our own development environment with continuous deployment to vehicles within engineering.

And in fact, remember that four-to-six-weeks figure. On commit, it took typically six weeks for it to finally land in a vehicle and let you know whether or not you broke something. That happens now within an hour. Build system to vehicle within an hour.

We used to talk about the indicator. One of my favorite indicators is deploys per day per developer. But I was always embarrassed to share ours because it was always below one. All of our new software wouldn't actually make it to vehicles. It was always batched together.

And now I'm happy to say we can deploy, and we have been in our engineering environment, 50 to 70 times per day of each individual piece of software to a target or to a vehicle.
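For readers who want to track the same indicator, deploys per day per developer is simply total deploys divided by the number of days and the number of developers. The Python sketch below uses made-up figures purely for illustration; none of the numbers in it come from the talk.

```python
def deploys_per_day_per_developer(deploys: int, days: int, developers: int) -> float:
    """Total deploys normalized by observation window and team size."""
    if days <= 0 or developers <= 0:
        raise ValueError("days and developers must be positive")
    return deploys / (days * developers)


if __name__ == "__main__":
    # Hypothetical example: 600 deploys over 5 working days by 60 developers.
    print(deploys_per_day_per_developer(600, 5, 60))
    # Hypothetical batched-release example: 30 deploys over the same window.
    print(deploys_per_day_per_developer(30, 5, 60))
```

A value below one means that, on average, a developer's work does not reach a target even once per day, which is the situation described for batched releases.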

That video that you watched, they didn't include anything about Git, so I'm pretty bummed about that. But the video talks about essentially eliminating the idea of ever going to the dealership anymore.

And what I like about this idea is we're changing the value stream. No longer are deployments limited to a traditional software release cycle. We've now skirted every single process to get a technician a new piece of software and bother somebody else's day, one of our owners, to come into a dealership and spend an hour waiting for their vehicle to be done.

We've now empowered the customer to be their own technician.

There are some lessons learned over the last two years that I want to share with you.

Go to the next slide. There we go.

I didn't understand the difference between a true strategy and a set of objectives. But ultimately, a true strategy is what can make you more competitive in the marketplace, and not just a vision statement.

I learned that if you're doing principle-based software delivery, it means uncomfortable and conflicting opinions. What I've noticed is at your core, if you think that something's right, fight for it. I've learned that democracy isn't always the best approach. Even if we got those 12 people that were all on the same phone call to agree to a majority, it may still not have been the best route to take.

I've learned that articulating the why is very challenging, and I've learned that spending all night trying to come up with why there is a significant return on investment in improving the throughput or improving your development environment is worth it.

I've also learned to lead with focus, positivity, and transparency. And this idea of a blameless culture and psychological safety can always ensure that you're thinking about what could go wrong.

Speaking of what could go wrong, the risk associated with software over the air is potentially the capability to brick a million vehicles overnight.

We still need help. We also still have problems to solve that we don't know exist yet. We're also recruiting more attitudes of continuous improvement.

And we'd love to hear about your challenges, we'd love to hear about your problems, and we'd love to hear about what from our DevOps journey resonated with your own business.

That's all my time. Thank you very much for listening, and have a good rest of your conference.