Advancing the Lifeline with DevOps

Las Vegas 2020

Jonathan Akers

Product Owner - RadioCentral · Motorola Solutions

Ryan Dobson

Director of Engineering · Motorola Solutions

Motorola Solutions' mission critical two-way radio business was established over 90 years ago and has been the dominant leader in the market with over 85% market share for decades. Configuring, upgrading, and managing the radios has evolved from simple, stand-alone applications installed on a PC to enterprise grade, on-premise, distributed solutions.

In 2018, Motorola Solutions began the transition to move device management solutions to the cloud in conjunction with the launch of the newest radio, APX Next. The RadioCentral cloud platform has transformed not just how our customers manage their devices, but also kickstarted the organization on their DevOps journey.

Full transcript

Ryan Dobson

Hi, welcome to Advancing the Lifeline with DevOps.

My name is Ryan Dobson. I'm a director of engineering with Motorola Solutions, working with the Device Cloud Engineering team. I've been with Motorola for 20 years, worked in all different areas of software development, and now have the pleasure of working with this team.

Jonathan Akers is with me today. Jonathan, do you want to go ahead and introduce yourself?

Jonathan Akers

Yeah. Thank you, Ryan. I'm Jon Akers. I've been at Motorola for 17 years and worked in a lot of different areas, and it's been a pleasure to work in some of the new technology that we're going to be showing.

Ryan Dobson

Great.

Just a little backstory on Jonathan and me and our experience here. Last year, we attended the DevOps Enterprise Summit in Vegas, and we really got a lot out of the experience, both the people we met as well as the sessions that we attended. So it's really our privilege today to be able to share with you what our story has been.

We started on this transition to DevOps in 2018, and the story we're sharing with you today is about our journey there.

First off is this iconic symbol. It means quite a bit to different folks when you first see it. Internally inside of Motorola, we refer to these as the bat wings. They've been around for quite some time, and it may stir up different images when you first see this.

For a lot of folks, you immediately start thinking about cellular devices. For other people, it might be radios. There are quite a few things that have had this symbol on them over the years. Motorola started back in 1928, and when they started, they actually were doing radios. Two years after that, in 1930, is when they released their first two-way radios, and since then, they've been a leader in the public safety two-way radio communication business.

Over the years, they extended into quite a few different areas of technology, leading the way in cellular, semiconductors, and lots of different projects. In 2011, the company split into two: Motorola Mobility and Motorola Solutions. The Motorola Solutions side retained the two-way radio business as the core business that they had. The Mobility side took the cellular. So Jonathan and I are with Motorola Solutions. We work in the two-way radio business, and that's what we're here to talk with you about today.

Like I mentioned, Motorola Solutions took the two-way radio business with them in 2011. Since that time, they've expanded through acquisitions and some organic growth, and now the core of the business is around this mission-critical ecosystem. There are four main parts that create this platform.

The first is mission-critical communications. This is based off of the core that I mentioned before, that Motorola Solutions has been in the business for a long time, but it's expanded beyond just voice communications into data and other adjacencies.

Video security and analytics is relatively new to the Motorola Solutions portfolio, but we've aggressively been going after completing this entire ecosystem, and video security as well as the backend analytics plays a key role in that.

The third is command center software. This is the backend systems that provide incident management as well as dispatch, 911 services, those types of things. The last is our managed and support services, which wraps them all together and provides an end-to-end solution for our customers.

Double-clicking into each of these areas to give you some idea of the scale at which Motorola Solutions is operating: on the mission-critical communication side, we have over 100,000 customers all around the world. This includes both the enterprise and the public safety sides. The command center software handles millions of incidents and focuses on our 911 dispatchers and call takers. Video analytics analyzes hundreds of thousands of alerts a day, notifying first responders of those events and then providing the backend analytics to support it. And then our managed service and support is where we're monitoring customer networks and proactively providing support for them as a total package.

As I mentioned earlier, we do support both the government public safety business and enterprises. On the government side, we work with federal customers, Department of Defense, lots of three-letter agencies in the U.S., as well as foreign governments and militaries. On the enterprise side, our products are used by theme parks, data centers, airlines, mining operations, all kinds of things. So when we're talking today about the services that we provide, an easy way to think of it is any two-way radio device that you see that has those bat wings on it. Those are the devices that we're talking about today.

So as you go to the airports or you go to different locations, you can look at what devices they're using. If you see those bat wings, that's what we support.

I have a short video here that's going to introduce the ecosystem a little bit more and show you how it's all integrated together to provide these solutions for our customers.

[Video plays.]

Now on to Device Cloud Engineering. The purpose of this organization is to provide cloud services and capabilities that enable the mission-critical workgroup communication solutions that I referred to earlier. Really, that is in these three key areas: device management, device analytics and telemetry, and feature and service offer enablement.

First, in device management, the device management functions I'm referring to are firmware upgrades for our devices, configuration of our devices, and new feature enablement. So as we're selling features and controlling the sellable elements of those features, all of that is wrapped together.

The second is in device analytics and telemetry. Our newer devices have an IoT aspect to them that allows us to monitor, gather the data back into our cloud services, and present that data for consumption, both to customers and to internal stakeholders, so that we can make decisions about what features we should be doing, how successful the rollouts are, things of that nature.

The last is on the feature and service offer enablement. We are bundling a lot of the services and capabilities that we're starting to offer together. Through the tools and mechanisms that we provide from a Device Cloud Engineering team, we're able to control those elements, to turn things on and off, as well as expire those services that we're offering.

That's really the core of what we're doing in Device Cloud Engineering. This is something that's been around for quite some time, and just recently, in these last two years, we've transitioned to really be focused on cloud. Prior to that, we were doing quite a bit with on-prem software as our solutions, but we've really transitioned.

Now I'm going to walk through what that evolution has been. Like I mentioned, Motorola's been in the two-way radio business for quite some time. Back when we started, for any customer configuration that was needed, say you needed new channels in your device or you had new devices you were rolling out, really the only option you had was to come to the manufacturer. You worked with Motorola, and we would put in those configurations for you.

As technology progressed, trying to enable customers to do this on their own was the primary focus. So we created several different solutions over the years that were first DOS and then Windows-based commercial off-the-shelf software packages that would allow you to program one device at a time.

As those solutions were being used, obviously there were a lot more demands as fleets grew, and again, technology tends to mature. In 2010, we introduced a series of solutions that provided fleet capabilities and enterprise-grade solutions for our customers to manage fleets. Typical fleets for our public safety customers can range from a couple hundred devices for a police department up to 40,000 to 50,000 devices if we're talking about a state or a countrywide system. So the solutions that we had to provide had to be fairly scalable and mature as well as reliable for on-prem.

As our customers were using those, investing in our on-prem solution, servers, equipment, the demand for new features and capabilities that are very well aligned with the cloud started to come up, particularly around defining role-based access for their users and agency partitioning on the back end. You can imagine if you have 50,000 devices, you're not just trying to provide access to those to a very small number of people. You could have hundreds of different people that need to be able to configure, manage, and upgrade these devices. So those solutions were starting to reach their end of life for on-prem.

In 2018, Motorola Solutions started work on their next-generation two-way radio device, which is called APX Next. When we were first having discussions about this device, we knew that there were several new things and capabilities that were being provided with it that would provide us a great opportunity to mature the device management solutions that are used in order to manage it.

First of all, this is a mission-critical device. This is a two-way radio used by public safety, so no matter what, it has to work. This is the definition of mission-critical.

It's different than most of our other devices in two critical ways. One, it has an LTE pipe, and that LTE broadband data pipe allows us to provide services that we don't typically provide to these two-way radios. We can now do things like messaging and mapping and apps, and we can run an IoT framework on the device as well that would enable us to be able to manage it as more of an IoT device.

There's also a new UI. That picture had a nice big screen on it. That big screen enables us to do a whole lot more as well. But the focus for our organization and our services was around this streamlined ownership experience. This meant reimagining the whole customer experience: from looking for and ordering a device, to when we ship it to them and they open it out of the box, with a lot of focus on that out-of-box experience, to configuring and managing it through its lifecycle.

As part of that, we introduced this new platform, RadioCentral, and this is the Motorola Solutions cloud device management platform of the future. We've got not just a way for us to be able to manage devices from anywhere and to do it over the air using the broadband pipe, but it also allows users to connect from wherever they're at in order to do this management. We can provide these newer features that I was referring to: role-based access, agency partitioning, all these great things that come with the cloud, including the scalability and the security and the reliability.

So it's really exciting for us. We're going to show this short video that summarizes the impact that RadioCentral has had on the APX Next device and ownership experience.

[Video plays.]

Next, I'll hand it off to Jonathan, who will walk us through the RadioCentral platform and our DevOps journey.

Jonathan Akers

Thank you, Ryan.

As Ryan and I were putting together the materials for this conference, it was really cool to see how far we've come in such a short amount of time. When you look at this timeframe of 2018 to 2020, we're really talking about the end of 2018, around the October timeframe, with a lot of this being realized as early as 2019.

Things like our release cadence: we came from a three-plus-month off-ramp, with releases not cut from the main line, to where we are today, which is shipping from our main line and either promoting those changes to production at the end of the sprint or auto-promoting all the way to production.

We came from a mix of manual and automated tests to 100% automated. There are zero discussions of any manual tests in the things that we do today. This is built into the culture of the team. We get into meetings now, and we have passionate discussions about which kind of test strategy to employ, but it's never a discussion of whether we have time to add tests or not.

Key metrics: we used to track only things like the number of defects, the team velocity, or the number of successful nightly builds. Today we're focused on the DevOps key metrics.

Onboarding: back in 2018, this was not a primary focus of the team, but it's something that we made a focus. We wanted to bring in developers and have them be able to make their first commit to production within their first week. When you set this as a goal, it touches everything from your CI/CD, to the ease of setting up a developer environment, to documentation, moving it closer to the source code in things like READMEs.

Ownership: we came from a place where we had shared ownership to where we are today, where the team asked for component ownership, by either a scrum team or individuals. They wanted to hold each other accountable, and that's something that came up in retrospectives from the team.

Tech debt: this used to be taken on in an ad hoc way, and it was always hard to prioritize. Today, in every sprint planning, we're talking about which pieces of tech debt should be taken by which of the scrum teams. As everyone knows, whenever you're building a product, even if you're starting from scratch, you have to decide every sprint how much tech debt to bite off to keep it low, balancing that against hitting your product dates when you have to ship.

Incident response: we came from a place of having no incident response to where we are today, which is fully integrated with Runscope and PagerDuty.

I did skip one: team initiative. Developers used to have a chance to do some of their own initiatives, but now it's built into the culture where the team is enabled and empowered to go off and build things that are going to make their jobs easier. A lot of the ideas that we've had and what we're going to show you came from those retrospectives, with the team asking for time to go build things that'll help them get their jobs done. Things like our DevOps team portal, which we'll show you later, our self-service portal, which is a front end to our back-end APIs, and things like chatbot integration, etc.

This is your classic tech stack slide. This isn't just to show off the number of integrations or what we've done, but each of these has solved a problem for us. What's really cool is how far we've come, like I was saying, in such a short amount of time. None of this was in play in 2018, but now we are utilizing all of this in 2020.

What we really want to show is our DevOps team portal. Again, this is something that came from the team and from things that would make their jobs easier. For the team, it shows them really buying into the DevOps concept, that they own everything about the product.

It is our one-stop shop for automated DevOps metrics. We use it to track our production costs over time. That's something I won't be able to share today, but it is something we take very seriously. Our team owns our production costs, and we sometimes even re-architect in order to drive costs down. We are watching it constantly.

We can track things like the deployment health, component ownership, our delivery performance metrics, and our CI/CD pipeline health.

I think everyone in this conference is probably very familiar with the DORA report and this chart. We use this as a metric for us to find out how we're doing in the DevOps world. This really was a great insight for the team to see, to understand, and then actually automate, and we'll show that later.

Doing a little self-reflection: our team for our deployment frequency is rated high. We came from medium, so we've improved in that category. For lead time for changes, we're medium today. Time to restore service, we're high. Change failure rate, we're also high. But the point was to figure out how we can get to elite and slowly keep moving the chain to see if we can hit those elite levels. We set an organizational goal to hit the elite level in 2020, and we're going through an architecture change. We're overhauling our CI/CD and everything in order to align with this goal for our team.

As part of that elite initiative, some of this got built into that DevOps team portal. We host all of this data for what we're going to show you here in that portal for all the developers to see and check.

This is our deployment frequency. This chart shows where we came from in 2018, which was big-bang promotions of all of our microservices at one time, to where we are today, which is smaller, more frequent deployments.
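A chart like this can be derived from a simple log of production deployments. The following is a minimal sketch, assuming each deployment is recorded with a service name and a timestamp; the `DEPLOYS` data, the `device-config` service name, and the `deployments_per_week` helper are hypothetical and not taken from the team's portal.

```python
from collections import Counter
from datetime import datetime

# Hypothetical deployment log: (service name, production deploy time).
DEPLOYS = [
    ("firmware-operations", datetime(2020, 3, 2, 10, 15)),
    ("firmware-operations", datetime(2020, 3, 4, 16, 40)),
    ("device-config", datetime(2020, 3, 5, 9, 5)),
    ("device-config", datetime(2020, 3, 11, 14, 30)),
]


def deployments_per_week(deploys):
    """Count production deployments per ISO (year, week) bucket."""
    counts = Counter()
    for _service, deployed_at in deploys:
        iso = deployed_at.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


weekly = deployments_per_week(DEPLOYS)
print(weekly)
```

Bucketing by ISO week keeps the counts comparable across months; a big-bang promotion shows up as one tall bar followed by empty weeks, while frequent small deployments show up as a steady series.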

For our evolution through this, we started with manual promotions to all of our environments. At some point, we turned on auto-promotion all the way through our staging environment, and that caused a lot of problems. We learned a lot of lessons when we did that. We actually had to back out that change because we broke teams that were dependent on us at the time. But we gathered all the data and all the things that we would have to go fix, and we tackled them over the next couple of sprints. Once we turned auto-promotion to stage back on, we never had to turn it off again.

We also had a goal and a mandate that all new services are auto-promoted to production from the start.

For the lead time metrics, this is also hosted in our custom DevOps team portal. As you can see here, this graph is showing you all of our microservices and their average lead times for getting work items deployed to production. We have an average of 19 days, but you can see here that we're actually highlighting that we have a minimum of 3.4 minutes. There are some microservices that are auto-promoting all the way to production, and those can reach production in 3.4 minutes. The other ones that are not auto-deploying to production have to wait until we have a promotion.
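The gap between a long average and a very short minimum falls out naturally when lead time is computed per change. Here is a minimal sketch, assuming lead time is measured from commit to production deployment; the record layout, the service names, and the helper functions are hypothetical, and the numbers are illustrative rather than the team's data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: (service, committed at, deployed to production at).
CHANGES = [
    ("service-a", datetime(2020, 3, 2, 9, 0), datetime(2020, 3, 2, 9, 4)),
    ("service-a", datetime(2020, 3, 3, 9, 0), datetime(2020, 3, 3, 9, 6)),
    ("service-b", datetime(2020, 3, 1, 12, 0), datetime(2020, 3, 15, 12, 0)),
]


def lead_times_minutes(changes):
    """Group lead times (in minutes) by service."""
    by_service = {}
    for service, committed, deployed in changes:
        minutes = (deployed - committed).total_seconds() / 60
        by_service.setdefault(service, []).append(minutes)
    return by_service


def summarize(changes):
    """Return per-service average lead time and the single fastest change."""
    by_service = lead_times_minutes(changes)
    averages = {service: mean(values) for service, values in by_service.items()}
    fastest = min(m for values in by_service.values() for m in values)
    return averages, fastest


averages, fastest = summarize(CHANGES)
print(averages, fastest)
```

In this sketch `service-a` stands in for an auto-promoted microservice with lead times of a few minutes, while `service-b` waits two weeks for a scheduled promotion and dominates any overall average.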

Our time to restore metric is hosted in PagerDuty currently, and we review it weekly with our team. As you can see from the graphs below, where we came from was a group of five people on call in PagerDuty. Over time, we realized that incidents were slipping through that team without an acknowledgement or a resolution and were getting escalated much more frequently. So what we did was we actually switched to having one person on call.

Once we did that, you can see the green bar, which is the mean time to resolve the incident at the time, dropped to close to zero. The red bar, which is the mean time to acknowledge the incident, also dropped close to zero. And then our number of incidents, we actually were working on getting better over time.
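The two bars described here, mean time to acknowledge and mean time to resolve, are simple averages over incident timestamps. The following is a minimal sketch, assuming each incident carries a trigger, acknowledgement, and resolution time; the field names and the `mean_elapsed` helper are hypothetical and do not reflect any real incident tool's export schema.

```python
from datetime import datetime, timedelta

# Hypothetical incident records; the field names are assumptions.
INCIDENTS = [
    {
        "triggered": datetime(2020, 4, 1, 2, 0),
        "acknowledged": datetime(2020, 4, 1, 2, 2),
        "resolved": datetime(2020, 4, 1, 2, 30),
    },
    {
        "triggered": datetime(2020, 4, 8, 14, 0),
        "acknowledged": datetime(2020, 4, 8, 14, 4),
        "resolved": datetime(2020, 4, 8, 14, 50),
    },
]


def mean_elapsed(incidents, end_key):
    """Average time from trigger to the given milestone."""
    total = sum((i[end_key] - i["triggered"] for i in incidents), timedelta())
    return total / len(incidents)


mtta = mean_elapsed(INCIDENTS, "acknowledged")  # mean time to acknowledge
mttr = mean_elapsed(INCIDENTS, "resolved")      # mean time to resolve
print(mtta, mttr)
```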

The change failure rate metric, we are still working on automating this. If there's anyone at the conference that has some ideas around this metric, let us know, but this is something that we're currently working on.
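One common way to define this metric is the share of production deployments that led to degraded service and needed remediation such as a rollback or hotfix. The following is a minimal sketch under that assumption, not the team's approach; the `caused_incident` flag is hypothetical, and deciding how to set it automatically (for example by linking incidents back to deployments) is the hard part that the sketch leaves open.

```python
# Hypothetical deployment records; "caused_incident" is an assumed flag that
# would have to be populated by linking incidents or rollbacks to deployments.
DEPLOYMENTS = [
    {"service": "service-a", "version": "1.0.0", "caused_incident": False},
    {"service": "service-a", "version": "1.0.1", "caused_incident": True},
    {"service": "service-b", "version": "2.3.0", "caused_incident": False},
    {"service": "service-b", "version": "2.3.1", "caused_incident": False},
]


def change_failure_rate(deployments):
    """Fraction of deployments flagged as having caused an incident."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["caused_incident"])
    return failed / len(deployments)


rate = change_failure_rate(DEPLOYMENTS)
print(f"{rate:.0%}")
```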

One of the other things we wanted to highlight was this component ownership piece. In our DevOps team portal, we actually have cards for every single microservice. Within those cards, and we're showing one here, in the top left, you can see this is the firmware operations service. We have a primary owner assigned, a backup owner, and then we call out the things that are shown in our static code analysis: things like the bugs, the code coverage, the code smells, the complexity.

One of the most important pieces for us is the deployment status. We built this due to the fact that the team was having a hard time keeping track of what versions of what microservices are deployed where. As soon as you start spreading out and having more environments and more regions, this was really hard to keep track of. In this example, it's actually highlighting in yellow that in the development environment, there's a different version that's deployed than the latest CI version that ran. This has been invaluable in our day-to-day operations.

This team in this example is actually getting a daily email telling them, "Hey, there's a discrepancy. You need to go look at it."
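The discrepancy check described here amounts to comparing the version deployed in an environment against the latest version produced by CI. Here is a minimal sketch, assuming both are available as simple lookups; the version numbers, the environment names, and the `find_discrepancies` helper are hypothetical. Only environments expected to track CI are checked, since production may legitimately lag until the next promotion.

```python
# Hypothetical data: latest CI build per service, and what is deployed where.
LATEST_CI = {"firmware-operations": "1.4.2"}
DEPLOYED = {
    "firmware-operations": {"dev": "1.4.1", "stage": "1.4.2", "prod": "1.4.0"},
}


def find_discrepancies(latest_ci, deployed, tracked_envs=("dev",)):
    """List (service, env, deployed version, CI version) where they differ."""
    found = []
    for service, envs in sorted(deployed.items()):
        expected = latest_ci.get(service)
        for env in tracked_envs:
            version = envs.get(env)
            if version != expected:
                found.append((service, env, version, expected))
    return found


discrepancies = find_discrepancies(LATEST_CI, DEPLOYED)
for service, env, version, expected in discrepancies:
    print(f"{service}: {env} has {version}, latest CI build is {expected}")
```

A list like `discrepancies` could feed both the yellow highlight on a service card and a daily notification to the owning team.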

Ryan Dobson

Great. Thanks, Jonathan.

Now we want to close things out a little bit. All the great things that we've done with RadioCentral and the stuff that Jonathan walked through really doesn't matter unless it impacts our customers and makes their lives better.

Here are two examples of some feedback that we got as we started to roll this out. The things that we're talking about, IoT and DevOps, are fairly well-known within the consumer space, but to the mission-critical public safety customers, these are completely new. In this industry, we're really pushing the envelope. In order to do that, it really takes a lot of trust from the senior leaders and from our customers that the solutions that we're going to deliver are not only going to meet their needs, but exceed them.

With that, I'm just going to wrap things up. A couple of things. One, advancing the lifeline with DevOps: this is a journey we're still on. We still have quite a ways to go. I wanted to first acknowledge the incredible team that Jonathan and I work with. We get the privilege of being here to present and share the story, but we have a team of rock stars that created and delivers this, and that continues to surprise us with new ways they're coming up with to push the envelope, and we love it. I'll speak for both of us on that one, Jonathan.

Also, a thank you to Gene and the DevOps Enterprise Summit committee for selecting us. Being able to share our story, it means a lot to us. We hope that some of the stuff that we shared with you today, you can take back, challenge your team with, ask questions. We're certainly available, both Jonathan and I. Reach out to us. We'd love to talk, learn from you more, as well as share what we know. With that, thank you and have a great day.

Jonathan Akers

Thank you. Bye.