Industrial DevOps - Building Better Systems Faster
In today's rapidly evolving landscape, organizations face the ongoing challenge of building better systems faster to remain competitive. Traditional development and operations (DevOps) practices have proven effective in improving collaboration for software and operations. How can we take the benefits that software has achieved to the system level?
Dr. Suzette Johnson and Robin Yeman have been pioneering this research in collaboration with IT Revolution and a number of industry experts over the last 5 years and are now releasing a book with a set of proven success patterns for implementing DevOps at the system level. The team has focused on large-scale, safety-critical cyber-physical systems throughout their careers at large tier-one defense contractors such as Lockheed Martin and Northrop Grumman, building everything from submarines to satellites.
Throughout their journey, they have borrowed from key bodies of knowledge including systems thinking, design thinking, value stream mapping, lean, model-based systems engineering, and, of course, Agile and DevOps. They will describe how implementing Industrial DevOps for cyber-physical systems requires a holistic approach that spans hardware, software, and the physical environment. The approach requires beginning with test, leveraging multiple horizons of planning, managing flow, utilizing the entire digital engineering value stream, and, most importantly, a focus on people and culture. By adopting Industrial DevOps, organizations can learn faster, accelerate the development of cyber-physical systems (CPS), enhance system reliability, and enable rapid adaptation to changing requirements.
Full transcript
The complete talk, organized by section.
Robin Yeman
Are we ready? All right. All right. Yay. Okay. First test of the day. Awesome. It works. I love it when that happens.
How about you introduce yourself first?
Suzette Johnson
Hello, everyone. Thank you for being here. We know you had a lot of choices, and we appreciate your time and participation.
I am Dr. Suzette Johnson. I work for Northrop Grumman. I am a fellow, and I specialize in lean, agile, and applying lean-agile practices across the enterprise, but specifically into space systems and our cyber-physical systems.
Robin Yeman
Awesome. My name's Robin Yeman, and I'm her partner in crime. I actually work at Carnegie Mellon Software Engineering Institute. I'm having a fabulous time there as the space domain lead. And I get to spend time working with a whole variety of government customers, again, on how to enable them to do cool digital things in more legacy environments. So it comes back to: how do I apply agile and DevOps to cyber-physical, safety-critical systems?
All right, so why do we care? Why do we care about industrial DevOps?
Suzette Johnson
Well, that's because as we've been building out our systems and going from software into systems that also include hardware and manufacturing, we're still seeking to achieve and meet business outcomes.
So we want faster time to market, faster time to delivery, business resilience. That is our ability to be able to respond and adapt to changing needs. Scalability. I have a feeling all of you in the room resonate with these practices, but the difference is we're moving beyond software. And what does it mean in our environment where we have this level of hardware and manufacturing that we need to take into account, and the long-lead items?
Robin Yeman
So let's talk about some cool early adopters, and I bet you guys are early adopters as well. But some of the favorites that we've run into so far: obviously BMW. We all have read Mik Kersten's Project to Product, which is awesome. We've got SpaceX right here down in front, as well as Tesla. Formula One has shown some amazing business outcomes, so you're going to see some things in our book based on that.
Relativity Space, they actually launched this year an entire rocket that was 3D printed. Not 3D printed to test, but 3D printed and went to space. And then I'm also a huge fan of Planet Labs. Planet Labs goes ahead and they launch every two to three months. They have a large number, I want to say 400 to 500 CubeSats, that basically are looking at Earth observation, stuff like that.
But these companies all have something in common, which is they've taken agile and DevOps practices and they moved them beyond software. That wasn't where they stopped, so that they could actually deliver capabilities at the speed of relevance.
Suzette Johnson
So based on what we've learned from industry, and working with other industry partners and some folks even in this room, as we've been talking about this over the years, we have developed nine principles that we find are common principles in these environments that we can leverage.
So the first one is organizing around flow of value. Apply multiple horizons of planning. Implement data-driven decisions. Architect for speed and change. Iterate, manage queues, and create flow of delivery. Cadence and synchronization. Integrating early and often. Shift left. And apply a growth mindset.
And what we're going to do now is we're going to actually walk through each of these one by one and give you a little bit more information about why it's important and how it works. All right?
So the first one we have is organizing around the flow of value. This might seem very logical. Well, of course we're going to organize around flow of value. But this is a shift in mindset and how we've been working traditionally in some of these environments that have been around for a long time, and they've been used to organizing around functional areas, where we have handoffs from requirements, and then a handoff to design, and then a handoff to software.
And what we're doing instead is, let's take a mental shift. Instead of organizing by functions, let's start organizing by product. So therefore we can organize our teams to develop integrated solutions that can be demonstrated. And again, that takes a little bit of a mindset shift: what are the steps that we take in delivery of our product, and how do we actually organize our teams around that delivery?
Robin Yeman
All right. So how many of you guys have been doing agile for years? All right.
How many of you leverage an integrated master schedule? Agilists don't do that, right? We don't necessarily like that yet. But here's where I'm going to go all rogue on you.
If I'm building a large cyber-physical, safety-critical system, typically these programs are going to be multi-year programs. It's not a one and done. It's not a case of, "I can plan two weeks at a time."
That being said, do I want an integrated master schedule that says where Robin's going to be in October 2027, on Thursday of the last week of the month? We know we've seen those schedules, right? It's totally made up. I'm just telling you it's not real. I won't be there.
But I do want to leverage what I call multiple horizons of planning. And here's what I was trying to show around Artemis and the launch. It's not that they just planned in two weeks and they just launched. There's lead time.
So in order to deliver multiple horizons of planning, what I am doing is I'm building a large schedule, but high-level big rocks. So think of potentially a satellite or a rocket launch, typically a five- to six-year program. So I'm going to have a five-year plan, high level. I will break that down into an annual plan, because I have to. I'll break that down into a quarterly plan, because I really need to see that data. I'll break that down into a sprint plan, one to two weeks, and then I'll further break that down into a daily plan.
Now, here's the coolest thing in the world. I can use the data in that daily plan to further inform my sprint plan. So if Suzette here happens to plan 10 things to do, but she completes six, it's happened, right? We've done that before. But she does that a couple times during the sprint. What do you think the percent complete is when we finish the sprint? Sixty percent. There is no magic catch-up at the end.
So I want to take that and share that data with people sooner rather than later, and further inform the schedule. So the key difference between what I'm talking about and an integrated master schedule is I have regular timelines where I'm reviewing that plan with empirical data to further inform the plan. So it is flexible, but I have to have a plan.
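The arithmetic behind that point can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the talk: the `DayRecord` type, the function names, and the three days of data are all hypothetical, and it assumes the observed completion ratio simply carries forward to the end of the sprint.

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    planned: int    # items planned for the day
    completed: int  # items actually finished that day


def completion_ratio(days: list[DayRecord]) -> float:
    """Fraction of the planned work that has actually been completed so far."""
    planned = sum(d.planned for d in days)
    completed = sum(d.completed for d in days)
    return completed / planned if planned else 0.0


def forecast_sprint(days: list[DayRecord], sprint_scope: int) -> int:
    """Project how many sprint items finish if the observed ratio holds (assumption)."""
    planned = sum(d.planned for d in days)
    completed = sum(d.completed for d in days)
    return round(completed * sprint_scope / planned) if planned else 0


# Hypothetical data: 10 items planned and 6 completed on each of three days.
observed = [DayRecord(10, 6), DayRecord(10, 6), DayRecord(10, 6)]
ratio = completion_ratio(observed)
forecast = forecast_sprint(observed, sprint_scope=100)

print(f"completion so far: {ratio:.0%}")
print(f"forecast: {forecast} of 100 planned items")
```

Feeding each day's record into the same calculation is one simple way to revisit the sprint plan, and the horizons above it, with empirical data instead of waiting for a catch-up that does not come.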
Suzette Johnson
Which ties nicely into the next principle of implementing data-driven decisions. So a lot of times we are very much driven by schedule, which we can appreciate, because sometimes we do have a launch date or a test flight date that we need to hit. So I can appreciate that.
But as we are making and understanding our progress against those schedules, we want to understand: how are we doing? We're really great at understanding how we're doing at the team level. But as we scale, the question is: how are we doing in terms of an integrated system?
And we want to take that objective evidence against work completed, not just the amount of documentation we have done or if our design reviews are done. Those are things that are important and necessary, but what's really most important in terms of understanding progress is the integration. And we can measure our progress and our success against that.
In addition, as Robin was talking about, as we are having all these demos, we can understand the progress that we have. How are we using that data to inform the next level of decision-making?
Also, with this, we have big visible charts. We've been doing this for a long time, especially in our software environments, where you would always watch the continuous integration points, and you can always see the progress of things being made, when things were broken, how things were up and running.
But we want to continue to carry that through the value stream, through the integrated system level, so that we can see not just how we're doing at the team level but how we're doing at the integration level, and where the bottlenecks are in that pipeline, so we can make decisions about what we need to work on next.
Because maybe the thing that we need to work on next isn't more features. Maybe it's actually to go focus on integration and test to remove bottlenecks, because it's important that we continue to have flow across our delivery pipeline.
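One simple way to turn that kind of pipeline data into a "what next" decision is sketched below. It is an illustration under stated assumptions, not the speakers' method: the stage names, the queue counts, and the `wip_limit` threshold are hypothetical, and "bottleneck" is approximated as the stage with the most work waiting in front of it.

```python
def find_bottleneck(queue_sizes: dict[str, int]) -> str:
    """Return the stage with the most work waiting in front of it."""
    return max(queue_sizes, key=queue_sizes.get)


def next_focus(queue_sizes: dict[str, int], wip_limit: int) -> str:
    """Suggest where to spend effort next, given a hypothetical queue limit."""
    stage = find_bottleneck(queue_sizes)
    if queue_sizes[stage] > wip_limit:
        return f"unblock {stage}"
    return "build next feature"


# Hypothetical snapshot of items waiting at each stage of the delivery pipeline.
waiting = {"requirements": 2, "design": 3, "software": 4, "integration": 11, "test": 5}

bottleneck = find_bottleneck(waiting)
decision = next_focus(waiting, wip_limit=8)
print(f"largest queue: {bottleneck}")
print(f"suggested focus: {decision}")
```

With this snapshot the suggestion is to work on integration rather than add features, which is the kind of call a big visible chart is meant to make obvious.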
Robin Yeman
All right. So I'm going to tell you something you guys already know. Yep, we have to architect for change and speed. But what does that mean? Well, first thing I would say is context matters. We have to make sure that we're thinking from that perspective.
So the answer may not be, in every case, that I can do microservices for everything. It may be the answer. It may not be. In real-time embedded systems, that's going to cause lag and delay, so we have to overcome that.
But we can create modularized components. We can also create standardized interfaces, and we should do that for both hardware and software. Interesting thing is, every time I get a new plug or a new adapter for my Mac or my iPhone, I'm annoyed. I'm like, "God darn it, I got to buy another charger." We want to keep those standards. Do we really have to change that every 15 minutes? No. We can leave that alone.
Allow the innovation and the creativity to occur inside those modules, inside those areas. And then also allow this to be an option, at least in my world, to commoditize some of these parts. In the past, at large defense contractors, we built every single bus to launch a rocket as a one-off. Why would we do that? We've got to redo that architecture every single time.
We can commoditize some of these components. Now we can do product-line engineering, just like automotive. And we want to bring that to, basically, cyber-physical systems. So some of you guys may already know this, but we're kind of repeating this back.
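In software terms, the idea of a stable interface with freedom to innovate inside the module might look like the sketch below. Everything in it is hypothetical and chosen only for illustration: the `PowerInterface` contract, the two solar array classes, and the wattage figures do not come from the talk.

```python
from typing import Protocol


class PowerInterface(Protocol):
    """Hypothetical standardized interface that every power module must honor."""

    def available_watts(self) -> float: ...


class SolarArrayV1:
    """Original module with a fixed output (illustrative value)."""

    def available_watts(self) -> float:
        return 1200.0


class SolarArrayV2:
    """Redesigned module: the internals changed, the interface did not."""

    def __init__(self, panels: int, watts_per_panel: float) -> None:
        self._panels = panels
        self._watts_per_panel = watts_per_panel

    def available_watts(self) -> float:
        return self._panels * self._watts_per_panel


def can_power(payload_watts: float, source: PowerInterface) -> bool:
    """The bus depends only on the interface, never on a specific module."""
    return source.available_watts() >= payload_watts


print(can_power(1000.0, SolarArrayV1()))
print(can_power(1500.0, SolarArrayV2(panels=8, watts_per_panel=200.0)))
```

Because `can_power` is written against the interface, swapping in the redesigned module requires no change to the code that consumes it, which is the property that makes commoditized, product-line components possible.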
Suzette Johnson
All right, so our next principle focuses on iterations, managing queues, and creating flow. Now, most of you, I think if not everyone in this room, raised your hand that, hey, we're used to doing agile and we're very familiar with it.
But this is interesting when you start getting into cyber-physical systems, because now you are doing it at scale. And you should probably also start seeing how some of these principles are related. In order to iterate, we need that multiple horizons of planning, because we're going to iterate at that two-week level or whatever that cadence might be for that particular product.
But the importance of iterations, well, I'll tell you a story. Sometimes when I go out to work with programs, people will ask, "How long should my iteration length be? I hear it's two weeks. Does it need to be two weeks?" And my response back is, "How fast do you need to learn? How often do you want to integrate to get the feedback of what's working and not working?"
And that will help figure out how often you want to iterate and get that feedback. We need to be able to manage queues because, again, it goes back to the bottlenecks, because that bottleneck is always shifting. As we make improvements in one area, as we improve our software and our automation, it's often creating a bottleneck somewhere else in the system.
So as we're iterating, we're also having to manage the queues and watching the flow of work that's going across that pipeline. And that iteration, again, it goes back to the ability to learn.
We've been doing this, actually, in hardware for a while. We can go back as far as 10 years. There's an area where we were working on the launch abort motors for the Orion system. And what happened is the launch system was too heavy, so they had to reduce the weight of that part of the system.
So what they did is they created simulation environments, and they just kept iterating and learning and testing out new materials in a simulated environment until they could get to the weight that they needed and meet that requirement. So even in hardware, we've had the ability to iterate and learn.
Robin Yeman
All right. The next two are a little less talked about in our community, I mean, in some cases, but a little less talked about: cadence and synchronization.
So let me talk about manufacturing first. What is one of the things we want to do with lean manufacturing? What do you guys think? Who wants to say?
Audience
Eliminate waste.
Robin Yeman
Eliminate waste. We want elimination of waste, and we want to reduce all variability.
Now what are we talking about? It's different in product development. In product development, we want to eliminate waste and reduce bad variability, but exploit good variability. And that is critical in order to actually get this right, get these things out the door.
So Don Reinertsen's book The Principles of Product Development Flow will tell you exactly how to do this. And basically, one of the things we want to do is put all of our teams on the same cadence, the same synchronization. We want to begin with the bottleneck because we know we can't go any faster than that.
That's why I'm showing you guys the drum, rope, and buffer. Meaning, if I'm going really fast, but I've got six things that are never going to be able to go that fast, then I can only go as fast as my slowest component.
But one of the key things that I found is it eliminates noise in the system. It's very hard to find that good variability that I can exploit if I just have a ton of noise. I can't look at it. But if I put everybody on cadence and synchronization, I've removed some noise from the system, I've got the system moving at the same level, and now I can remove bad variability and exploit good variability for better product development.
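The drum, rope, and buffer idea can be reduced to a small calculation. The sketch below is a simplified illustration rather than a full treatment: the stage names, the rates (items per iteration), and the buffer sizes are hypothetical. The slowest stage acts as the drum, a small buffer protects it from starving, and the rope limits how much new work is released each cadence.

```python
def find_drum(stage_rates: dict[str, int]) -> tuple[str, int]:
    """Return the slowest stage (the drum) and the pace it sets for the system."""
    drum = min(stage_rates, key=stage_rates.get)
    return drum, stage_rates[drum]


def release_quantity(drum_rate: int, buffer_target: int, buffer_level: int) -> int:
    """The rope: release just enough work to feed the drum and top up its buffer."""
    shortfall = max(buffer_target - buffer_level, 0)
    return drum_rate + shortfall


# Hypothetical capacities, in items per iteration, for teams on a shared cadence.
rates = {"software": 12, "firmware": 8, "hardware": 3, "integration": 5}

drum, pace = find_drum(rates)
to_release = release_quantity(pace, buffer_target=2, buffer_level=1)
print(f"drum: {drum}, system pace: {pace} items per iteration")
print(f"release this iteration: {to_release} items")
```

Releasing more than this does not make the system faster; it only adds queued work, which is the noise that hides the good variability worth exploiting.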
Suzette Johnson
So along with our iterations and our cadence and our synchronization comes the integration, and it ties into that concept of how long does my iteration need to be?
So with integration, we want to integrate early and often. With our software, we're integrating daily. But now we have these systems, these complex systems, that we want to be integrating across the system, so that we can see across our teams where the integration points are well defined and how we are progressing at those integration points. You can see here we're looking at different MVPs and getting that feedback on what's working through that cycle. We continue to make improvements.
So the principles that we've talked about, you can kind of see the relationship: I need the multiple horizons of planning. I need to understand my cadence and synchronization, which ties into the iterative development, and then the frequency of integration to get that feedback.
Robin Yeman
All right, here's my car. Well, it's not really my car, but I'm hoping to get one of these McLarens someday. I don't know.
Anyway, what is this slide actually talking about? Shift left. Now, we've heard this before. And in software, we would say test-driven development, acceptance test-driven development, all those things. That's exactly what we want to do in hardware.
What does that mean? Well, the coolest thing has happened over the last decade. Technologies in areas like materials and digital twins and simulations have gotten even better, meaning I can take physical systems and move them into cyberspace and actually run a bajillion simulations early, before I've built anything or before I've built much.
So again, places like Formula One have revolutionized this, and now you're seeing it at a lot of different car manufacturers. So you've got Mercedes, et cetera. We want to leverage these digital twins, digital shadows, digital ecosystems in order to get validation on the system earlier in the cycle, as opposed to building it and then testing it.
Have you guys ever actually tested something at the end and had it work the first time? I know, it never happened to me either. I just figured I was on the wrong team, but it probably happened to somebody.
Suzette Johnson
All right, so how many of you have heard about a growth mindset? Okay, it's most of us, but not all of us.
So there's this concept that's been published about a fixed mindset versus a growth mindset. And a fixed mindset kind of means, I'm the way I am, the way I was born. I'm fixed. I can't learn. And I have limited capability here.
A growth mindset means that I have the opportunity to continue to grow and to learn and take advantage of those new opportunities. So when we talk about fail fast, fail early, recover quickly, all of that learning, we're applying a growth mindset because it ties closely into continuous improvement as well.
So we want to make sure we're creating a culture where we can get feedback. So we've talked about fail fast, fail early. We also need a culture of psychological safety that feeds into this, so that we can take advantage of that learning and then again feed it into that next cycle to make us even better.
And it's that mindset of continuous improvement, of learning from those mistakes, that makes us better and makes us better individually, but also makes us better in the long run with our product delivery.
Robin Yeman
Tell us about getting started, Suzette.
Suzette Johnson
So with getting started: when we were writing the book, we were writing about all of these different principles and giving examples. And every chapter has a getting started section and some coaching tips. And then at the end, we put it together in this graphic of how do we actually get started? What are some steps that I could take?
And it's not all-inclusive, but it is the getting started steps that can help you with your journey.
So one of the things we talk about is building a generative culture and leading by example. That generative culture where we apply a growth mindset, where we are driven by mission, where we practice decentralized decision-making, and we're all working together towards that common goal.
And then we say, okay, we've got the culture in place, and we're building out that cultural roadmap for continuous improvement. What do I need to do in terms of just some foundations?
Well, who's helping to champion the change in the organization or in your particular product? What do we need to do to understand the current state and drive desired improvements? And then we go through each of these with each of our principles, like organize your structure for flow. Maybe that's the first thing on your checklist, is let's see how we're organized. Am I organized by function? Is there a way to improve that organizational structure? Do we need to take a more productized perspective?
We want to, of course, refactor our planning for multiple horizons of planning. In most organizations, as we're building big cyber-physical systems, there is a longer-term roadmap. But are we allowing the flexibility in that roadmap planning so that as we iterate, we're learning and improving that plan?
Robin Yeman
Or we can just beat you until you get back on track. That works.
Suzette Johnson
Right. That works. I heard that.
All right, so I'm not going to read the chart. You can read it. But each of these is a step you can take on an improvement.
And then lastly is number nine. I just want to talk about it in case you're not familiar with it: use the improvement kata model to drive continuous improvement. And that is what ties back into the last principle of growth mindset.
Is anyone familiar with the improvement kata? A few of us. That comes from the lean community. So we do address that in the book. And it talks about, as you're making improvements, the first thing you need to do is understand your current state. Where are you now?
Because it's not about just following a checklist. It's about understanding where you are today. What are your strengths? What improvements do you want to make? And then you're kind of carving out that future of expectations and what that future could look like. And the kata helps develop those steps for getting to that next stage.
So I recommend starting there, because you don't have to tackle everything at once, but always start by knowing where you are now and understanding the existing culture.
Robin Yeman
So what can we use help with? Because this is what Gene always wants us to come out with. And we actually have a whole bunch, because as we went through this book, I think we learned all of the things that we don't know. Actually, I probably feel less intelligent than I did when I started, because I realized how much I don't know.
So case studies in different domains. Are you in healthcare? Are you in automotive? Are you in nuclear? Are you in whatever your environment is? Please, let's do a case study. Either you already have one, or we could do one with you.
Test the practices and tell us which ones don't apply. Maybe some don't apply to your environment. Or, oh, by the way, maybe we forgot something. Now it's unlikely because I'm always right, but if I wasn't, it could happen.
What data is missing? What isn't there? I posted something about architecture, and I said, "Tell me all the things that are missing." Oh my God, that post must have gone on for days. I just learned that every architect in the known world knows that I'm not that bright. So there you go.
Information on what technologies are missing. I'm just learning about generative AI and how to use that. I've had some great mentors and things like that. One of the algorithms they were showing us, they said, "Oh, you could use this, but that's 30 days old. It's deprecated." I was like, "What? What is even going on here?"
Suzette Johnson
Yeah. And at lunch we were talking about manufacturing. So that's another area: how we shift left in manufacturing, and what that means. So if anyone is in the manufacturing part of this, I would love to talk with you more as we continue to shift left.
We have some papers. They've been out for a while, and we've been growing them over the years. But most importantly, we have our book that's coming out.
Robin Yeman
And here's a cool QR code. It'll give you the first chapter, although we are going to do a book signing tonight, which I'm really excited about. Please get in our line, because we noticed that last year not all the lines were equal, and we are feeling a little insecure.
Suzette had a plan that she could just plant a couple of people and have them keep rotating back in line. I like it. So definitely, please check us out there.
The other thing is, we're going to be doing a learning sprint in not too long, about 30 minutes or so. And we've got Hassan, who's helped us with a number of the papers, showing us how we do continuous integration with an FPGA. Which is, some people would argue hardware, firmware, but yes. Anyway, it's getting started with that.
Do we have time for questions? No? No. All right. If you have questions, you're going to have to find us. Just us back there. I can't, you know what, it's so bright.
Suzette Johnson
I heard there was a book signing, and if you want to stand in line to ask questions.
Robin Yeman
Brilliant. All right, thanks, guys. Appreciate it.
Suzette Johnson
Thank you.