DevOps Insights for the Executable Digital Twin

Amsterdam 2023

Bernhard Sputh

Senior Software Engineer Model-based System Testing · Siemens Digital Industries Software

Roland Pastorino

Product Manager Model-based System Testing · Siemens Digital Industries Software

The Executable Digital Twin is a technology jump enabling new and disruptive ways of creating complex mechatronic products. Digital Twins are models of physical systems found in the real world. They exhibit the same behavior in amplitude and time as the physical systems they represent. In the Engineering domain, Digital Twins have long been used for computer-aided design during the product creation process for cars, aircraft, machinery and many more systems.

The Executable Digital Twin brings a new level of interactivity to Digital Twins by focusing on their connectivity to the real world. As a consequence, the development of engineering solutions enabling the Executable Digital Twin has become increasingly complex. This is largely due to the wider range of technologies required to create such engineering solutions and to the large variety of use-cases. Mastering this complexity is paramount to enable the Executable Digital Twin to bring its benefits to many industries. This is where DevOps comes in.

Full transcript

The complete talk, organized by section.

Roland Pastorino

[00:00:12.790] Good afternoon, everyone. Welcome to this talk. I'm Roland Pastorino. I'm here together with my colleague Bernhard Sputh. Hello. We have prepared a talk for you related to DevOps and the executable digital twin. So quite a different talk and a topic from what you have seen from other companies.

[00:00:33.540] Allow me to just explain a tiny bit what we do at Siemens. We are from Siemens Digital Industries Software. We make simulation and test solutions for the R&D of vehicles, aircraft, ships, machinery, off-highway vehicles, et cetera. To come to this conference, you took a car, a train, a plane. Very likely it has been designed with our systems, simulated with our systems, tested for acoustics, durability, vibration with our systems. So that's what we do in a nutshell. That's our context: automotive industry, aerospace industry, defense, mechanical. We come from that background and we found that DevOps was a very interesting approach for us, and we will show you how we have been using it in our context.

[00:01:29.140] To understand that, you'll have to understand first what an executable digital twin is. I'll first go into that and then we will give you the insights. The executable digital twin is the executable form of a digital twin. A digital twin is the mathematical representation of a physical system, a real system, a car, an aircraft, et cetera. And the executable form is to execute it on different execution platforms. This digital twin is deployed alongside or as part of a physical system under test. Most importantly, it creates a live or real-time connection between a virtual world and a physical world. That's a very, very important point. Once we have an executable digital twin, we can predict and optimize the performance and the behavior of the real system. That's the industry where we live, and that's how we mix simulation and physical systems for the testing. We are basically doing some kind of mechanical DevOps, if you want.
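As a rough illustration of that live connection, here is a minimal sketch of a fixed-step loop in which a model reads a value from the physical side, advances by one time step, and sends a command back. Everything in it is an assumption made for illustration: the first-order `TwinModel`, its constants, the proportional command, and the dictionary that stands in for bench I/O are hypothetical and do not come from the talk or from any Siemens product.

```python
from dataclasses import dataclass


@dataclass
class TwinModel:
    """Hypothetical first-order vehicle model: speed follows applied torque."""
    speed: float = 0.0   # model state, m/s
    gain: float = 0.5    # illustrative constant: acceleration per unit torque
    drag: float = 0.1    # illustrative constant, 1/s

    def step(self, torque: float, dt: float) -> float:
        # Explicit Euler step of d(speed)/dt = gain * torque - drag * speed
        self.speed += dt * (self.gain * torque - self.drag * self.speed)
        return self.speed


def run_loop(model, read_measurement, send_command, target_speed, dt, steps):
    """Fixed-step loop: measure, update the model, command the bench."""
    history = []
    for _ in range(steps):
        measured_torque = read_measurement()           # physical -> virtual
        predicted_speed = model.step(measured_torque, dt)
        # Proportional command toward the target (illustrative only).
        command = 2.0 * (target_speed - predicted_speed)
        send_command(command)                          # virtual -> physical
        history.append(predicted_speed)
    return history


# Stand-in for real bench I/O: the "bench" returns the last command as torque.
bench = {"torque": 0.0}
speeds = run_loop(
    TwinModel(),
    read_measurement=lambda: bench["torque"],
    send_command=lambda cmd: bench.update(torque=cmd),
    target_speed=10.0,
    dt=0.01,
    steps=2000,
)
```

The loop here runs as fast as the interpreter allows. On a real-time execution platform, each iteration would instead be paced to the wall clock so that one model step corresponds to one fixed interval of physical time.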

[00:02:46.020] Now that you know what an executable digital twin is, let's look at how we use that. I have two examples for you today. The one on the left side is related to the testing of propulsion systems of electric vehicles. Some of them have in-wheel motors, so electric motors inside each wheel, and we have to test the propulsion control units of that system. You should see a car, which is virtual, and the black parts that you see, the kind of black belts, those are the in-wheel motors. There you go. Those are the in-wheel motors. That's inside the rim of the wheel of the electric vehicle. That's the executable digital twin. We are basically having a virtual-physical test. We have a virtual vehicle driving on a virtual road, controlling the bench and testing the propulsion control unit of the vehicle to make sure that if something fails, the vehicle reacts properly and safely.

[00:04:00.670] This is only one example for electric vehicles. We also can apply that in completely different industries. On the right side, you probably have never seen that before. This is a wind turbine blade that is on a test bench for durability testing. Turbine blades have to be tested for durability. There, you have to put a lot of sensors on that blade. It's time-consuming, costly, et cetera. What we are doing there is making what we call finite element models of the blade, structural simulation models of the blade, and we are feeding that one, you see now on an iPad, the executable digital twin. We are connecting the physical sensors to that simulation model to create virtual sensors. This is the value of the executable digital twin.
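The talk does not say how the virtual sensors are computed. One common approach is modal expansion: the physical readings are used to estimate how strongly each mode shape of the structural model is excited, and the model then predicts the response at a location where no sensor is mounted. The sketch below assumes that approach with two modes and two physical sensors; the mode shape values, the sensor names, and the functions `solve_2x2` and `virtual_sensor` are hypothetical.

```python
# Mode shapes of a hypothetical blade model, sampled at three locations.
# Values are illustrative, not taken from a real finite element model.
MODE_SHAPES = {
    "gauge_a":  (0.2, 0.5),    # (mode 1, mode 2) amplitude at this location
    "mid_span": (0.6, 0.3),
    "gauge_b":  (1.0, -1.0),
}


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("sensor placement cannot separate the two modes")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def virtual_sensor(measured, target):
    """Estimate the response at `target` from two physical measurements."""
    (name_a, value_a), (name_b, value_b) = measured.items()
    a11, a12 = MODE_SHAPES[name_a]
    a21, a22 = MODE_SHAPES[name_b]
    # Modal coordinates that reproduce both physical readings.
    q1, q2 = solve_2x2(a11, a12, a21, a22, value_a, value_b)
    phi1, phi2 = MODE_SHAPES[target]
    return phi1 * q1 + phi2 * q2


# Two physical readings give an estimate at a location without a sensor.
estimate = virtual_sensor({"gauge_a": 0.45, "gauge_b": 0.5}, "mid_span")
```

With more sensors than modes, the exact solve would typically be replaced by a least-squares fit, which also averages out measurement noise.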

[00:04:49.030] This brings us to how do I get one of those? You have to create one. You either start from an existing physical system, an existing car, an existing aircraft, or an upcoming one, a new generation of this truck, for example. You have to represent or model different physics: the electromagnetic behavior, the acoustic behavior, the vibration behavior, the hydraulics of your system. You have to include all those physics inside that digital twin. Once you have done that, you have to validate the accuracy of it. Once that is done, you have to package it in a file format and deploy it on an execution platform. This is where the link to DevOps starts. We are going to look at the first one, the real-time edge device. In our team, we make the firmware for that, and we apply the DevOps approach to it. We also have other platforms like IoT devices and cloud devices.

[00:05:54.300] Where did we start our DevOps journey? That was Q4 calendar year 2018. What was the situation at that moment? We had new concepts and new difficulties. The concept of the executable digital twin was not really established, but it was crystallizing at that moment. We were facing the fact that there was a very large variety of digital twins on the market from our customers, very, very large, and this made it quite difficult to work out what their execution needed, because we had to make the execution platform for those digital twins. But the variety was just huge.

[00:06:39.540] On top of that, we make industrial solutions that, once deployed at a customer, typically in test cells that are confidential, not connected to the internet, et cetera, stay there for 10, 15 years. So a completely different context than the one you've seen in other talks. And the product incubation phase had reached its limits in terms of quality and expandability. On top of that, we had to take into account other factors for industrial solutions: safety, security, traceability for, again, 10, 15 years, so that if something goes wrong 10 years from now, we can rebuild the system, fix the software error, and deploy that at the customer worldwide.

[00:07:24.860] On the code base side and stakeholder side, it was also pretty challenging. The code base was growing very quickly, but we did not have all the processes in place. You have to act now because if it grows too big, then it's just too late. You cannot recover from that one. And as the maturity of the product was increasing, we just had more stakeholders: the developers, the testers, the higher management, the customers, the suppliers, et cetera. Of course, as with any industrial solution, especially in our company, we have to scale. We have to scale in terms of the variety of the digital twins, different applications where we connect them to different test benches, physical systems. We have to scale across applications, and we have to make sure that the product development processes are stable for 5, 10, 15 years.

[00:08:17.550] This brings us to some insights we wanted to share with you. We have many more, but we selected a few of them, six of them. I will present the first two. I will leave the rest to my colleague Bernhard, and I will then conclude the talk. Let's start with the first one. We have seen that already in other presentations: a very important one for us, the team culture. Out of the team culture we have put in place since 2018, we have extracted five essential values that really worked for us.

[00:08:46.100] The first one, and again, we have heard that before: knowledge is paramount for us. We are creating deep engineering knowledge that we need to nurture every single day at the team level, at individual levels, and for years, again 5, 10, 15 years. Team goals always prevail over individual goals. They always come first. That's super important to understand. Without the team, we do nothing. Value mindset: we train every team member every single day to look at every one of their tasks and say, how is this task bringing value? If we cannot answer that question, we shouldn't do that task. Responsibility on individual level, very important, but also on team level, feeling responsible for the work of colleagues and the team internally, in a very big organization of course, and also externally. Openness is at the core of our processes. We have to be open on what we do, when we do, no-blame culture, be open, share, because this is what makes us fast.

[00:09:51.940] Out of these five values, we have extracted the impacts that we experience, especially in the last years. The first one is that we are able to build expertise that is constantly revised and adapted. The expertise that we need today in 2023 is no longer the same as what we needed in 2018 or in 2019. We have in the team a long-lasting trust that is in place: if you have a problem, I can help you; you helped me before. This is really working very well. Getting the right thing done on time at team and individual levels is very important, and that's the value mindset. If you know that what you're doing is bringing value, we are doing the right thing.

[00:10:37.660] Technical debt, we have heard that before also: it's very important for us not to create it in the first place. Nonetheless, a bit of it will be created, and right after, we have to reduce it or even eliminate it. The final one, very, very important: psychological safety is established in the team. We report problems, again, with no-blame culture, and this contributes to a sense of ownership in the team and higher motivation. That was for the team culture that took us at least five years to put in place.

[00:11:18.660] If we look at the processes, again we have extracted different highlights for those processes. The first one is automate, automate, automate. Automate everything, not only code. We automate any repeating tasks: meeting reports, pending work overview, problem reports, ticket cleanups, everything, presentation making. This is saving us a lot of time. We review the processes almost on a weekly basis. They change so fast that the processes already in place in 2020 are not valid anymore. In 2020 we were at home, it was COVID. That's not the case anymore. We are working in a different way.

[00:11:57.940] Communication: we had specific trainings on communication to communicate effectively, clearly, and timely at all times, in all types of meetings, on progress reports, on the requirements. Requirements are to be clear. Specifications, test cases, code reviews. Clear and lean processes: whatever is not useful, we just remove it, clean it immediately. And then the last one is assignments. We have to make sure that there is no ambiguity in who has to do what. We assign a task to one and only one person, and that person is in charge of driving the task to completion.

[00:12:41.630] Five impacts we experience out of that. First one is agility. It's maintained. Our code base is growing, but we are as agile as a few years ago. We even reached the point that we enter into some meetings, we have no preparation for the meeting, and we can prepare the content of the meeting while the meeting is running. Performance of the team is maximized because whatever is not necessary is not done. If someone is not really fit for the task, we change the assignment. There is no stress. If you have a problem, follow the processes. Trust the system. We have created safety in the software creation process. It's very, very difficult to create software with our processes that is not right. And motivation: when you wake up in the morning, you go to work, you know why you go to work, what you're going to do, how you're going to do it, what's the value of it, and by when you have to deliver it. This brings me to the third insight, and I give you the word.

Bernhard Sputh

[00:13:54.850] Thank you. Our third insight that I want to share is the software architecture for DevOps that we have developed over the last five years. We start in the team by doing, whenever we have a new idea that we want to implement, a collaborative technology analysis. That means the team gets together. We together decide what technologies to use, what are the pros, what are the cons. This is followed by a collaborative software design where, again, the team comes together and we decide how we want to change our current implementation, how we have to adapt it, how we have to extend it.

[00:14:29.460] After this, we record our decisions by creating requirements, by creating specifications, or software requirements and test cases, so that we know exactly what it is that we want to implement, how do we want to implement it, and how are we going to test this. This is vital for us. Once we have all of this, we come to the implementation part, and there we apply a divide-and-conquer approach. We don't do like, oh yeah, there's this new feature and you are going to implement it. No. Very often it's divided up in small chunks. Then these small chunks get implemented by multiple people, simply so that we have smaller MRs, less review overhead.

[00:15:08.950] This gives us some benefits. On the one hand, we achieve a level of clarity because we know exactly how we want to modify the software architecture, how it should look in the end. Also, the team benefits because due to the collaborative design and technology analysis, everybody feels, I belong to this architecture, we have a buy-in. It's not like I have to do this because somebody higher up decided that, who is smarter than me. No, the team decides that. So the team has the buy-in. We all pull together to do the implementation and make the decisions.

[00:15:42.310] Also, due to this approach, we are not having this problem of not-invented-here or code ownership. Nobody owns the code. We all own the code. We all know the code, and we all can, in case of emergencies, incidents, fix the incident, fix the problem. How did it change our architecture overall? We ended up with a microservice architecture with many, many small little repositories, hundreds of small repositories, each of about five megabytes of code and test assets. Which means we can also do this divide-and-conquer very easily because you do an MR in this repository, somebody else does the MR in another repository, so we can parallelize the work and therefore we get a higher creation flow so we can really push features fast if needed.

[00:16:35.760] Which brings me now to the next insight, and that's the importance of requirement, specification, and test case management. This is the heart of our process. We start from system requirements, where we define this is what the system shall do. We come to software requirements, where we define this is how it will do it, how it will be implemented. From these two levels of requirements, we come to the test cases: this is how we test it. And then we come to the code base, and the code base, we say these requirements have to be implemented here, and we refer to our requirements in the code base. This is very vital.

[00:17:15.900] What does that give us? The first positive impact that we experience from this approach is traceability. We can see this requirement is implemented here and here. Also from the code, we can see, oh, this line of code is here because of these requirements. This is important in the industry that we are in because these are test benches, these are physically moving things, safety critical potentially. Also, our requirements management system provides the ground truth of what shall be in the product, how it will be implemented, and how we test it. And it also acts as our second brain because we record in this requirement management system the decisions, who took them, and when they were taken. In the end, we get clarity for all stakeholders because we can always refer to the requirements management system and say, yeah, this is what it should do, this is how it does it, and this is how it's being tested.
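To make the two directions of traceability concrete, here is a minimal sketch that scans source text for requirement references and builds a map from each requirement to the lines that cite it. The talk does not describe the team's actual tagging convention or tooling, so the `REQ-SYS-n` / `REQ-SW-n` tag format, the file names, and the functions `build_trace` and `untraced` are all assumptions.

```python
import re
from collections import defaultdict

# Hypothetical tag convention: a comment such as "# REQ-SW-102" marks the
# code that implements that requirement.
REQ_TAG = re.compile(r"\bREQ-(?:SYS|SW)-\d+\b")


def build_trace(sources):
    """Map each requirement ID to the (file, line number) pairs that cite it."""
    trace = defaultdict(list)
    for filename, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for req_id in REQ_TAG.findall(line):
                trace[req_id].append((filename, lineno))
    return dict(trace)


def untraced(requirements, trace):
    """Requirements that no line of code refers to yet."""
    return sorted(set(requirements) - set(trace))


# In-memory stand-ins for repository files.
sources = {
    "serializer.py": "def dump(x):  # REQ-SW-102\n    return repr(x)\n",
    "bench_io.py": "# REQ-SYS-7, REQ-SW-102\ndef read():\n    return 0.0\n",
}
trace = build_trace(sources)
missing = untraced(["REQ-SYS-7", "REQ-SW-102", "REQ-SW-250"], trace)
```

Reading `trace` answers "where is this requirement implemented?", while looking up the tag on a given line answers "why is this line here?". The `missing` list flags requirements that have no implementation reference yet.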

[00:18:21.690] What you see on this slide here is the evolution of all these requirements, test cases over time. If you can see, in 2019 we were nowhere. Mid of 2019, suddenly we started to use the system and it has been growing steadily since then. We now have just over 7,000 work items, so software requirements, system requirements, test cases, and we have 800 test cases that we run basically continuously.

[00:18:54.250] DevOps insight number five is continuous learning. It's at our heart. We saw it in the culture: knowledge is everything. How do we get the knowledge into the team? How do we share knowledge in the team? The first thing is we have periodic meetings. We're not talking about standups. We have them less often, but we have them. This way we keep the team up to date: what's the current status? What's on the roadmap? So what's next? Also, a previous talk mentioned pair programming. Well, we do code reviews. This way you get the knowledge from the person that implemented the feature, really the implementation details, to the person that reviews it. In the team we do more than a thousand code reviews per year.

[00:19:42.540] Then something also very important for us is postmortem discussions. We had an incident. What was actually the problem? How was it solved? Share this information. This is important because we might see a similar thing happening in the future, and then we need to be aware: last time it was this, let me check there first to get the response time down. Also, we practice periodic active sharing in the team. That means every team member has the opportunity multiple times per week to say, oh yeah, I found something new here. This can be interesting to the team. This is not always Nobel Prize matter of information. It can be like, oh, I found this nice plugin here, or have you seen this option here in Office? But this helps to get our productivity up because if it saves this person a minute, multiply it by the team.

[00:20:40.690] Also, we participate in research projects in two ways. With our product, we learn how the product is used and what the shortcomings are. But also for the product: we are currently in a research project related to testing, simply, how can we test our product better?

[00:21:02.780] Coming to the next DevOps insight: automatic exploratory testing. This was a question we were asking ourselves. Our problem is we have the system here. We have a lot of tests. We saw that we have over 800 tests, but these tests only find what we put into them, what we could think of: we need to test for this. This is vital, this is very important, but it's not enough. We want to find problems before they hit us in the face. So we are looking for the unknown errors, and humans are very bad at coming up with all these crazy things to test. Our approach is we automatically explore the test space using tools such as fuzzing, which some of you might know from security. You bombard a system, you see what happens.

[00:21:52.980] This is some very interesting insight for us because we found a problem thanks to that in a floating point serialization function. The serializer simply crashed and it took the application with it. Outcome of that: we contacted the community, said, hey, your floating point serializer has a problem if it sees this very, very large number. They fixed it. We also fixed our code. We made sure if this thing throws an exception, we catch it. We were not aware that there was this exception and we don't crash any longer. So win-win.
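The pattern described here, a fuzzer finding an input that crashes a serializer and the caller then catching the exception, can be sketched in a few lines. The talk does not name the serialization library, so the sketch uses Python's `struct.pack` with the 32-bit `"<f"` format as a stand-in: it raises `OverflowError` for finite values too large for single precision. The functions `serialize`, `safe_serialize`, `random_double`, and `fuzz` are hypothetical.

```python
import random
import struct


def serialize(value: float) -> bytes:
    """Stand-in serializer: packs a float as 32-bit IEEE 754."""
    return struct.pack("<f", value)


def random_double(rng: random.Random) -> float:
    """Reinterpret 64 random bits as a double to reach extreme values."""
    return struct.unpack("<d", rng.getrandbits(64).to_bytes(8, "little"))[0]


def fuzz(target, iterations: int, seed: int = 0):
    """Call `target` with random inputs and collect every input that raises."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        value = random_double(rng)
        try:
            target(value)
        except Exception as exc:  # a fuzzer records any crash, whatever its type
            failures.append((value, type(exc).__name__))
    return failures


def safe_serialize(value: float):
    """Fixed caller: handle the failure instead of letting it propagate."""
    try:
        return serialize(value)
    except OverflowError:
        return None  # the caller decides how to report an unserializable value


crashes = fuzz(serialize, iterations=1000)
crashes_after_fix = fuzz(safe_serialize, iterations=1000)
```

Drawing random bit patterns rather than values from a "reasonable" range is what lets the fuzzer reach very large magnitudes that a hand-written test would be unlikely to include.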

[00:22:24.540] Another technique is, yeah, we have test suites, but how good is our test suite? Does it capture everything? So we apply a technique called mutation testing, which comes more from the research side. Basically, the idea is that a mutation testing tool takes a line of your code, modifies it, and then runs your complete test suite. If the test suite says, I didn't find a problem, then you have a surviving mutant. You don't want that. What you want is that the code gets modified, your test suite spots it and says, we have detected a fault. Now you can imagine, you are doing a modification in one line of code. We have thousands of lines of code. This is computationally very expensive. So we have a dedicated system that's doing nothing else than this.
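A minimal sketch of that idea, assuming a toy function and two operator swaps, is shown below. It is not the tool used by the team, which the talk does not name: `within_limit`, the `SWAPS` table, `MutateNth`, and both test suites are hypothetical. Each mutant changes exactly one operator, and a mutant survives when the suite still passes.

```python
import ast

SOURCE = """
def within_limit(load, margin, limit):
    return load + margin <= limit
"""

# Each mutation replaces one operator with a close relative.
SWAPS = {ast.Add: ast.Sub, ast.LtE: ast.Lt}


class MutateNth(ast.NodeTransformer):
    """Swap the operator at the n-th mutation site and leave the rest alone."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0
        self.description = None

    def _maybe_swap(self, op):
        replacement = SWAPS.get(type(op))
        if replacement is None:
            return op
        index, self.seen = self.seen, self.seen + 1
        if index != self.target_index:
            return op
        self.description = f"{type(op).__name__} -> {replacement.__name__}"
        return replacement()

    def visit_BinOp(self, node):
        self.generic_visit(node)
        node.op = self._maybe_swap(node.op)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self._maybe_swap(op) for op in node.ops]
        return node


def surviving_mutants(source, name, suite):
    """Run `suite` against every single-operator mutant of `source`."""
    survivors, index = [], 0
    while True:
        mutator = MutateNth(index)
        tree = ast.fix_missing_locations(mutator.visit(ast.parse(source)))
        if mutator.description is None:
            return survivors                        # no mutation site left
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        try:
            suite(namespace[name])
            survivors.append(mutator.description)   # suite passed: mutant survived
        except Exception:
            pass                                    # suite failed: mutant killed
        index += 1


def weak_suite(within_limit):
    assert within_limit(1, 0, 10)
    assert not within_limit(20, 0, 10)


def strong_suite(within_limit):
    weak_suite(within_limit)
    assert within_limit(8, 2, 10)       # boundary: exactly at the limit
    assert not within_limit(8, 5, 10)   # the margin must add to the load


weak_survivors = surviving_mutants(SOURCE, "within_limit", weak_suite)
strong_survivors = surviving_mutants(SOURCE, "within_limit", strong_suite)
```

The weak suite lets both mutants survive because it never uses a non-zero margin or a value exactly at the limit; the two extra assertions in the strong suite kill them. The cost is visible even here: the whole suite runs once per mutation site, which is why this scales poorly to thousands of lines.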

[00:23:12.860] What's the result of that? First of all, we detect problems before they hit us in the face in production. Secondly, we continuously improve our test suite and we learn also how to specify our tests better. It's an insight that we get. That's it from my side. I'm handing over to Roland for the conclusions.

Roland Pastorino

[00:23:34.780] Thanks. You're welcome. Just to wrap up here, probably the first time you were exposed to the executable digital twin, so at least something new today. You've seen that we started in 2018 with our DevOps journey. We have applied DevOps in a completely different industry than you are used to, very likely. We have shared six DevOps insights with you. We have more experience, so don't hesitate to talk to us. We have many more things to say about it. We are in 2023 now, and I have to say the outcome from applying these DevOps practices is very satisfactory, and the outlook as well.

[00:24:12.580] What is it exactly? Our product now is scaling. As I said before, we have this variety of digital twins with the acoustics behavior, the vibration, the electromagnetic, the hydraulics, and so on. This is scaling very nicely. We are also scaling on the connection to the physical world. We are connecting to benches over field buses, over CAN buses, digital buses, analog signals, et cetera. This is very nice. As an outcome of that, we have more application use cases than the ones we were even thinking of in 2018. Now we have customers asking us, can you also do that? We're like, whoa, yeah, okay, let's look into that. And the product development processes that are now in place, we believe they are really solid and can stay in place for five, 10 years.

[00:25:00.950] That was on the bright side. Is everything solved? No. Some open points that we leave for you to think about and close the loop with us if you have some answers. We share these insights, and actually we strongly believe that the two insights that had the biggest impact were the team culture and the processes, so not the tools we are using, not how we do requirement management and those things. But the problem is that we only believe this, because we are not really measuring it well today. So the open question is: how do we even measure a value-driven mindset, a clearly defined task, or even responsible behavior? How do we put that in numbers to track that and see if we are doing good or bad in the team? That's an open point.

[00:25:54.550] This brings us to the end of this presentation. If you would be interested in what we are doing in general in our industry, you can scan this QR code. Thanks a lot for your attention.