Industrializing your Data Science Capabilities
Data Science and AI are huge buzzwords nowadays. Data Scientists are creating insights and predictions with huge potential to transform our daily work. Unfortunately, many of these approaches get stuck after a few meters in the mud of operations.
At Continental Tires we started out by creating an environment and accompanying processes from day one. Following the agile approach of continuous delivery and continuous deployment, the Data Science Factory was built as an infrastructure that helps Data Scientists industrialize AI and Machine Learning use cases. Today this environment is used by Data Scientists across the different parts of Conti Tires and is even highly recognized by players like AWS, Microsoft or Google.
In this talk we will present the architecture and the approach we followed to implement this provider-independent environment. Because it is built as infrastructure as code and aligned with processes that follow a CI/CD pipeline, Data Science at Tires can be used to develop real products in-house. DevOps and MLOps engineers collaborate with the Data Scientists to bring the old industrial company into the new world of software and data-driven products.
Full transcript
Dubravko Dolic
Hi, my name is Dubravko Dolic, and I want to show you how we at Tires industrialize our AI and data science use cases.
Over the years, we went on quite a journey to get to the point where we can do so. And before I show you how we got there, I would like to show you some recent news which shows where we are today.
Maybe you have heard, if you're interested in tires, about news like this here. Recently, Conti unfortunately had to recall some of its tires from a plant in the US. Not very nice, but unfortunately, things like this happen. The challenge we had here was that, out of our huge production volume, we needed to identify exactly those tires which were affected by this recall.
That was the point in time when I was approached. A data scientist had created a small Python script, which was able to identify the correct tires, but it was on his local computer. So what to do there?
Luckily, over the last years, we created an infrastructure called Data Science Factory, and this Data Science Factory is exactly the place where we can industrialize such use cases. As you can see from the slide here, we created this Data Science Factory to be able to very quickly set up an environment where we can scale out such programs. So we set up a new project there, and in really less than a day, we were able to do all the setup, connect to the data source, and bring tons of data, three years of production, into the database to analyze this problem. Over the weekend, we were able to identify the tires affected by the problem and deliver an answer to the management. Quite a success story.
So how did we get there? In the beginning, we started very small, which is nowadays a very common approach, but for a huge company like Conti Tires, that was not usual. When I was approached with the question, "Hey, we would like to do some machine learning, some AI stuff in the demand forecasting," the first impulse of management was, "Let's call IBM, SAP, or whoever. They can solve the problem."
As data scientists, we stressed the point that we can do it, and we showed the management what is possible. So we just extracted some data from the data source, loaded it into R, did some analysis on a specific market for specific articles, so we broke the problem down considerably, and showed the management this is what's possible. And that led to our first bigger data science project there.
And from the beginning, as we were looking at this from a certain IT perspective, we were looking for a way to create some infrastructure which we could reuse for other use cases later on. So what we came up with was a kind of sketch of what we wanted to create, one that more or less covered the process of doing data science.
What was very important to us is, as you can see here on the slide, that we created, on the one hand, a lab environment where the data scientists can really create freely, completely independent from any IT infrastructure, with the latest libraries, the language they want, whatever. And on the other hand, an environment where we can work like in a factory, where we can industrialize any data artifact that the data scientist comes up with. That was the basic idea there, and these were the first components which we used to create this Data Science Factory.
We looked at the market. We invested some time to find out what was there. That was four years ago. We did some use cases with some providers there, but nothing on the market was really good enough to satisfy all the requirements that the data scientists really had. Sometimes it leaned too strongly toward the business perspective, so it was not flexible enough on coding. Sometimes it leaned too strongly toward the IT perspective. So there was not a good mixture.
What we did find out at that time was that there is a common tool stack, which nowadays is very common and more or less there everywhere. At that time, it was not that common, but it was already known that you can use Git for collaborating on code, that you need something like Docker and Jenkins to deploy stuff, and also Kubernetes to scale things out.
So we looked deeper into that tool stack and found out that this is exactly what we need when we would like to go on this journey from classical waterfall IT projects, which are still in place at Conti up to today, by way of agile, to really meeting the requirements of continuous integration and continuous delivery, where we are able to find new features and integrate them into the software very quickly and also put a lot of steering into the hands of the data scientist. So this was the basic idea of where we wanted to go, and the Data Science Factory was the outcome.
What is very important is that with the Data Science Factory, we created processes alongside it. So we did not just concentrate on delivering some piece of technology, something which runs anywhere. We also wanted to help the data scientists to go along a process to really deliver their results. So we started really very early, and as you can see here, we used some templates to help people to find use cases at all.
How can you start? What is the first step that you take? Do you look for the right data and then think about the use case? Do you look for business value? What is it that you are looking for? So we also did workshops with the business areas, with domain experts, to identify use cases.
And we also established this way of working where a data scientist goes along with a business expert, a domain expert, and helps to understand the problem, shows the data, gives insight into the data, creates maybe something very isolated, maybe an R Shiny dashboard where you can look into the data. And that also helped the business people very much to understand what data can deliver, what data science can deliver. That was really a very good first step, but that was not the step where we wanted to stop.
So the Data Science Factory was the logical next step because when it comes to the point that the area, the business domain says, "Well, that's great. I want to have it on a daily basis, run automatically, and next week maybe I have a new idea I want to give input to you and please integrate it there," that is exactly the point when we need to get one step further and integrate the Data Science Factory in there.
So what you can see here is also the process, how the data scientist can really go to these further steps and go along a development process to create something which can then go into production.
Alongside this, we recommended using the classical stages, dev, QA, prod, as you can see here. But what we did there in the Data Science Factory is we abstracted everything away from the data scientist. The data scientist does not need to care about what is in there. So all the load balancing, networking, proxies, whatever, is taken away. That is done by the Data Science Factory so that the data scientist can concentrate on these steps.
He can create an artifact, a nice thing, let it be a dashboard or a classification algorithm which delivers its result using an API, whatever, and he can develop it in his lab. He can bring it to the dev area, test it, do a handover, talk with the business: "Hey, is this what you want?" He can deliver the results there, move it to a QA environment, and move it to the prod environment. All this is in the Data Science Factory.
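The lab, dev, QA, prod progression described here can be pictured as a small state machine. The following is only a sketch under assumptions: the stage names come from the talk, but the `promote` function, the `checks_passed` flag, and the dictionary layout are invented for illustration and say nothing about how the Data Science Factory implements promotion internally.

```python
# Stage names as mentioned in the talk; everything else is hypothetical.
STAGES = ["lab", "dev", "qa", "prod"]


def next_stage(current):
    """Return the stage that follows `current`, or None if already in prod."""
    index = STAGES.index(current)  # raises ValueError for unknown stages
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


def promote(artifact, checks_passed):
    """Move an artifact one stage forward, but only if its checks passed."""
    if not checks_passed:
        return artifact  # stay put until tests and handover succeed
    target = next_stage(artifact["stage"])
    if target is None:
        return artifact  # already in prod, nothing to promote to
    return {**artifact, "stage": target}


dashboard = {"name": "demo-dashboard", "stage": "lab"}
dashboard = promote(dashboard, checks_passed=True)
print(dashboard["stage"])
```

The point of the sketch is that an artifact can only move one stage at a time and only after a check, which mirrors the test and handover steps mentioned in the talk.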
So, as I said before, classical tool stack. What you find in the Data Science Factory basically are components that are well-known to the audience here, I guess. You can find Git there, Jenkins. We put things in Docker, use Kubernetes to scale it out. And we have Airflow if the process, the workflow, is more complex. For simple workflows, we still use something easy like crontab, but if it's getting more complex to schedule all these things, we have Airflow in place, and we have monitoring on that.
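What typically makes a workflow too complex for a plain cron schedule is dependencies between steps: a step may only start once the steps it depends on have finished. The sketch below illustrates that idea with Python's standard `graphlib` module. It is not Airflow code, and the task names are invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
dependencies = {
    "load": {"extract"},
    "train": {"load"},
    "report": {"train"},
}

# static_order() yields the tasks so that every task comes after its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

A cron entry can only say when a job starts; a workflow scheduler additionally tracks an ordering like this one and reacts when a step fails.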
And what's very important is that from the beginning, we set everything up as infrastructure as code. In the beginning, it wasn't clear whether we would stick to a specific environment, whether it would be on-premise or some AWS or Azure cloud. So we went to Azure because it was already there. There was an account there and stuff like that, but we didn't rely on the services there. So we really did everything as infrastructure as code, and you see here the components, Terraform, Ansible. So we automated almost everything.
Also, many templates are available nowadays. If you want to have a new, let's say, database in place, most of the time we have Postgres. We have some templates to automate the process there. So automation was a very important paradigm for the Data Science Factory, to really abstract a lot of stuff away from the data scientists and make it possible to run this stuff as independently as possible.
So here you see the workflow that we created alongside this process. We have the data lake, which I will show you in a second. We call it data lake, but it's not really a data lake; it's where you have a lab to work in. And then you have the stages, and the data scientist goes through the stages to, in each stage, test his model, deliver the results, check the data, check all the process behind it, promote it then with a defined process, and go on.
What's also important is that, as this area was new to Tires and to many developers, and many developers were not in IT but came from something like manufacturing or elsewhere, we also took on a kind of guiding role there and showed these data scientists in the different areas how you can work with such a process, how you can do software development, what is the best way, where you need a handover, how you work with Git and the Git flow there, and stuff like that.
Here you can see a very technical view on this code, and you see how this works. We have code there. You check this code into Git, and you have tags on there, and these tags are then later used by the Data Science Factory for the right deployment there. Let's assume you have an R Shiny dashboard, which you would like to deploy there. There is a config file which steers this in a central way, and then you can do your release management. And from Git, the Data Science Factory knows, so to say, which area you are deploying into and how you would like to work with it.
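The talk does not spell out what the Git tags look like, so the following is only a sketch of the general idea of deriving a deployment target from a tag. The tag convention `<stage>-v<major>.<minor>.<patch>` and the function name `parse_release_tag` are assumptions made for illustration.

```python
import re

# Hypothetical convention: "<stage>-v<major>.<minor>.<patch>", e.g. "qa-v1.4.0".
TAG_PATTERN = re.compile(r"^(dev|qa|prod)-v(\d+)\.(\d+)\.(\d+)$")


def parse_release_tag(tag):
    """Split a release tag into its target stage and version tuple."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    stage = match.group(1)
    version = tuple(int(part) for part in match.groups()[1:])
    return stage, version


print(parse_release_tag("qa-v1.4.0"))
```

With a convention of this kind, a build job only has to read the tag to know which stage to deploy to, and the version part supports release management.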
I will show you this now in a few slides here. Let's do a quick run through this process here. The starting point is always the so-called data lake. It's not a very fortunate name because it's not a data lake in the classical sense, but it's a data lab, and a very good data lab. I really like that because, as a data scientist, this is a very nice entry point.
You start with a web page, as you can see here, and behind this here is the full process that many of you definitely know, like creating an EC2 instance, creating S3 buckets to store your data and stuff like that. But as a data scientist, you don't care. You don't need to look into that because you just click on this web page with your usual single sign-on, and you choose whatever you want for an environment, maybe an RStudio or a deep learning environment or whatever. You choose your configuration. Let's say you would like to do some deep learning, then you can choose how much power you need. If you want to have a GPU in there, go for it. And there you go.
Behind that, a specific EC2 instance starts, and it comes pre-installed with all the necessary Python flavors. You have different flavors in there, like Anaconda and plain Python. You have virtual environments already pre-installed, with TensorFlow, PyTorch, MXNet, Caffe, all in there. You have a Jupyter Notebook, and really a minute later, you can start working. You just need to upload your data into an S3 bucket. Also, the S3 bucket is not really visible to you. It's mapped to a directory in your EC2 instance, and you just start working there.
So this is the lab environment we work in. Very helpful, very useful. And this is also maintained by our corporate division, so it's really usable everywhere at Continental.
Then we use GitHub. GitHub has a very central role, as you can imagine. When you are ready with your code, you can create a repository where you put all your code. And as a data scientist, you also have to containerize. So you put the Dockerfile into GitHub as well. You can see here we do all the documentation in GitHub. We have a meta documentation which relies on GitHub and extracts the Rmd files and md files from GitHub every night, so we have up-to-date documentation. Every data scientist is asked to write decent documentation in there and to put a Dockerfile in there so that the Dockerization is already done, and we have a Jenkins template where you need to enter your specific project, and that's it.
Then you can publish your stuff on GitHub. What you need in your lab environment is a configuration file, like you know from other applications. This configuration file is specific to your project. And with this configuration file, you are able to use our Data Science Factory control.
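The nightly documentation extraction mentioned here is not described in detail. As a rough sketch of the collection step only, and assuming the repository has already been checked out to a local directory, the md and Rmd files could be gathered like this. The function name `collect_docs` and the demo files are invented for illustration.

```python
from pathlib import Path
import tempfile

# File types named in the talk: Markdown and R Markdown.
DOC_SUFFIXES = {".md", ".rmd"}


def collect_docs(repo_root):
    """Return repository-relative paths of all documentation files, sorted."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in DOC_SUFFIXES
    )


# Demo on a throwaway directory that stands in for a checked-out repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "README.md").write_text("# Project\n")
    (root / "analysis").mkdir()
    (root / "analysis" / "report.Rmd").write_text("---\ntitle: Report\n---\n")
    (root / "Dockerfile").write_text("FROM python:3.11-slim\n")
    docs = collect_docs(root)

print(docs)
```

The Dockerfile is deliberately skipped: only the documentation files are picked up, which is all the meta documentation needs.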
The Data Science Factory control, as you can see here, is a specific application that we have made available to the lab and to every data scientist. You can easily install it from our internal Git. It's just a pip install. Then you have this DSF control. And with the DSF control, you can do the whole process. You want to have an overview, you just do a list, then you have an overview of all the builds that are already available.
If you want to deploy your specific application to a specific stage, you go here. As you can see in these examples, you release it to a specific stage, be it dev or QA, with this control. And this is basically the process that the data scientist has to follow.
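The real DSF control tool is internal and its exact commands are not given in the talk beyond "list" and releasing to a stage. The sketch below only illustrates what a command-line tool of this shape could look like. The program name `dsf-sketch`, the `--stage` flag, and the build identifiers are hypothetical and do not reproduce the actual interface.

```python
import argparse


def build_parser():
    """Build a parser with two hypothetical subcommands: list and release."""
    parser = argparse.ArgumentParser(prog="dsf-sketch")
    commands = parser.add_subparsers(dest="command", required=True)

    commands.add_parser("list", help="show available builds")

    release = commands.add_parser("release", help="deploy a build to a stage")
    release.add_argument("build", help="build identifier")
    release.add_argument("--stage", choices=["dev", "qa", "prod"], required=True)
    return parser


def run(argv, builds):
    """Dispatch a parsed command against an in-memory list of builds."""
    args = build_parser().parse_args(argv)
    if args.command == "list":
        return list(builds)
    if args.build not in builds:
        raise SystemExit(f"unknown build: {args.build}")
    # A real tool would trigger a deployment here; the sketch only reports it.
    return f"{args.build} -> {args.stage}"


builds = ["dashboard-1.0.0", "dashboard-1.1.0"]
print(run(["list"], builds))
print(run(["release", "dashboard-1.1.0", "--stage", "qa"], builds))
```

Keeping the interface this small is what lets a data scientist drive the whole release process without touching the underlying infrastructure.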
After this, you can monitor your environment. Sure enough, we have monitoring in here, and with this monitoring, you as a data scientist can see how this works, how many resources you use, how many jobs you have running, what the performance is behind this, and whether there are any problems in there. As you already can see here, for instance, this example shows that there are more CPUs reserved than are necessary. These are also opportunities for us as an IT organization to look into this and help the data scientists to optimize their work.
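The observation that more CPUs are reserved than necessary can be turned into a simple check. The following is a sketch with made-up numbers: the metric names `cpu_reserved` and `cpu_peak`, the project names, and the 50% threshold are assumptions for illustration, not values from the Data Science Factory monitoring.

```python
def overprovisioned(projects, threshold=0.5):
    """Return names of projects whose peak CPU use is below `threshold` of the reservation."""
    flagged = []
    for name, usage in projects.items():
        reserved = usage["cpu_reserved"]
        peak = usage["cpu_peak"]
        # Guard against division by zero for projects without a reservation.
        if reserved > 0 and peak / reserved < threshold:
            flagged.append(name)
    return flagged


# Invented sample data: reserved CPU cores versus observed peak usage.
projects = {
    "dashboard": {"cpu_reserved": 4.0, "cpu_peak": 0.8},
    "forecast-api": {"cpu_reserved": 2.0, "cpu_peak": 1.7},
}
flagged = overprovisioned(projects)
print(flagged)
```

A list like this gives the IT organization a starting point for the conversation with the data scientists about right-sizing their reservations.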
We have this monitoring in place, where you can really monitor the whole area, all the different projects in the Data Science Factory. Currently, there are over 30 projects running in there, from small projects with simple dashboards to very complex ones with many interfaces to many different data sources and APIs behind that, where more teams are involved. So everything in there.
Here, I give you an overview. We don't stop with the Data Science Factory. The Data Science Factory is a very central and important infrastructure, and the first one that we created. You can see it here. But from there, we started to spread out because after the first success of the Data Science Factory, many people were interested in doing more and also in getting more data available.
So we got connected to different data sources. We have a lightweight open source data warehouse in here because it's always the case that you find use cases where it's still necessary to have not big data or unstructured data, but the good old traditional dimensional data. So we put this in there and have access to this data.
Very interesting also, the telemetry backbone, as we call it here. The telemetry backbone is an infrastructure where we, as the name already says, collect fast data, so data with very small pieces of information, but many of them. So here we collect data from, let's say, vehicle tests where our tires are tested out there, and we get information on pressure, temperature, and so on, GPS information. Everything like that is collected in this telemetry backbone.
In this telemetry backbone, we have stages and a Cassandra database, but here, too, we abstracted all the details away from the data scientist. The data scientist has a Python layer again, which is the access layer for the data scientist, where you can easily grab the data from Cassandra without knowing any details about Cassandra. You just use an optimized view, like: I want to query a kind of vehicle, a date range, maybe also a region, some GPS information or something like that. We abstracted all of this from the technology so that the data scientists can really quickly work in the languages they mainly use.
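An access layer of this kind lets the data scientist ask for vehicles, dates and regions while the layer writes the database query. The sketch below shows only that idea. The function name, the keyspace and table `telemetry.readings`, and the column names are all invented for illustration, and whether such a query is efficient or even permitted in Cassandra depends on the table's key design, which the talk does not describe.

```python
from datetime import date


def build_telemetry_query(vehicle_type, start, end, region=None):
    """Return a CQL string and its parameters for a telemetry lookup."""
    # Hypothetical column names; values are passed separately as parameters.
    clauses = ["vehicle_type = %s", "day >= %s", "day <= %s"]
    params = [vehicle_type, start, end]
    if region is not None:
        clauses.append("region = %s")
        params.append(region)
    query = "SELECT * FROM telemetry.readings WHERE " + " AND ".join(clauses)
    return query, tuple(params)


query, params = build_telemetry_query("truck", date(2021, 1, 1), date(2021, 1, 31))
print(query)
print(params)
```

The caller never sees the query text unless they want to; they only supply the filters they think in, which is the abstraction described in the talk.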
This is the overview here for the whole technology stack that we use. So you see from the start, four years ago, we grew a lot. But still, the Data Science Factory is the central point where we industrialize use cases and use them for production.
I would like to show you-- This is a very new slide. Very recently, we were also challenged like, well, is this still the right infrastructure? As I told you, it's four years old. And so we were asked, let's compare to what is there now. Over the years, something developed, and yes, there is something there. So we did a comparison with SageMaker. Very interesting, very nice insights that we had there in the software.
And what we saw is that, with the clear distinction in the Data Science Factory between what MLOps needs to do and what the data scientist needs to do, our process was slightly better adapted to the needs that we have as a company than SageMaker. This distinction between the roles of data scientist and MLOps is very common for us because we have many data scientists, as I said, who did not come from an IT or computer science background, but learned their domain, like sales, manufacturing, or process engineering, and now learn things like R or Python or TensorFlow. So they don't want to have too much insight into what MLOps does. The Data Science Factory makes this distinction, and it was clearly stated after our evaluation that this is something we have in the Data Science Factory which is not there in SageMaker.
Also what we are quite skeptical about is this lock-in if we go to something like AWS, that we have vendor lock-in there, which we don't want to have. But still we are completely aware that a company like Amazon can develop these things with a vast number of developers, and maybe we are not that quick because in the end, we are a tire manufacturer and not an IT company. So we need to keep our eyes open, we keep our mind open, and always compare what we do with what's out there to find out whether we still have the right balance.
Currently, we clearly place ourselves between the more business-driven tools, which give you a point-and-click possibility to deploy some AI, and things like you see here, strongly driven from the IT perspective, which is something like Google or Amazon or also Azure. But these tools really clearly have a strong IT background and focus, and working with these tools, you are always reminded of that. And we are more in the middle of that, and also adapted very much to the process we have at Continental.
Let me show you some of the use cases we have at the Data Science Factory. I think after these insights into the Data Science Factory, you might be interested in what we are doing with it.
I show you some things here. A very nice use case, for instance, concerns our specialty tires, so tires which are not the usual kind you see on the street, but are mounted on, say, a mining vehicle. These are tires which are maybe two meters high, and they have a specific behavior. And it's important that you really handle these tires also in this specific way. They build up heat and stuff like that.
So we created this monitoring tool together with our engineers. And now these engineers have a tool which they can use in the field, go out there and mount a logger to vehicles and tell the, let's say, mining operator: "Well, we observed your behavior with this mining vehicle, and you could change it here and there. If you go this way, this path with a longer curve, for instance, then your tire will last longer." And behind this tool, what you see here, this is completely built in R Shiny and scaled out with the Data Science Factory.
Several customers and engineers are working with this today, and they can use it. The software automatically detects loading and dumping areas in the mining area and detects paths which are not recommended. There's a lateral force analysis and things like that.
Another use case here from our plants. Here we investigated the mixing process, so the first process, where the raw materials go into a huge mixer to be prepared for creating tires. We also gained some insights into this process and created some algorithms to optimize it, to be quicker in the turnaround times and stuff like that. The mechanics behind that were created in Python. We developed a small dashboard and made it available to the operators. They use it now on a daily basis, and up to now, we have rolled this out in three different plants using the Data Science Factory again, where the whole software is running.
Another very interesting use case is the extrusion process. So when you really have the right mass for creating tires, you need to stretch it out, sort of like a spread on bread. And this process is very sensitive. It might happen that you create scrap, which is not very good. So we set out to predict this scrap. So we try to predict as early as possible whether this extrusion process happened in the right way or whether it went wrong.
And in the beginning, we did this on a very low level. We didn't have the right data, so we had a low accuracy. But over time, we evolved there, and we are well on the way to getting even better and predicting the scrap with a very high accuracy. And also this process is really running on the Data Science Factory, and here it's very useful because we have installed this in one plant, on one machine, and the operator can give feedback, and the data scientist can really optimize the algorithm behind it and just deploy it, as we've seen before in the slides, very quickly, so that the operator sees the results of his feedback quite quickly. That's a very good process here for the Data Science Factory.
Also, image recognition is something we turned to recently. We have some TensorFlow models to detect serial numbers, which are printed on some vehicle tires. And we have an app running to use the results of these models here. And also the code itself is really running on the Data Science Factory again. So then the phones need to connect to the Data Science Factory and get the inference from there. This is still a well-working model. In future, we will also deploy things to the mobile device. Edge is also planned. And with that, the Data Science Factory might evolve and also do device management, mobile management and stuff like that. This is an ongoing process. We are currently working on it and really plan to stretch out to these areas as well.
A very important area, of which we saw the first example already, is the whole fleet service area. We have little sensors in our tires, specifically in truck tires. And these sensors can measure the pressure and the temperature. And from there, we can use this data to do some predictions. How long will the tire last? How many miles can you go with a tire? When is the next service necessary? And this is something we are currently developing for our customers so that they can use services alongside the tire. Very important service. The development of this is also based on things like the telemetry backbone I showed before. And it's then deployed to the Data Science Factory, where the algorithm works and delivers the results using APIs.
That was it. Thanks for listening. If you have questions, I'm happy to hear them. Thank you.