By Dubravko Dolic, Head of Applied Analytics & AI, Continental Tires
Since Continental Tires began researching the topic of machine learning and AI several years ago, it has had to manage a lot of challenges to industrialize software products based on machine learning (ML) and artificial intelligence. While many companies started out by trying out different use cases in many business areas, we had the advantage of setting our focus on the industrialization of use cases from the beginning. By 2018, a lab and an industrialization platform were established at Continental Tires to support the operation of business-relevant ML/AI-based use cases.
Still, a lot of lessons had to be learned. While a first lighthouse project that demonstrated the business value of adapting processes using ML/AI was set up quickly, each business area still needed to find its own way into the topic. Luckily, all newly created data science teams were able to start with the same infrastructure.
While this bottom-up approach took some time to spread across the organization, the lighthouse project was a crucial point of reference that helped every new party quickly adjust to using the same infrastructure and adapt to the processes that were already established. But with each new challenge, processes needed to be adapted or improved.
By nature, data science is suitable for bridging the gaps between different teams and classical borders that may exist between business and IT departments. While the domain know-how differs, data scientists quickly find synergies with other data scientists in terms of methods applied or technologies used. At Continental, we found that quickly bringing together data scientists and programming expertise from classical IT, so that they develop common processes hand-in-hand, is crucial for success. Otherwise, business data science teams run the risk of developing into something that was once called “shadow IT.”
Most data scientists concentrate on creating models which predict, forecast, or classify anything based on some input data. Nowadays, the process of optimizing the results of this kind of modeling is clear and established in most companies.
It is getting more challenging to use the outcome of a model in a repeatable, controllable, and reliable way. To operate a model in such a way, paradigms from traditional software development need to be considered. The model itself is then just a part of the whole infrastructure that will be delivered. The complete result can be seen as a data product.
Developing a data product does not mean providing syntactically error-free code. It means providing a sufficiently trained model. A sufficiently trained version of the model is one that can be commercialized within the company as an optimization of existing business processes or outside the company as an extension of an existing product or as a completely new product. Industrialization, therefore, just means that the data scientist must develop the application to the point that it ultimately runs in the data-productive environment in which it was trained. For IT, industrialization is the finalization of a software artifact with a focus on the program syntax and on performance, reliability, and so on. In a data science project, industrialization means running a data product technically in an environment where the model can grow.
Both traditional software and AI-based software are subject to constant adaptation and change. However, the process of constant change in traditional software is deterministic. The process of developing a model has stochastic elements and therefore cannot be planned in the same way.
Finalizing a model comes at the expense of the technological implementation quality. Data scientists cannot spend that much time developing good, scalable code. They need to invest more time in developing the model. So, they need to get a lot of primary input data very rapidly. Afterward, in terms of process technology, they must have the freedom to do everything possible with this data without an IT department making specifications.
Model Life-Cycle Management
Model life-cycle management plays a central role in enabling the controlled and efficient development of AI/ML-based products. While the basic iterative nature of the process of statistical learning was referenced early in frameworks like CRISP-DM or knowledge discovery in databases (KDD), the management of the models built into software became popular only very recently. As common ground, we find the following elements essential to the process of model life-cycle management:
- data and feature versioning
- model versioning
- model warm-up
- model deployment
- model retraining
Data and Feature Versioning
The data-understanding phase is crucial to the creation of a model. Exploratory data analysis (EDA) and data visualization take up most of the time in a typical data science project. Erroneous data or wrong assumptions caused by misinterpreting the data may result in wrong models. Therefore, thorough and comprehensive documentation of all primary and aggregated data used for the modeling is necessary. The data situation for a specific model needs to be reproducible. Beyond that, the features that were used for a specific setup should be easily accessible and understandable. Consequently, model life-cycle management depends on integrated data and feature storage.
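One way to make the data situation for a model reproducible is to store a content fingerprint of the data together with the list of features used. The following is a minimal sketch under assumptions, not the setup used at Continental Tires: the function name `fingerprint_dataset`, the record layout, and the sample measurements are all hypothetical.

```python
import hashlib
import json


def fingerprint_dataset(rows, feature_names):
    """Return a small, reproducible snapshot record for one model's data."""
    # Serialize deterministically: sorted keys and a sorted feature list,
    # so the same data always yields the same hash.
    payload = json.dumps(
        {"features": sorted(feature_names), "rows": rows},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {
        "data_version": digest[:12],  # short id to reference from a model record
        "features": sorted(feature_names),
        "n_rows": len(rows),
    }


# Hypothetical measurements, used only to illustrate the idea.
rows = [
    {"pressure_bar": 2.4, "temperature_c": 31.0, "wear_mm": 0.8},
    {"pressure_bar": 2.1, "temperature_c": 28.5, "wear_mm": 1.1},
]
snapshot = fingerprint_dataset(rows, ["pressure_bar", "temperature_c"])
same = fingerprint_dataset(rows, ["temperature_c", "pressure_bar"])
changed = fingerprint_dataset(rows[:1], ["pressure_bar", "temperature_c"])
```

Here `snapshot` and `same` share a `data_version` because only the order of the feature list differs, while `changed` gets a different one because a row is missing. A real data store would hash files or table partitions rather than in-memory rows.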
At Continental Tires, we experienced the full range, from starting without any traceability for the data in use up to the implementation of feature and data storage for the model life-cycle management. While in the beginning of our AI journey we used simple object stores that were mainly administered by the data scientist to store and track data used in models, over time we created project-specific feature and data stores. The simplest approaches used SHAP values as a kind of feature store: the feature importance for each version of a model was computed and then stored in a local database. With that, it was at least possible to trace the development of features over time. With our growing maturity, and specifically with the advent of real-life deep learning applications, we started to create an integrated data and feature store that is connected to our data science lab platform. To do so, we mainly relied on open-source components, which are foundational to our internal customized solutions.
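The simple approach described above, keeping feature importance per model version in a local database, could look roughly like the sketch below. It assumes the importance values (for example, mean absolute SHAP values) have already been computed elsewhere. The table layout, function names, and numbers are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local store for per-version feature importance."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feature_importance (
               model_version TEXT NOT NULL,
               feature       TEXT NOT NULL,
               importance    REAL NOT NULL,
               PRIMARY KEY (model_version, feature)
           )"""
    )
    return conn


def log_importance(conn, model_version, importances):
    """Store one importance value per feature for a given model version."""
    conn.executemany(
        "INSERT OR REPLACE INTO feature_importance VALUES (?, ?, ?)",
        [(model_version, name, value) for name, value in importances.items()],
    )
    conn.commit()


def feature_history(conn, feature):
    """Return (model_version, importance) pairs to trace a feature over time."""
    cur = conn.execute(
        "SELECT model_version, importance FROM feature_importance "
        "WHERE feature = ? ORDER BY model_version",
        (feature,),
    )
    return cur.fetchall()


conn = open_store()
# Hypothetical importance values for two model versions.
log_importance(conn, "v1", {"pressure_bar": 0.42, "temperature_c": 0.17})
log_importance(conn, "v2", {"pressure_bar": 0.35, "temperature_c": 0.29})
history = feature_history(conn, "temperature_c")
```

Querying `feature_history` for one feature shows how its importance changed from version to version, which is the kind of traceability the paragraph refers to.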
Model Versioning
After the data used to create an AI/ML-based solution are stored and versioned to be reproducible, the different models that were tested must also be versioned. Again, this aligns with the goal of making the full process of developing productive models reproducible and traceable. While ad hoc analyses of data tend to show only successful results of a modeling phase, it is necessary to also make unsuccessful attempts visible to optimize productivity. This not only makes the full process transparent and helps to gain efficiency in later stages of the process, but it also clarifies the effort it takes to bring a model to production. Therefore, model versioning should keep a record of all the versions that were created on the way to the necessary accuracy. This includes variants of one method with different parameterization or hyperparameters as well as experiments with different methods. Besides the result of a model, the technical performance (runtime, etc.) should also be tracked.
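As an illustration of what such a record might contain, the sketch below logs every attempt, successful or not, with its method, parameters, score, and runtime. The class names, the `toy_model` stand-in, and the target score are hypothetical. A tracking tool would normally take over this bookkeeping.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Run:
    method: str
    params: dict
    score: float
    runtime_s: float
    accepted: bool


@dataclass
class ExperimentLog:
    target_score: float
    runs: list = field(default_factory=list)

    def record(self, method, params, train_and_score):
        """Train once, time it, and keep the result even if it misses the target."""
        start = time.perf_counter()
        score = train_and_score(**params)
        runtime_s = time.perf_counter() - start
        run = Run(method, params, score, runtime_s, score >= self.target_score)
        self.runs.append(run)
        return run

    def failed_runs(self):
        """Unsuccessful attempts stay visible instead of being discarded."""
        return [run for run in self.runs if not run.accepted]


def toy_model(depth):
    # Stand-in for real training: deeper "trees" score higher, capped at 0.95.
    return min(0.6 + 0.1 * depth, 0.95)


log = ExperimentLog(target_score=0.85)
log.record("decision_tree", {"depth": 1}, toy_model)
log.record("decision_tree", {"depth": 3}, toy_model)
```

After these two calls the log holds both runs. The first one misses the target and is returned by `failed_runs()`, so the effort spent on it remains documented.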
One important side effect of model tracking is the choice of KPIs. While some models might have built-in measures to determine their usefulness, in most cases it is necessary to clarify the KPIs that measure the performance of a model. This is done between the business/product owner and the data science/deep-learning experts. It is crucial to train business/product owners in AI so that they understand the impact of certain values and measurements that can determine the performance of a model; this process increases the AI maturity of the company.
At Continental Tires, we are still seeking to improve our use of KPIs in measuring model performance. In the first projects where we attempted to implement ML models, we had to overcome nonstatistical performance measurements that delivered useless figures for model performance. Because these measurements were already established, a lot of negotiation was necessary to create an understanding of more statistically based KPIs. In our current state, we define KPIs specific to each project, and they are agreed upon in the first phase of each project. These are then used in a model-tracking system (e.g., MLflow) and, in a productive stage, also implemented in application-specific dashboards (R/Shiny, Python, PowerBI, Tableau) which are chosen by the respective business area. Internally, we are currently publishing guidelines and best practices to be used for models in context.
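To make the idea of agreed, statistically based KPIs concrete, the following sketch checks a binary classifier's precision and recall against minimum values fixed at project start. The choice of these two KPIs, the thresholds, and the sample labels are assumptions for illustration only. Each project would agree on its own measures.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def evaluate_kpis(y_true, y_pred, agreed_minimums):
    """Compare measured KPIs with the minimums agreed with the product owner."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    values = {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
    return {
        name: {"value": values[name], "met": values[name] >= minimum}
        for name, minimum in agreed_minimums.items()
    }


# Hypothetical labels (1 = defect) and predictions from a candidate model.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
report = evaluate_kpis(y_true, y_pred, {"precision": 0.7, "recall": 0.8})
```

In this example both precision and recall come out at 0.75, so the precision target is met and the recall target is not. A report of this shape can be logged to a tracking system or shown on a dashboard.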
Model Warm-Up
The most important phase in model life-cycle management is the model warm-up phase. This phase needs to be a part of every model implementation approach. Only in this phase do we bring our model from the lab into the wild, where new data enter the model and predictions need to show whether the promised accuracy can be delivered. Model warm-up should be planned from the first moment of industrialization. It can be done in very different ways.
Common setups are:
- a demo app that has a limited set of features but allows testing a model
- a testing machine in a plant
- a customer who has full transparency on the model character of a possible setup and is willing to create warm-up data
- creating a digital twin to simulate behavior of software before it runs on a machine
Seeing these warm-up setups suggests that it is necessary to plan some additional efforts. As we want to retrieve quick insights on how the model works when it is exposed to the wild, we need to set up an environment to collect results quickly but also alter, adapt, and retest the model just as quickly.
A model warm-up phase can deliver different insights. It may be that some technical issues in the data pipelining occur, which will need to be fixed. Also, it may happen that the target environment works with incompatible frameworks in terms of model standards. Regularly, a model warm-up will result in detecting situations that weren’t part of the initial training of the model. In terms of deep learning for computer vision, we may observe camera angles, exposures, lighting conditions, or other circumstances that differ from the data to which the model was exposed.
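A very simple way to notice situations that were not part of the training data is to compare a summary statistic of incoming warm-up data with its distribution in the training set. The sketch below does this for a hypothetical mean image brightness. The function name, the three-standard-deviation limit, the 10% review threshold, and all values are assumptions, and real drift monitoring would typically look at many more signals.

```python
import statistics


def out_of_range_share(training_values, warmup_values, z_limit=3.0):
    """Share of warm-up values further than z_limit standard deviations
    from the training mean."""
    mean = statistics.fmean(training_values)
    stdev = statistics.stdev(training_values)
    outliers = [v for v in warmup_values if abs(v - mean) > z_limit * stdev]
    return len(outliers) / len(warmup_values)


# Hypothetical mean brightness per image (0-255) in training and in warm-up.
training_brightness = [118, 122, 125, 120, 119, 123, 121, 124]
warmup_brightness = [121, 119, 64, 58, 123, 61]

share = out_of_range_share(training_brightness, warmup_brightness)
needs_review = share > 0.1  # assumed threshold for flagging the data for review
```

Here half of the warm-up images are much darker than anything seen in training, so the data would be flagged for review, for example to collect additional training images under the new lighting conditions.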
The model warm-up phase demonstrates the importance of an existing platform to realize CI/CD and quick industrialization. It must be possible to quickly change and adapt the model and try out variants during this phase. Therefore, a fast deployment to the warm-up environment needs to be available.
The model warm-up setting provides a kind of protected environment that should be used to test whether a model is mature enough for commercialization. In this setting, the data scientist should be able to “fail fast, learn fast”: they should be able to fail with a safety net. By providing this setting we also prevent data scientists from extending their modeling phase ad libitum.
Specifically for critical applications, the warm-up at Continental Tires is always done over the course of at least two phases. First, a model only delivers results that are reported and read by a human. No actions are executed automatically by an algorithm. Only after an extensive monitoring phase (lasting several months or even years) will a phase with a higher degree of automation be implemented. And even in that stage, a human in the loop needs to confirm results or actions. Only after this extensive monitoring and after another warm-up test scenario can fully automated actions be deployed into a process.
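The staged rollout described above can be pictured as a small state machine. The sketch below is only an illustration of that idea. The stage names, function names, and promotion conditions are hypothetical simplifications of what is, in practice, an organizational decision rather than a few lines of code.

```python
from enum import Enum


class Stage(Enum):
    REPORT_ONLY = 1      # results are only reported to and read by a human
    HUMAN_IN_LOOP = 2    # actions are proposed, a human must confirm them
    FULLY_AUTOMATED = 3  # actions are executed without confirmation


def handle_prediction(stage, action, human_approves=None):
    """Decide what happens to a model-proposed action in the current stage."""
    if stage is Stage.REPORT_ONLY:
        return f"reported: {action}"
    if stage is Stage.HUMAN_IN_LOOP:
        if human_approves is None:
            return f"waiting for confirmation: {action}"
        return f"executed: {action}" if human_approves else f"rejected: {action}"
    return f"executed: {action}"


def promote(stage, monitoring_passed, warmup_test_passed=False):
    """Move one stage up only when the required checks have passed."""
    if stage is Stage.REPORT_ONLY and monitoring_passed:
        return Stage.HUMAN_IN_LOOP
    if stage is Stage.HUMAN_IN_LOOP and monitoring_passed and warmup_test_passed:
        return Stage.FULLY_AUTOMATED
    return stage


stage = Stage.REPORT_ONLY
first = handle_prediction(stage, "adjust curing time")
stage = promote(stage, monitoring_passed=True)
second = handle_prediction(stage, "adjust curing time", human_approves=True)
```

Note that `promote` never skips a stage: moving to full automation requires both the monitoring result and a further warm-up test, mirroring the sequence in the text.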