MLOps: Architecting Machine Learning Systems to Detect Model Degradation
Big data processes and models often run into problems with data quality, drift, and accuracy. The underlying data keeps growing and becoming harder to manage, which puts additional pressure on the technology to scale and support larger data loads with no downtime in a production setting. Because a model is specified to fit the underlying data, any change, or drift, away from the training samples will affect its estimates and degrade the accuracy of its predictions. Here we will lay out some of the pipelines and processes that need to be included in an MLOps environment to counter these challenges and provide an automated way to recalibrate models.
Model Pipeline Architecture and Components
A typical model pipeline includes components such as data validation, pre-processing, training, validation, and deployment. Data pre-processing steps include feature normalization, imputation, and feature selection. When the pipeline is called, a model is trained on a dataset and an endpoint is created, which can be deployed to production or picked up by another pipeline. There are various tools for building a model pipeline; we built ours using Kubeflow.
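To make the structure concrete, here is a minimal sketch of such a pipeline using the Kubeflow Pipelines (kfp v2) SDK; the component names and bodies are illustrative placeholders, not the actual pipeline we built:

```python
# A minimal sketch of a model pipeline with the Kubeflow Pipelines (kfp v2)
# SDK. Component names and bodies are illustrative placeholders.
from kfp import dsl, compiler

@dsl.component
def validate_data(raw_path: str) -> str:
    # Placeholder: run schema and data-quality checks on the raw dataset.
    return raw_path

@dsl.component
def preprocess(data_path: str) -> str:
    # Placeholder: feature normalization, imputation, and selection.
    return data_path

@dsl.component
def train_and_register(features_path: str) -> str:
    # Placeholder: train the model and return a URI for the serving endpoint.
    return "model://endpoint-uri"

@dsl.pipeline(name="model-pipeline")
def model_pipeline(raw_path: str):
    validated = validate_data(raw_path=raw_path)
    features = preprocess(data_path=validated.output)
    train_and_register(features_path=features.output)

if __name__ == "__main__":
    # Compile to a spec that can be submitted to a Kubeflow cluster.
    compiler.Compiler().compile(model_pipeline, "model_pipeline.yaml")
```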
Detecting Model Degradation using Data Drift
In addition to the model training pipeline, an MLOps pipeline can be established to detect model degradation from drift metrics and trigger model retraining, so that your model stays relevant as the underlying data changes. Three types of drift are typically monitored:
Concept Drift: a change in P(Y|X), i.e., a shift in the actual relationship between the model inputs and the output.
Prediction Drift: a change in P(Ŷ|X), i.e., a shift in the model’s predictions.
Feature Drift: a change in P(X), i.e., a shift in the model’s input data distribution.
There are various statistical techniques that can help you assess drift, including the population stability index (PSI), which is commonly used in finance, divergence measures such as the Kullback-Leibler (KL) divergence, and the Kolmogorov-Smirnov (K-S) test. Some examples of model drift are shown below:
[Figure: examples of model drift. Image source: IEEE Transactions on Knowledge and Data Engineering]
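To make the stability metrics concrete, here is a minimal sketch of a PSI calculation between a training (expected) sample and a production (actual) sample; the bin count, the epsilon guard, and the thresholds in the closing comment are common conventions rather than details from our pipeline:

```python
# A minimal sketch of a population stability index (PSI) computation between
# a training (expected) sample and a production (actual) sample.
import numpy as np

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI = sum over bins of (p_actual - p_expected) * ln(p_actual / p_expected)."""
    # Derive bin edges from the baseline so both samples are binned identically.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_counts, _ = np.histogram(expected, bins=edges)
    actual_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to proportions, guarding against empty bins.
    p_expected = np.clip(expected_counts / expected_counts.sum(), eps, None)
    p_actual = np.clip(actual_counts / actual_counts.sum(), eps, None)
    return float(np.sum((p_actual - p_expected) * np.log(p_actual / p_expected)))

# A common rule of thumb in finance: PSI < 0.1 is stable, 0.1-0.25 signals a
# moderate shift, and > 0.25 signals a major shift worth investigating.
```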
MLOps Pipeline and Retraining
Conceptually, your MLOps pipeline is separate from the model development pipeline established above. This separation is intentional: it lets you check model stability after training independently of the training run itself. The MLOps pipeline includes a feedback loop that calls the model pipeline whenever drift is detected in any of the estimates above. I’ve created an illustration to highlight how these pipelines can interact with each other.
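As a complement to the illustration, here is a minimal sketch of that feedback loop; the per-feature K-S test, the 0.05 significance level, and the trigger callback are illustrative assumptions, not our exact implementation:

```python
# A minimal sketch of the feedback loop between the monitoring pipeline and
# the model pipeline. The K-S test, the 0.05 significance level, and the
# trigger callback are illustrative assumptions.
from scipy.stats import ks_2samp

def monitor_and_maybe_retrain(baseline, live, trigger_model_pipeline, alpha=0.05):
    """baseline and live map feature name -> 1-D arrays of observed values."""
    drifted = [
        name for name in baseline
        if ks_2samp(baseline[name], live[name]).pvalue < alpha
    ]
    if drifted:
        # Feedback loop: kick off the model pipeline (e.g., the Kubeflow
        # pipeline sketched earlier) to retrain and redeploy the endpoint.
        trigger_model_pipeline(reason=f"feature drift detected in {drifted}")
```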
Dashboards
Monitoring your model is important for tracking accuracy improvements and for comparative analysis of models. Dashboards can be built off the development environment for A/B testing of champion and challenger models (a small comparison sketch follows the list below). Some common reports to consider including in MLOps dashboards are given below:
Stability Reports – Output from the drift analysis, including PSI and population divergence measures.
Performance Reports – Accuracy metrics such as the K-S statistic, RMSE, AUC-ROC, R-squared, and a cost-weighted confusion matrix.
Operations Reports – Pipeline health and retraining events.
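For the performance report, here is a minimal sketch of a champion/challenger comparison that could back an A/B-testing dashboard; the metric set and function names are illustrative assumptions:

```python
# A minimal sketch of metrics for a champion/challenger performance report;
# the metric set and function names are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

def performance_report(y_true, scores_by_model):
    """y_true holds 0/1 labels; scores_by_model maps model name -> predicted probabilities."""
    y_true = np.asarray(y_true)
    report = {}
    for name, scores in scores_by_model.items():
        scores = np.asarray(scores)
        report[name] = {
            "auc_roc": roc_auc_score(y_true, scores),
            # K-S statistic between the score distributions of the two classes,
            # a common separation measure in credit scoring.
            "ks": ks_2samp(scores[y_true == 1], scores[y_true == 0]).statistic,
        }
    return report

# Example: report = performance_report(y_test, {"champion": p_old, "challenger": p_new})
```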