An ML model should only be moved to production if it has been thoroughly tested under realistic circumstances. Potential model weaknesses and corresponding correction methods should be documented and communicated.
Appropriate monitoring of the model’s performance over time depends on the implementation of the model:
- Automatic real-time scoring requires continuous monitoring with automated test and validation methods for data quality and model performance, including transparency and fairness requirements. Data quality should be monitored throughout the life cycle of the model to detect concept drift early and to ensure stable model performance.
- A manual setup with cyclical retraining or re-optimisation naturally includes performance checks in each cycle.
In any case, a mechanism to flag possible changes in performance over time should be in place. This monitoring should include a comparison of model predictions with actual results.
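A mechanism that flags performance changes by comparing predictions with actual results could be sketched as follows. This is a minimal illustration, not a prescribed implementation: the class name `PerformanceMonitor`, the choice of accuracy as the metric, and the `baseline_accuracy`, `tolerance` and `window` parameters are all assumptions made for the example.

```python
from collections import deque


class PerformanceMonitor:
    """Hypothetical helper: rolling accuracy of predictions vs. actual results."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        # baseline_accuracy: accuracy expected from development (assumed known)
        # tolerance: accepted drop below the baseline before a flag is raised
        # window: number of most recent cases used for the rolling estimate
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        """Store whether one prediction matched the actual result."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_flagged(self):
        """Flag only once the window is full, to avoid noisy early alarms."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance
```

In practice the actual results often arrive with a delay, so the window size and tolerance would have to be chosen to fit the process in question.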
If the model is retrained or redeveloped based on the outcome of previous predictions or because model performance in production has dropped below requirements, this feedback loop needs to be designed such that no additional bias is introduced. Monitoring of model performance therefore has to cover transparency and fairness requirements as well.
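One way to include a fairness requirement in monitoring is to track the share of positive predictions per group over time. The sketch below uses the gap in positive rates between groups (often called demographic parity difference) purely as an example; the text does not prescribe a specific fairness measure, and the function names and the encoding of a positive prediction as `1` are assumptions.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive predictions (encoded as 1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for prediction, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(prediction == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive rate between any two groups.

    Raises ValueError if no predictions are supplied.
    """
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())
```

Comparing this gap before and after each retraining cycle gives a simple indication of whether the feedback loop has increased the difference between groups.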
Before moving a model to production, procedures and circumstances have to be defined under which the model needs to be replaced or removed from the process.
Furthermore, a model should only be used for its original purpose and by authorised users to guarantee safe operation, especially if personal data is used.
The performance metric used when optimising the model tends to reflect policy decisions (such as prioritising sensitivity over precision). As such policies can change, the performance metric must be adjusted when updating the model. The model should be adjustable to changing circumstances, such as changes in legislation.
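How a policy decision enters the performance metric can be illustrated with the F-beta score, which weights sensitivity (recall) against precision through the parameter beta. The score is used here only as an example of such a metric; the confusion-matrix counts in the usage note are invented for illustration.

```python
def f_beta(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from confusion-matrix counts.

    beta > 1 weights sensitivity (recall) more heavily,
    beta < 1 weights precision more heavily.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

For two hypothetical models, one with counts (8, 2, 8) and one with counts (14, 14, 2) for true positives, false positives and false negatives, the first scores higher with `beta=0.5` and the second scores higher with `beta=2`. A change in policy can therefore change which model counts as the better one, even though neither model has changed.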
- A low-performing or untested model is moved to production.
- Performance degrading over time (for example, due to change in demographics or other circumstances such as laws and regulations).
- Increased model bias (possibly by the production model itself).
- Obsolete choices embedded in the model.
- Although retraining does not yield desired model performance, procedures to replace or turn off the model are not implemented.
- Model is ‘repurposed’ over time, and predictions begin to be used out of context.
- Verify that the population for which the model is used in production is (still) sufficiently represented in the training data.
- Obtain the code of the production version of the model.
- Compare performance in production to expectation from development.
- Review how the development of model performance and of input data distributions is monitored over time.
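The checks on population representation and input data distributions listed above could be supported by a simple distribution comparison. The sketch below computes the population stability index (PSI) for one categorical feature, such as an age band; the PSI is one common choice and is not named in the text, and the function name and the `epsilon` guard against empty categories are assumptions.

```python
import math
from collections import Counter


def population_stability_index(training_values, production_values, epsilon=1e-6):
    """PSI between the category shares in training and production data.

    A value of 0 means identical shares; larger values mean a larger shift.
    """
    if not training_values or not production_values:
        raise ValueError("both samples must be non-empty")
    train_counts = Counter(training_values)
    prod_counts = Counter(production_values)
    psi = 0.0
    for category in set(train_counts) | set(prod_counts):
        # epsilon avoids log(0) when a category is missing from one sample
        expected = max(train_counts[category] / len(training_values), epsilon)
        actual = max(prod_counts[category] / len(production_values), epsilon)
        psi += (actual - expected) * math.log(actual / expected)
    return psi
```

A frequently quoted rule of thumb treats values above roughly 0.25 as a substantial shift, but any threshold is a convention that should be set and justified for the specific application.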