3.4 Model in production

Appropriate monitoring of the model’s performance over time depends on how the model is implemented: automatic real-time scoring requires continuous monitoring with automated tests to ensure stable model performance, while a manual setup with cyclical retraining or re-optimisation naturally includes performance checks in each cycle. In either case, a mechanism should be in place to flag possible changes in performance over time.
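As an illustration of such a flagging mechanism, the following Python sketch compares the accuracy of a batch of labelled production predictions with a baseline from development and flags the batch when the drop exceeds a tolerance. The class name, the use of accuracy as the metric and the default tolerance of 0.05 are illustrative assumptions, not part of this guidance. Note also that the true outcomes for production cases may only become available with a delay.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PerformanceMonitor:
    """Hypothetical monitor that flags batches performing below a development baseline."""

    baseline: float          # accuracy expected from development (e.g. on the test set)
    tolerance: float = 0.05  # illustrative maximum acceptable drop before flagging

    @staticmethod
    def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
        if len(y_true) != len(y_pred) or len(y_true) == 0:
            raise ValueError("y_true and y_pred must be non-empty and of equal length")
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)

    def check(self, y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, bool]:
        """Return the batch accuracy and whether it should be flagged for review."""
        acc = self.accuracy(y_true, y_pred)
        return acc, acc < self.baseline - self.tolerance


# Hypothetical batch of ten labelled production cases
monitor = PerformanceMonitor(baseline=0.90)
acc, flagged = monitor.check([1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
                             [1, 0, 1, 0, 0, 1, 1, 1, 0, 0])
print(acc, flagged)
```

In a real-time setting, `check()` would run automatically on each new batch of labelled cases. In a cyclical setup, the same comparison can form part of each retraining cycle.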

If the model is retrained or redeveloped based on the outcome of previous predictions, this feedback loop needs to be designed such that no additional bias is introduced.

The performance metric used when optimising the model reflects policy decisions (for example, prioritising sensitivity over precision). Because such policies can change, the choice of performance metric must be reviewed, and adjusted if necessary, whenever the model is updated.
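To illustrate how a metric can encode such a policy, the sketch below uses the F-beta score. This is one common way to weight recall (sensitivity) against precision, chosen here for illustration and not prescribed by this guidance: beta > 1 favours sensitivity and beta < 1 favours precision. With the same hypothetical scores and labels, the decision threshold that maximises the metric changes when beta changes.

```python
from typing import Sequence


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    beta > 1 weights recall (sensitivity) more heavily, beta < 1 weights precision.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(scores: Sequence[float], y_true: Sequence[int],
                   thresholds: Sequence[float], beta: float) -> float:
    """Pick the decision threshold that maximises F-beta on labelled data."""
    return max(thresholds,
               key=lambda t: f_beta(y_true, [int(s >= t) for s in scores], beta))


# Hypothetical model scores and true labels
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
thresholds = [0.9, 0.8, 0.6, 0.4]

print(best_threshold(scores, y_true, thresholds, beta=0.5))  # precision prioritised
print(best_threshold(scores, y_true, thresholds, beta=2.0))  # sensitivity prioritised
```

With these made-up data, beta = 0.5 selects the threshold 0.8, while beta = 2 selects 0.4. If the policy priority changes, the threshold (and possibly the model itself) would therefore have to be re-optimised.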

Risks:

  • Performance degrading over time (for example, due to changes in demographics)
  • Increased model bias
  • Obsolete choices embedded in the model
  • Model is ‘repurposed’ over time, and predictions begin to be used out of context

3.4.1 Risk assessment: Model in production

Table 3.4: Aspects and contact persons: Model in production
Aspect | Roles | Tool (helper tool reference)
Role columns: Product owner, User, Helpdesk, Chief information officer, Project leader, Data analyst, Data engineer, Developer, Controller, IT security, Data protection official, Budget holder
Overall responsibility: Model in production | x x x | A6
Data update and monitoring | x x x | A6.001
Model re-training | x x x |
Automation, system architecture, interface to other systems | x x | A6.003
Long-term quality assurance | x | A6.006
Performance control in production | x x | A6.006

3.4.2 Possible audit tests: Model in production

  • Verify that the population for which the model is used in production is (still) sufficiently represented in the training data.
  • Obtain the code of the production version of the model.
  • Compare performance in production with the performance expected from development.
  • Review how changes in model performance and in input data distributions over time are monitored.
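A shift in the input data distribution can signal that the production population is no longer sufficiently represented in the training data. One way to quantify such a shift is the population stability index (PSI). The following Python sketch is one possible implementation under stated assumptions: the bin edges are fixed by hand, empty bins are floored at a small epsilon to avoid log(0), and the example ages are invented. The often-cited rule of thumb that a PSI above roughly 0.25 indicates a major shift is an industry convention, not a threshold from this guidance.

```python
import math
from bisect import bisect_right
from typing import Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float], eps: float = 1e-4) -> list[float]:
    """Share of values per bin; k sorted interior edges give k + 1 bins.

    Empty bins are floored at eps (an assumption) so that log(0) never occurs.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right places a value equal to an edge in the bin above it
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    return [max(c / n, eps) for c in counts]


def psi(train: Sequence[float], prod: Sequence[float], edges: Sequence[float]) -> float:
    """Population stability index: sum over bins of (actual - expected) * ln(actual / expected)."""
    expected = bin_shares(train, edges)  # distribution in the training data
    actual = bin_shares(prod, edges)     # distribution in production
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Invented example: ages in the training data vs. in production
train_ages = [25, 32, 41, 47, 53, 38, 29, 60, 44, 35]
prod_ages = [55, 62, 48, 51, 58, 45, 67, 52, 39, 71]
age_edges = [30, 40, 50]

print(round(psi(train_ages, prod_ages, age_edges), 2))
print(psi(train_ages, train_ages, age_edges))
```

For the invented data above, the older production sample gives a PSI of about 2.4, far above the rule-of-thumb threshold. Comparing the training data with itself gives 0. In an audit, the same comparison could be repeated per input feature and per time period.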