4.6 AI systems in production
Once an AI system is deployed, it operates in real-world conditions that may differ from those encountered during development and testing. As such, AI systems in production are exposed to several new risks.
Live data often differs from the historical or curated datasets used for training and testing, and may include unexpected or messy inputs. As AI becomes part of operational processes, it is used by staff who may not have been involved in its development, increasing the risk of incorrect use or repurposing beyond its original intent. Production environments also place greater demands on technical stability, speed, and availability.
Over time, changes in user behaviour, demographics, or external factors can alter the distribution of input data; this is known as data drift. Model drift, a decline in model performance over time, can also occur. These shifts can reduce system performance or introduce new biases. In addition, evolving norms, policies, or legislation may affect compliance requirements.
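To make drift detection concrete, the sketch below checks a single numeric feature for data drift using the population stability index (PSI), comparing its distribution in live data against a training-time baseline. The bin count, the 0.2 alert threshold, and the synthetic data are illustrative assumptions rather than prescribed values; in practice the check would run over all monitored features and feed into the controls described in Section 4.6.2.

```python
import numpy as np

def population_stability_index(baseline: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Compare a live feature distribution against its training-time baseline.

    Bin edges are derived from the baseline so both samples are bucketed
    consistently; a small epsilon avoids division by zero for empty bins.
    """
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(live, bins=edges)

    eps = 1e-6
    expected_pct = expected / max(expected.sum(), 1) + eps
    actual_pct = actual / max(actual.sum(), 1) + eps
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

if __name__ == "__main__":
    rng = np.random.default_rng(seed=42)
    baseline = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training-time feature values
    live = rng.normal(loc=0.4, scale=1.2, size=2_000)       # shifted production values
    psi = population_stability_index(baseline, live)
    # A PSI above roughly 0.2 is commonly treated as material drift, but each
    # organisation should set and justify its own thresholds.
    print(f"PSI = {psi:.3f}:", "drift suspected" if psi > 0.2 else "stable")
```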
Effective monitoring and management are essential to ensure the system continues to deliver value, remains compliant with regulations, and operates safely and ethically. Organisations should regularly review monitoring results and ensure that business owners remain aware of the system’s performance. If the AI system no longer meets performance or compliance requirements, or the business need has changed, corrective action (such as retraining, technical adjustments, or decommissioning) should be taken. Retraining should follow the same standards for documentation, testing, and acceptance as initial model development (see Section 4.3). Any changes to features, prompts, or model architecture must be properly versioned and documented.
Many AI systems rely on third-party models, APIs, or cloud services. Vendors may update or withdraw models, change APIs, or alter service terms. These changes can affect performance or introduce new risks. If a vendor update or model change materially affects system behaviour, organisations should re-apply acceptance criteria and only return the system to production once it passes all relevant tests.
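One practical way to re-apply acceptance criteria is to keep a frozen acceptance set with agreed expected outcomes and re-run it whenever the vendor announces a model or API change. The minimal sketch below assumes a hypothetical call_model wrapper around the vendor’s API, a JSON Lines acceptance file, and a simple exact-match accuracy criterion; organisations should substitute whatever acceptance criteria and tolerances were agreed at initial acceptance.

```python
import json
from typing import Callable

def passes_acceptance(
    call_model: Callable[[str], str],                    # hypothetical wrapper around the vendor API
    acceptance_set_path: str = "acceptance_set.jsonl",   # frozen at initial acceptance
    min_accuracy: float = 0.95,                          # illustrative threshold
) -> bool:
    """Re-run the frozen acceptance set against the updated vendor model.

    Each line of the acceptance file is expected to hold {"input": ..., "expected": ...}.
    Returns True only if the agreed accuracy threshold is still met.
    """
    with open(acceptance_set_path, encoding="utf-8") as f:
        cases = [json.loads(line) for line in f if line.strip()]
    correct = sum(1 for case in cases if call_model(case["input"]) == case["expected"])
    accuracy = correct / len(cases)
    print(f"Acceptance accuracy after vendor update: {accuracy:.2%} ({correct}/{len(cases)})")
    return accuracy >= min_accuracy
```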
4.6.1 Risks to consider
- Performance drift: the system may encounter live data that differs from the curated training and test data, leading to reduced accuracy or reliability. These differences may be present from the start or emerge gradually as data drift.
- Misuse or unintended use: AI systems can be used in ways not foreseen during development. New users may apply the system in different contexts or misinterpret its outputs, leading to poor decisions or predictions being taken out of context.
- Reduced performance due to misuse: the AI system may perform technically as expected yet still fail to achieve its KPIs if the procedures around it are poorly implemented. For example, it may not deliver the expected benefits if staff do not act on its predictions effectively.
- Lack of ownership: without clear responsibility for ongoing management, including handover and business ownership, the AI system may not be monitored properly or updated when needed.
- Compliance gaps: laws and regulations may change over time, and systems can become non-compliant if not actively reviewed.
- Vendor and model version changes: vendors may update or withdraw models, change APIs, or alter service terms. This can affect performance or introduce new risks.
4.6.2 Expected controls
To manage these risks, organisations should:
- Continuously track accuracy, reliability, robustness, and key performance indicators set during project planning.
- Monitor input data quality and representativeness.
- Set up automated or scheduled checks for data and model drift, and define thresholds for acceptable performance; a sketch of such a check is given after this list. If the AI system is developed by a third party and the organisation does not have details on the data, the organisation should receive regular data quality reports from that third party.
- Establish policies for retraining or updating models when drift is detected. Document retraining events, including data used, changes made, and evaluation results. Ensure any changes to features, prompts, or model architecture are properly versioned and documented; an example change record is sketched after this list.
- Monitor bias and fairness in outputs, along with user feedback and complaints; a simple fairness check is sketched after this list.
- Have procedures for detecting, logging, and investigating incidents. Notify stakeholders as required and take corrective action, including model rollback or decommissioning if needed.
- Regularly review the system’s compliance with current laws and regulations, especially for systems that process personal data, make decisions affecting individuals, or operate in regulated sectors.
- Provide clear guidance and training for users, ensuring they understand the system’s intended use and limitations. Reasonably foreseeable misuses should be identified, documented, and monitored.
- Ensure clear ownership for ongoing management, monitoring, and updates.
- Define triggers for decommissioning, procedures for archiving or deleting data, and responsibilities for a smooth transition when the system is retired.
- Track vendor/model versions in use. Test for regression or compatibility issues after vendor updates. Maintain fallback mechanisms or alternative solutions, and document all changes and their impact.
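The monitoring and drift controls above can be operationalised as a scheduled job that compares current metrics against agreed thresholds and alerts the system owner when they are breached. The sketch below is illustrative: the metric names, threshold values, and print-based alerting are assumptions, and the thresholds would in practice be derived from the KPIs set during project planning (with drift measured, for example, by the PSI check sketched earlier in this section).

```python
from dataclasses import dataclass

@dataclass
class MonitoringResult:
    """Metrics gathered for one monitoring run; values and names are illustrative."""
    accuracy: float          # measured on recently labelled production cases
    max_feature_psi: float   # worst-case drift score across monitored features
    error_rate: float        # share of requests failing technically

# Illustrative thresholds; in practice these come from the KPIs set during project planning.
THRESHOLDS = {"accuracy": 0.90, "max_feature_psi": 0.2, "error_rate": 0.01}

def check_thresholds(result: MonitoringResult) -> list[str]:
    """Return a list of human-readable threshold breaches for this monitoring run."""
    breaches = []
    if result.accuracy < THRESHOLDS["accuracy"]:
        breaches.append(f"accuracy {result.accuracy:.2f} below {THRESHOLDS['accuracy']}")
    if result.max_feature_psi > THRESHOLDS["max_feature_psi"]:
        breaches.append(f"feature drift PSI {result.max_feature_psi:.2f} above {THRESHOLDS['max_feature_psi']}")
    if result.error_rate > THRESHOLDS["error_rate"]:
        breaches.append(f"error rate {result.error_rate:.2%} above {THRESHOLDS['error_rate']:.2%}")
    return breaches

def scheduled_check(result: MonitoringResult) -> None:
    """Intended to run on a schedule (e.g. daily); alerts the system owner on any breach."""
    breaches = check_thresholds(result)
    if breaches:
        print("ALERT to system owner:", "; ".join(breaches))  # placeholder for real alerting
```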
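The retraining and change documentation control can be supported by simple structured records. The sketch below shows one possible record format; the fields are assumptions based on the items named in this section (data used, changes made, evaluation results, sign-off), and such records would normally be kept in a model registry or configuration management system rather than in code.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json

@dataclass
class RetrainingRecord:
    """One retraining or change event, versioned and documented; fields are illustrative."""
    model_version: str                 # e.g. "example-model 2.3.0"
    event_date: date
    trigger: str                       # e.g. "data drift threshold breached"
    training_data_reference: str       # pointer to the dataset snapshot used
    changes: list[str] = field(default_factory=list)             # features, prompts, architecture
    evaluation_results: dict[str, float] = field(default_factory=dict)
    approved_by: str = ""              # business owner sign-off

# Illustrative record; all values are placeholders.
record = RetrainingRecord(
    model_version="example-model 2.3.0",
    event_date=date(2025, 1, 15),
    trigger="scheduled quarterly retraining",
    training_data_reference="dataset-snapshot-2025-01",
    changes=["added feature: days_since_last_contact"],
    evaluation_results={"accuracy": 0.93, "auc": 0.88},
    approved_by="business owner",
)
print(json.dumps(asdict(record), default=str, indent=2))
```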
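Bias and fairness monitoring can likewise start from a small set of regularly computed metrics. The sketch below derives selection rates per group and a disparate-impact-style ratio from recent decisions; the group labels, the 0.8 review threshold, and the data format are illustrative assumptions and do not replace a fuller fairness assessment.

```python
from collections import defaultdict

def selection_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of positive outcomes per group, from (group, positive_outcome) pairs."""
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, positive in decisions:
        totals[group] += 1
        positives[group] += int(positive)
    return {group: positives[group] / totals[group] for group in totals}

def disparate_impact_ratio(rates: dict[str, float]) -> float:
    """Lowest selection rate divided by the highest; 1.0 means equal rates across groups."""
    return min(rates.values()) / max(rates.values())

# Illustrative check on a handful of recent production decisions.
recent = [("group_a", True), ("group_a", False), ("group_b", True), ("group_b", True)]
rates = selection_rates(recent)
ratio = disparate_impact_ratio(rates)
print(rates, f"ratio={ratio:.2f}", "review for bias" if ratio < 0.8 else "within tolerance")
```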