3 ML audit catalogue

While AI applications can raise issues of a different nature than other software or IT systems, the audit of ML models can be structured according to the cross-industry standard process for data mining (CRISP-DM)[10], as it reflects a standard development process for ML models (even if the ML developers do not explicitly follow it). IT auditors familiar with CRISP-DM can thus perform a high-level review without expert knowledge of ML, and assess whether ML experts need to be consulted for further tests, by working through the following seven phases:

  1. Business understanding

  2. Data understanding

  3. Data preparation

  4. Modelling and development

  5. Evaluation of the model before deployment

  6. Deployment and the accompanying change management processes

  7. Operation of the model and performance in production.

While each of these phases is important (especially the business understanding and data understanding of the auditee organisation), phases 1 and 6 may, by and large, be audited in the same way as software development projects, and are therefore combined here into Section 3.1 Project management & governance. Phases 2 and 3 are combined into Section 3.2 Data.[7]

This paper focuses on issues that are particular to AI applications, with special consideration of values such as the transparency of an ML algorithm's decisions, the equality and fairness of these decisions, and autonomy as well as accountability. While these aspects should be considered in every step of the ML development process,[8] the suggested audit catalogue places those considerations in the evaluation step. Section 3.5 is thus treated not as one step in a linear process, but as a loop accompanying the other steps, notably after the deployment of the application, because ML models tend to be re-trained frequently (sometimes regularly or even continuously) as more data becomes available. Consequently, this section is described last (unlike the standard CRISP-DM evaluation step, which is performed before deployment).
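To illustrate evaluation as a loop rather than a one-off step, the following minimal sketch applies the same checks to every re-trained version of a model. The metric (difference in positive-decision rates between groups), the thresholds `min_accuracy` and `max_gap`, the function names and the toy data are all assumptions made for illustration; they are not prescribed by this paper.

```python
def parity_gap(predictions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = [sum(preds) / len(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


def evaluate_version(predictions, labels, groups, min_accuracy=0.8, max_gap=0.1):
    """Run the same checks on one model version (thresholds are hypothetical)."""
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    gap = parity_gap(predictions, groups)
    return {
        "accuracy": accuracy,
        "parity_gap": gap,
        "passed": accuracy >= min_accuracy and gap <= max_gap,
    }


# Toy evaluation set: true labels and a protected attribute per case.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Predictions of two successive model versions on the same cases.
versions = {
    "v1": [1, 0, 1, 0, 1, 0, 1, 0],
    "v2": [1, 0, 1, 0, 0, 0, 1, 0],  # after re-training
}

# The evaluation is repeated for every version, not only before first deployment.
reports = {name: evaluate_version(preds, labels, groups)
           for name, preds in versions.items()}
```

In this toy case the re-trained version `v2` remains fairly accurate but fails the check because its decisions differ between the groups, which is the kind of change a one-off pre-deployment evaluation would miss.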

ML audits may be performed at varying depths, which require different levels of technical expertise from auditors and different levels of access to the underlying technical components:

  1. The audit baseline consists of reviewing the documentation and ensuring that all key components are addressed, relevant risks are identified and mitigation strategies are in place.

  2. Close inspection of the data and a review of the code can give higher confidence in the accuracy of the documentation, in particular with regard to the specifications of the ML model.

  3. Reproduction of (parts of) the model training, testing, scoring and performance measures might be necessary to understand and verify details of the model, its performance and reproducibility, as well as its fairness implications. This might include tests of the model’s behaviour with manipulated data. This requires infrastructure that is suitable for conducting such verification. The costs and potential benefits of this approach have to be taken into account when considering it.

  4. Development of suitable alternatives to the model can be advantageous to highlight deficiencies and how they can be prevented. As with level 3, careful consideration is required before deciding to develop alternative models, as doing so may lead to responsibilities that are not in accordance with the SAI's statutory tasks. Furthermore, the costs and benefits of this approach should be considered.
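The tests mentioned under level 3 can be sketched as follows: a reproducibility check (same data and same random seed should yield the same model) and a manipulated-data test (only a protected attribute is changed, and the share of changed predictions is measured). This is a minimal sketch under stated assumptions: the synthetic data, the toy threshold model, and the names `make_data`, `train`, `predict` and `flip_rate` are hypothetical stand-ins for the auditee's data and training routine.

```python
import random


def make_data(n, seed):
    """Synthetic stand-in for an extract of the auditee's data (hypothetical)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.uniform(10, 100)
        rows.append({
            "income": income,
            "group": rng.choice(["A", "B"]),
            "label": 1 if income + rng.gauss(0, 5) > 50 else 0,
        })
    return rows


def train(rows, seed):
    """Toy training routine: bootstrap the rows, then place the decision
    threshold midway between the mean incomes of the two classes."""
    rng = random.Random(seed)
    sample = [rng.choice(rows) for _ in rows]
    pos = [r["income"] for r in sample if r["label"] == 1]
    neg = [r["income"] for r in sample if r["label"] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, row):
    return 1 if row["income"] > threshold else 0


def flip_rate(predict_fn, rows, attribute="group", values=("A", "B")):
    """Share of rows whose prediction changes when only `attribute` is swapped."""
    changed = 0
    for row in rows:
        other = values[1] if row[attribute] == values[0] else values[0]
        manipulated = dict(row, **{attribute: other})
        if predict_fn(row) != predict_fn(manipulated):
            changed += 1
    return changed / len(rows)


data = make_data(500, seed=0)

# Reproducibility: the same data and seed must yield the same model.
t1 = train(data, seed=42)
t2 = train(data, seed=42)
reproducible = (t1 == t2)

# Manipulated-data test on the toy model, which ignores the group attribute.
toy_flip_rate = flip_rate(lambda r: predict(t1, r), data)


# A deliberately biased predictor, to show what the test is meant to detect.
def biased_predict(row):
    threshold = t1 if row["group"] == "A" else t1 + 10
    return 1 if row["income"] > threshold else 0


biased_flip_rate = flip_rate(biased_predict, data)
```

A flip rate of zero shows only that the model does not use the attribute directly; it does not rule out indirect effects through correlated features, so such a test complements rather than replaces the fairness evaluation.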

Auditors should assume that the data, or extracts of the data, are available in addition to the documentation. The code and the model itself might, however, be in a format that is not accessible to auditors.[9] In that case, close cooperation with the auditee organisation is necessary to perform tests and/or demonstrations on their premises, supervised by auditors with a sound understanding of the topic.

Auditors who would like to perform all levels of review should have a deep technical understanding of the subject matter in order to give recommendations for improvements. Where the auditee organisation lacks expertise, any shortcomings should be recognised and recommendations offered. Depending on a risk assessment prior to the audit, it may suffice for the auditors to review the respective documentation and/or focus only on selected phases of the CRISP-DM cycle.

In the case of proprietary models where neither the source code nor the detailed model specifications are known to (or verifiable by) the auditee organisation, it remains the organisation's responsibility to provide extensive documentation that proves compliance with the proper standards of public administration processes.


[10] A. Clark (2018): The Machine Learning Audit–CRISP-DM Framework, https://www.isaca.org/resources/isaca-journal/issues/2018/volume-1/the-machine-learning-auditcrisp-dm-framework.

[7] Keep in mind that without solid business and data understanding on the part of the auditee organisation, ML systems are set up for failure: they may not reach or advance a business goal, or the data may be misunderstood in the context of that goal.

[8] Compare the concepts of fairness by design, transparency by design and privacy by design.

[9] While many ML models are developed in Python or R, there are many other programming languages and frameworks. It is therefore unreasonable to expect auditors to be familiar with every coding language and software framework. Costs related to the acquisition of additional software and/or computing power might create additional problems, but these could be circumvented by the auditors working on the auditee organisation's premises.