4.0 Audit depth and audit design

AI system audits can be performed at different levels, requiring different degrees of technical expertise from auditors and varying levels of access to the system’s technical components:

  1. At the baseline level, an audit focuses on reviewing documentation to check that all key governance components are addressed, relevant risks are identified and appropriate mitigation strategies are in place. This approach can assess how well the organisation controls risk factors, but it does not determine whether the AI system is actually breaching principles such as equal treatment.
  2. When auditors have access to run the model and test it using either real production data or synthetic data, they can directly evaluate the system’s behaviour and outputs. This enables them to audit the actual impact of the AI system. For example, auditors can test a predictive AI system by altering a single input variable and comparing the results (see the first sketch after this list), or probe a generative AI system for vulnerabilities using specially crafted prompts.
  3. In a comprehensive audit of the AI system, more technical audit tests can give valuable insights into its components:
    1. Close inspection of the data and a review of the code can give higher confidence in the accuracy of the documentation, particularly regarding the specifications of the ML model.
    2. For predictive AI systems, full or partial reproduction of the model training, testing, scoring and performance measurement may be necessary to understand and verify details of the model, its performance and reproducibility, as well as its fairness implications (see the second sketch after this list). This requires infrastructure capable of supporting such verification, and the associated costs should be weighed against the potential benefits.
    3. Developing suitable alternative models can help highlight deficiencies in the audited system and show how they could be prevented (see the third sketch after this list). As with (3b), careful consideration is needed before deciding to develop alternative models, as doing so may create responsibilities that are not in accordance with the SAI’s statutory tasks; the costs and benefits of this approach should also be weighed. For complex AI systems and/or systems purchased from a third party, this is unlikely to be feasible for an auditor.
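
The single-variable test mentioned in item 2 can be sketched as follows. This is a minimal illustration in Python, assuming a scikit-learn-style pipeline saved as `model.pkl` that accepts raw feature values, and a hypothetical protected attribute `gender`; in a real audit the model interface, data source and variable of interest would follow from the system under review.

```python
import joblib
import pandas as pd

# Illustrative file names; a real audit would use the auditee's model
# and an (anonymised) extract of production or synthetic data.
model = joblib.load("model.pkl")
data = pd.read_csv("production_sample.csv")

# Alter a single input variable while keeping all other features fixed.
# 'gender' is a hypothetical protected attribute used for illustration.
altered = data.copy()
altered["gender"] = altered["gender"].map({"male": "female", "female": "male"})

original_scores = model.predict_proba(data)[:, 1]
altered_scores = model.predict_proba(altered)[:, 1]

# Systematic shifts indicate that the altered variable influences the
# model's output, which may warrant deeper fairness testing.
diff = altered_scores - original_scores
print(f"mean shift: {diff.mean():+.4f}, max shift: {abs(diff).max():.4f}")
```

A comparable probe of a generative AI system would replace the score comparison with a set of crafted prompts and a structured review of the responses.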
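
For items 3a and 3b, the sketch below illustrates how documented specifications and reported performance might be verified. It again assumes hypothetical file names, a scikit-learn-style model and placeholder documented parameter values; the actual checks would depend on the auditee’s documentation and infrastructure.

```python
import joblib
import pandas as pd
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

# 3a: compare the delivered model against its documented specification.
# The documented values below are placeholders standing in for the
# auditee's (hypothetical) model documentation.
model = joblib.load("model.pkl")
documented = {"n_estimators": 500, "max_depth": 8, "random_state": 42}
actual = model.get_params()
for name, value in documented.items():
    if actual.get(name) != value:
        print(f"mismatch: {name} documented as {value}, found {actual.get(name)}")

# 3b: reproduce the training run from the documented training data and
# seed, then compare reported and reproduced performance measures.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
retrained = clone(model).fit(train.drop(columns="label"), train["label"])

for label, m in [("delivered", model), ("retrained", retrained)]:
    auc = roc_auc_score(test["label"],
                        m.predict_proba(test.drop(columns="label"))[:, 1])
    print(f"{label} AUC: {auc:.3f}")
```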
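
Finally, for item 3c, a challenger model can be as simple as a transparent baseline trained on the same data. The sketch below reuses the hypothetical files from above and uses a logistic regression as the alternative; if such a simple model performs comparably, the added complexity and opacity of the audited model become harder to justify.

```python
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]

# Train a simple, interpretable challenger on the same training data.
# Assumes numeric features; categorical features would need encoding.
challenger = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
challenger.fit(X_train, y_train)

# Compare the audited model with the challenger on the held-out test set.
audited = joblib.load("model.pkl")
for label, m in [("audited", audited), ("challenger", challenger)]:
    auc = roc_auc_score(y_test, m.predict_proba(X_test)[:, 1])
    print(f"{label} AUC: {auc:.3f}")
```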

Auditors who wish to perform all levels of review need a deep understanding of the AI system’s technology in order to give meaningful recommendations for improvement. Where the auditee organisation lacks such expertise, the resulting shortcomings should be recognised and appropriate recommendations offered. This may require the involvement of AI or IT specialists. The appropriate audit depth is determined by the risks connected to the AI system, the purpose of the audit and the overall audit questions.[^57]

Another defining factor when planning an AI audit is the auditee’s maturity and ambition with regard to AI, and whether the AI system was developed in-house or purchased from an external developer. The level and character of available documentation is likely to vary significantly with the auditee’s organisational structure, its investment in AI and the size of the AI project. For purchased AI systems, where neither the source code nor the detailed model specifications are known to (or verifiable by) the auditee organisation, it remains the auditee’s responsibility to provide extensive documentation demonstrating compliance with the proper standards of public administration processes.

Auditors should be able to assume that the data, or relevant extracts, are available alongside the documentation. However, the code and the model itself may not be in a format that is accessible to auditors[^58], particularly in the case of purchased AI systems. In such instances, close cooperation with the auditee organisation is necessary in order to perform tests and/or demonstrations on its premises, supervised by auditors with a sound understanding of the topic.


[^57]: For example, in *The use of artificial intelligence in the central government*, the overall audit questions concerned responsible AI usage in the central government, and the technical details of the AI systems in use at the time were out of scope. The four AI systems that served as case studies were therefore audited at the baseline level, as a review of their AI governance.

[^58]: While many ML models are developed in Python or R, other languages and frameworks are also in use. It is therefore unreasonable to expect auditors to be familiar with every programming language and software framework. Costs related to the acquisition of additional software and/or computing power might create additional obstacles.