1 Executive summary

In 2017, the Supreme Audit Institutions (SAIs) of Brazil, Finland, Germany, the Netherlands, Norway and the UK entered into a ‘Memorandum of Understanding Data Analytics’ (MoU). Recognising that digitisation and datafication change the way governments work and hence require SAIs to develop new methodologies and practices to ensure effective and efficient audits, they agreed to cooperate on the topic of data analytics by sharing knowledge, working experiences and code. At their annual conference in 2019, hosted by the Finnish SAI, the members agreed to jointly develop this paper for audits of artificial intelligence (AI) applications.

AI systems based on machine learning (ML) models are increasingly used in the public sector to improve services and reduce costs. ML is a field of computer science dealing with methods to develop (‘learn’) rules from input data to achieve a given goal. An ML model is the resulting set of rules, which may be used to make predictions on data previously unknown to the model. However, new technologies tend to be accompanied by new risks. While dedicated legislation is still underway, both international and European guidelines have been proposed (references are indicated within square brackets in the text) that emphasise the need for control mechanisms and audits. The AI community is addressing issues linked to ethical principles and the negative social impact of AI applications. A recent publication on trustworthy AI development, co-authored by a broad collaboration of researchers from both academia and industry, recommends conducting and funding third-party audits of AI systems.1
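The distinction between the learning procedure and the resulting model can be illustrated with a deliberately minimal sketch (hypothetical data; the “model” here is nothing more than a single learned decision threshold on a one-dimensional feature):

```python
# A toy illustration of ML: rules are learned ("fitted") from labelled
# training data, and the resulting model is then applied to make
# predictions on data it has not seen before. All data is hypothetical.

def fit_threshold(xs, ys):
    """Learn the threshold on a 1-D feature that best separates labels 0/1."""
    best_t, best_acc = xs[0], 0.0
    for t in sorted(xs):
        preds = [1 if x >= t else 0 for x in xs]
        acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict(t, xs):
    """Apply the learned rule to previously unseen inputs."""
    return [1 if x >= t else 0 for x in xs]

# Training data: feature values with known labels.
train_x = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
train_y = [0, 0, 0, 1, 1, 1]
model = fit_threshold(train_x, train_y)  # the "model" is just the threshold
print(predict(model, [2.5, 8.5]))        # → [0, 1] on unseen inputs
```

Real ML models (neural networks, gradient-boosted trees and so on) learn far richer rule sets, but the lifecycle is the same: fit on known data, predict on unknown data.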

While data protection authorities are working on dedicated guidelines and can take on a supervisory role for personal data protection, many of the risks linked to AI applications are not related to personal data. For example, opaque ML models that might automate and institutionalise unequal treatment could damage trust in our auditee organisations and, by extension, in our democratic institutions. Therefore, it will become increasingly necessary for SAIs to be able to audit applications based on ML algorithms in both compliance and performance audits. Several MoU member SAIs are currently performing case studies or pilot audits to develop a generic methodology for audits of AI applications.

This paper summarises what we believe are key risks connected to the use of ML in public services, and suggests an audit catalogue2 that includes methodological approaches for audits of AI applications. These suggestions are based on contributions from the MoU member SAIs that stem from their respective experiences with ML audits and audits of other software development projects. The SAIs of Germany, the Netherlands, Norway and the UK are the main authors of this paper.

Depending on the application, audits of ML algorithms are usually performed as special cases of performance or compliance audits. ML models tend to be embedded in wider IT infrastructure. Therefore, elements from IT audit are often included. The proposed audit areas include the data understanding and model development process, the performance of the model and ethical considerations such as explainability and fairness. This paper is based on the broadly used ‘cross-industry standard process for data mining’ (CRISP-DM - see Chapter 3) that includes all phases of an AI application’s lifecycle - from business understanding to deployment and continued operation.

The main chapters of this paper focus on auditing the ML component. Appendix One discusses how to assess other steps of an AI application’s lifecycle and provides tips on how SAIs could create well-balanced audit teams to enhance the efficiency of their audit work on AI applications. We also include a helper tool that auditors can use to prepare their audits. It provides a host of suitable audit questions that auditors may draw upon. Auditors can select steps along the CRISP-DM cycle based on their risk assessment and get suggestions for suitable audit evidence and contacts within the auditee organisation.

We identified the following main general problem areas and risks:

  • Developers of ML algorithms will often focus on optimising specific numeric performance metrics. As a result, there is a high risk that requirements of compliance, transparency and fairness are neglected.

  • Product owners within the auditee organisation might not communicate their requirements well to ML developers, leading to ML algorithms that could, in a worst-case scenario, increase costs and make routine tasks more time-consuming.

  • Auditee organisations often lack the resources and competence to develop ML applications internally and thus rely on consultants or procure ready-made solutions from commercial businesses. This increases the risk of using ML without the understanding necessary both for production and maintenance of ML-based systems and for meeting compliance requirements.

  • There is significant uncertainty among public-sector entities in the MoU member states about the use of personal data in ML models. While data protection agencies have begun to issue guidelines, organisational regulatory structures are not necessarily in place and accountability often remains unclear.
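The first risk above, optimising a single headline metric while neglecting fairness, can be made concrete with a small sketch (entirely hypothetical predictions and labels): a classifier whose aggregate accuracy looks acceptable while accuracy differs sharply between two groups, a disparity the single number hides.

```python
# Hypothetical (prediction, true label, group) records, used to show how
# an aggregate performance metric can mask unequal treatment of groups.

def accuracy(records):
    """Fraction of records where the prediction matches the true label."""
    return sum(pred == label for pred, label, _ in records) / len(records)

records = [
    (1, 1, "A"), (0, 0, "A"), (1, 1, "A"), (0, 0, "A"),  # group A: 4/4 correct
    (1, 0, "B"), (0, 1, "B"), (1, 1, "B"), (0, 0, "B"),  # group B: 2/4 correct
]

overall = accuracy(records)
per_group = {g: accuracy([r for r in records if r[2] == g]) for g in ("A", "B")}
print(overall)    # 0.75 — looks acceptable in isolation
print(per_group)  # {'A': 1.0, 'B': 0.5} — a large gap the overall figure hides
```

An auditor who disaggregates the metric by group, rather than accepting the documented aggregate, surfaces exactly the kind of unequal treatment the bullet describes.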

Auditors need specific training in the following areas of expertise in order to perform meaningful assessments of AI applications and to give appropriate recommendations:

  • Auditors need a good understanding of the high-level principles of ML algorithms and up-to-date knowledge of the rapid technical developments in this field - this is sufficient to perform a baseline audit by reviewing the respective documentation of an ML system.

  • For a thorough audit that includes substantial tests, auditors need to understand common coding languages and model implementations, and be able to use appropriate software tools.

  • ML-related IT infrastructure often includes cloud-based solutions due to the high demand on computing power. Therefore, auditors need a basic understanding of cloud services for this kind of audit work.
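As a sketch of what such a substantive test might look like in practice (all figures and data are hypothetical), an auditor with basic coding skills can recompute a performance claim from the auditee's exported evaluation data rather than relying on the documentation alone:

```python
# Hypothetical substantive test: recompute a documented accuracy figure
# from (prediction, true label) pairs exported from the auditee's
# evaluation set, and compare it against the claim within a tolerance.

def reproduce_metric(evaluation, claimed, tolerance=0.05):
    """Return (recomputed accuracy, whether it is within tolerance of the claim)."""
    recomputed = sum(p == y for p, y in evaluation) / len(evaluation)
    return recomputed, abs(recomputed - claimed) <= tolerance

evaluation = [(1, 1), (0, 0), (1, 1), (1, 0), (0, 0),
              (1, 1), (0, 1), (1, 1), (0, 0), (1, 1)]

recomputed, ok = reproduce_metric(evaluation, claimed=0.90)
print(recomputed, ok)  # 0.8 False — the documented claim is not reproduced
```

The point is not the arithmetic but the audit posture: documentation review accepts the claimed figure, whereas a substantive test independently reproduces it.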

This paper reaches the following conclusions and recommendations for SAIs:

  • SAIs should be able to audit ML-based AI applications in order to fulfil their statutory mission and to assess whether use of ML contributes to efficient and effective public services, in compliance with relevant rules and regulations.
  • ML audits require special auditor knowledge and skills, and SAIs should build up the competence of their auditors.
  • The ML audit catalogue and helper tool proposed in this paper have been tested in our case studies and may be used as templates. They are living documents and thus should be refined through application to more, and more diverse, cases and updated continually with new AI research results.
  • SAIs should build up their capacities to perform more ML audit work.
  • The authors hope that the guidance and good practices provided within this paper, alongside the audit helper tool, will enable the international audit community to begin auditing ML.


[1] M. Brundage et al. (2020): Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims, https://arxiv.org/abs/2004.07213.

  1. See [1]↩︎

  2. By audit catalogue we mean a set of guidelines including both the suggested content of audit topics based on risks, as well as methodology to perform respective audit tests.↩︎