3.1 Project management & governance

As discussed earlier, auditors of an ML project would start with a baseline review of the project documentation. Regardless of the depth of the audit, reviewing the documents and understanding the context surrounding the project is at least as important as more detailed investigative work on the model itself.

Many of the issues are common to reviews of other (software development) projects and of any project in which a statistical model supports decision making more broadly. The following considerations are loosely based on the UK National Audit Office’s framework for reviewing models [11], extended to meet the challenges of AI applications.

3.1.1 Misalignment with or divergence from project objectives

Limited technical understanding among management and limited practical experience among technical staff can lead to miscommunication and unrealistic expectations. Auditors should look for evidence that the model is tailored to the project’s objective(s):

Has the model been developed in collaboration with the project owner? For example:
- Are technical and functional requirements specified?
- Are these requirements captured and documented in a specification and translated into requirements for the model?
- Are there performance indicators that allow measuring and monitoring whether these requirements are met?
- Has the relative importance of different types of error (false positives/false negatives) been considered?
- Are assumptions listed and agreed?
- Are assumptions in line with the project’s objectives?
- Is there an agreed quality assurance plan covering the model development process and the deployed model?
- Is there evidence that the project owner has influenced the development of the model to meet expectations?
Is the required level of transparency and fairness well defined at the planning stage? Explainable ML is an active research field, so while new techniques may emerge to ‘explain’ models previously regarded as black boxes, the question auditors should address is the extent to which the delivered model meets the requirements of the project.
If the project is anticipated to be complex and expensive, its feasibility should be explored in a proof of concept. Such a proof of concept can reduce the risk of an insufficient data basis and of weak final model performance. It should assess model performance as well as whether the technical and functional requirements can be met in deployment.
Is the method of project management in line with the project’s objectives, and does it allow an iterative model development process?
Is there a forum within the auditee organisation for people with relevant technical expertise, outside of the development process, to challenge the development and use of model outputs? This means both an independent internal control unit and a forum with clear accountability for complaints from external users or data subjects.
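To illustrate how agreed performance indicators and the relative weight of false positives and false negatives can be made checkable, here is a minimal Python sketch. It assumes a binary classifier with labels coded 0/1; the unit costs and thresholds (`cost_fp`, `cost_fn`, `min_recall`, `max_fpr`) are hypothetical placeholders for values the project owner and developers would have to agree on, and the function names are illustrative rather than part of any standard audit tool.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def confusion_counts(y_true, y_pred):
    """Count outcomes of a binary classifier (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def expected_cost(c, cost_fp, cost_fn):
    """Average misclassification cost per case, using agreed (hypothetical) unit costs."""
    n = c.tp + c.fp + c.fn + c.tn
    return (cost_fp * c.fp + cost_fn * c.fn) / n


def check_requirements(c, min_recall, max_fpr):
    """Compare two KPIs with documented thresholds; returns {name: (value, met?)}."""
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    fpr = c.fp / (c.fp + c.tn) if c.fp + c.tn else 0.0
    return {
        "recall": (recall, recall >= min_recall),
        "false_positive_rate": (fpr, fpr <= max_fpr),
    }


# Illustrative data only: 8 cases, one missed positive and one false alarm.
counts = confusion_counts([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
cost = expected_cost(counts, cost_fp=1.0, cost_fn=5.0)
report = check_requirements(counts, min_recall=0.9, max_fpr=0.25)
```

In this toy case the model meets the false-positive limit but misses the recall requirement. Weighting a false negative five times as heavily as a false positive is an assumption for illustration; in a real project that ratio should come from the documented requirements.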

Risks:

Unrealistic project objectives.
Misconceptions of what AI/ML and data analysis can deliver.
Project fails to deliver on stated objectives.
Project objectives are not translated into model objectives, and there is no connection between model performance and functional objectives.
Data basis is insufficient for model performance objectives.
Feasibility has not been verified through a proof of concept, leading to an increased project lifetime or to project failure.
Misunderstanding between project owners and developers results in wasted or poorly focused effort.
Project meets documented requirements, but stakeholders are dissatisfied and, in practice, their desired outcomes are not realised.

3.1.2 Lack of business readiness/inability to support the project

Knowledge about a particular model is often concentrated in a few staff members with high ML competence, who are frequently external consultants. Miscommunication between these ML developers and either the users of the model (such as case workers) or the IT staff responsible for maintenance in production can lead to inefficient implementation of the ML algorithm and to failure of the project. To assess whether these risks are mitigated, auditors should consider:

Are roles and responsibilities for all project phases documented and communicated?
Are decision makers and their responsibilities clearly defined?
Training of end users: Has the potentially probabilistic nature of model predictions been properly explained to end users (such as case workers)? Has a policy or guidance on the interaction between the AI system and human workers been communicated, for example on the authority and accountability regime for arbitrating when human end users disagree with the decisions or recommendations of the AI system?
What processes are in place for succession planning/handover when a key person leaves the project? Similarly, what processes are in place for the handover from the development project to operation and maintenance in production?

Risks:

The transition from the development project to the business-as-usual process is dysfunctional.
Inability to support project on an ongoing basis.
Project delays due to lengthy decision making process(es).
End users are unable to understand/challenge model outcomes leading to non-transparent or unfair decisions.

3.1.3 Legal and ethical issues

Compared with standard IT systems, additional laws and regulations may apply to ML algorithms. The possible issues depend strongly on the type of model and the application context. Assistance systems, which serve to simplify functional processes and do not directly affect citizens, may warrant a less in-depth audit of ethical issues than automated decision-making systems that directly or indirectly affect citizens.

Which laws and regulations have to be considered? This includes considerations regarding:
- Normal operation. For example, what type and level of transparency/explainability is needed? Is a global understanding of tendencies enough (global explainability)? Are citizens directly or indirectly affected, so that individual decisions need to be justifiable to the extent that advice can be given on how citizens can change the outcome (such as obtaining approval for support)? Are data subjects informed that their data is processed by an AI system and/or an automated process (where that is the case)?
- Possible side effects of a perfectly well-functioning system: For example, reinforcement of existing structures, or under- or overexposure. Details depend on the type of AI application; for example, a recommender system used to suggest relevant job advertisements to unemployed citizens might concentrate on certain career paths, missing the potential for non-standard retraining.
- Possible side effects due to model imperfections: For example, a biased model that discriminates on protected characteristics.
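One simple, widely used check for the kind of bias described in the last item is to compare the rate of positive predictions across groups defined by a protected characteristic (demographic parity). The sketch below assumes binary predictions coded 0/1 and group labels taken from a hypothetical protected attribute. It is only one of several fairness metrics, and a gap in rates is a prompt for further investigation rather than proof of discrimination.

```python
from collections import defaultdict


def positive_rate_by_group(y_pred, groups):
    """Share of positive (1) predictions within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rates between any two groups."""
    rates = positive_rate_by_group(y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Illustrative data only: group "A" receives positive predictions more often.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = positive_rate_by_group(preds, groups)
gap = demographic_parity_difference(preds, groups)
```

Here group A receives a positive prediction in 75% of cases and group B in 25%, a gap of 0.5. Whether such a gap is acceptable depends on the legal context and on differences in the underlying base rates, which this metric does not account for.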

3.1.4 Inappropriate use of ML

Another risk, not unique to ML projects but notable given the current ‘hype’ around such technology, is that some auditee organisations might apply ML techniques not because they add value but because they want to be seen using cutting-edge technology. While we are optimistic about the potential of ML to add value in a broad spectrum of applications, we must also be clear in our assessment of such applications when there is a risk of negative outcomes for the general public. Instances of this risk should be identifiable from an understanding of the project objectives.

In evaluating this, auditors should check that:

the component of the solution that ML is applied to is clearly identifiable, justifiable and separable from the surrounding business logic. This counters the tendency of optimistic project planning to treat ML as a ‘black box’ that can solve any and all business problems;
in planning the project, the problem statement is well defined and gives experts scope to experiment. More specifically, it avoids statements like ‘We will use deep learning to do X’ and instead focuses on the outcome of the project and how its success will be measured;
there is clear evidence that management has established that the chosen ML model is a necessary improvement over current methodologies;
all operational, technical and social risks of deploying ML are captured, communicated and, where possible, avoided.

Risks:

Project objectives are not realised.
Overly expensive or complicated solutions to otherwise simple problems.
Alternative solutions (e.g. rule-based or analytical solutions) were not evaluated, or were evaluated inadequately.

3.1.5 Transparency, explainability and fairness

Public administration has to be transparent in the sense that its decision-making process should be justifiable and, to some extent, understandable by the general public. Further, citizens usually have the right to an explanation of decisions that affect them. The concepts of ‘transparency by design’ and ‘fairness by design’ (rightly) incorporate such considerations into every step of the development of the AI system; the audit of these aspects is explored in Section 3.5 Evaluation.

3.1.6 Privacy

If personal data or proxy variables for personal attributes are used, the EU’s General Data Protection Regulation (GDPR) and/or national privacy laws apply, and auditors should consider whether the ML application is the least intrusive option to satisfy the objective¹⁰ and whether all features related to personal data contribute enough to performance to justify their use.
Additionally, depending on the ML application, it may be necessary to consider the disclosure risk: when personal data has been used to train the model, this personal information is encoded in the model, and it may be possible to reconstruct parts of the dataset.¹¹
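Whether a personal-data feature contributes enough to performance to justify its use can be examined empirically. The sketch below uses permutation importance, a model-agnostic technique that measures how much accuracy drops when one feature column is randomly shuffled; the source does not prescribe this method. `predict` stands for any fitted model’s prediction function, and the toy model and data are hypothetical. Because proxy variables are often correlated with other features, permutation importance can understate a feature’s contribution, so retraining the model without the feature (ablation) is a useful complementary check.

```python
import random


def accuracy(predict, X, y):
    """Share of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when column `feature` is shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Rebuild each row with only the chosen feature replaced.
        X_perm = [row[:feature] + [value] + row[feature + 1:]
                  for row, value in zip(X, column)]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


# Hypothetical toy model: uses feature 0 and ignores feature 1 (e.g. age).
def toy_predict(row):
    return int(row[0] > 0.5)


X = [[0.9, 34], [0.1, 51], [0.8, 27], [0.2, 45]]
y = [1, 0, 1, 0]
importance_f0 = permutation_importance(toy_predict, X, y, feature=0)
importance_f1 = permutation_importance(toy_predict, X, y, feature=1)
```

A personal-data feature whose importance is close to zero, like feature 1 in this toy example, adds privacy risk without a matching performance benefit.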

Risks:

Violation of data protection regulations

3.1.7 Autonomy and accountability

Decisions with legal or similarly significant effects made by ML using personal data are not allowed to be fully automated: citizens have the right to human involvement (with some exceptions; see Article 22 GDPR). Hence, the auditor must evaluate the method of human involvement and check that:
- the system provides the means to exercise this right;
- the rights and responsibilities of users for validating and correcting ML decisions are clearly defined;
- sufficient information is communicated to the affected person(s).

In ML-assisted decisions, where a human is responsible for the decision but uses ML as one (possibly the main) source of information, the discretion of that person should be evaluated by examining their ability to decide against the algorithm’s advice. Additionally, auditors should examine the possible consequences if that decision turns out to be wrong.
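Automation bias is hard to observe directly, but if decision logs record both the ML recommendation and the final human decision, one simple indicator is how often case workers decide against the algorithm’s advice. The sketch below assumes such a log with hypothetical field names (`case_worker`, `ml_recommendation`, `final_decision`). An override rate near zero does not by itself prove automation bias, since the model may simply be accurate, but it can indicate where to look more closely, for example through interviews or a review of sampled cases.

```python
from collections import defaultdict


def override_rates(records):
    """Overall and per-case-worker share of decisions that differ from the ML recommendation."""
    if not records:
        raise ValueError("no decision records")
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for r in records:
        worker = r["case_worker"]
        totals[worker] += 1
        if r["final_decision"] != r["ml_recommendation"]:
            overrides[worker] += 1
    overall = sum(overrides.values()) / len(records)
    per_worker = {w: overrides[w] / totals[w] for w in totals}
    return overall, per_worker


# Illustrative log entries only.
log = [
    {"case_worker": "w1", "ml_recommendation": "approve", "final_decision": "approve"},
    {"case_worker": "w1", "ml_recommendation": "reject", "final_decision": "approve"},
    {"case_worker": "w2", "ml_recommendation": "approve", "final_decision": "approve"},
    {"case_worker": "w2", "ml_recommendation": "reject", "final_decision": "reject"},
]
overall, per_worker = override_rates(log)
```

Breaking the rate down per case worker can show whether some users never deviate from the recommendation, which may point to a ‘human in the loop’ that exists only formally.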

In addition, processes and responsibilities in cases of planned or unplanned system failures should be investigated. Are these in line with service level agreements?

In particular, it must be clarified which natural or legal person bears responsibility for AI-autonomous or AI-assisted decisions. Two separate questions need to be answered in this context [12]: (1) Who is responsible for harm caused by the ML algorithm performing as expected? (2) Who is responsible in the case of failure?

This can become particularly challenging if a third party has developed the ML system.

Risks:

Automated processing of personal data without the knowledge or consent of the affected persons
Automation bias and lack of questioning decisions by ML algorithms
The condition of a ‘human in the loop’ is not met, or is met only formally
Unclear roles and responsibilities
Unclear alternative processes in case of system failures

3.1.8 Risk assessment: Project management and governance

Table 3.1: Aspects and contact persons: Project management and governance
| Aspect | Product owner | User | Helpdesk | Chief information officer | Project leader | Data analyst | Data engineer | Developer | Controller | IT security | Data protection official | Budget holder | Helper tool reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall responsibility: Governance | x |  |  | x | x |  |  |  |  |  |  |  | A1, A7 |
| Communication with data subjects (where applicable) |  |  |  | x |  |  |  |  |  |  | x |  | A7.007, A7.009 |
| Policy for human-AI interactions | x |  |  |  | x |  |  |  |  |  |  |  | A7, A6.004, A6.009 |
| Quality assurance plan | x |  |  |  | x |  |  | x |  |  |  |  | A7.003 |
| Strategy for development/maintenance |  |  |  | x |  |  | x | x |  | x |  |  | A6, A7 |

Bibliography

[11] UK National Audit Office (2016): Framework to review models, https://www.nao.org.uk/report/framework-to-review-models/.

[12] M. Wieringa (2020): What to account for when accounting for algorithms, https://doi.org/10.1145/3351095.3372833.

¹⁰ See Appendix One, ‘Personal data and GDPR in the context of ML and AI’, for a summary of relevant GDPR rules.
¹¹ For example, in the context of diagnosis codes or criminal convictions, reconstructing which persons were part of the training dataset can already reveal personal information.