Personal data and GDPR in the context of ML and AI
Datatilsynet, the Norwegian data protection authority, summarised the most important challenges relating to the use of personal data in ML algorithms in a dedicated report. Relevant considerations for the auditor are:
Purpose limitation: Personal data may only be collected for a specific, expressly stated purpose. Any further processing has to be compatible with the original purpose, with some limited exceptions for scientific research. Use of ML in decision-making on new data has to be included in the purpose statement. When an ML algorithm is trained on historical data (possibly collected before the ML project started), this training is further processing and must be compatible with the purpose for which the data was originally collected. In some cases (for example, medical applications), the development of the ML algorithm might be considered to be scientific research.
Data minimisation: The use of personal data has to be limited to what is necessary to fulfil the purpose it was collected for. This is challenging during the development of ML, where candidate variables are often included in training specifically to test their impact on performance. Even if a piece of personal data is not used in the final algorithm, this testing already counts as processing of personal data and is therefore subject to the EU’s General Data Protection Regulation (GDPR). For auditors it is therefore important to review not only the variables used in the final algorithm and their importance for performance, but also the development process.
Proportionality: The data minimisation principle also restricts the degree of interference with a person’s privacy. The amount and nature of the data used have to be proportionate to the purpose and the least invasive option for the data subject (for example, facial recognition to measure school attendance is considered out of proportion even when consent has been obtained).
Transparency: Explainable processes and decisions are a general requirement for public services, not limited to the use of personal data, but of even higher importance if personal data is involved. This aspect is treated in detail in the proposed ML audit methodology.
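For the data minimisation review described above, one way an auditor might probe whether a personal-data variable actually contributes to performance is permutation importance: shuffle one variable at a time and measure how much accuracy drops. The following is a minimal sketch using only the Python standard library. The technique is an illustrative choice and is not prescribed by the report. The model (toy_model), the variable names (income, age) and the data are hypothetical stand-ins for the audited algorithm and its test set.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def accuracy(predict: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Share of rows for which the model output equals the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(
    predict: Callable[[Row], int],
    rows: Sequence[Row],
    labels: Sequence[int],
    features: List[str],
    n_repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean drop in accuracy when the values of one variable are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    result: Dict[str, float] = {}
    for name in features:
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[name] for row in rows]
            rng.shuffle(shuffled)
            # Copy each row and overwrite only the variable under test.
            permuted = [{**row, name: value} for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        result[name] = sum(drops) / n_repeats
    return result


def toy_model(row: Row) -> int:
    """Hypothetical stand-in for the audited model: it ignores 'age' entirely."""
    return 1 if row["income"] > 30000 else 0


# Hypothetical test data; labels are generated by the toy model itself.
rows = [{"income": 20000 + 5000 * i, "age": 25 + 3 * i} for i in range(8)]
labels = [toy_model(row) for row in rows]

importance = permutation_importance(toy_model, rows, labels, ["income", "age"])
for name, drop in importance.items():
    print(f"{name}: mean accuracy drop {drop:.3f}")
```

A variable whose shuffling causes no measurable drop, such as age in this toy case, is a candidate for the question of why it is processed at all. A low importance score is an indication for follow-up and not a conclusion on its own, since correlated variables can mask each other.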
If the use of ML poses a high risk to a person’s rights and freedoms, then a data protection impact assessment (DPIA) is mandatory. A DPIA is also required in the following cases [20,22]:
Profiling (and similar) with significant effect, where profiling is defined as “any automated processing of personal data with the objective to evaluate personal aspects about a natural person, in particular predictions of a natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behaviour, location or movements”.
Large-scale use of sensitive data.
Systematic monitoring of a publicly accessible area on a large scale.
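The DPIA conditions listed above can be recorded as a simple screening checklist in audit working papers. The sketch below is only an illustration of that idea: the class ProcessingProfile, its field names and the function dpia_triggers are hypothetical, and whether each condition applies remains a legal judgment that the code does not make.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessingProfile:
    """Auditor's answers for one ML application (hypothetical field names)."""
    high_risk_to_rights_and_freedoms: bool = False
    profiling_with_significant_effect: bool = False
    large_scale_sensitive_data: bool = False
    large_scale_public_monitoring: bool = False


def dpia_triggers(profile: ProcessingProfile) -> List[str]:
    """Return the listed conditions that apply; any entry means a DPIA is expected."""
    checks = [
        (profile.high_risk_to_rights_and_freedoms,
         "High risk to a person's rights and freedoms"),
        (profile.profiling_with_significant_effect,
         "Profiling (and similar) with significant effect"),
        (profile.large_scale_sensitive_data,
         "Large-scale use of sensitive data"),
        (profile.large_scale_public_monitoring,
         "Systematic monitoring of a publicly accessible area on a large scale"),
    ]
    return [reason for applies, reason in checks if applies]


# Hypothetical example: a scoring model that profiles applicants.
example = ProcessingProfile(profiling_with_significant_effect=True)
triggers = dpia_triggers(example)
print("DPIA expected:", bool(triggers))
for reason in triggers:
    print(" -", reason)
```

If the returned list is non-empty, the auditor would expect to find a DPIA in the development documentation and can ask for it.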
The responsibility to guarantee compliance with GDPR, including all of the points above, lies with the authority developing and using the ML algorithm. For auditors, a review of the development documentation should therefore suffice.