Appendix 4: Glossary
This glossary explains the terminology used throughout this paper, with a special focus on technical terms concerning algorithms and AI.
| Term | Definition |
| --- | --- |
| Artificial Intelligence (AI) | Artificial intelligence systems are human-made technologies designed to perform tasks that typically require intelligence. These systems learn from data and adapt to achieve specific goals, often improving performance without explicit reprogramming. AI encompasses fields such as machine learning, natural language processing, computer vision, speech recognition, planning, and robotics. It enables capabilities like perception, reasoning, decision-making, and problem-solving. Most AI systems rely on machine learning, where models learn patterns from data to achieve specific goals. |
| Cross-Industry Standard Process for Data Mining (CRISP-DM) | CRISP-DM is a standard process for designing and developing ML and AI systems. It consists of six phases: * Business Understanding * Data Understanding * Data Preparation * Modeling * Evaluation * Deployment |
| Data poisoning | Data poisoning is the deliberate insertion of manipulated examples into the training data to degrade the performance of an AI system or to manipulate its behaviour in a specific way. |
| Deep Learning | In deep learning, AI systems use artificial neural networks that combine many processing layers to solve complex tasks. |
| Explainability (by design) | Explainability is the ability to describe how the system arrives at its outputs or decisions. Explainability by design is a methodology that requires ensuring explainability from the outset during the design and development of an AI system. |
| Fairness (by design) | Fairness means treating individuals and groups equitably, avoiding negative impacts based on characteristics such as gender, ethnicity, or location. Unfair outcomes can result in discrimination, financial loss, or other social disadvantages. Fairness by design is a concept that implements fairness from the outset during the design and development of an AI system, rather than fixing it later. |
| Feature engineering | Feature engineering is the process of transforming raw data into meaningful variables for the model. |
| Fine-tuning | With fine-tuning, an LLM is further trained on specific examples, making its answers more accurate for certain topics. |
| Foundation models or general-purpose AI models | Foundation models or general-purpose AI models are trained on very large and diverse datasets. They use deep learning techniques and can be adapted for a wide range of applications, from image recognition to scientific research. |
| Generative AI | Generative AI systems are designed to create new content (such as text, images, video, or audio) by learning from large datasets. They can produce realistic outputs, often with little or no human guidance. |
| Hallucination | A hallucination occurs when an AI system generates output that contains incorrect information or is not grounded in real facts. |
| Hyperparameter | Hyperparameters control the model architecture and the learning process in machine learning. They are set by the developer, unlike the model parameters, whose values are learned during training. |
| Large Language Models (LLMs) | Large Language Models are a key type of foundation model, trained mainly on text. They support tasks such as translation, summarisation, and conversation by generating human-like language. |
| Machine Learning (ML), supervised/unsupervised/reinforcement | Machine learning is a method for AI systems to learn patterns from data and improve performance without being explicitly programmed. * Supervised learning: The AI system learns from labelled examples to make predictions. * Unsupervised learning: The AI system finds patterns or groups in unlabelled data. * Reinforcement learning: The AI system learns by trial and error, receiving “rewards” or “penalties” for actions. |
| MLOps | MLOps is a set of practices that help manage, test, and update machine learning models. |
| Overfitting | An AI system that overfits adapts too closely to its training data and, as a result, does not perform well on new data. |
| Predictive AI | Predictive AI systems use historical data to identify patterns and make predictions about new cases. For example, they can estimate the likelihood of illness based on health records or flag potential errors in tax returns. These systems are commonly used for classification (such as sorting cases into categories) or regression (such as predicting a numerical value). |
| Privacy (by design) | Privacy incorporates protecting all personal data that is collected, used, shared or stored by the AI system. Privacy by design is a concept that implements privacy from the outset during the design and development of an AI system, rather than fixing it later. |
| Prompt | Prompts are the user input given to generative AI systems. The term is mainly used in the context of LLMs. |
| Reliability | Reliability refers to the model’s ability to perform consistently and accurately under expected conditions, particularly when handling data similar to that used in training. |
| Reproducibility | Reproducibility is the ability to recreate results using the same data, parameters, and environment. |
| Retrieval Augmented Generation (RAG) | Retrieval-augmented generation (RAG) is an approach that enables LLMs to draw on trusted sources of information when generating responses. By integrating relevant documents or data into the model’s context, RAG can help address persistent challenges in generative AI, such as hallucinations, outdated information, and the absence of source citations. |
| Robustness | Robustness is the model’s ability to maintain performance when faced with unexpected inputs, such as ‘out of distribution’ data (data from a different distribution than the training data), noise, or changes in the environment. |
| Training, validation and test data | Training data is used first to fit a well-performing ML model. The validation data is then used to evaluate the model fit while tuning the hyperparameters. The test data is used last to provide a final evaluation of the model’s performance. |
| Transparency (by design) | Transparency means providing clear information about how, when, and why an AI system is used. Transparency by design is a methodology that requires ensuring transparency during every step of the design and development of an AI system. |
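To make the feature engineering entry above concrete, a minimal sketch in plain Python; the record fields and the chosen features are hypothetical:

```python
# Toy sketch of feature engineering: raw transaction records (hypothetical
# field names) are turned into numeric variables a model can use.
from datetime import date

raw_records = [
    {"amount": "1200.50", "booked": date(2024, 3, 15), "country": "DE"},
    {"amount": "80.00", "booked": date(2024, 3, 16), "country": "FR"},
]

def engineer_features(record):
    """Transform one raw record into model-ready numeric features."""
    return {
        "amount_eur": float(record["amount"]),                 # string -> number
        "weekday": record["booked"].weekday(),                 # date -> 0..6
        "is_domestic": 1 if record["country"] == "DE" else 0,  # binary flag
    }

features = [engineer_features(r) for r in raw_records]
print(features[0])  # {'amount_eur': 1200.5, 'weekday': 4, 'is_domestic': 1}
```

Real pipelines apply many such transformations, but each follows this pattern: raw fields in, meaningful model variables out.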
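The supervised learning entry can be illustrated with a toy nearest-centroid classifier; the data and labels are invented for illustration, and real systems use far richer models:

```python
# Toy sketch of supervised learning: labelled examples are used to fit
# per-label centroids; new cases get the label of the nearest centroid.

def fit_centroids(examples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for x, label in examples:
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def predict(centroids, x):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

examples = [([1.0, 1.0], "low"), ([1.2, 0.8], "low"),
            ([5.0, 5.2], "high"), ([4.8, 5.0], "high")]
centroids = fit_centroids(examples)
print(predict(centroids, [1.1, 0.9]))  # low
print(predict(centroids, [5.1, 4.9]))  # high
```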
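A minimal sketch of the retrieval step in RAG, using simple keyword overlap in place of a real embedding-based search; the documents and prompt are hypothetical, and the actual LLM call is omitted:

```python
# Toy sketch of RAG retrieval: rank documents by word overlap with the
# prompt, then place the best match into the context passed to the LLM.
import re

documents = [
    "CRISP-DM has six phases, from business understanding to deployment.",
    "The GDPR regulates the processing of personal data in the EU.",
]

def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def retrieve(prompt, docs):
    """Return the document sharing the most words with the prompt."""
    query = tokens(prompt)
    return max(docs, key=lambda d: len(query & tokens(d)))

prompt = "Which phases does CRISP-DM consist of?"
context = retrieve(prompt, documents)
augmented = f"Answer using this source:\n{context}\n\nQuestion: {prompt}"
```

Production systems replace the keyword overlap with semantic (embedding) search, but the principle is the same: ground the model's answer in retrieved sources.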
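The training/validation/test split can be sketched as follows; the 70/15/15 proportions are an illustrative choice, not a prescription:

```python
# Toy sketch of a train/validation/test split: shuffle the data once,
# then cut it into three disjoint parts.
import random

def split_data(data, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle and split into training, validation, and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_data(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```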
Abbreviations
AI Artificial intelligence
BSI German Federal Office for Information Security
CRISP-DM Cross-industry standard process for data mining
DPIA Data protection impact assessment
FLOP Floating point operation
GDPR General Data Protection Regulation
GPU Graphics processing unit
ICO Information Commissioner’s Office
LGPD Brazil’s General Data Protection Law
LLM Large Language Model
ML Machine learning
NIST U.S. National Institute of Standards and Technology
OECD Organisation for Economic Co-operation and Development
RAG Retrieval Augmented Generation
SAI Supreme Audit Institution
TCU Brazil’s Federal Court of Accounts
A. Clark (2018): The Machine Learning Audit – CRISP-DM Framework.
The European Parliament and the Council of the European Union (2016): Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).