Equality and fairness measures in classification models

The performance of classification models is usually evaluated based on the confusion matrix, which counts correct and incorrect predictions for each class, and on metrics derived from it. In a binary classification problem, one class is defined as the positive outcome. For example [23], suppose a model is being built to predict whether a criminal will reoffend after being released from prison; the positive outcome would then be that they do indeed reoffend.

Extending the criminal reoffending example, the true positives are cases where the model correctly identifies that a former convict will reoffend, and the true positive rate (TPR) is the fraction of all reoffenders in the population whom the algorithm correctly identifies as reoffenders. The false positive rate (FPR), correspondingly, is the fraction of individuals who do not reoffend (who remain law-abiding) but whom the algorithm nevertheless predicts will reoffend.

Whereas TPR and FPR evaluate the model's prediction for a given actual outcome, another important aspect is the evaluation of the actual outcome for a given prediction: the positive predictive value (PPV), also known as precision, is the fraction of positive predictions that are correct, while the negative predictive value (NPV) is the fraction of negative predictions that are correct.
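Writing TP, FP, TN and FN for the numbers of true positives, false positives, true negatives and false negatives in the confusion matrix (notation introduced here for illustration), the four metrics can be expressed as:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad
\mathrm{NPV} = \frac{TN}{TN + FN}
```

TPR and FPR divide by the number of actual positives or actual negatives, whereas PPV and NPV divide by the number of predicted positives or predicted negatives; this is exactly the difference between the two perspectives.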

In our example, the difference between TPR and PPV can be summarised as follows:

TPR: How likely is it that a released prisoner who does reoffend is given a positive prediction (i.e. is correctly identified)?

versus

PPV: How likely is it that a released prisoner actually reoffends, given that a positive prediction has been made?

A common approach to fairness is to demand that different groups of people be treated in the same way according to one of these indicators. Equal treatment with respect to the first indicator leads to measures categorised as disparate treatment/mistreatment, procedural fairness or equality of opportunity; equal treatment with respect to the second is classified as disparate impact, distributive justice or minimal inequality of outcome.

A third approach to group-based fairness is to demand that the fraction of predicted positives (predicted prevalence) is independent of any group affiliation, irrespective of possible differences in the actual fraction of positives (prevalence) in these groups.

The following metrics, calculated separately for each group, help to determine whether a model treats different groups of people equally and, if not, to assess the extent of the disparities:

Prevalence: fraction of actual positives
Predicted prevalence: fraction of predicted positives
False positive rate (FPR): fraction of false positives among all actual negatives
True positive rate (TPR), also called recall: fraction of true positives among all actual positives
Precision, also called positive predictive value (PPV): fraction of true positives among all predicted positives
Negative predictive value (NPV): fraction of true negatives among all predicted negatives
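As a minimal sketch in plain Python (no particular fairness library is assumed; the names `group_metrics` and `GroupMetrics` are hypothetical), the six metrics can be computed for one group from binary actual outcomes and binary predictions:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GroupMetrics:
    prevalence: float            # fraction of actual positives
    predicted_prevalence: float  # fraction of predicted positives
    fpr: float                   # false positives among all actual negatives
    tpr: float                   # true positives among all actual positives (recall)
    ppv: float                   # true positives among all predicted positives (precision)
    npv: float                   # true negatives among all predicted negatives


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when the denominator is zero,
    # e.g. for a group that contains no actual positives.
    return num / den if den else float("nan")


def group_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> GroupMetrics:
    """Compute the metrics listed above for a single group (labels must be 0 or 1)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # reoffended, predicted to reoffend
    fp = pairs.count((0, 1))  # did not reoffend, predicted to reoffend
    tn = pairs.count((0, 0))  # did not reoffend, predicted not to reoffend
    fn = pairs.count((1, 0))  # reoffended, predicted not to reoffend
    if tp + fp + tn + fn != n:
        raise ValueError("labels and predictions must be 0 or 1")
    return GroupMetrics(
        prevalence=(tp + fn) / n,
        predicted_prevalence=(tp + fp) / n,
        fpr=_ratio(fp, fp + tn),
        tpr=_ratio(tp, tp + fn),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Example: a hypothetical group of ten released prisoners, three of whom reoffend
m = group_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0, 0, 0, 0, 0])
print(m)
```

In this example the prevalence is 0.3, TPR and PPV are both 2/3, and FPR is 1/7. Returning NaN for an empty denominator, rather than raising an error, is a design choice that keeps small groups from aborting the whole calculation; such values then need to be handled explicitly when groups are compared.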

Based on these, the following group fairness metrics can be calculated:

Statistical parity (also called demographic parity): the predicted prevalence is the same in all groups, that is, the probability of a positive (or negative) prediction does not depend on group membership
Equalised odds (also called disparate mistreatment): same TPR and same FPR in all groups, that is, the probability of a positive prediction given a positive or a negative actual outcome is the same for each group
Sufficiency (also called predictive rate parity): same PPV and same NPV in all groups, that is, the probability of an actual positive (or negative) given a positive (or negative) prediction is the same for each group
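A minimal sketch of how an auditor might compare two groups against these three criteria, assuming the confusion-matrix counts for each group are already available and no denominator is zero. The names `Counts`, `rates` and `fairness_gaps` are hypothetical, and summarising each criterion by its largest absolute difference is one choice among several (ratios between groups are an alternative):

```python
from typing import NamedTuple


class Counts(NamedTuple):
    """Confusion-matrix counts for one group."""
    tp: int
    fp: int
    tn: int
    fn: int


def rates(c: Counts) -> dict[str, float]:
    """Per-group rates used by the three criteria (assumes no zero denominators)."""
    n = c.tp + c.fp + c.tn + c.fn
    return {
        "predicted_prevalence": (c.tp + c.fp) / n,
        "tpr": c.tp / (c.tp + c.fn),
        "fpr": c.fp / (c.fp + c.tn),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def fairness_gaps(a: Counts, b: Counts) -> dict[str, float]:
    """Absolute between-group differences; 0.0 means a criterion holds exactly."""
    ra, rb = rates(a), rates(b)
    gap = {key: abs(ra[key] - rb[key]) for key in ra}
    return {
        # Statistical parity: same predicted prevalence
        "statistical_parity": gap["predicted_prevalence"],
        # Equalised odds: same TPR and same FPR, so report the larger of the two gaps
        "equalised_odds": max(gap["tpr"], gap["fpr"]),
        # Sufficiency: same PPV and same NPV, so report the larger of the two gaps
        "sufficiency": max(gap["ppv"], gap["npv"]),
    }


# Hypothetical counts: group A has prevalence 0.5, group B has prevalence 0.2
group_a = Counts(tp=40, fp=10, tn=40, fn=10)
group_b = Counts(tp=16, fp=16, tn=64, fn=4)
print(fairness_gaps(group_a, group_b))
```

With these illustrative counts both groups have TPR 0.8 and FPR 0.2, so equalised odds holds exactly, yet the statistical parity gap is 0.18 (predicted prevalence 0.5 versus 0.32) and the sufficiency gap is 0.3 (PPV 0.8 versus 0.5).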

It is important for auditors to understand that in the common case where prevalence differs between groups, no model that makes errors can satisfy any two of the three fairness metrics at the same time, apart from degenerate cases such as a model that gives every individual the same prediction. It is therefore important to take the prevalence (often called the base rate) into account when assessing how serious a violation of these fairness principles is, together with the magnitude of the difference and the practical implications in the specific ML application.
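The tension can be made concrete with Bayes' rule. Assuming a model with the same TPR and FPR in two groups (so equalised odds holds), each group's predicted prevalence, PPV and NPV follow from its prevalence p: predicted prevalence = TPR × p + FPR × (1 − p), and PPV = TPR × p / predicted prevalence. The function name `implied_rates` and the numbers below are illustrative assumptions:

```python
def implied_rates(tpr: float, fpr: float, prevalence: float) -> dict[str, float]:
    """Rates implied for a group by the model's TPR and FPR and the group's prevalence.

    Assumes 0 < prevalence < 1 and a predicted prevalence strictly between 0 and 1.
    """
    p = prevalence
    # Predicted positives = true positives (tpr * p) + false positives (fpr * (1 - p))
    predicted_prevalence = tpr * p + fpr * (1 - p)
    # Bayes' rule: share of predicted positives that are actual positives
    ppv = tpr * p / predicted_prevalence
    # Share of predicted negatives that are actual negatives
    npv = (1 - fpr) * (1 - p) / (1 - predicted_prevalence)
    return {"predicted_prevalence": predicted_prevalence, "ppv": ppv, "npv": npv}


# Same TPR and FPR in both groups (equalised odds holds), different base rates
for group, base_rate in [("A", 0.5), ("B", 0.2)]:
    result = implied_rates(tpr=0.8, fpr=0.2, prevalence=base_rate)
    print(group, {key: round(value, 3) for key, value in result.items()})
```

With TPR = 0.8 and FPR = 0.2, a group with prevalence 0.5 receives a predicted prevalence of 0.5 and a PPV of 0.8, whereas a group with prevalence 0.2 receives a predicted prevalence of 0.32 and a PPV of 0.5: equalised odds holds, but statistical parity and sufficiency are both violated.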

The group-fairness metrics discussed here have the advantages that they can easily be calculated and that they reflect anti-discrimination legislation. Other fairness concepts auditors should be aware of include:

Fairness through unawareness: The naïve idea that an algorithm cannot discriminate with respect to a personal attribute if that attribute is not given to the model is too simplistic for ML applications used in public services, as it neglects correlations with other attributes contained in the data, which can act as proxies.
Individual fairness: Focusing on individual cases, this approach demands that similar cases be treated similarly by the model. Calculating a metric for individual fairness is much more challenging than for group fairness, as the notion of ‘similarity’ needs to be defined through appropriate distance measures in both the feature space (model input) and the prediction space (model output).
Counterfactual fairness: This approach tries to determine the influence of a personal attribute on the prediction by changing that attribute together with all attributes correlated with it. It can thus help to analyse the reasons and mechanisms behind a potential bias, rather than just observe and quantify it. It is, however, unclear how to implement this approach in practice, as one needs to define a causal graph that relates all relevant variables and make sure that these variables and their correlations are correctly taken into account.
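Why fairness through unawareness falls short can be sketched with a toy example. The data, the `neighbourhood` feature and the rule in `unaware_model` are all hypothetical: the model never sees the group attribute, but the feature it does use is strongly correlated with group membership:

```python
# Hypothetical records: (group, neighbourhood). The protected attribute "group"
# is never passed to the model, but neighbourhood is strongly correlated with it:
# 80% of group A live "north", 80% of group B live "south".
records = (
    [("A", "north")] * 80 + [("A", "south")] * 20
    + [("B", "north")] * 20 + [("B", "south")] * 80
)


def unaware_model(neighbourhood: str) -> int:
    """Hypothetical rule that sees only the neighbourhood, never the group."""
    return 1 if neighbourhood == "south" else 0


def predicted_prevalence(group: str) -> float:
    """Fraction of the given group that the unaware model predicts positive."""
    preds = [unaware_model(n) for g, n in records if g == group]
    return sum(preds) / len(preds)


for group in ("A", "B"):
    print(group, predicted_prevalence(group))
```

Although the group attribute is never used, 20% of group A but 80% of group B receive a positive prediction, so statistical parity is clearly violated through the proxy.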

A good overview of these and further fairness concepts is given in [24].

Bibliography

[23] A. Feller et al. (2016): A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. The Washington Post, http://www.cs.yale.edu/homes/jf/Feller.pdf.

[24] S. Verma and J. Rubin (2018): Fairness Definitions Explained, http://fairware.cs.umass.edu/papers/Verma.pdf.