Bias Detection and Mitigation: Building Fairer AI Models

Explore the technical strategies, fairness metrics, and algorithmic mitigation techniques developers can use to identify and resolve bias in machine learning models.

Published on 2026-06-01

AI Assistant

As machine learning systems increasingly influence critical decisions, from hiring and loan approvals to healthcare and criminal justice, ensuring fairness has shifted from a philosophical ideal to a technical imperative. AI models reflect the data they are trained on: if that data contains historical biases or systemic imbalances, the model can codify and even amplify those inequalities.

In this tutorial, we will walk through the core developer strategies, mathematical fairness metrics, and algorithmic techniques for bias detection and mitigation using Python and the AIF360 (AI Fairness 360) framework.

Prerequisites

To follow this tutorial, you should have:

Python 3.10+ installed
Basic knowledge of Python data science tools (pandas, scikit-learn)
A fresh virtual environment

Install the necessary libraries:

pip install pandas scikit-learn "aif360[all]"

Understanding Bias and Fairness Metrics

In algorithmic fairness, mitigation techniques are grouped into three categories according to where they intervene in the pipeline:

Pre-processing: Mitigating bias in the training dataset before model training.
In-processing: Modifying the model training procedure (e.g., adding a fairness constraint to the loss function).
Post-processing: Adjusting the predictions of a trained model to enforce fairness constraints.
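To make the post-processing idea concrete, here is a minimal sketch that applies a separate decision threshold per group to the scores of an already-trained model. The scores, group labels, and threshold values are invented for illustration, and `apply_group_thresholds` and `selection_rate` are hypothetical helpers, not part of AIF360.

```python
def apply_group_thresholds(scores, groups, thresholds):
    """Turn model scores into 0/1 decisions using a per-group threshold."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


def selection_rate(decisions, groups, group):
    """Fraction of members of `group` that received the favorable decision."""
    member_decisions = [d for d, g in zip(decisions, groups) if g == group]
    return sum(member_decisions) / len(member_decisions)


# Hypothetical scores from a trained model (1 = privileged, 0 = unprivileged).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

# One shared threshold for everyone.
shared = apply_group_thresholds(scores, groups, {1: 0.6, 0: 0.6})
# A lower threshold for the unprivileged group (an illustrative choice).
adjusted = apply_group_thresholds(scores, groups, {1: 0.6, 0: 0.3})

for name, decisions in [("shared", shared), ("adjusted", adjusted)]:
    rate_priv = selection_rate(decisions, groups, 1)
    rate_unpriv = selection_rate(decisions, groups, 0)
    print(f"{name}: privileged={rate_priv:.2f}, unprivileged={rate_unpriv:.2f}")
```

In practice the thresholds would be chosen on a validation set against a specific fairness metric rather than picked by hand as they are here.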

To measure bias, we rely on standard statistical metrics:

Disparate Impact (DI): The ratio of the selection rate of the unprivileged group to that of the privileged group. A value of $1.0$ indicates parity; values below $0.8$ are commonly flagged as adverse impact (the "four-fifths rule").
Statistical Parity Difference (SPD): The difference in the rate of favorable outcomes received by the unprivileged group compared to the privileged group (ideally $0$).
Equalized Odds: The model should have equal True Positive Rates (TPR) and False Positive Rates (FPR) across both groups.
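The three metrics above can be computed by hand from labels, predictions, and group membership. The sketch below uses only the standard library and a made-up toy dataset; `rate` and `group_rates` are hypothetical helper names, and AIF360 provides its own implementations of these metrics.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def group_rates(y_true, y_pred, groups, group):
    """Return (selection rate, TPR, FPR) for one group."""
    rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == group]
    selection = rate([p for _, p in rows])
    tpr = rate([p for t, p in rows if t == 1])  # favorable among true positives
    fpr = rate([p for t, p in rows if t == 0])  # favorable among true negatives
    return selection, tpr, fpr


# Toy data: group 1 = privileged, group 0 = unprivileged.
y_true = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, groups, 1)
sel_u, tpr_u, fpr_u = group_rates(y_true, y_pred, groups, 0)

disparate_impact = sel_u / sel_p
statistical_parity_difference = sel_u - sel_p

print(f"DI:  {disparate_impact:.3f}")
print(f"SPD: {statistical_parity_difference:.3f}")
# Equalized odds holds when both gaps are (close to) zero.
print(f"TPR gap: {tpr_u - tpr_p:.3f}, FPR gap: {fpr_u - fpr_p:.3f}")
```

Note that DI and SPD look only at predictions, while Equalized Odds also needs the true labels, so a model can satisfy one criterion and still fail another.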

Step 1: Loading Data and Detecting Baseline Bias

Let’s write a Python script that generates a synthetic credit-scoring dataset, trains a baseline logistic regression model, and measures bias against a protected attribute (here, age).

Create a file named bias_detector.py:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import ClassificationMetric

# 1. Generate Synthetic Imbalanced Credit Data
np.random.seed(42)
n_samples = 1000

# Attribute: 'age' (1: Older/Privileged, 0: Younger/Unprivileged)
age = np.random.binomial(1, 0.7, n_samples)
# Score is biased: older applicants get credit scores ~100 points higher on average in the historical data
credit_score = 600 + age * 100 + np.random.normal(0, 50, n_samples)
# Label: 'approved' (1: Good credit risk, 0: Bad risk)
approved = (credit_score > 680).astype(int)

df = pd.DataFrame({'age': age, 'credit_score': credit_score, 'approved': approved})

# 2. Convert to AIF360 Format
dataset = BinaryLabelDataset(
    df=df,
    label_names=['approved'],
    protected_attribute_names=['age'],
    favorable_label=1,
    unfavorable_label=0
)

# Define Privileged and Unprivileged groups
privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]

# 3. Train Test Split
train, test = dataset.split([0.7], shuffle=True)

# Train Baseline Logistic Regression
# Column 0 of `features` is 'age'; slicing [:, 1:] drops it, so the model sees only credit_score
X_train, y_train = train.features[:, 1:], train.labels.ravel()
X_test, y_test = test.features[:, 1:], test.labels.ravel()

clf = LogisticRegression()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# 4. Measure Baseline Bias
test_pred = test.copy()
test_pred.labels = preds.reshape(-1, 1)

metric = ClassificationMetric(
    test, test_pred,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)

print(f"Baseline Accuracy: {accuracy_score(y_test, preds):.4f}")
print(f"Disparate Impact (DI): {metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {metric.statistical_parity_difference():.4f}")

Run this script:

python bias_detector.py

You should see high accuracy but a Disparate Impact far below the $0.8$ threshold (the exact value varies with the random split), indicating strong bias against younger applicants.
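As a sanity check on the data itself, independent of any trained model, the approval rates implied by the generating code can be worked out in closed form: credit_score is normal with mean 600 (younger) or 700 (older) and standard deviation 50, and approval requires a score above 680. The sketch below is an illustrative calculation under exactly those assumptions; `normal_cdf` and `approval_rate` are hypothetical helpers.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approval_rate(mean, std=50.0, cutoff=680.0):
    """P(score > cutoff) when score ~ Normal(mean, std)."""
    return 1.0 - normal_cdf((cutoff - mean) / std)


rate_younger = approval_rate(600.0)  # age = 0 (unprivileged)
rate_older = approval_rate(700.0)    # age = 1 (privileged)

label_di = rate_younger / rate_older
label_spd = rate_younger - rate_older

print(f"Approval rate, younger: {rate_younger:.3f}")
print(f"Approval rate, older:   {rate_older:.3f}")
print(f"Label-level DI:  {label_di:.3f}")
print(f"Label-level SPD: {label_spd:.3f}")
```

These are population values for the labels, roughly 0.05 versus 0.66 approval. The script's printed metrics come from model predictions on a random 30% test split, so they will differ somewhat.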

Step 2: Mitigating Bias (Pre-Processing)

One popular pre-processing mitigation algorithm is Reweighing. It assigns a weight to each training example based on its combination of protected attribute and label, chosen so that in the weighted training set the label is statistically independent of the protected attribute. The weights are then passed to a standard classifier.
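The weighting idea can be sketched without any library. Each (group, label) cell gets the weight P(group) * P(label) / P(group, label), that is, the cell's expected frequency under independence divided by its observed frequency. The code below is a minimal illustration on a made-up toy dataset, not AIF360's actual implementation; `reweighing_weights` and `weighted_selection_rate` are hypothetical helpers.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight per (group, label) cell: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] / n) * (label_counts[y] / n) / (count / n)
        for (g, y), count in cell_counts.items()
    }


def weighted_selection_rate(groups, labels, weights, group):
    """Weighted share of favorable labels (1) within one group."""
    cells = [(g, y) for g, y in zip(groups, labels) if g == group]
    total = sum(weights[cell] for cell in cells)
    favorable = sum(weights[cell] for cell in cells if cell[1] == 1)
    return favorable / total


# Toy training set: group 1 = privileged, group 0 = unprivileged; label 1 = favorable.
groups = [1] * 6 + [0] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
for cell in sorted(weights):
    print(f"(group={cell[0]}, label={cell[1]}): weight={weights[cell]:.3f}")

rate_priv = weighted_selection_rate(groups, labels, weights, 1)
rate_unpriv = weighted_selection_rate(groups, labels, weights, 0)
print(f"Weighted favorable rate: privileged={rate_priv:.2f}, unprivileged={rate_unpriv:.2f}")
```

Under-represented cells (here, unprivileged examples with a favorable label) receive weights above 1, and after weighting both groups have the same favorable rate in the training data.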

Let’s extend bias_detector.py with reweighing by appending the following to the end of the script:

from aif360.algorithms.preprocessing import Reweighing

# Initialize Reweighing Transformer
RW = Reweighing(
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)
train_transformed = RW.fit_transform(train)

# Get sample weights for training
sample_weights = train_transformed.instance_weights

# Train Logistic Regression using Reweighed Dataset
clf_mitigated = LogisticRegression()
clf_mitigated.fit(X_train, y_train, sample_weight=sample_weights)
preds_mitigated = clf_mitigated.predict(X_test)

test_pred_mitigated = test.copy()
test_pred_mitigated.labels = preds_mitigated.reshape(-1, 1)

metric_mitigated = ClassificationMetric(
    test, test_pred_mitigated,
    unprivileged_groups=unprivileged_groups,
    privileged_groups=privileged_groups
)

print("\n--- After Pre-processing Mitigation (Reweighing) ---")
print(f"Mitigated Accuracy: {accuracy_score(y_test, preds_mitigated):.4f}")
print(f"Mitigated Disparate Impact (DI): {metric_mitigated.disparate_impact():.4f}")
print(f"Mitigated Statistical Parity Difference: {metric_mitigated.statistical_parity_difference():.4f}")

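After reweighing, compare the new Disparate Impact with the baseline: a value closer to $1.0$ indicates reduced bias, ideally with little loss of accuracy. How large the shift is depends on the data. In this synthetic setup the label is a deterministic threshold on credit_score, which is also the model's only feature, so the improvement may be modest.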

Summary

In this guide, you successfully:

Formulated a standard credit classification problem with a protected attribute (age).
Used AIF360 to measure baseline algorithmic bias using metrics like Disparate Impact.
Mitigated systemic data imbalance using Reweighing pre-processing.

Ensuring your models are fair is an iterative task that spans the model lifecycle. As next steps, consider testing in-processing algorithms (e.g., Adversarial Debiasing) and post-processing algorithms (e.g., Reject Option Classification), depending on the constraints of your production architecture.
