AI Ethics & Responsible AI
Navigate AI bias, fairness metrics, model explainability with SHAP and LIME, AI regulations like the EU AI Act, and safety guardrails
Why AI Ethics Matters Now
As AI systems make decisions about hiring, lending, healthcare, and criminal justice, the stakes of getting it wrong are enormous. Biased models perpetuate discrimination. Opaque models undermine trust. Irresponsible deployment causes real harm to real people.
Real-World AI Failures:
- Amazon scrapped an AI hiring tool that discriminated against women
- Healthcare algorithms gave less care to Black patients despite equal needs
- Facial recognition has 10-100x higher error rates for darker skin tones
- Predictive policing reinforces existing racial biases in arrest data
Types of Bias in AI
Data Bias
Training data reflects historical biases. If past hiring data favored men, the model will too. Underrepresented groups in training data get worse predictions.
Algorithmic Bias
Model design choices can amplify biases. Optimizing for overall accuracy may sacrifice minority group performance. Feature selection can introduce proxies for protected attributes.
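A quick way to surface candidate proxies is to check how strongly each feature tracks the protected attribute. A minimal sketch, assuming numerically encoded features and a binary protected attribute (the 0.4 threshold is an arbitrary illustration, not a standard):
import numpy as np

def find_proxy_features(X, protected_attr, feature_names, threshold=0.4):
    # Flag features that correlate strongly with the protected
    # attribute: candidate proxies worth auditing before training.
    proxies = []
    for j, name in enumerate(feature_names):
        r = abs(np.corrcoef(X[:, j], protected_attr)[0, 1])
        if r > threshold:
            proxies.append((name, round(r, 3)))
    return sorted(proxies, key=lambda t: t[1], reverse=True)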
Measurement Bias
What we measure as the "target" may be biased. Using arrests as a proxy for crime rate bakes in policing biases. Using grades as a proxy for ability reflects systemic inequities.
Deployment Bias
Models are deployed in contexts they were not designed for: a model trained on one demographic gets applied universally. Automation bias compounds the problem, as humans over-trust model outputs.
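One simple deployment check is to compare the demographic mix of the training data against live traffic. A minimal sketch, assuming group labels as arrays (the 10-percentage-point flag is an arbitrary illustration):
import numpy as np

def population_shift_report(train_groups, deploy_groups):
    # Compare each group's share of the training population
    # against its share of the deployed population.
    for g in np.union1d(train_groups, deploy_groups):
        train_share = float(np.mean(train_groups == g))
        deploy_share = float(np.mean(deploy_groups == g))
        flag = "  <-- shift" if abs(train_share - deploy_share) > 0.10 else ""
        print(f"group {g}: train {train_share:.0%} vs. deployed {deploy_share:.0%}{flag}")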
Fairness Metrics
import numpy as np
from sklearn.metrics import confusion_matrix

def compute_fairness_metrics(y_true, y_pred, protected_attr):
    """
    Compute key fairness metrics across demographic groups.

    Common fairness definitions:
    - Demographic Parity: P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1)
    - Equal Opportunity: P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1)
    - Equalized Odds: equal TPR and FPR across groups
    """
    results = {}
    for group_val in np.unique(protected_attr):
        mask = protected_attr == group_val
        y_t = y_true[mask]
        y_p = y_pred[mask]
        tn, fp, fn, tp = confusion_matrix(y_t, y_p, labels=[0, 1]).ravel()
        results[f"group_{group_val}"] = {
            "selection_rate": y_p.mean(),  # demographic parity
            "true_positive_rate": tp / (tp + fn) if (tp + fn) > 0 else 0,  # equal opportunity
            "false_positive_rate": fp / (fp + tn) if (fp + tn) > 0 else 0,
            "accuracy": (tp + tn) / len(y_t),
            "count": len(y_t),
        }

    # Calculate disparities (assumes exactly two groups)
    groups = list(results.values())
    results["disparity"] = {
        "selection_rate_ratio": groups[0]["selection_rate"] / max(groups[1]["selection_rate"], 1e-8),
        "tpr_difference": abs(groups[0]["true_positive_rate"] - groups[1]["true_positive_rate"]),
        "fpr_difference": abs(groups[0]["false_positive_rate"] - groups[1]["false_positive_rate"]),
    }
    return results

# Example: loan approval model (random predictions, for illustration only)
np.random.seed(42)
n = 1000
y_true = np.random.randint(0, 2, n)
y_pred = np.random.randint(0, 2, n)
gender = np.random.choice([0, 1], n)  # 0=female, 1=male

metrics = compute_fairness_metrics(y_true, y_pred, gender)
for group, vals in metrics.items():
    print(f"\n{group}:")
    for k, v in vals.items():
        print(f"  {k}: {v:.4f}" if isinstance(v, float) else f"  {k}: {v}")
# The 4/5ths (80%) rule: selection rate of protected group
# should be at least 80% of the majority group's rate
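# A small helper (illustrative, not from any library) applying the rule
# to the disparity computed above; the check is made symmetric so it
# does not matter which group the ratio was computed over:
def passes_four_fifths_rule(metrics, threshold=0.8):
    r = metrics["disparity"]["selection_rate_ratio"]
    return min(r, 1 / max(r, 1e-8)) >= threshold

print(f"\nPasses 4/5ths rule: {passes_four_fifths_rule(metrics)}")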
Model Explainability: SHAP and LIME
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# SHAP: SHapley Additive exPlanations
# Based on game theory: each feature's contribution to the prediction

# Train a model
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=42)
feature_names = [f"feature_{i}" for i in range(10)]
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# SHAP explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])

# Global feature importance (uncomment to plot)
# shap.summary_plot(shap_values, X[:100], feature_names=feature_names)

# Explain a single prediction
idx = 0
print(f"Prediction for sample {idx}: {model.predict(X[idx:idx+1])[0]}")
print(f"Prediction probability: {model.predict_proba(X[idx:idx+1])[0]}")
print("\nTop feature contributions (SHAP values):")

# Older SHAP versions return a list of per-class arrays; newer versions
# return a single array with a trailing class dimension.
sv = shap_values[1][idx] if isinstance(shap_values, list) else shap_values[idx]
if sv.ndim == 2:
    sv = sv[:, 1]  # keep the positive-class column
for name, val in sorted(zip(feature_names, sv), key=lambda x: abs(x[1]), reverse=True)[:5]:
    direction = "increases" if val > 0 else "decreases"
    print(f"  {name}: {val:+.4f} ({direction} prediction)")

# LIME: Local Interpretable Model-agnostic Explanations
# Approximates the model locally with a simple, interpretable model
import lime.lime_tabular

lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X, feature_names=feature_names, class_names=['No', 'Yes']
)
exp = lime_explainer.explain_instance(X[0], model.predict_proba)
print("\nLIME explanation:")
for feature, weight in exp.as_list()[:5]:
    print(f"  {feature}: {weight:+.4f}")
AI Regulations: The Landscape
EU AI Act (2024)
The world's first comprehensive AI law. Risk-based framework: unacceptable-risk uses (social scoring) are banned; high-risk uses (hiring, credit) face strict requirements; limited-risk uses (chatbots) carry transparency obligations; minimal-risk uses (spam filters) are largely unregulated. Fines reach up to 7% of global annual turnover.
US Executive Order on AI (2023)
Requires developers of powerful AI models to share safety test results with the government. Directs the development of standards for watermarking and authenticating AI-generated content. Tasks agencies with developing sector-specific guidance.
Industry Self-Regulation
Model cards (Hugging Face), responsible AI principles (Google, Microsoft, Anthropic), voluntary commitments on safety testing, red-teaming, and transparency.
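For illustration, a model card can start as a structured record kept alongside the model. The fields below loosely follow the Model Cards paper (Mitchell et al., 2019) and the values are hypothetical; real Hugging Face model cards are markdown files with YAML metadata:
model_card = {
    "model_details": "Loan-approval random forest, v1.0",
    "intended_use": "Decision support for loan officers, not fully automated decisions",
    "evaluation_factors": ["gender", "age group"],
    "metrics": ["accuracy", "selection rate, TPR, and FPR per group"],
    "training_data": "Historical loan applications (hypothetical)",
    "ethical_considerations": "Historical approvals may encode past discrimination",
    "caveats": "Not validated outside the original deployment market",
}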
Hallucination Mitigation and Safety Guardrails
import re

# Strategies for reducing LLM hallucinations and unsafe outputs

# 1. Retrieval-Augmented Generation (RAG)
#    Ground responses in factual documents
def rag_pipeline(query, knowledge_base):
    """
    Instead of relying on model memory, retrieve relevant
    documents and include them in the prompt.
    """
    relevant_docs = knowledge_base.search(query, top_k=3)
    context = "\n".join(doc.text for doc in relevant_docs)
    prompt = f"""Answer based ONLY on the following context.
If the answer is not in the context, say "I don't know."

Context: {context}

Question: {query}

Answer:"""
    return prompt
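# `knowledge_base` above is a hypothetical interface. A minimal in-memory
# stand-in (naive keyword overlap instead of embeddings and a vector
# store) so the sketch runs end to end:
from dataclasses import dataclass

@dataclass
class Doc:
    text: str

class KeywordKnowledgeBase:
    def __init__(self, texts):
        self.docs = [Doc(t) for t in texts]

    def search(self, query, top_k=3):
        # Rank documents by how many query terms they contain
        terms = set(query.lower().split())
        ranked = sorted(self.docs,
                        key=lambda d: len(terms & set(d.text.lower().split())),
                        reverse=True)
        return ranked[:top_k]

kb = KeywordKnowledgeBase([
    "The EU AI Act takes a risk-based approach to regulating AI.",
    "SHAP values attribute a prediction to individual features.",
])
print(rag_pipeline("What approach does the EU AI Act take?", kb))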
# 2. Output validation and guardrails
class SafetyGuardrails:
    def __init__(self):
        # Naive keyword blocklist for illustration; production systems
        # typically use trained safety classifiers
        self.blocked_topics = ["harmful", "illegal", "dangerous"]
        self.pii_patterns = [
            r'\b\d{3}-\d{2}-\d{4}\b',  # SSN
            r'\b\d{16}\b',             # credit card
        ]

    def check_input(self, text):
        """Screen inputs for safety."""
        text_lower = text.lower()
        for topic in self.blocked_topics:
            if topic in text_lower:
                return False, f"Blocked: contains '{topic}'"
        return True, "OK"

    def check_output(self, text):
        """Screen outputs for PII leakage and safety."""
        for pattern in self.pii_patterns:
            if re.search(pattern, text):
                return False, "Output contains potential PII"
        return True, "OK"

    def add_confidence(self, response, sources):
        """Add confidence indicators to responses."""
        if not sources:
            return response + "\n\n(Note: This response is not grounded in verified sources.)"
        return response + f"\n\n(Sources: {', '.join(sources)})"
# 3. Constitutional AI approach
# Train the model to self-critique and revise its responses
# based on a set of principles (Anthropic's approach for Claude)
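# A minimal sketch of that critique-and-revise loop; `llm` is a
# hypothetical callable (prompt in, text out) standing in for a real
# model API, not Anthropic's actual training procedure:
def constitutional_revision(llm, response, principles):
    for principle in principles:
        critique = llm(f"Critique this response against the principle "
                       f"'{principle}':\n\n{response}")
        response = llm(f"Revise the response to address the critique.\n\n"
                       f"Critique: {critique}\n\nResponse: {response}")
    return response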
guardrails = SafetyGuardrails()
safe, msg = guardrails.check_input("How do I bake a cake?")
print(f"Input check: {msg}")
print("\nKey strategies: RAG for grounding, guardrails for safety,")
print("Constitutional AI for self-correction")
Key Takeaways
- AI bias comes from data, algorithms, measurement, and deployment -- address all four
- Use fairness metrics (demographic parity, equal opportunity) alongside accuracy
- SHAP and LIME provide post-hoc explanations for any model's predictions
- The EU AI Act is the first comprehensive AI regulation; expect more worldwide
- Combine RAG, guardrails, and Constitutional AI to mitigate hallucinations and unsafe outputs