Confusion Matrix — Beyond Accuracy
Accuracy is deceptive on imbalanced data: a model that always predicts 'no fraud' achieves 99.9% accuracy on a dataset with a 0.1% fraud rate yet catches ZERO fraud. The confusion matrix breaks performance into four quadrants: True Positives, True Negatives, False Positives (Type I errors), and False Negatives (Type II errors). The cost of each error type drives which metric to optimize.
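Before the full example below, the four quadrants can be counted by hand. This is a minimal sketch with toy labels (not the article's dataset) showing how high accuracy can coexist with missed positives:

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # 20% positives
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])  # one FP, one FN

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # Type I errors
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # Type II errors

print(tp, tn, fp, fn)                 # -> 1 7 1 1
print((tp + tn) / len(y_true))        # -> 0.8: 80% accuracy, yet half the positives were missed
```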
Confusion Matrix, Precision, Recall, and F1
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (confusion_matrix, classification_report,
                             precision_score, recall_score, f1_score, accuracy_score)
np.random.seed(42)
X, y = make_classification(n_samples=5000, n_features=15, weights=[0.92, 0.08], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# NAIVE MODEL: always predict majority class
y_naive = np.zeros(len(y_test), dtype=int)
# REAL MODEL
model = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# PLOT CONFUSION MATRICES SIDE BY SIDE
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, (y_p, title) in zip(axes, [(y_naive, "Naive Model (always 0)"), (y_pred, "Random Forest")]):
    cm = confusion_matrix(y_test, y_p)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax,
                xticklabels=["Pred 0", "Pred 1"], yticklabels=["Actual 0", "Actual 1"])
    acc = accuracy_score(y_test, y_p)
    prec = precision_score(y_test, y_p, zero_division=0)
    rec = recall_score(y_test, y_p, zero_division=0)
    f1 = f1_score(y_test, y_p, zero_division=0)
    ax.set_title(f"{title}\nAcc={acc:.2%} | Prec={prec:.2%} | Rec={rec:.2%} | F1={f1:.2%}")
plt.tight_layout()
plt.savefig("confusion_matrix.png", dpi=100, bbox_inches="tight")
plt.show()
# METRIC DEFINITIONS
metrics = {
    "Accuracy": "Correct / Total -- misleading on imbalanced data",
    "Precision": "TP / (TP+FP) -- of all predicted positives, how many are correct?",
    "Recall": "TP / (TP+FN) -- of all actual positives, how many did we find?",
    "F1 Score": "2 * (P*R)/(P+R) -- harmonic mean of precision and recall",
    "Specificity": "TN / (TN+FP) -- of all actual negatives, how many correctly identified?",
}
print("Metric Definitions:")
for metric, definition in metrics.items():
    print(f"  {metric:15s}: {definition}")
# WHEN TO OPTIMIZE WHICH METRIC
optimization_guide = {
    "Medical diagnosis": "Maximize RECALL -- missing a disease is worse than a false alarm",
    "Spam detection": "Maximize PRECISION -- false positives (real email marked spam) are costly",
    "Fraud detection": "F1 or AUC -- balance precision/recall with a cost-based threshold",
    "Credit approval": "Maximize RECALL for bad loans -- missing a default is expensive",
    "Content moderation": "High RECALL -- better to over-flag than miss harmful content",
}
print("\nWhen to optimize which metric:")
for domain, guidance in optimization_guide.items():
    print(f"  {domain:25s}: {guidance}")
print("\nFull classification report:")
print(classification_report(y_test, y_pred))
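If you need the report's numbers programmatically rather than as printed text, classification_report also accepts output_dict=True. A small sketch with toy labels (not the article's train/test split):

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# output_dict=True returns nested dicts keyed by class label (as strings)
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["recall"])  # recall for the positive class: 2 of 3 positives found
```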
Tip
Practice confusion-matrix analysis in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
F1 = harmonic mean of Precision and Recall (balanced metric)
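The harmonic mean is what makes F1 punish imbalance between precision and recall. A minimal sketch with hypothetical precision/recall values:

```python
def f1(p, r):
    """Harmonic mean of precision p and recall r (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

print(f1(0.9, 0.9))  # -> 0.9: balanced inputs, F1 matches them
print(f1(0.9, 0.1))  # -> 0.18: arithmetic mean would be 0.5, but F1 drags toward the weaker score
```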
Practice Task
(1) Write a working example of the confusion matrix and its metrics from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when evaluating with confusion matrices is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
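One such edge case is a degenerate model that never predicts the positive class, making precision 0/0. This is exactly why the code above passes zero_division=0; a small sketch with toy labels:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 1])
y_pred = np.zeros(4, dtype=int)  # model never predicts the positive class

# precision = TP/(TP+FP) = 0/0 here; zero_division sets the fallback value
print(precision_score(y_true, y_pred, zero_division=0))  # -> 0.0, no UndefinedMetricWarning
print(recall_score(y_true, y_pred, zero_division=0))     # -> 0.0: both positives missed
```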