Regression Metrics — MAE, MSE, RMSE, R², MAPE
Choosing the right metric depends on your error tolerance and domain. MAE is robust to outliers (median-like). RMSE penalizes large errors more — use when big errors are costly (safety systems). R² measures explained variance — useful for comparing models. MAPE measures percentage error — useful when errors should scale with the magnitude of predictions. Always choose your metric BEFORE training, based on business requirements.
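As a quick check on the definitions above, each metric can be computed by hand with NumPy before reaching for sklearn. This is a minimal sketch with made-up illustration values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))             # mean |error|, same unit as target
mse = np.mean(errors ** 2)                # squares magnify large errors
rmse = np.sqrt(mse)                       # back in the target's unit
ss_res = np.sum(errors ** 2)              # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                  # fraction of variance explained
mape = np.mean(np.abs(errors / y_true))   # undefined if any y_true == 0

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f} MAPE={mape:.2%}")
```

Note the MAPE line divides by `y_true`, which is exactly why MAPE breaks when the target can be zero.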
Regression Metric Deep Dive
import numpy as np
import pandas as pd
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, mean_absolute_percentage_error)
# ACTUAL vs PREDICTED values
y_true = np.array([100, 200, 150, 300, 250, 180, 220, 400, 90, 350])
y_pred = np.array([110, 195, 160, 285, 270, 175, 230, 380, 95, 360])
# --- METRICS ---
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
print("Regression Metrics Summary:")
print(f" MAE (Mean Absolute Error): {mae:.2f} -- average |error|, same unit as target")
print(f" MSE (Mean Squared Error): {mse:.2f} -- penalizes large errors more")
print(f" RMSE (Root Mean Squared Error): {rmse:.2f} -- same unit as target, MSE penalty")
print(f" R2 (Coefficient of Det.): {r2:.4f} -- fraction of variance explained")
print(f" MAPE (Mean Abs % Error): {mape:.2%} -- scale-independent percentage")
# WHEN OUTLIER INFLATES RMSE vs MAE
y_true_with_outlier = np.array([100, 200, 150, 300, 5000]) # one huge outlier
y_pred_with_outlier = np.array([110, 195, 160, 285, 290]) # terrible prediction for outlier
print("\nWith outlier (true=5000, pred=290):")
print(f" MAE: {mean_absolute_error(y_true_with_outlier, y_pred_with_outlier):.1f} (not too bad)")
print(f" RMSE: {np.sqrt(mean_squared_error(y_true_with_outlier, y_pred_with_outlier)):.1f} (dominated by outlier!)")
# NEGATIVE R2 EXAMPLE
y_bad = np.array([500, 400, 600, 300, 700]) # very wrong predictions
r2_bad = r2_score(y_true[:5], y_bad)
print(f" R2 can be negative: {r2_bad:.2f} (worse than always predicting mean!)")
# METRIC SELECTION GUIDE
print("\nMetric Selection Guide:")
metric_guide = {
    "MAE": "General purpose, robust to outliers, interpretable (avg error in $ or kg)",
    "RMSE": "When large errors are costly (safety, finance) -- penalizes outlier predictions",
    "R2": "Comparing models on same dataset, understanding explained variance (0=bad, 1=perfect)",
    "MAPE": "When prediction errors should scale: 10% error on $100 vs $10,000",
    "R2 < 0": "Model is WORSE than predicting the mean -- definitely broken",
    "R2 > 0.9": "Very high -- check for data leakage or overfitting",
}
for metric, guidance in metric_guide.items():
    print(f"  {metric:12s}: {guidance}")
Tip
Practice the regression metrics (MAE, MSE, RMSE, R², MAPE) in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Linear regression is the simplest ML model: y = mx + b, fit by minimizing MSE.
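To make that concrete, `np.polyfit` returns the least-squares line, i.e. the slope and intercept that minimize MSE. A small sketch with made-up, roughly linear data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly y = 2x

m, b = np.polyfit(x, y, deg=1)     # degree-1 fit = least-squares line
y_hat = m * x + b
mse = np.mean((y - y_hat) ** 2)    # the quantity the fit minimized
print(f"m={m:.2f}, b={b:.2f}, MSE={mse:.4f}")  # m near 2, small MSE
```

Any other choice of m and b on this data would give a strictly larger MSE, which is what "minimize MSE" means in practice.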
Practice Task
Practice Task — (1) Write a working example computing MAE, MSE, RMSE, R², and MAPE from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with regression metrics (MAE, MSE, RMSE, R², MAPE) is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
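One way to guard those boundary conditions is to validate inputs before computing a metric. This sketch uses a hypothetical `safe_mae` helper (not part of sklearn) built on plain NumPy:

```python
import numpy as np

def safe_mae(y_true, y_pred):
    """MAE with basic edge-case validation (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    if y_true.size == 0:
        raise ValueError("empty input: metrics are undefined")
    if np.isnan(y_true).any() or np.isnan(y_pred).any():
        raise ValueError("NaN values present: clean or impute before scoring")
    return float(np.mean(np.abs(y_true - y_pred)))

print(safe_mae([100, 200], [110, 195]))  # 7.5
```

The same validation pattern applies to MSE, RMSE, R², and MAPE; for MAPE you would additionally reject any `y_true` equal to zero.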