Evaluation for Regression — Beyond R²
Regression evaluation goes deeper than a single R² score. Prediction intervals give confidence bounds. Residual analysis reveals systematic errors. Quantile regression evaluates performance across the prediction distribution. Comparing models on multiple metrics prevents overfitting to a single score and gives a more complete picture of model quality.
Comprehensive Regression Evaluation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, mean_absolute_percentage_error)

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=4, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
residuals = y_test - y_pred

# MULTIPLE REGRESSION METRICS
metrics = {
    "MAE": mean_absolute_error(y_test, y_pred),
    "RMSE": np.sqrt(mean_squared_error(y_test, y_pred)),
    "R2": r2_score(y_test, y_pred),
    "MAPE": mean_absolute_percentage_error(y_test, y_pred),
    "Max error": np.abs(residuals).max(),
    "% within 10%": (np.abs(residuals) < 0.1 * y_test).mean(),
}
print("Regression Evaluation Summary:")
for name, val in metrics.items():
    print(f"  {name:20s}: {val:.4f}")

# RESIDUAL ANALYSIS BY PREDICTED VALUE RANGE
bins = pd.cut(y_pred, bins=5)
residual_df = pd.DataFrame({"y_pred": y_pred, "residual": residuals, "bin": bins})
print("\nResiduals by prediction range (systematic bias?):")
print(residual_df.groupby("bin", observed=True)["residual"].describe().round(4))

# VISUALIZE: ACTUAL vs PREDICTED
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Scatter: actual vs predicted
axes[0].scatter(y_test, y_pred, alpha=0.3, s=15, color="steelblue")
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
axes[0].plot(lims, lims, "r--", linewidth=2, label="Perfect")
axes[0].set_xlabel("Actual House Value")
axes[0].set_ylabel("Predicted House Value")
axes[0].set_title(f"Actual vs Predicted\nR2={r2_score(y_test, y_pred):.3f}")
axes[0].legend()

# Residual distribution
axes[1].hist(residuals, bins=50, color="steelblue", edgecolor="white", alpha=0.8)
axes[1].axvline(0, color="red", linestyle="--")
axes[1].set_title(f"Residual Distribution\nmean={residuals.mean():.3f}, std={residuals.std():.3f}")
axes[1].set_xlabel("Residual (Actual - Predicted)")

# Residuals vs predicted
axes[2].scatter(y_pred, residuals, alpha=0.3, s=15, color="coral")
axes[2].axhline(0, color="red", linestyle="--")
axes[2].set_xlabel("Predicted Value")
axes[2].set_ylabel("Residual")
axes[2].set_title("Residuals vs Fitted\n(should be random scatter around 0)")

plt.tight_layout()
plt.savefig("regression_evaluation.png", dpi=100, bbox_inches="tight")
plt.show()
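The prediction intervals mentioned in the introduction are not covered by the script above. One way to obtain them is quantile regression: fit one model per quantile and read the interval off the two predictions. A minimal sketch, assuming the same California housing setup; the hyperparameters here are illustrative, not tuned:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42)

# Fit one quantile model for each bound of a nominal 90% interval
bounds = {}
for alpha in (0.05, 0.95):
    q_model = GradientBoostingRegressor(
        loss="quantile", alpha=alpha,
        n_estimators=100, max_depth=3, random_state=42)
    q_model.fit(X_train, y_train)
    bounds[alpha] = q_model.predict(X_test)

# Empirical coverage: fraction of actual values inside the interval
covered = ((y_test >= bounds[0.05]) & (y_test <= bounds[0.95])).mean()
print(f"Nominal 90% interval, empirical coverage: {covered:.1%}")
```

If the empirical coverage falls well short of the nominal level, the intervals are too narrow to trust; checking coverage on held-out data is the evaluation step itself.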
Tip
Practice regression evaluation beyond R² in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
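As one such small experiment, the introduction's advice to compare models on multiple metrics can be scored in a single cross-validation pass with `cross_validate`. A minimal sketch; the two models and the fold count are illustrative choices, not recommendations:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

housing = fetch_california_housing()

# Several scorers evaluated in one pass (sklearn negates error metrics
# so that higher is always better)
scoring = ["neg_mean_absolute_error", "neg_root_mean_squared_error", "r2"]

for model in (LinearRegression(),
              GradientBoostingRegressor(n_estimators=100, random_state=42)):
    cv = cross_validate(model, housing.data, housing.target,
                        cv=5, scoring=scoring)
    print(f"{type(model).__name__}:")
    for metric in scoring:
        scores = cv[f"test_{metric}"]
        print(f"  {metric}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Reporting mean and spread across folds for each metric makes it harder for one model to win by overfitting a single favorable score.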
Practice Task
(1) Write a working regression evaluation example that goes beyond R² from scratch, without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when evaluating regression models is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.