SHAP Values — Explaining Ensemble Predictions
SHAP (SHapley Additive exPlanations) is one of the most widely used frameworks for ML model interpretability. Grounded in cooperative game theory, SHAP assigns each feature a value representing its contribution to a specific prediction: positive SHAP values push the prediction above the base value, negative values push it below. For tree ensembles, TreeExplainer computes exact SHAP values efficiently in polynomial time, rather than the exponential time required in the general case. SHAP is also widely adopted in regulated domains such as finance and healthcare, where individual predictions often need to be explained.
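The additivity that makes these explanations exact can be written as f(x) = φ₀ + Σᵢ φᵢ(x), where φ₀ is the base value (the average model output over the background data) and φᵢ(x) is the SHAP value of feature i for input x. The script below verifies this identity for an individual prediction.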
SHAP for Tree Model Interpretation
import numpy as np
import pandas as pd
import shap
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# TRAIN A FAST XGBOOST MODEL
model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=5, verbosity=0, random_state=42)
model.fit(X_train, y_train)
print(f"Test R2: {model.score(X_test, y_test):.4f}")
# COMPUTE SHAP VALUES
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test) # shape: (n_test, n_features)
print(f"SHAP values shape: {shap_values.shape}")
print(f"Base value (global mean prediction): {explainer.expected_value:.3f}")
# INDIVIDUAL PREDICTION EXPLANATION
sample_idx = 0
sample_pred = model.predict(X_test.iloc[[sample_idx]])[0]
print(f"\nExplaining prediction #{sample_idx}: {sample_pred:.3f} (actual: {y_test.iloc[sample_idx]:.3f})")
print(f" Sum of SHAP values: {shap_values[sample_idx].sum():.3f}")
print(f" Base value: {explainer.expected_value:.3f}")
print(f" Final prediction: {explainer.expected_value + shap_values[sample_idx].sum():.3f}")
print("\nTop SHAP contributors:")
sample_shap_df = pd.DataFrame({"Feature": housing.feature_names, "SHAP": shap_values[sample_idx], "Value": X_test.iloc[sample_idx].values})
sample_shap_df = sample_shap_df.reindex(sample_shap_df["SHAP"].abs().sort_values(ascending=False).index)
for _, row in sample_shap_df.head(5).iterrows():
    direction = "increases" if row["SHAP"] > 0 else "decreases"
    print(f" {row['Feature']:12s}={row['Value']:.3f}: SHAP={row['SHAP']:+.3f} ({direction} prediction by {abs(row['SHAP']):.3f})")
# GLOBAL FEATURE IMPORTANCE FROM SHAP (mean |SHAP|)
mean_abs_shap = pd.DataFrame({"Feature": housing.feature_names, "Mean |SHAP|": np.abs(shap_values).mean(axis=0)}).sort_values("Mean |SHAP|", ascending=False)
print("\nGlobal feature importance (mean |SHAP| on test set):")
print(mean_abs_shap.round(4).to_string(index=False))
# SUMMARY PLOT
plt.figure(figsize=(9, 6))
shap.summary_plot(shap_values, X_test, plot_type="dot", show=False)
plt.title("SHAP Summary Plot -- Red=high value, Blue=low value")
plt.tight_layout()
plt.savefig("shap_summary.png", dpi=100, bbox_inches="tight")
plt.show()
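To drill into a single feature, SHAP's dependence plot shows how that feature's SHAP values vary with its raw value. A minimal sketch, assuming the variables from the script above (mean_abs_shap, shap_values, X_test) are still in scope:
# DEPENDENCE PLOT FOR THE MOST IMPORTANT FEATURE
top_feature = mean_abs_shap.iloc[0]["Feature"]
shap.dependence_plot(top_feature, shap_values, X_test, show=False)
plt.tight_layout()
plt.savefig("shap_dependence.png", dpi=100, bbox_inches="tight")
plt.show()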
Tip
Practice SHAP explanations in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
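As one such isolated experiment, here is a minimal sketch on synthetic data (all names and constants are illustrative) that trains a tiny model and confirms the additivity property end to end:
import numpy as np
import shap
import xgboost as xgb
# SYNTHETIC REGRESSION: y DEPENDS ONLY ON THE FIRST FEATURE
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 3))
y_toy = 3.0 * X_toy[:, 0] + rng.normal(scale=0.1, size=500)
toy_model = xgb.XGBRegressor(n_estimators=50, max_depth=3, verbosity=0, random_state=0)
toy_model.fit(X_toy, y_toy)
toy_explainer = shap.TreeExplainer(toy_model)
toy_shap = toy_explainer.shap_values(X_toy)
# ADDITIVITY CHECK: base value + row sums should reproduce the predictions
recon = toy_explainer.expected_value + toy_shap.sum(axis=1)
print(np.allclose(recon, toy_model.predict(X_toy), atol=1e-3))
print(np.abs(toy_shap).mean(axis=0))  # feature 0 should dominate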
Practice Task
(1) Write a working SHAP explanation for an ensemble model from scratch, without looking at notes. (2) Modify it to handle an edge case (empty input, null values, or an error state). (3) Share your solution in the Priygop community for feedback.
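For step (2), one possible shape of such a guard is sketched below; the helper name and checks are illustrative assumptions, not part of the SHAP API:
import pandas as pd
def explain_safely(explainer, X):
    # ILLUSTRATIVE HELPER: validate input before computing SHAP values
    if len(X) == 0:
        raise ValueError("Cannot explain an empty batch of inputs.")
    if isinstance(X, pd.DataFrame) and X.isna().any().any():
        # XGBoost tolerates NaNs, but surface them so explanations
        # are not silently computed on missing data
        raise ValueError("Input contains missing values; impute or drop them first.")
    return explainer.shap_values(X)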
Common Mistake
A common mistake when explaining ensemble predictions with SHAP is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.