SHAP Values — Explaining Ensemble Predictions
SHAP (SHapley Additive exPlanations) is one of the most widely used frameworks for ML model interpretability. Grounded in cooperative game theory, SHAP assigns each feature a value representing its contribution to a specific prediction: positive SHAP values push the prediction above the base value, negative values push it below. For tree ensembles, TreeExplainer computes exact SHAP values efficiently in polynomial time, rather than the exponential time required in the general case. SHAP is also widely adopted in regulated domains such as finance and healthcare, where individual predictions often need to be explained.
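The additivity that makes these explanations exact can be written as f(x) = φ₀ + Σᵢ φᵢ(x), where φ₀ is the base value (the average model output over the background data) and φᵢ(x) is the SHAP value of feature i for input x. The script below verifies this identity for an individual prediction.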
SHAP for Tree Model Interpretation
import numpy as np
import pandas as pd
import shap
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# TRAIN A FAST XGBOOST MODEL
model = xgb.XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=5, verbosity=0, random_state=42)
model.fit(X_train, y_train)
print(f"Test R2: {model.score(X_test, y_test):.4f}")
# COMPUTE SHAP VALUES
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test) # shape: (n_test, n_features)
print(f"SHAP values shape: {shap_values.shape}")
print(f"Base value (global mean prediction): {explainer.expected_value:.3f}")
# INDIVIDUAL PREDICTION EXPLANATION
sample_idx = 0
sample_pred = model.predict(X_test.iloc[[sample_idx]])[0]
print(f"\nExplaining prediction #{sample_idx}: {sample_pred:.3f} (actual: {y_test.iloc[sample_idx]:.3f})")
print(f" Sum of SHAP values: {shap_values[sample_idx].sum():.3f}")
print(f" Base value: {explainer.expected_value:.3f}")
print(f" Final prediction: {explainer.expected_value + shap_values[sample_idx].sum():.3f}")
print("\nTop SHAP contributors:")
sample_shap_df = pd.DataFrame({"Feature": housing.feature_names, "SHAP": shap_values[sample_idx], "Value": X_test.iloc[sample_idx].values})
sample_shap_df = sample_shap_df.reindex(sample_shap_df["SHAP"].abs().sort_values(ascending=False).index)
for _, row in sample_shap_df.head(5).iterrows():
    direction = "increases" if row["SHAP"] > 0 else "decreases"
    print(f" {row['Feature']:12s}={row['Value']:.3f}: SHAP={row['SHAP']:+.3f} ({direction} prediction by {abs(row['SHAP']):.3f})")
# GLOBAL FEATURE IMPORTANCE FROM SHAP (mean |SHAP|)
mean_abs_shap = pd.DataFrame({"Feature": housing.feature_names, "Mean |SHAP|": np.abs(shap_values).mean(axis=0)}).sort_values("Mean |SHAP|", ascending=False)
print("\nGlobal feature importance (mean |SHAP| on test set):")
print(mean_abs_shap.round(4).to_string(index=False))
# SUMMARY PLOT
plt.figure(figsize=(9, 6))
shap.summary_plot(shap_values, X_test, plot_type="dot", show=False)
plt.title("SHAP Summary Plot -- Red=high value, Blue=low value")
plt.tight_layout()
plt.savefig("shap_summary.png", dpi=100, bbox_inches="tight")
plt.show()
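To drill into a single feature, SHAP's dependence plot shows how that feature's SHAP values vary with its raw value. A minimal sketch, assuming the variables from the script above (mean_abs_shap, shap_values, X_test) are still in scope:
# DEPENDENCE PLOT FOR THE MOST IMPORTANT FEATURE
top_feature = mean_abs_shap.iloc[0]["Feature"]
shap.dependence_plot(top_feature, shap_values, X_test, show=False)
plt.tight_layout()
plt.savefig("shap_dependence.png", dpi=100, bbox_inches="tight")
plt.show()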
Tip
Practice SHAP explanations in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
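As one such isolated experiment, here is a minimal sketch on synthetic data (all names and constants are illustrative) that trains a tiny model and confirms the additivity property end to end:
import numpy as np
import shap
import xgboost as xgb
# SYNTHETIC REGRESSION: y DEPENDS ONLY ON THE FIRST FEATURE
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 3))
y_toy = 3.0 * X_toy[:, 0] + rng.normal(scale=0.1, size=500)
toy_model = xgb.XGBRegressor(n_estimators=50, max_depth=3, verbosity=0, random_state=0)
toy_model.fit(X_toy, y_toy)
toy_explainer = shap.TreeExplainer(toy_model)
toy_shap = toy_explainer.shap_values(X_toy)
# ADDITIVITY CHECK: base value + row sums should reproduce the predictions
recon = toy_explainer.expected_value + toy_shap.sum(axis=1)
print(np.allclose(recon, toy_model.predict(X_toy), atol=1e-3))
print(np.abs(toy_shap).mean(axis=0))  # feature 0 should dominate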
Practice Task
(1) Write a working SHAP explanation for an ensemble model from scratch, without looking at notes. (2) Modify it to handle an edge case (empty input, null values, or an error state). (3) Share your solution in the Priygop community for feedback.
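For step (2), one possible shape of such a guard is sketched below; the helper name and checks are illustrative assumptions, not part of the SHAP API:
import pandas as pd
def explain_safely(explainer, X):
    # ILLUSTRATIVE HELPER: validate input before computing SHAP values
    if len(X) == 0:
        raise ValueError("Cannot explain an empty batch of inputs.")
    if isinstance(X, pd.DataFrame) and X.isna().any().any():
        # XGBoost tolerates NaNs, but surface them so explanations
        # are not silently computed on missing data
        raise ValueError("Input contains missing values; impute or drop them first.")
    return explainer.shap_values(X)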
Common Mistake
A common mistake when explaining ensemble predictions with SHAP is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.