Gradient Boosting — Sequential Error Correction
Gradient Boosting builds trees sequentially: each new tree fits the negative gradient of the loss function at the current ensemble's predictions, which for squared-error loss is exactly the residual left by all previous trees. The final prediction is the initial constant plus the learning_rate-scaled sum of every tree's output. Unlike Random Forest, which averages deep, independently grown trees to reduce variance, Gradient Boosting reduces bias iteratively, starting from a constant prediction and improving step by step. The learning_rate controls how much each tree contributes.
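To make the residual-fitting loop concrete, here is a minimal from-scratch sketch for squared-error loss, where the negative gradient is exactly the residual; the synthetic sine data and variable names are illustrative, not part of the lesson's main example:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # step 0: a constant prediction (the mean)
trees = []
for _ in range(100):
    residual = y - prediction             # negative gradient of squared-error loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add a shrunken correction
    trees.append(tree)

print(f"Train MSE after 100 boosting rounds: {np.mean((y - prediction) ** 2):.4f}")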
Gradient Boosting — Theory and Key Parameters
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import r2_score
np.random.seed(42)
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
# VISUALIZE HOW BOOSTING IMPROVES OVER ITERATIONS
gb_staged = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1, max_depth=4, random_state=42)
gb_staged.fit(X_train, y_train)
train_scores = [r2_score(y_train, y_pred) for y_pred in gb_staged.staged_predict(X_train)]
test_scores = [r2_score(y_test, y_pred) for y_pred in gb_staged.staged_predict(X_test)]
plt.figure(figsize=(10, 5))
plt.plot(train_scores, "b-", label="Train R2", linewidth=1.5)
plt.plot(test_scores, "r-", label="Test R2", linewidth=1.5)
optimal_n = np.argmax(test_scores) + 1  # staged scores are 0-indexed, so the tree count is index + 1
plt.axvline(optimal_n - 1, color="green", linestyle="--", linewidth=2, label=f"Best n_estimators={optimal_n}")
plt.xlabel("Number of Boosting Iterations (Trees)")
plt.ylabel("R2 Score")
plt.title("Gradient Boosting: Performance vs Number of Trees")
plt.legend()
plt.tight_layout()
plt.savefig("gb_iteration_curve.png", dpi=100, bbox_inches="tight")
plt.show()
print(f"Optimal number of trees: {optimal_n} | Test R2: {max(test_scores):.4f}")
# KEY PARAMETERS
print("\nKey Gradient Boosting Parameters:")
param_guide = {
    "n_estimators": "Number of trees. More = closer fit, but higher overfitting risk. Use early stopping.",
    "learning_rate": "Shrinkage: how much each tree contributes. Lower = better generalization, but needs more trees. Try 0.05-0.1.",
    "max_depth": "Tree depth. Deeper = more complex. For GB, 3-6 is typical (shallower than in RF).",
    "subsample": "Fraction of rows sampled per tree (stochastic GB). 0.7-0.9 reduces variance.",
    "min_samples_leaf": "Minimum samples at a leaf. Higher = more regularization.",
}
for param, desc in param_guide.items():
    print(f"  {param:20s}: {desc}")
# LEARNING RATE vs N_ESTIMATORS TRADEOFF
print("\nLearning rate vs Trees tradeoff (same time budget):")
for lr, n in [(0.001, 1000), (0.01, 500), (0.05, 300), (0.1, 200), (0.5, 50)]:
    gb = GradientBoostingRegressor(n_estimators=n, learning_rate=lr, max_depth=4, random_state=42)
    cv_r2 = cross_val_score(gb, X_train, y_train, cv=3, scoring="r2").mean()
    print(f"  lr={lr:5.3f}, n={n:4d}: CV R2 = {cv_r2:.4f}")
Tip
Practice gradient boosting's sequential error-correction loop in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
The learning_rate plays the same role as the step size α in gradient descent: θ ← θ − α·∇L(θ). Too high an α diverges; too low an α converges slowly.
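A toy numeric sketch of that update, minimizing L(θ) = θ² (chosen purely for illustration; not part of the housing example):

def gradient_descent(alpha, steps=20, theta=5.0):
    # Minimize L(theta) = theta^2; its gradient is 2 * theta.
    for _ in range(steps):
        theta -= alpha * 2 * theta
    return theta

for alpha in [1.5, 0.3, 0.01]:
    # 1.5 diverges, 0.3 converges quickly, 0.01 barely moves in 20 steps.
    print(f"alpha={alpha:4.2f} -> theta after 20 steps: {gradient_descent(alpha):.4g}")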
Practice Task
(1) Write a working example of gradient boosting's sequential error correction from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null values, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with gradient boosting pipelines is skipping edge-case testing: empty inputs, null values, and unexpected data types slip through unnoticed. Always validate boundary conditions to write robust, production-ready ML code.
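As one hedged illustration of such a boundary check (the validate_features helper and its messages are illustrative, not a scikit-learn API):

import numpy as np

def validate_features(X, y):
    # Illustrative guards for the edge cases named above; adapt to your pipeline.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.size == 0:
        raise ValueError("Empty input: X has no samples.")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y have mismatched numbers of samples.")
    if np.isnan(X).any() or np.isnan(y).any():
        # Classic GradientBoostingRegressor rejects NaN; impute or drop first.
        raise ValueError("NaN values found; impute or drop them before fitting.")
    return X, y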