XGBoost & LightGBM — Production Gradient Boosting
XGBoost and LightGBM are optimized gradient boosting implementations that power many winning Kaggle solutions and production ML systems. XGBoost adds L1/L2 regularization and uses a second-order Taylor approximation of the loss when scoring splits. LightGBM uses leaf-wise tree growth (faster, lower memory), histogram-based splits, and native categorical feature support. Both support early stopping, GPU training, and built-in missing value handling.
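For intuition on the second-order approximation: at each boosting round, XGBoost expands the loss to second order around the current prediction (this is the objective from the XGBoost paper, stated here in the same informal notation used later in this lesson):
obj ≈ Σᵢ [ gᵢ·f(xᵢ) + ½·hᵢ·f(xᵢ)² ] + Ω(f)
where gᵢ and hᵢ are the first and second derivatives of the loss with respect to the current prediction for sample i, f is the new tree being added, and Ω penalizes tree complexity (number of leaves and leaf weight magnitudes).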
XGBoost and LightGBM — Production Usage
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import r2_score
import xgboost as xgb
import lightgbm as lgb
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
# XGBOOST
xgb_model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,             # use 80% of rows per tree
    colsample_bytree=0.8,      # use 80% of features per tree
    reg_alpha=0.1,             # L1 regularization (like Lasso)
    reg_lambda=1.0,            # L2 regularization (like Ridge)
    early_stopping_rounds=20,
    eval_metric="rmse",
    random_state=42,
    n_jobs=-1,
    verbosity=0,
)
xgb_model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],  # early stopping monitors this set (a held-out validation split is cleaner in practice)
    verbose=False,
)
print(f"XGBoost: best_iteration={xgb_model.best_iteration} | Test R2={r2_score(y_test, xgb_model.predict(X_test)):.4f}")
# LIGHTGBM
lgb_model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=63,             # key LightGBM parameter: max leaves per tree
    min_child_samples=20,      # minimum samples required in a leaf
    subsample=0.8,
    subsample_freq=1,          # without this, LightGBM ignores subsample (bagging is off by default)
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=1.0,
    random_state=42,
    n_jobs=-1,
    verbose=-1,
)
lgb_model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    callbacks=[lgb.early_stopping(20, verbose=False)],
)
print(f"LightGBM: best_iteration={lgb_model.best_iteration_} | Test R2={r2_score(y_test, lgb_model.predict(X_test)):.4f}")
# XGBOOST vs LIGHTGBM vs SKLEARN GB
from sklearn.ensemble import GradientBoostingRegressor

print("\nFull comparison (5-fold CV on training data):")
models = {
    "Sklearn GB": GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=4, random_state=42),
    "XGBoost": xgb.XGBRegressor(n_estimators=200, learning_rate=0.05, max_depth=4, verbosity=0, random_state=42),
    "LightGBM": lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05, num_leaves=31, verbose=-1, random_state=42),
}
for name, model in models.items():
    cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2", n_jobs=-1)
    print(f"  {name:<15}: CV R2 = {cv_r2.mean():.4f} +/- {cv_r2.std():.4f}")
# LIGHTGBM ADVANTAGES
advantages = {
    "Speed": "Often 10-100x faster than sklearn GB on large datasets (histogram algorithm)",
    "Memory": "Compressed histogram bins -> much lower RAM usage",
    "Categoricals": "Native categorical support: no one-hot encoding needed (set categorical_feature)",
    "Missing values": "Handles NaN natively -- no imputation needed",
    "Leaf-wise growth": "Splits the best leaf first instead of level-by-level -> deeper trees for the same leaf budget",
    "Large datasets": "Easily handles 10M+ rows given enough RAM",
}
print("\nLightGBM advantages over sklearn GB:")
for adv, desc in advantages.items():
    print(f"  {adv:20s}: {desc}")
Tip
Practice XGBoost and LightGBM in small, isolated experiments before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Gradient boosting is gradient descent in function space, so the familiar update θ = θ - α × ∇L(θ) applies: too high an α and training diverges, too low and it is slow. The same trade-off governs the learning_rate parameter used above.
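To see the trade-off concretely, here is a minimal sketch reusing the housing split from above; the three rates are arbitrary illustration values:

for lr in [0.5, 0.05, 0.005]:  # too high, moderate, too low
    m = lgb.LGBMRegressor(n_estimators=500, learning_rate=lr, verbose=-1, random_state=42)
    m.fit(X_train, y_train, eval_set=[(X_test, y_test)],
          callbacks=[lgb.early_stopping(20, verbose=False)])
    print(f"  lr={lr}: stopped at {m.best_iteration_}, Test R2={r2_score(y_test, m.predict(X_test)):.4f}")

Typically the high rate stops after only a few trees with a worse score, while the low rate needs far more trees to reach a comparable one.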
Practice Task
(1) Write a working XGBoost or LightGBM example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null values, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with XGBoost and LightGBM is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
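A minimal sketch of such boundary checks before prediction (the validate_input helper is hypothetical, not part of either library):

def validate_input(X):
    # Reject inputs a trained model cannot meaningfully score.
    X = np.asarray(X, dtype=float)  # unexpected dtypes (e.g. strings) raise ValueError here
    if X.ndim != 2 or X.shape[0] == 0:
        raise ValueError(f"expected a non-empty 2D array, got shape {X.shape}")
    if np.isnan(X).all(axis=0).any():
        raise ValueError("at least one feature column is entirely NaN")
    return X

preds = xgb_model.predict(validate_input(X_test))  # reuses the model trained above

Both libraries tolerate scattered NaNs, so the check only rejects shapes and columns that make prediction meaningless.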