Hyperparameter Tuning for Ensembles
Gradient boosting models have many interacting hyperparameters, so systematic tuning is essential. Grid search is exhaustive but slow; randomized search is faster and covers the space well. Optuna's Bayesian optimization with the Tree-structured Parzen Estimator (TPE) typically reaches good parameters in far fewer trials than random search (often cited as roughly 10x fewer) because it learns from past trials which regions of the parameter space are promising.
Optuna Bayesian Optimization for XGBoost
import numpy as np
import optuna
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import r2_score
optuna.logging.set_verbosity(optuna.logging.WARNING)
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2, random_state=42
)
# OPTUNA: define an objective function that Optuna minimizes
def objective(trial: optuna.Trial) -> float:
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 9),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-8, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-8, 10.0, log=True),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 10),
        "verbosity": 0,
        "n_jobs": -1,
        "random_state": 42,
    }
    model = xgb.XGBRegressor(**params)
    cv_scores = cross_val_score(model, X_train, y_train, cv=3, scoring="r2", n_jobs=-1)
    return cv_scores.mean()  # the study below is created with direction="maximize"
# RUN OPTUNA STUDY
study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=30, show_progress_bar=True)
print(f"\nBest trial: {study.best_trial.number}")
print(f"Best CV R2: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
# TRAIN FINAL MODEL WITH BEST PARAMS
best_model = xgb.XGBRegressor(**study.best_params, verbosity=0, random_state=42)
best_model.fit(X_train, y_train)
test_r2 = r2_score(y_test, best_model.predict(X_test))
print(f"\nTest R2 with Optuna-tuned params: {test_r2:.4f}")
# COMPARE TUNING METHODS
print("\nTuning strategy comparison:")
strategies = {
    "Default params": "CV R2 ~ 0.82 -- 0 tuning cost",
    "GridSearchCV": "CV R2 ~ 0.84 -- exhaustive, very slow for large grids",
    "RandomizedSearchCV": "CV R2 ~ 0.83 -- fast, good coverage",
    "Optuna (30 trials)": f"CV R2 ~ {study.best_value:.2f} -- smart, finds best faster",
    "Optuna (100 trials)": "CV R2 ~ 0.86 -- better with more budget",
}
for strategy, result in strategies.items():
    print(f" {strategy:25s}: {result}")
Tip
Practice Hyperparameter Tuning for Ensembles in small, isolated examples before integrating into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Of these strategies, Optuna typically gives the best result for a given tuning budget.
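For contrast, the randomized-search baseline from the comparison above can be sketched with scikit-learn alone. This is an illustrative stand-in, not the tuned model from the main example: it uses `GradientBoostingRegressor` instead of XGBoost and a small synthetic dataset so the run stays fast, with a deliberately reduced search space.

```python
from scipy.stats import loguniform, randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same kind of search space as the Optuna objective, expressed as distributions
param_distributions = {
    "n_estimators": randint(50, 200),
    "learning_rate": loguniform(0.01, 0.3),
    "max_depth": randint(3, 9),
    "subsample": uniform(0.6, 0.4),  # samples uniformly from [0.6, 1.0]
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions,
    n_iter=8,  # 8 independent random draws; no learning from past trials
    cv=3,
    scoring="r2",
    random_state=42,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(f"Best CV R2 (random search): {search.best_score_:.4f}")
print(f"Best params: {search.best_params_}")
```

The key difference from Optuna is the `n_iter` draws are independent: trial 8 ignores everything trials 1 to 7 learned, which is why TPE usually wins at equal budget.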
Practice Task
Practice Task — (1) Write a working example of Hyperparameter Tuning for Ensembles from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with hyperparameter tuning for ensembles is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
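The warning above can be made concrete with a small guard that fails fast before any tuning budget is spent. The helper name below is hypothetical, not from any library; it leans on scikit-learn's `check_X_y` for the standard checks and adds one domain-specific check of its own.

```python
import numpy as np
from sklearn.utils import check_X_y

def validate_tuning_inputs(X, y):
    """Illustrative guard (hypothetical helper): reject bad inputs up front
    instead of letting them surface deep inside cross-validation."""
    # check_X_y rejects empty arrays, NaN/inf values, and shape mismatches
    X, y = check_X_y(X, y)
    if np.unique(y).size == 1:
        raise ValueError("Target is constant; R2 is undefined, tuning is pointless.")
    return X, y

# The edge cases named above, exercised explicitly:
cases = {
    "empty input": (np.empty((0, 3)), np.empty(0)),
    "null (NaN) value": (np.array([[1.0, np.nan], [2.0, 3.0]]), np.array([1.0, 2.0])),
    "constant target": (np.array([[1.0], [2.0]]), np.array([5.0, 5.0])),
}
for name, (bad_X, bad_y) in cases.items():
    try:
        validate_tuning_inputs(bad_X, bad_y)
        print(f"{name}: accepted")
    except ValueError as exc:
        print(f"{name}: rejected ({exc})")
```

All three cases raise `ValueError` before a single model is fit, which is much cheaper than discovering the problem after thirty Optuna trials.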