RandomizedSearchCV — Smarter Search for Large Spaces
When the parameter space is large (10+ hyperparameters, continuous ranges), GridSearchCV is impractically slow. RandomizedSearchCV samples n_iter random combinations — achieving comparable or better results in a fraction of the time. Using scipy distributions (loguniform, randint) instead of fixed lists allows sampling continuous ranges, which is essential for learning_rate and regularization strength.
RandomizedSearchCV with scipy Distributions
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV, train_test_split, cross_val_score
from sklearn.datasets import load_breast_cancer
from scipy.stats import loguniform, randint, uniform
import time
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
pipe = Pipeline([("scaler", StandardScaler()), ("model", GradientBoostingClassifier(random_state=42))])
# RANDOM SEARCH with DISTRIBUTIONS (samples from continuous ranges)
param_distributions = {
    "model__n_estimators": randint(50, 500),         # integers 50 to 499 (randint excludes the upper bound)
    "model__learning_rate": loguniform(0.005, 0.5),  # 0.005 to 0.5 on a log scale (better for lr)
    "model__max_depth": randint(2, 8),
    "model__subsample": uniform(0.6, 0.4),           # uniform(loc, scale): 0.6 to 1.0
    "model__min_samples_leaf": randint(1, 20),
    "model__max_features": uniform(0.4, 0.6),        # 0.4 to 1.0
}
# HOW MANY FITS WOULD GRID SEARCH TAKE? With 5 values each on 6 params:
# 5^6 = 15,625 combos x 5-fold CV = 78,125 fits!
# RandomizedSearch: 50 trials x 5-fold = 250 fits
t0 = time.time()
rand_search = RandomizedSearchCV(
    pipe, param_distributions,
    n_iter=50,           # number of random combinations to try
    cv=5,
    scoring="roc_auc",
    n_jobs=-1,
    random_state=42,
    refit=True,          # retrain best_estimator_ on the full training set
    verbose=0,
)
rand_search.fit(X_train, y_train)
t_rand = time.time() - t0
print(f"RandomizedSearchCV (50 iter): {t_rand:.1f}s | Best AUC: {rand_search.best_score_:.4f}")
print(f"Best params: {rand_search.best_params_}")
# COMPARE: SMALL GRID SEARCH on same budget
param_grid_small = {
    "model__n_estimators": [100, 200, 300],
    "model__learning_rate": [0.01, 0.05, 0.1, 0.2],
    "model__max_depth": [3, 4, 5],
    # 3*4*3 = 36 combos, close to the 50 random trials
}
t0 = time.time()
grid = GridSearchCV(pipe, param_grid_small, cv=5, scoring="roc_auc", n_jobs=-1)
grid.fit(X_train, y_train)
t_grid = time.time() - t0
print(f"\nGridSearchCV (36 combos): {t_grid:.1f}s | Best AUC: {grid.best_score_:.4f}")
# TEST SET EVALUATION (roc_auc_score on held-out data; .score() would report accuracy, not AUC)
from sklearn.metrics import roc_auc_score
for name, search in [("RandomizedSearch", rand_search), ("GridSearch", grid)]:
    test_proba = search.best_estimator_.predict_proba(X_test)[:, 1]
    print(f"{name} test AUC: {roc_auc_score(y_test, test_proba):.4f}")
# TIPS FOR RANDOMIZED SEARCH
tips = {
    "n_iter": "Start with 50-100. Increase if time allows. Diminishing returns after 200.",
    "loguniform": "Always use for learning_rate, C, alpha, gamma -- they span orders of magnitude",
    "randint": "For integers: n_estimators, max_depth, min_samples_leaf",
    "uniform": "For bounded floats: subsample (0.6-1.0), colsample (0.5-1.0)",
    "random_state": "Fix for reproducibility. Different seeds give slightly different best params.",
    "refit": "Set True to get best_estimator_ retrained on full training data",
}
print("\nRandomizedSearchCV tips:")
for param, tip in tips.items():
    print(f"  {param:16s}: {tip}")
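Why loguniform for learning_rate? A quick standalone sanity check (not part of the search itself, just an illustrative sketch) shows how the two distributions used above spread their samples:

```python
import numpy as np
from scipy.stats import loguniform, uniform

# Draw 1,000 samples from each distribution used in param_distributions
lr = loguniform(0.005, 0.5).rvs(size=1000, random_state=0)
sub = uniform(0.6, 0.4).rvs(size=1000, random_state=0)

# loguniform's median sits near the geometric mean sqrt(0.005 * 0.5) = 0.05,
# so small learning rates get sampled as often as large ones
print(f"learning_rate: min={lr.min():.4f} median={np.median(lr):.4f} max={lr.max():.4f}")
print(f"subsample:     min={sub.min():.3f} median={np.median(sub):.3f} max={sub.max():.3f}")
```

With a plain `uniform(0.005, 0.495)` instead, about 90% of samples would land above 0.05 and the small-learning-rate region would barely be explored.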
Tip
Practice RandomizedSearchCV in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
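One such small experiment: `best_params_` shows only the winner, but the fitted search's `cv_results_` dict records every sampled combination. A minimal standalone sketch (tuning LogisticRegression's `C` instead of the gradient boosting pipeline so it runs in seconds; the attribute and key names are sklearn's own):

```python
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()),
                 ("model", LogisticRegression(max_iter=2000))])
search = RandomizedSearchCV(
    pipe, {"model__C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=3, scoring="roc_auc", random_state=42,
)
search.fit(X, y)

# One row per sampled combination, sorted best-first by CV rank
results = (pd.DataFrame(search.cv_results_)
           .sort_values("rank_test_score")
           [["param_model__C", "mean_test_score", "std_test_score"]])
print(results.head())
```

Scanning the full table reveals whether good scores cluster in one region of the search space (worth refining) or are scattered (the parameter barely matters).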
Practice Task
(1) Write a working RandomizedSearchCV example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with RandomizedSearchCV is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.