Model Selection Framework
Systematic model selection prevents you from accidentally optimizing for a lucky split. The correct protocol: hold out a test set first, use cross-validation on the training data to compare models, select the best, tune it further, then evaluate ONCE on the held-out test set. Never use test-set performance to select models: that leaks the test set into your choice, biases the reported score upward, and sets you up for disappointment in production.
Statistical Model Comparison and Selection
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.svm import SVC

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# STEP 0: HOLD OUT THE TEST SET FIRST -- model selection must never see it.
# (Splitting after cross-validation would leak the test set into the comparison.)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=99)
# STEP 1: SYSTEMATIC COMPARISON WITH A FIXED CV STRATEGY (training data only)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
candidates = {
    "LogReg": Pipeline([("sc", StandardScaler()), ("m", LogisticRegression(C=1.0, max_iter=1000, random_state=42))]),
    "RF-100": Pipeline([("m", RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1))]),
    "RF-200": Pipeline([("m", RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=-1))]),
    "GB": Pipeline([("m", GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, random_state=42))]),
    "SVM-RBF": Pipeline([("sc", StandardScaler()), ("m", SVC(kernel="rbf", C=1.0, probability=True, random_state=42))]),
    "Ada": Pipeline([("m", AdaBoostClassifier(n_estimators=100, random_state=42))]),
}
results = {}
print("Model comparison (10-fold stratified CV, AUC-ROC):")
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    results[name] = scores
    # mean +/- 2*std of fold scores is a rough spread, not a true 95% CI:
    # folds share training data, so they are not independent samples.
    print(f"  {name:10s}: {scores.mean():.4f} +/- {scores.std():.4f} "
          f"(approx. range [{scores.mean()-2*scores.std():.4f}, {scores.mean()+2*scores.std():.4f}])")
# STEP 2: STATISTICAL TEST -- are the differences meaningful?
from scipy.stats import ttest_rel
best_name = max(results, key=lambda k: results[k].mean())
print(f"\nBest model: {best_name} ({results[best_name].mean():.4f})")
print("\nPaired t-test vs best model:")
for name, scores in results.items():
    if name == best_name:
        continue
    # Two-sided test on the per-fold score differences.
    t_stat, p_val = ttest_rel(results[best_name], scores)
    verdict = "significant difference" if p_val < 0.05 else "no significant difference"
    print(f"  {best_name} vs {name:10s}: p={p_val:.4f} -> {verdict}")
# STEP 3: FINAL TEST SET EVALUATION (DO THIS ONLY ONCE)
# Refit the winning model on ALL training data, then score the held-out set
# that was split off in STEP 0 and never touched during selection.
best_model = candidates[best_name]
best_model.fit(X_train, y_train)
y_prob = best_model.predict_proba(X_test)[:, 1]

from sklearn.metrics import roc_auc_score
test_auc = roc_auc_score(y_test, y_prob)
print(f"\nFINAL Test AUC (evaluated ONCE): {test_auc:.4f}")
print("  NEVER go back and change the model based on this result!")
print("  If you do, your test score is no longer an honest estimate.")