Scikit-learn API — The Universal Interface
Scikit-learn's genius is its consistent API. Every estimator (model, transformer, scaler) follows the same pattern: `.fit()` learns from training data, `.predict()` makes predictions, `.transform()` converts data, and `.fit_transform()` fits and transforms in one call. This uniformity means you can swap any model in a pipeline without changing any other code.
Scikit-learn Estimator API
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# UNIVERSAL SCIKIT-LEARN PATTERN:
# 1. scaler.fit(X_train) -- learn mean/std from training data ONLY
# 2. scaler.transform(X_train) -- apply scaling to training data
# 3. scaler.transform(X_test) -- apply SAME scaling to test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train) # fit + transform in one step
X_test_scaled = scaler.transform(X_test) # transform only (DO NOT fit on test!)
# SWAP ANY MODEL -- same 3-line pattern
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, random_state=42),
}
print(f"{'Model':<25} {'Train Acc':>10} {'Test Acc':>10}")
print("-" * 50)
for name, model in models.items():
    model.fit(X_train_scaled, y_train)  # Step 1: learn
    train_acc = accuracy_score(y_train, model.predict(X_train_scaled))
    test_acc = accuracy_score(y_test, model.predict(X_test_scaled))  # Step 2: predict
    print(f"{name:<25} {train_acc:>10.3f} {test_acc:>10.3f}")
# CRITICAL: Always fit scaler ONLY on training data
# Common mistake: scaler.fit_transform(X) on whole dataset
# This causes DATA LEAKAGE -- test data info leaks into training
print("\nDATA LEAKAGE WARNING:")
print(" WRONG: scaler.fit_transform(X_all) <- test data contaminates scaling")
print(" RIGHT: fit on X_train, transform X_train and X_test separately")
print(" BEST: Use sklearn Pipeline -- handles this automatically")
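The Pipeline recommendation above can be sketched as follows. This is a minimal example, not part of the original listing: the pipeline chains the scaler and classifier into a single estimator, so the scaler is always fit on training data only, including inside cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline is itself an estimator: .fit() fits the scaler on X_train,
# transforms X_train, then fits the classifier on the scaled output.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.3f}")

# Inside cross_val_score the scaler is refit on each training fold,
# so no validation-fold statistics ever leak into the scaling.
scores = cross_val_score(pipe, X_train, y_train, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because the pipeline follows the same `.fit()`/`.predict()` interface, it drops into the model-comparison loop above unchanged.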
Tip
Practice the Scikit-learn estimator API in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
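A small, isolated experiment of the kind suggested above might look like this. It is a sketch on a hand-made toy dataset, chosen so the result can be checked by eye:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature, four samples
y = np.array([0, 0, 1, 1])                  # linearly separable labels

clf = LogisticRegression()
clf.fit(X, y)            # .fit() learns the decision boundary (~1.5 here)
preds = clf.predict(X)   # .predict() applies it
print(preds)             # -> [0 0 1 1]
```

Once this pattern feels automatic, swapping in a `DecisionTreeClassifier` or `SVC` requires changing only the constructor line.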
Practice Task
(1) Write a working example of the Scikit-learn estimator API from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with the Scikit-learn estimator API is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
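The boundary conditions above can be probed directly. This sketch relies on the fact that scikit-learn validates input and raises `ValueError` on empty arrays and (by default) on NaNs, so your own tests can assert that bad input fails loudly instead of silently:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(np.array([[0.0], [1.0], [2.0], [3.0]]), np.array([0, 0, 1, 1]))

# Empty input: predict() rejects a 0-sample array rather than returning [].
try:
    clf.predict(np.empty((0, 1)))
except ValueError as e:
    print(f"empty input rejected: {e}")

# Null values: NaN in the feature matrix is also rejected by default.
try:
    clf.predict(np.array([[np.nan]]))
except ValueError as e:
    print(f"NaN input rejected: {e}")
```

Wrapping calls like these in unit tests documents exactly how your model behaves at the boundaries.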