Feature Scaling — Why and When
Feature scaling aligns the magnitudes of different features so that no single feature dominates simply because of its numeric range. Without scaling, age (18-80) and income (15,000-200,000) are badly mismatched: income's raw values are roughly 2,500 times larger, so income dominates distance computations in models like KNN and SVM and slows gradient-descent convergence. Tree models (Random Forest, XGBoost) are insensitive to scaling: they split on thresholds of individual features, and a monotonic rescaling does not change which side of a split any value falls on.
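To make this concrete, here is a minimal, self-contained sketch (the two customer records and the reference sample are invented for illustration): the raw Euclidean distance between the customers is driven almost entirely by the income gap, while the standardized distance lets the 30-year age gap register as well.
import numpy as np
from sklearn.preprocessing import StandardScaler
# Two customers: (age, income). Ages differ by 30 years, incomes by only 5,000.
a = np.array([[25.0, 60000.0]])
b = np.array([[55.0, 65000.0]])
print(np.linalg.norm(a - b))  # ~5000.1 -- the age gap barely registers
# Standardize using a small reference sample (made up for this sketch)
ref = np.array([[25, 60000], [55, 65000], [40, 40000], [35, 90000]], dtype=float)
scaler = StandardScaler().fit(ref)
print(np.linalg.norm(scaler.transform(a) - scaler.transform(b)))  # both features now contribute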
StandardScaler, MinMaxScaler, RobustScaler
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
# SAMPLE DATA with very different scales
np.random.seed(42)
df = pd.DataFrame({
    "age": np.random.normal(40, 10, 200).clip(18, 80),
    "income": np.random.normal(60000, 20000, 200).clip(15000, 200000),
    "credit_score": np.random.normal(680, 80, 200).clip(300, 850),
})
# Inject some outliers in income
df.loc[5:7, "income"] = [500000, 800000, 1200000]
print("Before scaling:")
print(df.describe().round(1))
# 1. StandardScaler: z = (x - mean) / std
# Result: mean=0, std=1
# BEST FOR: linear models, SVM, neural networks, PCA
# PROBLEM: sensitive to outliers (outliers affect mean/std)
std_scaler = StandardScaler()
df_std = pd.DataFrame(std_scaler.fit_transform(df), columns=df.columns)
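# Sanity check (illustrative): StandardScaler just applies (x - mean) / std per column.
# Note that sklearn uses the population std (ddof=0), not pandas' default ddof=1.
manual = (df["income"] - df["income"].mean()) / df["income"].std(ddof=0)
assert np.allclose(manual, df_std["income"])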
# 2. MinMaxScaler: z = (x - min) / (max - min)
# Result: values between 0 and 1
# BEST FOR: image pixel values (naturally non-negative), neural network inputs when a bounded range is needed
# PROBLEM: very sensitive to outliers (min/max shift drastically)
mm_scaler = MinMaxScaler()
df_mm = pd.DataFrame(mm_scaler.fit_transform(df), columns=df.columns)
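# Illustration of the outlier problem: the injected 1,200,000 income becomes the new max,
# squeezing nearly all ordinary incomes into a thin band near 0
share_squeezed = (df_mm["income"] < 0.1).mean()
print(f"Share of MinMax-scaled incomes below 0.1: {share_squeezed:.0%}")  # ~98% here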
# 3. RobustScaler: z = (x - median) / IQR
# Result: centered around 0, IQR-normalized
# BEST FOR: data with outliers (financial data, transaction amounts)
# WHY: median and IQR are not affected by extreme values
rob_scaler = RobustScaler()
df_rob = pd.DataFrame(rob_scaler.fit_transform(df), columns=df.columns)
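# Sanity check (illustrative): RobustScaler's default formula is (x - median) / (Q3 - Q1)
q1, q3 = df["income"].quantile([0.25, 0.75])
manual = (df["income"] - df["income"].median()) / (q3 - q1)
assert np.allclose(manual, df_rob["income"])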
print("\nAfter scaling -- income column comparison:")
print(f"{'Scaler':<20} {'Mean':>8} {'Std':>8} {'Min':>10} {'Max':>10}")
print("-" * 60)
for name, scaled_df in [("Original", df), ("StandardScaler", df_std), ("MinMaxScaler", df_mm), ("RobustScaler", df_rob)]:
    col = scaled_df["income"]
    print(f"{name:<20} {col.mean():>8.3f} {col.std():>8.3f} {col.min():>10.3f} {col.max():>10.3f}")
# WHICH SCALER TO USE?
scaler_guide = {
    "StandardScaler": "Default choice -- linear models, SVM, PCA, logistic regression",
    "RobustScaler": "When outliers are present -- financial data, medical measurements",
    "MinMaxScaler": "When a bounded [0,1] range is needed -- image data, neural-network inputs",
    "No scaling": "Tree models: RandomForest, XGBoost, LightGBM, DecisionTree",
}
print("\nScaler selection guide:")
for scaler, use_case in scaler_guide.items():
    print(f" {scaler:17s}: {use_case}")
Tip
Practice feature scaling in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
As the practitioner's adage goes, feature engineering is 80% of ML success.
Practice Task
(1) Write a working feature-scaling example from scratch without looking at your notes. (2) Modify it to handle an edge case: an empty input, null values, or an error state. (3) Share your solution in the Priygop community for feedback.
Warning
A common mistake with feature scaling is skipping edge-case tests: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
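As a starting point, here is a sketch probing two such boundary conditions. The behaviors shown (a ValueError on zero samples; NaNs ignored during fit and propagated through transform) match scikit-learn's documented scaler behavior since version 0.20, but verify against your installed version.
import numpy as np
from sklearn.preprocessing import StandardScaler
# Edge case 1: empty input -- scalers refuse to fit on zero samples
try:
    StandardScaler().fit(np.empty((0, 2)))
except ValueError as e:
    print("Empty input:", e)
# Edge case 2: NaN values -- ignored when fitting, kept as NaN by transform
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
print(StandardScaler().fit_transform(X))  # the NaN row stays NaN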