Ridge (L2) and Lasso (L1) Regularization
Regularization adds a penalty to the loss function that shrinks model coefficients — preventing overfitting when you have many features or correlated features. Ridge (L2) shrinks all coefficients toward zero but rarely makes them exactly zero. Lasso (L1) performs automatic feature selection — it drives unimportant coefficients to exactly zero. ElasticNet combines both. The strength is controlled by alpha (higher alpha = more regularization).
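Concretely, with design matrix X, target vector y, coefficient vector w, and n training samples, the two penalized objectives are (in scikit-learn's documented parameterization, which scales the Lasso data term by 1/(2n) but not Ridge's):

\min_w \, \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \quad \text{(Ridge)}

\min_w \, \tfrac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \quad \text{(Lasso)}

Because of the differing scale conventions, a given alpha value is not directly comparable across Ridge and Lasso.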
Ridge, Lasso, and ElasticNet Comparison
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, RidgeCV, LassoCV
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
# Add polynomial features to create a regularization-worthy problem
poly = PolynomialFeatures(degree=2, include_bias=False, interaction_only=False)
X_poly = poly.fit_transform(X)
print(f"Features after polynomial expansion: {X.shape[1]} -> {X_poly.shape[1]}")
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_poly)
# COMPARE REGULARIZATION METHODS
models = {
    "No regularization": LinearRegression(),
    "Ridge (alpha=0.1)": Ridge(alpha=0.1),
    "Ridge (alpha=10)": Ridge(alpha=10),
    "Lasso (alpha=0.01)": Lasso(alpha=0.01, max_iter=5000),
    "Lasso (alpha=0.1)": Lasso(alpha=0.1, max_iter=5000),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=5000),
}
print(f"\n{'Model':<25} {'CV R2 (5-fold)':>20} {'Non-zero coefs':>15} {'Max |coef|':>12}")
print("-" * 75)
for name, model in models.items():
    scores = cross_val_score(model, X_scaled, y, cv=5, scoring="r2")
    model.fit(X_scaled, y)
    coefs = model.coef_
    nonzero = (np.abs(coefs) > 1e-6).sum()
    max_coef = np.abs(coefs).max()
    print(f"{name:<25} {scores.mean():>10.4f} +/- {scores.std():.3f} {nonzero:>15d} {max_coef:>12.4f}")
# RIDGE: FIND OPTIMAL ALPHA via cross-validation
ridge_cv = RidgeCV(alphas=np.logspace(-3, 4, 50), cv=5, scoring="r2")
ridge_cv.fit(X_scaled, y)
print(f"\nRidgeCV optimal alpha: {ridge_cv.alpha_:.4f}")
print(f"RidgeCV best R2: {ridge_cv.best_score_:.4f}")
# LASSO: FIND OPTIMAL ALPHA
lasso_cv = LassoCV(cv=5, max_iter=5000, n_alphas=50, random_state=42)
lasso_cv.fit(X_scaled, y)
print(f"\nLassoCV optimal alpha: {lasso_cv.alpha_:.6f}")
zero_coefs = (np.abs(lasso_cv.coef_) < 1e-6).sum()
print(f"Features eliminated by Lasso: {zero_coefs} / {X_scaled.shape[1]}")
# VISUALIZE COEFFICIENT SHRINKAGE (regularization path)
alphas = np.logspace(-3, 3, 50)
ridge_coefs = [Ridge(alpha=a).fit(X_scaled, y).coef_[:8] for a in alphas]
ridge_coefs = np.array(ridge_coefs)
plt.figure(figsize=(10, 5))
for i in range(8):
    plt.semilogx(alphas, ridge_coefs[:, i], linewidth=1.5, label=housing.feature_names[i])
plt.xlabel("alpha (regularization strength)")
plt.ylabel("Coefficient value")
plt.title("Ridge Regularization Path -- coefficients shrink as alpha increases")
plt.legend(fontsize=8, loc="upper right")
plt.axhline(0, color="black", linewidth=0.5)
plt.tight_layout()
plt.savefig("ridge_path.png", dpi=100, bbox_inches="tight")
plt.show()
Tip
Practice Ridge (L2) and Lasso (L1) regularization in small, isolated examples before integrating them into larger projects; breaking concepts into small experiments builds genuine understanding faster than reading alone. The shorthand to remember: L1 gives feature selection, L2 gives shrinkage.
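As one such isolated experiment, here is a minimal sketch on synthetic data (the true coefficients 3 and 2 are made up; exact fitted values depend on the random draw) showing Lasso zeroing the irrelevant coefficients while Ridge only shrinks them:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # features 2-4 are pure noise
print(Ridge(alpha=1.0).fit(X, y).coef_.round(3))  # all five coefficients non-zero, slightly shrunk
print(Lasso(alpha=0.1).fit(X, y).coef_.round(3))  # noise coefficients driven to (or very near) zero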
Practice Task: (1) Write a working example of Ridge (L2) and Lasso (L1) regularization from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Warning
A common mistake with Ridge (L2) and Lasso (L1) regularization is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
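For example, a minimal guard might validate inputs before fitting (fit_checked is a hypothetical helper written for this note, not a scikit-learn API):

import numpy as np
from sklearn.linear_model import Lasso

def fit_checked(X, y, alpha=0.1):
    # Hypothetical helper: reject empty or NaN-containing inputs up front.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.size == 0 or y.size == 0:
        raise ValueError("X and y must be non-empty")
    if np.isnan(X).any() or np.isnan(y).any():
        raise ValueError("inputs contain NaN; impute or drop rows before fitting")
    return Lasso(alpha=alpha, max_iter=5000).fit(X, y)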