Bias-Variance Tradeoff — The Core ML Dilemma
The bias-variance tradeoff explains why models fail in two opposite ways. High bias (underfitting): the model is too simple and misses patterns even in the training data. High variance (overfitting): the model memorizes the training data, noise included, and fails on new data. The goal is the sweet spot: low bias AND low variance. Regularization, cross-validation, and more training data all help you find it.
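Why is it a tradeoff at all? For squared error, a standard decomposition (stated informally here) splits the expected test error at a point into three parts:

expected test error = Bias^2 + Variance + irreducible noise

The noise term is a property of the data and no model can remove it, so model complexity only trades the first two terms against each other: simpler models raise Bias^2, more flexible models raise Variance.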
Seeing the Bias-Variance Tradeoff in Code
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split, cross_val_score, KFold
np.random.seed(42)
# TRUE underlying relationship: y = sin(x) + noise
X_full = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y_true = np.sin(X_full.ravel())
y_noisy = y_true + np.random.normal(0, 0.3, 100) # add measurement noise
X_train, X_test, y_train, y_test = train_test_split(X_full, y_noisy, test_size=0.2, random_state=42)
# Test models of increasing complexity
for degree in [1, 3, 8, 15]:
    model = Pipeline([
        ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
        ("reg", LinearRegression()),
    ])
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    gap = train_r2 - test_r2
    if degree == 1:
        diagnosis = "HIGH BIAS (underfitting) -- too simple"
    elif degree == 3:
        diagnosis = "GOOD FIT -- just right"
    elif degree == 8:
        diagnosis = "Starting to overfit"
    else:
        diagnosis = "HIGH VARIANCE (overfitting) -- memorizes noise"
    print(f"degree={degree:2d} | train R2={train_r2:.3f} | test R2={test_r2:.3f} | gap={gap:.3f} | {diagnosis}")
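# --- Sketch (not part of the original walkthrough): estimate bias and variance
# empirically by refitting one model on many bootstrap resamples of the training
# set and comparing its predictions to the known sin(x) ground truth.
n_rounds = 200
preds = np.zeros((n_rounds, len(X_full)))
for i in range(n_rounds):
    idx = np.random.randint(0, len(X_train), size=len(X_train))  # bootstrap resample
    boot = Pipeline([
        ("poly", PolynomialFeatures(degree=8, include_bias=False)),
        ("reg", LinearRegression()),
    ]).fit(X_train[idx], y_train[idx])
    preds[i] = boot.predict(X_full)
bias_sq = np.mean((preds.mean(axis=0) - y_true) ** 2)  # squared bias vs. true sin(x)
variance = np.mean(preds.var(axis=0))                  # prediction spread across refits
print(f"\ndegree=8 bootstrap estimates: bias^2={bias_sq:.3f}, variance={variance:.3f}")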
# SOLUTION: Regularization keeps complexity in check
print("\nRidge Regularization (L2) effect:")
for alpha in [0.0001, 1.0, 100.0]:
    model = Pipeline([
        ("poly", PolynomialFeatures(degree=15, include_bias=False)),
        ("scale", StandardScaler()),  # scale the wildly ranged polynomial features so one alpha penalizes them comparably
        ("reg", Ridge(alpha=alpha)),
    ])
    # X_full is sorted, so shuffle the folds; unshuffled folds would test pure extrapolation
    cv = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(model, X_full, y_noisy, cv=cv, scoring="r2")
    print(f"  alpha={alpha:7.4f}: CV R2 = {scores.mean():.3f} +/- {scores.std():.3f}")
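# --- Sketch (GridSearchCV is standard scikit-learn, but this step is not in the
# original walkthrough): instead of eyeballing a few alphas, search a grid and
# let cross-validation pick the best one.
from sklearn.model_selection import GridSearchCV
search = GridSearchCV(
    Pipeline([
        ("poly", PolynomialFeatures(degree=15, include_bias=False)),
        ("scale", StandardScaler()),
        ("reg", Ridge()),
    ]),
    param_grid={"reg__alpha": np.logspace(-4, 3, 15)},  # 1e-4 ... 1e3
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="r2",
)
search.fit(X_full, y_noisy)
print(f"best alpha={search.best_params_['reg__alpha']:.4g} | best CV R2={search.best_score_:.3f}")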
# KEY RULES OF THUMB:
rules = [
    "Train accuracy >> test accuracy -> overfitting -> add regularization or more data",
    "Both train and test accuracy low -> underfitting -> use a more complex model",
    "Always use cross-validation to estimate real-world performance",
    "More training data reduces variance but not bias",
    "Regularization reduces variance, may increase bias slightly",
]
print("\nBias-Variance Rules:")
for r in rules:
    print(f"  * {r}")
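One rule above is worth demonstrating directly: more training data reduces variance but not bias. The sketch below (reusing the X_full and y_noisy arrays defined in the script above) uses scikit-learn's learning_curve to show the train/validation gap of a high-variance model shrinking as the training set grows.

from sklearn.model_selection import KFold, learning_curve

high_var = Pipeline([
    ("poly", PolynomialFeatures(degree=15, include_bias=False)),
    ("reg", LinearRegression()),
])
sizes, train_scores, val_scores = learning_curve(
    high_var, X_full, y_noisy,
    train_sizes=np.linspace(0.3, 1.0, 5),  # 30% .. 100% of the available training data
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
    scoring="r2",
)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d} | train R2={tr:.3f} | val R2={va:.3f} | gap={tr - va:.3f}")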
Tip
Practice the bias-variance tradeoff in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Underfitting = too simple. Overfitting = memorized training data. Balance with cross-validation.
Practice Task
(1) Write a working example of the bias-variance tradeoff from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, a null value, or an error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake when applying these ideas is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready ML code.
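As a minimal sketch of that advice, one way to guard a fit call before training. The helper name fit_checked and the specific checks are illustrative, not a scikit-learn API:

import numpy as np

def fit_checked(model, X, y):
    # Illustrative input validation; these checks are a sketch, not a library API.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.size == 0 or y.size == 0:
        raise ValueError("empty input: nothing to fit")
    if X.ndim != 2:
        raise ValueError(f"expected a 2D feature array, got shape {X.shape}")
    if len(X) != len(y):
        raise ValueError(f"length mismatch: {len(X)} samples vs {len(y)} targets")
    if np.isnan(X).any() or np.isnan(y).any():
        raise ValueError("NaN values present: impute or drop before fitting")
    return model.fit(X, y)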