Loss Functions — Measuring Model Error
The loss function quantifies how wrong the model is; gradient descent minimizes this number. Choosing the wrong loss function is one of the most common beginner mistakes — for example, using MSE for classification produces weak, saturating gradients instead of a clean training signal.
Key Loss Functions & When to Use Them
import torch
import torch.nn as nn
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# LOSS FUNCTIONS — match to your task type
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# ── 1. Cross-Entropy Loss — Multi-class Classification ──
# Use this when: predicting one of K classes (MNIST, ImageNet, sentiment)
# Expects: raw logits (NOT softmax output) — PyTorch handles softmax internally
ce_loss = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 1.0, 0.1],   # model predicts class 0 strongly
                       [0.5, 2.5, 0.3]])  # model predicts class 1 strongly
targets = torch.tensor([0, 1]) # true class labels
loss = ce_loss(logits, targets)
print(f"CrossEntropyLoss: {loss.item():.4f}")  # ~0.32 — low because both predictions are correct
# Formula intuition: CE = -log(P(correct class))
# If model assigns P=0.99 to correct class → loss = -log(0.99) ≈ 0.01 (great)
# If model assigns P=0.01 to correct class → loss = -log(0.01) ≈ 4.6 (terrible)
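To make the -log(P) intuition concrete, here is a small sanity check (a sketch, reusing the same logits and targets as above) that recomputes cross-entropy by hand:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])

# By hand: softmax -> pick the correct-class probability -> -log -> mean
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), targets]).mean()

auto = nn.CrossEntropyLoss()(logits, targets)
print(f"manual: {manual.item():.4f}  built-in: {auto.item():.4f}")  # both ~0.32
```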
# ── 2. Binary Cross Entropy — Binary Classification ──
# Use when: spam/not spam, fraud/not fraud, positive/negative
bce_loss = nn.BCEWithLogitsLoss() # more numerically stable than BCE + sigmoid separately
logits_binary = torch.tensor([2.0, -1.0, 0.5]) # raw scores
labels_binary = torch.tensor([1.0, 0.0, 1.0]) # 0 or 1
loss_bce = bce_loss(logits_binary, labels_binary)
print(f"BCEWithLogitsLoss: {loss_bce.item():.4f}")
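As a quick check (a sketch reusing the tensors above), the fused loss gives the same value as applying sigmoid and then plain BCE; the fused version is preferred because it uses the log-sum-exp trick internally:

```python
import torch
import torch.nn as nn

logits_binary = torch.tensor([2.0, -1.0, 0.5])
labels_binary = torch.tensor([1.0, 0.0, 1.0])

# Fused: takes raw logits, numerically stable
fused = nn.BCEWithLogitsLoss()(logits_binary, labels_binary)
# Two-step: squash to probabilities first, then plain BCE
two_step = nn.BCELoss()(torch.sigmoid(logits_binary), labels_binary)
print(f"fused: {fused.item():.4f}  two-step: {two_step.item():.4f}")  # same value
```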
# ── 3. MSE Loss — Regression ──
# Use when: predicting house price, temperature, stock price, image pixels
mse_loss = nn.MSELoss()
predictions = torch.tensor([3.2, 5.8, 2.1])
actuals = torch.tensor([3.0, 6.0, 2.0])
loss_mse = mse_loss(predictions, actuals)
print(f"MSELoss: {loss_mse.item():.4f}")
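The same number can be computed by hand as the mean of squared differences, a useful habit when debugging shapes and reduction modes:

```python
import torch
import torch.nn as nn

predictions = torch.tensor([3.2, 5.8, 2.1])
actuals = torch.tensor([3.0, 6.0, 2.0])

manual = ((predictions - actuals) ** 2).mean()  # (0.04 + 0.04 + 0.01) / 3
auto = nn.MSELoss()(predictions, actuals)
print(f"manual: {manual.item():.4f}  built-in: {auto.item():.4f}")  # both ~0.03
```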
# ── 4. Huber Loss (Smooth L1) — Robust Regression ──
# MSE is sensitive to outliers (outlier error is squared → huge gradient)
# Huber: behaves like MSE near 0, like L1 far from 0 → outliers get a linear,
# not quadratic, penalty, so they no longer dominate the gradient
huber = nn.SmoothL1Loss()
preds_with_outlier = torch.tensor([3.2, 100.0, 2.1]) # 100.0 is an outlier
loss_huber = huber(preds_with_outlier, actuals)
print(f"HuberLoss: {loss_huber.item():.4f}")
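To see the robustness difference directly, a small sketch comparing both losses on the outlier data above:

```python
import torch
import torch.nn as nn

preds_with_outlier = torch.tensor([3.2, 100.0, 2.1])  # 100.0 is an outlier
actuals = torch.tensor([3.0, 6.0, 2.0])

mse = nn.MSELoss()(preds_with_outlier, actuals)         # (94)^2 dominates: ~2945
huber = nn.SmoothL1Loss()(preds_with_outlier, actuals)  # outlier counted ~linearly: ~31
print(f"MSE: {mse.item():.1f}  Huber: {huber.item():.1f}")
```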
# ── 5. Contrastive / Triplet Loss — Embeddings, Similarity ──
# Use when: face recognition, sentence similarity, image retrieval
# Pushes similar pairs together, dissimilar pairs apart in embedding space
# Triplet loss ships as nn.TripletMarginLoss; contrastive variants are common
# in embedding libraries (e.g. sentence-transformers for HuggingFace models)
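A minimal sketch with `nn.TripletMarginLoss`; the batch size, embedding dimension, margin, and random embeddings here are illustrative, not from a real model:

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)
anchor   = torch.randn(8, 128)                  # 8 embeddings, 128-dim
positive = anchor + 0.05 * torch.randn(8, 128)  # small perturbation: "same identity"
negative = torch.randn(8, 128)                  # unrelated embeddings
loss = triplet(anchor, positive, negative)      # max(d(a,p) - d(a,n) + margin, 0)
print(f"TripletMarginLoss: {loss.item():.4f}")
```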
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CHOOSING THE RIGHT LOSS — Decision Table
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
print("\nLoss Selection Guide:")
loss_guide = {
    "Multi-class classification (K>2)": "nn.CrossEntropyLoss() — expects raw logits",
    "Binary classification": "nn.BCEWithLogitsLoss() — expects raw logits",
    "Regression": "nn.MSELoss() or nn.SmoothL1Loss()",
    "Language modeling (next token)": "nn.CrossEntropyLoss() over vocab size",
    "Object detection box offset": "nn.SmoothL1Loss() for box coordinates",
}
for task, loss_fn in loss_guide.items():
    print(f" {task:40s} → {loss_fn}")
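The MSE-for-classification mistake from the introduction can be demonstrated directly: for a confidently wrong prediction, MSE on a sigmoid output barely produces a gradient, while the matched loss does. A small sketch:

```python
import torch
import torch.nn as nn

# Confidently wrong binary prediction: large negative logit, true label is 1
logit = torch.tensor([-8.0], requires_grad=True)
label = torch.tensor([1.0])

mse = nn.MSELoss()(torch.sigmoid(logit), label)
mse.backward()
mse_grad = logit.grad.item()
print(f"MSE gradient: {mse_grad:.6f}")  # ~ -0.0007, sigmoid is saturated

logit.grad = None
bce = nn.BCEWithLogitsLoss()(logit, label)
bce.backward()
bce_grad = logit.grad.item()
print(f"BCE gradient: {bce_grad:.6f}")  # ~ -0.9997, strong learning signal
```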
Tip
Practice loss functions in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Note
Practice Task — (1) Write a working loss-function example from scratch without looking at notes. (2) Modify it to handle an edge case (empty batch, mismatched tensor shapes, or NaN inputs). (3) Share your solution in the Priygop community for feedback.
Warning
A common mistake with loss functions is skipping edge-case testing — empty batches, mismatched tensor shapes, and unexpected dtypes. Always validate boundary conditions to write robust, production-ready AI code.