Data Augmentation — Getting More from Less Data
Data augmentation artificially increases training-set diversity without collecting new data, and it is often one of the cheapest ways to gain several points of accuracy on a small dataset. The principle: if flipping, rotating, or cropping an image doesn't change its label, apply these transformations during training so the model learns to be invariant to them.
Augmentation Strategies
import torch
import torchvision.transforms as T
import torchvision.transforms.v2 as T2
from PIL import Image
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# STANDARD AUGMENTATION (torchvision.transforms)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
train_transform = T.Compose([
    T.Resize((256, 256)),
    T.RandomCrop(224),                 # random crop → position invariance
    T.RandomHorizontalFlip(p=0.5),     # 50% chance of horizontal mirror
    T.RandomVerticalFlip(p=0.1),       # rarely vertical (not for landscape)
    T.RandomRotation(degrees=15),      # ±15 degree rotation
    T.ColorJitter(                     # random color changes
        brightness=0.3,
        contrast=0.3,
        saturation=0.3,
        hue=0.1,
    ),
    T.RandomGrayscale(p=0.05),         # occasionally convert to grayscale
    T.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),  # random blur
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# ADVANCED AUGMENTATION
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# RandAugment — applies randomly sampled augmentation ops at a fixed magnitude,
# replacing AutoAugment's expensive learned policy search with two tunable knobs
# Used in EfficientNet training; contributes to SOTA results on ImageNet
rand_augment = T2.Compose([
    T2.RandomResizedCrop(224, scale=(0.08, 1.0)),
    T2.RandAugment(num_ops=2, magnitude=9),  # 2 random ops per image from a pool of 14
    T2.ToTensor(),
    T2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
# CutMix and MixUp — blend two images and their labels
# Dramatically improves generalization, used to train ImageNet SOTA models
import numpy as np

class CutMix:
    """Cut a random patch from a shuffled copy of the batch and paste it in.

    Returns images plus (labels_a, labels_b, lam) for computing a mixed loss.
    """
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def __call__(self, batch: tuple) -> tuple:
        imgs, labels = batch
        B = imgs.size(0)
        lam = np.random.beta(self.alpha, self.alpha)
        rand_idx = torch.randperm(B)
        # Cut-box dimensions: area proportional to (1 - lam)
        W, H = imgs.shape[-1], imgs.shape[-2]
        cut_w = int(W * (1 - lam) ** 0.5)
        cut_h = int(H * (1 - lam) ** 0.5)
        cx, cy = torch.randint(0, W, (1,)).item(), torch.randint(0, H, (1,)).item()
        x1, x2 = max(0, cx - cut_w // 2), min(W, cx + cut_w // 2)
        y1, y2 = max(0, cy - cut_h // 2), min(H, cy + cut_h // 2)
        # Paste the patch from the shuffled batch (modifies imgs in place)
        imgs[:, :, y1:y2, x1:x2] = imgs[rand_idx, :, y1:y2, x1:x2]
        # Recompute lam from the actual (possibly clipped) box area
        lam_actual = 1 - (x2 - x1) * (y2 - y1) / (W * H)
        return imgs, (labels, labels[rand_idx], lam_actual)
# Use in training (per-label losses weighted by the actual cut area):
# y_a, y_b, lam = mixed_targets
# loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
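The mixed loss can be sketched as a runnable function; using `F.cross_entropy` as the per-label loss is an assumption here (any classification loss works), and the random logits and labels are placeholders for real model outputs:

```python
import torch
import torch.nn.functional as F

def cutmix_criterion(pred, mixed_targets):
    """Mixed loss for CutMix/MixUp: weighted sum of the two per-label losses."""
    y_a, y_b, lam = mixed_targets
    return lam * F.cross_entropy(pred, y_a) + (1 - lam) * F.cross_entropy(pred, y_b)

# Sketch with random logits and labels (10 classes, batch of 4)
pred = torch.randn(4, 10)
y_a = torch.randint(0, 10, (4,))
y_b = torch.randint(0, 10, (4,))
loss = cutmix_criterion(pred, (y_a, y_b, 0.7))
print(loss.item())  # a positive scalar
```

Note that a Python 3 `lambda` cannot unpack a tuple parameter like `(y1, y2, lam)` directly, which is why a named function is the idiomatic way to write this.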
# Augmentation RULES:
# ✅ Always: random crop, horizontal flip, normalize
# ✅ Usually: color jitter, rotation, scale
# ❌ Avoid for digits/text: vertical flip, rotation > 20 degrees
# ❌ Never augment VALIDATION data (test-time augmentation is a deliberate, separate technique)
Tip
Practice data augmentation in small, isolated experiments before integrating it into a larger training pipeline. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
Note
Practice Task — (1) Write a working data augmentation pipeline from scratch without looking at notes. (2) Modify it to handle an edge case (an empty batch, a grayscale image, or an unexpected input size). (3) Share your solution in the Priygop community for feedback.
Common Mistake
Warning
A common mistake with data augmentation is skipping edge-case testing — empty batches, grayscale or single-channel images, and unexpected input sizes. Always validate boundary conditions so the pipeline behaves predictably in production.