Diffusion Models — How They Work
Diffusion models generate images by learning to REVERSE a noise-addition process. During training, we gradually add Gaussian noise to a real image over T steps until nothing but noise remains, and the model learns to predict the noise that was added at each step. During generation, we start from pure noise and iteratively denoise — classically T = 1000 denoising steps for high quality.
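Before the full implementation, a quick numpy sketch (assuming the standard DDPM linear beta schedule from 1e-4 to 0.02) shows what the forward process does to the signal: the cumulative product alpha_bar_t, the fraction of the original image's variance that survives at step t, decays toward zero by t = 1000.

```python
import numpy as np

# Linear DDPM beta schedule: per-step noise variance grows from 1e-4 to 0.02
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1 - betas)  # fraction of original signal variance kept at step t

# sqrt(alpha_bar_t) scales the image; sqrt(1 - alpha_bar_t) scales the noise
for t in [0, 250, 500, 999]:
    print(f"t={t:4d}  signal={np.sqrt(alpha_bar[t]):.3f}  noise={np.sqrt(1 - alpha_bar[t]):.3f}")
```

At t = 0 the sample is almost entirely image; by t = 999 the signal coefficient is effectively zero, which is why x_T can be treated as pure Gaussian noise.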
Diffusion Model — Forward and Reverse Process
import torch
import torch.nn as nn
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# THE CORE IDEA
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Forward process q(x_t | x_{t-1}): ADD noise step by step
# x_0 (real image) -> x_1 (slight noise) -> ... -> x_T (pure noise)
# Reverse process p_theta(x_{t-1} | x_t): REMOVE noise
# x_T (pure noise) -> ... -> x_1 -> x_0 (generated image)
# KEY INSIGHT: The model only needs to predict the NOISE that was added
# Then: x_{t-1} = x_t - predicted_noise (simplified)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# DDPM NOISE SCHEDULE
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
class DDPMScheduler:
    def __init__(self, T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
        self.T = T
        # Linear noise schedule: more noise added as t increases
        self.betas = torch.linspace(beta_start, beta_end, T)
        self.alphas = 1 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)  # alpha_bar_t
        self.sqrt_alphas_cumprod = torch.sqrt(self.alphas_cumprod)
        self.sqrt_one_minus_alphas_cumprod = torch.sqrt(1 - self.alphas_cumprod)

    def add_noise(self, x0: torch.Tensor, t: torch.Tensor) -> tuple:
        '''
        Forward process: x_0 -> x_t
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon
        Thanks to this closed form, we can sample x_t in ONE step
        instead of iterating through t noise additions.
        '''
        noise = torch.randn_like(x0)
        sqrt_ab = self.sqrt_alphas_cumprod[t].view(-1, 1, 1, 1)
        sqrt_1_ab = self.sqrt_one_minus_alphas_cumprod[t].view(-1, 1, 1, 1)
        return sqrt_ab * x0 + sqrt_1_ab * noise, noise  # noisy image + actual noise

    def step(self, model_output: torch.Tensor, t: int, x_t: torch.Tensor) -> torch.Tensor:
        '''
        Reverse process: x_t -> x_{t-1}
        Given the model's predicted noise, compute the denoised x_{t-1}.
        '''
        beta_t = self.betas[t]
        alpha_t = self.alphas[t]
        alpha_bar_t = self.alphas_cumprod[t]
        alpha_bar_prev = self.alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        # Predicted x_0 from the model's noise prediction
        x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * model_output) / torch.sqrt(alpha_bar_t)
        x0_pred = x0_pred.clamp(-1, 1)
        # Posterior mean of q(x_{t-1} | x_t, x_0) -- note the x0_pred coefficient
        # uses sqrt(alpha_bar_{t-1}), NOT sqrt(alpha_bar_t)
        mean = ((torch.sqrt(alpha_bar_prev) * beta_t / (1 - alpha_bar_t)) * x0_pred
                + (torch.sqrt(alpha_t) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)) * x_t)
        # Add noise for non-final steps (sigma_t = sqrt(beta_t), the common DDPM choice)
        if t > 0:
            noise = torch.randn_like(x_t)
            mean = mean + torch.sqrt(beta_t) * noise
        return mean
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# TRAINING LOOP (simplified)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
def training_step(model, scheduler, x0: torch.Tensor, optimizer):
    '''One training step for a diffusion model.'''
    B = x0.shape[0]
    t = torch.randint(0, scheduler.T, (B,))      # random timestep for each image
    x_t, noise = scheduler.add_noise(x0, t)      # add noise at timestep t
    predicted_noise = model(x_t, t)              # predict the noise that was added
    loss = torch.nn.functional.mse_loss(predicted_noise, noise)  # MSE between predicted and actual noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
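To make the training target concrete, here is a small numpy sketch of one step of the same objective (a zero-predicting stub stands in for the real network, which is an assumption for illustration): build x_t from x_0 in a single closed-form step, then score the model's noise prediction with MSE.

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, T))

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(4, 8))   # a "batch" of 4 toy 1-D images
t = rng.integers(0, T, size=4)         # one random timestep per sample
eps = rng.standard_normal(x0.shape)    # the actual noise the model must predict

sqrt_ab = np.sqrt(alpha_bar[t])[:, None]
sqrt_1_ab = np.sqrt(1 - alpha_bar[t])[:, None]
x_t = sqrt_ab * x0 + sqrt_1_ab * eps   # one-step forward process

predicted = np.zeros_like(eps)         # stub "model": always predicts zero noise
loss = np.mean((predicted - eps) ** 2) # the DDPM noise-prediction loss
```

Note that given x_t, t, and the true noise, x_0 is exactly recoverable as `(x_t - sqrt_1_ab * eps) / sqrt_ab` — this identity is what the sampler's x0_pred exploits, with the model's predicted noise in place of the true one.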
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# INFERENCE -- generate new images from pure noise
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
def generate(model, scheduler, shape: tuple, device: str) -> torch.Tensor:
    '''Generate an image by iteratively denoising from Gaussian noise.'''
    x = torch.randn(shape, device=device)   # start from pure noise
    for t in reversed(range(scheduler.T)):  # t = T-1 -> 0
        t_batch = torch.full((shape[0],), t, device=device)
        with torch.no_grad():
            predicted_noise = model(x, t_batch)
        x = scheduler.step(predicted_noise, t, x)
    return x.clamp(-1, 1)  # image normalized to [-1, 1]
# DDIM: faster sampling -- e.g. 50 steps instead of 1000, with comparable quality
# Skip timesteps: walk an evenly strided subset, e.g. t = 999, 979, ..., 20, 0
# Widely used in production diffusion models
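As a sketch of the DDIM idea (assumptions: a 1-D numpy toy, a `predict_noise` callable standing in for the trained network, and the deterministic eta = 0 variant), the sampler visits only a strided subset of timesteps and jumps directly to the previous kept step via x_prev = sqrt(alpha_bar_prev) * x0_pred + sqrt(1 - alpha_bar_prev) * eps_pred:

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, T))

def ddim_sample(predict_noise, x, num_steps=50):
    """Deterministic DDIM sampling (eta = 0) over a strided timestep subset."""
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)  # 999, 979, ..., 0
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predict_noise(x, t)
        x0_pred = (x - np.sqrt(1 - ab_t) * eps) / np.sqrt(ab_t)      # same x_0 estimate as DDPM
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1 - ab_prev) * eps  # jump straight to t_prev
    return x

# Toy check with an oracle noise predictor that knows the target x_0:
target = np.array([0.5, -0.25])
oracle = lambda x, t: (x - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1 - alpha_bar[t])
out = ddim_sample(oracle, np.random.randn(2))
```

With the oracle, the final step (where alpha_bar_prev = 1) returns x0_pred exactly, so `out` equals `target` — a useful sanity check that the update formula is wired correctly, even though a real sampler uses a learned network instead.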