Diffusion Models — How They Work
Diffusion models generate images by learning to REVERSE a noise-addition process. During training, we gradually add Gaussian noise to a real image over T steps until nothing but noise remains, and the model learns to predict the noise that was added at each step. During generation, we start from pure noise and iteratively denoise — classically T = 1000 denoising steps for high quality.
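Before the full implementation, a quick numpy sketch (assuming the standard DDPM linear beta schedule from 1e-4 to 0.02) shows what the forward process does to the signal: the cumulative product alpha_bar_t, the fraction of the original image's variance that survives at step t, decays toward zero by t = 1000.

```python
import numpy as np

# Linear DDPM beta schedule: per-step noise variance grows from 1e-4 to 0.02
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1 - betas)  # fraction of original signal variance kept at step t

# sqrt(alpha_bar_t) scales the image; sqrt(1 - alpha_bar_t) scales the noise
for t in [0, 250, 500, 999]:
    print(f"t={t:4d}  signal={np.sqrt(alpha_bar[t]):.3f}  noise={np.sqrt(1 - alpha_bar[t]):.3f}")
```

At t = 0 the sample is almost entirely image; by t = 999 the signal coefficient is effectively zero, which is why x_T can be treated as pure Gaussian noise.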
Diffusion Model — Forward and Reverse Process
import torch
import torch.nn as nn
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# THE CORE IDEA
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Forward process q(x_t | x_{t-1}): ADD noise step by step
# x_0 (real image) -> x_1 (slight noise) -> ... -> x_T (pure noise)
# Reverse process p_theta(x_{t-1} | x_t): REMOVE noise
# x_T (pure noise) -> ... -> x_1 -> x_0 (generated image)
# KEY INSIGHT: The model only needs to predict the NOISE that was added
# Then: x_{t-1} = x_t - predicted_noise (simplified)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# DDPM NOISE SCHEDULE
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
class DDPMScheduler:
    def __init__(self, T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
        self.T = T
        # Linear noise schedule: more noise added as t increases
        self.betas = torch.linspace(beta_start, beta_end, T)
        self.alphas = 1 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)  # alpha_bar_t
        self.sqrt_alphas_cumprod = torch.sqrt(self.alphas_cumprod)
        self.sqrt_one_minus_alphas_cumprod = torch.sqrt(1 - self.alphas_cumprod)

    def add_noise(self, x0: torch.Tensor, t: torch.Tensor) -> tuple:
        '''
        Forward process: x_0 -> x_t
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * epsilon
        Thanks to this closed form, we can sample x_t in ONE step
        instead of iterating through t noise additions.
        '''
        noise = torch.randn_like(x0)
        sqrt_ab = self.sqrt_alphas_cumprod[t].view(-1, 1, 1, 1)
        sqrt_1_ab = self.sqrt_one_minus_alphas_cumprod[t].view(-1, 1, 1, 1)
        return sqrt_ab * x0 + sqrt_1_ab * noise, noise  # noisy image + actual noise

    def step(self, model_output: torch.Tensor, t: int, x_t: torch.Tensor) -> torch.Tensor:
        '''
        Reverse process: x_t -> x_{t-1}
        Given the model's predicted noise, compute the denoised x_{t-1}.
        '''
        beta_t = self.betas[t]
        alpha_t = self.alphas[t]
        alpha_bar_t = self.alphas_cumprod[t]
        alpha_bar_prev = self.alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        # Predicted x_0 from the model's noise prediction
        x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * model_output) / torch.sqrt(alpha_bar_t)
        x0_pred = x0_pred.clamp(-1, 1)
        # Posterior mean of q(x_{t-1} | x_t, x_0) -- note the x0_pred coefficient
        # uses sqrt(alpha_bar_{t-1}), NOT sqrt(alpha_bar_t)
        mean = ((torch.sqrt(alpha_bar_prev) * beta_t / (1 - alpha_bar_t)) * x0_pred
                + (torch.sqrt(alpha_t) * (1 - alpha_bar_prev) / (1 - alpha_bar_t)) * x_t)
        # Add noise for non-final steps (sigma_t = sqrt(beta_t), the common DDPM choice)
        if t > 0:
            noise = torch.randn_like(x_t)
            mean = mean + torch.sqrt(beta_t) * noise
        return mean
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# TRAINING LOOP (simplified)
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
def training_step(model, scheduler, x0: torch.Tensor, optimizer):
    '''One training step for a diffusion model.'''
    B = x0.shape[0]
    t = torch.randint(0, scheduler.T, (B,))      # random timestep for each image
    x_t, noise = scheduler.add_noise(x0, t)      # add noise at timestep t
    predicted_noise = model(x_t, t)              # predict the noise that was added
    loss = torch.nn.functional.mse_loss(predicted_noise, noise)  # MSE between predicted and actual noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
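To make the training target concrete, here is a small numpy sketch of one step of the same objective (a zero-predicting stub stands in for the real network, which is an assumption for illustration): build x_t from x_0 in a single closed-form step, then score the model's noise prediction with MSE.

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, T))

rng = np.random.default_rng(0)
x0 = rng.uniform(-1, 1, size=(4, 8))   # a "batch" of 4 toy 1-D images
t = rng.integers(0, T, size=4)         # one random timestep per sample
eps = rng.standard_normal(x0.shape)    # the actual noise the model must predict

sqrt_ab = np.sqrt(alpha_bar[t])[:, None]
sqrt_1_ab = np.sqrt(1 - alpha_bar[t])[:, None]
x_t = sqrt_ab * x0 + sqrt_1_ab * eps   # one-step forward process

predicted = np.zeros_like(eps)         # stub "model": always predicts zero noise
loss = np.mean((predicted - eps) ** 2) # the DDPM noise-prediction loss
```

Note that given x_t, t, and the true noise, x_0 is exactly recoverable as `(x_t - sqrt_1_ab * eps) / sqrt_ab` — this identity is what the sampler's x0_pred exploits, with the model's predicted noise in place of the true one.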
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# INFERENCE -- generate new images from pure noise
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
def generate(model, scheduler, shape: tuple, device: str) -> torch.Tensor:
    '''Generate an image by iteratively denoising from Gaussian noise.'''
    x = torch.randn(shape, device=device)   # start from pure noise
    for t in reversed(range(scheduler.T)):  # t = T-1 -> 0
        t_batch = torch.full((shape[0],), t, device=device)
        with torch.no_grad():
            predicted_noise = model(x, t_batch)
        x = scheduler.step(predicted_noise, t, x)
    return x.clamp(-1, 1)  # image normalized to [-1, 1]
# DDIM: faster sampling -- e.g. 50 steps instead of 1000, with comparable quality
# Skip timesteps: walk an evenly strided subset, e.g. t = 999, 979, ..., 20, 0
# Widely used in production diffusion models
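As a sketch of the DDIM idea (assumptions: a 1-D numpy toy, a `predict_noise` callable standing in for the trained network, and the deterministic eta = 0 variant), the sampler visits only a strided subset of timesteps and jumps directly to the previous kept step via x_prev = sqrt(alpha_bar_prev) * x0_pred + sqrt(1 - alpha_bar_prev) * eps_pred:

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1 - np.linspace(1e-4, 0.02, T))

def ddim_sample(predict_noise, x, num_steps=50):
    """Deterministic DDIM sampling (eta = 0) over a strided timestep subset."""
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)  # 999, 979, ..., 0
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = predict_noise(x, t)
        x0_pred = (x - np.sqrt(1 - ab_t) * eps) / np.sqrt(ab_t)      # same x_0 estimate as DDPM
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1 - ab_prev) * eps  # jump straight to t_prev
    return x

# Toy check with an oracle noise predictor that knows the target x_0:
target = np.array([0.5, -0.25])
oracle = lambda x, t: (x - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1 - alpha_bar[t])
out = ddim_sample(oracle, np.random.randn(2))
```

With the oracle, the final step (where alpha_bar_prev = 1) returns x0_pred exactly, so `out` equals `target` — a useful sanity check that the update formula is wired correctly, even though a real sampler uses a learned network instead.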