Fine-Tuning Strategies — Full, LoRA, and QLoRA
Fine-tuning specializes a general LLM for your task. Full fine-tuning updates all parameters and is very expensive: once gradients and optimizer states are counted, an 8B model needs 80GB+ of VRAM. LoRA freezes the original weights and trains tiny low-rank adapter matrices injected into existing layers, typically well under 1% of the parameters. QLoRA combines 4-bit quantization of the frozen base model with LoRA adapters, which can bring 8B-model fine-tuning down to a single consumer GPU with as little as ~8GB of VRAM (short sequences, small batches).
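To make these trade-offs concrete, the back-of-envelope sketch below estimates training VRAM for each strategy. The per-parameter byte counts are rough assumptions (bf16 weights and gradients, fp32 Adam states, ~0.55 bytes per 4-bit weight), activations and any fp32 master weights are ignored, so treat the numbers as illustrative rather than measured.
# Rough, illustrative VRAM estimates -- activations, CUDA context, and an
# optional fp32 master copy of the weights would add to these numbers.
PARAMS = 8e9        # Llama 3 8B
LORA_PARAMS = 14e6  # ~13.6M adapter params at r=16 (see below)

def gib(n_bytes: float) -> float:
    return n_bytes / 1024**3

# Full fine-tuning: bf16 weights + bf16 grads + fp32 Adam m and v states
full = PARAMS * (2 + 2 + 4 + 4)

# LoRA: frozen bf16 base, gradients and optimizer state only for the adapters
lora = PARAMS * 2 + LORA_PARAMS * (2 + 2 + 4 + 4)

# QLoRA: frozen 4-bit base (~0.55 bytes/param incl. quantization constants)
qlora = PARAMS * 0.55 + LORA_PARAMS * (2 + 2 + 4 + 4)

print(f"Full FT: ~{gib(full):.0f} GiB")   # ~89 GiB
print(f"LoRA   : ~{gib(lora):.0f} GiB")   # ~15 GiB
print(f"QLoRA  : ~{gib(qlora):.0f} GiB")  # ~4 GiB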
LoRA and QLoRA Setup
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
# WHY LoRA? Parameter efficiency
# Full fine-tune Llama 3 8B: 8B params x 4 bytes = 32GB (weights only)
# LoRA: freeze all original weights, add two tiny low-rank matrices A and B
# Original: W is d x d (e.g., 4096x4096 = 16.7M params)
# LoRA adds: B x A where A is r x d, B is d x r, r << d
# If r=16: (4096x16) + (16x4096) = 131K params -- 128x fewer than original!
# During inference: W_eff = W + (alpha/r) x BA (merge -- zero overhead)
# QLoRA: 4-bit quantize the base model + add LoRA adapters on top
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the frozen base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token; reuse EOS
tokenizer.padding_side = "right"           # right-pad sequences for causal LM training
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",                        # place layers across available GPUs
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
model = prepare_model_for_kbit_training(model)  # cast norms to fp32, enable input grads
lora_config = LoraConfig(
    r=16,            # rank of the low-rank update
    lora_alpha=32,   # scaling factor: effective scale = alpha / r = 2
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Trainable params: 13,631,488 (0.17%)
# Non-trainable: 8,016,175,104 (99.83%)
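The code above only prepares the model; the training step still has to run. Below is a minimal sketch of that step using the standard Hugging Face Trainer, continuing from the model and tokenizer defined above. The file train.jsonl, the assumption that each record has a pre-formatted "text" field, and all hyperparameters are illustrative placeholders, not values prescribed by this guide.
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Hypothetical dataset: JSONL with a "text" field already rendered with the
# Llama 3 chat template (system/user/assistant turns concatenated).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="llama3-qlora",
    per_device_train_batch_size=1,   # keep small; rely on gradient accumulation
    gradient_accumulation_steps=8,
    learning_rate=2e-4,              # LoRA tolerates higher LRs than full fine-tuning
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
    optim="paged_adamw_8bit",        # paged 8-bit optimizer helps on small GPUs
    gradient_checkpointing=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Only the adapter weights (tens of MB) are saved, not the 8B base model.
model.save_pretrained("llama3-qlora-adapter")
After training, reload the base model and attach the adapter with PeftModel.from_pretrained for inference, or merge the adapter into a full-precision (bf16) copy of the base with merge_and_unload() so serving carries zero adapter overhead, exactly the W_eff = W + (alpha/r) x BA merge described in the comments above.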
Tip
Practice these fine-tuning strategies (full, LoRA, and QLoRA) in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working LoRA or QLoRA fine-tuning example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with full, LoRA, and QLoRA fine-tuning is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.