Fine-Tuning Strategies — Full, LoRA, and QLoRA
Fine-tuning specializes a general LLM for your task. Full fine-tuning updates all parameters and is very expensive: once gradients and optimizer states are counted, an 8B model needs 80GB+ of VRAM. LoRA freezes the original weights and trains tiny low-rank adapter matrices injected into existing layers, typically well under 1% of the parameters. QLoRA combines 4-bit quantization of the frozen base model with LoRA adapters, which can bring 8B-model fine-tuning down to a single consumer GPU with as little as ~8GB of VRAM (short sequences, small batches).
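To make these trade-offs concrete, the back-of-envelope sketch below estimates training VRAM for each strategy. The per-parameter byte counts are rough assumptions (bf16 weights and gradients, fp32 Adam states, ~0.55 bytes per 4-bit weight), activations and any fp32 master weights are ignored, so treat the numbers as illustrative rather than measured.
# Rough, illustrative VRAM estimates -- activations, CUDA context, and an
# optional fp32 master copy of the weights would add to these numbers.
PARAMS = 8e9        # Llama 3 8B
LORA_PARAMS = 14e6  # ~13.6M adapter params at r=16 (see below)

def gib(n_bytes: float) -> float:
    return n_bytes / 1024**3

# Full fine-tuning: bf16 weights + bf16 grads + fp32 Adam m and v states
full = PARAMS * (2 + 2 + 4 + 4)

# LoRA: frozen bf16 base, gradients and optimizer state only for the adapters
lora = PARAMS * 2 + LORA_PARAMS * (2 + 2 + 4 + 4)

# QLoRA: frozen 4-bit base (~0.55 bytes/param incl. quantization constants)
qlora = PARAMS * 0.55 + LORA_PARAMS * (2 + 2 + 4 + 4)

print(f"Full FT: ~{gib(full):.0f} GiB")   # ~89 GiB
print(f"LoRA   : ~{gib(lora):.0f} GiB")   # ~15 GiB
print(f"QLoRA  : ~{gib(qlora):.0f} GiB")  # ~4 GiB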
LoRA and QLoRA Setup
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
# WHY LoRA? Parameter efficiency
# Full fine-tune Llama 3 8B: 8B params x 4 bytes = 32GB (weights only)
# LoRA: freeze all original weights, add two tiny low-rank matrices A and B
# Original: W is d x d (e.g., 4096x4096 = 16.7M params)
# LoRA adds: B x A where A is r x d, B is d x r, r << d
# If r=16: (4096x16) + (16x4096) = 131K params -- 128x fewer than original!
# During inference: W_eff = W + (alpha/r) x BA (merge -- zero overhead)
# QLoRA: 4-bit quantize the base model + add LoRA adapters on top
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the frozen base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token; reuse EOS
tokenizer.padding_side = "right"           # right-pad sequences for causal LM training
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",                        # place layers across available GPUs
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)
model = prepare_model_for_kbit_training(model)  # cast norms to fp32, enable input grads
lora_config = LoraConfig(
    r=16,            # rank of the low-rank update
    lora_alpha=32,   # scaling factor: effective scale = alpha / r = 2
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Trainable params: 13,631,488 (0.17%)
# Non-trainable: 8,016,175,104 (99.83%)
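The code above only prepares the model; the training step still has to run. Below is a minimal sketch of that step using the standard Hugging Face Trainer, continuing from the model and tokenizer defined above. The file train.jsonl, the assumption that each record has a pre-formatted "text" field, and all hyperparameters are illustrative placeholders, not values prescribed by this guide.
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Hypothetical dataset: JSONL with a "text" field already rendered with the
# Llama 3 chat template (system/user/assistant turns concatenated).
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="llama3-qlora",
    per_device_train_batch_size=1,   # keep small; rely on gradient accumulation
    gradient_accumulation_steps=8,
    learning_rate=2e-4,              # LoRA tolerates higher LRs than full fine-tuning
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
    optim="paged_adamw_8bit",        # paged 8-bit optimizer helps on small GPUs
    gradient_checkpointing=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Only the adapter weights (tens of MB) are saved, not the 8B base model.
model.save_pretrained("llama3-qlora-adapter")
After training, reload the base model and attach the adapter with PeftModel.from_pretrained for inference, or merge the adapter into a full-precision (bf16) copy of the base with merge_and_unload() so serving carries zero adapter overhead, exactly the W_eff = W + (alpha/r) x BA merge described in the comments above.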
Tip
Practice these fine-tuning strategies (full, LoRA, and QLoRA) in small, isolated examples before integrating them into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working LoRA or QLoRA fine-tuning example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with full, LoRA, and QLoRA fine-tuning is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.