SFTTrainer — HuggingFace Fine-Tuning Pipeline
TRL's SFTTrainer handles the entire supervised fine-tuning pipeline: tokenization with packing, gradient accumulation, mixed-precision (AMP) training, logging to W&B, checkpointing, and, when paired with a completion-only collator, loss computation on assistant tokens only. It is the de facto standard open-source tool for SFT and a common building block in RLHF workflows.
SFTTrainer Complete Setup
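Before configuring the trainer, the dataset needs a single text column matching the dataset_text_field setting below. Here is a minimal sketch of preparing such a dataset with the chat template of the tokenizer loaded earlier in the tutorial; the file name and the "question"/"answer" column names are hypothetical, so adapt them to your data:

from datasets import load_dataset

# Hypothetical raw dataset with "question"/"answer" columns
raw_ds = load_dataset("json", data_files="medical_qa.json")["train"]

def to_text(example):
    msgs = [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]
    # Render one training string per example using the model's chat template
    return {"text": tokenizer.apply_chat_template(msgs, tokenize=False)}

split = raw_ds.map(to_text).train_test_split(test_size=0.05, seed=42)
formatted_ds, eval_ds = split["train"], split["test"]  # eval_ds feeds evaluation_strategy below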
from trl import SFTTrainer, SFTConfig
from peft import LoraConfig
import torch
training_args = SFTConfig(
    output_dir="./llama3-medical-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size per device: 4 * 4 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    fp16=not torch.cuda.is_bf16_supported(),  # fall back to fp16 on pre-Ampere GPUs
    bf16=torch.cuda.is_bf16_supported(),
    optim="paged_adamw_8bit",        # 8-bit paged optimizer from bitsandbytes
    max_seq_length=1024,
    packing=True,                    # concatenate short examples to fill each sequence
    dataset_text_field="text",
    logging_steps=10,
    save_strategy="steps",
    save_steps=50,
    evaluation_strategy="steps",     # renamed eval_strategy in newer transformers releases
    eval_steps=50,
    load_best_model_at_end=True,     # requires an eval_dataset (passed to the trainer below)
    report_to="wandb",
    run_name="llama3-medical-v1",
)
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_ds,
    eval_dataset=eval_ds,  # held-out split; required by evaluation_strategy and load_best_model_at_end
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
)
trainer.train()
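Note that with packing=True the trainer computes loss over every token in the packed sequence. To actually restrict the loss to assistant tokens, as mentioned above, disable packing and pass TRL's DataCollatorForCompletionOnlyLM instead (available in the TRL 0.x releases this tutorial's API matches); the response template below is Llama 3's assistant header, so verify it against your tokenizer's chat template:

from trl import DataCollatorForCompletionOnlyLM

# Only tokens after this template contribute to the loss; requires packing=False
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|start_header_id|>assistant<|end_header_id|>",
    tokenizer=tokenizer,
)
# Then pass data_collator=collator to SFTTrainer and set packing=False in SFTConfig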
# SAVING AND MERGING LORA WEIGHTS
# Option 1: Save LoRA adapter only (~50MB)
trainer.save_model("./llama3-medical-finetuned")
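If you only saved the adapter, you can reload it later without a manual merge: peft's AutoPeftModelForCausalLM reads the base model id recorded in the adapter config and attaches the LoRA weights on top. A sketch, assuming the adapter directory above:

from peft import AutoPeftModelForCausalLM

adapter_model = AutoPeftModelForCausalLM.from_pretrained(
    "./llama3-medical-finetuned",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)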
# Option 2: Merge LoRA weights into base model
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged_model = PeftModel.from_pretrained(base_model, "./llama3-medical-finetuned")
merged_model = merged_model.merge_and_unload()  # folds the LoRA deltas into the base weights and drops the adapter
merged_model.save_pretrained("./llama3-medical-merged")
tokenizer.save_pretrained("./llama3-medical-merged")
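To share the merged model, both the model and the tokenizer support push_to_hub. The repo id below is a placeholder, and pushing requires a Hugging Face token with write access:

# Hypothetical repo id; replace with your own namespace
merged_model.push_to_hub("your-username/llama3-medical-merged")
tokenizer.push_to_hub("your-username/llama3-medical-merged")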
# Test the fine-tuned model
def generate(prompt: str, mdl, tok, max_new_tokens: int = 300) -> str:
    msgs = [{"role": "user", "content": prompt}]
    fmt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    inputs = tok(fmt, return_tensors="pt").to(mdl.device)
    with torch.no_grad():
        out = mdl.generate(**inputs, max_new_tokens=max_new_tokens, temperature=0.3, do_sample=True)
    return tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
answer = generate("What are the treatment options for type 2 diabetes?", merged_model, tokenizer)
print(answer)
Tip
Practice the SFTTrainer fine-tuning pipeline in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
Practice Task
(1) Write a working example of the SFTTrainer fine-tuning pipeline from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with the SFTTrainer fine-tuning pipeline is skipping edge-case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.
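Applied to this tutorial's generate helper, a minimal guard against such edge cases might look like this (safe_generate is a hypothetical wrapper, not part of TRL):

def safe_generate(prompt, mdl, tok, **kwargs):
    # Reject empty or non-string prompts before touching the model
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    return generate(prompt, mdl, tok, **kwargs)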