RAG — Retrieval-Augmented Generation
RAG solves the knowledge cutoff problem of LLMs: instead of relying only on memorized training data, the model retrieves relevant documents at query time and uses them to answer questions. This enables Q&A over your own documents, knowledge bases, or live data, with no fine-tuning required.
RAG Pipeline from Scratch
from openai import OpenAI
import chromadb
from chromadb.utils import embedding_functions
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# RAG PIPELINE: Index → Retrieve → Augment → Generate
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
client = OpenAI()
# ── STEP 1: DOCUMENT INGESTION & EMBEDDING ─────────────
# In production: load PDF, Word, web pages, databases
sample_docs = [
{"id": "1", "text": "Our refund policy: Items can be returned within 30 days with receipt. Digital products are non-refundable. Electronics require original packaging.", "source": "policy.pdf", "page": 1},
{"id": "2", "text": "Shipping times: Standard delivery 5-7 business days. Express delivery 1-2 business days. Free standard shipping on orders over $50.", "source": "policy.pdf", "page": 2},
{"id": "3", "text": "Payment methods: We accept Visa, Mastercard, PayPal, and Apple Pay. All transactions are secured with 256-bit SSL encryption.", "source": "policy.pdf", "page": 3},
{"id": "4", "text": "Warranty: All products have a 1-year manufacturer warranty. Extended 3-year warranty available for $19.99.", "source": "policy.pdf", "page": 4},
]
def get_embedding(text: str) -> list[float]:
    """Create embedding using OpenAI text-embedding-3-small."""
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small",  # 1536 dims, fast, cheap ($0.02/1M tokens)
    )
    return response.data[0].embedding
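# Note: client.embeddings.create also accepts a list of strings, so larger
# corpora can be embedded in batches instead of one API call per document.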
# ── STEP 2: VECTOR STORE ───────────────────────────────
# ChromaDB: open-source, runs in-process, no server needed
chroma = chromadb.Client()
collection = chroma.create_collection(
    "company_policies",
    metadata={"hnsw:space": "cosine"},  # use cosine similarity for text embeddings
)
# Index all documents
for doc in sample_docs:
    embedding = get_embedding(doc["text"])
    collection.add(
        ids=[doc["id"]],
        embeddings=[embedding],
        documents=[doc["text"]],
        metadatas=[{"source": doc["source"], "page": doc["page"]}],
    )
print(f"Indexed {collection.count()} documents")
# ── STEP 3: RETRIEVAL ──────────────────────────────────
def retrieve(query: str, n_results: int = 3) -> list[dict]:
    """Embed query → find semantically similar documents."""
    query_embedding = get_embedding(query)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
        include=["documents", "metadatas", "distances"],
    )
    return [
        {"text": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]
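# Note: with the cosine space configured above, ChromaDB reports cosine
# *distance* (1 - cosine similarity), so lower values mean closer matches.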
# ── STEP 4: AUGMENTED GENERATION ──────────────────────
def rag_query(question: str) -> str:
    """Retrieve context → augment prompt → generate grounded answer."""
    # Retrieve relevant docs
    retrieved = retrieve(question, n_results=3)
    # Build context string with source citations
    context = "\n\n".join([
        f"[Source: {r['metadata']['source']}, p.{r['metadata']['page']}]\n{r['text']}"
        for r in retrieved
    ])
    # Augmented prompt
    system_prompt = """You are a helpful customer support assistant.
Answer questions ONLY based on the provided context.
If the answer is not in the context, say "I don't have that information."
Always cite the source page when answering."""
    user_prompt = f"""Context:
{context}
Question: {question}
Answer based on the context above:"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.1,
        max_tokens=300,
    )
    return response.choices[0].message.content
# Test RAG
questions = [
    "How long do I have to return an item?",
    "What payment methods do you accept?",
    "Do you offer drone delivery?",  # not covered by the docs; should trigger the fallback
]
for q in questions:
    answer = rag_query(q)
    print(f"Q: {q}")
    print(f"A: {answer}\n")
Tip
Practice RAG (Retrieval-Augmented Generation) in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
RAG = search + generate: the vector DB stores the knowledge, the LLM reasons over the retrieved context, and grounding answers in sources reduces hallucinations.
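One easy way to strengthen that grounding is to filter out weak matches before they ever reach the prompt. A minimal sketch building on the retrieve() function above (the 0.6 threshold is an illustrative assumption, not a tuned value):

def retrieve_confident(query: str, max_distance: float = 0.6) -> list[dict]:
    """Drop weak matches so the LLM never sees barely-related context."""
    hits = retrieve(query, n_results=3)
    # Cosine distance ranges from 0 (identical) to 2 (opposite); tune the
    # cutoff against your own data.
    return [h for h in hits if h["distance"] <= max_distance]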
Practice Task
(1) Write a working RAG example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with RAG is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.
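As a starting point for that validation, a minimal guard around the rag_query() function above (the fallback messages are assumptions to adapt):

def rag_query_safe(question: str) -> str:
    """Wrap rag_query with basic input validation and error handling."""
    if not question or not question.strip():
        return "Please enter a question."
    try:
        return rag_query(question.strip())
    except Exception as exc:  # e.g. network or API errors from OpenAI/ChromaDB
        return f"Sorry, something went wrong: {exc}"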