RAG — Retrieval-Augmented Generation
RAG solves the knowledge cutoff problem of LLMs: instead of relying only on memorized training data, the model retrieves relevant documents at query time and uses them to answer questions. This enables Q&A over your own documents, knowledge bases, or live data, with no fine-tuning required.
RAG Pipeline from Scratch
from openai import OpenAI
import chromadb
from chromadb.utils import embedding_functions
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# RAG PIPELINE: Index → Retrieve → Augment → Generate
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
client = OpenAI()
# ── STEP 1: DOCUMENT INGESTION & EMBEDDING ─────────────
# In production: load PDF, Word, web pages, databases
sample_docs = [
{"id": "1", "text": "Our refund policy: Items can be returned within 30 days with receipt. Digital products are non-refundable. Electronics require original packaging.", "source": "policy.pdf", "page": 1},
{"id": "2", "text": "Shipping times: Standard delivery 5-7 business days. Express delivery 1-2 business days. Free standard shipping on orders over $50.", "source": "policy.pdf", "page": 2},
{"id": "3", "text": "Payment methods: We accept Visa, Mastercard, PayPal, and Apple Pay. All transactions are secured with 256-bit SSL encryption.", "source": "policy.pdf", "page": 3},
{"id": "4", "text": "Warranty: All products have a 1-year manufacturer warranty. Extended 3-year warranty available for $19.99.", "source": "policy.pdf", "page": 4},
]
def get_embedding(text: str) -> list[float]:
    """Create embedding using OpenAI text-embedding-3-small."""
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small",  # 1536 dims, fast, cheap ($0.02/1M tokens)
    )
    return response.data[0].embedding
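# Note: client.embeddings.create also accepts a list of strings, so larger
# corpora can be embedded in batches instead of one API call per document.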
# ── STEP 2: VECTOR STORE ───────────────────────────────
# ChromaDB: open-source, runs in-process, no server needed
chroma = chromadb.Client()
collection = chroma.create_collection(
    "company_policies",
    metadata={"hnsw:space": "cosine"},  # use cosine similarity for text embeddings
)
# Index all documents
for doc in sample_docs:
    embedding = get_embedding(doc["text"])
    collection.add(
        ids=[doc["id"]],
        embeddings=[embedding],
        documents=[doc["text"]],
        metadatas=[{"source": doc["source"], "page": doc["page"]}],
    )
print(f"Indexed {collection.count()} documents")
# ── STEP 3: RETRIEVAL ──────────────────────────────────
def retrieve(query: str, n_results: int = 3) -> list[dict]:
    """Embed query → find semantically similar documents."""
    query_embedding = get_embedding(query)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
        include=["documents", "metadatas", "distances"],
    )
    return [
        {"text": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]
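# Note: with the cosine space configured above, ChromaDB reports cosine
# *distance* (1 - cosine similarity), so lower values mean closer matches.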
# ── STEP 4: AUGMENTED GENERATION ──────────────────────
def rag_query(question: str) -> str:
    """Retrieve context → augment prompt → generate grounded answer."""
    # Retrieve relevant docs
    retrieved = retrieve(question, n_results=3)
    # Build context string with source citations
    context = "\n\n".join([
        f"[Source: {r['metadata']['source']}, p.{r['metadata']['page']}]\n{r['text']}"
        for r in retrieved
    ])
    # Augmented prompt
    system_prompt = """You are a helpful customer support assistant.
Answer questions ONLY based on the provided context.
If the answer is not in the context, say "I don't have that information."
Always cite the source page when answering."""
    user_prompt = f"""Context:
{context}
Question: {question}
Answer based on the context above:"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.1,
        max_tokens=300,
    )
    return response.choices[0].message.content
# Test RAG
questions = [
    "How long do I have to return an item?",
    "What payment methods do you accept?",
    "Do you offer drone delivery?",  # not covered by the docs; should trigger the fallback
]
for q in questions:
    answer = rag_query(q)
    print(f"Q: {q}")
    print(f"A: {answer}\n")
Tip
Practice RAG (Retrieval-Augmented Generation) in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds genuine understanding faster than reading alone.
RAG = search + generate: the vector DB stores the knowledge, the LLM reasons over the retrieved context, and grounding answers in sources reduces hallucinations.
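One easy way to strengthen that grounding is to filter out weak matches before they ever reach the prompt. A minimal sketch building on the retrieve() function above (the 0.6 threshold is an illustrative assumption, not a tuned value):

def retrieve_confident(query: str, max_distance: float = 0.6) -> list[dict]:
    """Drop weak matches so the LLM never sees barely-related context."""
    hits = retrieve(query, n_results=3)
    # Cosine distance ranges from 0 (identical) to 2 (opposite); tune the
    # cutoff against your own data.
    return [h for h in hits if h["distance"] <= max_distance]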
Practice Task
(1) Write a working RAG example from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
A common mistake with RAG is skipping edge case testing: empty inputs, null values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.
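As a starting point for that validation, a minimal guard around the rag_query() function above (the fallback messages are assumptions to adapt):

def rag_query_safe(question: str) -> str:
    """Wrap rag_query with basic input validation and error handling."""
    if not question or not question.strip():
        return "Please enter a question."
    try:
        return rag_query(question.strip())
    except Exception as exc:  # e.g. network or API errors from OpenAI/ChromaDB
        return f"Sorry, something went wrong: {exc}"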