ModelTerms

Learning path · 30 min · intermediate

The RAG stack

From "let the model answer from documents" to a production retrieval system.

RAG sounds simple — retrieve the right chunks, feed them to the model. In production it is a dense stack of decisions: how to chunk, how to embed, how to retrieve, how to rerank, how to evaluate. This path walks the whole thing.

  1. Retrieval-Augmented Generation (RAG)

    Why this step: The pattern itself. Start here so the rest of the pieces fit a known shape.

    RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

    Agents & Tools · intermediate
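
To make the pattern concrete, here is a minimal sketch of the retrieve-then-prompt flow. The word-overlap retriever, the `retrieve` and `build_prompt` helpers, the prompt wording, and the sample corpus are all illustrative assumptions; a real system would swap in the embedding and search components from later steps and then send the prompt to an LLM.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever (assumption): rank passages by shared query words."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Number each passage so the model can cite sources as [1], [2], ..."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical three-passage corpus.
corpus = [
    "The refund window is 30 days from delivery.",
    "Support is available on weekdays from 9 to 5.",
    "Refund requests need the original order number.",
]
query = "How long is the refund window?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

Nothing is retrained here: the model only sees whatever `retrieve` places in the prompt at query time.
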
  2. Embedding (vector embedding)

    Why this step: The vector representation that makes semantic retrieval possible.

    An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

    Architecture · intermediate
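
The idea that similar things end up close together is usually measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors as stand-ins for real model output (an assumption; real embeddings have hundreds or thousands of dimensions and come from an embedding model).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written toy vectors, not produced by any real model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}

close = cosine_similarity(embeddings["cat"], embeddings["kitten"])
far = cosine_similarity(embeddings["cat"], embeddings["invoice"])
print(f"cat vs kitten:  {close:.3f}")
print(f"cat vs invoice: {far:.3f}")
```
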
  3. Chunking (document chunking)

    Why this step: The most under-appreciated decision in RAG quality.

    Chunking is the process of splitting source documents into smaller passages before embedding them for retrieval. Chunk size and boundaries control how relevant retrievals will be.

    Agents & Tools · intermediate
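
The simplest way to see how size and boundaries interact is a fixed-size splitter with overlap. This is a sketch of one baseline strategy, not the only way to chunk; the `chunk_fixed` name and the character-based sizes are assumptions chosen for illustration.

```python
def chunk_fixed(text, size=40, overlap=10):
    """Slide a window of `size` characters, advancing by size - overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


text = "abcdefghij" * 10  # 100 characters of filler
chunks = chunk_fixed(text, size=40, overlap=10)
print([len(c) for c in chunks])  # [40, 40, 40]
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence cut by a boundary still appears whole in at least one chunk when it is shorter than the overlap.
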
  4. Recursive Chunking (recursive character splitter)

    Why this step: The default chunking strategy in practice.

    Recursive chunking splits text by trying progressively smaller separators — paragraphs, then sentences, then words — until each chunk fits the target size, preserving natural boundaries where possible.

    Agents & Tools · intermediate
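
The separator cascade can be sketched in a few lines. This is a simplified illustration, not a reproduction of any particular library's splitter; the `recursive_split` name, the separator list, and the character-based `max_len` are assumptions.

```python
def recursive_split(text, max_len=40, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first; recurse on pieces still too long."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_len:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # Piece is too long on its own: retry with finer separators.
            chunks.extend(recursive_split(piece, max_len, finer))
    if current:
        chunks.append(current)
    return chunks


first_paragraph = "Alpha beta gamma. Delta epsilon zeta."
sample = first_paragraph + "\n\n" + (
    "Eta theta iota kappa lambda mu nu xi omicron pi rho sigma tau upsilon."
)
pieces = recursive_split(sample, max_len=40)
for piece in pieces:
    print(repr(piece))
```

In this sample the short first paragraph survives intact, while the long second paragraph falls through to word-level splitting.
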
  5. Semantic Chunking

    Why this step: The smarter alternative when documents have variable topic density.

    Semantic chunking embeds each sentence and inserts a chunk boundary wherever consecutive embeddings diverge sharply — producing chunks that respect topic boundaries rather than character counts.

    Agents & Tools · advanced
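
The boundary rule can be illustrated with a toy embedding. In this sketch `embed` is a bag-of-words counter standing in for a real sentence-embedding model, and the 0.2 similarity threshold is an arbitrary assumption; production systems tune the threshold, often as a percentile of the observed distances.

```python
import math
from collections import Counter


def embed(sentence):
    """Stand-in embedding (assumption): a bag-of-words count vector."""
    return Counter(sentence.lower().replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk where neighbouring sentences fall below threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])  # sharp divergence: treat as topic boundary
        else:
            chunks[-1].append(cur)
    return [" ".join(chunk) for chunk in chunks]


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Invoices are due in thirty days.",
    "Late invoices are charged a fee.",
]
result = semantic_chunks(sentences)
print(result)
```
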
  6. Vector Database (vector store)

    Why this step: Where the embeddings live and get queried.

    A vector database stores high-dimensional embeddings and answers "find the K nearest vectors to this query" quickly, typically with an approximate nearest-neighbor (ANN) index rather than an exhaustive scan. It is the retrieval engine behind most RAG systems.

    Agents & Tools · intermediate
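
The "K nearest vectors" query can be shown with an exhaustive scan. The `TinyVectorStore` class below is a hypothetical teaching aid, not a real product's API; it compares the query against every stored vector, which is exactly the cost that production indexes are designed to avoid.

```python
import heapq
import math


class TinyVectorStore:
    """Hypothetical in-memory store using brute-force nearest-neighbour search."""

    def __init__(self):
        self._items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        self._items.append((item_id, vector))

    def search(self, query, k=2):
        """Return the ids of the k vectors closest to `query` (Euclidean)."""
        nearest = heapq.nsmallest(
            k, self._items, key=lambda item: math.dist(query, item[1])
        )
        return [item_id for item_id, _ in nearest]


store = TinyVectorStore()
store.add("a", [0.0, 0.0])
store.add("b", [1.0, 1.0])
store.add("c", [5.0, 5.0])
print(store.search([0.9, 0.9], k=2))  # ['b', 'a']
```
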
  7. BM25 (Okapi BM25)

    Why this step: The keyword side of search — still indispensable.

    BM25 is the classical keyword-based ranking algorithm: a refined TF-IDF that scores documents by query-term frequency, document length, and corpus-wide rarity. The keyword side of hybrid search.

    Agents & Tools · intermediate
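
The three ingredients (term frequency, document length, rarity) map directly onto the scoring formula. The sketch below assumes whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and the IDF variant with `+ 1` inside the logarithm that keeps scores non-negative; other implementations differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with an Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        # Longer-than-average documents are penalised, shorter ones boosted.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "the quick brown fox",
    "the lazy dog sleeps all day",
    "quick quick fox jumps",
]
scores = bm25_scores("quick fox", docs)
print(scores)
```

The third document outranks the first because it repeats a query term at the same length, while the second shares no query terms and scores zero.
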
  8. Hybrid Search (hybrid retrieval)

    Why this step: The fusion of vector + BM25. The production default.

    Hybrid search combines vector (semantic) and keyword (BM25) retrieval and fuses their results — usually via Reciprocal Rank Fusion — to get the best of both: semantic recall and exact-match precision.

    Agents & Tools · intermediate
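
Reciprocal Rank Fusion needs only the rank positions from each retriever, not their raw scores, which is why it works across incomparable scoring scales. In this sketch the document ids and both result lists are hypothetical, and `k=60` is a commonly used smoothing constant rather than a required value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_hits = ["d3", "d1", "d7"]  # hypothetical semantic results, best first
bm25_hits = ["d1", "d9", "d3"]    # hypothetical keyword results, best first
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['d1', 'd3', 'd9', 'd7']
```

Documents that both retrievers return (`d1`, `d3`) rise above documents that only one of them found.
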
  9. Reranker (re-ranking)

    Why this step: The second-pass scoring everyone forgets. Highest-leverage RAG fix.

    A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. Typically a cross-encoder; because it scores only the K candidates rather than the whole corpus, it can improve retrieval precision substantially for a modest latency cost.

    Agents & Tools · intermediate
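
The second pass is structurally simple: score every (query, candidate) pair and re-sort. The sketch below keeps the scorer pluggable; `toy_score` is a placeholder word-overlap function (an assumption), standing in for the cross-encoder model a real reranker would call.

```python
def rerank(query, candidates, score_pair, top_n=3):
    """Re-score each (query, candidate) pair and keep the top_n best."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def toy_score(query, doc):
    """Placeholder scorer: fraction of query words found in the document."""
    q_words = set(query.lower().split())
    return len(q_words & set(doc.lower().split())) / len(q_words)


# Hypothetical first-pass results, in the order the retriever returned them.
candidates = [
    "shipping times vary by region",
    "reset your password from the login page",
    "password rules require twelve characters",
]
top = rerank("reset password", candidates, toy_score, top_n=2)
print(top)
```
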
  10. Cross-Encoder

    Why this step: What rerankers actually are under the hood.

    A cross-encoder takes a (query, document) pair as joint input and outputs a single relevance score. Slower than the bi-encoders used for dense retrieval but much more accurate — the standard reranker architecture.

    Agents & Tools · advanced
  11. Contextual Retrieval

    Why this step: Anthropic's technique to make chunks know which document they came from.

    Contextual retrieval, introduced by Anthropic, prepends a short model-generated context to each chunk before embedding, so each chunk carries the document and section it came from. Anthropic reports a roughly 49% reduction in top-20 retrieval failures when contextual embeddings are combined with contextual BM25.

    Agents & Tools · advanced
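
The prepending step itself is only string assembly; the real work is generating the context. In this sketch `generate_context` is a hypothetical stub that returns a fixed template, where an actual pipeline would call an LLM with the full document and the chunk. The function names and sample text are assumptions.

```python
def generate_context(document_title, chunk):
    """Hypothetical stub for the LLM call that writes a situating blurb."""
    return f"This passage is from '{document_title}'."


def contextualize(document_title, chunks):
    """Prepend the generated context to each chunk before it is embedded."""
    return [f"{generate_context(document_title, c)} {c}" for c in chunks]


# Hypothetical document and chunk.
chunks = ["The limit is 20 days per year."]
contextualized = contextualize("Employee Handbook: Vacation Policy", chunks)
print(contextualized[0])
```

On its own, "The limit is 20 days per year." gives a retriever nothing to match against a query about vacation; the prepended context supplies that missing link.
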
  12. Faithfulness (groundedness)

    Why this step: The canonical eval metric. RAG quality is mostly about faithfulness.

    Faithfulness measures whether an LLM's answer is supported by the retrieved context — every claim either appears in the source material or follows directly from it. The most important RAG quality metric.

    Evaluation · intermediate
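
As a score, faithfulness is commonly computed as the fraction of claims in the answer that the context supports. The sketch below assumes the answer has already been split into claims and uses `naive_support`, a toy word-matching check, in place of the LLM judge or entailment model that evaluation frameworks typically use. Returning 0.0 for an answer with no claims is also an assumption.

```python
def faithfulness(claims, context, is_supported):
    """Fraction of answer claims that the retrieved context supports."""
    if not claims:
        return 0.0  # assumption: an answer with no claims scores zero
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


def naive_support(claim, context):
    """Toy check (assumption): every word of the claim occurs in the context."""
    context_words = set(context.lower().replace(".", "").split())
    claim_words = claim.lower().replace(".", "").split()
    return all(word in context_words for word in claim_words)


# Hypothetical retrieved context and extracted claims.
context = "The warranty lasts two years. It covers battery defects."
claims = ["The warranty lasts two years", "It covers water damage"]
score = faithfulness(claims, context, naive_support)
print(score)  # 0.5
```
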
  13. Answer Relevance (Q&A relevance)

    Why this step: The other half of RAG eval — does the answer actually answer the question?

    Answer relevance measures whether the response actually answers the question asked — independent of whether it is true. The complement to faithfulness in RAG eval.

    Evaluation · intermediate
