Agents & Tools · advanced

Contextual Retrieval

Contextual retrieval, introduced by Anthropic, prepends a model-generated context summary to each chunk before embedding — so chunks know which document and section they came from, improving retrieval precision by ~50%.

Published May 31, 2026

Explanation

Standard chunking strips a chunk of its context: a paragraph from page 47 of a contract reads like a paragraph, with no signal about the contract or the section. Contextual retrieval first asks an LLM to generate a short context sentence ("This chunk is from the indemnification clause of the MSA between Acme and Beta Corp"), prepends it to the chunk, then embeds and BM25-indexes the augmented version.

Anthropic showed this cuts retrieval failure rate by ~35% with embeddings alone and ~49% with hybrid + reranker. The cost is one cheap LLM call per chunk at indexing time — typically Haiku or similar.

Combined with prompt caching (the same long document is the cache prefix for every chunk's contextualization call), the cost is roughly 1% of the original embedding budget.

Examples

A legal RAG with thousands of contracts: contextual retrieval generates "Section X of Contract Y" prefixes; retrieval precision on cross-contract questions jumps materially.
Anthropic's reference implementation: Haiku contextualizes chunks; Voyage embeddings + BM25 + Cohere Rerank on top.

When to use contextual retrieval

When your corpus is large, varied, and chunks lose context when stripped from their parent document.

Frequently asked

What is Contextual Retrieval?

What is an example of contextual retrieval?

A legal RAG with thousands of contracts: contextual retrieval generates "Section X of Contract Y" prefixes; retrieval precision on cross-contract questions jumps materially.

How is Contextual Retrieval related to Retrieval-Augmented Generation?

Contextual Retrieval and Retrieval-Augmented Generation are both agents & tools concepts. RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

When should I use contextual retrieval?

When your corpus is large, varied, and chunks lose context when stripped from their parent document.

Is Contextual Retrieval considered advanced?

Contextual Retrieval is generally considered advanced-level material in the AI and LLM space.

Retrieval-Augmented GenerationAgents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

ChunkingAgents & Tools

Chunking is the process of splitting source documents into smaller passages before embedding them for retrieval. Chunk size and boundaries control how relevant retrievals will be.

Hybrid SearchAgents & Tools

Hybrid search combines vector (semantic) and keyword (BM25) retrieval and fuses their results — usually via Reciprocal Rank Fusion — to get the best of both: semantic recall and exact-match precision.

RerankerAgents & Tools

A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. Typically a cross-encoder; dramatically improves retrieval precision at low cost.

Prompt CachingInference

Prompt caching stores the KV-cache state of a long prefix (system prompt, large document, tool definitions) so subsequent calls that reuse it skip the prefill compute — cutting TTFT and cost by 50-90%.

EmbeddingArchitecture

An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

Side-by-side comparisons

Sources

Anthropic — Contextual Retrieval