Agents & Tools · advanced
Contextual Retrieval
Contextual retrieval, introduced by Anthropic, prepends a model-generated context summary to each chunk before embedding — so chunks know which document and section they came from, improving retrieval precision by ~50%.
Explanation
Standard chunking strips a chunk of its context: a paragraph from page 47 of a contract reads like a paragraph, with no signal about the contract or the section. Contextual retrieval first asks an LLM to generate a short context sentence ("This chunk is from the indemnification clause of the MSA between Acme and Beta Corp"), prepends it to the chunk, then embeds and BM25-indexes the augmented version.
Anthropic showed this cuts retrieval failure rate by ~35% with embeddings alone and ~49% with hybrid + reranker. The cost is one cheap LLM call per chunk at indexing time — typically Haiku or similar.
Combined with prompt caching (the same long document is the cache prefix for every chunk's contextualization call), the cost is roughly 1% of the original embedding budget.
Examples
- A legal RAG with thousands of contracts: contextual retrieval generates "Section X of Contract Y" prefixes; retrieval precision on cross-contract questions jumps materially.
- Anthropic's reference implementation: Haiku contextualizes chunks; Voyage embeddings + BM25 + Cohere Rerank on top.
When to use contextual retrieval
When your corpus is large, varied, and chunks lose context when stripped from their parent document.
Frequently asked
What is Contextual Retrieval?
Contextual retrieval, introduced by Anthropic, prepends a model-generated context summary to each chunk before embedding — so chunks know which document and section they came from, improving retrieval precision by ~50%.
What is an example of contextual retrieval?
A legal RAG with thousands of contracts: contextual retrieval generates "Section X of Contract Y" prefixes; retrieval precision on cross-contract questions jumps materially.
How is Contextual Retrieval related to Retrieval-Augmented Generation?
Contextual Retrieval and Retrieval-Augmented Generation are both agents & tools concepts. RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.
When should I use contextual retrieval?
When your corpus is large, varied, and chunks lose context when stripped from their parent document.
Is Contextual Retrieval considered advanced?
Contextual Retrieval is generally considered advanced-level material in the AI and LLM space.