ModelTerms

Agents & Tools · intermediate

Chunking (document chunking, text splitting)

Chunking is the process of splitting source documents into smaller passages before embedding them for retrieval. Chunk size and boundaries control how relevant retrievals will be.

Explanation

LLMs and embedding models have context limits; retrieval performance also degrades on very long passages. So before indexing, documents get split into chunks — often 200-1000 tokens with 10-20% overlap.
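A minimal sketch of fixed-size chunking with overlap, assuming the document has already been tokenized. The function name `chunk_tokens` and its defaults (500 tokens with a 50-token overlap, i.e. 10%) are illustrative choices within the ranges mentioned above, and whitespace-separated words stand in for real tokenizer output.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` tokens, where each window
    repeats the last `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Illustrative input: 1000 placeholder "tokens".
tokens = [f"w{i}" for i in range(1000)]
chunks = chunk_tokens(tokens, size=200, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk, as long as it is shorter than the overlap.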

Strategy matters. Fixed-size character chunks are easy but split sentences mid-thought. Recursive chunking (LangChain-style) tries to respect natural boundaries (paragraphs, then sentences). Semantic chunking uses embedding distance to find topic shifts. Markdown-aware splitters preserve heading hierarchy.
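The recursive strategy can be sketched in plain Python. This is a simplified illustration of the idea, not LangChain's implementation: the function name `recursive_split`, the separator list, and the use of word counts as a stand-in for token counts are all assumptions.

```python
def recursive_split(text, max_words=50, separators=("\n\n", ". ", " ")):
    """Split on the coarsest separator first, and fall back to finer
    separators only for pieces that are still too long."""
    text = text.strip()
    if not text:
        return []
    if len(text.split()) <= max_words:
        return [text]
    if not separators:
        # No separator left to try: hard cut by word count.
        words = text.split()
        return [" ".join(words[i:i + max_words])
                for i in range(0, len(words), max_words)]
    sep, finer = separators[0], separators[1:]
    # Greedily pack neighbouring parts back together while they still fit.
    packed = []
    for part in text.split(sep):
        if not part.strip():
            continue
        candidate = packed[-1] + sep + part if packed else part
        if packed and len(candidate.split()) <= max_words:
            packed[-1] = candidate
        else:
            packed.append(part)
    # Anything still too long is split again with the finer separators.
    # Note: the separator itself is dropped wherever a boundary is placed.
    chunks = []
    for part in packed:
        chunks.extend(recursive_split(part, max_words, finer))
    return chunks


sample = "a b c d. e f g h. i j k l\n\nm n o p"
print(recursive_split(sample, max_words=8))
```

Paragraph breaks are tried first, so a sentence is only cut when its paragraph does not fit, and a hard word cut happens only when nothing else works.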

Bad chunking is one of the most common causes of poor RAG results. Symptoms: the retrieved chunk is missing the relevant sentence, the answer sits in a different chunk from the one that matches the query's keywords, or the model is forced to stitch fragments together.

Examples

  • A 50-page PDF split into 200-token chunks with 50-token overlap → ~150 chunks indexed (assuming roughly 450 tokens per page, since each chunk advances 150 tokens).
  • A Markdown wiki chunked by H2 sections, then further split if a section exceeds 1500 tokens.
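The second example, splitting by H2 and then cutting oversized sections, might look like the sketch below. The function name `split_markdown_by_h2` is hypothetical, and word counts are used as a rough stand-in for token counts; a real pipeline would count tokens with the embedding model's tokenizer.

```python
import re


def split_markdown_by_h2(markdown, max_tokens=1500):
    """Return (heading, text) pairs: one per H2 section, with oversized
    sections cut into consecutive windows of at most `max_tokens` words."""
    # Split just before every line that starts with "## " (H2 only, not H3).
    sections = re.split(r"(?m)^(?=## )", markdown)
    chunks = []
    for section in sections:
        if not section.strip():
            continue
        first_line = section.splitlines()[0]
        # Text before the first H2 gets an empty heading.
        heading = first_line[3:].strip() if first_line.startswith("## ") else ""
        words = section.split()
        if len(words) <= max_tokens:
            chunks.append((heading, section.strip()))
            continue
        # Oversized section: fall back to fixed windows (line breaks are lost).
        for i in range(0, len(words), max_tokens):
            chunks.append((heading, " ".join(words[i:i + max_tokens])))
    return chunks


doc = "# Title\nintro\n## A\nalpha beta\n## B\n" + " ".join(["x"] * 10)
for heading, text in split_markdown_by_h2(doc, max_tokens=6):
    print(repr(heading), repr(text))
```

Keeping the heading alongside each piece lets later windows of a long section still be attributed to that section.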

When to use chunking

Always — chunking is upstream of every other RAG decision. Spending 2 hours on chunking strategy commonly beats 2 weeks of prompt tuning.

Frequently asked

How is Chunking related to Retrieval-Augmented Generation?

Chunking is the indexing-time step that produces the passages a Retrieval-Augmented Generation (RAG) system searches. RAG retrieves relevant passages from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

Related terms

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

Recursive Chunking · Agents & Tools

Recursive chunking splits text by trying progressively smaller separators — paragraphs, then sentences, then words — until each chunk fits the target size, preserving natural boundaries where possible.

Semantic Chunking · Agents & Tools

Semantic chunking embeds each sentence and inserts a chunk boundary wherever consecutive embeddings diverge sharply — producing chunks that respect topic boundaries rather than character counts.
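A toy sketch of that boundary rule follows. The bag-of-words `toy_embed` function is only a stand-in for a real embedding model, and the names `semantic_chunks` and `threshold` (with its 0.8 default) are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def semantic_chunks(sentences, threshold=0.8, embed=toy_embed):
    """Start a new chunk wherever two consecutive sentence vectors are
    further apart than `threshold`."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev, cur) > threshold:
            chunks.append([sentence])    # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend current chunk
    return [" ".join(group) for group in chunks]


sentences = [
    "Cats chase mice.",
    "Cats also chase birds.",
    "Interest rates rose.",
    "Banks raised interest rates.",
]
print(semantic_chunks(sentences))
```

With a real embedding model the threshold usually needs tuning per corpus; a fixed value such as 0.8 is only reasonable for this toy vectorizer.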

Embedding · Architecture

An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

Reranker · Agents & Tools

A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. It is typically a cross-encoder, which scores each query-candidate pair together; this often improves retrieval precision noticeably, at the cost of one extra model pass per candidate.
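The second-pass step itself is small, as the sketch below suggests. The `rerank` function and the word-overlap `overlap_score` are hypothetical; in practice `score` would wrap a cross-encoder model rather than count shared words.

```python
def overlap_score(query, candidate):
    """Placeholder relevance score: number of words shared with the query.
    A real reranker would call a cross-encoder here instead."""
    return len(set(query.lower().split()) & set(candidate.lower().split()))


def rerank(query, candidates, score=overlap_score, top_n=3):
    """Reorder first-pass candidates by `score` and keep the best `top_n`."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_n]


candidates = ["reranking models", "chunk size and overlap", "chunk boundaries"]
print(rerank("chunk overlap size", candidates, top_n=2))
```

Because only the top-K candidates from the first pass are scored, the extra cost grows with K rather than with the size of the corpus.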

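Contextual Retrieval · Agents & Tools

Contextual retrieval, introduced by Anthropic, prepends a short model-generated context to each chunk before embedding, so that each chunk carries information about the document and section it came from. Anthropic reported that this cut the retrieval failure rate by about 35% with contextual embeddings alone and by about 49% when combined with contextual BM25.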
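The prepending step can be sketched as follows. The names `contextualize` and `placeholder_describe` are hypothetical, and the placeholder simply names the document; in the real technique a language model would write the context from the full document and the chunk.

```python
def placeholder_describe(doc_title, chunk):
    """Stand-in for the LLM call that would summarize where the chunk
    sits in its document. Here it only names the document."""
    return f"From the document '{doc_title}'."


def contextualize(chunks, doc_title, describe=placeholder_describe):
    """Prepend a short context string to each chunk. The combined text,
    not the bare chunk, is what gets embedded and indexed."""
    return [f"{describe(doc_title, chunk)}\n\n{chunk}" for chunk in chunks]


print(contextualize(["Revenue grew 3%."], "Q2 report"))
```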
