Agents & Tools · intermediate
Chunking (document chunking, text splitting)
Chunking is the process of splitting source documents into smaller passages before embedding them for retrieval. Chunk size and boundary placement largely determine how relevant the retrieved passages are.
Explanation
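LLMs and embedding models have context limits, and retrieval quality also degrades on very long passages. So before indexing, documents are split into chunks, often 200-1000 tokens with 10-20% overlap. Overlap repeats the end of one chunk at the start of the next, so a sentence that straddles a boundary is more likely to appear whole in one of them.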
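As a rough illustration of fixed-size chunking with overlap, the sketch below slides a window over a token list. The function name `chunk_tokens` and its parameters are hypothetical, and the placeholder token strings stand in for the output of a real tokenizer; a production pipeline would count tokens with the tokenizer of its embedding model.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 200, overlap: int = 50) -> List[List[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    stride = chunk_size - overlap  # how far the window advances each step
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Placeholder tokens (assumption): a real pipeline would tokenize actual text.
tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens, chunk_size=200, overlap=50)
print(len(chunks))  # 7 windows: starts at 0, 150, 300, ..., 900
```

The number of chunks is driven by the stride (chunk size minus overlap), not by the chunk size alone, which is why more overlap means more chunks to embed and store.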
Strategy matters. Fixed-size character chunks are easy but split sentences mid-thought. Recursive chunking (LangChain-style) tries to respect natural boundaries (paragraphs, then sentences). Semantic chunking uses embedding distance to find topic shifts. Markdown-aware splitters preserve heading hierarchy.
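The recursive idea can be sketched in a few lines. This is a simplified illustration in the spirit of recursive splitters, not LangChain's actual implementation: `recursive_split`, `count_tokens`, and the separator list are hypothetical, token counts are approximated by whitespace-separated words, and packed pieces are rejoined with a single space.

```python
import re
from typing import List

# Separators tried in order: paragraph breaks, then sentence ends, then spaces.
SEPARATORS = [r"\n\s*\n", r"(?<=[.!?])\s+", r"\s+"]


def count_tokens(text: str) -> int:
    """Whitespace word count, a stand-in for a real tokenizer (assumption)."""
    return len(text.split())


def recursive_split(text: str, max_tokens: int, level: int = 0) -> List[str]:
    """Split on the coarsest boundary first, going finer only when needed."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= max_tokens or level >= len(SEPARATORS):
        return [text]
    pieces = [p for p in re.split(SEPARATORS[level], text) if p.strip()]
    chunks: List[str] = []
    buffer = ""
    for piece in pieces:
        candidate = f"{buffer} {piece}" if buffer else piece
        if count_tokens(candidate) <= max_tokens:
            buffer = candidate  # keep packing small pieces together
            continue
        if buffer:
            chunks.append(buffer)
            buffer = ""
        if count_tokens(piece) <= max_tokens:
            buffer = piece
        else:
            # Still too large: retry this piece with the next, finer separator.
            chunks.extend(recursive_split(piece, max_tokens, level + 1))
    if buffer:
        chunks.append(buffer)
    return chunks


sample = "Alpha one two. Beta three four.\n\nGamma five six seven eight nine. Delta ten."
parts = recursive_split(sample, max_tokens=6)
# The first paragraph fits whole; the second is split at its sentence boundary.
```

With a 6-word limit, the first paragraph stays intact and only the oversized second paragraph falls back to sentence-level splitting, which is the behavior that distinguishes this approach from fixed-size windows.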
Bad chunking is one of the most common causes of poor RAG results. Symptoms: retrieved chunks are missing the relevant sentence, the answer sits in a different chunk from the one that matches the question's keywords, or the model is forced to stitch fragments together.
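One way to check for the first symptom is to take a handful of sentences known to contain answers and test whether any chunk holds each one intact. The helper below is a hypothetical diagnostic, not part of any library, and it assumes chunks are stored as plain strings.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not hide a match."""
    return " ".join(text.split()).lower()


def contained_whole(chunks: List[str], answer_sentences: List[str]) -> Dict[str, bool]:
    """Map each known answer sentence to whether some chunk contains it intact."""
    normalized_chunks = [normalize(c) for c in chunks]
    return {
        sentence: any(normalize(sentence) in chunk for chunk in normalized_chunks)
        for sentence in answer_sentences
    }


# Hypothetical chunks where a boundary fell in the middle of a sentence.
example_chunks = [
    "The refund window is 30",
    "days from the delivery date. Contact support.",
]
report = contained_whole(
    example_chunks,
    ["The refund window is 30 days from the delivery date.", "Contact support."],
)
# The refund sentence maps to False (split across chunks); the other maps to True.
```

Sentences that map to False were cut by a chunk boundary, which suggests trying larger chunks, more overlap, or a boundary-aware splitter.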
Examples
- A 50-page PDF split into 200-token chunks with 50-token overlap → ~150 chunks indexed (each chunk advances 150 tokens, so this assumes roughly 450 tokens per page).
- A Markdown wiki chunked by H2 sections, then further split if a section exceeds 1500 tokens.
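The Markdown example above could be sketched as follows. The names `chunk_markdown` and `pack_paragraphs` are hypothetical, token counts are approximated by whitespace-separated words, and the sketch does not handle `## ` lines inside fenced code blocks or single paragraphs longer than the limit.

```python
import re
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Whitespace word count, a stand-in for a real tokenizer (assumption)."""
    return len(text.split())


def pack_paragraphs(body: str, max_tokens: int) -> List[str]:
    """Greedily pack paragraphs into pieces of at most max_tokens.

    A single paragraph above the limit is kept whole in this sketch.
    """
    pieces: List[str] = []
    current: List[str] = []
    for para in re.split(r"\n\s*\n", body):
        para = para.strip()
        if not para:
            continue
        if current and count_tokens("\n\n".join(current + [para])) > max_tokens:
            pieces.append("\n\n".join(current))
            current = []
        current.append(para)
    if current:
        pieces.append("\n\n".join(current))
    return pieces


def chunk_markdown(markdown: str, max_tokens: int = 1500) -> List[Tuple[str, str]]:
    """Return (h2_heading, chunk_text) pairs; oversized sections are split further."""
    chunks: List[Tuple[str, str]] = []
    # Zero-width split just before each line starting with "## " (Python 3.7+).
    for section in re.split(r"(?m)^(?=## )", markdown):
        section = section.strip()
        if not section:
            continue
        if section.startswith("## "):
            heading, _, body = section.partition("\n")
            heading = heading[3:].strip()
        else:
            heading, body = "", section  # text before the first H2
        if count_tokens(body) <= max_tokens:
            pieces = [body.strip()]
        else:
            pieces = pack_paragraphs(body, max_tokens)
        chunks.extend((heading, piece) for piece in pieces if piece)
    return chunks
```

Keeping the heading alongside each piece lets the indexer prepend it to the chunk text or store it as metadata, so a fragment from deep inside a section still carries its context.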
When to use chunking
Always. Chunking is upstream of every other RAG decision, so 2 hours spent on chunking strategy often pays off more than 2 weeks of prompt tuning.
Frequently asked
How is Chunking related to Retrieval-Augmented Generation?
Chunking is the indexing-time step of a Retrieval-Augmented Generation (RAG) pipeline: the chunks it produces are the units that get embedded, retrieved, and placed in the prompt. RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.