Agents & Tools · intermediate
Retrieval-Augmented Generation (RAG)
RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer from up-to-date or private information and cite its sources, without retraining.
Explanation
The pipeline has two phases. At indexing time, split your documents into chunks, embed each chunk, and store the vectors in a vector database. At query time, embed the user's question, retrieve the top-K nearest chunks, place them in the prompt ahead of the question, and let the LLM generate the answer.
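The two phases above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the "vector database" is a plain list, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, max_words=40):
    """Indexing time: chunk each document and store (chunk, vector) pairs."""
    index = []
    for doc in documents:
        for piece in chunk(doc, max_words):
            index.append((piece, embed(piece)))
    return index


def retrieve(index, question, k=2):
    """Query time: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Place the retrieved chunks ahead of the question."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
index = build_index(docs)
question = "How long do refunds take?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

In a real system, `embed` would call an embedding model and `build_index` and `retrieve` would be backed by a vector store. The flow of data stays the same.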
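RAG addresses three of the LLM's biggest limitations: knowledge cutoff (you can index new content daily, with no retraining), private data (the model never saw your data during training, so retrieval supplies it at query time), and hallucination (answers are grounded in retrieved text and can cite specific sources, which reduces fabrication but does not eliminate it).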
It is the dominant pattern for "chat with your docs", customer support bots, and internal knowledge tools.
Examples
- "Chat with your PDFs" — Notion, Glean, ChatGPT custom GPTs.
- Customer-support bot that cites help-center articles by URL.
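For the customer-support case, one plausible way to get URL citations is to number each retrieved article in the prompt and ask the model to cite the URLs it relies on. The sketch below is only an assumption about how such a prompt might be assembled: the `Article` type, the function names, the instruction wording, and the `help.example.com` URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """A retrieved help-center article (hypothetical shape)."""
    title: str
    url: str
    text: str


def format_sources(articles):
    """Render each article as a numbered source with its URL."""
    lines = []
    for n, article in enumerate(articles, start=1):
        lines.append(f"[{n}] {article.title} ({article.url})\n{article.text}")
    return "\n\n".join(lines)


def build_support_prompt(question, articles):
    """Combine citation instructions, numbered sources, and the question."""
    return (
        "Answer using only the sources below. "
        "Cite the URL of each source you rely on. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{format_sources(articles)}\n\n"
        f"Customer question: {question}"
    )


retrieved = [
    Article(
        title="Resetting your password",
        url="https://help.example.com/reset-password",  # placeholder URL
        text="Open Settings, choose Security, then select Reset password.",
    ),
]
prompt = build_support_prompt("How do I reset my password?", retrieved)
```

Because the URLs appear in the prompt, any URL in the model's answer can be checked against the retrieved set before it is shown to the customer.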
When to use retrieval-augmented generation
When the model needs information that is not baked into its weights — fresh, private, or domain-specific.
Frequently asked
What is Retrieval-Augmented Generation?
RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer from up-to-date or private information and cite its sources, without retraining.
What is an example of retrieval-augmented generation?
"Chat with your PDFs" — Notion, Glean, ChatGPT custom GPTs.
How is Retrieval-Augmented Generation related to Embedding?
RAG depends on embeddings for its retrieval step: the document chunks and the user's question are both embedded, and the chunks whose vectors lie nearest the question's vector are retrieved. An embedding is a list of numbers (a vector) that represents a piece of input, such as a word, a sentence, or an image, in a space where similar things end up close together.
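"Close together" is commonly measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration. Real embeddings come from a model and typically have hundreds or thousands of dimensions, and the values here are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors: "cat" and "kitten" point in similar directions,
# while "invoice" points elsewhere.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

related = cosine_similarity(cat, kitten)
unrelated = cosine_similarity(cat, invoice)
```

Here `related` is close to 1 and `unrelated` is close to 0, which is the property retrieval relies on when it ranks chunks against a question.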
When should I use retrieval-augmented generation?
When the model needs information that is not baked into its weights — fresh, private, or domain-specific.
Is Retrieval-Augmented Generation considered intermediate?
Retrieval-Augmented Generation is generally considered intermediate-level material in the AI and LLM space.