Agents & Tools

Letting models call tools and act in the world.

An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met.

intermediate

Agent Memory

Agent memory is the mechanism that lets an agent carry information across turns or sessions — short-term (current conversation context) or long-term (persistent facts about the user or world).

intermediate

Agentic Coding

Agentic coding is an LLM-driven workflow where the model reads code, plans changes, edits files, runs commands, and iterates against feedback — autonomously closing tasks rather than just suggesting code.

intermediate

BM25

BM25 is the classical keyword-based ranking algorithm: a refined TF-IDF that scores documents by query-term frequency, document length, and corpus-wide rarity. The keyword side of hybrid search.

intermediate

Chunking

Chunking is the process of splitting source documents into smaller passages before embedding them for retrieval. Chunk size and boundaries control how relevant retrievals will be.

intermediate

Computer Use

Computer use is an emerging agent capability where the model takes screenshots of a desktop or browser, identifies UI elements, and controls the mouse and keyboard — letting LLMs operate any software like a human would.

advanced

Contextual Retrieval

Contextual retrieval, introduced by Anthropic, prepends a model-generated context summary to each chunk before embedding — so chunks know which document and section they came from, improving retrieval precision by ~50%.

advanced

Conversational Memory

Conversational memory is the strategy for carrying chat history across turns within a single session — append all, sliding window, summarization, or hybrid retrieval over past messages.

intermediate

Cross-Encoder

A cross-encoder takes a (query, document) pair as joint input and outputs a single relevance score. Slower than the bi-encoders used for dense retrieval but much more accurate — the standard reranker architecture.

advanced

Function Calling

Function calling is the specific API mechanism by which an LLM emits a structured request to invoke a named function with typed arguments. The OpenAI-popularized way to do tool use.

beginner

Hybrid Search

Hybrid search combines vector (semantic) and keyword (BM25) retrieval and fuses their results — usually via Reciprocal Rank Fusion — to get the best of both: semantic recall and exact-match precision.

intermediate

MCP Server

An MCP server exposes tools, resources, or prompts via the Model Context Protocol so any compliant client (Claude Desktop, Cursor, IDE plugins) can call them without bespoke integration.

intermediate

Model Context Protocol

MCP is an open standard for connecting LLMs to external tools and data sources. It defines a JSON-RPC protocol so any client (Claude Desktop, Cursor, IDE plugins) can talk to any MCP server.

intermediate

Multi-Agent

A multi-agent system uses several LLM agents that talk to each other — a manager and workers, a debate, a pipeline — instead of a single agent doing everything.

advanced

Plan-and-Execute

Plan-and-execute splits agent loops into a planning step (produce the full step list up front) and an execution step (run each step). Cheaper than per-step ReAct and easier to inspect.

intermediate

ReAct

ReAct is a prompting pattern that interleaves reasoning ("Thought:") with actions ("Action:") and observations ("Observation:"). It is the foundation of most tool-using agents.

intermediate

Recursive Chunking

Recursive chunking splits text by trying progressively smaller separators — paragraphs, then sentences, then words — until each chunk fits the target size, preserving natural boundaries where possible.

intermediate

Reflexion

Reflexion is a pattern where an agent runs, observes failures, generates a short natural-language "reflection" on what went wrong, and retries with that reflection appended to its prompt — improving via self-critique without weight updates.

advanced

Reranker

A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. Typically a cross-encoder; dramatically improves retrieval precision at low cost.

intermediate

Retrieval-Augmented Generation

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

intermediate

Semantic Chunking

Semantic chunking embeds each sentence and inserts a chunk boundary wherever consecutive embeddings diverge sharply — producing chunks that respect topic boundaries rather than character counts.

advanced

Semantic Search

Semantic search ranks documents by meaning rather than keyword match, using embedding similarity. "Affordable laptops" can match "cheap notebooks" even with no overlapping words.

beginner

Tool Use

Tool use is when an LLM can call external functions — APIs, code interpreters, databases, web fetchers — and read their results. The mechanism that turns chat into action.

beginner

Vector Database

A vector database stores high-dimensional embeddings and answers "find the K nearest vectors to this query" extremely fast. The retrieval engine behind most RAG systems.

intermediate

Workflow vs Agent

A workflow is a deterministic pipeline where humans hard-code the LLM call sequence. An agent lets the LLM decide which steps to take. Anthropic's recommended default is workflow first, agent only when needed.

intermediate