Agents & Tools · intermediate

Agent Memory

Agent memory is the mechanism that lets an agent carry information across turns or sessions — short-term (current conversation context) or long-term (persistent facts about the user or world).

Published May 31, 2026

Explanation

An LLM call is stateless: each request starts from a fresh context unless the application re-sends prior turns with it. Short-term memory is the conversation buffer, meaning the previous user and assistant turns prepended to the next prompt and optionally summarized once the buffer outgrows the context window.
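The buffer-plus-summary pattern described above can be sketched as follows. This is a minimal illustration, not any particular library's API: `ConversationBuffer`, `max_turns`, and `naive_summarize` are hypothetical names, the budget is counted in turns rather than tokens, and the summarizer is a truncating stand-in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field


def naive_summarize(turns):
    """Stand-in for an LLM summarization call (assumption): truncate each turn to 40 chars."""
    return " | ".join(f"{role}: {text[:40]}" for role, text in turns)


@dataclass
class ConversationBuffer:
    max_turns: int = 4      # hypothetical budget, counted in turns rather than tokens
    summary: str = ""       # rolling summary of turns evicted from the buffer
    turns: list = field(default_factory=list)

    def add(self, role, text):
        self.turns.append((role, text))
        if len(self.turns) > self.max_turns:
            # On overflow, evict the oldest half and fold it into the summary.
            cut = len(self.turns) // 2
            evicted, self.turns = self.turns[:cut], self.turns[cut:]
            folded = naive_summarize(evicted)
            self.summary = f"{self.summary} | {folded}" if self.summary else folded

    def build_prompt(self, user_message):
        # Summary first, then the verbatim recent turns, then the new message.
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(f"{role}: {text}" for role, text in self.turns)
        parts.append(f"user: {user_message}")
        return "\n".join(parts)
```

A production version would more likely budget by token count and call a model to write the summary; the shape of the prompt (summary, recent turns, new message) stays the same.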

Long-term memory is persistent: facts about the user, past projects, and preferences, typically stored in a vector or key-value store and retrieved per turn. Mem0, Letta (formerly MemGPT), Zep, and the Memory feature in OpenAI's ChatGPT are common implementations.
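As a rough sketch of the store-and-retrieve loop, the class below keeps facts alongside a vector and returns the closest matches for each query. It does not reproduce the API of Mem0, Letta, or Zep; `LongTermMemory`, `remember`, and `recall` are hypothetical names, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag-of-words counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    def __init__(self):
        self._items = []  # list of (fact, vector) pairs

    def remember(self, fact):
        self._items.append((fact, embed(fact)))

    def recall(self, query, k=2):
        # Score every stored fact against the query and return the top k matches.
        q = embed(query)
        scored = [(cosine(q, vec), fact) for fact, vec in self._items]
        scored = [s for s in scored if s[0] > 0]   # drop facts with no overlap
        scored.sort(key=lambda s: s[0], reverse=True)
        return [fact for _, fact in scored[:k]]


memory = LongTermMemory()
memory.remember("The user prefers Python over JavaScript")
memory.remember("The user lives in Lisbon")
relevant = memory.recall("Should this script use Python or Go?", k=1)
```

The recalled facts would then be added to the prompt for that turn. Swapping `embed` for a real embedding model and `_items` for a vector database changes the quality of retrieval but not the overall flow.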

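The hard part is forgetting: deciding what to keep, what to summarize away, and what to delete. Naively appending everything to the context quickly becomes expensive, because every stored token is re-sent on each call, and noisy, because irrelevant history dilutes the information that matters.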
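One way to make the keep, summarize, or delete decision concrete is to score each memory and bucket it by that score. The sketch below is only one assumed policy, not a standard algorithm: the `importance` field, the exponential `half_life` decay, and the `keep` and `drop_below` thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    importance: float   # 0..1, hypothetical score assigned when the memory is written
    last_used: float    # timestamp in seconds


def retention_score(item, now, half_life=86_400.0):
    """Importance decayed exponentially by time since last use (assumed policy)."""
    age = max(0.0, now - item.last_used)
    return item.importance * 0.5 ** (age / half_life)


def prune(items, now, keep=2, drop_below=0.05):
    """Split items into (kept, to_summarize, deleted) by retention score."""
    ranked = sorted(items, key=lambda it: retention_score(it, now), reverse=True)
    kept, rest = ranked[:keep], ranked[keep:]
    to_summarize = [it for it in rest if retention_score(it, now) >= drop_below]
    deleted = [it for it in rest if retention_score(it, now) < drop_below]
    return kept, to_summarize, deleted
```

Items in the middle bucket are candidates for compression into a shorter note, while those in the last bucket are dropped outright. Real systems often add other signals, such as explicit user requests to forget something.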

Examples

A chat product that remembers "the user prefers Python over JavaScript" across sessions via a vector-backed memory store.
A coding agent that summarizes previous file edits into a "session notes" prefix for subsequent prompts.

Frequently asked

How is Agent Memory related to Agent?

Agent Memory and Agent both belong to the Agents & Tools category. An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met; agent memory is what lets that loop carry information across turns or sessions.

Is Agent Memory considered intermediate?

Agent Memory is generally considered intermediate-level material in the AI and LLM space.

Agent · Agents & Tools

An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met.

Context Window · Inference

The context window is the maximum number of tokens an LLM can consider in a single call — prompt plus generated output combined.

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

Conversational Memory · Agents & Tools

Conversational memory is the strategy for carrying chat history across turns within a single session — append all, sliding window, summarization, or hybrid retrieval over past messages.

Sources

Letta (MemGPT) paper (arXiv)