Agents & Tools · advanced

Reflexion (self-reflection)

Reflexion is a pattern where an agent runs, observes failures, generates a short natural-language "reflection" on what went wrong, and retries with that reflection appended to its prompt — improving via self-critique without weight updates.

Published May 31, 2026

Explanation

Reflexion sits between pure prompting and fine-tuning: the agent keeps a verbal memory of past mistakes ("last time I tried to call `getUser` before authenticating, which failed; I should authenticate first") and uses it on subsequent attempts.
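The loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `reflexion_loop`, `build_prompt`, and the three callables are hypothetical names, and the `toy_*` functions are stand-ins for LLM and tool calls so the example runs without a model.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions for this sketch, not from the paper).
Actor = Callable[[str], str]                   # prompt -> attempt
Evaluator = Callable[[str], Tuple[bool, str]]  # attempt -> (passed, feedback)
Reflector = Callable[[str, str], str]          # (attempt, feedback) -> reflection


def build_prompt(task: str, reflections: List[str]) -> str:
    """Append the accumulated reflections to the task prompt."""
    if not reflections:
        return task
    notes = "\n".join(f"- {r}" for r in reflections)
    return f"{task}\n\nLessons from earlier attempts:\n{notes}"


def reflexion_loop(
    task: str,
    actor: Actor,
    evaluator: Evaluator,
    reflector: Reflector,
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry until the evaluator passes, adding one reflection per failure."""
    reflections: List[str] = []
    for _ in range(max_attempts):
        attempt = actor(build_prompt(task, reflections))
        passed, feedback = evaluator(attempt)
        if passed:
            return attempt, reflections
        reflections.append(reflector(attempt, feedback))
    return None, reflections  # attempts exhausted without success


# Toy stand-ins for the model and the environment.
def toy_actor(prompt: str) -> str:
    if "authenticate first" in prompt:
        return "authenticate(); getUser()"
    return "getUser()"


def toy_evaluator(attempt: str) -> Tuple[bool, str]:
    if attempt.startswith("authenticate()"):
        return True, "ok"
    return False, "401 Unauthorized"


def toy_reflector(attempt: str, feedback: str) -> str:
    return f"I called {attempt} and got '{feedback}'; I should authenticate first."


result, notes = reflexion_loop(
    "Fetch the current user.", toy_actor, toy_evaluator, toy_reflector
)
```

Here the first attempt fails, one reflection is recorded, and the second attempt succeeds because the reflection is now part of the prompt. No model weights change at any point; only the prompt does.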

The original paper reported substantial gains on coding benchmarks such as HumanEval, as well as on decision-making (ALFWorld) and question-answering (HotPotQA) tasks, when reflections from failed runs were appended to the next run's prompt. It is the conceptual basis for many "the agent learns from this session" patterns.

Limits: reflections help only within a session unless they are persisted to long-term memory, and a reflection can itself be wrong, in which case it may steer later attempts in the wrong direction.
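One way to carry reflections across sessions is to write them to disk. The sketch below assumes a plain JSON file as the long-term store; `load_reflections`, `save_reflection`, and the `MAX_REFLECTIONS` cap are hypothetical names chosen for illustration, and a real agent might use a database or a vector store instead.

```python
import json
import tempfile
from pathlib import Path
from typing import List

MAX_REFLECTIONS = 5  # assumed cap so the prompt does not grow without bound


def load_reflections(path: Path) -> List[str]:
    """Return the stored reflections, or an empty list on first use."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_reflection(
    path: Path, reflection: str, limit: int = MAX_REFLECTIONS
) -> List[str]:
    """Append a reflection, skip exact duplicates, keep only the newest ones."""
    reflections = load_reflections(path)
    if reflection not in reflections:
        reflections.append(reflection)
    reflections = reflections[-limit:]
    path.write_text(json.dumps(reflections, indent=2), encoding="utf-8")
    return reflections


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "reflections.json"
    # Session 1: a failure produces a reflection, which is persisted.
    save_reflection(store, "Authenticate before calling getUser.")
    # Session 2: a fresh process reads it back before building its prompt.
    restored = load_reflections(store)
```

Because a wrong reflection is persisted just like a correct one, a store like this probably also needs some way to review or expire entries; the cap above only bounds the size.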

Examples

A coding agent that gets a test failure, generates a reflection ("the function signature expects a list, I passed a string"), and retries with the corrected understanding.
A research agent that tracks "queries that returned no results" as reflections and reformulates future queries accordingly.
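The coding-agent example can be made concrete with a small sketch. Everything here is hypothetical: `total_length` stands in for the code under test, `run_check` for the test runner, and `to_reflection` uses a fixed template where a real agent would ask the LLM to write the reflection.

```python
from typing import Callable, Optional


def total_length(items):
    """Function under test: expects a list of strings."""
    if not isinstance(items, list):
        raise TypeError(f"expected list, got {type(items).__name__}")
    return sum(len(s) for s in items)


def run_check(call: Callable[[], object], expected: object) -> Optional[str]:
    """Return None if the check passes, otherwise a short failure message."""
    try:
        actual = call()
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    if actual != expected:
        return f"expected {expected!r}, got {actual!r}"
    return None


def to_reflection(failure: str) -> str:
    # Template stand-in; in a real agent the model would generate this text.
    return (
        f"The last attempt failed with '{failure}'. "
        "Check argument types against the function signature before retrying."
    )


# Attempt 1 passes a string where a list is expected.
first_failure = run_check(lambda: total_length("ab"), 2)
reflection = to_reflection(first_failure) if first_failure else None

# Attempt 2 applies the corrected understanding.
second_failure = run_check(lambda: total_length(["ab"]), 2)
```

The failure message is the raw signal; the reflection restates it as guidance the next attempt can act on.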

When to use Reflexion

Use it for multi-attempt agents on tasks with a verifiable success signal (code, math, structured outputs). It is less useful for one-shot tasks, where there is no failed attempt to reflect on.
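What makes a task "verifiable" is that a program can check the output and say why it failed. As a rough sketch for the structured-output case, assuming a made-up schema (`REQUIRED_FIELDS`) and a hypothetical `verify_structured_output` helper, the checker below returns both a pass/fail flag and feedback that a reflection step could use.

```python
import json
from typing import Tuple

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"name": str, "age": int}


def verify_structured_output(raw: str) -> Tuple[bool, str]:
    """Check that raw is a JSON object containing the required typed fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"output is not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value must be a JSON object"
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            return False, f"missing field '{field}'"
        if not isinstance(data[field], expected_type):
            return False, f"field '{field}' should be {expected_type.__name__}"
    return True, "ok"


bad = verify_structured_output('{"name": "Ada", "age": "36"}')
good = verify_structured_output('{"name": "Ada", "age": 36}')
```

A one-shot task has no second attempt in which this feedback could be used, which is why the pattern adds little there.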

Frequently asked

How is Reflexion related to Agent?

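Both are Agents & Tools concepts. An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met. Reflexion is a pattern applied to such an agent: after a failed attempt, the agent writes a reflection and retries with it in the prompt.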

Is Reflexion considered advanced?

Reflexion is generally considered advanced-level material in the AI and LLM space.

Agent · Agents & Tools

An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met.

ReAct · Agents & Tools

ReAct is a prompting pattern that interleaves reasoning ("Thought:") with actions ("Action:") and observations ("Observation:"). It is the foundation of most tool-using agents.

Agentic Coding · Agents & Tools

Agentic coding is an LLM-driven workflow where the model reads code, plans changes, edits files, runs commands, and iterates against feedback — autonomously closing tasks rather than just suggesting code.

Self-Consistency · Prompting

Self-consistency samples N chain-of-thought completions for the same problem and takes the majority answer. Improves accuracy on math and reasoning tasks at N× the cost.

Agent Memory · Agents & Tools

Agent memory is the mechanism that lets an agent carry information across turns or sessions — short-term (current conversation context) or long-term (persistent facts about the user or world).

Sources

Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning" (arXiv:2303.11366)