Prompting · advanced

Tree of Thoughts (ToT)

Tree of Thoughts generalizes chain-of-thought to a search tree: at each step the model produces multiple candidate thoughts, evaluates them, and explores the most promising branches — like beam search over reasoning.

Published May 31, 2026

Explanation

Where CoT follows a single linear chain, ToT branches at each step: for example, generate three candidate next thoughts, evaluate each, and expand the best. Search strategies include BFS (keep the K most promising candidates at each step), DFS (go deep and backtrack on dead ends), and best-first search guided by an evaluator, which is often another LLM call.
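The BFS variant can be sketched as a small beam-style loop. This is a minimal illustration, not the reference implementation: `propose` and `evaluate` are hypothetical stand-ins for the LLM calls that generate and score thoughts, and the toy problem (pick three digits that sum to 7) exists only so the loop runs without a model.

```python
from typing import Callable, Iterable, Tuple

State = Tuple[int, ...]

def tot_bfs(
    root: State,
    propose: Callable[[State], Iterable[State]],
    evaluate: Callable[[State], float],
    depth: int,
    beam_width: int,
) -> State:
    """Breadth-first ToT: expand every frontier state, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        # Branch: every state on the frontier proposes candidate next thoughts.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Evaluate and prune: keep only the highest-scoring candidates.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=evaluate)

# Toy stand-ins (assumptions): in a real system both would be LLM calls.
TARGET = 7

def propose_digit(state: State) -> Iterable[State]:
    return [state + (d,) for d in (1, 2, 3)]

def closeness(state: State) -> float:
    return -abs(TARGET - sum(state))

best = tot_bfs((), propose_digit, closeness, depth=3, beam_width=2)
print(best, sum(best))
```

Swapping the sorted-and-truncated frontier for a stack gives DFS with backtracking, and swapping it for a priority queue gives best-first search.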

In practice, ToT is expensive because each problem takes many LLM calls, and it is often beaten by simpler self-consistency or by reasoning models with internal search. It remains useful as a framework for thinking about how LLMs spend compute on search at inference time, and it shows up in research more than in production.
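A rough call count makes the cost concrete. The sketch below assumes one proposal call per frontier state and one evaluation call per candidate; the function name and this accounting scheme are illustrative assumptions, and real implementations may batch or repeat calls differently.

```python
def tot_bfs_calls(depth: int, beam: int, branching: int) -> int:
    """Estimate LLM calls for BFS-style ToT under a simple accounting."""
    frontier = 1  # the search starts from a single root state
    calls = 0
    for _ in range(depth):
        candidates = frontier * branching
        calls += frontier    # one proposal call per frontier state
        calls += candidates  # one evaluation call per candidate thought
        frontier = min(beam, candidates)
    return calls

print(tot_bfs_calls(depth=3, beam=5, branching=5))
```

Under these assumptions, depth 3 with a beam of 5 and 5 proposals per state comes to 66 calls, compared with 1 call for a single chain-of-thought completion and N calls for self-consistency with N samples.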

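Spiritual successor: o1-style reasoning models, which are trained with reinforcement learning to explore alternatives and backtrack within their own reasoning rather than relying on an external search loop.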

Examples

A Game of 24 solver: the model branches on which numbers to combine first, an evaluator scores partial states, and the tree is expanded best-first.
A theorem-proving agent that proposes multiple proof steps per node and prunes unpromising branches.
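The Game of 24 example can be sketched as a best-first search. In this hedged sketch, `propose` enumerates arithmetic combinations where a ToT system would ask the model for candidates, and `evaluate` is a crude distance-to-24 heuristic standing in for an LLM evaluator. Both are assumptions made so the example runs without a model.

```python
import heapq
from fractions import Fraction
from itertools import combinations, count

def propose(state):
    """Yield successor states: combine two numbers with one operation."""
    for i, j in combinations(range(len(state)), 2):
        (a, ea), (b, eb) = state[i], state[j]
        rest = [state[k] for k in range(len(state)) if k not in (i, j)]
        options = [
            (a + b, f"({ea}+{eb})"), (a * b, f"({ea}*{eb})"),
            (a - b, f"({ea}-{eb})"), (b - a, f"({eb}-{ea})"),
        ]
        if b != 0:
            options.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            options.append((b / a, f"({eb}/{ea})"))
        for value, expr in options:
            yield tuple(rest + [(value, expr)])

def evaluate(state):
    """Heuristic stand-in for an LLM evaluator; lower is more promising."""
    return min(abs(value - 24) for value, _ in state)

def solve_24(numbers):
    """Best-first search; returns an expression string or None."""
    start = tuple((Fraction(n), str(n)) for n in numbers)
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(evaluate(start), next(tie), start)]
    seen = set()
    while frontier:
        _, _, state = heapq.heappop(frontier)  # most promising state first
        if len(state) == 1:
            value, expr = state[0]
            if value == 24:
                return expr
            continue  # dead end: fall back to the next-best state
        for nxt in propose(state):
            key = tuple(sorted(v for v, _ in nxt))
            if key not in seen:  # skip states with the same remaining values
                seen.add(key)
                heapq.heappush(frontier, (evaluate(nxt), next(tie), nxt))
    return None

print(solve_24([4, 9, 10, 13]))
```

Because the frontier is a priority queue, a poor heuristic only slows the search down; every reachable state is still explored eventually, so the solver returns `None` only when no solution exists.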

Frequently asked

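What is Tree of Thoughts?

Tree of Thoughts generalizes chain-of-thought to a search tree: at each step the model produces multiple candidate thoughts, evaluates them, and explores the most promising branches.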

What is an example of tree of thoughts?

A Game of 24 solver: the model branches on which numbers to combine first, an evaluator scores partial states, and the tree is expanded best-first.

How is Tree of Thoughts related to Chain-of-Thought?

Tree of Thoughts generalizes Chain-of-Thought. Chain-of-thought prompting asks the model to show its reasoning step by step, as a single linear chain, before giving a final answer. Tree of Thoughts instead branches at each step, evaluates the candidate thoughts, and expands only the most promising ones.

Is Tree of Thoughts considered advanced?

Tree of Thoughts is generally considered advanced-level material in the AI and LLM space.

Chain-of-Thought · Prompting

Chain-of-thought prompting asks the model to show its reasoning step by step before giving a final answer. It often substantially improves performance on multi-step problems.

Self-Consistency · Prompting

Self-consistency samples N chain-of-thought completions for the same problem and takes the majority answer. It improves accuracy on math and reasoning tasks at roughly N× the cost.
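The majority vote itself is only a few lines. In this sketch, `sample_answer` is a hypothetical callable standing in for "sample one chain-of-thought completion and extract its final answer"; the canned answers below replace real model samples so the example is deterministic.

```python
from collections import Counter
from typing import Callable, Tuple

def self_consistency(sample_answer: Callable[[], str], n: int) -> Tuple[str, float]:
    """Sample n final answers and return the majority answer and its vote share."""
    answers = [sample_answer() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n

# Canned answers (assumption) in place of sampled LLM completions.
canned = iter(["18", "18", "17", "18", "20"])
answer, share = self_consistency(lambda: next(canned), n=5)
print(answer, share)
```

The vote share can double as a rough confidence signal: a narrow majority suggests the samples disagree and the answer deserves less trust.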

Test-Time Compute · Prompting

Test-time compute is the extra reasoning, sampling, or search a model can do at inference time to improve quality — more thinking tokens, more candidate answers, or verifier-guided search.

Reasoning Model · Architecture

A reasoning model spends extra compute thinking step by step before answering. Examples include OpenAI o1/o3, DeepSeek R1, and Anthropic's Claude models with extended thinking enabled.

Sources

Tree of Thoughts (arXiv)