Architecture · intermediate

Reasoning Model (thinking model)

A reasoning model spends extra inference-time compute working through a problem step by step before it answers. OpenAI o1/o3, DeepSeek R1, and Claude with extended thinking enabled are examples.

Published May 30, 2026

Explanation

Standard LLMs start emitting their answer right away, one token at a time. Reasoning models first generate a long internal chain-of-thought scratchpad, which the user typically does not see, and then commit to a final answer informed by that reasoning.
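The split between scratchpad and final answer can be sketched in a few lines of Python. This is only an illustration: it assumes the raw output wraps the scratchpad in `<think>...</think>` tags, a convention some open-weights models use; hosted APIs may expose reasoning differently or hide it entirely. The function name `split_reasoning` and the sample string are hypothetical.

```python
import re

# Assumption: the scratchpad is delimited by <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (scratchpad, final_answer) from a raw completion string."""
    # Collect every tagged reasoning span into one scratchpad string.
    scratchpad = "\n".join(part.strip() for part in THINK_RE.findall(raw))
    # Whatever remains after removing the tagged spans is the visible answer.
    final_answer = THINK_RE.sub("", raw).strip()
    return scratchpad, final_answer


# Hypothetical raw output, made up for illustration.
raw_output = "<think>17 * 3 = 51, then 51 + 4 = 55.</think>The answer is 55."
scratchpad, answer = split_reasoning(raw_output)
print(answer)
```

An application would normally show only `answer` to the user and keep `scratchpad` for logging or debugging, if it is available at all.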

The training recipe combines large-scale chain-of-thought data with reinforcement learning that rewards correctness on verifiable tasks (math, code, logic). The result: large gains on math/coding/science benchmarks at the cost of higher latency and per-call price.
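"Rewards correctness on verifiable tasks" means the reward can be computed by a program rather than judged by a person. A minimal sketch of such a check follows, assuming completions end with a line of the form `Answer: <value>`; that format, and the names `extract_final_answer` and `correctness_reward`, are assumptions made for illustration, not any lab's actual training code.

```python
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    marker = "Answer:"  # assumed output format, not a standard
    idx = completion.rfind(marker)
    if idx == -1:
        return None
    return completion[idx + len(marker):].strip()


def correctness_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference exactly."""
    predicted = extract_final_answer(completion)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == reference.strip() else 0.0


print(correctness_reward("6 * 7 = 42.\nAnswer: 42", "42"))
```

Real setups typically need more forgiving matching (numeric equivalence, unit tests for code), but the principle is the same: the reward depends only on whether the final answer checks out, not on how the reasoning reads.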

Reasoning models opened a new scaling axis — test-time compute — that runs alongside parameter and data scaling.
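One simple way to spend more test-time compute, independent of any particular model, is to sample several candidate answers and keep the most common one. The sketch below illustrates that idea with a stand-in sampler; `majority_vote` and `make_noisy_solver` are hypothetical names, and the noisy solver only simulates a model that is right some fraction of the time.

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers and return the most frequent one."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def make_noisy_solver(correct: str, p_correct: float, seed: int) -> Callable[[], str]:
    """Stand-in for a model call: returns the right answer with probability p_correct."""
    rng = random.Random(seed)
    wrong_answers = ["54", "56", "65"]  # made-up distractors

    def solver() -> str:
        return correct if rng.random() < p_correct else rng.choice(wrong_answers)

    return solver


solver = make_noisy_solver(correct="55", p_correct=0.6, seed=0)
print(majority_vote(solver, n_samples=25))
```

Raising `n_samples` costs more inference but makes the vote more reliable when the correct answer is the single most likely one, which is the trade-off the test-time compute axis describes.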

Examples

OpenAI o1 solving a competition math problem with hidden CoT.
DeepSeek R1 open-weights reasoning model.
Claude extended thinking mode.

When to use a reasoning model

When the task is hard, the answer can be verified, and quality matters more than latency and cost: math, code, scientific analysis, multi-step planning.
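To get a feel for the price side of that trade-off, a rough per-call estimate can help. The sketch below assumes hidden thinking tokens are billed at the same rate as visible output tokens; check your provider's pricing before relying on that. The function name and all token counts and prices are placeholders, not real figures.

```python
def call_cost_usd(
    prompt_tokens: int,
    thinking_tokens: int,
    answer_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate the cost of one call, billing thinking tokens as output tokens."""
    output_tokens = thinking_tokens + answer_tokens  # assumption: same rate for both
    total = prompt_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


# Placeholder numbers: same prompt and answer, with and without a long scratchpad.
without_thinking = call_cost_usd(1000, 0, 500, 1.0, 4.0)
with_thinking = call_cost_usd(1000, 8000, 500, 1.0, 4.0)
print(f"{without_thinking:.4f} vs {with_thinking:.4f} USD per call")
```

With these placeholder values the scratchpad dominates the bill, which is why a reasoning model tends to be worth it only when a wrong answer costs more than the extra tokens and waiting time.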

Frequently asked

What is a reasoning model?

A reasoning model spends extra inference-time compute working through a problem step by step before it answers. OpenAI o1/o3, DeepSeek R1, and Claude with extended thinking enabled are examples.

What is an example of a reasoning model?

OpenAI o1 solving a competition math problem with hidden CoT.

How is a reasoning model related to chain-of-thought?

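Chain-of-thought prompting asks the model to show its reasoning step by step before giving a final answer, which substantially improves performance on multi-step problems. A reasoning model is trained on chain-of-thought data and with reinforcement learning so that it produces this kind of step-by-step reasoning on its own, without being prompted for it.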

When should I use a reasoning model?

When the task is hard, the answer can be verified, and quality matters more than latency and cost: math, code, scientific analysis, multi-step planning.

Is the reasoning model topic considered intermediate?

Yes. Reasoning models are generally considered intermediate-level material in the AI and LLM space.

Related terms

Chain-of-Thought · Prompting

Chain-of-thought prompting asks the model to show its reasoning step by step before giving a final answer. It dramatically improves performance on multi-step problems.

Test-Time Compute · Prompting

Test-time compute is the extra reasoning, sampling, or search a model can do at inference time to improve quality — more thinking tokens, more candidate answers, or verifier-guided search.

Inference · Inference

Inference is what happens when you actually run a trained model on new input. For LLMs that means generating tokens one at a time, with sampling and a KV cache.

Large Language Model · Foundations

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

Reinforcement Learning from Human Feedback · Training

RLHF fine-tunes an LLM to maximize a reward model that was itself trained on human preference judgments between candidate responses.

Benchmark · Evaluation

A benchmark is a standardized test that scores models on a fixed task, letting you compare them on equal footing. MMLU, HumanEval, and HELM are common examples.
