ModelTerms

Architecture · intermediate

Reasoning Model (thinking model)

A reasoning model spends extra compute thinking step by step before answering. OpenAI o1/o3, DeepSeek R1, and Claude with extended thinking enabled are examples.

Explanation

Standard LLMs start emitting their answer with the first generated token. Reasoning models pause first: they internally produce a long chain-of-thought scratchpad, which the user typically does not see, and then commit to a final answer informed by that reasoning.
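The scratchpad-then-answer split can be sketched in a few lines of Python. This is only an illustration, assuming the raw completion wraps its reasoning in `<think>...</think>` tags, as DeepSeek R1's open-weights output does; other providers hide the reasoning or return it in a separate field. The helper name `split_reasoning` is hypothetical.

```python
import re

# Assumption: reasoning is delimited by <think>...</think> in the raw text.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> tuple:
    """Return (reasoning, answer) from a raw completion string."""
    match = THINK_RE.search(raw)
    if match is None:
        # No scratchpad found: treat the whole completion as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is what the user would see.
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer

raw = "<think>17 * 3 = 51, plus 4 is 55.</think>The result is 55."
reasoning, answer = split_reasoning(raw)
print(answer)  # prints: The result is 55.
```

A chat interface would typically show only `answer` and keep `reasoning` collapsed or hidden.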

The training recipe combines large-scale chain-of-thought data with reinforcement learning that rewards correctness on verifiable tasks (math, code, logic). The result: large gains on math/coding/science benchmarks at the cost of higher latency and per-call price.
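A "verifiable" reward can be as simple as checking the final answer against a known reference. The sketch below is a minimal illustration, not any lab's actual training code; it assumes completions end with a line of the form `Answer: <value>`, and both function names are hypothetical.

```python
import re
from typing import Optional

def extract_final_answer(completion: str) -> Optional[str]:
    """Return the value on the last 'Answer: ...' line (assumed output format)."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference exactly."""
    predicted = extract_final_answer(completion)
    return 1.0 if predicted == reference.strip() else 0.0

good = "First 17 * 3 = 51, then 51 + 4 = 55.\nAnswer: 55"
bad = "17 * 3 = 41, then 41 + 4 = 45.\nAnswer: 45"
print(verifiable_reward(good, "55"), verifiable_reward(bad, "55"))  # prints: 1.0 0.0
```

Because the check is automatic, it can score large numbers of sampled completions without human labels, which is what makes math, code, and logic tasks convenient for this kind of reinforcement learning.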

Reasoning models opened a new scaling axis — test-time compute — that runs alongside parameter and data scaling.
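One simple way to spend more test-time compute is to sample several candidate answers and keep the most common one (often called self-consistency or majority voting). The sketch below illustrates that idea only; it is not how any particular reasoning model works internally. `sample_answer` is a hypothetical stand-in for a model call, replaced here by a fixed list of outputs.

```python
from collections import Counter
from typing import Callable

def majority_vote(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Draw n_samples answers and return the most frequent one."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Stand-in for five independent model samples on the same question.
samples = iter(["55", "54", "55", "56", "55"])
best = majority_vote(lambda: next(samples), n_samples=5)
print(best)  # prints: 55
```

Raising `n_samples` trades more inference cost for a better chance that the majority answer is correct, with no change to the model's parameters or training data.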

Examples

  • OpenAI o1 solving a competition math problem with hidden CoT.
  • DeepSeek R1, an open-weights reasoning model.
  • Claude extended thinking mode.

When to use a reasoning model

Use one when the task is hard, the answer can be verified, and answer quality matters more than latency and per-call cost: math, code, scientific analysis, multi-step planning.
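The rule of thumb above could be encoded as a small routing check. This is a hypothetical sketch: the `Task` fields, the `choose_model` function, and the 10-second threshold are illustrative assumptions, not values from any provider.

```python
from dataclasses import dataclass

@dataclass
class Task:
    multi_step: bool         # does the task need several dependent steps?
    verifiable: bool         # can the answer be checked (tests, known result)?
    latency_budget_s: float  # how long the caller is willing to wait

def choose_model(task: Task, min_reasoning_latency_s: float = 10.0) -> str:
    """Route to a reasoning model only when the task is hard and verifiable
    and the caller can tolerate the added latency (threshold is an assumption)."""
    can_wait = task.latency_budget_s >= min_reasoning_latency_s
    if task.multi_step and task.verifiable and can_wait:
        return "reasoning"
    return "standard"

print(choose_model(Task(multi_step=True, verifiable=True, latency_budget_s=60)))   # prints: reasoning
print(choose_model(Task(multi_step=False, verifiable=False, latency_budget_s=2)))  # prints: standard
```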

Frequently asked

How is a reasoning model related to chain-of-thought?

Chain-of-thought prompting asks a model to show its reasoning step by step before giving a final answer, which improves performance on multi-step problems. A reasoning model builds that behavior in through training: it produces the chain of thought on its own, without being prompted for it.

Is the reasoning model topic considered intermediate?

Reasoning models are generally considered intermediate-level material in the AI and LLM space.

Chain-of-Thought · Prompting

Chain-of-thought prompting asks the model to show its reasoning step by step before giving a final answer. It dramatically improves performance on multi-step problems.

Test-Time Compute · Prompting

Test-time compute is the extra reasoning, sampling, or search a model can do at inference time to improve quality — more thinking tokens, more candidate answers, or verifier-guided search.

Inference · Inference

Inference is what happens when you actually run a trained model on new input. For LLMs that means generating tokens one at a time, with sampling and a KV cache.

Large Language Model · Foundations

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

Reinforcement Learning from Human Feedback · Training

RLHF fine-tunes an LLM to maximize a reward model that was itself trained on human preference judgments between candidate responses.

Benchmark · Evaluation

A benchmark is a standardized test that scores models on a fixed task, letting you compare them on equal footing. MMLU, HumanEval, and HELM are common examples.
