Prompting · intermediate
Self-Consistency
Self-consistency samples N chain-of-thought completions for the same problem and takes the majority answer. It improves accuracy on math and reasoning tasks at roughly N× the inference cost.
Explanation
Chain-of-thought prompting produces different reasoning paths when completions are sampled at a nonzero temperature. Self-consistency exploits this by sampling many paths and voting on the final answer: wrong paths tend to scatter across different wrong answers, while correct paths converge on the same one.
The result: substantial accuracy gains on math and reasoning benchmarks (e.g. on the order of 10 points on GSM8K, depending on the model) at the cost of N× compute. Gains flatten as N grows, so N is typically 5-40.
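The sample-then-vote loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: `sample_fn` is a hypothetical callable standing in for one sampled chain-of-thought completion from whatever model client is in use, and the `Answer: <number>` ending is an assumed prompt convention.

```python
import re
from collections import Counter
from typing import Callable, Optional, Tuple

# Assumed convention: the prompt asks the model to end with "Answer: <number>".
ANSWER_RE = re.compile(r"Answer:\s*\$?(-?[\d,]*\.?\d+)")


def extract_answer(completion: str) -> Optional[str]:
    """Pull the last numeric answer out of a chain-of-thought completion."""
    matches = ANSWER_RE.findall(completion)
    if not matches:
        return None  # unparseable completions do not get a vote
    value = float(matches[-1].replace(",", ""))
    # Normalize so that "18", "18.0" and "$18" all count as the same answer.
    return str(int(value)) if value.is_integer() else repr(value)


def self_consistency(
    sample_fn: Callable[[str], str],  # hypothetical: returns one sampled completion
    prompt: str,
    n: int = 10,
) -> Tuple[Optional[str], float]:
    """Sample n completions and return (majority answer, share of the n samples)."""
    answers = []
    for _ in range(n):
        answer = extract_answer(sample_fn(prompt))
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None, 0.0
    # On a tie, Counter.most_common keeps the answer that was seen first.
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n
```

The second return value is the share of samples that agreed with the winner. It can serve as a rough confidence signal, for example to flag problems where no answer reaches a majority.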
In 2024-2026, frontier reasoning models (such as o1 and R1) are trained to explore, check, and revise multiple lines of reasoning inside a single long chain of thought before committing to an answer, which captures much of the same benefit internally. Explicit self-consistency at the application level is now most relevant for non-reasoning models.
Examples
- A GSM8K eval: sample 32 CoT completions per problem, take the majority numeric answer.
- A SQL-generation app: sample 5 candidate queries, run each against a test database, and pick the result that most candidates agree on (or, when the expected row count is known, the query that returns it).
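For the SQL case, one way to turn "sample several queries" into a vote is to execute each candidate and group candidates by the rows they return, so that differently written but equivalent queries count as the same answer. The sketch below assumes a SQLite database and uses Python's standard `sqlite3` module. The table, the candidate queries, and the function name `vote_on_queries` are hypothetical, and the sketch votes on result sets instead of checking an expected row count.

```python
import sqlite3
from collections import Counter


def vote_on_queries(conn, candidates):
    """Run each candidate and return (query, votes) for the most common result set."""
    executed = []  # (sql, canonical rows) for every candidate that ran
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except (sqlite3.Error, sqlite3.Warning):
            continue  # queries that fail to run get no vote
        # Sort rows so ordering differences do not split the vote
        # (assumes row order is not part of the expected answer).
        executed.append((sql, tuple(sorted(rows, key=repr))))
    if not executed:
        return None, 0
    winner_rows, votes = Counter(rows for _, rows in executed).most_common(1)[0]
    # Return the first candidate that produced the winning result set.
    winner_sql = next(sql for sql, rows in executed if rows == winner_rows)
    return winner_sql, votes


# Hypothetical scratch database; model-generated SQL should only ever be run
# against a disposable or read-only copy of the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "paid"), (2, "open"), (3, "paid")],
)

candidates = [
    "SELECT id FROM orders WHERE status = 'paid'",
    "SELECT id FROM orders WHERE status IN ('paid') ORDER BY id DESC",
    "SELECT id FROM orders WHERE status = 'open'",
    "SELECT id FROM order WHERE status = 'paid'",  # invalid: fails to parse
    "SELECT id FROM orders WHERE status <> 'open'",
]
best_sql, votes = vote_on_queries(conn, candidates)
```

Grouping by executed result is a design choice: comparing the raw SQL strings would almost never produce a majority, because equivalent queries are rarely written identically.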
When to use self-consistency
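When the task has a final answer that can be compared or checked across samples (math, logic, code that compiles or passes tests) and N× compute is acceptable. It fits open-ended generation poorly, because free-form outputs rarely match each other exactly.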
Frequently asked
How is Self-Consistency related to Chain-of-Thought?
Self-consistency builds on chain-of-thought. Chain-of-thought prompting asks the model to show its reasoning step by step before giving a final answer, which markedly improves performance on multi-step problems. Self-consistency samples several such reasoning chains for the same problem and votes on their final answers.