ModelTerms

Inference · intermediate

Top-p (nucleus sampling)

Top-p (nucleus sampling) restricts token selection to the smallest set of most-likely tokens whose cumulative probability reaches or exceeds p, then samples from that set. Common values are 0.9 to 0.95.

Explanation

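If the model assigns 50% probability to "the", 20% to "a", 10% to "this", and tiny amounts to thousands of other tokens, top-p = 0.8 samples only from {the, a, this}, whose cumulative probability is 80%. The probabilities of those three tokens are renormalized to sum to 1 before sampling, giving 62.5%, 25%, and 12.5%.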

This avoids the failure mode where the model occasionally picks a wildly unlikely token from the long tail (which degrades coherence) while still allowing diversity within the high-probability region.
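The worked example above can be sketched in a few lines of Python. This is a minimal illustration, not any library's implementation: the function names `top_p_filter` and `sample_top_p`, the 2,000 made-up tail tokens, and the small floating-point tolerance are all assumptions introduced here.

```python
import random

def top_p_filter(probs, p):
    """Return the nucleus: the smallest set of most-likely tokens whose
    cumulative probability reaches p, renormalized to sum to 1."""
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = []
    cumulative = 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        # Small tolerance (an assumption) so rounding error such as
        # 0.5 + 0.2 + 0.1 == 0.7999999999999999 still counts as reaching 0.8.
        if cumulative >= p - 1e-9:
            break
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Toy distribution from the text: three likely tokens plus a long tail.
probs = {"the": 0.5, "a": 0.2, "this": 0.1}
for i in range(2000):  # hypothetical tail tokens sharing the remaining 20%
    probs[f"tail_{i}"] = 0.2 / 2000

nucleus = top_p_filter(probs, 0.8)
print(sorted(nucleus))                 # tokens that survive the filter
print(sample_top_p(probs, 0.8))        # one sampled token
print(len(top_p_filter(probs, 1.0)))   # top-p = 1.0 keeps every token
```

With top-p = 0.8 none of the tail tokens can ever be sampled, while "the", "a", and "this" keep their relative odds.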

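Top-p is often preferred over top-k because the size of the sampling set adapts to how confident the model is. When one token dominates, the nucleus shrinks to a handful of tokens; when probability is spread across many plausible tokens, it grows. Top-k keeps the same fixed number of tokens in both cases.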
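To make the contrast with top-k concrete, the sketch below counts how many tokens each method keeps for a confident and an uncertain distribution. Both distributions and the helper names `top_k_size` and `nucleus_size` are invented for illustration.

```python
def top_k_size(probs, k):
    """Top-k always keeps k tokens (or all of them if fewer exist)."""
    return min(k, len(probs))

def nucleus_size(probs, p):
    """Number of most-likely tokens needed for cumulative probability p."""
    cumulative = 0.0
    for size, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p - 1e-9:  # tolerance for floating-point rounding
            return size
    return len(probs)

# Hypothetical next-token distributions over an 11- and a 10-token vocabulary.
confident = [0.96] + [0.004] * 10   # one token dominates
uncertain = [0.1] * 10              # ten equally plausible tokens

for name, dist in (("confident", confident), ("uncertain", uncertain)):
    print(name,
          "top-k=5 keeps", top_k_size(dist, 5),
          "| top-p=0.9 keeps", nucleus_size(dist, 0.9))
```

In the confident case top-k = 5 still admits four unlikely tokens, while top-p = 0.9 keeps only the dominant one. In the uncertain case top-k = 5 discards half of the equally plausible tokens, while top-p = 0.9 keeps nine of them.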

Examples

  • top-p = 0.9: typical for chat assistants.
  • top-p = 1.0: no filtering; samples from the full distribution.

Frequently asked

How is Top-p related to Temperature?

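Top-p and temperature are both sampling parameters set at inference time, and they are often used together. Temperature is a generation parameter that controls randomness by rescaling the model's scores before they are turned into probabilities: 0 is deterministic (always pick the most likely token), and higher values produce more diverse, surprising output. Top-p truncates the resulting distribution to its high-probability nucleus. When both are set, temperature is typically applied first, so a higher temperature flattens the distribution and enlarges the nucleus for the same p.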
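The interaction can be sketched as follows, assuming temperature is applied to the logits before top-p truncation (one common ordering; implementations may differ). The logits and the helper names `softmax_with_temperature` and `nucleus_size` are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Temperature 0 is usually handled separately as greedy decoding.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, p):
    """Number of most-likely tokens needed for cumulative probability p."""
    cumulative = 0.0
    for size, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p - 1e-9:  # tolerance for floating-point rounding
            return size
    return len(probs)

logits = [4.0, 3.0, 2.0, 1.0, 0.0]  # hypothetical scores for five tokens

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(f"temperature={temperature}: top-p=0.9 keeps "
          f"{nucleus_size(probs, 0.9)} of {len(probs)} tokens")
```

The nucleus grows as the temperature rises, which is why the two parameters are usually tuned together rather than independently.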

Is Top-p considered intermediate?

Top-p is generally considered intermediate-level material in the AI and LLM space.
