Inference · intermediate
Top-p (nucleus sampling)
Top-p (nucleus sampling) restricts token selection to the smallest set of most-likely tokens whose cumulative probability reaches or exceeds p, then samples from that set after renormalizing its probabilities. Common values are 0.9 to 0.95.
Explanation
If the model assigns 50% probability to "the", 20% to "a", 10% to "this", and tiny amounts to thousands of other tokens, then with top-p = 0.8 it samples only from {the, a, this}, whose probabilities sum to exactly 80%. Every token in the long tail is excluded.
This avoids the failure mode where the model occasionally picks a wildly unlikely token from the long tail (which degrades coherence) while still allowing diversity within the high-probability region.
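The worked example above can be reproduced with a short sketch. This is a minimal illustration using only the Python standard library, not the implementation of any particular inference engine. The names `top_p_filter` and `sample_top_p`, the `tail_*` tokens, and the small `eps` tolerance are assumptions introduced here; the tolerance is needed because 0.5 + 0.2 + 0.1 is slightly below 0.8 in floating point.

```python
import random

def top_p_filter(probs, p, eps=1e-12):
    """Return the nucleus: the smallest set of highest-probability tokens
    whose cumulative probability reaches p, renormalized to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        # eps absorbs floating-point rounding in the running sum
        if cumulative >= p - eps:
            break
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Hypothetical distribution matching the example: three likely tokens
# plus a long tail of 200 tokens at 0.1% each.
probs = {"the": 0.5, "a": 0.2, "this": 0.1}
probs.update({f"tail_{i}": 0.001 for i in range(200)})

nucleus = top_p_filter(probs, 0.8)
print(list(nucleus))                       # ['the', 'a', 'this']
print(len(top_p_filter(probs, 1.0)))       # 203: p = 1.0 keeps every token
print(sample_top_p(probs, 0.8, random.Random(0)))  # one of the three nucleus tokens
```

After renormalization the three surviving tokens keep their relative odds (5 : 2 : 1), so "the" is drawn with probability 0.625 rather than 0.5.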
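Top-p is generally preferred over top-k because the size of the sampling set adapts to how confident the model is. When one token dominates, the nucleus may shrink to a single token; when probability is spread across many plausible tokens, it grows to include them. Top-k keeps exactly k candidates in both cases.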
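To make the difference from top-k concrete, the sketch below counts how many candidates each method keeps for a peaked and a flat distribution. The two distributions and the helper names `nucleus_size` and `top_k_size` are made up for illustration.

```python
def nucleus_size(probs, p, eps=1e-12):
    """Number of tokens top-p keeps: the shortest descending prefix
    whose cumulative probability reaches p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p - eps:  # eps absorbs floating-point rounding
            return count
    return len(probs)

def top_k_size(probs, k):
    """Number of tokens top-k keeps: always k (or fewer if the vocabulary is smaller)."""
    return min(k, len(probs))

confident = [0.95, 0.03, 0.01, 0.01]   # model is nearly sure of one token
uncertain = [0.1] * 10                 # ten equally plausible tokens

print(nucleus_size(confident, 0.9), top_k_size(confident, 3))  # 1 3
print(nucleus_size(uncertain, 0.9), top_k_size(uncertain, 3))  # 9 3
```

With the same p = 0.9, top-p keeps one candidate in the confident case and nine in the uncertain case, while top-k with k = 3 keeps three in both: too many when the model is sure, too few when it is not.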
Examples
- top-p = 0.9: typical for chat assistants.
- top-p = 1.0: no filtering, sample from full distribution.
Frequently asked
What is Top-p?
Top-p (nucleus sampling) restricts token selection to the smallest set of most-likely tokens whose cumulative probability reaches or exceeds p, then samples from that set after renormalizing its probabilities. Common values are 0.9 to 0.95.
What is an example of top-p?
top-p = 0.9: typical for chat assistants.
How is Top-p related to Temperature?
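Top-p and Temperature are both sampling parameters applied at inference time, and they are often tuned together. Temperature controls randomness by sharpening or flattening the whole probability distribution: 0 is deterministic (always pick the most likely token); higher values produce more diverse, surprising output. Top-p instead limits which tokens are eligible at all, cutting off the low-probability tail.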
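The interaction can be sketched in a few lines. This example assumes temperature is applied to the logits first and top-p is applied to the resulting probabilities, which is a common ordering but not one every library follows. The function names and the three-element logits are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature > 1 flattens, < 1 sharpens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Return the index of one sampled token."""
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: pick the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p - 1e-12:      # tolerance for floating-point rounding
            break
    # rng.choices normalizes the weights, so no explicit renormalization is needed
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

logits = [2.0, 1.0, 0.1]
print(sample(logits, temperature=0))                 # 0 (greedy)
print(sample(logits, temperature=1.0, top_p=0.5))    # 0 (nucleus is a single token)
print(sample(logits, temperature=1.5, top_p=0.95, rng=random.Random(0)))
```

Raising the temperature flattens the distribution, so the same top-p value admits more tokens into the nucleus; lowering it has the opposite effect.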
Is Top-p considered intermediate?
Top-p is generally considered intermediate-level material in the AI and LLM space.