Inference · intermediate
Top-k
Top-k restricts token sampling to the k highest-probability tokens, renormalizes their probabilities so they sum to 1, and then samples from that set. It is a simpler alternative to top-p.
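As a formula, the renormalization step looks like this. The notation is introduced here for illustration and is not from the original entry: p(x) is the model's next-token probability for token x, and V_k is the set of the k tokens with the highest p(x).

```latex
\tilde{p}(x) =
\begin{cases}
  \dfrac{p(x)}{\sum_{x' \in V_k} p(x')} & \text{if } x \in V_k, \\[1ex]
  0 & \text{otherwise.}
\end{cases}
```

The next token is then drawn from \tilde{p} instead of p.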
Explanation
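Pick a value of k (say, 50). At each decoding step the model samples only from the 50 most likely next tokens, and every other token is assigned zero probability. This caps the worst-case "weird token" failure mode: a token outside the top k can never be chosen, however many tokens are generated.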
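A minimal sketch of the mechanism in plain Python, assuming the model's output is given as a list of raw logits (one float per vocabulary token). The function names top_k_filter and sample_top_k are hypothetical and do not refer to any library API.

```python
import math
import random


def top_k_filter(logits, k):
    """Return probabilities where only the k highest-logit tokens are nonzero.

    logits: list of floats, one per vocabulary token (a toy stand-in for model output).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(logits))
    # Indices of the k largest logits; ties are broken by lower index first.
    top = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Softmax over the kept logits only (subtract the max for numerical stability).
    # This equals the full softmax restricted to the top k and renormalized.
    m = max(logits[i] for i in top)
    weights = {i: math.exp(logits[i] - m) for i in top}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(logits))]


def sample_top_k(logits, k, rng=random):
    """Sample one token index from the top-k renormalized distribution."""
    probs = top_k_filter(logits, k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = top_k_filter(logits, k=2)
# Tokens 2, 3 and 4 now have probability 0.0; tokens 0 and 1 share all the mass.
print(probs)
print(sample_top_k(logits, k=2, rng=random.Random(0)))
```

Computing the softmax over only the kept logits gives the same result as computing the full softmax and then renormalizing over the top k, but it skips exponentiating tokens that will be discarded anyway.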
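Top-k is less adaptive than top-p: it always keeps exactly k tokens, regardless of how confident or uncertain the model is at that step. When the distribution is sharply peaked, k = 50 still admits dozens of implausible tokens; when it is flat, k = 50 may cut off reasonable ones. Many systems prefer top-p for that reason, though top-k still appears in older codebases and some open-source defaults, and the two are often applied together.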
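To make the adaptivity difference concrete, the sketch below counts how many tokens each method keeps on two made-up next-token distributions over a 10-token vocabulary. The distributions and the helper names top_k_set and top_p_set are illustrative assumptions, not taken from any real model or library.

```python
def top_k_set(probs, k):
    """Indices kept by top-k: always the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return order[:k]


def top_p_set(probs, p):
    """Indices kept by top-p: the smallest high-probability prefix whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return kept


# Toy distributions (made-up numbers, each sums to 1).
confident = [0.91, 0.03, 0.02, 0.01, 0.01, 0.01, 0.005, 0.005, 0.0, 0.0]
uncertain = [0.12, 0.11, 0.11, 0.10, 0.10, 0.10, 0.10, 0.09, 0.09, 0.08]

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    print(name, "top-k=3 keeps", len(top_k_set(dist, 3)),
          "| top-p=0.9 keeps", len(top_p_set(dist, 0.9)))
# confident top-k=3 keeps 3 | top-p=0.9 keeps 1
# uncertain top-k=3 keeps 3 | top-p=0.9 keeps 9
```

Top-k keeps 3 tokens in both cases, while top-p shrinks to 1 token when the model is confident and grows to 9 when it is not.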
Examples
- top-k = 50: the default value of top_k in Hugging Face Transformers generation (it only takes effect when sampling is enabled).
- top-k = 1: same as greedy decoding (always pick the top token).
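The top-k = 1 case can be checked directly. This is a minimal sketch with a made-up four-token distribution; top_k_sample and greedy are hypothetical helper names written for this example.

```python
import random


def top_k_sample(probs, k, rng):
    """Sample a token index after keeping only the k most probable tokens."""
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    # random.choices does not require weights to sum to 1, so this
    # implicitly renormalizes over the kept tokens.
    return rng.choices(top, weights=[probs[i] for i in top], k=1)[0]


def greedy(probs):
    """Greedy decoding: always take the single most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])


probs = [0.1, 0.6, 0.2, 0.1]
rng = random.Random(42)
# With k = 1 the kept set contains one token, so every draw is the greedy choice.
assert all(top_k_sample(probs, 1, rng) == greedy(probs) for _ in range(1000))
```

With k = 1 the randomness disappears entirely: the kept set has one member, which always receives probability 1.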
Frequently asked
What is Top-k?
Top-k restricts token sampling to the k highest-probability tokens, then samples from that set. It is a simpler alternative to top-p.
What is an example of top-k?
Setting top-k = 50, the default in Hugging Face Transformers generation, means each token is sampled from the 50 most likely candidates.
How is Top-k related to Top-p?
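Both are truncation strategies for sampling at inference time. Top-k keeps a fixed number of candidate tokens, while top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p, so its size adapts to how confident the model is. Common values of p are 0.9 to 0.95. The two can be combined, with top-k applied first as a hard cap and top-p trimming the remaining set.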
Is Top-k considered intermediate?
Top-k is generally considered intermediate-level material in the AI and LLM space.