ModelTerms

Inference · intermediate

Top-k

Top-k sampling restricts each decoding step to the k highest-probability tokens, renormalizes their probabilities, and samples from that set. It is a simpler alternative to top-p.

Explanation

Pick k (say, 50), and at each step the model samples only from the 50 most likely next tokens. Every token outside that set gets zero probability, so a very unlikely token from the long tail of the distribution can never be chosen.
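The step described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any library's implementation: the function names top_k_filter and top_k_sample are hypothetical, and the input is assumed to be a list of raw logits for one decoding step.

```python
import math
import random

def top_k_filter(logits, k):
    """Return {token_index: probability} over the k highest-logit tokens."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    # Indices of the k largest logits (ties broken by lower index).
    top = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Softmax over the survivors only, shifted by the max for stability.
    # This equals renormalizing the full softmax restricted to the top k.
    m = max(logits[i] for i in top)
    weights = {i: math.exp(logits[i] - m) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def top_k_sample(logits, k, rng=random):
    """Draw one token index from the renormalized top-k distribution."""
    probs = top_k_filter(logits, k)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Toy 5-token vocabulary (made-up logits, for illustration only).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(top_k_filter(logits, 2))   # only tokens 0 and 1 keep nonzero probability
print(top_k_sample(logits, 2))   # always 0 or 1
```

Tokens outside the top k never appear in the returned dictionary, which is how the tail of the distribution is cut off.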

Top-k is less adaptive than top-p: it always keeps exactly k tokens, whether the model is confident or uncertain at that step. When the model is confident, a fixed k can admit implausible tokens; when many continuations are plausible, the same k can cut off reasonable ones. Top-p is often preferred for that reason, though top-k still appears in older codebases and some open-source defaults.
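The difference in adaptivity can be seen by counting how many tokens each method keeps on a confident versus an uncertain distribution. The sketch below uses made-up probabilities and hypothetical helper names (top_k_set, top_p_set); top-p is taken here to mean keeping the smallest set of tokens whose cumulative probability reaches p.

```python
def ranked(probs):
    """Token indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: (-probs[i], i))

def top_k_set(probs, k):
    """Top-k: always the first k tokens in ranked order."""
    return ranked(probs)[:k]

def top_p_set(probs, p):
    """Top-p: smallest ranked prefix whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for i in ranked(probs):
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

confident = [0.97, 0.01, 0.01, 0.005, 0.005]  # one dominant token
uncertain = [0.2, 0.2, 0.2, 0.2, 0.2]         # five equally likely tokens

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    print(name, len(top_k_set(probs, 3)), len(top_p_set(probs, 0.9)))
# confident: top-k keeps 3 tokens, top-p keeps 1
# uncertain: top-k keeps 3 tokens, top-p keeps 5
```

With k = 3, top-k keeps two near-zero tokens in the confident case and drops two equally good tokens in the uncertain case, while the top-p set shrinks and grows with the shape of the distribution.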

Examples

  • top-k = 50: a common default in Hugging Face generation.
  • top-k = 1: same as greedy decoding (always pick the top token).
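The k = 1 case can be checked directly: with only one candidate left, sampling has nothing to choose between, so the result matches the argmax. This is a small sketch with a hypothetical helper (sample_top_k) and made-up probabilities, not a library call.

```python
import random

def sample_top_k(probs, k, rng):
    """Sample an index from the k most probable entries of probs."""
    top = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    # random.choices accepts unnormalized weights, so no explicit renormalizing.
    return rng.choices(top, weights=[probs[i] for i in top], k=1)[0]

probs = [0.1, 0.6, 0.25, 0.05]

# Greedy decoding: always take the single most probable token.
greedy = max(range(len(probs)), key=lambda i: probs[i])

# Top-k with k = 1, repeated many times with a seeded generator.
rng = random.Random(123)
draws = {sample_top_k(probs, 1, rng) for _ in range(100)}

print(greedy, draws)  # the only index ever drawn is the greedy one
```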

Frequently asked

How is Top-k related to Top-p?

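Top-k and top-p are both truncation methods applied at sampling time: each discards part of the next-token distribution before sampling from what remains. Top-k keeps a fixed number of tokens, while top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches p, so the size of that set varies from step to step. Common values of p are 0.9 to 0.95.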

Is Top-k considered intermediate?

Top-k is generally considered intermediate-level material in the AI and LLM space.
