Inference · intermediate
Sampling
Sampling is the act of choosing the next token from the model's output distribution, typically after applying temperature and a truncation strategy like top-p or top-k.
Explanation
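An LLM produces a probability for every token in its vocabulary. Sampling is how a concrete token gets picked from that distribution. The simplest strategy is greedy decoding, which always picks the single most probable token. The alternatives draw a token at random instead: temperature reshapes the distribution, and top-p or top-k truncates it to the most likely candidates, which keeps the randomness under control.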
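The strategies above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not any library's or provider's implementation. It assumes the model hands us raw logits as a plain list, treats a temperature of exactly 0 as greedy decoding, and applies top-k before top-p, which is one common ordering but an assumption here. The function names and the example logits are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token index from logits (hypothetical helper).

    temperature == 0 is treated as greedy decoding (argmax).
    top_k keeps only the k most probable tokens; top_p keeps the smallest
    set of most-probable tokens whose cumulative probability reaches top_p.
    """
    if temperature == 0:
        # Greedy: always return the single most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])

    probs = softmax(logits, temperature)
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k is not None:
        order = order[:top_k]

    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # random.choices treats weights as relative, so this implicitly
    # renormalizes over the surviving tokens before drawing one.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(sample_next_token(logits, temperature=0))  # greedy: always index 0
print(sample_next_token(logits, temperature=0.7, top_k=2, top_p=0.9, rng=random.Random(42)))
```

Greedy decoding returns the same token every time, while the second call can return either of the two tokens that survive truncation.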
Sampling settings shape the "feel" of model output more than many people realize. A poorly chosen configuration can make an excellent model sound dull and repetitive (too little randomness) or incoherent (too much).
Examples
- OpenAI default: temperature 1.0, top-p 1.0.
- Anthropic default: temperature 1.0, with top-p and top-k left unset unless you specify them.
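With OpenAI's defaults, a temperature of 1.0 leaves the logits unscaled and a top-p of 1.0 keeps the entire vocabulary, so the token is drawn from the model's own, untruncated distribution. The sketch below makes that concrete with a hypothetical `nucleus` helper (top-p filtering) and made-up logits; it illustrates the semantics of the parameters and is not either provider's code.

```python
import math


def softmax(logits, temperature=1.0):
    """Probabilities from logits; temperature 1.0 leaves the logits unscaled."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest top-probability set whose mass reaches top_p (hypothetical helper)."""
    if top_p >= 1.0:
        # top_p = 1.0 disables truncation. Short-circuiting also avoids
        # floating-point sums landing slightly above or below 1.0.
        return list(range(len(probs)))
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return sorted(kept)


probs = softmax([2.0, 1.0, 0.5, -1.0])  # hypothetical logits, temperature 1.0
print(nucleus(probs, 1.0))  # [0, 1, 2, 3]: default top-p keeps every token
print(nucleus(probs, 0.9))  # [0, 1, 2]: the low-probability tail is cut
```

Lowering top-p below 1.0 is what actually removes unlikely tokens; at the default value, every token remains a candidate.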
Frequently asked
What is Sampling?
Sampling is the act of choosing the next token from the model's output distribution, typically after applying temperature and a truncation strategy like top-p or top-k.
What is an example of sampling?
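OpenAI's default configuration samples at temperature 1.0 with top-p 1.0, which draws the next token directly from the model's unscaled, untruncated distribution.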
How is Sampling related to Temperature?
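Temperature is one of the main parameters applied during sampling. It controls randomness by dividing the model's logits before they are turned into probabilities: values below 1 sharpen the distribution toward the most likely token, and values above 1 flatten it. A temperature of 0 is treated as greedy decoding (always pick the most likely token), which is deterministic in principle; higher values produce more diverse, surprising output.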
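The effect of temperature can be seen by computing $p_i = \exp(z_i / T) / \sum_j \exp(z_j / T)$ for a few values of $T$. This is a minimal sketch with hypothetical logits. A temperature of exactly 0 would divide by zero in this formula, which is why it is usually special-cased as greedy decoding rather than computed directly.

```python
import math


def softmax_with_temperature(logits, temperature):
    """p_i = exp(z_i / T) / sum_j exp(z_j / T), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
for t in (0.1, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At low temperature nearly all probability lands on the first token, which is why T approaching 0 behaves like greedy decoding; at higher temperature probability spreads toward less likely tokens, while their ranking stays the same.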
Is Sampling considered intermediate?
Sampling is generally considered intermediate-level material in the AI and LLM space.