Inference · beginner
Temperature
Temperature is a generation parameter that controls how random token sampling is. At 0, decoding is greedy (the most likely token is always picked), so output is effectively deterministic; higher values produce more diverse, surprising output.
Explanation
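Mathematically, the logits are divided by the temperature before softmax. temperature=1 means "use the model's native distribution"; values below 1 sharpen the distribution (more conservative); values above 1 flatten it (more creative). Because dividing by 0 is undefined, temperature=0 is handled as a special case: pick the highest-logit token directly.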
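The scaling step can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation; the function name softmax_with_temperature and the logits are made up for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        # Assumption for this sketch: callers handle temperature 0 as greedy argmax.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these made-up logits, the top token's share of the probability grows as the temperature drops and shrinks as it rises.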
In practice: 0.0-0.3 for factual or code generation, around 0.7 for general chat, 1.0+ for brainstorming. APIs typically cap the value (commonly at 1.0 or 2.0, depending on the provider) because very high temperatures produce gibberish.
Temperature is almost always paired with top-p or top-k sampling to truncate the distribution's long tail before sampling.
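As a rough sketch of that pairing, the snippet below keeps only the top-k logits, applies temperature, and samples from what remains. The function name sample_top_k, its defaults, and the logits are illustrative assumptions, not a real API.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=3, rng=None):
    """Keep the top_k highest logits, scale by temperature, sample one index."""
    rng = rng or random.Random()
    # Truncate the long tail: indices of the k largest logits, best first.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    if temperature == 0:
        return kept[0]  # greedy: always the most likely token
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(kept, weights=weights, k=1)[0]

logits = [3.0, 2.0, 1.0, -5.0, -6.0]  # hypothetical scores for five tokens
rng = random.Random(0)
print([sample_top_k(logits, temperature=1.5, top_k=3, rng=rng) for _ in range(10)])
```

Even at a high temperature, the two tail tokens (indices 3 and 4) can never be drawn, because they are removed before sampling.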
Examples
- Temperature 0: the same prompt gives the same response on essentially every run (small hardware or batching differences can occasionally change a token).
- Temperature 1.0: a chat model gives varied but coherent responses.
When to use temperature
Use low values for code and extraction, medium values for chat, and high values for creative writing.
Frequently asked
What is Temperature?
Temperature is a generation parameter that controls how random token sampling is. At 0, decoding is greedy (the most likely token is always picked), so output is effectively deterministic; higher values produce more diverse, surprising output.
What is an example of temperature?
Temperature 0: the same prompt gives the same response on essentially every run.
How is Temperature related to Top-p?
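Temperature and Top-p are both sampling controls applied at inference time, and they are often used together. Temperature reshapes the whole probability distribution, while Top-p (nucleus sampling) restricts token selection to the smallest set of tokens whose cumulative probability reaches p. Common values for p are 0.9-0.95.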
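To make the nucleus definition concrete, here is a small sketch that returns which tokens survive a given p. The helper name nucleus and both probability lists are invented for illustration.

```python
def nucleus(probs, p=0.9):
    """Return indices of the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

sharp = [0.85, 0.10, 0.05]  # hypothetical low-temperature distribution
flat = [0.40, 0.35, 0.25]   # hypothetical high-temperature distribution
print(nucleus(sharp, p=0.9))  # [0, 1]
print(nucleus(flat, p=0.9))   # [0, 1, 2]
```

The sharper distribution needs fewer tokens to reach p, which is one way the two settings interact: lowering temperature tends to shrink the set that Top-p keeps.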
When should I use temperature?
Use low values for code and extraction, medium values for chat, and high values for creative writing.
Is Temperature considered beginner?
Temperature is generally considered beginner-level material in the AI and LLM space.