ModelTerms

Inference · beginner

Temperature

Temperature is a generation parameter that controls how random a model's token sampling is. At 0, decoding is greedy (always pick the most likely token), so output is deterministic in principle; higher values produce more diverse, surprising output.

Explanation

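Mathematically, temperature T divides the logits z before softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T). temperature=1 is "use the model's native distribution"; less than 1 sharpens the distribution (more conservative); greater than 1 flattens it (more creative). Because dividing by 0 is undefined, implementations treat temperature=0 as a special case and simply take the highest-scoring token.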
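To make the scaling concrete, here is a minimal sketch of softmax with temperature in plain Python. The function name softmax_with_temperature and the three example logits are illustrative assumptions, not taken from any particular library.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        # Temperature 0 cannot be used as a divisor; callers should use argmax instead.
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these inputs, the top token's probability shrinks as the temperature rises, which is the flattening effect described above.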

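In practice: 0.0-0.3 for factual or code generation, around 0.7 for general chat, 1.0+ for brainstorming. These are rules of thumb, not fixed thresholds. Hosted APIs typically cap temperature, commonly at 2.0 and sometimes lower, because very high values flatten the distribution so much that the output becomes incoherent.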
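One way to see why very high temperatures degrade output is to measure how close the token distribution gets to uniform. The sketch below computes entropy at several temperatures; the eight logits are made up for illustration, and the helper names are assumptions.

```python
import math

def temperature_probs(logits, temperature):
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means closer to a uniform choice."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical logits for an 8-token vocabulary.
logits = [5.0, 3.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0]
max_entropy = math.log2(len(logits))  # entropy of a uniform distribution

temperatures = (0.2, 0.7, 1.0, 2.0, 10.0)
entropies = [entropy_bits(temperature_probs(logits, t)) for t in temperatures]
for t, h in zip(temperatures, entropies):
    print(f"T={t}: entropy {h:.2f} of {max_entropy:.2f} bits")
```

As the temperature grows, entropy climbs toward the uniform maximum, meaning the model's preferences count for less and less in each draw.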

Temperature is usually paired with top-p or top-k sampling, which truncate the distribution's long tail before a token is drawn.
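A minimal sketch of that pairing, using top-k as the truncation step: keep the k highest-scoring tokens, apply temperature to what remains, then draw one. The function sample_top_k and its defaults are hypothetical; real inference libraries expose this through their own parameters.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, k=3, rng=random):
    """Return a token index sampled from the k highest logits."""
    # Truncation: indices of the k largest logits, best first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    if temperature == 0:
        return top[0]  # greedy: no randomness at all
    # Temperature scaling on the surviving logits only.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]

# Hypothetical logits for a 5-token vocabulary.
logits = [3.0, 2.5, 0.1, -1.0, -4.0]
rng = random.Random(0)  # seeded so the run is repeatable
draws = [sample_top_k(logits, temperature=1.0, k=2, rng=rng) for _ in range(200)]
print(sorted(set(draws)))
```

With k=2, only the two best tokens can ever be chosen, no matter how high the temperature is set.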

Examples

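  • Temperature 0: same prompt, same response, in principle every time (hosted APIs can still show small variations from floating-point and batching effects).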
  • Temperature 1.0: a chat model gives varied but coherent responses.
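The two examples above can be imitated on a toy distribution. This sketch picks one token repeatedly at temperature 0 and at 1.0; the vocabulary, logits, and pick_token helper are invented for illustration and do not come from a real model.

```python
import math
import random

def pick_token(logits, temperature, rng):
    """Greedy argmax at temperature 0, otherwise temperature-scaled sampling."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical next-token candidates and their scores.
vocab = ["the", "a", "one", "some"]
logits = [2.0, 1.5, 0.5, 0.0]
rng = random.Random(0)  # seeded so the run is repeatable

greedy = {vocab[pick_token(logits, 0, rng)] for _ in range(20)}
sampled = {vocab[pick_token(logits, 1.0, rng)] for _ in range(200)}
print("temperature 0:  ", greedy)
print("temperature 1.0:", sampled)
```

At temperature 0 every run selects the same top-scoring token, while at 1.0 several different tokens appear across runs.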

When to use temperature

Low for code/extraction; medium for chat; high for creative writing.

Frequently asked

How is Temperature related to Top-p?

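Temperature and Top-p are both sampling parameters applied at inference time, and they are often used together. Temperature reshapes the whole probability distribution, while Top-p (nucleus sampling) restricts token selection to the smallest set of tokens whose cumulative probability reaches p. Common values are 0.9-0.95.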
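As a rough sketch of the nucleus step, the function below keeps the smallest set of tokens whose cumulative probability reaches p and renormalizes them. The name top_p_filter and the example probabilities are assumptions; in a full pipeline the input probabilities would typically already reflect the chosen temperature.

```python
def top_p_filter(probs, p=0.9):
    """Return {token_index: renormalized_prob} for the nucleus of size p."""
    # Visit tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # smallest set reaching the threshold
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / cumulative for i in kept}

# Hypothetical distribution over five tokens.
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
nucleus = top_p_filter(probs, p=0.9)
print(nucleus)
```

Here the two least likely tokens fall outside the nucleus and can no longer be sampled.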

Is Temperature considered beginner?

Temperature is generally considered beginner-level material in the AI and LLM space.
