ModelTerms

Training · intermediate

Loss Function (objective, cost function)

A loss function measures how wrong a model's predictions are, and training adjusts the model's weights to minimize it. For LLMs, the loss is typically the cross-entropy between the model's predicted next-token distribution and the actual next token.

Explanation

For LLMs, the loss is almost always cross-entropy over the vocabulary: the negative log-probability the model assigned to the correct next token, averaged over all positions and all training examples.
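The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular framework implements it: the function names `log_softmax` and `cross_entropy`, the three-token vocabulary, and the logit values are all made up for the example.

```python
import math

def log_softmax(logits):
    """Convert raw scores over the vocabulary into log-probabilities."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits_per_position, target_ids):
    """Mean negative log-probability assigned to the correct next token."""
    losses = [
        -log_softmax(logits)[target]
        for logits, target in zip(logits_per_position, target_ids)
    ]
    return sum(losses) / len(losses)

# Hypothetical toy data: a 3-token vocabulary and two positions.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]  # index of the correct next token at each position
loss = cross_entropy(logits, targets)
```

A model that spreads its probability evenly over a vocabulary of size V scores log(V) at that position, which is a useful baseline: a loss below it means the model is doing better than guessing uniformly.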

Lower loss usually means a better model, but not always. This is especially true after RLHF, where the goal shifts to maximizing a learned reward rather than matching specific text.

Examples

  • Cross-entropy loss in next-token prediction.
  • Reward model loss in RLHF: measures how well the reward model ranks pairs of responses.
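To make the second example concrete, here is a sketch of one commonly used pairwise ranking loss, the negative log-sigmoid of the reward margin. The entry itself does not say which formula a reward model uses, so treat this choice as an assumption; `pairwise_ranking_loss` and the reward values are hypothetical.

```python
import math

def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(reward_chosen - reward_rejected)) for one preference pair."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical scalar rewards for a preferred and a rejected response.
good_ranking = pairwise_ranking_loss(reward_chosen=2.0, reward_rejected=0.0)
bad_ranking = pairwise_ranking_loss(reward_chosen=0.0, reward_rejected=2.0)
```

The loss is small when the preferred response gets the higher reward and grows as the ranking is reversed. When both responses receive the same reward, it equals log(2).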

Frequently asked

What is a loss function?

A loss function measures how wrong a model's predictions are, and training adjusts the model's weights to minimize it. For LLMs, the loss is typically the cross-entropy between the model's predicted next-token distribution and the actual next token.

What is an example of a loss function?

Cross-entropy loss in next-token prediction.

How is a loss function related to gradient descent?

Both are training concepts: the loss function defines the quantity to minimize, and gradient descent is the optimization algorithm at the heart of training that minimizes it. Gradient descent nudges each weight in the direction that reduces the loss, with a small step size set by the learning rate.
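As a rough illustration of that relationship, the sketch below runs gradient descent on a one-weight toy loss, (w - 3)^2. The loss, its minimum at 3, the learning rate, and the step count are all invented for the example; real training applies the same update to millions or billions of weights using gradients computed by backpropagation.

```python
def loss(w):
    """Toy loss with a single weight; its minimum is at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly step the weight against the gradient of the loss."""
    for _ in range(steps):
        w -= learning_rate * grad(w)  # step size is set by the learning rate
    return w

w_final = gradient_descent(w=0.0)
```

Each step moves the weight a fraction of the way toward the minimum, so the loss shrinks steadily. With this particular loss, a learning rate above 1.0 would overshoot by more than it corrects and the weight would diverge.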

Is the loss function considered an intermediate topic?

Yes. The loss function is generally considered intermediate-level material in the AI and LLM space.
