Training · intermediate
Loss Function (objective, cost function)
A loss function measures how wrong a model's predictions are, and training adjusts the model's weights to minimize it. For LLMs, the loss is the cross-entropy between the predicted next-token distribution and the actual next token.
Explanation
For LLMs, the loss is almost always cross-entropy over the vocabulary: the negative log-probability the model assigned to the correct next token, averaged over all positions and all training examples.
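The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular framework implements it; the function names `softmax` and `cross_entropy`, the three-token vocabulary, and the logit values are all made up for the example.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_per_position, target_ids):
    """Mean negative log-probability assigned to the correct token at each position."""
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy input (hypothetical): a 3-token vocabulary and 2 positions.
logits = [[2.0, 0.5, 0.1],   # fairly confident in token 0
          [0.0, 0.0, 0.0]]   # no preference at all
targets = [0, 2]             # the tokens that actually came next
loss = cross_entropy(logits, targets)
```

A position where the model is completely undecided over V tokens contributes exactly log(V) to the average, which is a handy reference point when reading loss curves.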
Lower loss usually means a better model, but not always. After RLHF in particular, the goal shifts to maximizing a learned reward rather than matching specific text, so cross-entropy against reference text is no longer a reliable measure of quality.
Examples
- Cross-entropy loss in next-token prediction.
- Reward model loss in RLHF: measures how well the reward model ranks pairs of responses.
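As an illustration of the second example, one common way to score a ranked pair is a Bradley-Terry style loss: the negative log-sigmoid of the difference between the reward for the preferred response and the reward for the rejected one. The sketch below assumes that formulation; the entry itself does not say which ranking loss is meant, and the function name and reward values are hypothetical.

```python
import math

def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(reward_chosen - reward_rejected)), computed in a stable form."""
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        # log(1 + exp(-diff)); exp(-diff) is at most 1 here
        return math.log1p(math.exp(-diff))
    # Same quantity rewritten so that exp() never receives a large positive argument
    return -diff + math.log1p(math.exp(diff))

# Hypothetical reward scores for a preferred and a rejected response.
correct_order = pairwise_ranking_loss(2.0, 0.0)   # preferred response scored higher
wrong_order = pairwise_ranking_loss(0.0, 2.0)     # preferred response scored lower
```

The loss is small when the reward model scores the preferred response higher and grows as the ordering gets worse, so minimizing it pushes the model toward ranking pairs correctly.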
Frequently asked
What is a loss function?
A loss function measures how wrong a model's predictions are, and training adjusts the model's weights to minimize it. For LLMs, the loss is the cross-entropy between the predicted next-token distribution and the actual next token.
What is an example of a loss function?
Cross-entropy loss in next-token prediction.
How is a loss function related to gradient descent?
The loss function defines what training tries to minimize, and gradient descent is how the minimizing is done. Gradient descent is the optimization algorithm at the heart of training: it nudges each weight in the direction that reduces the loss, with a small step size set by the learning rate.
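To make the relationship concrete, here is a minimal sketch of gradient descent on a one-parameter toy loss, (w - 3)^2. The loss, starting point, learning rate, and step count are all assumptions chosen for illustration; real training applies the same update to every weight, using gradients computed by backpropagation.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0               # initial weight (arbitrary)
learning_rate = 0.1   # step size (arbitrary)
initial_loss = loss(w)

for _ in range(100):
    # Move against the gradient, i.e. in the direction that reduces the loss.
    w -= learning_rate * grad(w)

final_loss = loss(w)
```

Each step shrinks the distance to the minimum by a constant factor here, so `w` approaches 3 and the loss approaches 0.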
Is the loss function considered an intermediate topic?
The loss function is generally considered intermediate-level material in the AI and LLM space.