Comparison

Loss Function vs Pretraining

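Loss Function and Pretraining are both common AI/LLM terms, and both belong to training, but they name different things: a loss function is a measure of prediction error, while pretraining is a phase of training that minimizes that measure. Here is a quick side-by-side.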

When you would reach for Loss Function

Loss Function comes up when the question is about what training optimizes: how a model's prediction error is measured, and what number training drives down.

Example: cross-entropy loss in next-token prediction.
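As a rough illustration of cross-entropy in next-token prediction, here is a minimal sketch in plain Python. The function names, the 3-token vocabulary, and the logit values are all made up for the example; real LLM training computes the same quantity with a tensor library over a much larger vocabulary.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_cross_entropy(logits_per_position, target_ids):
    """Mean negative log-probability assigned to the actual next tokens."""
    losses = []
    for logits, target in zip(logits_per_position, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical example: 3-token vocabulary, two positions.
logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]]
good_targets = [0, 2]  # the model gave these tokens the highest scores
bad_targets = [1, 0]   # the model gave these tokens low scores

good_loss = next_token_cross_entropy(logits, good_targets)
bad_loss = next_token_cross_entropy(logits, bad_targets)
```

The loss is lower when the model puts high probability on the token that actually comes next (`good_loss` is smaller than `bad_loss`), which is the sense in which it measures how wrong the predictions are.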

When you would reach for Pretraining

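Pretraining comes up when the question is about the initial training phase: how a base model learns from general text before it is adapted to anything more specific.

Example: GPT-3 was pretrained on roughly 300B tokens.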

Frequently asked

What is the difference between Loss Function and Pretraining?

Loss Function: A loss function measures how wrong a model's predictions are, and training minimizes it. For LLMs the loss is the cross-entropy between the predicted next-token distribution and the actual next token.

Pretraining: Pretraining is the initial training phase in which an LLM learns to predict the next token on a very large corpus of general text, typically hundreds of billions to trillions of tokens. It produces a base model that can be adapted later.
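To show how the two terms fit together, here is a toy sketch in which "pretraining" is a loop that repeatedly lowers the next-token cross-entropy loss. Everything in it is an assumption made for illustration: the model is a simple table of logits indexed by the previous token (a bigram model), the corpus is a twelve-character string, and `pretrain_bigram`, `steps`, and `lr` are hypothetical names. Actual LLM pretraining uses neural networks and vastly more data, but the relationship between the loop and the loss is the same.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pretrain_bigram(token_ids, vocab_size, steps=200, lr=0.5):
    """Toy 'pretraining': fit next-token logits by gradient descent."""
    # weights[prev][nxt] is the logit for token `nxt` following token `prev`.
    weights = [[0.0] * vocab_size for _ in range(vocab_size)]
    pairs = list(zip(token_ids[:-1], token_ids[1:]))
    history = []  # mean cross-entropy loss measured at each step
    for _ in range(steps):
        grads = [[0.0] * vocab_size for _ in range(vocab_size)]
        total_loss = 0.0
        for prev, nxt in pairs:
            probs = softmax(weights[prev])
            total_loss += -math.log(probs[nxt])
            # Gradient of cross-entropy w.r.t. logits: probs - one_hot(target).
            for j in range(vocab_size):
                grads[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
        history.append(total_loss / len(pairs))
        # Gradient descent update, averaged over all training pairs.
        for i in range(vocab_size):
            for j in range(vocab_size):
                weights[i][j] -= lr * grads[i][j] / len(pairs)
    return weights, history

# Hypothetical tiny corpus, tokenized per character.
text = "abababababab"
vocab = sorted(set(text))
ids = [vocab.index(ch) for ch in text]
weights, history = pretrain_bigram(ids, len(vocab))
```

Here `history` records the loss at each step; it starts at the value for a uniform guess over the vocabulary and falls as the table learns that "b" follows "a" and vice versa. The loss function is the number being tracked, and pretraining is the process that drives it down.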

When should I use Loss Function vs Pretraining?

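Loss Function is the right concept when you are asking how a model's error is measured and what training minimizes. Pretraining applies when you are asking about the initial training phase itself: the general text it uses, its scale, and the base model it produces.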

Are Loss Function and Pretraining the same thing?

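No. A loss function is the measure that training minimizes; pretraining is the initial training phase in which that measure is minimized over general text. They are related, since pretraining works by minimizing a loss function, but one is an objective and the other is a process.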