Comparison

Scaling Laws vs Training Compute

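Scaling Laws and Training Compute are both common AI/LLM terms, but they are different kinds of thing: scaling laws describe a relationship, while training compute is a quantity that relationship takes as an input. Here is a quick side-by-side.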

When you would reach for Scaling Laws

Scaling Laws comes up when the question is about forecasting: how a model's loss should change as model size, training data, or compute grows.

Example: predicting GPT-4's loss before training from smaller-scale runs.
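To make the idea of extrapolating from smaller runs concrete, here is a minimal sketch. It assumes a pure power law of the form loss = a * compute^(-b) with no irreducible-loss term, which is a simplification. The function names and every number are hypothetical: the "small runs" are generated from a known law so the fit can be checked, and they are not real measurements from any model.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss = a * compute**(-b) by least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - b * log(compute), so b is the negated slope.
    return math.exp(intercept), -slope

def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget (FLOPs)."""
    return a * compute ** (-b)

# Hypothetical small-scale runs, synthesized from a known law (a=20, b=0.05).
small_compute = [1e18, 1e19, 1e20, 1e21]
small_loss = [predict_loss(20.0, 0.05, c) for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate three orders of magnitude beyond the largest small run.
predicted = predict_loss(a, b, 1e24)
print(f"a={a:.3f}, b={b:.4f}, predicted loss at 1e24 FLOPs: {predicted:.4f}")
```

Real loss curves are noisy and usually flatten toward an irreducible floor, so in practice the fitted form often includes an extra constant term and the extrapolation carries uncertainty.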

When you would reach for Training Compute

Training Compute comes up when the question is about scale or budget: how many floating-point operations a pretraining run used, or will need.

Example: GPT-3 used roughly 3 × 10^23 FLOPs of training compute.
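A figure of this size can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. That approximation is not stated in the text above, and the function name is hypothetical. The inputs are GPT-3's published size of 175 billion parameters and roughly 300 billion training tokens.

```python
def estimate_training_flops(n_params, n_tokens, flops_per_param_token=6.0):
    """Approximate training compute as 6 * parameters * tokens.

    The factor of 6 is a rule-of-thumb assumption (roughly 2 FLOPs per
    parameter per token for the forward pass and 4 for the backward pass);
    it ignores attention-specific costs and hardware utilization.
    """
    return flops_per_param_token * n_params * n_tokens

gpt3_params = 175e9   # 175 billion parameters
gpt3_tokens = 300e9   # about 300 billion training tokens

gpt3_flops = estimate_training_flops(gpt3_params, gpt3_tokens)
print(f"Estimated GPT-3 training compute: {gpt3_flops:.2e} FLOPs")
```

The estimate lands at about 3 × 10^23 FLOPs, in line with the figure quoted above.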

Frequently asked

What is the difference between Scaling Laws and Training Compute?

Scaling Laws: Scaling laws are the empirical power-law relationships between model size, training data, training compute, and resulting loss. They predict that larger models trained on more data keep improving in a smooth, forecastable way.

Training Compute: Training compute is the total number of floating-point operations used to pretrain a model, usually expressed in FLOPs (e.g. 10^25 FLOPs). It is the headline number that some governments now use to set regulatory thresholds.

When should I use Scaling Laws vs Training Compute?

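Scaling Laws is the right concept when you want to forecast how loss will change as you scale model size, data, or compute. Training Compute applies when you need to quantify how much computation a pretraining run uses, for example to budget hardware or to compare a model against a regulatory threshold.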

Are Scaling Laws and Training Compute the same thing?

No. Scaling laws are a relationship; training compute is a quantity. Compute is one of the inputs to a scaling law, alongside model size and training data. The two are related but answer different questions.