Comparison
Scaling Laws vs Training Compute
Scaling Laws and Training Compute are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.
When you would reach for Scaling Laws
Scaling Laws comes up when the question is how a model's loss will change as model size, training data, or training compute grows.
Example: predicting GPT-4's loss before training, based on smaller-scale runs.
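The idea of forecasting from smaller runs can be sketched as a power-law fit in log-log space. This is a minimal illustration, not any lab's actual method: the function names, the synthetic data points, and the pure power-law form `loss = a * compute**(-b)` (with no irreducible-loss offset) are all assumptions made for the example.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b), done as a line in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def predict_loss(a, b, compute):
    """Extrapolate the fitted power law to a larger compute budget."""
    return a * compute ** (-b)

# Synthetic (made-up) small-run results, generated from a = 20, b = 0.05.
small_runs_compute = [1e18, 1e19, 1e20, 1e21]
small_runs_loss = [20.0 * c ** -0.05 for c in small_runs_compute]

a, b = fit_power_law(small_runs_compute, small_runs_loss)
predicted = predict_loss(a, b, 1e23)  # forecast for a run 100x larger than any fitted point
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e23 FLOPs: {predicted:.3f}")
```

Because the synthetic points lie exactly on a power law, the fit recovers the generating parameters. Real measurements are noisy, and published scaling-law fits typically use richer functional forms, so treat this only as a picture of the extrapolation step.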
When you would reach for Training Compute
Training Compute comes up when the question is how much total computation a pretraining run used or will use.
Example: GPT-3 used ~3 × 10^23 FLOPs.
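A figure like this can be roughly reproduced with the widely used rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token. The sketch below assumes that approximation and uses GPT-3's publicly reported size (about 175 billion parameters, about 300 billion training tokens); the function name is hypothetical.

```python
def approx_training_flops(n_params, n_tokens):
    """Rule-of-thumb estimate: ~6 FLOPs per parameter per training token.

    This is an approximation for dense transformers, not an exact accounting.
    """
    return 6 * n_params * n_tokens

# Publicly reported approximate figures for GPT-3.
gpt3_flops = approx_training_flops(n_params=175e9, n_tokens=300e9)
print(f"{gpt3_flops:.2e}")  # 3.15e+23
```

The estimate lands close to the ~3 × 10^23 FLOPs figure quoted above, which is why the approximation is popular for quick comparisons between training runs.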
Frequently asked
What is the difference between Scaling Laws and Training Compute?
Scaling Laws: Scaling laws are the empirical power-law relationships between model size, training data, training compute, and the resulting loss. They predict that larger models trained on more data keep improving in a smooth, forecastable way. Training Compute: Training compute is the total number of floating-point operations used to pretrain a model, usually expressed in FLOPs (e.g. 10^25 FLOPs). It is the headline number that some AI regulations now use as a threshold.
When should I use Scaling Laws vs Training Compute?
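Scaling Laws is the right concept when you want to forecast how loss will change as model size, data, or compute grows. Training Compute applies when you need to quantify the total computation a training run consumes, for example to compare runs or to check a regulatory threshold.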
Are Scaling Laws and Training Compute the same thing?
No. Scaling Laws describes a relationship between resources and loss; Training Compute is a quantity, and one of the inputs to that relationship. They are related but answer different questions about training.