Comparison

Pretraining vs Training Compute

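Pretraining and Training Compute are both common AI/LLM terms, but they are different kinds of thing: pretraining names a phase of model development, while training compute measures how much computation that phase consumed. Here is a quick side-by-side.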

When you would reach for Pretraining

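Pretraining comes up when the question is about the phase in which a model learns from general text: what data it saw, what objective it optimized (next-token prediction), and what base model came out.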

Example: GPT-3 was pretrained on roughly 300B tokens.

When you would reach for Training Compute

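Training Compute comes up when the question is about how much computation a training run consumed: comparing the scale of models, budgeting hardware, or checking a run against a regulatory threshold.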

Example: GPT-3's training run used roughly 3 × 10^23 FLOPs (total floating-point operations, not operations per second).
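The two GPT-3 figures above are linked by a widely used rule of thumb: training compute is approximately 6 × N × D, where N is the parameter count and D is the number of training tokens. The sketch below applies that approximation. The function name is hypothetical, and the 175B parameter count is an assumption not stated in this comparison; the result is an estimate, not a measured value.

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs with the C ~ 6 * N * D rule of thumb.

    Roughly 2 FLOPs per parameter per token for the forward pass and
    about 4 for the backward pass. This ignores attention-specific costs
    and hardware utilization, so treat the result as an order-of-magnitude
    estimate.
    """
    return 6.0 * n_params * n_tokens


# Assumption: GPT-3 has ~175B parameters (not stated in the text above).
gpt3_params = 175e9
# From the pretraining example above: ~300B tokens.
gpt3_tokens = 300e9

flops = estimate_training_flops(gpt3_params, gpt3_tokens)
print(f"Estimated training compute: {flops:.2e} FLOPs")
```

Under these assumptions the estimate lands close to the ~3 × 10^23 FLOPs figure quoted above, which is why the token count (a pretraining fact) and the FLOP count (a training-compute fact) are usually reported together.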

Frequently asked

What is the difference between Pretraining and Training Compute?

Pretraining: Pretraining is the initial training phase where an LLM learns to predict the next token on a large corpus of general text, typically hundreds of billions to trillions of tokens. It produces a base model that can be adapted later. Training Compute: Training compute is the total number of floating-point operations used to pretrain a model, usually expressed in FLOPs (e.g. 10^25 FLOPs). It is the headline number regulators now use to set thresholds; the EU AI Act, for example, presumes systemic risk for general-purpose models trained with more than 10^25 FLOPs.

When should I use Pretraining vs Training Compute?

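Pretraining is the right concept when you are describing what a model learned and how: the data, the objective, and the resulting base model. Training Compute applies when you are quantifying the cost or scale of that process, for example to compare models or to check a run against a FLOP threshold.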

Are Pretraining and Training Compute the same thing?

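No. Pretraining is a phase of model development; Training Compute is a measurement of the computation that phase consumed. They are related, since the compute figure usually counts the pretraining run, but one names a process and the other a quantity.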