Comparison

Parameter Count vs Training Compute

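Parameter Count and Training Compute are both common ways to describe the scale of an AI model, but they measure different things: one is the size of the model itself, the other is the amount of work spent training it. Here is a quick side-by-side.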

When you would reach for Parameter Count

Parameter Count comes up when the question is about the model itself: how large the architecture is, how much memory the weights need, or how two models compare in size.

Llama 3 family: 8B, 70B, 405B.
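One practical use of parameter count is a back-of-the-envelope estimate of how much memory the weights occupy. The sketch below assumes nominal, rounded parameter counts and a fixed number of bytes per parameter; the helper name `weight_memory_gb` and the dtype table are illustrative, not taken from any library. It ignores activations, KV cache, and optimizer state, so treat the results as a lower bound.

```python
# Bytes needed to store one parameter at common precisions (assumed table).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "bf16") -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes from the model names; real parameter counts differ slightly.
LLAMA3_SIZES = {"8B": 8e9, "70B": 70e9, "405B": 405e9}

for name, n_params in LLAMA3_SIZES.items():
    print(f"{name}: ~{weight_memory_gb(n_params):.0f} GB in bf16")
```

Under these assumptions the 8B model needs roughly 16 GB for weights in bf16, which is why parameter count is usually the first number consulted when deciding what hardware a model fits on.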

When you would reach for Training Compute

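Training Compute comes up when the question is about the training run: how much work it took to produce the model, what that run cost, or whether it crosses a compute-based threshold.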

GPT-3: ~3 × 10^23 FLOPs.
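A figure like this can be roughly reproduced from the widely used rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per training token (C ≈ 6 × N × D). The sketch below is an approximation under that assumption, not an exact accounting; the function name `estimate_training_flops` is hypothetical, and the inputs are the 175B parameters and roughly 300B training tokens reported for GPT-3.

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: C ~ 6 * N * D."""
    return 6.0 * n_params * n_tokens


# GPT-3 as reported: 175B parameters, ~300B training tokens.
gpt3_flops = estimate_training_flops(175e9, 300e9)
print(f"GPT-3 estimate: {gpt3_flops:.2e} FLOPs")
```

The estimate lands near 3 × 10^23 FLOPs, in line with the figure above. It also shows how the two metrics connect: parameter count is one input to training compute, and the amount of training data is the other.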

Frequently asked

What is the difference between Parameter Count and Training Compute?

Parameter Count: the total number of learnable weights in a model; "7B" means 7 billion parameters. It is the most cited model-size metric, though not always the most informative, because it says nothing about how much data or compute went into training.

Training Compute: the total number of floating-point operations used to pretrain a model, usually expressed in FLOPs (e.g. 10^25 FLOPs). It is the figure that compute-based regulatory thresholds are written against, such as the 10^25 FLOP threshold in the EU AI Act.

When should I use Parameter Count vs Training Compute?

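Use Parameter Count when you are sizing or comparing models, for example when estimating the memory needed to run one. Use Training Compute when you are reasoning about the training run, for example its budget, how it scales, or whether it crosses a compute-based threshold.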

Are Parameter Count and Training Compute the same thing?

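No. Parameter Count describes the model; Training Compute describes the run that produced it. They are related, since a larger model needs more compute per training token, but two models with the same parameter count can differ widely in training compute depending on how much data each was trained on.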