Comparison

BFloat16 vs Training Compute

BFloat16 and Training Compute are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.

When you would reach for BFloat16

BFloat16 comes up when the question is about infrastructure: which number format the hardware and kernels use to store and compute weights, activations, and gradients.

Example: Llama 3 was pretrained primarily in BF16, with FP32 kept for numerically sensitive steps such as gradient accumulation.
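
To make the format concrete, here is a minimal Python sketch of how an FP32 value maps to BF16: keep the top 16 bits of the FP32 bit pattern (1 sign bit, 8 exponent bits, 7 mantissa bits) and round away the lower 16. The helper names are invented for this illustration, and NaN and overflow-on-rounding handling is left out for brevity, so treat it as a sketch rather than a reference implementation.

```python
import math
import struct


def fp32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16_bits(x: float) -> int:
    """Round x to BF16 (round-to-nearest-even) and return its 16-bit pattern.

    Assumption: x is finite and does not overflow when rounded.
    """
    bits = fp32_bits(x)
    # Bias by 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a 16-bit BF16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def bf16_round(x: float) -> float:
    """Return the value x takes after a round trip through BF16."""
    return bf16_bits_to_float(to_bf16_bits(x))


print(hex(to_bf16_bits(1.0)))    # 0x3f80
print(bf16_round(math.pi))       # 3.140625 (only 7 explicit mantissa bits survive)
print(bf16_round(1.0 + 2**-8))   # 1.0 (the increment is below BF16 resolution)
print(bf16_round(1e38))          # still finite: BF16 keeps FP32's exponent range
```

The last two lines show the trade-off: small increments near 1.0 are lost, while a value close to the top of FP32's range still fits.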

When you would reach for Training Compute

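Training Compute comes up when the question is about training: how much total computation a training run consumed or will need. It counts operations (FLOPs); it is not a rate (FLOP/s).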

Example: pretraining GPT-3 took roughly 3 × 10^23 FLOPs.
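
A figure like this can be sanity-checked with a common back-of-the-envelope rule for dense transformers, C ≈ 6 × N × D, where N is the parameter count and D the number of training tokens. The rule is an approximation brought in here as an assumption; it is not part of the definition. The function name is hypothetical. The inputs are GPT-3's published 175 billion parameters and roughly 300 billion training tokens.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate pretraining compute in FLOPs with the C ~= 6 * N * D rule of thumb.

    Assumption: a dense transformer, counting about 6 FLOPs per parameter
    per token for the forward and backward passes combined.
    """
    return 6.0 * n_params * n_tokens


# GPT-3: 175e9 parameters, about 300e9 training tokens.
gpt3_flops = training_flops(175e9, 300e9)
print(f"{gpt3_flops:.2e}")  # 3.15e+23
```

The estimate lands on the same order of magnitude as the figure quoted above, which is all a rule of thumb like this is meant to do.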

Frequently asked

What is the difference between BFloat16 and Training Compute?

BFloat16 is a 16-bit floating-point format that keeps FP32's 8-bit exponent, and therefore its dynamic range, but has only 7 explicit mantissa bits (8 bits of precision counting the implicit leading bit). It is the default precision for LLM training and for most inference.

Training compute is the total number of floating-point operations used to pretrain a model, usually expressed in FLOPs (e.g. 10^25 FLOPs). It is the headline number that some regulations use as a threshold; the EU AI Act, for example, sets one at 10^25 FLOPs.

When should I use BFloat16 vs Training Compute?

BFloat16 is the right concept when you are choosing or discussing numeric precision on the infrastructure side. Training Compute applies when you are sizing, budgeting, or comparing training runs.

Are BFloat16 and Training Compute the same thing?

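No. BFloat16 is a number format, which makes it an infrastructure-level choice; Training Compute is a measure of how much work a training run performs. They are related, since the number format affects how quickly hardware can carry out those operations, but they address different parts of the AI stack.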