Skip to main content
ModelTerms

Infrastructure · advanced

Mixed Precision (mixed-precision training)

Mixed-precision training does the bulk of forward and backward computation in 16-bit floats (BF16 or FP16) while keeping master weights and certain accumulations in 32-bit. Faster, smaller, same accuracy.

Explanation

Modern training is almost universally mixed precision. BF16 (bfloat16) has the same exponent range as FP32 but only 8 bits of mantissa, so it avoids the gradient-underflow problems FP16 has, at the same memory cost.

Combined with loss scaling (for FP16) or automatic mixed precision in PyTorch / JAX, mixed precision is essentially "free" speed and memory savings.

Examples

  • Pretraining a 7B model in BF16 instead of FP32.
  • torch.cuda.amp for automatic mixed precision in PyTorch.

Frequently asked

What is Mixed Precision?

Mixed-precision training does the bulk of forward and backward computation in 16-bit floats (BF16 or FP16) while keeping master weights and certain accumulations in 32-bit. Faster, smaller, same accuracy.

What is an example of mixed precision?

Pretraining a 7B model in BF16 instead of FP32.

How is Mixed Precision related to Quantization?

Mixed Precision and Quantization are both infrastructure concepts. Quantization reduces model weights from 16- or 32-bit floats to lower-precision types (INT8, INT4) so the model needs less memory and runs faster, usually with minor quality loss.

Is Mixed Precision considered advanced?

Mixed Precision is generally considered advanced-level material in the AI and LLM space.

Side-by-side comparisons

Sources