Infrastructure · advanced
Mixed Precision (mixed-precision training)
Mixed-precision training does the bulk of forward and backward computation in 16-bit floats (BF16 or FP16) while keeping master weights and certain accumulations in 32-bit. Faster, smaller, same accuracy.
Explanation
Modern training is almost universally mixed precision. BF16 (bfloat16) has the same exponent range as FP32 but only 8 bits of mantissa, so it avoids the gradient-underflow problems FP16 has, at the same memory cost.
Combined with loss scaling (for FP16) or automatic mixed precision in PyTorch / JAX, mixed precision is essentially "free" speed and memory savings.
Examples
- Pretraining a 7B model in BF16 instead of FP32.
- torch.cuda.amp for automatic mixed precision in PyTorch.
Frequently asked
What is Mixed Precision?
Mixed-precision training does the bulk of forward and backward computation in 16-bit floats (BF16 or FP16) while keeping master weights and certain accumulations in 32-bit. Faster, smaller, same accuracy.
What is an example of mixed precision?
Pretraining a 7B model in BF16 instead of FP32.
How is Mixed Precision related to Quantization?
Mixed Precision and Quantization are both infrastructure concepts. Quantization reduces model weights from 16- or 32-bit floats to lower-precision types (INT8, INT4) so the model needs less memory and runs faster, usually with minor quality loss.
Is Mixed Precision considered advanced?
Mixed Precision is generally considered advanced-level material in the AI and LLM space.