Infrastructure · advanced

Mixed Precision (mixed-precision training)

Mixed-precision training does the bulk of forward and backward computation in 16-bit floats (BF16 or FP16) while keeping master weights and certain accumulations in 32-bit. Faster, smaller, same accuracy.

Published May 29, 2026

Explanation

Modern training is almost universally mixed precision. BF16 (bfloat16) has the same exponent range as FP32 but only 8 bits of mantissa, so it avoids the gradient-underflow problems FP16 has, at the same memory cost.

Combined with loss scaling (for FP16) or automatic mixed precision in PyTorch / JAX, mixed precision is essentially "free" speed and memory savings.

Examples

Pretraining a 7B model in BF16 instead of FP32.
torch.cuda.amp for automatic mixed precision in PyTorch.

Frequently asked

What is Mixed Precision?

What is an example of mixed precision?

Pretraining a 7B model in BF16 instead of FP32.

How is Mixed Precision related to Quantization?

Mixed Precision and Quantization are both infrastructure concepts. Quantization reduces model weights from 16- or 32-bit floats to lower-precision types (INT8, INT4) so the model needs less memory and runs faster, usually with minor quality loss.

Is Mixed Precision considered advanced?

Mixed Precision is generally considered advanced-level material in the AI and LLM space.

QuantizationInfrastructure

Quantization reduces model weights from 16- or 32-bit floats to lower-precision types (INT8, INT4) so the model needs less memory and runs faster, usually with minor quality loss.

GPUInfrastructure

GPUs are the parallel processors that train and run nearly every modern AI model. Their throughput on matrix multiplication is what makes deep learning practical.

PretrainingTraining

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.

BFloat16Infrastructure

BFloat16 is a 16-bit floating-point format with FP32's exponent range but only 8 bits of mantissa. The default precision for LLM training and most inference.

Side-by-side comparisons

Sources

Mixed Precision Training (arXiv)