Infrastructure · intermediate
BFloat16 (BF16)
BFloat16 is a 16-bit floating-point format with FP32's exponent range but only 8 bits of significand precision (7 stored mantissa bits plus an implicit leading bit). It is the default precision for LLM training and most inference.
Explanation
Storing weights and activations in 16 bits halves memory relative to FP32 and roughly halves memory-bandwidth requirements. FP16, however, has only a 5-bit exponent, so its range is narrow and small gradients can underflow to zero during training.
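To make the memory saving and the underflow problem concrete, here is a minimal sketch using only Python's standard library. The 7-billion-parameter model size and the gradient value of 1e-8 are illustrative assumptions, and `to_fp16` and `weight_gigabytes` are hypothetical helper names, not part of any framework.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision (FP16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def weight_gigabytes(n_params: int, bits_per_param: int) -> float:
    """Memory needed to store n_params values, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Assumed model size, chosen only for illustration.
n_params = 7_000_000_000
fp32_gb = weight_gigabytes(n_params, 32)
fp16_gb = weight_gigabytes(n_params, 16)
print(f"FP32 weights: {fp32_gb} GB, 16-bit weights: {fp16_gb} GB")

# An assumed small gradient, below FP16's smallest subnormal (about 6e-8).
tiny_gradient = 1e-8
fp16_gradient = to_fp16(tiny_gradient)

# Multiplying by a constant before the conversion keeps the value
# representable; this is the idea behind loss scaling.
scaled_gradient = to_fp16(tiny_gradient * 65536)
print(f"FP16 gradient: {fp16_gradient}, scaled FP16 gradient: {scaled_gradient}")
```

The unscaled gradient is flushed to zero, so the corresponding weight would never be updated, while the scaled copy survives the conversion.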
BFloat16 keeps FP32's 8-bit exponent (huge range) and trims the stored mantissa from 23 bits to 7 (less precision). For neural networks, range matters more than fine precision, so BF16 usually trains stably out of the box, with no loss scaling needed.
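The layout can be sketched in a few lines of standard-library Python: a BF16 value is the top 16 bits of the corresponding FP32 value, that is 1 sign bit, 8 exponent bits, and 7 explicitly stored mantissa bits (8 bits of precision counting the implicit leading 1). The helper names below are hypothetical, the rounding mode is assumed to be round-to-nearest-even, and NaN inputs are not handled.

```python
import struct

def fp32_bits(x: float) -> int:
    """Bit pattern of x after rounding to FP32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def to_bf16_bits(x: float) -> int:
    """Keep the top 16 bits of the FP32 pattern, rounding to nearest even.

    Sketch only: NaN inputs are not treated specially.
    """
    bits = fp32_bits(x)
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def from_bf16_bits(b: int) -> float:
    """Widen a BF16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

def bf16(x: float) -> float:
    """Round-trip a value through BF16."""
    return from_bf16_bits(to_bf16_bits(x))

def bf16_fields(b: int) -> tuple:
    """Split a BF16 pattern into (sign, exponent, mantissa) fields."""
    return (b >> 15, (b >> 7) & 0xFF, b & 0x7F)

one_fields = bf16_fields(to_bf16_bits(1.0))   # biased exponent 127, mantissa 0
tiny = bf16(1e-8)                             # stays nonzero: FP32-like range
coarse = bf16(1.0 + 2 ** -9)                  # too fine for 7 mantissa bits
step = bf16(1.0 + 2 ** -7)                    # smallest step above 1.0

print(one_fields, tiny, coarse, step)
```

A value as small as 1e-8 survives the conversion, while an increment of 2^-9 on top of 1.0 is rounded away. That is the range-for-precision trade the format makes.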
Most modern frontier training runs use BF16, sometimes mixed with FP8 for selected layers.
Examples
- Llama 3 trained end-to-end in BF16.
- NVIDIA H100 (SXM) delivers roughly 989 TFLOPS of dense BF16 Tensor Core throughput.
Frequently asked
What is BFloat16?
BFloat16 is a 16-bit floating-point format with FP32's exponent range but only 8 bits of significand precision (7 stored mantissa bits plus an implicit leading bit). It is the default precision for LLM training and most inference.
What is an example of BFloat16?
Llama 3 trained end-to-end in BF16.
How is BFloat16 related to Mixed Precision?
BFloat16 is one of the 16-bit formats that mixed-precision training relies on. Mixed-precision training does the bulk of forward and backward computation in 16-bit floats (BF16 or FP16) while keeping master weights and certain accumulations in 32-bit. The result is faster training and a smaller memory footprint at essentially the same accuracy.
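Why the master weights stay in 32-bit can be seen in a small standard-library sketch. It simulates BF16 and FP32 rounding with `struct`; the weight value, the per-step update of 1e-4, and the 1000 steps are assumptions chosen for illustration, and the helper names are hypothetical rather than any framework's API.

```python
import struct

def fp32(x: float) -> float:
    """Round a Python float to FP32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def bf16(x: float) -> float:
    """Round a value to BF16 (round-to-nearest-even, NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) >> 16
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

update = 1e-4        # assumed learning rate times gradient, constant per step
w_bf16_only = 1.0    # weight stored and updated directly in BF16
w_master = 1.0       # FP32 master copy, as in mixed-precision training

for _ in range(1000):
    # Each update is smaller than half the BF16 spacing near 1.0,
    # so the BF16-only weight rounds back to where it started.
    w_bf16_only = bf16(w_bf16_only - update)
    # The FP32 master copy has enough precision to accumulate the updates.
    w_master = fp32(w_master - update)

# The 16-bit copy used for compute is re-derived from the master weights.
w_compute = bf16(w_master)

print(w_bf16_only, w_master, w_compute)
```

The BF16-only weight never moves, whereas the FP32 master copy ends near 0.9 and its BF16 cast reflects that progress.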
Is BFloat16 considered intermediate?
BFloat16 is generally considered intermediate-level material in the AI and LLM space.