Training · intermediate
Backpropagation (backprop)
Backpropagation is the algorithm used to compute how each weight in a neural network should change to reduce error, by propagating gradients backward through the network.
Explanation
Forward pass: input flows through the network, producing a prediction and a loss. Backward pass: the derivative of the loss is propagated backward, layer by layer, using the chain rule, producing the gradient of the loss with respect to every weight. An optimizer (usually a variant of stochastic gradient descent) then nudges each weight in the direction opposite its gradient, which is the direction that reduces the loss.
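The forward pass, backward pass, and optimizer step described above can be sketched by hand for a very small network. This is a minimal illustration, not how any framework implements it: the network shape (one tanh hidden unit, one linear output), the squared-error loss, the function names `forward`, `backward`, and `sgd_step`, and all numeric values are assumptions chosen for the example.

```python
import math

def forward(params, x, y):
    """Forward pass: compute the loss and keep the intermediates for the backward pass."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1                 # hidden pre-activation
    h = math.tanh(z)                # hidden activation
    y_hat = w2 * h + b2             # prediction
    loss = 0.5 * (y_hat - y) ** 2   # squared-error loss
    return loss, (z, h, y_hat)

def backward(params, x, y, cache):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    w1, b1, w2, b2 = params
    z, h, y_hat = cache
    d_y_hat = y_hat - y             # dL/dy_hat
    d_w2 = d_y_hat * h              # dL/dw2
    d_b2 = d_y_hat                  # dL/db2
    d_h = d_y_hat * w2              # dL/dh
    d_z = d_h * (1.0 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x                  # dL/dw1
    d_b1 = d_z                      # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]

def sgd_step(params, grads, lr=0.1):
    """Optimizer step: move each weight a small distance against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

# Made-up starting weights and a single made-up training example.
params = [0.5, 0.0, -0.3, 0.1]
x, y = 1.5, 1.0

loss_before, cache = forward(params, x, y)
grads = backward(params, x, y, cache)
params_after = sgd_step(params, grads)
loss_after, _ = forward(params_after, x, y)

print(loss_before, loss_after)  # the loss after the step should be lower
```

Each line of `backward` reuses the gradient computed on the line before it, which is what "propagating backward" means in practice: no derivative is recomputed from scratch.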
Modern deep learning frameworks (PyTorch, JAX, TensorFlow) implement backprop for you via reverse-mode automatic differentiation. You write the forward pass; the framework records the operations it performs and derives the gradient computation from them.
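To give a feel for what automatic differentiation does behind the scenes, here is a toy reverse-mode sketch in plain Python. The `Value` class is hypothetical and written only for this example; it is not the API of PyTorch, JAX, or TensorFlow, and it supports just scalar addition and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical, for illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # the Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order the graph so every node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        # Walk the graph from output to inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad

# Only the forward pass is written out; the gradients come from the recorded graph.
w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b      # y = 7.0
loss = y * y       # loss = 49.0
loss.backward()

print(w.grad, x.grad, b.grad)  # 42.0 28.0 14.0
```

Real frameworks follow the same overall idea on tensors rather than scalars, with many more operations and considerable optimization.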
Examples
- PyTorch's loss.backward() triggers backpropagation.
- Training a large transformer applies backprop to billions of weights, once per training step, over a very large number of steps.
Frequently asked
What is Backpropagation?
Backpropagation is the algorithm used to compute how each weight in a neural network should change to reduce error, by propagating gradients backward through the network.
What is an example of backpropagation?
PyTorch's loss.backward() triggers backpropagation.
How is Backpropagation related to Gradient Descent?
Backpropagation and gradient descent are two halves of the same training step. Backpropagation computes the gradient of the loss with respect to each weight. Gradient descent is the optimization algorithm that uses those gradients: it nudges each weight in the direction that reduces the loss, with a small step size set by the learning rate.
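A small sketch can make the division of labor concrete. It assumes a one-weight model `y_hat = w * x`, a mean squared error loss, a made-up dataset, and a hypothetical helper `loss_and_grad`; with a single weight the gradient is easy to write by hand, whereas in a deep network that line is the part backpropagation would compute.

```python
def loss_and_grad(w, data):
    """Mean squared error for y_hat = w * x, and its derivative with respect to w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (w * x - y) * x for x, y in data) / n
    return loss, grad

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up points on the line y = 2x
w = 0.0                # initial weight
learning_rate = 0.05   # step size
losses = []

for step in range(100):
    loss, grad = loss_and_grad(w, data)  # gradient computation (backprop's role)
    w = w - learning_rate * grad         # gradient descent update
    losses.append(loss)

print(w)  # approaches 2.0, the slope of the data
```

The two lines inside the loop are independent in principle: the update rule could be swapped for another optimizer without changing how the gradient is computed.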
Is Backpropagation considered intermediate?
Backpropagation is generally considered intermediate-level material in the AI and LLM space.