ModelTerms

Training · intermediate

Backpropagation (backprop)

Backpropagation is the algorithm used to compute how each weight in a neural network should change to reduce error, by propagating gradients backward through the network.

Explanation

Forward pass: the input flows through the network, producing a prediction and, from that prediction, a loss. Backward pass: the derivative of the loss is propagated backward layer by layer using the chain rule, producing a gradient for every weight. An optimizer (usually a variant of stochastic gradient descent) then moves each weight a small step against its gradient, which is the direction that reduces the loss.
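To make the two passes concrete, here is a minimal sketch in plain Python. The two-weight model (h = tanh(w1 * x), y_hat = w2 * h, squared-error loss), the function name forward_backward, and all numeric values are assumptions chosen for illustration, not something this entry prescribes.

```python
import math

def forward_backward(x, y, w1, w2):
    """One forward and one backward pass through a tiny two-weight network.

    Illustrative model: h = tanh(w1 * x), y_hat = w2 * h, loss = (y_hat - y)^2.
    """
    # Forward pass: compute the prediction and the loss, keeping intermediates.
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    loss = (y_hat - y) ** 2

    # Backward pass: chain rule, starting at the loss and moving toward the input.
    dloss_dyhat = 2.0 * (y_hat - y)            # derivative of the squared error
    dloss_dw2 = dloss_dyhat * h                # because y_hat = w2 * h
    dloss_dh = dloss_dyhat * w2                # gradient flowing into the hidden unit
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x   # d tanh(u)/du = 1 - tanh(u)^2
    return loss, dloss_dw1, dloss_dw2

loss, g1, g2 = forward_backward(x=0.5, y=1.0, w1=0.8, w2=-0.3)

# Sanity check: compare the w1 gradient with a central finite difference.
eps = 1e-6
loss_hi, _, _ = forward_backward(0.5, 1.0, 0.8 + eps, -0.3)
loss_lo, _, _ = forward_backward(0.5, 1.0, 0.8 - eps, -0.3)
numeric_g1 = (loss_hi - loss_lo) / (2 * eps)
```

The finite-difference check at the end is a common way to confirm that a hand-written backward pass matches the forward pass it claims to differentiate.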

Modern deep learning frameworks such as PyTorch, JAX, and TensorFlow implement backprop through reverse-mode automatic differentiation. You write the forward pass; the framework records the operations involved and computes the gradients for you.
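The idea behind automatic differentiation can be sketched in a few dozen lines of plain Python. This is a toy for scalars only, and the Value class and its attributes are hypothetical names invented for this sketch; it is not how any of the frameworks above is actually implemented.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Depth-first post-order; reversed, it visits each node before its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(loss)/d(loss)
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # chain rule, accumulated


# Only the forward pass is written by hand: loss = (w * x + b - y)^2.
w, b = Value(2.0), Value(-1.0)
x, y = Value(3.0), Value(4.0)
err = w * x + b - y
loss = err * err
loss.backward()  # fills in .grad on w, b, x, and y
```

Each operator stores its local derivatives at the moment the forward pass runs, so backward() only has to walk the recorded graph in reverse and multiply them together.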

Examples

  • PyTorch's loss.backward() triggers backpropagation.
  • Training a large transformer is backprop applied to billions of weights, repeated at every one of a very large number of training steps.
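The first example above can be sketched as follows, assuming PyTorch is installed. The random toy data and the single linear layer are placeholders picked for illustration only.

```python
import torch

# Hypothetical toy data and model, chosen only to keep the example small.
torch.manual_seed(0)
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
model = torch.nn.Linear(3, 1)

prediction = model(inputs)                                 # forward pass
loss = torch.nn.functional.mse_loss(prediction, targets)   # scalar loss

assert model.weight.grad is None   # no gradients exist before the backward pass
loss.backward()                    # backpropagation populates .grad on each parameter
grad_shape = tuple(model.weight.grad.shape)  # same shape as the weight itself
```

After loss.backward() returns, an optimizer such as torch.optim.SGD would read these .grad fields to update the parameters.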

Frequently asked

How is Backpropagation related to Gradient Descent?

The two do different jobs in the same training loop. Backpropagation computes the gradient of the loss with respect to every weight; gradient descent is the optimization algorithm that uses those gradients, nudging each weight in the direction that reduces the loss, with a small step size set by the learning rate.
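A minimal sketch of that update rule in plain Python follows. The one-parameter loss (w - 3)^2, the helper name loss_and_grad, and the learning rate of 0.1 are assumptions for illustration; in a real network the gradient would come from backpropagation rather than a hand-written formula.

```python
def loss_and_grad(w):
    """Return the loss (w - 3)^2 and its derivative with respect to w.

    This stands in for the loss and gradient that backprop would provide.
    """
    return (w - 3.0) ** 2, 2.0 * (w - 3.0)

w = 0.0              # initial weight
learning_rate = 0.1  # step size
for step in range(100):
    loss, grad = loss_and_grad(w)
    w -= learning_rate * grad   # move against the gradient to reduce the loss
```

With this step size the distance to the minimum at w = 3 shrinks by a constant factor on every iteration; a learning rate that is too large would overshoot instead.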

Is Backpropagation considered intermediate?

Backpropagation is generally considered intermediate-level material in the AI and LLM space.
