Training · intermediate

Backpropagation (backprop)

Backpropagation is the algorithm used to compute how each weight in a neural network should change to reduce error, by propagating gradients backward through the network.

Published May 29, 2026

Explanation

Forward pass: input flows through the network, producing a prediction and a loss. Backward pass: the loss's derivative is propagated backward layer by layer using the chain rule, producing a gradient for every weight. An optimizer (usually a flavor of stochastic gradient descent) then nudges each weight in the direction that reduces loss.
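The two passes can be sketched by hand for the smallest possible network. The example below is only an illustration under stated assumptions: a single sigmoid neuron, a squared-error loss, and hypothetical helper names (forward_backward, sgd_step) that do not come from any library.

```python
import math

def forward_backward(w, b, x, y):
    """One forward and one backward pass for a single sigmoid neuron."""
    # Forward pass: input -> pre-activation -> prediction -> loss.
    z = w * x + b
    pred = 1.0 / (1.0 + math.exp(-z))
    loss = 0.5 * (pred - y) ** 2

    # Backward pass: chain rule, starting at the loss and moving toward the weights.
    dloss_dpred = pred - y              # derivative of 0.5 * (pred - y)^2
    dpred_dz = pred * (1.0 - pred)      # derivative of the sigmoid
    dloss_dz = dloss_dpred * dpred_dz
    grad_w = dloss_dz * x               # dz/dw = x
    grad_b = dloss_dz                   # dz/db = 1
    return loss, grad_w, grad_b

def sgd_step(w, b, x, y, lr=0.1):
    """Optimizer step: move each parameter against its gradient."""
    loss, grad_w, grad_b = forward_backward(w, b, x, y)
    return w - lr * grad_w, b - lr * grad_b, loss

w, b = 0.5, 0.0
for step in range(3):
    w, b, loss = sgd_step(w, b, x=2.0, y=1.0)
    print(f"step {step}: loss={loss:.4f}")
```

In a real network the same chain-rule bookkeeping is repeated for every layer, reusing each layer's intermediate gradient when computing the gradient of the layer before it.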

Backprop is implemented automatically by the major deep learning frameworks (PyTorch, JAX, TensorFlow) via reverse-mode automatic differentiation. You write the forward pass; the framework records the operations and handles the gradient computation.
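To give a rough feel for what a framework does behind the scenes, here is a toy reverse-mode autodiff for scalars. It is a sketch, not how PyTorch, JAX, or TensorFlow are actually implemented; the Value class and its methods are hypothetical names invented for this example, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf values have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node comes after its inputs,
        # then apply the chain rule from the output back to the leaves.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# Only the forward pass is written by hand; backward() derives the gradients.
w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
y.backward()
print(w.grad, x.grad, b.grad)
```

Each operation stores a small closure describing its local derivative, and backward() replays those closures in reverse order. Production frameworks apply the same idea to tensors, with many more operations and heavy optimization.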

Examples

PyTorch's loss.backward() triggers backpropagation.
Training a large transformer applies backprop to billions of weights at every training step, repeated over a very large number of steps.

Frequently asked

How is Backpropagation related to Gradient Descent?

Both are training concepts, but they play different roles. Backpropagation computes the gradient of the loss with respect to every weight; gradient descent is the optimization algorithm that uses those gradients, nudging each weight in the direction that reduces the loss, with a small step size set by the learning rate.
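The division of labor can be illustrated with a one-parameter toy problem. In this sketch the loss (w - 3)^2, the function names, and the default learning rate are all assumptions chosen for illustration; grad_fn stands in for the gradient that backprop would supply in a real network.

```python
def loss_fn(w):
    # Toy loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad_fn(w):
    # Stand-in for backprop: the derivative of the loss with respect to w.
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=50):
    # Gradient descent only consumes gradients; it does not care how they were computed.
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

w_final = gradient_descent(w=0.0)
print(w_final, loss_fn(w_final))
```

Swapping grad_fn for a backprop-computed gradient leaves the update rule unchanged, which is why the two ideas are usually described separately.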

Is Backpropagation considered intermediate?

Backpropagation is generally considered intermediate-level material in the AI and LLM space.

Gradient Descent · Training

Gradient descent is the optimization algorithm at the heart of training: nudge each weight in the direction that reduces the loss, with a small step size set by the learning rate.

Neural Network · Foundations

A neural network is a stack of simple mathematical units ("neurons") that learn to transform inputs into outputs by adjusting numeric weights during training.

Pretraining · Training

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.

Loss Function · Training

A loss function measures how wrong a model's predictions are. Training minimizes it. For LLMs the loss is the cross-entropy of predicted vs. actual next tokens.

Sources

Wikipedia — Backpropagation