Pipeline Parallelism vs Tensor Parallelism

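Pipeline Parallelism and Tensor Parallelism are both model-parallelism strategies for spreading one model across several GPUs when it is too large, or too slow, for a single device. They differ in where they cut the model: pipeline parallelism splits it between layers, tensor parallelism splits it within each layer. Here is a quick side-by-side.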

When you would reach for Pipeline Parallelism

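Reach for Pipeline Parallelism when a model has to span multiple nodes. Each GPU (or group of GPUs) holds a contiguous block of layers, and only the activations at stage boundaries (plus their gradients during training) travel between stages, so it tolerates the slower links between nodes. The cost is idle time, the "pipeline bubble", while stages wait on each other; it is usually reduced by splitting each batch into many micro-batches.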
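To see why micro-batching matters, here is a minimal sketch of the bubble size for a GPipe-style schedule, assuming every stage takes the same time per micro-batch. With p stages and m micro-batches the pipeline runs for m + p - 1 time steps, but each GPU does useful work in only m of them, so the idle fraction is (p - 1)/(m + p - 1). The function name `bubble_fraction` is hypothetical.

```python
def bubble_fraction(stages: int, micro_batches: int) -> float:
    """Fraction of total pipeline time each GPU sits idle.

    Assumes a GPipe-style schedule with equal per-stage compute time:
    filling and draining the pipeline takes micro_batches + stages - 1
    time steps, but each stage is busy in only micro_batches of them.
    """
    if stages < 1 or micro_batches < 1:
        raise ValueError("stages and micro_batches must be >= 1")
    return (stages - 1) / (micro_batches + stages - 1)


if __name__ == "__main__":
    for m in (1, 16, 64):
        print(f"16 stages, {m:>2} micro-batches: {bubble_fraction(16, m):.1%} idle")
```

With 16 stages, 16 micro-batches leave each GPU idle roughly 48% of the time (15/31), while 64 micro-batches bring that down to roughly 19% (15/79), which is why deep pipelines run many micro-batches per step.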

A 405B-parameter model trained on 4096 GPUs: TP=8 within each node × PP=16 across the nodes of a pod × DP=32 (data parallelism) across pods.
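The layout multiplies out exactly: 8 × 16 × 32 = 4096. The sketch below also maps each global GPU rank to its (data, pipeline, tensor) coordinates. The ordering, with tensor-parallel ranks varying fastest so that one node's 8 GPUs form one TP group, is an assumption that matches a common convention but differs between frameworks; `rank_to_coords` and `Coords` are hypothetical names.

```python
from typing import NamedTuple


class Coords(NamedTuple):
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index (GPU within the node)


def rank_to_coords(rank: int, tp: int, pp: int, dp: int) -> Coords:
    """Map a global rank to (dp, pp, tp), assuming TP varies fastest.

    Assumed layout: consecutive ranks fill one TP group first, then the
    next pipeline stage, then the next data-parallel replica.
    """
    world = tp * pp * dp
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} is outside world size {world}")
    return Coords(dp=rank // (tp * pp), pp=(rank // tp) % pp, tp=rank % tp)


TP, PP, DP = 8, 16, 32
assert TP * PP * DP == 4096  # every GPU gets exactly one (dp, pp, tp) slot

if __name__ == "__main__":
    for r in (0, 7, 8, 127, 128, 4095):
        print(r, rank_to_coords(r, TP, PP, DP))
```

Under this ordering, ranks 0 to 7 share a pipeline stage and a replica and differ only in their TP shard, which keeps the bandwidth-hungry tensor-parallel traffic on a single node's interconnect.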

When you would reach for Tensor Parallelism

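Reach for Tensor Parallelism when individual layers are too large, or too slow, for one GPU. Each layer's matrix multiplications are sharded across GPUs, which must exchange partial results (typically with an all-reduce) inside every layer, so tensor parallelism is usually kept to GPUs in the same node that share a fast interconnect such as NVLink. At inference time it also lowers per-token latency, because all GPUs work on every layer at once instead of waiting in sequence.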
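Why tensor parallelism needs fast links shows up in a small sketch of a row-parallel matrix multiply, one of the two standard ways to split a linear layer. Plain Python lists stand in for tensors and the shard loop stands in for separate GPUs; the helper names are hypothetical. Each shard multiplies its slice of the input features by the matching rows of the weight matrix, producing a partial result with the full output shape, and those partials must be summed across all shards (the all-reduce) before the layer's output exists.

```python
Matrix = list[list[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_parallel_matmul(x: Matrix, w: Matrix, shards: int) -> Matrix:
    """Compute x @ w by splitting the shared k dimension across shards."""
    k = len(w)
    if k % shards:
        raise ValueError("k must be divisible by the number of shards")
    size = k // shards
    partials = []
    for s in range(shards):  # each iteration would run on its own GPU
        lo, hi = s * size, (s + 1) * size
        x_s = [row[lo:hi] for row in x]  # this shard's slice of input features
        w_s = w[lo:hi]  # the matching rows of the weight matrix
        partials.append(matmul(x_s, w_s))  # full-shape partial sum
    # All-reduce: sum the partial results elementwise across shards.
    return [
        [sum(p[i][j] for p in partials) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    x = [[1, 2, 3, 4]]
    w = [[1, 0], [0, 1], [1, 1], [2, -1]]
    print(row_parallel_matmul(x, w, shards=2), matmul(x, w))
```

The sharded result matches the unsharded product, but only after every shard has contributed its partial sum; in a real model that exchange happens inside every layer, on every forward and backward pass.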

Llama 3 70B in BF16 (~140 GB of weights) split across 4× H100 (80 GB each) with TP=4, leaving about 35 GB of weights on each GPU.
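The ~140 GB figure is simply parameters × 2 bytes, since BF16 stores each weight in 16 bits. A minimal sketch of the arithmetic, assuming the nominal 70B parameter count, decimal gigabytes, and a perfectly even split of the weights; it ignores the KV cache, activations, and runtime overhead. `weights_gb_per_gpu` is a hypothetical helper.

```python
def weights_gb_per_gpu(params: int, bytes_per_param: int, tp: int) -> float:
    """Weight memory per GPU in decimal GB, assuming an even TP split."""
    return params * bytes_per_param / tp / 10**9


PARAMS = 70_000_000_000  # nominal "70B" parameter count (assumption)
BF16_BYTES = 2  # bfloat16 is 16 bits per weight
H100_GB = 80

total = weights_gb_per_gpu(PARAMS, BF16_BYTES, tp=1)
per_gpu = weights_gb_per_gpu(PARAMS, BF16_BYTES, tp=4)

if __name__ == "__main__":
    print(f"total weights: {total:.0f} GB")
    print(f"per GPU at TP=4: {per_gpu:.0f} GB, headroom: {H100_GB - per_gpu:.0f} GB")
```

Roughly 45 GB per GPU remains after the weights, much of which typically goes to the KV cache when serving.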

Frequently asked

What is the difference between Pipeline Parallelism and Tensor Parallelism?

Pipeline parallelism splits the model by layer across GPUs: GPU 1 holds layers 0-15, GPU 2 holds layers 16-31, and so on. Forward passes flow through the stages like an assembly line. Tensor parallelism instead shards each individual layer across GPUs by splitting its matrix multiplications, so that, for example, different GPUs compute different slices of the output dimension in parallel.
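A toy sketch of both splits, using plain Python lists so that loops stand in for separate GPUs; the function names are hypothetical, and real frameworks do this with GPU kernels and collective communication. `pipeline_stages` assigns contiguous blocks of layers to stages (32 layers over 2 stages reproduces the layers 0-15 / 16-31 split), and `column_parallel_matmul` gives each shard the full input but only a slice of the weight matrix's output columns, then concatenates the slices (an all-gather).

```python
Matrix = list[list[int]]


def pipeline_stages(num_layers: int, stages: int) -> list[range]:
    """Assign contiguous, equal-sized blocks of layers to pipeline stages."""
    if num_layers % stages:
        raise ValueError("num_layers must be divisible by stages")
    per = num_layers // stages
    return [range(s * per, (s + 1) * per) for s in range(stages)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def column_parallel_matmul(x: Matrix, w: Matrix, shards: int) -> Matrix:
    """Compute x @ w by splitting the output (column) dimension across shards."""
    m = len(w[0])
    if m % shards:
        raise ValueError("output dimension must be divisible by shards")
    size = m // shards
    partials = []
    for s in range(shards):  # each iteration would run on its own GPU
        w_s = [row[s * size:(s + 1) * size] for row in w]  # this shard's columns
        partials.append(matmul(x, w_s))  # full input, slice of the outputs
    # All-gather: concatenate each row's output slices in shard order.
    return [[v for p in partials for v in p[i]] for i in range(len(x))]


if __name__ == "__main__":
    print(pipeline_stages(32, 2))
    x = [[1, 2]]
    w = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(column_parallel_matmul(x, w, shards=2), matmul(x, w))
```

Note the different communication patterns: the pipeline hands one stage's output to the next stage, while the column split needs every shard's slice gathered before the full output is available.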

When should I use Pipeline Parallelism vs Tensor Parallelism?

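Use Pipeline Parallelism when the model must span nodes connected by comparatively slow links, since it only passes activations between stages. Use Tensor Parallelism within a node that has fast GPU-to-GPU bandwidth, when a single layer is too large for one GPU or when you need lower inference latency. Large training runs typically combine both, with tensor parallelism inside each node and pipeline parallelism across nodes.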

Are Pipeline Parallelism and Tensor Parallelism the same thing?

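No. Both are forms of model parallelism aimed at the same problem, running a model that does not fit on, or is too slow on, a single GPU, but they split along different axes: pipeline parallelism between layers, tensor parallelism within each layer. Because those axes are independent, the two can be stacked with each other and with data parallelism, as in the TP=8 × PP=16 × DP=32 layout above.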