Comparison

GPU vs Tensor Parallelism

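GPU and Tensor Parallelism are both common AI/LLM terms, but they sit at different levels: a GPU is the hardware that runs a model, while tensor parallelism is a technique for splitting one model across several GPUs. Here is a quick side-by-side.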

When you would reach for GPU

GPU comes up when the question is about hardware: how much memory, memory bandwidth, and compute a single device offers.

NVIDIA H100 (SXM): ~3.35 TB/s memory bandwidth, ~989 TFLOP/s dense BF16. The PCIe variant is lower, at ~2 TB/s and ~756 TFLOP/s.
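As a rough sketch of how bandwidth and compute figures like these relate: dividing peak compute by memory bandwidth gives the arithmetic intensity (FLOPs per byte moved) above which a kernel stops being memory-bound in the roofline model. The function name `ridge_point` is hypothetical, and the numbers passed in are assumed, approximate spec-sheet figures for the SXM and PCIe variants, not measurements.

```python
def ridge_point(peak_flops, mem_bandwidth_bytes):
    """FLOPs per byte at which the compute and memory limits balance."""
    return peak_flops / mem_bandwidth_bytes


# Assumed, approximate figures (dense BF16 FLOP/s, bytes/s).
# Check the vendor datasheet for the exact part you are using.
h100_sxm = ridge_point(989e12, 3.35e12)
h100_pcie = ridge_point(756e12, 2.0e12)

print(f"H100 SXM : ~{h100_sxm:.0f} FLOPs per byte")
print(f"H100 PCIe: ~{h100_pcie:.0f} FLOPs per byte")
```

Work that performs fewer FLOPs per byte than this, which is often the case for small-batch LLM decoding, tends to be limited by memory bandwidth rather than by peak compute.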

When you would reach for Tensor Parallelism

Tensor Parallelism comes up when the question is about distributing a model: it does not fit on one GPU, or one GPU is too slow, so each layer is sharded across several.

Llama 3 70B in BF16 (~140 GB of weights) does not fit on a single 80 GB H100. Split across 4× H100 (80 GB each) with TP=4, each GPU holds roughly 35 GB of weights.
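The arithmetic behind this example can be sketched as follows. This is a back-of-the-envelope estimate that assumes a round 70 billion parameters, 2 bytes per BF16 parameter, and weights that shard evenly across GPUs; the helper name `per_gpu_weight_gb` is hypothetical, and the estimate ignores KV cache, activations, and any replicated layers.

```python
def per_gpu_weight_gb(num_params, tp_degree, bytes_per_param=2):
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / tp_degree / 1e9


params = 70_000_000_000  # assumed round figure for a 70B model
gpu_memory_gb = 80       # H100 80 GB

for tp in (1, 2, 4, 8):
    need = per_gpu_weight_gb(params, tp)
    verdict = "fits" if need < gpu_memory_gb else "does not fit"
    print(f"TP={tp}: ~{need:.1f} GB of weights per GPU ({verdict})")
```

Under these assumptions TP=2 would already hold the weights (about 70 GB per GPU), but it leaves little room for the KV cache and activations, which is one reason a higher degree such as TP=4 is a common choice.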

Frequently asked

What is the difference between GPU and Tensor Parallelism?

GPU: GPUs are the parallel processors that train and run nearly every modern AI model. Their throughput on matrix multiplication is what makes deep learning practical.

Tensor Parallelism: Tensor parallelism shards individual layers across multiple GPUs, splitting each matrix multiplication so that different GPUs compute different slices of the output in parallel.
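The idea of splitting a matrix multiplication by output dimension can be illustrated with a minimal, single-process sketch. It uses plain Python lists so it runs anywhere; each "shard" stands in for one GPU, and the final concatenation stands in for the all-gather a real framework would perform. All function names here are hypothetical, and this shows only the column-split variant, not how any particular library implements it.

```python
def matmul(x, w):
    """Multiply x (rows x k) by w (k x n), both lists of lists."""
    return [
        [sum(row[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]


def split_columns(w, parts):
    """Split the weight matrix into `parts` equal column blocks."""
    n = len(w[0])
    assert n % parts == 0, "output dimension must divide evenly"
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def column_parallel_matmul(x, w, tp_degree):
    shards = split_columns(w, tp_degree)
    # Each partial product would be computed on a different GPU.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: concatenate along the output dimension.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
assert column_parallel_matmul(x, w, tp_degree=2) == matmul(x, w)
```

Every shard sees the full input `x` but only its own slice of the weights, so weight memory per device shrinks by the TP degree at the cost of communication to reassemble the output.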

When should I use GPU vs Tensor Parallelism?

GPU is the right concept when you are choosing or sizing hardware. Tensor Parallelism applies when you are deciding how to spread a model that is too large or too slow for one GPU across several.

Are GPU and Tensor Parallelism the same thing?

No. A GPU is a piece of hardware; Tensor Parallelism is a way of using several GPUs together to run one model. They are related but address different parts of the AI stack.