Comparison

Continuous Batching vs Tensor Parallelism

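Continuous Batching and Tensor Parallelism are both common terms in LLM serving, but they solve different problems: one decides how requests share a GPU over time, the other splits a model across several GPUs. Here is a quick side-by-side.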

When you would reach for Continuous Batching

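Continuous Batching comes up when the question is about inference throughput: how to keep a GPU busy while many requests of different lengths are being served at once.

Example: a vLLM server handling 200 concurrent users with variable-length responses. GPU utilization stays at around 95%, versus roughly 30% with static batching.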
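To make the utilization gap concrete, here is a minimal toy simulation of the two scheduling policies. It is a sketch under simplifying assumptions, not how vLLM is implemented: every request is reduced to the number of tokens it still has to generate, each decode step emits one token per in-flight request, and the function names and the sample lengths are hypothetical.

```python
from collections import deque

def static_batching(lengths, batch_size):
    """Return (decode steps, slot utilization) when a batch runs until its longest request ends."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        steps += max(batch)  # every slot is held until the longest response finishes
    return steps, sum(lengths) / (steps * batch_size)

def continuous_batching(lengths, batch_size):
    """Return (decode steps, slot utilization) when freed slots are refilled on the next step."""
    waiting = deque(lengths)
    running = []  # tokens still to generate for each in-flight request
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before this decode step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Each in-flight request emits one token; requests on their last token leave the batch.
        running = [r - 1 for r in running if r > 1]
    return steps, sum(lengths) / (steps * batch_size)

# Hypothetical workload: a mix of short and long responses, 4 slots per batch.
lengths = [2, 8, 3, 8, 1, 8, 2, 8]
static_steps, static_util = static_batching(lengths, batch_size=4)
cont_steps, cont_util = continuous_batching(lengths, batch_size=4)
print(f"static:     {static_steps} steps, {static_util:.0%} of slots used")
print(f"continuous: {cont_steps} steps, {cont_util:.0%} of slots used")
```

On this small input the continuous scheduler finishes the same work in fewer decode steps, because short requests stop occupying slots as soon as they complete. The size of the gap depends on how much the response lengths vary.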

When you would reach for Tensor Parallelism

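Tensor Parallelism comes up when the question is about infrastructure: how to run a model whose weights do not fit in the memory of a single GPU.

Example: Llama 3 70B in BF16 (~140 GB of weights) does not fit on a single 80 GB H100, so it is split across 4× H100 with TP=4, which leaves about 35 GB of weights on each GPU.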
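The sizing in this example is simple arithmetic, sketched below. The helper name is hypothetical, the parameter count is rounded to 70 billion, and the split is assumed to be perfectly even; the remaining memory on each card still has to cover the KV cache, activations, and runtime overhead, which this sketch does not estimate.

```python
def weight_gb_per_gpu(num_params, bytes_per_param, tp_degree):
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes), assuming an even split."""
    total_gb = num_params * bytes_per_param / 1e9
    return total_gb / tp_degree

GPU_MEMORY_GB = 80  # H100 80 GB, as in the example above

# BF16 stores each parameter in 2 bytes.
total_gb = weight_gb_per_gpu(num_params=70e9, bytes_per_param=2, tp_degree=1)
per_gpu_gb = weight_gb_per_gpu(num_params=70e9, bytes_per_param=2, tp_degree=4)
headroom_gb = GPU_MEMORY_GB - per_gpu_gb  # left for KV cache, activations, overhead

print(f"total weights:   {total_gb:.0f} GB")
print(f"weights per GPU: {per_gpu_gb:.0f} GB with TP=4")
print(f"headroom per GPU: {headroom_gb:.0f} GB")
```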

Frequently asked

What is the difference between Continuous Batching and Tensor Parallelism?

Continuous batching lets new requests join an in-flight batch on the next decode step rather than waiting for the current batch to finish, which raises GPU utilization substantially on variable-length workloads.

Tensor parallelism shards individual layers across multiple GPUs, splitting each matrix multiplication so that different GPUs compute different slices of the output dimension in parallel.
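The tensor-parallel split of a matrix multiplication can be illustrated without any GPUs. The sketch below uses plain Python lists and hypothetical helper names; it shows only the column-wise split of one weight matrix, with list concatenation standing in for the communication step a real multi-GPU implementation would need.

```python
def matmul(x, w):
    """Multiply x (m x k) by w (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split w column-wise into equal slices, one per simulated GPU."""
    n = len(w[0])
    assert n % num_shards == 0, "output dimension must divide evenly"
    width = n // num_shards
    return [[row[s * width:(s + 1) * width] for row in w] for s in range(num_shards)]

def column_parallel_matmul(x, w, num_shards):
    """Each shard computes its own slice of the output columns; slices are then joined."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenating each output row stands in for gathering results across GPUs.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1, 2], [3, 4]]              # 2 x 2 input activations
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # 2 x 4 weight matrix

full = matmul(x, w)
sharded = column_parallel_matmul(x, w, num_shards=2)
print(full)
print(sharded == full)
```

Each simulated GPU holds only half of the weight columns, yet the joined result matches the unsharded product.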

When should I use Continuous Batching vs Tensor Parallelism?

Reach for Continuous Batching when the goal is to serve many concurrent requests efficiently on the hardware you already have. Reach for Tensor Parallelism when a model is too large for one GPU and has to be sharded across several.

Are Continuous Batching and Tensor Parallelism the same thing?

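No. Continuous Batching is an inference scheduling technique; Tensor Parallelism is an infrastructure technique for distributing a model across GPUs. They are related but address different parts of the AI stack, and a single deployment can use both.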