ModelTerms

Comparison

Continuous Batching vs Time per Output Token

Continuous Batching and Time per Output Token are both common LLM inference terms, but they describe different things: one is a request-scheduling technique, the other is a latency metric. Here is a quick side-by-side.

When you would reach for Continuous Batching

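Continuous Batching comes up when the question is about inference serving throughput: how to keep the GPU busy when many requests with different output lengths arrive at different times.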

Example: a vLLM server handling 200 concurrent users with variable-length responses keeps GPU utilization at about 95%, versus ~30% with static batching.
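The gap between the two scheduling styles can be illustrated with a toy simulation. This is a minimal sketch, not how vLLM is implemented: it assumes every request occupies one batch slot, every decode step produces one token per running request, and prefill cost is ignored. The function names and the sample output lengths are hypothetical.

```python
from collections import deque


def static_batching_steps(output_lengths, batch_size):
    """Decode steps needed when each batch runs until its longest request ends."""
    steps = 0
    for start in range(0, len(output_lengths), batch_size):
        batch = output_lengths[start:start + batch_size]
        # Short requests finish early, but their slots sit idle until max(batch).
        steps += max(batch)
    return steps


def continuous_batching_steps(output_lengths, batch_size):
    """Decode steps needed when waiting requests fill freed slots every step."""
    waiting = deque(output_lengths)
    running = []  # remaining tokens for each in-flight request
    steps = 0
    while waiting or running:
        # Admit new requests into any free slots before the next decode step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Each running request emits one token; finished requests leave the batch.
        running = [left - 1 for left in running if left - 1 > 0]
    return steps


def slot_utilization(output_lengths, batch_size, steps):
    """Fraction of (step, slot) pairs that produced a token."""
    return sum(output_lengths) / (steps * batch_size)


if __name__ == "__main__":
    lengths = [8, 2, 2, 2, 8, 2, 2, 2]  # hypothetical output lengths in tokens
    for name, fn in [("static", static_batching_steps),
                     ("continuous", continuous_batching_steps)]:
        steps = fn(lengths, batch_size=4)
        print(name, steps, round(slot_utilization(lengths, 4, steps), 3))
```

Slot utilization here is only a stand-in for GPU utilization; real servers also have to account for prefill, memory limits, and kernel efficiency.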

When you would reach for Time per Output Token

Time per Output Token comes up when the question is about inference latency as the user experiences it: how quickly text streams once generation has started.

Example: a 70B model on an H100 with a TPOT of ~25 ms streams about 40 tokens per second (1000 ms / 25 ms).
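The conversion between TPOT and streaming speed is simple arithmetic, sketched below. The helper names are hypothetical, and the decode-time estimate assumes TPOT stays constant over the whole response, which real workloads only approximate.

```python
def tokens_per_second(tpot_ms: float) -> float:
    """Convert time per output token (milliseconds) to tokens per second."""
    if tpot_ms <= 0:
        raise ValueError("tpot_ms must be positive")
    return 1000.0 / tpot_ms


def decode_time_s(num_output_tokens: int, tpot_ms: float) -> float:
    """Estimate decode time in seconds, assuming a constant TPOT."""
    return num_output_tokens * tpot_ms / 1000.0


if __name__ == "__main__":
    print(tokens_per_second(25.0))   # 40.0
    print(decode_time_s(500, 25.0))  # 12.5
```

At 25 ms per token, a 500-token answer would take roughly 12.5 seconds to finish streaming under this assumption.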

Frequently asked

What is the difference between Continuous Batching and Time per Output Token?

Continuous Batching: a scheduling technique that lets new requests join an in-flight batch on the next decode step rather than waiting for the current batch to finish, which dramatically raises GPU utilization on variable-length workloads.

Time per Output Token: a latency metric. Time per output token (TPOT) is the average wall-clock delay between consecutive generated tokens during streaming, so it determines how fast text appears once generation starts.
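The TPOT definition above can be turned into a measurement with a few lines of code. This is a minimal sketch assuming you have recorded a wall-clock timestamp (in seconds) for each streamed token; `mean_tpot_ms` is a hypothetical helper, not part of any serving library.

```python
def mean_tpot_ms(token_times_s):
    """Average gap between consecutive token timestamps, in milliseconds."""
    if len(token_times_s) < 2:
        raise ValueError("need at least two token timestamps")
    # Pair each timestamp with the next one and take the difference.
    gaps = [later - earlier
            for earlier, later in zip(token_times_s, token_times_s[1:])]
    return 1000.0 * sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical timestamps for four streamed tokens, 25 ms apart.
    print(mean_tpot_ms([0.000, 0.025, 0.050, 0.075]))
```

Because only the gaps between tokens are averaged, the wait before the first token does not enter the result.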

When should I use Continuous Batching vs Time per Output Token?

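Continuous Batching is the right concept when you are deciding how an inference server schedules requests and want higher GPU utilization. Time per Output Token applies when you are measuring streaming speed or setting a latency target for it.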

Are Continuous Batching and Time per Output Token the same thing?

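No. Both belong to inference, but Continuous Batching is a scheduling technique and Time per Output Token is a latency metric. They are related, since the batching strategy a server uses can affect the TPOT each user sees, but they address different parts of the AI stack.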