Inference vs Speculative Decoding

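Inference and Speculative Decoding are both common AI/LLM terms, but they sit at different levels: inference is the general act of running a trained model, while speculative decoding is one technique for making LLM inference faster. Here is a quick side-by-side.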

When you would reach for Inference

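Inference comes up when the question is about running a trained model on new input at all. For LLMs, that means generating tokens one at a time, with sampling and a KV cache.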

A ChatGPT response: one inference call per turn.

When you would reach for Speculative Decoding

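Speculative Decoding comes up when the question is about making generation faster: a small draft model proposes several tokens, and the big model verifies them in a single batched call.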

Llama 3 70B accelerated by using Llama 3 8B as the draft model.

Frequently asked

What is the difference between Inference and Speculative Decoding?

Inference: what happens when you actually run a trained model on new input. For LLMs that means generating tokens one at a time, with sampling and a KV cache.

Speculative Decoding: a technique that speeds up generation by having a small "draft" model propose several tokens, then verifying them in a single batched call to the big model.
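To make the contrast concrete, here is a minimal Python sketch of both loops. It is an illustration under stated assumptions, not a production implementation: the two "models" (`target_next`, `draft_next`) are hypothetical toy functions rather than real LLMs, decoding is greedy (no sampling), and there is no KV cache. Real systems that sample tokens use a probabilistic acceptance rule instead of the exact-match check shown here.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]
VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(ctx: Sequence[int]) -> int:
    """Hypothetical stand-in for the big model: greedy next token."""
    return (3 * ctx[-1] + len(ctx)) % VOCAB


def draft_next(ctx: Sequence[int]) -> int:
    """Hypothetical stand-in for the draft model: agrees with the target
    except when the context length is a multiple of 5."""
    guess = target_next(ctx)
    return (guess + 1) % VOCAB if len(ctx) % 5 == 0 else guess


def greedy_decode(next_token: NextToken, prompt: Sequence[int],
                  max_new_tokens: int) -> List[int]:
    """Plain inference: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: Sequence[int], max_new_tokens: int,
                       k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding.

    Returns the generated sequence and the number of verification steps.
    Each verification step stands for one batched call to the big model.
    """
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    verify_calls = 0
    while len(tokens) < limit:
        # 1. The draft model proposes k tokens, one at a time.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model scores all k + 1 positions. A real model would
        #    do this in a single batched forward pass; the toy function is
        #    simply called once per position and counted as one step.
        verify_calls += 1
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Accept the longest prefix on which draft and target agree.
        n = 0
        while n < k and proposal[n] == verified[n]:
            n += 1
        tokens.extend(proposal[:n])
        # 4. Add one token from the target itself: the correction at the
        #    first mismatch, or a bonus token if every draft was accepted.
        tokens.append(verified[n])
    return tokens[:limit], verify_calls


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
spec_tokens, spec_calls = speculative_decode(target_next, draft_next, prompt, 20)
```

In this sketch the speculative output matches plain greedy decoding token for token, because every emitted token is one the target model itself would have chosen. The saving comes from needing fewer big-model steps than tokens whenever the draft guesses well; if the draft is always wrong, the loop degrades to one token per big-model step.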

When should I use Inference vs Speculative Decoding?

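Inference is the right concept when you are talking about running a trained model in general: serving requests, generating tokens, managing the KV cache. Speculative Decoding applies when you specifically want to speed up token generation from a large model and have a smaller draft model available to propose tokens.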

Are Inference and Speculative Decoding the same thing?

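No. Inference is the overall process of running a trained model on new input; Speculative Decoding is a technique applied during inference to speed up generation. They are related, but one is the process and the other is an optimization of it.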