Inference · beginner
Batch API (batch inference)
Batch APIs (OpenAI, Anthropic) accept a large set of LLM requests in a single submission (up to 50K per batch on OpenAI), run them asynchronously within a 24-hour window, and return results at ~50% of the synchronous price. The cheap option for bulk processing.
Explanation
For workloads that don't need real-time responses — generating embeddings for a million documents, evaluating a benchmark, running synthetic data generation, regrading a corpus — providers offer a batch tier at ~50% off.
The catch: latency goes from seconds (or less) to "within 24 hours." For embeddings or evals that's fine; for user-facing chat it's a non-starter.
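Because a batch can take up to the full 24-hour window, client code usually polls at a relaxed interval with a deadline rather than blocking on a response. The sketch below is one way to do that, not any provider's official client. `wait_for_batch` and `get_status` are hypothetical names, and the terminal states are modeled on OpenAI-style batch statuses, so adjust them for the provider you use.

```python
import time
from typing import Callable

# Assumed terminal states, modeled on OpenAI-style batch statuses.
# Other providers use different names; adjust as needed.
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    deadline_seconds: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the batch reaches a terminal state or the deadline passes.

    get_status is a hypothetical callable that returns the current status
    string, e.g. a thin wrapper around your provider's "retrieve batch" call.
    """
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() - start >= deadline_seconds:
            raise TimeoutError(f"batch still {status!r} after {deadline_seconds} s")
        sleep(poll_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting in real time.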
OpenAI launched the Batch API in April 2024; Anthropic followed with Message Batches; Google Vertex has long offered batch prediction. The workflow is similar across providers: submit a set of requests (a JSONL file on OpenAI), poll for completion, download the JSONL of responses.
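The submit and download steps can be sketched with the standard library alone. This assumes the OpenAI-style input format, where each JSONL line carries a `custom_id`, `method`, `url`, and `body`, and each output line echoes the `custom_id`. The helper names, the `doc-N` id scheme, and the cap constant are illustrative assumptions; uploading the file and creating the batch are left to your provider's SDK.

```python
import json
from typing import Iterable, Iterator

# Per-batch request cap assumed here; check your provider's current limit.
MAX_REQUESTS_PER_BATCH = 50_000


def build_request_lines(texts: Iterable[str], model: str, start: int = 0) -> Iterator[str]:
    """Yield one JSONL line per embedding request (OpenAI-style batch input)."""
    for i, text in enumerate(texts, start=start):
        yield json.dumps({
            "custom_id": f"doc-{i}",  # hypothetical id scheme, used to match results later
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        })


def chunk(lines: Iterable[str], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[list[str]]:
    """Group request lines into lists no larger than the per-batch cap."""
    batch: list[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def index_results(output_jsonl: str) -> dict[str, dict]:
    """Map custom_id -> result record. Do not rely on output order matching input order."""
    results: dict[str, dict] = {}
    for line in output_jsonl.splitlines():
        if line.strip():
            record = json.loads(line)
            results[record["custom_id"]] = record
    return results
```

Each list from `chunk` can be written to its own `.jsonl` file with `"\n".join(batch)` and submitted as a separate batch. Keying results by `custom_id` is what makes it possible to rejoin them with the source documents.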
Examples
- Generating embeddings for 10M support tickets via OpenAI Batch: $0.05 / 1M tokens instead of $0.10, completed overnight.
- Running an eval suite of 50K traces through GPT-4o Batch at roughly half the synchronous cost.
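The arithmetic behind these examples is simple enough to check before committing to a pipeline. The sketch below assumes a flat 50% batch discount and a hypothetical average of 200 tokens per ticket. Neither the function name nor the token average comes from a provider, so substitute your own measurements and current prices.

```python
def batch_savings(
    total_tokens: int,
    sync_price_per_million: float,
    batch_discount: float = 0.5,  # assumed flat discount; verify against current pricing
) -> dict[str, float]:
    """Compare synchronous and batch cost for a given token volume."""
    sync_cost = total_tokens / 1_000_000 * sync_price_per_million
    batch_cost = sync_cost * (1 - batch_discount)
    return {"sync": sync_cost, "batch": batch_cost, "saved": sync_cost - batch_cost}


# Hypothetical workload: 10M tickets averaging 200 tokens each.
tickets = 10_000_000
avg_tokens_per_ticket = 200  # assumption, measure on a sample of real data
costs = batch_savings(tickets * avg_tokens_per_ticket, sync_price_per_million=0.10)
print(costs)
```

At embedding prices the absolute saving is modest. The same calculation with chat-model prices and output tokens is where the discount tends to matter most.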
When to use Batch API
Any time the work is bulk, async, and not user-facing — embedding pipelines, evals, synthetic data, batch labeling.
Frequently asked
How is Batch API related to Inference?
Batch API is one way of running inference. Inference is what happens when you run a trained model on new input; for LLMs that means generating tokens one at a time, with sampling and a KV cache. A batch API performs that same inference asynchronously, trading latency for a lower price.
Is Batch API considered beginner?
Batch API is generally considered beginner-level material in the AI and LLM space.