Inference · intermediate

Structured Output (JSON mode, structured generation)

Structured output constrains an LLM to emit text matching a schema, usually JSON. When the constraint is enforced during decoding, every response that runs to completion matches the schema, so your code can parse it without format-related retries.

Published May 30, 2026

Explanation

Asking a model to "respond in JSON" mostly works but occasionally fails: the model wraps the JSON in prose or a code fence, drops a required field, or emits a trailing comma. Structured-output mechanisms (OpenAI Structured Outputs, Anthropic tool-use response shapes, llama.cpp grammars, Outlines) go further. Those that enforce the schema at decode time mask out invalid tokens before they can be sampled, so the model cannot generate them.

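This trades a small amount of quality (some token choices are ruled out) for schema compliance on every completed response, eliminating an entire class of production retries and parsing bugs. The guarantee covers shape, not content: a response cut off by a token limit can still be incomplete, and a schema-valid value can still be wrong.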
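The masking step can be illustrated with a toy decoder. This is a minimal sketch, not how any particular library implements it: the vocabulary, the `fake_logits` stand-in for a model, and the two-string "schema" are all assumptions made for illustration. Real implementations compile a grammar or JSON Schema into an automaton over the tokenizer's vocabulary.

```python
import math

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'Sure', '!']

# The "schema" for this sketch: the only complete outputs we accept.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']


def allowed_token_ids(prefix: str) -> list[int]:
    """Ids of tokens that keep `prefix` a prefix of some valid output."""
    return [
        i for i, tok in enumerate(VOCAB)
        if any(v.startswith(prefix + tok) for v in VALID_OUTPUTS)
    ]


def fake_logits(prefix: str) -> list[float]:
    """Hypothetical model that always prefers chatty, invalid tokens."""
    return [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 5.0, 4.0]


def constrained_greedy_decode() -> str:
    out = ""
    while out not in VALID_OUTPUTS:
        logits = fake_logits(out)
        allowed = set(allowed_token_ids(out))
        # Mask: disallowed tokens get -inf so they can never be chosen.
        masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        out += VOCAB[best]
    return out


print(constrained_greedy_decode())  # {"ok":true}
```

Even though the stand-in model scores 'Sure' highest at every step, the mask leaves only schema-consistent tokens to choose from, so the result always parses.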

Use it whenever the downstream consumer is code rather than a human.

Examples

OpenAI Structured Outputs with a Pydantic model or a JSON Schema.
Anthropic tool_use blocks returning typed parameters.
llama.cpp GBNF grammars enforcing valid JSON.
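To make the first example concrete, here is a sketch of the kind of JSON Schema such an API accepts. The invoice fields are hypothetical. The `response_format` wrapper follows the shape OpenAI documents for Chat Completions as far as I know it; treat the exact field names as an assumption and check the current provider documentation. No API call is made here.

```python
import json

# Hypothetical schema for an invoice-extraction task.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
    # Strict modes generally want every property listed as required
    # and no extra keys allowed.
    "required": ["vendor", "total", "currency"],
    "additionalProperties": False,
}

# Assumed request fragment; passed alongside the model name and messages.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": INVOICE_SCHEMA},
}

print(json.dumps(response_format, indent=2))
```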

When to use structured output

Any time you need to programmatically parse model output — extraction, function arguments, classification, multi-step pipelines.
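On the consuming side, a classification pipeline might turn the model's JSON into a typed value like this. It is a minimal sketch: the `Classification` type, the label set, and the sample payload are hypothetical. The range checks remain useful even with schema enforcement, because a schema can fix the shape of a value without making it sensible.

```python
import json
from dataclasses import dataclass

LABELS = ("spam", "ham")  # hypothetical label set


@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float


def parse_classification(raw: str) -> Classification:
    """Parse model output into a typed value, rejecting bad content."""
    data = json.loads(raw)
    label = data["label"]
    confidence = float(data["confidence"])
    if label not in LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Classification(label, confidence)


result = parse_classification('{"label": "spam", "confidence": 0.93}')
print(result)  # Classification(label='spam', confidence=0.93)
```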

Frequently asked

How is Structured Output related to Function Calling?

Function calling is a specific application of structured output. It is the API mechanism by which an LLM emits a structured request to invoke a named function with typed arguments, and it is the approach to tool use that OpenAI popularized. Structured output is the broader idea: it applies to any response that code must parse, not only to tool calls.

Function Calling · Agents & Tools

Function calling is the specific API mechanism by which an LLM emits a structured request to invoke a named function with typed arguments. It is the approach to tool use that OpenAI popularized.

Tool Use · Agents & Tools

Tool use is when an LLM can call external functions — APIs, code interpreters, databases, web fetchers — and read their results. The mechanism that turns chat into action.
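The application side of a tool call can be sketched as follows. The `get_weather` tool, its return value, and the `{"name": ..., "arguments": ...}` envelope are hypothetical; providers each define their own wire format for tool calls and tool results.

```python
import json
from typing import Callable


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical tool; a real one would call a weather API."""
    return {"city": city, "unit": unit, "temperature": 21}


# Registry mapping the names the model may emit to real functions.
TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_call(raw_call: str) -> str:
    """Dispatch one structured call and serialize the result for the model."""
    call = json.loads(raw_call)
    func = TOOLS[call["name"]]           # KeyError if the model names an unknown tool
    result = func(**call["arguments"])   # typed arguments become keyword arguments
    return json.dumps(result)


model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(run_tool_call(model_output))
```

The returned string is what would be sent back to the model as the tool result, which is how the model "reads" what the function produced.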

Sampling · Inference

Sampling is the act of choosing the next token from the model's output distribution, typically after applying temperature and a truncation strategy like top-p or top-k.
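A minimal sketch of that procedure, assuming raw logits as a plain list and top-p (nucleus) truncation; the function name and defaults are illustrative, and production samplers work on tensors rather than lists.

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=random):
    """Sample one token id using temperature scaling and top-p truncation.

    `temperature` must be greater than zero.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw from the kept tokens; random.choices renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# One token dominates, so the nucleus contains only that token.
print(sample_top_p([10.0, 0.0, 0.0], top_p=0.9))  # 0
```

Constrained decoding slots in before this step: disallowed tokens have their logits set to negative infinity, so they receive zero probability and can never be drawn.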

Prompt Engineering · Prompting

Prompt engineering is the craft of writing prompts that reliably produce the behavior you want from an LLM. It blends formatting, examples, tone, and constraints.

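JSON Mode · Inference

JSON mode is a provider-specific feature that forces the model to emit syntactically valid JSON. It is stronger than asking nicely but weaker than full structured output with a schema: the response will parse, but nothing guarantees which keys it contains or what types their values have.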
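The gap between the two can be shown with a payload that parses but does not fit what the consumer expects. The payload and the hand-rolled `schema_problems` check are hypothetical and stand in for a real JSON Schema validator.

```python
import json

# Syntactically valid JSON, as JSON mode guarantees, but "total" is a
# string and "currency" is missing.
json_mode_output = '{"vendor": "Acme", "total": "forty-two"}'

# Hypothetical expectations of the downstream consumer.
EXPECTED_TYPES = {"vendor": str, "total": (int, float), "currency": str}


def schema_problems(data: dict) -> list[str]:
    """List the ways `data` deviates from EXPECTED_TYPES."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"wrong type for {key}")
    return problems


data = json.loads(json_mode_output)  # succeeds: the syntax is fine
print(schema_problems(data))  # ['wrong type for total', 'missing key: currency']
```

With schema-enforced structured output, neither problem could appear in a completed response, so this kind of check shrinks to validating content rather than shape.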

Sources

OpenAI Structured Outputs