Infrastructure · intermediate

Span

A span is a single unit of work within a trace — one LLM call, one tool call, one retrieval — with a start time, end time, attributes (model, tokens, cost), and a parent span that links it into the trace tree.
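As a rough illustration of that definition, the sketch below models a span as a small Python dataclass and rebuilds the trace tree from parent links. The `Span` class, its field names, and `render_tree` are hypothetical and chosen for this example; they are not any tracing library's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical shape, for illustration)."""
    span_id: str
    name: str
    start: float                      # seconds since the trace began
    end: float
    parent_id: Optional[str] = None   # None marks the root span
    attributes: dict = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start


def render_tree(spans: list[Span]) -> list[str]:
    """Return one indented line per span, with children nested under parents."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit siblings in start-time order, then recurse into each subtree.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration:.1f}s)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


trace = [
    Span("a", "agent", 0.0, 2.5),
    Span("b", "retriever", 0.1, 0.4, parent_id="a", attributes={"top_k": 5}),
    Span("c", "llm", 0.5, 2.3, parent_id="a", attributes={"model": "gpt-4o"}),
]
print("\n".join(render_tree(trace)))
```

The parent link is the only thing that turns a flat list of spans into a tree, which is why a span recorded without its parent ID shows up as a detached root.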

Published May 31, 2026

Explanation

Trace : span :: log file : log line. A trace is the whole story; each span is one event in it. In an LLM app the typical span types are LLM (a completion call), RETRIEVER (a vector / keyword search), TOOL (a function call), EMBEDDING (an embedding call), CHAIN (a composed step), and AGENT (a multi-step decision).

OpenTelemetry's GenAI semantic conventions define standard span attributes for these workloads, including `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and `gen_ai.response.id`. Tools like Arize Phoenix and LangSmith parse span attributes such as these to compute cost and latency automatically and to attach quality metrics to each span.
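To make the role of these attributes concrete, here is a minimal sketch of the kind of aggregation a backend might run over them. It uses plain dictionaries rather than a real exporter; the attribute keys are the ones named above, while the values, the `retriever.top_k` key, and the `token_usage_by_model` helper are made up for illustration.

```python
from collections import defaultdict

# Attribute names follow the gen_ai.* keys named above; the values are invented.
spans = [
    {"gen_ai.system": "openai", "gen_ai.request.model": "gpt-4o",
     "gen_ai.usage.input_tokens": 820, "gen_ai.usage.output_tokens": 410},
    {"gen_ai.system": "openai", "gen_ai.request.model": "gpt-4o",
     "gen_ai.usage.input_tokens": 300, "gen_ai.usage.output_tokens": 120},
    {"retriever.top_k": 5},  # hypothetical non-LLM span with no gen_ai.* keys
]


def token_usage_by_model(span_attributes):
    """Sum input and output tokens per model, skipping spans without a model."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for attrs in span_attributes:
        model = attrs.get("gen_ai.request.model")
        if model is None:
            continue
        totals[model]["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals[model]["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)


usage = token_usage_by_model(spans)
print(usage)
```

Because every instrumented library writes the same keys, the aggregation does not need to know which SDK or provider produced a given span.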

For debugging, the most useful span fields are: input (the prompt or query), output (the response or retrieved chunks), and any error.
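A small sketch of how those three fields are typically used together: scan a trace for spans that recorded an error and pull out the input that triggered it. The dictionary layout, span names, and error text are assumptions for this example, not a specific tool's export format.

```python
def failing_spans(spans):
    """Return (name, input, error) for every span that recorded an error."""
    return [
        (span["name"], span.get("input"), span["error"])
        for span in spans
        if span.get("error")
    ]


# Hypothetical trace: the tool call timed out, the other spans succeeded.
spans = [
    {"name": "retriever", "input": "how to set up oauth",
     "output": ["chunk-1", "chunk-2"], "error": None},
    {"name": "tool:search", "input": {"q": "oauth"},
     "output": None, "error": "TimeoutError: upstream took too long"},
    {"name": "llm", "input": "system+user msgs",
     "output": "response text", "error": None},
]

for name, span_input, error in failing_spans(spans):
    print(f"{name} failed on input {span_input!r}: {error}")
```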

Examples

An LLM span: model=gpt-4o, input=system+user msgs, output=response text, tokens_in=820, tokens_out=410, latency=1.8s, cost=$0.0061.
A retriever span: query="how to set up oauth", top_k=5, returned=[5 chunks with similarity scores].
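The cost on the LLM span is derived from its token counts. The sketch below reproduces a figure close to the one in the example, assuming prices of $2.50 per million input tokens and $10.00 per million output tokens; those prices and the `span_cost` helper are assumptions for illustration, and real prices vary by model and over time.

```python
# Assumed prices in USD per one million tokens (illustrative, not authoritative).
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 10.00


def span_cost(tokens_in: int, tokens_out: int) -> float:
    """Cost of one LLM span from its token counts and the assumed prices."""
    return (tokens_in * INPUT_PRICE_PER_M + tokens_out * OUTPUT_PRICE_PER_M) / 1_000_000


cost = span_cost(tokens_in=820, tokens_out=410)
print(f"${cost:.5f}")  # about $0.00615 under the assumed prices
```

Under these assumptions the input tokens contribute about $0.00205 and the output tokens about $0.00410, which matches the example's cost to within rounding.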

Frequently asked

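What is a span?

A span is a single unit of work within a trace, such as one LLM call, one tool call, or one retrieval. It records a start time, an end time, attributes (model, tokens, cost), and a parent span that links it into the trace tree.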

What is an example of a span?

An LLM span: model=gpt-4o, input=system+user msgs, output=response text, tokens_in=820, tokens_out=410, latency=1.8s, cost=$0.0061.

How is a span related to tracing?

A span is the building block of a trace. Tracing captures the full causal tree of an LLM request (the user input, retrieval calls, tool calls, intermediate prompts, and the final response) as a hierarchy of timed spans you can replay and inspect.

Is Span considered intermediate?

Span is generally considered intermediate-level material in the AI and LLM space.

Tracing · Infrastructure

Tracing captures the full causal tree of an LLM request — the user input, retrieval calls, tool calls, intermediate prompts, and the final response — as a hierarchy of timed spans you can replay and inspect.

LLM Observability · Infrastructure

LLM observability is the practice of capturing, analyzing, and acting on every LLM call in a production system — inputs, outputs, latencies, costs, errors, and quality scores — so you can debug regressions and improve quality over time.

Arize Phoenix · Infrastructure

Arize Phoenix is an open-source LLM observability and evaluation tool. It ingests OpenTelemetry traces, renders them in a debug UI, and provides built-in LLM-as-judge evaluators for hallucination, relevance, and toxicity.

LangSmith · Infrastructure

LangSmith is LangChain's commercial LLM observability and evaluation platform. It captures traces (LangChain-native and OTel), runs evaluations, manages prompt versions, and supports dataset curation.

Sources

OpenTelemetry — Spans