Infrastructure · intermediate

Tracing (LLM tracing, distributed tracing)

Tracing captures the full causal tree of an LLM request — the user input, retrieval calls, tool calls, intermediate prompts, and the final response — as a hierarchy of timed spans you can replay and inspect.

Published May 31, 2026

Explanation

A single user message in a RAG app might trigger a query rewrite, a vector search, three document fetches, a reranker call, a final LLM completion, and two tool calls. A trace records each step as a span with start time, end time, inputs, outputs, and parent — so you can see the whole tree and pinpoint which step was slow or wrong.
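The span tree described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tracing library's data model: the `Span` class, its field names, the `walk` and `slowest_leaf` helpers, and the timings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class Span:
    """One timed step in a trace (hypothetical structure, for illustration)."""
    name: str
    start: float  # seconds since the request began
    end: float
    inputs: Any = None
    outputs: Any = None
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Link this span into its parent's subtree.
        if self.parent is not None:
            self.parent.children.append(self)

    @property
    def duration(self) -> float:
        return self.end - self.start


def walk(span: Span) -> Iterator[Span]:
    """Yield a span and all of its descendants, depth first."""
    yield span
    for child in span.children:
        yield from walk(child)


def slowest_leaf(root: Span) -> Span:
    """Return the leaf span (a step with no sub-steps) that took longest."""
    leaves = [s for s in walk(root) if not s.children]
    return max(leaves, key=lambda s: s.duration)


# Made-up timings for one RAG request.
root = Span("user_message", 0.00, 2.10, inputs="What is our refund policy?")
Span("query_rewrite", 0.00, 0.15, parent=root)
Span("vector_search", 0.15, 0.30, parent=root)
Span("rerank", 0.30, 0.55, parent=root)
Span("completion", 0.55, 2.10, parent=root)

print(slowest_leaf(root).name)
```

With these invented timings the completion step dominates the request, which is the kind of finding a trace view surfaces at a glance.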

OpenTelemetry's GenAI semantic conventions define standard attribute names (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.) so traces are portable across backends. Arize Phoenix, LangSmith, Langfuse, OpenLLMetry, and Traceloop all emit or ingest OTel-compatible traces.
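Because the attribute names are standardized, the same aggregation code can run over spans from any compliant backend. The sketch below uses plain dictionaries in place of exported spans and sums token usage per model. The `tokens_by_model` helper, the sample values, and the second model name are assumptions for illustration, and `gen_ai.usage.output_tokens` is assumed as the counterpart of the input-token attribute named above.

```python
from collections import defaultdict

# Each dict stands in for the attributes of one exported span (sample data).
spans = [
    {"gen_ai.request.model": "gpt-4o",
     "gen_ai.usage.input_tokens": 1200, "gen_ai.usage.output_tokens": 300},
    {"gen_ai.request.model": "gpt-4o",
     "gen_ai.usage.input_tokens": 800, "gen_ai.usage.output_tokens": 150},
    {"gen_ai.request.model": "example-small-model",  # hypothetical model name
     "gen_ai.usage.input_tokens": 400, "gen_ai.usage.output_tokens": 50},
    {"retrieval.top_k": 5},  # non-LLM span with a made-up attribute
]


def tokens_by_model(span_attrs: list) -> dict:
    """Sum input and output tokens per requested model."""
    totals = defaultdict(int)
    for attrs in span_attrs:
        model = attrs.get("gen_ai.request.model")
        if model is None:
            continue  # skip spans that are not LLM calls
        totals[model] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals[model] += attrs.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)


print(tokens_by_model(spans))
```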

Traces also feed evaluation: every saved trace is a potential test case, and offline eval suites typically replay the inputs from captured traces against a new prompt or model and compare the results.
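One way to picture traces as test cases is a small replay loop. This is a rough sketch under stated assumptions: the saved-trace layout (`trace_id`, `input`, `output`), the `replay` helper, the stub model, and exact-match scoring are all hypothetical, and real eval suites usually score with something more forgiving than string equality.

```python
from typing import Callable

# Hypothetical saved traces: the root input and the final output that was recorded.
saved_traces = [
    {"trace_id": "t1", "input": "2 + 2", "output": "4"},
    {"trace_id": "t2", "input": "capital of France", "output": "Paris"},
]


def replay(traces: list, candidate: Callable[[str], str]) -> list:
    """Re-run each captured input and return the IDs of traces whose output changed."""
    failures = []
    for trace in traces:
        actual = candidate(trace["input"])
        # Exact match is a stand-in for a real scoring function.
        if actual.strip() != trace["output"].strip():
            failures.append(trace["trace_id"])
    return failures


def stub_model(prompt: str) -> str:
    """Stand-in for a new prompt or model version under test."""
    answers = {"2 + 2": "4", "capital of France": "Lyon"}
    return answers.get(prompt, "")


print(replay(saved_traces, stub_model))
```

Here the stub regresses on the second case, so the replay reports `t2` as the trace to inspect.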

Examples

A trace showing: user_query → retrieve(top_k=5) → rerank → completion(gpt-4o) with each step's tokens, latency, and content visible.
A coding agent trace where one of 12 tool calls returns an error; the trace makes the failure obvious.
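The second example can be sketched as a filter over span records. The record layout (`name`, `status`, `error`), the `failed_spans` helper, and the error message are hypothetical. Real backends expose span status in their own formats.

```python
# Twelve tool-call spans from one agent run (made-up data); one of them failed.
tool_calls = [
    {"name": f"tool_call_{i}", "status": "ok", "error": None}
    for i in range(1, 13)
]
tool_calls[6] = {
    "name": "tool_call_7",
    "status": "error",
    "error": "file not found",  # hypothetical error message
}


def failed_spans(spans: list) -> list:
    """Return the spans whose status marks them as failed."""
    return [s for s in spans if s["status"] == "error"]


for span in failed_spans(tool_calls):
    print(f"{span['name']}: {span['error']}")
```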

Frequently asked

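What is Tracing?

Tracing captures the full causal tree of an LLM request (the user input, retrieval calls, tool calls, intermediate prompts, and the final response) as a hierarchy of timed spans you can replay and inspect.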

What is an example of tracing?

A trace showing: user_query → retrieve(top_k=5) → rerank → completion(gpt-4o) with each step's tokens, latency, and content visible.

How is Tracing related to Span?

Both are infrastructure concepts, and a trace is a tree of spans. A span is a single unit of work within a trace (one LLM call, one tool call, or one retrieval) with a start time, end time, attributes (model, tokens, cost), and a parent span that links it into the trace tree.

Is Tracing considered intermediate?

Tracing is generally considered intermediate-level material in the AI and LLM space.

Related terms

Span · Infrastructure

A span is a single unit of work within a trace — one LLM call, one tool call, one retrieval — with a start time, end time, attributes (model, tokens, cost), and a parent span that links it into the trace tree.

LLM Observability · Infrastructure

LLM observability is the practice of capturing, analyzing, and acting on every LLM call in a production system — inputs, outputs, latencies, costs, errors, and quality scores — so you can debug regressions and improve quality over time.

Arize Phoenix · Infrastructure

Arize Phoenix is an open-source LLM observability and evaluation tool. It ingests OpenTelemetry traces, renders them in a debug UI, and provides built-in LLM-as-judge evaluators for hallucination, relevance, and toxicity.

LangSmith · Infrastructure

LangSmith is LangChain's commercial LLM observability and evaluation platform. It captures traces (LangChain-native and OTel), runs evaluations, manages prompt versions, and supports dataset curation.

Agent · Agents & Tools

An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met.

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.
