ModelTerms

Infrastructure · intermediate

Tracing (LLM tracing, distributed tracing)

Tracing captures the full causal tree of an LLM request — the user input, retrieval calls, tool calls, intermediate prompts, and the final response — as a hierarchy of timed spans you can replay and inspect.

Explanation

A single user message in a RAG app might trigger a query rewrite, a vector search, three document fetches, a reranker call, a final LLM completion, and two tool calls. A trace records each step as a span with start time, end time, inputs, outputs, and parent — so you can see the whole tree and pinpoint which step was slow or wrong.
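The span tree described above can be sketched in a few lines of plain Python. This is only an illustrative data model, not any tracing library's API: the `Span` class, the `render_tree` and `slowest_leaf` helpers, and the timings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed step in a trace (hypothetical model, not a library type)."""
    span_id: str
    name: str
    start_ms: float
    end_ms: float
    parent_id: Optional[str] = None  # None marks the root span
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def render_tree(spans: List[Span], parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return the trace as indented lines, children ordered by start time."""
    lines = []
    children = [s for s in spans if s.parent_id == parent_id]
    for span in sorted(children, key=lambda s: s.start_ms):
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)")
        lines.extend(render_tree(spans, span.span_id, depth + 1))
    return lines


def slowest_leaf(spans: List[Span]) -> Span:
    """Return the slowest span that has no children of its own."""
    parent_ids = {s.parent_id for s in spans}
    leaves = [s for s in spans if s.span_id not in parent_ids]
    return max(leaves, key=lambda s: s.duration_ms)


# Made-up timings for a small RAG request.
trace = [
    Span("1", "user_query", 0, 2400),
    Span("2", "query_rewrite", 10, 310, parent_id="1"),
    Span("3", "vector_search", 320, 400, parent_id="1"),
    Span("4", "rerank", 410, 560, parent_id="1"),
    Span("5", "completion", 570, 2390, parent_id="1"),
]

print("\n".join(render_tree(trace)))
print("slowest step:", slowest_leaf(trace).name)
```

Restricting the search to leaf spans is a deliberate choice here: the root always has the longest duration, so comparing only leaves points at the step that actually consumed the time.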

OpenTelemetry's GenAI semantic conventions define standard attribute names (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.) so traces are portable across backends. Arize Phoenix, LangSmith, Langfuse, and Traceloop's open-source OpenLLMetry all emit or ingest OTel-compatible traces.
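To show why shared attribute names matter, here is a minimal sketch that uses plain dictionaries instead of the OpenTelemetry SDK. The helper names `genai_attributes` and `total_tokens` are hypothetical, and `gen_ai.usage.output_tokens` is an additional convention attribute not mentioned above. With a real SDK, the same keys would typically be attached to a span through its `set_attribute` method.

```python
def genai_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build span attributes using GenAI semantic-convention key names."""
    return {
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        # Assumption: not named in the text above, but part of the same convention set.
        "gen_ai.usage.output_tokens": output_tokens,
    }


def total_tokens(attributes: dict) -> int:
    """Read token usage without knowing which tool produced the span."""
    return (
        attributes.get("gen_ai.usage.input_tokens", 0)
        + attributes.get("gen_ai.usage.output_tokens", 0)
    )


attrs = genai_attributes("gpt-4o", input_tokens=1200, output_tokens=350)
print(total_tokens(attrs))
```

Because `total_tokens` depends only on the standard keys, the same reader would work on spans emitted by any instrumentation that follows the conventions.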

Traces also feed evaluation: each saved trace is a candidate test case, and offline eval suites typically replay the inputs captured in traces against a new prompt or model version and score the results.
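One way to picture the trace-to-eval loop is the sketch below. It assumes each saved trace has been reduced to a dictionary with `trace_id`, `input`, and `output` keys. That shape, the helper names, and the exact-match scorer are all illustrative assumptions, and `candidate_generate` is a stand-in for a real model call.

```python
def trace_to_test_case(trace: dict) -> dict:
    """Turn a saved trace into a test case, using its output as the reference."""
    return {
        "trace_id": trace["trace_id"],
        "input": trace["input"],
        "reference": trace["output"],
    }


def exact_match(candidate: str, reference: str) -> float:
    """Simplest possible scorer; real suites often use fuzzier metrics or judges."""
    return 1.0 if candidate.strip() == reference.strip() else 0.0


def run_offline_eval(traces, generate, score=exact_match):
    """Replay each captured input through `generate` and score it against the reference."""
    results = []
    for case in map(trace_to_test_case, traces):
        candidate = generate(case["input"])
        results.append({"trace_id": case["trace_id"], "score": score(candidate, case["reference"])})
    return results


# Hypothetical captured traces.
saved_traces = [
    {"trace_id": "t1", "input": "2 + 2", "output": "4"},
    {"trace_id": "t2", "input": "capital of France", "output": "Paris"},
]


def candidate_generate(prompt: str) -> str:
    # Stand-in for the new prompt or model version under test.
    canned = {"2 + 2": "4", "capital of France": "Lyon"}
    return canned[prompt]


results = run_offline_eval(saved_traces, candidate_generate)
print(results)
```

Treating the captured output as the reference only makes sense for traces that were reviewed and judged correct; otherwise the suite would lock in past mistakes.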

Examples

  • A trace showing: user_query → retrieve(top_k=5) → rerank → completion(gpt-4o) with each step's tokens, latency, and content visible.
  • A coding agent trace where one of 12 tool calls returns an error; the trace makes the failure obvious.
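The second example can be sketched as a filter over span records. The dictionary shape, the `status` and `error` field names, and the error message are made up for illustration and do not come from any particular tracing backend.

```python
# Twelve hypothetical tool-call spans from one agent run; all succeed by default.
tool_calls = [
    {"name": f"tool_call_{i}", "status": "ok", "error": None}
    for i in range(1, 13)
]
# Simulate the single failing call (message is invented for the example).
tool_calls[6] = {
    "name": "tool_call_7",
    "status": "error",
    "error": "FileNotFoundError: config.yaml",
}


def failed_spans(spans):
    """Return only the spans that recorded an error status."""
    return [s for s in spans if s["status"] == "error"]


for span in failed_spans(tool_calls):
    print(f"{span['name']} failed: {span['error']}")
```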

Frequently asked

How is Tracing related to Span?

A trace is made up of spans. A span is a single unit of work within a trace (one LLM call, one tool call, one retrieval) with a start time, end time, attributes (model, tokens, cost), and a reference to its parent span that links it into the trace tree. The root span is the only one without a parent.

Span · Infrastructure

A span is a single unit of work within a trace — one LLM call, one tool call, one retrieval — with a start time, end time, attributes (model, tokens, cost), and a parent span that links it into the trace tree.

LLM Observability · Infrastructure

LLM observability is the practice of capturing, analyzing, and acting on every LLM call in a production system — inputs, outputs, latencies, costs, errors, and quality scores — so you can debug regressions and improve quality over time.

Arize Phoenix · Infrastructure

Arize Phoenix is an open-source LLM observability and evaluation tool. It ingests OpenTelemetry traces, renders them in a debug UI, and provides built-in LLM-as-judge evaluators for hallucination, relevance, and toxicity.

LangSmith · Infrastructure

LangSmith is LangChain's commercial LLM observability and evaluation platform. It captures traces (LangChain-native and OTel), runs evaluations, manages prompt versions, and supports dataset curation.

Agent · Agents & Tools

An AI agent is an LLM-driven system that decides which actions to take, executes them via tools, observes the results, and iterates until a goal is met.

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.
