Infrastructure · intermediate
Tracing (LLM tracing, distributed tracing)
Tracing captures the full causal tree of an LLM request — the user input, retrieval calls, tool calls, intermediate prompts, and the final response — as a hierarchy of timed spans you can replay and inspect.
Explanation
A single user message in a RAG app might trigger a query rewrite, a vector search, three document fetches, a reranker call, a final LLM completion, and two tool calls. A trace records each step as a span with start time, end time, inputs, outputs, and parent — so you can see the whole tree and pinpoint which step was slow or wrong.
OpenTelemetry's GenAI semantic conventions define standard attribute names (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, etc.) so traces are portable across backends. Arize Phoenix, LangSmith, Langfuse, OpenLLMetry, and Traceloop all emit or ingest OTel-compatible traces.
Traces also feed evaluation: every saved trace is a test case waiting to happen, and offline eval suites typically re-run prompts against captured traces.
Examples
- A trace showing: user_query → retrieve(top_k=5) → rerank → completion(gpt-4o) with each step's tokens, latency, and content visible.
- A coding agent trace where one of 12 tool calls returns an error; the trace makes the failure obvious.
Frequently asked
What is Tracing?
Tracing captures the full causal tree of an LLM request — the user input, retrieval calls, tool calls, intermediate prompts, and the final response — as a hierarchy of timed spans you can replay and inspect.
What is an example of tracing?
A trace showing: user_query → retrieve(top_k=5) → rerank → completion(gpt-4o) with each step's tokens, latency, and content visible.
How is Tracing related to Span?
Tracing and Span are both infrastructure concepts. A span is a single unit of work within a trace — one LLM call, one tool call, one retrieval — with a start time, end time, attributes (model, tokens, cost), and a parent span that links it into the trace tree.
Is Tracing considered intermediate?
Tracing is generally considered intermediate-level material in the AI and LLM space.