Infrastructure · intermediate
Arize Phoenix (Phoenix)
Arize Phoenix is an open-source LLM observability and evaluation tool. It ingests OpenTelemetry traces, renders them in a debug UI, and provides built-in LLM-as-judge evaluators for hallucination, relevance, and toxicity.
Explanation
Phoenix is the open-source project from Arize AI, the company behind the commercial Arize observability platform. It runs locally or hosted, accepts traces via OpenInference (a set of OpenTelemetry-compatible semantic conventions and instrumentors for LLM applications), and surfaces them in a notebook-friendly UI.
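To make the trace format concrete, here is a minimal sketch of the flat attributes an OpenInference-style LLM span might carry. The attribute keys are meant to follow OpenInference's semantic conventions. The helper `llm_span_attributes`, the model name, and the values are hypothetical. In practice an OpenInference instrumentor would set these attributes for you rather than you building them by hand.

```python
from typing import Any


def llm_span_attributes(
    prompt: str,
    completion: str,
    model_name: str,
    prompt_tokens: int,
    completion_tokens: int,
) -> dict[str, Any]:
    """Build the flat key/value attributes for one LLM call (illustrative helper)."""
    return {
        "openinference.span.kind": "LLM",  # marks the span as an LLM call
        "input.value": prompt,
        "output.value": completion,
        "llm.model_name": model_name,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }


attrs = llm_span_attributes(
    prompt="What is the refund window?",
    completion="Refunds are accepted within 30 days.",
    model_name="example-model",  # placeholder, not a real model identifier
    prompt_tokens=12,
    completion_tokens=9,
)
for key, value in attrs.items():
    print(f"{key} = {value!r}")
```

Because the attributes are plain OpenTelemetry span attributes, any OpenTelemetry-compatible collector can carry them. Phoenix's UI is what interprets the keys as prompts, completions, and token counts.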
Beyond trace viewing, Phoenix ships pre-built LLM-as-judge templates for hallucination detection (is the answer supported by the retrieved context?), Q&A relevance (does it answer the question?), RAG relevance (are the retrieved chunks on-topic?), toxicity, and summarization quality. You can run these evals on captured traces, get a pass/fail style label per span, and slice the results by whatever metadata you attached to the spans, such as feature flag, prompt version, or user segment.
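The LLM-as-judge pattern described above can be sketched in plain Python: a prompt template, a fixed set of allowed labels ("rails"), and a per-segment failure rate. This is an assumption-laden illustration, not Phoenix's implementation. The template wording, the `unparseable` fallback label, and the helpers `snap_to_rails` and `failure_rate_by` are all hypothetical, and the judge replies are hard-coded instead of coming from a model.

```python
from collections import defaultdict

# Illustrative template; not Phoenix's actual prompt text.
HALLUCINATION_TEMPLATE = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Answer: {output}\n"
    "Is the answer supported by the reference? "
    "Reply with exactly one word: factual or hallucinated."
)
RAILS = ("factual", "hallucinated")


def snap_to_rails(raw: str, rails=RAILS) -> str:
    """Map a free-text judge reply onto one allowed label, else 'unparseable'."""
    cleaned = raw.strip().strip(".").lower()
    return cleaned if cleaned in rails else "unparseable"


def failure_rate_by(rows, key):
    """Fraction of 'hallucinated' labels per value of a span attribute."""
    totals, fails = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        fails[row[key]] += row["label"] == "hallucinated"
    return {k: fails[k] / totals[k] for k in totals}


# The prompt a judge model would receive for one span.
prompt = HALLUCINATION_TEMPLATE.format(
    input="What is the refund window?",
    reference="Refunds are accepted within 30 days.",
    output="You can get a refund within 90 days.",
)

# Hard-coded judge replies standing in for real model output.
rows = [
    {"prompt_version": "v1", "reply": "Factual."},
    {"prompt_version": "v1", "reply": "hallucinated"},
    {"prompt_version": "v2", "reply": " factual "},
    {"prompt_version": "v2", "reply": "factual"},
]
for row in rows:
    row["label"] = snap_to_rails(row["reply"])

rates = failure_rate_by(rows, "prompt_version")
print(rates)  # {'v1': 0.5, 'v2': 0.0}
```

Constraining the judge to a small label set is what makes the results sliceable: free-text verdicts cannot be aggregated, but labels can be counted per prompt version or user segment.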
The pitch: instrument once with OpenInference and you get a debug UI, an eval harness, and a dataset builder from the same traces, at no license cost. It appeals to teams that want LLMOps tooling without committing to a vendor.
Examples
- A team instruments their RAG pipeline with the Phoenix tracer, then runs the built-in hallucination eval on yesterday's traffic to find sessions where the model contradicted the docs.
- Phoenix notebook session: load traces from production, sample 500, run hallucination eval, save the failures as a regression test set.
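The notebook workflow in the second example (sample traces, judge them, keep the failures) can be sketched as below. This is a minimal, offline sketch under stated assumptions: the spans are synthetic, `stub_judge` is a crude word-overlap placeholder standing in for a real LLM hallucination eval, and `build_regression_set`, `save_jsonl`, and the output file name are hypothetical helpers, not Phoenix APIs.

```python
import json
import os
import random
import tempfile


def stub_judge(span: dict) -> str:
    """Placeholder judge: flags answers that share no words with the context."""
    answer_words = set(span["output"].lower().split())
    context_words = set(span["context"].lower().split())
    return "factual" if answer_words & context_words else "hallucinated"


def build_regression_set(spans, sample_size, judge, seed=0):
    """Sample spans reproducibly and keep the ones the judge marks as failures."""
    rng = random.Random(seed)
    sample = rng.sample(spans, min(sample_size, len(spans)))
    return [span for span in sample if judge(span) == "hallucinated"]


def save_jsonl(rows, path):
    """Write one JSON object per line so the set can be replayed later."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")


# Synthetic "production" spans: every tenth answer ignores the context.
spans = [
    {
        "id": i,
        "context": "refunds within 30 days",
        "output": "refunds within 30 days" if i % 10 else "we never refund",
    }
    for i in range(1000)
]

failures = build_regression_set(spans, sample_size=500, judge=stub_judge)
out_path = os.path.join(tempfile.gettempdir(), "hallucination_regressions.jsonl")
save_jsonl(failures, out_path)
print(f"saved {len(failures)} failing spans to {out_path}")
```

Fixing the random seed keeps the sample reproducible, which matters if the saved failures are later used as a regression test set and need to be regenerated.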
When to use Arize Phoenix
When you want open-source LLMOps tooling that works in notebooks, the IDE, and production with the same instrumentation.
Frequently asked
What is Arize Phoenix?
Arize Phoenix is an open-source LLM observability and evaluation tool. It ingests OpenTelemetry traces, renders them in a debug UI, and provides built-in LLM-as-judge evaluators for hallucination, relevance, and toxicity.
What is an example of Arize Phoenix in use?
A team instruments their RAG pipeline with the Phoenix tracer, then runs the built-in hallucination eval on yesterday's traffic to find sessions where the model contradicted the docs.
How is Arize Phoenix related to LLM Observability?
Arize Phoenix is a tool for practicing LLM observability. LLM observability is the practice of capturing, analyzing, and acting on every LLM call in a production system (inputs, outputs, latencies, costs, errors, and quality scores) so you can debug regressions and improve quality over time. Phoenix covers the capture side through traces and the quality-score side through its evals.
When should I use Arize Phoenix?
When you want open-source LLMOps tooling that works in notebooks, the IDE, and production with the same instrumentation.
Is Arize Phoenix considered intermediate?
Arize Phoenix is generally considered intermediate-level material in the AI and LLM space.