Evaluation · intermediate
Faithfulness (groundedness)
Faithfulness measures whether an LLM's answer is supported by the retrieved context: every claim in the answer either appears in the source material or follows directly from it. It is widely treated as the most important RAG quality metric.
Explanation
A faithful answer never introduces information not present in the retrieved chunks. An unfaithful answer hallucinates — confidently asserting facts that the source did not contain.
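Typical scoring: an LLM-as-judge reads the (retrieved context, generated answer) pair, decomposes the answer into atomic claims, and checks whether each claim is entailed by the context. A common verdict scheme is pass = all claims entailed; partial-fail = some claims not entailed, but no major claim contradicted or fabricated; fail = at least one major claim contradicts the context or is fabricated. Some tools, Ragas among them, instead report a continuous score: the number of claims supported by the context divided by the total number of claims in the answer.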
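The claim-level procedure above can be sketched in Python. This is a minimal offline sketch under stated assumptions, not the implementation used by Phoenix, Ragas, or any other tool. `split_into_claims` naively splits on sentences where a real grader would ask an LLM to decompose the answer, and `toy_judge` is a hypothetical regex heuristic standing in for the LLM-as-judge entailment call. The sketch has no notion of a "major" claim: it fails an answer when any claim is contradicted or when no claim is entailed, and otherwise marks unsupported claims as partial-fail (`"partial"`). All names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge interface: label one claim against the context as
# "entailed", "unsupported", or "contradicted". In practice an LLM call.
Judge = Callable[[str, str], str]


@dataclass
class FaithfulnessResult:
    score: float               # fraction of claims entailed by the context
    verdict: str               # "pass", "partial", or "fail"
    failing_claims: List[str]  # claims not entailed, useful when debugging


def split_into_claims(answer: str) -> List[str]:
    # Naive stand-in for LLM claim decomposition: one claim per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]


def toy_judge(context: str, claim: str) -> str:
    # Hypothetical heuristic judge so the sketch runs without an LLM.
    claim_years = set(re.findall(r"\b\d{4}\b", claim))
    context_years = set(re.findall(r"\b\d{4}\b", context))
    if claim_years - context_years:
        # A different year in the context reads as a contradiction;
        # no year in the context at all reads as unsupported.
        return "contradicted" if context_years else "unsupported"
    claim_words = set(re.findall(r"[a-z]+", claim.lower()))
    context_words = set(re.findall(r"[a-z]+", context.lower()))
    return "entailed" if claim_words <= context_words else "unsupported"


def score_faithfulness(context: str, answer: str, judge: Judge) -> FaithfulnessResult:
    claims = split_into_claims(answer)
    labels = [judge(context, claim) for claim in claims]
    entailed = labels.count("entailed")
    score = entailed / len(claims) if claims else 1.0
    if entailed == len(claims):
        verdict = "pass"
    elif "contradicted" in labels or entailed == 0:
        verdict = "fail"
    else:
        verdict = "partial"
    failing = [c for c, label in zip(claims, labels) if label != "entailed"]
    return FaithfulnessResult(score, verdict, failing)


if __name__ == "__main__":
    context = "California enacted X in 2023."
    answer = "California enacted X in 2024. The law applies statewide."
    result = score_faithfulness(context, answer, toy_judge)
    print(result.verdict, result.score)  # fail 0.0
```

Keeping the judge as a parameter is the main design choice: replacing `toy_judge` with a function that prompts a judge model with the context and a single claim, then parses its label, leaves the aggregation logic unchanged.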
Phoenix, Ragas, and Anthropic's eval recipes all ship faithfulness graders out of the box. Most RAG quality dashboards lead with faithfulness as the headline metric: it correlates strongly with user trust, and because it checks the answer only against the retrieved context, with no external ground truth needed, it is the easiest hallucination check to automate in CI.
Examples
- A faithfulness eval flags an answer stating "California enacted X in 2024" when the retrieved policy said 2023; the linked trace shows the retrieved context next to the answer, pinpointing where the error was introduced.
- Phoenix's built-in faithfulness evaluator running on a 5% sample of production traces.
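Sampling a fixed fraction of production traces, as in the Phoenix example above, can be done deterministically. The sketch below is framework-agnostic and is not Phoenix's own sampling mechanism; `should_evaluate` is a hypothetical helper, and it assumes each trace carries a stable string ID.

```python
import hashlib


def should_evaluate(trace_id: str, rate: float = 0.05) -> bool:
    """Return True for roughly `rate` of trace IDs, always the same ones.

    Hashing the ID instead of calling random() keeps the sample stable, so
    re-running the eval job or backfilling scores selects the same traces.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    return value < rate * 2**64  # exact int-vs-float comparison in Python


if __name__ == "__main__":
    trace_ids = [f"trace-{i}" for i in range(10_000)]
    sampled = [t for t in trace_ids if should_evaluate(t)]
    print(f"{len(sampled)} of {len(trace_ids)} traces selected for grading")
```

Hash-based selection trades a little randomness for reproducibility: the same trace is always in or out of the sample, which makes faithfulness scores comparable across reruns of the eval job.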
When to use faithfulness
Always for RAG — faithfulness is the single most actionable production metric.
Frequently asked
How is Faithfulness related to Retrieval-Augmented Generation?
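Faithfulness is an evaluation metric for RAG systems; RAG itself is an architecture, not an evaluation concept. RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining. Faithfulness checks the generation step of that pipeline: whether the answer stays within the documents that were retrieved.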
Is faithfulness considered intermediate?
Yes. It is usually treated as intermediate-level material, since applying it assumes familiarity with RAG pipelines and LLM-as-judge evaluation.