Evaluation · intermediate
Answer Relevance (Q&A relevance)
Answer relevance measures whether a response actually addresses the question that was asked, independent of whether its content is true. It is the complement to faithfulness in RAG evaluation.
Explanation
An answer can be faithful (every claim is supported by the context) but irrelevant (it answers a different question). It can also be relevant but unfaithful (it addresses the question, but with unsupported or wrong facts). Because the two failures are independent, both metrics are needed to assess the quality of a RAG answer.
Standard scoring uses an LLM as judge: the judge reads the (question, answer) pair and is asked "does this answer the question?", scoring on a 1-5 scale or as pass/fail. Some implementations, such as Ragas, instead generate questions from the answer and measure how closely they match the original question, typically by embedding similarity.
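Both scoring approaches described above can be sketched in a few lines. This is a minimal illustration, not any library's actual implementation: `llm` is a hypothetical callable that sends a prompt to a judge model and returns its text reply, the prompt wording is an assumption, and the bag-of-words cosine is a deliberately crude stand-in for the embedding similarity a real system would use.

```python
import math
import re
from collections import Counter
from typing import Callable, Iterable

# Assumed prompt wording; tune it for the judge model you actually use.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Does the answer address the question that was asked? "
    "Ignore whether the answer is factually correct. "
    "Reply with a single integer from 1 (off-topic) to 5 (fully answers it)."
)


def judge_relevance(question: str, answer: str, llm: Callable[[str], str]) -> int:
    """Ask a judge model (hypothetical `llm` callable) for a 1-5 score."""
    reply = llm(JUDGE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"\b[1-5]\b", reply)  # first standalone digit 1-5
    if match is None:
        raise ValueError(f"judge reply has no 1-5 score: {reply!r}")
    return int(match.group())


def _bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def reverse_question_score(question: str, generated: Iterable[str]) -> float:
    """Mean similarity between the original question and questions that
    a model generated from the answer (generation itself is not shown)."""
    q = _bow(question)
    sims = [_cosine(q, _bow(g)) for g in generated]
    return sum(sims) / len(sims) if sims else 0.0
```

In use, `judge_relevance` would be called once per (question, answer) pair, while `reverse_question_score` needs a separate step that asks a model to write the questions the answer appears to be responding to. A low score from either suggests the answer drifted away from what was asked.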
Faithfulness, answer relevance, and retrieval relevance (were the retrieved chunks on-topic?) together form what is often called the RAG triad. Ragas and Phoenix both ship evaluators that cover these three dimensions.
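One way to read the three scores together is to map each low score to the pipeline stage it points at. The sketch below assumes each metric has already been normalized to the range 0 to 1. The `TriadScores` type, the `diagnose` function, and the 0.5 threshold are all hypothetical names and values chosen for illustration, not part of Ragas or Phoenix.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriadScores:
    """Hypothetical container for the three RAG metrics, each in [0, 1]."""
    retrieval_relevance: float  # were the retrieved chunks on-topic?
    faithfulness: float         # are the answer's claims supported by context?
    answer_relevance: float     # does the answer address the question?


def diagnose(scores: TriadScores, threshold: float = 0.5) -> List[str]:
    """Return one message per metric that falls below the (assumed) threshold."""
    problems = []
    if scores.retrieval_relevance < threshold:
        problems.append("retrieval: chunks are off-topic")
    if scores.faithfulness < threshold:
        problems.append("generation: claims not supported by context")
    if scores.answer_relevance < threshold:
        problems.append("generation: answer does not address the question")
    return problems
```

For instance, a trace with high retrieval relevance and faithfulness but low answer relevance yields only the last message, which matches the "faithful but irrelevant" case described earlier.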
Examples
- A user asks "what's the cancellation policy?" and the model returns the refund policy: faithful but low answer-relevance.
- Phoenix's Q&A relevance evaluator running on each trace alongside a faithfulness evaluator.
Frequently asked
What is Answer Relevance?
Answer relevance measures whether a response actually addresses the question that was asked, independent of whether its content is true. It is the complement to faithfulness in RAG evaluation.
What is an example of answer relevance?
A user asks "what's the cancellation policy?" and the model returns the refund policy: faithful but low answer-relevance.
How is Answer Relevance related to Faithfulness?
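Both are evaluation metrics for RAG answers, but they compare the answer against different things. Faithfulness measures whether the answer is supported by the retrieved context: every claim either appears in the source material or follows directly from it. It is often treated as the most important RAG quality metric. Answer relevance measures whether the answer addresses the question. An answer can score high on one and low on the other, so the two are usually reported together.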
Is Answer Relevance considered intermediate?
Answer Relevance is generally considered intermediate-level material in the AI and LLM space.