Comparison

ELO Rating vs Win Rate

ELO Rating and Win Rate are both LLM evaluation terms built on pairwise comparisons, but they summarize those comparisons differently. Here is a quick side-by-side.

When you would reach for ELO Rating

ELO Rating comes up when you need to rank many models on a single scale from pairwise comparisons, including pairs of models that were never compared directly.

Chatbot Arena: Claude Sonnet 4 with an ELO of ~1325; GPT-3.5 at ~1100.
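A rating gap can be read as an expected head-to-head result. The sketch below is a minimal illustration that assumes the conventional chess-style logistic curve with a 400-point scale and ignores ties; the function name `expected_score` and the `scale` parameter are illustrative, not taken from any leaderboard's code.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (win probability, ties ignored) of A against B.

    Assumes the standard Elo logistic model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Approximate ratings from the example above: a gap of about 225 points.
p = expected_score(1325, 1100)
print(f"Expected win probability for the higher-rated model: {p:.3f}")
```

Under these assumptions, a gap of roughly 225 points corresponds to the higher-rated model being preferred in a little under four out of five comparisons.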

When you would reach for Win Rate

Win Rate comes up when you are comparing one model head-to-head against a specific baseline on a fixed set of prompts.

Llama 3 70B Instruct vs GPT-3.5: ~60% win rate on AlpacaEval.
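Computing a win rate is a matter of counting judged comparisons. The sketch below assumes each comparison is labeled "win", "loss", or "tie" from the candidate's point of view and that a tie counts as half a win; both the label names and the `tie_weight` default are assumptions for illustration, and benchmarks differ in how they treat ties.

```python
from collections import Counter

VALID_LABELS = {"win", "loss", "tie"}


def win_rate(outcomes, tie_weight: float = 0.5) -> float:
    """Share of pairwise comparisons won by the candidate.

    `outcomes` holds one label per comparison, from the candidate's
    perspective. Ties contribute `tie_weight` of a win (assumed 0.5).
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one comparison")
    return (counts["win"] + tie_weight * counts["tie"]) / total


# Hypothetical judgments: 6 wins and 4 losses against the baseline.
judgments = ["win"] * 6 + ["loss"] * 4
print(win_rate(judgments))  # 0.6
```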

Frequently asked

What is the difference between ELO Rating and Win Rate?

ELO Rating: ELO (named after Arpad Elo, so also written "Elo") is a rating system originally from chess that converts pairwise wins between players into a single skill number. Chatbot Arena uses it to rank LLMs from anonymous user votes.

Win Rate: Win rate is the share of pairwise comparisons one candidate wins against another. It is the standard scalar for "model A is better than model B" in modern LLM evaluation.
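To make the ELO definition concrete, here is a minimal sketch of the classic online update applied to a stream of pairwise votes. The K-factor of 32, the starting rating of 1000, the 400-point scale, and the model names are all assumptions chosen for illustration; this is not a reproduction of how any particular leaderboard computes its scores.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def update_elo(ratings: dict, a: str, b: str, score_a: float,
               k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply one comparison result to `ratings` in place and return it.

    `score_a` is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    `k` and `initial` are assumed defaults, not values from any leaderboard.
    """
    r_a = ratings.get(a, initial)
    r_b = ratings.get(b, initial)
    e_a = expected_score(r_a, r_b)
    # Each side moves by K times (actual result minus expected result).
    ratings[a] = r_a + k * (score_a - e_a)
    ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings


# Hypothetical votes: (model A, model B, score for A).
votes = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_x", "model_z", 1.0),
]
ratings = {}
for a, b, score_a in votes:
    update_elo(ratings, a, b, score_a)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update adds to one rating exactly what it subtracts from the other, the total rating across models stays constant. The result of an online update like this depends on the order of the votes, which is one reason to treat it as a sketch rather than a final ranking method.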

When should I use ELO Rating vs Win Rate?

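ELO Rating is the right concept when you want a leaderboard: one number per model, derived from many pairwise results across many opponents. Win Rate applies when you want to know how often one specific model beats another specific model or baseline.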

Are ELO Rating and Win Rate the same thing?

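No. Both are evaluation metrics built from pairwise comparisons, but they answer different questions. Win Rate describes a single matchup between two candidates, while an ELO rating summarizes results against many opponents as one number per model. The two are related: under the ELO model, the rating gap between two models implies an expected win rate between them.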