Comparison
ELO Rating vs Win Rate
ELO Rating and Win Rate are both common terms in LLM evaluation. Both are built from pairwise comparisons, but they measure different things. Here is a quick side-by-side.
When you would reach for ELO Rating
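ELO Rating is the tool to reach for when you need to rank many models on a single scale from a large pool of pairwise comparisons, such as crowd-sourced votes on a public leaderboard.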
Chatbot Arena: Claude Sonnet 4 at an ELO of ~1325; GPT-3.5 at ~1100.
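An ELO gap is easiest to read through the expected score it implies. The minimal Python sketch below assumes the standard Elo logistic formula with the conventional 400-point scale; the leaderboard's own fitting procedure may differ. It turns the two ratings above into the probability that the higher-rated model is preferred in a single comparison:

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the Elo logistic model.

    Assumption: standard Elo curve with the conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Illustrative use with the ~1325 vs ~1100 ratings quoted above.
p = expected_score(1325, 1100)
print(f"{p:.3f}")  # prints 0.785
```

Under these assumptions, a ~225-point gap means the higher-rated model is expected to win roughly 78.5% of head-to-head votes.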
When you would reach for Win Rate
Win Rate is the tool to reach for when you want a direct head-to-head comparison between two specific models, or between a model and a fixed reference model.
Llama 3 70B Instruct vs GPT-3.5: ~60% win rate on AlpacaEval.
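Computing a win rate is just counting. The sketch below assumes each judgment is labeled "A", "B", or "tie" (hypothetical labels) and counts a tie as half a win. That is a common convention, but not necessarily the one a given benchmark such as AlpacaEval uses:

```python
from typing import Iterable


def win_rate(outcomes: Iterable[str]) -> float:
    """Share of pairwise judgments won by model A.

    Each outcome is "A", "B", or "tie" (hypothetical labels).
    Assumption: a tie counts as half a win for A.
    """
    wins = 0.0
    total = 0
    for outcome in outcomes:
        if outcome == "A":
            wins += 1.0
        elif outcome == "tie":
            wins += 0.5
        elif outcome != "B":
            raise ValueError(f"unexpected outcome: {outcome!r}")
        total += 1
    if total == 0:
        raise ValueError("no comparisons to score")
    return wins / total


judgments = ["A", "A", "B", "tie", "A"]
print(win_rate(judgments))  # prints 0.7
```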
Frequently asked
What is the difference between ELO Rating and Win Rate?
ELO Rating: ELO is a rating system, originally from chess and named after Arpad Elo, that converts the results of pairwise games into a single skill number per player. Chatbot Arena uses it to rank LLMs from anonymous user votes.
Win Rate: win rate is the share of pairwise comparisons one candidate wins against another. It is the standard scalar for "model A is better than model B" in modern LLM evaluation.
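To make "converts pairwise wins into a single skill number" concrete, here is a minimal online Elo update in Python. The starting rating of 1000 and the K-factor of 32 are illustrative assumptions, not Chatbot Arena's settings, and the function name `elo_ratings` is hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that the first player wins under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(
    battles: Iterable[Tuple[str, str, float]],
    k: float = 32.0,          # assumption: illustrative K-factor
    initial: float = 1000.0,  # assumption: illustrative starting rating
) -> Dict[str, float]:
    """Sequential Elo updates over (model_a, model_b, score_a) tuples.

    score_a is 1.0 if A won, 0.5 for a tie, 0.0 if B won.
    """
    ratings: Dict[str, float] = defaultdict(lambda: initial)
    for a, b, score_a in battles:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (score_a - e_a)  # bigger move for more surprising results
        ratings[a] += delta
        ratings[b] -= delta  # zero-sum: B moves by the same amount the other way
    return dict(ratings)


votes = [("model-x", "model-y", 1.0), ("model-x", "model-z", 0.5), ("model-y", "model-z", 0.0)]
print(elo_ratings(votes))
```

Because the update is sequential, the final ratings depend on the order of the votes; leaderboards often fit all votes jointly instead (for example with a Bradley-Terry model) to avoid that.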
When should I use ELO Rating vs Win Rate?
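Use ELO Rating when you need one leaderboard that places many models on a common scale, even though each pair of models may have been compared only a few times. Use Win Rate when you want a direct answer for one specific pairing, such as a candidate model against a fixed reference model.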
Are ELO Rating and Win Rate the same thing?
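No. Both are evaluation metrics built from pairwise comparisons, but win rate is a raw proportion for one specific pairing, while ELO aggregates many pairings into a per-model rating. They are related: under the Elo model, a rating gap between two models implies an expected win rate between them.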
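The two metrics are linked by the same logistic curve: under the Elo model, an expected win rate p corresponds to a rating gap of 400 · log10(p / (1 − p)). A short Python sketch of that conversion, assuming the standard 400-point scale, applied to the ~60% AlpacaEval figure quoted earlier:

```python
import math

SCALE = 400.0  # assumption: conventional Elo scale


def win_rate_from_gap(gap: float) -> float:
    """Expected win rate of the higher-rated side for a given rating gap."""
    return 1.0 / (1.0 + 10 ** (-gap / SCALE))


def gap_from_win_rate(p: float) -> float:
    """Rating gap implied by an expected win rate p (inverse of win_rate_from_gap)."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return SCALE * math.log10(p / (1.0 - p))


print(round(gap_from_win_rate(0.60), 1))  # prints 70.4
```

So a ~60% win rate corresponds to a gap of only about 70 ELO points. Treat this as a rough translation: a benchmark win rate and an arena rating come from different prompts and judges, so the numbers are not directly interchangeable.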