Comparison

MMLU vs Perplexity

MMLU and Perplexity are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.

When you would reach for MMLU

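MMLU comes up when the question is about task-level evaluation: how accurately a model answers multiple-choice knowledge and reasoning questions across many subjects.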

GPT-4: 86.4% MMLU (5-shot, original release).
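
A score like the one above is an accuracy: the share of multiple-choice questions answered correctly. The following is a minimal sketch of that calculation, not the official MMLU evaluation code. The function name `mmlu_style_accuracy`, the record layout, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def mmlu_style_accuracy(records):
    """Return (overall_accuracy, accuracy_by_subject).

    `records` is an iterable of (subject, predicted_letter, correct_letter)
    tuples. This layout is hypothetical, chosen only for the example.
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in records:
        per_subject[subject][0] += int(predicted == correct)
        per_subject[subject][1] += 1

    total = sum(n for _, n in per_subject.values())
    if total == 0:
        raise ValueError("no records to score")

    # Overall accuracy here is averaged over questions, not over subjects.
    total_correct = sum(c for c, _ in per_subject.values())
    overall = total_correct / total
    by_subject = {s: c / n for s, (c, n) in per_subject.items()}
    return overall, by_subject


# Hypothetical toy data, not real MMLU questions or model outputs.
records = [
    ("astronomy", "A", "A"),
    ("astronomy", "C", "B"),
    ("law", "D", "D"),
    ("law", "B", "B"),
]
overall, by_subject = mmlu_style_accuracy(records)
print(overall, by_subject)
```

Keeping the per-subject breakdown alongside the overall number helps show whether a model is uniformly strong or is carried by a few subjects.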

When you would reach for Perplexity

Perplexity comes up when the question is about intrinsic evaluation: how well a language model predicts the next token of held-out text.

A perplexity of 12 on WikiText is much better than a perplexity of 30, provided both are measured on the same test set with the same tokenization.
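
Perplexity is the exponential of the average negative log-likelihood per token. The sketch below computes it from a list of per-token natural-log probabilities. The function name `perplexity` and the input format are assumptions for illustration. In practice the log probabilities would come from the model being evaluated.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Computes exp(-(1/N) * sum(log p_i)) over the N tokens given.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical inputs: a model that assigns probability 1/12 to every
# token has perplexity of about 12; one that assigns 1/30 has about 30.
confident = [math.log(1 / 12)] * 5
uncertain = [math.log(1 / 30)] * 5
print(perplexity(confident), perplexity(uncertain))
```

One way to read the result: a perplexity of 12 means the model is, on average, as uncertain as if it were choosing uniformly among 12 equally likely tokens.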

Frequently asked

What is the difference between MMLU and Perplexity?

MMLU: MMLU (Massive Multitask Language Understanding) is a benchmark of ~16K multiple-choice questions across 57 subjects, ranging in difficulty from elementary to professional level. It is one of the most widely cited LLM benchmarks. Perplexity: Perplexity measures how "surprised" a language model is by held-out text, computed as the exponential of the average negative log-likelihood per token. Lower is better. It is the natural intrinsic eval for next-token prediction.

When should I use MMLU vs Perplexity?

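Use MMLU when you want to compare models on knowledge and reasoning, reported as accuracy on multiple-choice questions. Use Perplexity when you want to measure how well a model predicts held-out text, for example when comparing language models that share a tokenizer and test set.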

Are MMLU and Perplexity the same thing?

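No. Both are used for evaluation, but MMLU is a benchmark scored by answer accuracy (higher is better), while Perplexity is a metric computed from the model's token probabilities (lower is better). They are related but measure different things.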