Comparison

Data Contamination vs HumanEval

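Data Contamination and HumanEval both belong to LLM evaluation, but they are different kinds of thing: Data Contamination is a failure mode that can invalidate benchmark results, while HumanEval is a specific code-generation benchmark. Here is a quick side-by-side.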

When you would reach for Data Contamination

Data Contamination comes up when the question is whether a benchmark score can be trusted: did the model see the test items, or close copies of them, during training?

Example: MMLU questions appearing verbatim in web-crawled pretraining data.
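One common way to look for this kind of verbatim leakage is to measure n-gram overlap between a benchmark item and a training document. The following is a minimal sketch under stated assumptions, not a production decontamination pipeline: the function names, whitespace tokenization, and the choice of n = 8 are all illustrative.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase, whitespace-tokenized n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item: str, corpus_doc: str, n: int = 8) -> float:
    """Fraction of the benchmark item's n-grams that also occur in corpus_doc.

    A value near 1.0 suggests the item appears (almost) verbatim in the document.
    n = 8 is an illustrative choice, not a standard threshold.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        # Item is shorter than n tokens, so there is nothing to compare.
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)


question = "which of the following is the capital city of the country of france"
leaked_doc = "some crawl text " + question + " followed by more page content"
clean_doc = "an unrelated page about cooking pasta with tomato sauce and basil leaves"

print(overlap_fraction(question, leaked_doc))  # verbatim copy present
print(overlap_fraction(question, clean_doc))   # no shared 8-grams
```

Exact n-gram matching only catches verbatim or near-verbatim copies; paraphrased or translated leaks need other methods.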

When you would reach for HumanEval

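HumanEval comes up when the question is how well a model writes code: it gives a standard, test-based measure of whether generated Python functions are functionally correct.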

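Example: GPT-4 reported at ~88% pass@1 on HumanEval, meaning a single generated sample passes the unit tests on roughly 88% of the problems.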
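For context on the pass@1 metric: pass@k is usually computed per problem with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples were generated and c of them passed the tests, and the benchmark score is the mean over problems. A minimal sketch follows; the function name and the sample counts are illustrative, not taken from any reported run.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative per-problem results as (samples generated, samples passed).
results = [(10, 3), (10, 10), (10, 0)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(score)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that passed.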

Frequently asked

What is the difference between Data Contamination and HumanEval?

Data Contamination: benchmark questions or answers leak into a model's pretraining corpus, inflating its score because the model can recall the answers rather than work them out.

HumanEval: a benchmark of 164 hand-written Python programming problems, each with a function signature, docstring, and unit tests. The model writes the function body, which is then run against the tests.
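To make the HumanEval task format concrete, here is a small sketch of a task in the same style (signature plus docstring as the prompt, unit tests as the check) and a toy checker. The task, the completion, and the passes helper are hypothetical illustrations, not an actual HumanEval problem or the official harness.

```python
# Hypothetical prompt: function signature and docstring, as shown to the model.
PROMPT = '''def running_max(values: list[int]) -> list[int]:
    """Return a list where element i is the maximum of values[0..i]."""
'''

# Hypothetical model output: only the function body.
COMPLETION = '''    result = []
    best = None
    for v in values:
        best = v if best is None else max(best, v)
        result.append(best)
    return result
'''

# Unit tests, written as a check(candidate) function.
TEST = '''def check(candidate):
    assert candidate([]) == []
    assert candidate([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
    assert candidate([-2, -5]) == [-2, -2]
'''


def passes(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes the tests.

    Toy checker only: exec runs untrusted code in-process. A real harness
    should run completions in a sandbox with a timeout.
    """
    namespace: dict = {}
    try:
        exec(prompt + completion + "\n" + test, namespace)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        return False


print(passes(PROMPT, COMPLETION, TEST, "running_max"))
print(passes(PROMPT, "    return values\n", TEST, "running_max"))
```

A completion counts as correct only if every assertion in the tests holds, so scoring depends on behavior rather than on textual similarity to a reference solution.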

When should I use Data Contamination vs HumanEval?

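Data Contamination is the right concept when you are asking whether an evaluation result is valid, for example whether a high benchmark score reflects memorized test items. HumanEval applies when you want to measure or compare the code-generation ability of models.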

Are Data Contamination and HumanEval the same thing?

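No. Data Contamination is a problem that can affect any benchmark; HumanEval is one particular benchmark. They are related because HumanEval, like any public benchmark, can itself be contaminated if its problems or solutions end up in a model's training data.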