Comparison

Distillation vs Synthetic Data

Distillation and Synthetic Data are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.

When you would reach for Distillation

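Distillation comes up when the question is how to compress a capable model: you already have a strong teacher and want a smaller, cheaper student that keeps most of its quality.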

Example: DistilBERT, a 6-layer student of 12-layer BERT, is about 60% of the size while keeping 95%+ of the performance.
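To make the student/teacher idea concrete, here is a minimal sketch of a soft-target distillation loss in plain Python. It assumes the common formulation in which the student matches the teacher's temperature-softened output distribution through a KL divergence scaled by the squared temperature. The function names, logits, and temperature are illustrative assumptions and are not meant to reproduce DistilBERT's actual training recipe.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Illustrative logits over three classes (assumed values, not real model outputs)
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(close_student, teacher))  # small: student agrees with teacher
print(distillation_loss(far_student, teacher))    # larger: student disagrees
```

In practice this term is usually minimized by gradient descent over the student's weights, often alongside an ordinary loss on ground-truth labels.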

When you would reach for Synthetic Data

Synthetic Data comes up when the question is where the training data comes from: human-written examples are scarce or expensive, so a model generates them instead.

Example: Phi-3 was trained heavily on textbook-quality synthetic data.
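One common way to build synthetic data is to generate candidates with a model and keep only the ones that pass an automatic check. The sketch below illustrates that filter step for code, under loud assumptions: the candidate strings are hand-written stand-ins for model samples (no model is called), and `passes_tests` is a hypothetical helper, not part of any library or of the Phi-3 pipeline.

```python
def passes_tests(source, func_name, test_cases):
    """Run candidate source in a fresh namespace and check it on every test case."""
    namespace = {}
    try:
        # NOTE: a real pipeline would sandbox untrusted generated code
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures
        return False

# Hypothetical stand-ins for samples drawn from a code model
candidates = [
    "def add(a, b):\n    return a + b\n",  # correct
    "def add(a, b):\n    return a - b\n",  # wrong result
    "def add(a, b)\n    return a + b\n",   # syntax error
]
test_cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

# Only candidates that pass every test are kept as training examples
kept = [src for src in candidates if passes_tests(src, "add", test_cases)]
print(len(kept))  # prints 1
```

The surviving examples would then be added to a training set, so the quality of the data depends on how strict the tests are.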

Frequently asked

What is the difference between Distillation and Synthetic Data?

Distillation trains a smaller "student" model to imitate the outputs of a larger "teacher" model. The student becomes much cheaper to run while retaining much of the teacher's quality.

Synthetic data is training data produced by a model rather than handwritten by humans: instructions distilled from GPT-4, code generated and filtered by tests, or reasoning traces sampled from a stronger model.

When should I use Distillation vs Synthetic Data?

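Distillation is the right concept when you are focused on shrinking a model, that is, transferring a teacher's behavior into a cheaper student. Synthetic Data applies when you are focused on the training set itself, that is, generating examples with a model instead of collecting them from humans.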

Are Distillation and Synthetic Data the same thing?

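No. Distillation is a training technique; Synthetic Data is a kind of training data. They are related, since a teacher's outputs used to train a student are themselves model-generated data, but they address different parts of the AI stack.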