Training · intermediate

Synthetic Data

Synthetic data is training data produced by a model — instructions distilled from GPT-4, code generated and filtered by tests, reasoning traces sampled from a stronger model — rather than handwritten by humans.

Published May 30, 2026

Explanation

Modern post-training relies heavily on synthetic data. Smaller open models (Llama-3-Instruct, Phi, Gemma) are often fine-tuned on outputs from much larger teachers, sometimes filtered for correctness against verifiers.
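A minimal sketch of the "filtered for correctness against verifiers" step, assuming each teacher sample carries a reference answer to check against. The sample records, the `exact_match_verifier` function, and the field names are hypothetical and chosen only for illustration; real pipelines often use unit tests or a grader model as the verifier instead.

```python
from typing import Callable

# Hypothetical teacher outputs. In practice these would come from sampling
# a larger model; here they are hard-coded so the sketch is self-contained.
teacher_samples = [
    {"prompt": "What is 17 + 25?", "response": "42", "expected": "42"},
    {"prompt": "What is 9 * 8?", "response": "81", "expected": "72"},
    {"prompt": "What is 100 - 1?", "response": " 99 ", "expected": "99"},
]


def exact_match_verifier(sample: dict) -> bool:
    """Accept a sample only if the response equals the reference answer."""
    return sample["response"].strip() == sample["expected"].strip()


def filter_synthetic(samples: list[dict],
                     verifier: Callable[[dict], bool]) -> list[dict]:
    """Keep only the samples that the verifier accepts."""
    return [s for s in samples if verifier(s)]


kept = filter_synthetic(teacher_samples, exact_match_verifier)
for sample in kept:
    print(sample["prompt"], "->", sample["response"].strip())
```

Only the verified samples would go on to the fine-tuning set; the incorrect multiplication answer is dropped.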

The economics: a fixed budget that buys 10K human-written examples might buy millions of synthetic ones at GPT-4 inference prices. As long as filtering, ranking, and deduplication keep quality high, synthetic data wins on cost per usable example.
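The arithmetic behind that comparison can be sketched as follows. Every number here is an assumption made up for illustration (the budget, the per-example human cost, the token price, the tokens per example, and the share of samples that survive filtering); none of them come from a real price list.

```python
def examples_for_budget(budget_usd: float, cost_per_example_usd: float) -> int:
    """Return how many whole examples a budget buys at a given unit cost."""
    if cost_per_example_usd <= 0:
        raise ValueError("cost_per_example_usd must be positive")
    return int(budget_usd // cost_per_example_usd)


# All values below are hypothetical assumptions, not quoted prices.
BUDGET_USD = 100_000.0
HUMAN_COST_PER_EXAMPLE_USD = 10.0       # assumed annotator cost per example
PRICE_PER_MILLION_TOKENS_USD = 30.0     # assumed teacher inference price
TOKENS_PER_EXAMPLE = 500                # assumed average example length
KEEP_RATE = 0.3                         # assumed share surviving filtering

synthetic_cost_per_example = (
    TOKENS_PER_EXAMPLE / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD
)

human_examples = examples_for_budget(BUDGET_USD, HUMAN_COST_PER_EXAMPLE_USD)
synthetic_raw = examples_for_budget(BUDGET_USD, synthetic_cost_per_example)
usable_synthetic = int(synthetic_raw * KEEP_RATE)

print(f"human-written examples:      {human_examples:,}")
print(f"raw synthetic examples:      {synthetic_raw:,}")
print(f"synthetic after filtering:   {usable_synthetic:,}")
```

Under these assumed numbers the synthetic route still yields far more usable examples even after most samples are discarded, which is the point of the comparison. The conclusion depends on the keep rate staying well above zero.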

Risks: model collapse, where a model trained on too much of its own output loses diversity and quality over successive generations, and inherited blind spots from the teacher.
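A toy analogy for model collapse, not a simulation of an actual LLM: a Gaussian is repeatedly refit to a small sample drawn from its own previous fit. The function name, generation count, and sample size are assumptions chosen for illustration. Because each refit slightly underestimates the spread on average, the fitted standard deviation tends to shrink across generations.

```python
import random
import statistics


def simulate_collapse(generations: int = 200,
                      sample_size: int = 20,
                      seed: int = 0) -> list[float]:
    """Refit a Gaussian to its own samples and track the fitted std."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0          # generation 0: the "real" distribution
    history = [std]
    for _ in range(generations):
        # Draw training data from the current model only (no fresh real data).
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # Refit the model to its own output.
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


history = simulate_collapse()
print(f"std at generation 0:   {history[0]:.4f}")
print(f"std at generation {len(history) - 1}: {history[-1]:.6f}")
```

Mixing fresh real data into each generation is one way to slow this narrowing in the toy setting, and it mirrors the advice not to train only on your own output.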

Examples

Phi-3 trained heavily on textbook-quality synthetic data.
Tulu-3 post-training mixes synthetic instructions with human-written data.

Frequently asked

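What is Synthetic Data?

Synthetic data is training data produced by a model rather than handwritten by humans, such as distilled instructions, test-filtered generated code, or reasoning traces sampled from a stronger model.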

What is an example of synthetic data?

Phi-3 trained heavily on textbook-quality synthetic data.

How is Synthetic Data related to Distillation?

Distillation trains a smaller "student" model to imitate the outputs of a larger "teacher" model. Those teacher outputs are themselves synthetic data, so distillation is one of the most common uses of synthetic data. The student becomes much cheaper to run while retaining much of the teacher's quality.

Is Synthetic Data considered intermediate?

Synthetic Data is generally considered intermediate-level material in the AI and LLM space.

Distillation · Training

Distillation trains a smaller "student" model to imitate the outputs of a larger "teacher" model. The student becomes much cheaper to run while retaining much of the teacher's quality.

Fine-tuning · Training

Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.

Pretraining · Training

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.

Preference Data · Training

Preference data is a collection of (chosen, rejected) response pairs, each pair answering the same prompt. It is the fuel for DPO and reward-model training.
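A minimal sketch of what one such record might look like, and of how a pair could be built from several scored responses to one prompt. The `PreferencePair` class, the `build_pair` helper, and the scores are hypothetical; in practice the scores might come from human raters or a judge model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreferencePair:
    """One (chosen, rejected) pair for a single prompt."""
    prompt: str
    chosen: str
    rejected: str


def build_pair(prompt: str,
               scored_responses: list[tuple[str, float]]) -> Optional[PreferencePair]:
    """Pair the highest-scored response with the lowest-scored one.

    Returns None when all scores tie, since no preference can be inferred.
    """
    if len(scored_responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    (best, best_score), (worst, worst_score) = ranked[0], ranked[-1]
    if best_score == worst_score:
        return None
    return PreferencePair(prompt=prompt, chosen=best, rejected=worst)


# Hypothetical scored candidates for one prompt.
pair = build_pair(
    "Explain overfitting in one sentence.",
    [
        ("Overfitting is when a model memorizes training data and generalizes poorly.", 0.9),
        ("Overfitting is good.", 0.1),
        ("It is when training goes on too long.", 0.5),
    ],
)
print(pair)
```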

Reinforcement Learning from Human Feedback · Training

RLHF fine-tunes an LLM to maximize a reward model that was itself trained on human preference judgments between candidate responses.
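One common way to train such a reward model is a pairwise (Bradley-Terry style) loss, which pushes the score of the preferred response above the score of the rejected one. The definition above does not name a specific loss, so treat this as an assumed formulation; the function name and the example scores are hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) stably."""
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        # -log(sigmoid(m)) == log(1 + exp(-m)); exp(-m) cannot overflow here.
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite as -m + log(1 + exp(m)) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for a (chosen, rejected) pair.
print(pairwise_preference_loss(2.0, 0.5))   # chosen ranked higher: small loss
print(pairwise_preference_loss(0.5, 2.0))   # chosen ranked lower: larger loss
```

The loss shrinks as the reward model scores the human-preferred response further above the rejected one, which is what training on preference judgments aims for.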

Sources

Phi-3 technical report (arXiv)