Training · intermediate
Pretraining
Pretraining is the initial training phase in which an LLM learns to predict the next token from hundreds of billions to trillions of tokens of general text. It produces a base model that can be adapted later.
Explanation
Pretraining is by far the most expensive phase: weeks to months on thousands of GPUs, costing tens of millions of dollars for frontier models. The data is usually a curated mix of web text, books, code, and academic papers, deduplicated and quality-filtered.
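The deduplication and quality-filtering step can be illustrated with a minimal sketch. It assumes exact-match deduplication on normalized text and a crude word-count filter; the function names and the `min_words` threshold are hypothetical, and production pipelines typically rely on near-duplicate detection and learned quality classifiers instead.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def dedup_and_filter(docs, min_words=5):
    """Drop very short documents and exact duplicates (after normalization).

    min_words is an illustrative threshold, not a recommended value.
    """
    seen = set()
    kept = []
    for doc in docs:
        # Crude quality filter: skip documents with too few words.
        if len(doc.split()) < min_words:
            continue
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        # Exact-duplicate check on the normalized text.
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate after normalization
    "click here",                                     # too short
    "Pretraining data is deduplicated and quality-filtered before use.",
]
kept = dedup_and_filter(docs)
print(len(kept))
```

Hashing normalized text keeps memory use proportional to the number of unique documents rather than their total size, which is one reason hash-based checks are a common first pass.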
For decoder-only LLMs the objective is almost always next-token prediction; encoder models such as BERT use masked-token prediction instead. No human feedback is involved at this stage: the model simply learns the statistical structure of language from raw text at scale.
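To make the next-token objective concrete, here is a small sketch of the loss in plain Python. It assumes the model has already produced one vector of logits per position; the toy vocabulary size, token IDs, and logits are invented for illustration, and real training code would use a tensor library rather than lists.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits_per_position, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t."""
    targets = token_ids[1:]  # the target at position t is the token at t+1
    losses = []
    # zip stops at the shorter input, so the last position (no target) is ignored.
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Toy setup: vocabulary of 4 tokens, sequence of 3 tokens.
token_ids = [0, 2, 1]

# A model with no knowledge assigns equal scores to every token.
uniform_logits = [[0.0] * 4 for _ in token_ids]
loss_uniform = next_token_loss(uniform_logits, token_ids)

# A model that strongly favors the correct next token at each position.
confident_logits = [[0.0, 0.0, 5.0, 0.0], [0.0, 5.0, 0.0, 0.0], [0.0] * 4]
loss_confident = next_token_loss(confident_logits, token_ids)

print(loss_uniform, loss_confident)
```

With uniform logits the loss equals the natural log of the vocabulary size, which is the baseline a model starts near; pretraining drives the loss down by shifting probability toward the tokens that actually follow.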
The product of pretraining is the base model. It will autocomplete text fluently but is not yet tuned for following instructions or having helpful conversations — that comes in later stages.
Examples
- GPT-3 was pretrained on ~300B tokens.
- Llama 3 was pretrained on ~15T tokens.
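The token counts above can be turned into a rough compute estimate. The sketch below uses the widely cited rule of thumb that training takes about 6 × parameters × tokens floating-point operations. The 175B parameter count for GPT-3 is not stated in this entry, and the GPU count, per-GPU peak throughput, and utilization are illustrative assumptions rather than reported figures.

```python
def pretraining_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute using the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def training_days(total_flops: float, n_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days, given a cluster size and the fraction of peak achieved."""
    effective_flops_per_second = n_gpus * peak_flops_per_gpu * utilization
    seconds = total_flops / effective_flops_per_second
    return seconds / 86_400  # seconds per day


# GPT-3 scale: 175B parameters (assumed here), ~300B tokens (from the example above).
flops = pretraining_flops(175e9, 300e9)

# Hypothetical cluster: 1024 GPUs at 312 TFLOP/s peak each, 40% utilization.
days = training_days(flops, n_gpus=1024, peak_flops_per_gpu=312e12, utilization=0.4)

print(f"{flops:.2e} FLOPs, about {days:.1f} days")
```

Because the estimate scales linearly with both parameters and tokens, changing either input shows directly how model size and dataset size drive pretraining cost.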
Frequently asked
What is an example of pretraining?
GPT-3 was pretrained on ~300B tokens.
How is Pretraining related to Fine-tuning?
Pretraining and fine-tuning are consecutive stages of training. Pretraining comes first and produces the base model; fine-tuning then continues training that model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.
Is Pretraining considered intermediate?
Pretraining is generally considered intermediate-level material in the AI and LLM space.