Foundations · intermediate

Foundation Model

A foundation model is a single large model pretrained on broad data that can be adapted to many downstream tasks. Large language models (LLMs) are the most common type.

Published May 29, 2026

Explanation

The term, coined by Stanford's CRFM in 2021, describes a shift in AI economics: instead of training a separate model per task, organizations train one very large general-purpose model and then fine-tune or prompt it for everything else.

Foundation models are usually trained on internet-scale data, cost millions of dollars to train, and are reused across thousands of applications. The same Claude or GPT model that answers customer support emails also writes code, summarizes meetings, and drafts marketing copy.
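The reuse described above can be sketched in a few lines: one model callable, several prompt templates, one per task. This is a minimal illustration, not any vendor's API. `fake_model`, `TASK_TEMPLATES`, and `run_task` are hypothetical names, and `fake_model` is a stand-in that only echoes its prompt; a real application would call a hosted or local foundation model in its place.

```python
from typing import Callable

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a foundation model call; it just echoes the prompt."""
    return f"[model output for: {prompt}]"

# One prompt template per downstream task (all templates are illustrative).
TASK_TEMPLATES = {
    "support": "Reply politely to this customer email:\n{text}",
    "summarize": "Summarize this meeting transcript in three bullets:\n{text}",
    "code": "Write a Python function that does the following:\n{text}",
}

def run_task(model: Callable[[str], str], task: str, text: str) -> str:
    """Adapt the same model to a task purely by changing the prompt."""
    prompt = TASK_TEMPLATES[task].format(text=text)
    return model(prompt)

if __name__ == "__main__":
    # The same model object serves every task; only the prompt differs.
    for task in TASK_TEMPLATES:
        print(run_task(fake_model, task, "example input"))
```

The point of the design is that the model is passed in as a plain callable, so swapping in a different model does not change the task definitions.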

LLMs are the most common foundation models, but the term also covers multimodal models (vision + language), code models, and some image generators.

Examples

GPT-4 used by tens of thousands of applications via API.
Llama 3 as the foundation for many open-source fine-tunes.
CLIP as a vision-language foundation model.

Frequently asked

What is a foundation model?

A foundation model is a single large model pretrained on broad data that can be adapted to many downstream tasks. LLMs are the most common type.

What is an example of a foundation model?

One example is GPT-4, which is used by tens of thousands of applications via API.

How is a foundation model related to a large language model?

Foundation model and large language model are both concepts in the Foundations category. A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence, and LLMs are the most common type of foundation model. GPT-4, Claude, and Gemini are all LLMs.

Is the foundation model concept considered intermediate?

The foundation model concept is generally considered intermediate-level material in the AI and LLM space.

Large Language Model · Foundations

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

Pretraining · Training

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.
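To make the next-token objective concrete, here is a toy sketch that counts which word follows which in a tiny corpus and predicts the most frequent follower. It is a bigram counter, not a neural network, and it is nowhere near how an LLM is actually trained; the corpus and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny illustrative corpus; real pretraining uses trillions of tokens.
corpus = [
    "the model predicts the next token",
    "the model learns from text",
    "each token depends on context",
]
counts = train_bigram(corpus)

if __name__ == "__main__":
    print(predict_next(counts, "the"))
    print(predict_next(counts, "next"))
```

An LLM replaces the count table with a neural network and whitespace splitting with a learned tokenizer, but the training signal is the same idea: given the preceding tokens, predict the next one.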

Fine-tuning · Training

Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.
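As a rough analogy only, the sketch below "pretrains" a one-parameter model on a broad dataset and then "fine-tunes" it by continuing gradient descent from the pretrained weight on a few task-specific examples. The model `y = w * x`, the datasets, the learning rates, and the helper `sgd` are all assumptions chosen to keep the example small; real fine-tuning updates millions or billions of weights with a deep learning framework.

```python
def sgd(w, data, lr, epochs):
    """Per-example gradient descent on squared error for the model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
    return w

# "Pretraining": a larger, broad dataset, starting from scratch (w = 0).
# The broad data follows y = 2x.
broad = [(k / 10, 2.0 * k / 10) for k in range(1, 101)]
w_pre = sgd(0.0, broad, lr=0.001, epochs=50)

# "Fine-tuning": a handful of task-specific examples that follow y = 2.5x.
# Training continues from the pretrained weight instead of restarting at 0.
narrow = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]
w_ft = sgd(w_pre, narrow, lr=0.01, epochs=20)

if __name__ == "__main__":
    print(f"pretrained weight: {w_pre:.4f}")
    print(f"fine-tuned weight: {w_ft:.4f}")
```

The fine-tuned weight moves from the value learned on broad data toward the value the small dataset favors, which mirrors how fine-tuning specializes a base model without training it from nothing.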

Multimodal · Multimodal

A multimodal model processes more than one type of input — typically text plus images, sometimes adding audio, video, or 3D. GPT-4o, Claude, and Gemini are all multimodal.

Sources

Stanford CRFM — On the Opportunities and Risks of Foundation Models