Foundations · beginner

Large Language Model (LLM)

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

Published May 29, 2026

Explanation

An LLM's only real job during training is to predict the next token (roughly, a word-piece) given everything before it. Trained on trillions of tokens of internet text, code, and books, this simple objective produces models that can answer questions, write code, summarize documents, and hold conversations.
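The next-token objective can be illustrated with a deliberately tiny stand-in. The sketch below is not how an LLM is built: it only counts which token follows which (a bigram model), whereas a real LLM uses a neural network conditioned on the entire preceding context. The function names and the toy corpus are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}

def predict_next(counts, prev):
    """Pick the most likely next token (greedy decoding).

    Raises ValueError if `prev` was never seen during training.
    """
    probs = next_token_probs(counts, prev)
    return max(probs, key=probs.get)

# Toy corpus, split on whitespace instead of using a real tokenizer.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(next_token_probs(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "the"))      # cat
```

Even in this toy, "training" means nothing more than learning which continuations are likely; the capabilities described above come from applying the same objective at vastly larger scale with a far more expressive model.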

"Large" in 2026 typically means tens to hundreds of billions of parameters, though smaller models (1-10B) have become surprisingly capable thanks to better training techniques and curated data.
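To make these parameter counts concrete, the following back-of-the-envelope sketch estimates how much memory the weights alone occupy at common numeric precisions. It assumes every parameter is stored at the same precision and ignores activations, caches, and optimizer state, so real memory use is higher. The helper name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, dtype="fp16"):
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for size in (1e9, 7e9, 70e9):
    billions = int(size / 1e9)
    print(f"{billions}B parameters in fp16: about {weight_memory_gb(size):.0f} GB")
```

By this estimate, a 7-billion-parameter model needs about 14 GB for its weights in 16-bit precision, while a 70-billion-parameter model needs about 140 GB, which is one reason smaller models are attractive.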

After pretraining on raw text, most production LLMs go through instruction tuning and reinforcement learning from human feedback (RLHF) to make them helpful and harmless in dialogue settings.

Examples

Claude Sonnet — Anthropic's general-purpose LLM.
GPT-4o — OpenAI's multimodal LLM.
Llama 3 — Meta's open-weights LLM family.

Frequently asked

What is a large language model?

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

What is an example of a large language model?

Claude Sonnet — Anthropic's general-purpose LLM.

How is a large language model related to the transformer?

The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel.

Is the large language model a beginner-level topic?

Yes. The large language model is generally considered beginner-level material in the AI and LLM space.

Transformer · Architecture

The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel.
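Self-attention can be sketched in a few lines of plain Python. This is a simplified illustration, not a full transformer layer: it uses the input vectors directly as queries, keys, and values (real models apply learned projections first), and it omits multiple heads and masking. The helper names are assumptions.

```python
import math

def softmax(xs):
    """Convert scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence."""
    d_k = len(keys[0])
    outputs = []
    # Each position is independent of the others, which is why real
    # implementations compute all of them at once as matrix products.
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is a weighted average of every position's value vector.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three positions, each represented by a 2-dimensional vector.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(x, x, x):
    print([round(v, 3) for v in row])
```

Every output row mixes information from all positions in the sequence, with the mixing weights determined by how strongly each pair of positions matches.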

Pretraining · Training

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.
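Learning to predict the next token is usually framed as minimizing cross-entropy: the loss at each position is the negative log of the probability the model assigned to the token that actually came next. The sketch below shows only that arithmetic, using hand-written probabilities in place of a model's output; the function names are hypothetical.

```python
import math

def next_token_loss(predicted_probs, target_token):
    """Cross-entropy for one position: -log p(actual next token)."""
    return -math.log(predicted_probs[target_token])

def average_loss(examples):
    """Mean loss over (predicted distribution, actual next token) pairs."""
    return sum(next_token_loss(p, t) for p, t in examples) / len(examples)

confident = {"mat": 0.9, "dog": 0.1}
unsure = {"mat": 0.5, "dog": 0.5}

print(round(next_token_loss(confident, "mat"), 3))  # low loss: good prediction
print(round(next_token_loss(unsure, "mat"), 3))     # higher loss: a coin flip
print(round(next_token_loss(confident, "dog"), 3))  # highest: confidently wrong
```

Pretraining adjusts the model's weights to push this average loss down across the whole corpus.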

Token · Inference

A token is the basic unit an LLM reads and writes, usually a word piece that averages roughly 3-4 characters of English text. API pricing and context-window limits are counted in tokens, not words.
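To show what splitting text into word pieces looks like, here is a minimal greedy longest-match tokenizer. It is a sketch under strong assumptions: the vocabulary `VOCAB` is made up for this example, whereas real tokenizers learn theirs from data (for example with byte-pair encoding) and contain tens of thousands of entries.

```python
def tokenize(text, vocab):
    """Split text into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matches: emit the single character.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical toy vocabulary, not taken from any real model.
VOCAB = {"token", "ization", "un", "believ", "able", " "}

print(tokenize("tokenization", VOCAB))   # ['token', 'ization']
print(tokenize("unbelievable", VOCAB))   # ['un', 'believ', 'able']
```

One word can become several tokens, which is why token counts typically run higher than word counts.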

Reinforcement Learning from Human Feedback · Training

RLHF fine-tunes an LLM to maximize a reward model that was itself trained on human preference judgments between candidate responses.
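The reward model's training signal can be sketched with a pairwise preference loss. One common formulation, assumed here rather than taken from this entry, scores both candidate responses and penalizes the model when the response humans preferred does not receive the higher score. The reward values below are made-up numbers standing in for a reward model's outputs.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Written in a numerically stable form so large gaps do not overflow.
    """
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Reward model agrees with the human preference: small loss.
print(round(preference_loss(2.0, 0.0), 3))
# Reward model cannot tell the responses apart: loss of log(2).
print(round(preference_loss(0.0, 0.0), 3))
# Reward model prefers the response humans rejected: large loss.
print(round(preference_loss(0.0, 2.0), 3))
```

Once the reward model is trained this way, the LLM itself is fine-tuned to produce responses that the reward model scores highly.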

Fine-tuning · Training

Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.
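The idea of continuing training from existing weights can be shown with a one-parameter toy. In this sketch the "model" is just y = w * x, the "pretrained" weight is an assumed starting value, and fine-tuning is a few steps of gradient descent on a small task dataset. Real fine-tuning updates millions or billions of weights, but follows the same pattern of starting from learned values rather than from scratch.

```python
def sgd_steps(w, data, lr=0.05, steps=100):
    """Gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Assumed starting point: pretend pretraining on broad data gave w = 2.0.
pretrained_w = 2.0

# Small task-specific dataset in which y = 3 * x.
task_data = [(1.0, 3.0), (2.0, 6.0)]

finetuned_w = sgd_steps(pretrained_w, task_data)
print(round(finetuned_w, 4))  # the weight has moved from 2.0 toward 3.0
```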

Foundation Model · Foundations

A foundation model is a single large model pretrained on broad data that can be adapted to many downstream tasks. LLMs are the most common type.

Sources

Wikipedia — Large language model