Foundations · beginner
Large Language Model (LLM)
A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.
Explanation
An LLM's only real job during training is to predict the next token (roughly, a word-piece) given everything before it. Applied to trillions of tokens of internet text, code, and books, this simple objective produces models that can answer questions, write code, summarize documents, and hold conversations.
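To make the next-token objective concrete, here is a minimal sketch that uses a bigram count table in place of a neural network. It is a toy stand-in, not how production LLMs are built: a real model conditions on the whole preceding context and learns its probabilities by gradient descent. The function names (`train_bigram`, `next_token_probs`, `generate`) and the tiny corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn the raw follower counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def generate(counts, start, max_new_tokens):
    """Greedy decoding: repeatedly append the most likely next token."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:  # no known continuation, so stop early
            break
        out.append(followers.most_common(1)[0][0])
    return out


# Toy corpus, split on whitespace (real tokenizers use word-pieces).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
continuation = generate(model, "on", 2)
```

In this corpus "the" is followed by "cat" twice and "mat" once, so `probs` assigns "cat" two thirds of the probability mass. An LLM does the same kind of thing at a vastly larger scale, with a learned function instead of a lookup table.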
"Large" in 2026 typically means tens to hundreds of billions of parameters, though smaller models (1 to 10 billion parameters) have become surprisingly capable thanks to better training techniques and curated data.
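One way to get a feel for these parameter counts is to estimate how much memory the weights alone occupy. The sketch below assumes 2 bytes per parameter (a common 16-bit format) and counts only the weights, ignoring activations and any other runtime overhead. The helper name `weight_memory_gb` and the model sizes are illustrative rather than taken from any specific model.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    """
    return num_params * bytes_per_param / 1e9


small = weight_memory_gb(7_000_000_000)        # 7B parameters, 16-bit
large = weight_memory_gb(70_000_000_000)       # 70B parameters, 16-bit
small_4bit = weight_memory_gb(7_000_000_000, bytes_per_param=0.5)  # 4-bit

print(f"7B  @ 16-bit: {small:.1f} GB")
print(f"70B @ 16-bit: {large:.1f} GB")
print(f"7B  @ 4-bit:  {small_4bit:.1f} GB")
```

Under these assumptions a 7B model needs about 14 GB for its weights and a 70B model about 140 GB, which is one reason the smaller models are attractive when they are capable enough.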
After pretraining on raw text, most production LLMs go through instruction tuning and reinforcement learning from human feedback (RLHF) to make them helpful and harmless in dialogue settings.
Examples
- Claude Sonnet — Anthropic's general-purpose LLM.
- GPT-4o — OpenAI's multimodal LLM.
- Llama 3 — Meta's open-weights LLM family.
Frequently asked
What is a large language model?
A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.
What is an example of a large language model?
Claude Sonnet — Anthropic's general-purpose LLM.
How is a large language model related to the transformer?
Large language models and transformers are both foundational concepts. The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel.
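As a rough illustration of self-attention, the sketch below computes single-head scaled dot-product attention in plain Python. It makes several simplifying assumptions: the token vectors serve directly as queries, keys, and values (a real transformer applies learned projection matrices first), there is no causal mask, and there is only one head. The function names and the three toy vectors are invented for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head attention over a list of equal-length token vectors.

    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    d = len(x[0])
    weights = []
    for q in x:
        # Score this position against every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    # Each output is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Every row of `weights` sums to 1, and each position's scores depend only on the input vectors, not on the other rows. That independence is why all positions can be processed in parallel.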
Is the large language model considered a beginner topic?
The large language model is generally considered beginner-level material in the AI and LLM space.