Inference · beginner
Long-Context Model
A long-context model accepts very long inputs — 100K+ tokens, in some cases millions. Claude (200K), GPT-4o (128K), and Gemini 1.5 Pro (1M+) are current examples.
Explanation
Until about 2022, context windows of 2K-8K tokens were standard. Techniques such as RoPE scaling, FlashAttention, KV-cache optimizations, and sparse attention variants made it practical to offer context windows that fit whole books, code repositories, or hours of meeting transcripts.
Longer contexts let you skip retrieval ("just paste the docs"), reason over a whole codebase, and run agents with deep histories. Input cost scales roughly linearly with prompt length: a 1M-token prompt is billed as 1M input tokens, and latency grows with prompt length too.
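As a rough illustration of linear input pricing, the sketch below estimates prompt cost from a token count. The function name and the price of 3.00 USD per million input tokens are assumptions chosen to make the arithmetic concrete, not any provider's actual rate.

```python
def estimate_input_cost(prompt_tokens: int, usd_per_million_tokens: float) -> float:
    """Return the input cost in USD, assuming a flat per-token price."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return prompt_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical price: 3.00 USD per million input tokens.
for tokens in (8_000, 200_000, 1_000_000):
    cost = estimate_input_cost(tokens, 3.00)
    print(f"{tokens:>9,} tokens -> ${cost:.2f}")
```

Real pricing may not be flat (a provider may, for example, charge a different rate for very long prompts), so treat this as a first approximation and check the current price sheet.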
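Quality can also degrade as the context grows. In the "lost in the middle" effect, models use information placed in the middle of a long prompt less reliably than information near the start or end. Benchmarks like needle-in-a-haystack and RULER measure how reliably a model uses its full window.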
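A needle-in-a-haystack test can be sketched as follows: hide one fact (the "needle") at different depths in filler text, then ask the model to retrieve it. This is a minimal sketch, not the implementation of any particular benchmark. The helper name, the filler text, and the needle are all hypothetical, sizes are in characters rather than tokens for simplicity, and the model call and scoring are left out.

```python
def build_haystack_prompt(filler: str, needle: str, depth: float, total_chars: int) -> str:
    """Place `needle` at a relative `depth` (0.0 = start, 1.0 = end) in repeated filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    if not filler:
        raise ValueError("filler must be non-empty")
    # Repeat the filler until it is long enough, then trim to the target size.
    repeats = total_chars // len(filler) + 1
    haystack = (filler * repeats)[:total_chars]
    cut = int(len(haystack) * depth)
    return haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]


FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret code is 7421."  # hypothetical fact the model must retrieve

# One prompt per insertion depth; each would be sent with the same question.
prompts = {
    depth: build_haystack_prompt(FILLER, NEEDLE, depth, 2_000)
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
}
for depth, prompt in prompts.items():
    print(depth, prompt.index(NEEDLE))
```

Plotting retrieval accuracy against depth and total length is what reveals a "lost in the middle" pattern, if one exists for the model under test.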
Examples
- Claude Sonnet: 200K-token context — about 500 pages.
- Gemini 1.5 Pro: 1M-2M tokens, enough for an hour or more of video sampled as frames.
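The "about 500 pages" figure can be checked with back-of-the-envelope arithmetic. The two ratios below (0.75 words per token, 300 words per page) are assumed rules of thumb for English prose; actual values depend on the tokenizer and the page layout.

```python
WORDS_PER_TOKEN = 0.75  # assumed average for English text
WORDS_PER_PAGE = 300    # assumed page density


def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


print(tokens_to_pages(200_000))    # 200K-token window
print(tokens_to_pages(1_000_000))  # 1M-token window
```

Under these assumptions a 200K-token window holds about 500 pages, and a 1M-token window about 2,500.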
When to use a long-context model
When the inputs genuinely need to fit together and chunking + retrieval would lose context.
Frequently asked
What is a long-context model?
A long-context model accepts very long inputs — 100K+ tokens, in some cases millions. Claude (200K), GPT-4o (128K), and Gemini 1.5 Pro (1M+) are current examples.
What is an example of a long-context model?
Claude Sonnet: 200K-token context — about 500 pages.
How is a long-context model related to the context window?
The context window is the maximum number of tokens an LLM can consider in a single call, counting the prompt and the generated output together. A long-context model is one whose context window is unusually large, typically 100K tokens or more.
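Because the prompt and the generated output share one window, a quick check before sending a request is to add the two together. This is a minimal sketch with a hypothetical function name, and the token counts are illustrative; in practice the prompt count would come from the provider's tokenizer.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


# Illustrative numbers: a 200K-token window with 4K tokens reserved for output.
print(fits_in_context(190_000, 4_000, 200_000))
print(fits_in_context(198_000, 4_000, 200_000))
```

The first call fits (194K of 200K); the second does not, because the reserved output pushes the total to 202K.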
When should I use a long-context model?
When the inputs genuinely need to fit together and chunking + retrieval would lose context.
Is the long-context model a beginner topic?
Yes. Long-context models are generally considered beginner-level material in the AI and LLM space.