Inference · beginner
Context Window (context length)
The context window is the maximum number of tokens an LLM can consider in a single call — prompt plus generated output combined.
Explanation
Early LLMs had context windows of 2K-4K tokens. Modern frontier models offer 128K (GPT-4o), 200K (Claude Sonnet), and up to 1M+ (Gemini 1.5 Pro, Claude with extended context). Larger windows let you fit whole books, large codebases, or long conversation histories into a single call.
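Because prompt and output share one window, a request fits only if the prompt tokens plus the tokens reserved for generation stay within the limit. A minimal sketch of that arithmetic, assuming token counts are already known; the function names and the numbers are illustrative and do not belong to any provider's API.

```python
def remaining_output_tokens(context_window: int, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt is counted (never negative)."""
    if context_window <= 0 or prompt_tokens < 0:
        raise ValueError("context_window must be positive and prompt_tokens non-negative")
    return max(context_window - prompt_tokens, 0)


def fits_in_window(context_window: int, prompt_tokens: int, output_tokens: int) -> bool:
    """True if prompt and requested output together stay within the window."""
    return prompt_tokens + output_tokens <= context_window


# Illustrative numbers: a 128K window with a 120K-token prompt.
budget = remaining_output_tokens(128_000, 120_000)
```

With these numbers, 8,000 tokens remain for the output; asking for more than that would overflow the window.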
Two practical caveats: longer context costs more (input is billed per token, so cost grows linearly with input length) and can degrade quality (the "lost in the middle" effect: models often pay less attention to information placed in the middle of a long input than to information near its start or end).
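The linear cost relationship can be sketched as a one-line calculation. The per-million-token price used here is a made-up placeholder, not any provider's actual rate, and the function name is hypothetical.

```python
def input_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost under simple per-token pricing: cost scales linearly with tokens."""
    if input_tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("token count and price must be non-negative")
    return input_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical price of 2.00 USD per million input tokens.
short_prompt = input_cost_usd(100_000, 2.0)
long_prompt = input_cost_usd(200_000, 2.0)  # twice the tokens, twice the cost
```

Under this pricing model, doubling the input doubles the bill, which is why trimming irrelevant context pays off directly.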
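Techniques for working with longer contexts include RoPE scaling (adapting rotary position embeddings to longer sequences), FlashAttention (a memory-efficient exact attention implementation that makes long sequences practical to compute), and retrieval-augmented setups, which sidestep the limit by fetching only the relevant chunks.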
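The retrieval-augmented idea of sending only the relevant chunks can be sketched as a greedy selection under a token budget. This is a simplified illustration, not how any particular retrieval library works: the `Chunk` type, the relevance scores, and the token counts are all assumed to be supplied by some upstream retriever and tokenizer.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    text: str
    score: float  # relevance, higher is better (assumed to come from a retriever)
    tokens: int   # token count (assumed precomputed by a tokenizer)


def select_chunks(chunks: list[Chunk], token_budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit in the budget."""
    selected: list[Chunk] = []
    remaining = token_budget
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.tokens <= remaining:
            selected.append(chunk)
            remaining -= chunk.tokens
    return selected


candidates = [
    Chunk("intro section", 0.9, 600),
    Chunk("api reference", 0.8, 500),
    Chunk("changelog", 0.5, 300),
]
picked = select_chunks(candidates, token_budget=1_000)
```

Here the 600-token chunk is taken first, the 500-token chunk no longer fits, and the 300-token chunk fills part of the remaining space. Greedy selection is simple but not optimal; it is shown only to make the budget constraint concrete.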
Examples
- GPT-4o: 128K context.
- Claude Sonnet: 200K context.
- Gemini 1.5 Pro: 1M+ context.
Frequently asked
How is Context Window related to Token?
A context window is measured in tokens, and both are inference concepts. A token is the basic unit an LLM reads and writes, usually a word piece averaging roughly 3-4 characters of English text. LLMs are priced and sized by tokens, not words.
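For a rough sense of how much text fits in a window, the characters-per-token rule of thumb can be turned into a quick estimate. This is only a heuristic sketch: real tokenizers differ by model and language, the default of 4 characters per token is an assumption, and the function name is hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; not a substitute for a real tokenizer."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token)


approx = estimate_tokens("a" * 400)  # 400 characters at about 4 characters per token
```

For exact counts, use the tokenizer that matches the model being called.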
Is Context Window considered beginner?
Context Window is generally considered beginner-level material in the AI and LLM space.