Architecture · advanced
Mamba (state-space model, SSM)
Mamba is a state-space model architecture that replaces transformer attention with selective state updates. It scales linearly with sequence length and matches transformer quality on many tasks.
Explanation
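Standard attention costs O(n²) compute in sequence length n. Mamba instead processes a sequence as a recurrence over a fixed-size state whose update parameters are learned and input-dependent (the "selective" part). This gives O(n) compute over the sequence and, at inference, constant memory per generated token, because there is no key-value cache that grows with the context. That makes it appealing for very long contexts and edge deployment.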
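The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference Mamba implementation: the function name `selective_scan`, the projection matrices `W_B`, `W_C`, `W_dt`, and the simplified discretization are assumptions made for this example, and real implementations use a fused, hardware-aware parallel scan rather than a Python loop.

```python
import numpy as np

def selective_scan(x, A, W_B, W_C, W_dt, h0=None):
    """Simplified selective SSM over x of shape (seq_len, d_model).

    A:    (d_model, d_state) negative diagonal state matrix, one row per channel
    W_B:  (d_model, d_state) maps the current token to the input matrix B_t
    W_C:  (d_model, d_state) maps the current token to the output matrix C_t
    W_dt: (d_model, d_model) maps the current token to a per-channel step size
    All parameter names are hypothetical, chosen for this sketch.
    """
    seq_len, d_model = x.shape
    d_state = A.shape[1]
    # The state has a fixed size that does not depend on seq_len.
    h = np.zeros((d_model, d_state)) if h0 is None else h0.copy()
    y = np.empty_like(x)
    for t in range(seq_len):
        # "Selective": step size, B and C are all computed from the current input.
        dt = np.logaddexp(0.0, x[t] @ W_dt)      # softplus, shape (d_model,)
        B_t = x[t] @ W_B                         # shape (d_state,)
        C_t = x[t] @ W_C                         # shape (d_state,)
        A_bar = np.exp(dt[:, None] * A)          # decay factor in (0, 1)
        B_bar = dt[:, None] * B_t[None, :]       # simplified discretization of B
        h = A_bar * h + B_bar * x[t][:, None]    # one O(1) state update per token
        y[t] = h @ C_t                           # read out, shape (d_model,)
    return y, h

rng = np.random.default_rng(0)
seq_len, d_model, d_state = 16, 4, 8
x = rng.normal(size=(seq_len, d_model))
A = -np.exp(rng.normal(size=(d_model, d_state)))  # negative entries give a decaying state
W_B = 0.1 * rng.normal(size=(d_model, d_state))
W_C = 0.1 * rng.normal(size=(d_model, d_state))
W_dt = 0.1 * rng.normal(size=(d_model, d_model))

y, h = selective_scan(x, A, W_B, W_C, W_dt)
print(y.shape, h.shape)  # (16, 4) (4, 8)
```

Because the whole history is summarized in `h`, a long sequence can be processed in chunks by passing the final state of one chunk as `h0` for the next, which is the property that keeps inference memory constant.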
Mamba and its descendants, such as Mamba-2 and Jamba (a hybrid Mamba+attention model), are among the most prominent post-transformer architectures with real production traction. Most frontier labs still ship transformers, but hybrid designs are increasingly common.
The bet: as context windows stretch to millions of tokens, sub-quadratic sequence modeling becomes essential.
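A rough back-of-the-envelope comparison shows why the gap matters at long context. The sketch below only counts token-pair interactions for full attention against per-token state updates for a recurrence. It deliberately ignores model width, constant factors, and hardware effects, and the helper names are hypothetical.

```python
def attention_pairs(n: int) -> int:
    """Token-pair interactions scored by full self-attention over n tokens."""
    return n * n

def recurrent_updates(n: int) -> int:
    """State updates performed by a recurrence over n tokens."""
    return n

for n in (1_000, 100_000, 1_000_000):
    ratio = attention_pairs(n) // recurrent_updates(n)
    print(f"n={n:>9,}  attention={attention_pairs(n):.1e}  "
          f"recurrence={recurrent_updates(n):.1e}  ratio={ratio:,}x")
```

The ratio between the two counts is n itself, so going from a thousand tokens to a million makes the quadratic term a thousand times more expensive relative to the linear one.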
Examples
- Mamba-2 reaching transformer-equivalent quality at the 1B-2B scale.
- Jamba (AI21) interleaving Mamba layers with a smaller number of attention layers and mixture-of-experts (MoE) layers.
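A hybrid stack like the Jamba example can be pictured as a layer schedule in which most layers are Mamba blocks and an occasional layer is attention. The sketch below is illustrative only: the function name, the default of one attention layer per eight layers, and its position within each group are assumptions for this example, not a description of any released model's exact configuration.

```python
def hybrid_schedule(n_layers: int, attn_every: int = 8, attn_offset: int = 4) -> list[str]:
    """Return one layer kind per layer, with one attention layer in every
    group of `attn_every` layers (hypothetical defaults)."""
    return [
        "attention" if i % attn_every == attn_offset else "mamba"
        for i in range(n_layers)
    ]

layers = hybrid_schedule(16)
print(layers.count("mamba"), layers.count("attention"))  # 14 2
```

Keeping attention layers rare limits the quadratic cost and the key-value cache to a small fraction of the stack while the Mamba layers handle most of the sequence mixing.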
Frequently asked
What is Mamba?
Mamba is a state-space model architecture that replaces transformer attention with selective state updates. It scales linearly with sequence length and matches transformer quality on many tasks.
What is an example of Mamba?
Mamba-2 reaching transformer-equivalent quality at the 1B-2B scale.
How is Mamba related to Transformer?
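Mamba and the transformer are both sequence-modeling architectures. The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel, which costs O(n²) in sequence length. Mamba replaces self-attention with a selective state-space recurrence that scales linearly, and hybrid models such as Jamba combine the two.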
Is Mamba considered advanced?
Mamba is generally considered advanced-level material in the AI and LLM space.