Architecture · advanced
Encoder-Decoder (seq2seq)
An encoder-decoder model has a separate encoder that reads the input and a decoder that generates the output, with cross-attention linking them. T5 and the original transformer are encoder-decoders.
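The division of labor can be sketched as a greedy decoding loop. The encoder runs once over the whole input, and the decoder then runs once per output token, consulting the encoder's output each time. This is a toy sketch under loose assumptions. `TOY_LEXICON`, `encode`, `decoder_step` and the `"</s>"` end marker are hypothetical stand-ins for learned networks, and the lookup "memory" replaces real vectors. Only the control flow is meant to match a real encoder-decoder.

```python
from typing import Callable, List

EOS = "</s>"  # hypothetical end-of-sequence marker

# Hypothetical toy lexicon standing in for learned weights.
TOY_LEXICON = {"the": "die", "cat": "Katze", "sleeps": "schläft"}


def encode(source_tokens: List[str]) -> List[str]:
    """Toy encoder: sees the whole input at once and returns one
    'representation' per source token (here, just a looked-up word)."""
    return [TOY_LEXICON.get(tok.lower(), tok) for tok in source_tokens]


def decoder_step(memory: List[str], generated: List[str]) -> str:
    """Toy decoder step: reads the encoder memory (the job cross-attention
    does in a real model) and the tokens generated so far, then emits one token."""
    position = len(generated)
    return memory[position] if position < len(memory) else EOS


def greedy_decode(
    source_tokens: List[str],
    encoder: Callable[[List[str]], List[str]],
    step: Callable[[List[str], List[str]], str],
    max_len: int = 50,
) -> List[str]:
    memory = encoder(source_tokens)  # the encoder runs once over the full input
    generated: List[str] = []
    for _ in range(max_len):
        token = step(memory, generated)  # the decoder runs once per output token
        if token == EOS:
            break
        generated.append(token)
    return generated


print(greedy_decode(["The", "cat", "sleeps"], encode, decoder_step))
# ['die', 'Katze', 'schläft']
```

In a real model `decoder_step` would return a probability distribution over the vocabulary, and greedy decoding would pick its most likely token. Beam search or sampling could replace that choice without changing the encode-once, decode-many structure.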
Explanation
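This architecture maps cleanly onto translation: encode the source sentence, then decode in the target language. Cross-attention in the decoder attends to the encoder's representation of the input. The queries come from the decoder's own states, while the keys and values come from the encoder's output.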
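Cross-attention can be sketched as single-head scaled dot-product attention, softmax(QKᵀ/√d)·V, in which Q comes from the decoder and K and V come from the encoder. The sketch makes simplifying assumptions. It uses identity projections in place of the learned query, key and value matrices, a single head, no batching, and hand-picked vectors. The function names are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states: Matrix, encoder_states: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention with identity projections:
    queries come from the decoder, keys and values from the encoder."""
    d = len(encoder_states[0])
    scale = 1.0 / math.sqrt(d)
    outputs: Matrix = []
    for query in decoder_states:  # one query per target position
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in encoder_states]
        weights = softmax(scores)  # attention distribution over source positions
        mixed = [
            sum(w * value[j] for w, value in zip(weights, encoder_states))
            for j in range(d)
        ]
        outputs.append(mixed)
    return outputs


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source tokens, d = 2
decoder_states = [[0.0, 0.0], [4.0, 0.0]]              # 2 target positions
print(cross_attention(decoder_states, encoder_states))
```

The output has one row per target position, not per source token. Each row is a weighted average of the encoder vectors. The zero query scores all source tokens equally, so its row is their plain mean. The second query leans toward the source tokens whose first coordinate is 1.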
Encoder-decoder models are often a strong fit for tasks with a clear input-to-output structure (translation, summarization), because the encoder reads the entire input bidirectionally before the decoder produces any output. Decoder-only models have nonetheless largely taken over, since a single stack is simpler and benefits more from massive pretraining scale.
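One structural difference behind this trade-off is which positions each token may attend to. The sketch below builds boolean attention masks for both designs. It assumes a plain causal decoder-only model; prefix-LM variants relax this for the prompt. The helper names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True when position i may attend to position j


def full_mask(n: int) -> Mask:
    """Encoder self-attention: every source token sees every source token."""
    return [[True] * n for _ in range(n)]


def causal_mask(n: int) -> Mask:
    """Decoder self-attention: position i sees only positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_mask(n_target: int, n_source: int) -> Mask:
    """Cross-attention: every target position sees every source position."""
    return [[True] * n_source for _ in range(n_target)]


src_len, tgt_len = 3, 2

# Encoder-decoder: three separate attention patterns.
enc_self = full_mask(src_len)
dec_self = causal_mask(tgt_len)
dec_cross = cross_mask(tgt_len, src_len)

# Plain decoder-only: one causal mask over prompt + output concatenated.
dec_only = causal_mask(src_len + tgt_len)

# The first input token can look ahead to the last one only in the encoder.
print(enc_self[0][src_len - 1], dec_only[0][src_len - 1])  # True False
```

The decoder-only model needs a single mask and a single stack. The encoder-decoder model needs three patterns spread across two stacks. That asymmetry is one way to read the simplicity argument above.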
Examples
- T5: every NLP task framed as text-to-text.
- Original transformer: machine translation.
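T5's text-to-text framing can be sketched as a preprocessing step. Every example, including classification, becomes an (input string, target string) pair, with a task prefix telling the model what to do. The prefix strings below are modeled on examples from the T5 paper. The task keys and the `to_text_to_text` helper are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical task names; the prefix strings are modeled on the T5 paper's examples.
TASK_PREFIXES: Dict[str, str] = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task: str, text: str, target: str) -> Tuple[str, str]:
    """Turn any task example into an (input string, target string) pair."""
    return TASK_PREFIXES[task] + text, target


examples = [
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    # A classification label becomes a target string, too.
    to_text_to_text("cola", "The course is jumping well.", "not acceptable"),
]
for source, target in examples:
    print(f"{source!r} -> {target!r}")
```

Because every task reduces to string in, string out, one encoder-decoder model and one training loss can cover all of them. The encoder reads the prefixed input, and the decoder generates the target string.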
Frequently asked
What is an example of encoder-decoder?
T5, which frames every NLP task as text-to-text. The original Transformer, built for machine translation, is another example.
How is Encoder-Decoder related to Encoder?
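An encoder is a transformer module that reads an input sequence and produces a contextualized representation: one vector per token that captures its meaning in context. An encoder-decoder model uses such an encoder as its first half and adds a decoder that cross-attends to the encoder's output. Encoder-only models such as BERT keep just the encoder.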
Is Encoder-Decoder considered advanced?
Encoder-Decoder is generally considered advanced-level material in the AI and LLM space.