ModelTerms

Architecture · advanced

Encoder-Decoder (seq2seq)

An encoder-decoder model has a separate encoder that reads the input and a decoder that generates the output, with cross-attention linking them. T5 and the original transformer are encoder-decoders.
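
The split between the two halves shows up in how such a model generates text. Below is a minimal sketch that assumes hypothetical `encode` and `decode_step` callables standing in for the real transformer stacks, and uses greedy decoding for simplicity. The encoder runs once over the whole input. The decoder is then called repeatedly, each time on its own output so far plus the encoder's output (`memory`).

```python
from typing import Callable, List, Sequence


def generate(
    src: Sequence[str],
    encode: Callable[[Sequence[str]], list],
    decode_step: Callable[[List[str], list], str],
    bos: str = "<s>",
    eos: str = "</s>",
    max_len: int = 50,
) -> List[str]:
    """Greedy encoder-decoder generation loop (toy sketch, not a real model)."""
    memory = encode(src)                      # encoder reads the full input once
    prefix = [bos]                            # decoder starts from a start token
    for _ in range(max_len):
        token = decode_step(prefix, memory)   # decoder: own prefix + encoder memory
        if token == eos:
            break
        prefix.append(token)
    return prefix[1:]                         # drop the start token


# Hypothetical stand-ins: the "encoder" maps each source word through a tiny
# lexicon, and the "decoder" emits the next memory entry, or EOS when done.
LEXICON = {"the": "das", "house": "Haus", "is": "ist", "small": "klein"}


def toy_encode(src: Sequence[str]) -> list:
    return [LEXICON.get(w, w) for w in src]


def toy_decode_step(prefix: List[str], memory: list) -> str:
    i = len(prefix) - 1                       # tokens generated so far
    return memory[i] if i < len(memory) else "</s>"


print(generate("the house is small".split(), toy_encode, toy_decode_step))
```

The toy lexicon only exercises the control flow. In a real model, `decode_step` would run the decoder stack, including cross-attention over `memory`, and choose the highest-scoring next token.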

Explanation

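This architecture maps cleanly onto translation: the encoder reads the whole source sentence, and the decoder then generates the target sentence one token at a time. In each decoder layer, cross-attention takes its queries from the decoder's states and its keys and values from the encoder's output. As a result, every generated token can draw on any part of the input.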
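
The following is a minimal sketch of single-head cross-attention. It drops the learned query, key, and value projections and the multiple heads of a real transformer for clarity. Queries come from the decoder states, while keys and values both come from the encoder output. The function and variable names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(decoder_states: Matrix, encoder_states: Matrix) -> Matrix:
    """Scaled dot-product cross-attention without learned projections.

    Each output row is a weighted average of encoder vectors, with weights
    set by how well the decoder query matches each encoder position.
    """
    d = len(encoder_states[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in decoder_states:                  # one query per decoder position
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in encoder_states]
        weights = softmax(scores)             # attention over input positions
        out.append([sum(w * v[j] for w, v in zip(weights, encoder_states)) for j in range(d)])
    return out


encoder_states = [[10.0, 0.0], [0.0, 10.0], [0.0, 0.0]]  # 3 input positions
decoder_states = [[10.0, 0.0], [0.0, 10.0]]              # 2 output positions
print(cross_attention(decoder_states, encoder_states))
# roughly [[10, 0], [0, 10]]: each query picks out the matching input position
```

The output has one row per decoder position, however long the input is, which is what lets the decoder consult the full source at every step.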

Encoder-decoder models are often strong at tasks with a clear input-to-output structure, such as translation and summarization. Decoder-only models have nonetheless largely taken over general-purpose LLMs. They use a single stack trained on next-token prediction, which makes them simpler to build, and they have benefited more from massive pretraining scale.
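
One way to see the structural difference is to write down which positions may attend to which, as 0/1 masks. The sketch below assumes a plain causal decoder-only model; prefix-LM variants, which let the prompt attend bidirectionally, are not shown. The function names are illustrative, not from any library.

```python
from typing import Dict, List

Mask = List[List[int]]  # mask[i][j] == 1 means position i may attend to position j


def causal_mask(n: int) -> Mask:
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def full_mask(rows: int, cols: int) -> Mask:
    return [[1] * cols for _ in range(rows)]


def encoder_decoder_masks(src_len: int, tgt_len: int) -> Dict[str, Mask]:
    """Three attention patterns across two stacks."""
    return {
        "encoder_self": full_mask(src_len, src_len),  # input sees all of the input
        "decoder_self": causal_mask(tgt_len),         # output sees only its past
        "cross": full_mask(tgt_len, src_len),         # each output step sees all input
    }


def decoder_only_mask(src_len: int, tgt_len: int) -> Mask:
    """One stack, one causal mask over the concatenated prompt + output."""
    return causal_mask(src_len + tgt_len)


for name, mask in encoder_decoder_masks(3, 2).items():
    print(name, mask)
print("decoder_only", decoder_only_mask(3, 2))
```

In this sketch the encoder-decoder needs two stacks and three attention patterns, while the decoder-only model needs one stack and one pattern.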

Examples

  • T5: every NLP task framed as text-to-text.
  • Original transformer: machine translation.
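
T5's text-to-text framing means both the input and the target are plain strings, so even classification labels become text that the decoder generates. The hypothetical helper below builds (input, target) pairs using task prefixes like those illustrated in the T5 paper ("translate English to German:", "summarize:", "cola sentence:"). It is a sketch of the idea, not T5's actual preprocessing code.

```python
from typing import Dict, Tuple, Union

Example = Dict[str, Union[str, int]]


def frame_example(task: str, example: Example) -> Tuple[str, str]:
    """Turn one example into an (input text, target text) pair, T5-style."""
    if task == "translate_en_de":
        return f"translate English to German: {example['source']}", str(example["target"])
    if task == "summarize":
        return f"summarize: {example['document']}", str(example["summary"])
    if task == "cola":
        # A binary grammaticality label becomes a string to generate.
        label = "acceptable" if example["label"] == 1 else "not acceptable"
        return f"cola sentence: {example['sentence']}", label
    raise ValueError(f"unknown task: {task}")


for inp, tgt in [
    frame_example("translate_en_de", {"source": "That is good.", "target": "Das ist gut."}),
    frame_example("cola", {"sentence": "The course is jumping well.", "label": 0}),
]:
    print(f"{inp!r} -> {tgt!r}")
```

Because every task shares this string-in, string-out shape, one encoder-decoder model and one training objective can cover all of them.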

Frequently asked

What is an example of encoder-decoder?

T5, which frames every NLP task as text-to-text. The original transformer, designed for machine translation, is another.

How is Encoder-Decoder related to Encoder?

The encoder is one half of an encoder-decoder model. An encoder is a transformer module that reads an input sequence and produces a contextualized representation: one vector per token that captures that token's meaning in context. In an encoder-decoder model, the decoder reads these vectors through cross-attention.
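
To illustrate "one vector per token, in context", here is a toy encoder: a single unmasked self-attention layer with hand-picked 3-dimensional embeddings and no learned weights. Real encoders add learned projections, multiple heads, feed-forward layers, residual connections, and many stacked layers. The embeddings and names here are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings, chosen by hand for illustration.
EMBED: Dict[str, Vector] = {
    "river": [1.0, 0.0, 0.0],
    "money": [0.0, 1.0, 0.0],
    "bank": [0.0, 0.0, 1.0],
}


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode(tokens: List[str]) -> List[Vector]:
    """One self-attention layer with no causal mask: every token sees every token."""
    xs = [EMBED[t] for t in tokens]
    d = len(xs[0])
    out = []
    for q in xs:                              # one output vector per input token
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        out.append([sum(w * x[j] for w, x in zip(weights, xs)) for j in range(d)])
    return out


a = encode(["river", "bank"])[1]              # "bank" after "river"
b = encode(["money", "bank"])[1]              # "bank" after "money"
print(a)
print(b)                                      # same word, different vectors
```

The input embedding for "bank" is identical in both sentences, but its encoded vector picks up a share of "river" in one case and of "money" in the other.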

Is Encoder-Decoder considered advanced?

Encoder-Decoder is generally considered advanced-level material in the AI and LLM space.
