Architecture · intermediate

Encoder

An encoder is a transformer module that reads an input sequence and produces a contextualized representation — a vector per token that captures meaning in context.

Published May 29, 2026

Explanation

Encoders use bidirectional self-attention: each token can see every other token in the input. That makes them strong at understanding tasks (classification, retrieval, embeddings) where you have the whole input up front but no need to generate text autoregressively.

BERT is the famous encoder-only model. Embedding models like OpenAI's text-embedding-3 are also encoders. Encoder outputs are typically pooled into a single vector representing the whole input.

Examples

BERT classifying a sentence as positive or negative.
A sentence embedding model encoding a query for vector search.

Frequently asked

What is Encoder?

An encoder is a transformer module that reads an input sequence and produces a contextualized representation — a vector per token that captures meaning in context.

What is an example of encoder?

BERT classifying a sentence as positive or negative.

How is Encoder related to Decoder?

Encoder and Decoder are both architecture concepts. A decoder is a transformer module that generates a sequence one token at a time, using causal self-attention so each token only sees earlier ones. GPT-style LLMs are decoder-only.

Is Encoder considered intermediate?

Encoder is generally considered intermediate-level material in the AI and LLM space.

DecoderArchitecture

A decoder is a transformer module that generates a sequence one token at a time, using causal self-attention so each token only sees earlier ones. GPT-style LLMs are decoder-only.

Encoder-DecoderArchitecture

An encoder-decoder model has a separate encoder that reads the input and a decoder that generates the output, with cross-attention linking them. T5 and the original transformer are encoder-decoders.

TransformerArchitecture

The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel.

EmbeddingArchitecture

An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

Side-by-side comparisons

Sources

BERT paper (arXiv)