Agents & Tools · advanced

Cross-Encoder

A cross-encoder takes a (query, document) pair as joint input and outputs a single relevance score. It is slower than the bi-encoders used for dense retrieval but usually much more accurate, which makes it the standard reranker architecture.

Published May 31, 2026

Explanation

Bi-encoder (the typical embedding model): encode the query and each document independently into vectors, then score each pair by dot product. This is fast because the whole corpus can be encoded and indexed ahead of time.

Cross-encoder: feed the query and one document together into a single transformer pass and output one score. This is slow: scoring N candidate documents takes N forward passes per query, and nothing can be pre-indexed. In exchange, every layer can attend across the query-document boundary.
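The cost difference can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a bi-encoder, and `cross_score` is a toy token-overlap stand-in for a cross-encoder forward pass. Neither is a real model, and all names are hypothetical. What the sketch does show faithfully is the call pattern: one encoder call per query for the bi-encoder, one joint call per (query, document) pair for the cross-encoder.

```python
import math
from collections import Counter

# Counters for query-time model calls (illustrative bookkeeping only).
calls = {"encoder": 0, "cross": 0}

def embed(text):
    """Toy stand-in for a bi-encoder: a unit-length bag-of-words vector."""
    calls["encoder"] += 1
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}

def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())

def cross_score(query, doc):
    """Toy stand-in for a cross-encoder pass: it sees both texts at once.

    Here it returns the fraction of query tokens found in the document.
    """
    calls["cross"] += 1
    q_tokens = query.lower().split()
    d_tokens = set(doc.lower().split())
    return sum(tok in d_tokens for tok in q_tokens) / max(len(q_tokens), 1)

corpus = [
    "cheap notebooks for students",
    "how to bake sourdough bread",
    "notebooks and laptops on sale",
]

# Bi-encoder: documents are encoded once, offline, before any query arrives.
index = [embed(doc) for doc in corpus]
calls["encoder"] = 0  # from here on, count only query-time work

query = "cheap notebooks"

# Bi-encoder at query time: one encoder call, then cheap dot products.
q_vec = embed(query)
bi_scores = [dot(q_vec, d_vec) for d_vec in index]

# Cross-encoder at query time: one joint pass per candidate document.
cross_scores = [cross_score(query, doc) for doc in corpus]

print("query-time calls:", calls)
```

With a corpus of N documents, the bi-encoder path makes one model call per query while the cross-encoder path makes N, which is why the latter is normally restricted to a short candidate list.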

In practice, cross-encoders are used as rerankers on top of bi-encoder retrieval: the bi-encoder narrows the corpus to the top K candidates, and the cross-encoder rescores and reorders only those K. K is typically 25 to 100.
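The two-stage pattern can be sketched as a small function. This is a minimal illustration, not a production pipeline: `cheap_score` and `joint_score` are hypothetical toy scorers standing in for the bi-encoder and the cross-encoder, and the function name and parameters are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    cheap_score: Scorer,
    joint_score: Scorer,
    k: int = 50,
    top_n: int = 5,
) -> List[Tuple[float, str]]:
    """Stage 1 keeps the top k by a cheap score; stage 2 rescores only those k."""
    # Stage 1: the cheap scorer runs over the whole corpus.
    candidates = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:k]
    # Stage 2: the expensive joint scorer runs on k candidates only.
    rescored = [(joint_score(query, doc), doc) for doc in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored[:top_n]

def cheap_score(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of tokens shared by query and document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def joint_score(query: str, doc: str) -> float:
    """Toy second-stage scorer: token overlap plus a bonus for an exact phrase
    match, a signal that is only visible when both texts are read together."""
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return cheap_score(query, doc) + bonus

corpus = [
    "best pizza in new york",
    "york is a city in england with new museums",
    "guide to baking bread",
    "new york pizza guide",
]

results = retrieve_then_rerank(
    "new york pizza", corpus, cheap_score, joint_score, k=3, top_n=2
)
for score, doc in results:
    print(score, doc)
```

In a real system the second-stage scorer would be an actual cross-encoder. One option, assuming the sentence-transformers library is installed, is its `CrossEncoder` class, whose `predict` method accepts a list of (query, document) pairs and returns one score per pair.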

BERT-base cross-encoders trained on MS MARCO are a strong default. The BGE reranker models (for example bge-reranker-large) and Cohere Rerank are modern productionized versions.

Examples

bge-reranker-large taking 50 (query, doc) pairs and outputting 50 scores in ~100 ms on a GPU.
A ColBERT model, a "late interaction" variant that encodes the query and document separately and compares them token by token, recovering some bi-encoder speed with cross-encoder-like quality.
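The late-interaction idea in the ColBERT example can be illustrated with its scoring rule, often called MaxSim: for each query token vector, take the maximum dot product against all document token vectors, then sum those maxima. The sketch below uses hand-written two-dimensional vectors as an assumption; they are not real ColBERT embeddings, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    """Dot product of two dense vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those best matches over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token embeddings (hand-written for illustration only).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens
doc_b = [[1.0, 0.0], [0.6, 0.0]]              # matches only the first

print(maxsim_score(query_vecs, doc_a))
print(maxsim_score(query_vecs, doc_b))
```

Because the document token vectors do not depend on the query, they can be computed and stored ahead of time as in a bi-encoder, while the per-token comparison keeps some of the fine-grained matching a cross-encoder provides.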

Frequently asked

What is an example of a cross-encoder?

bge-reranker-large taking 50 (query, doc) pairs and outputting 50 scores in ~100 ms on a GPU.

How is a cross-encoder related to a reranker?

Both are Agents & Tools concepts. A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. It is typically implemented as a cross-encoder, so "reranker" names the role and "cross-encoder" names the architecture that usually fills it. Reranking dramatically improves retrieval precision at low cost.

Is Cross-Encoder considered advanced?

Cross-Encoder is generally considered advanced-level material in the AI and LLM space.

Reranker · Agents & Tools

A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. Typically a cross-encoder; dramatically improves retrieval precision at low cost.

Embedding · Architecture

An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

Semantic Search · Agents & Tools

Semantic search ranks documents by meaning rather than keyword match, using embedding similarity. "Affordable laptops" can match "cheap notebooks" even with no overlapping words.

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

Sources

Sentence Transformers — Cross-encoders