ModelTerms

Agents & Tools · intermediate

Reranker (re-ranking, reranking)

A reranker is a second-pass scoring model that takes the top-K retrieved candidates and reorders them by joint relevance to the query. Typically a cross-encoder; it often improves retrieval precision substantially at modest cost.

Explanation

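Vector retrieval is fast but imprecise: each document is embedded independently of the query at indexing time, so query and document interact only through a single similarity score (such as a dot product). A reranker instead reads each (query, candidate) pair together and computes a joint score, catching nuances the bi-encoder embedding missed.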
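The toy contrast below illustrates the difference between independent encoding and joint scoring. It is not a neural model. `embed` is a bag-of-words "bi-encoder" that discards word order entirely (an exaggeration of a real embedding, chosen so the effect is visible), and `joint_score` is a hypothetical stand-in for a cross-encoder that sees both texts at once and can therefore reward matching word order. All function names are illustrative.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "bi-encoder": a bag-of-words count vector computed from the text alone.
    # Word order is discarded entirely, an exaggeration so the effect is visible.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Missing keys in a Counter read as 0, so this only sums shared terms.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bi_encoder_score(query: str, doc: str) -> float:
    # Query and document are encoded separately and meet only in the similarity.
    return cosine(embed(query), embed(doc))


def bigrams(text: str) -> set:
    words = text.lower().split()
    return set(zip(words, words[1:]))


def joint_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: it reads both texts together,
    # so it can also reward matching word order (shared bigrams).
    q_bigrams = bigrams(query)
    order_match = len(q_bigrams & bigrams(doc)) / len(q_bigrams) if q_bigrams else 0.0
    return 0.5 * bi_encoder_score(query, doc) + 0.5 * order_match


query = "dog bites man"
for doc in ["man bites dog", "dog bites man"]:
    print(f"{doc!r}: bi-encoder={bi_encoder_score(query, doc):.2f}, joint={joint_score(query, doc):.2f}")
```

Both documents get the same bi-encoder score (1.00), because their word-count vectors are identical; the joint scorer separates them (0.50 vs 1.00) because it can compare the two texts directly.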

Typical pipeline: retrieve the top 50 with a hybrid of vector search and BM25, then rerank down to the top 5 with a cross-encoder. The reranker runs roughly 50 forward passes (one per query-candidate pair, usually batched), which is cheap compared to the LLM call that follows, and it often lifts answer quality by around 5-15 points on RAG benchmarks.
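The second stage of that pipeline can be sketched as a small function, assuming the first-stage retriever has already produced the top-K candidate list. `rerank` and `word_overlap` are hypothetical names; `word_overlap` is a toy scorer standing in for a real cross-encoder so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Second pass of retrieve-then-rerank: score each (query, candidate) pair, keep the best top_n.

    `candidates` is assumed to be the top-K list from a first-stage retriever
    (for example, hybrid vector + BM25). `score_pair` stands in for a cross-encoder.
    """
    # One scoring call per pair: with K = 50 this is about 50 model forward passes.
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    # Highest score first; list.sort is stable, so ties keep the retriever's order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    # Hypothetical toy scorer: fraction of query words that appear in the document.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)


first_stage = ["paris hotel deals", "flights to rome", "cheap flights to paris", "paris"]
print(rerank("flights to paris", first_stage, word_overlap, top_n=2))
```

In practice `score_pair` might wrap a real cross-encoder; sentence-transformers' `CrossEncoder.predict`, for instance, scores a whole list of (query, document) pairs in one batched call, which is usually preferable to the one-pair-at-a-time loop shown here for clarity.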

Common rerankers: Cohere Rerank (proprietary, very strong), BGE Reranker (open weights, small), Voyage Rerank, and ColBERT (a late-interaction model rather than a cross-encoder).

Adding a reranker is often the single highest-leverage RAG improvement once chunking is fixed.

Examples

  • A hybrid-search RAG returns 50 candidates; Cohere Rerank trims to the 5 most relevant; faithfulness score jumps from 0.68 to 0.81.
  • A BGE Reranker model (e.g. bge-reranker-large or bge-reranker-v2-m3) used in an offline pipeline to clean up top-K results before the LLM call.

When to use reranker

After chunking is sorted and before model upgrades. Reranking is typically the best dollar-for-quality RAG investment.

Frequently asked

How is Reranker related to Retrieval-Augmented Generation?

A reranker is one stage inside a Retrieval-Augmented Generation pipeline. The retriever fetches a broad candidate set, the reranker reorders it and keeps only the most relevant passages, and those passages are what the LLM sees in its prompt.

Is Reranker considered intermediate?

Reranker is generally considered intermediate-level material in the AI and LLM space.

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.
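The "includes them in the prompt" step can be sketched as simple string assembly. `build_rag_prompt` and the instruction wording are hypothetical; the one design choice illustrated is numbering each retrieved passage so the model can cite its sources.

```python
from typing import List


def build_rag_prompt(question: str, passages: List[str]) -> str:
    # Number each retrieved (and, ideally, reranked) passage so the model can cite it as [1], [2], ...
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    # Hypothetical instruction text; real prompts vary by application.
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(build_rag_prompt("When was the policy updated?", ["Policy v2 was published in March.", "v1 is deprecated."]))
```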

Cross-Encoder · Agents & Tools

A cross-encoder takes a (query, document) pair as joint input and outputs a single relevance score. Slower than the bi-encoders used for dense retrieval but much more accurate — the standard reranker architecture.

Hybrid Search · Agents & Tools

Hybrid search combines vector (semantic) and keyword (BM25) retrieval and fuses their results — usually via Reciprocal Rank Fusion — to get the best of both: semantic recall and exact-match precision.
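Reciprocal Rank Fusion scores each document as the sum, over all input rankings, of 1 / (k + rank), with ranks starting at 1. A minimal sketch follows; k = 60 is the constant used in the original RRF paper and a common default, but treat it as a tunable assumption. The document IDs are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a document missing from a list simply gets no contribution from it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Best fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["d3", "d1", "d2"]   # hypothetical semantic-search ranking
bm25_hits = ["d1", "d4", "d3"]     # hypothetical keyword (BM25) ranking
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

Because RRF uses only ranks, it needs no calibration between the very different score scales of cosine similarity and BM25, which is one reason it is a popular fusion choice.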

Embedding · Architecture

An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

Semantic Search · Agents & Tools

Semantic search ranks documents by meaning rather than keyword match, using embedding similarity. "Affordable laptops" can match "cheap notebooks" even with no overlapping words.
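Ranking by embedding similarity can be sketched with cosine similarity. The 3-dimensional vectors below are invented for illustration (a real embedding model would output hundreds of dimensions), and `semantic_search` is a hypothetical helper name.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vec: List[float], doc_vecs: Dict[str, List[float]], top_k: int = 2) -> List[str]:
    # Rank documents by cosine similarity between their embedding and the query embedding.
    ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_k]


# Hypothetical embeddings: similar meanings were given nearby vectors by hand.
query = [0.9, 0.1, 0.0]  # "affordable laptops"
docs = {
    "cheap notebooks": [0.85, 0.15, 0.05],
    "laptop repair guide": [0.4, 0.8, 0.1],
    "banana bread recipe": [0.0, 0.1, 0.95],
}
print(semantic_search(query, docs))
```

"cheap notebooks" ranks first despite sharing no words with the query, because its (made-up) vector points in nearly the same direction.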

Chunking · Agents & Tools

Chunking is the process of splitting source documents into smaller passages before embedding them for retrieval. Chunk size and boundaries control how relevant retrievals will be.
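One common baseline is fixed-size chunking with overlap, sketched below. It counts whitespace-separated words as a rough proxy for tokens, and the default sizes are hypothetical; real pipelines often count model tokens or split on sentence and heading boundaries instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of up to `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.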
