Infrastructure · intermediate

Model Router (LLM router)

A model router picks the cheapest model that's likely to handle a given request well — based on a small classifier, embedding similarity, or rule-based filters — so you don't pay frontier prices for trivial queries.

Published May 31, 2026

Explanation

Not every query needs your most expensive model. "What time is it?" doesn't justify Opus or GPT-4. A router classifies incoming queries by difficulty, topic, or required capability, dispatches the simple ones to a small, cheap model, and escalates to a frontier model only when needed.
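A minimal sketch of the dispatch step described above, assuming the difficulty label comes from a classifier supplied by the caller. The `Router` class, the tier names (`small`, `frontier`), the toy length-based classifier, and the stand-in model functions are all hypothetical; real code would call a provider SDK where the stand-ins are.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is any callable that maps a prompt to a response string.
ModelFn = Callable[[str], str]


@dataclass
class Router:
    """Dispatches each query to a model tier chosen by a classifier (illustrative design)."""
    classify: Callable[[str], str]   # returns a tier name, e.g. "small" or "frontier"
    models: Dict[str, ModelFn]       # tier name -> model callable
    default_tier: str = "frontier"   # fail safe: unknown labels go to the strongest model

    def route(self, query: str) -> str:
        """Return the tier name that should serve this query."""
        tier = self.classify(query)
        return tier if tier in self.models else self.default_tier

    def handle(self, query: str) -> str:
        """Route the query and return the chosen model's response."""
        return self.models[self.route(query)](query)


# Stand-in models (assumption): replace with real API calls.
def small_model(prompt: str) -> str:
    return f"[small] {prompt}"


def frontier_model(prompt: str) -> str:
    return f"[frontier] {prompt}"


# Toy classifier (assumption): short queries are treated as easy.
def toy_classifier(query: str) -> str:
    return "small" if len(query.split()) <= 8 else "frontier"


router = Router(
    classify=toy_classifier,
    models={"small": small_model, "frontier": frontier_model},
)
```

Calling `router.handle("What time is it?")` sends the query to the small stand-in model. Defaulting unknown labels to the frontier tier is a deliberate choice here: a misrouted easy query costs money, while a misrouted hard query costs quality.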

Approaches: train a tiny classifier on past (query, "did the smallest model work?") labels; use embedding similarity to past escalations; make an LLM call to a Haiku-tier model that explicitly returns a routing decision; or apply rules based on prompt length, presence of code, or detected complexity.
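The rule-based approach is the simplest to sketch. The example below is one possible set of rules, not a recommended one: the word-count threshold, the code-detection pattern, and the keyword list are all assumptions that would need tuning against logged traffic.

```python
import re

# Hypothetical thresholds and keyword list; tune them on real traffic.
MAX_EASY_WORDS = 40
FENCE = "`" * 3  # a Markdown code fence inside the query suggests pasted code
CODE_PATTERN = re.compile(r"\bdef \w+\(|\bfunction \w+\(|[{};]\s*$", re.MULTILINE)
COMPLEX_HINTS = ("step by step", "prove", "compare", "refactor", "debug", "analyze")


def needs_frontier(query: str) -> bool:
    """Return True if any rule flags the query as likely too hard for the small model."""
    text = query.lower()
    if len(text.split()) > MAX_EASY_WORDS:              # rule 1: prompt length
        return True
    if FENCE in query or CODE_PATTERN.search(query):    # rule 2: presence of code
        return True
    return any(hint in text for hint in COMPLEX_HINTS)  # rule 3: complexity keywords


def pick_tier(query: str) -> str:
    """Map the rule outcome to a tier name."""
    return "frontier" if needs_frontier(query) else "small"
```

Rules like these are cheap and easy to debug, which makes them a reasonable baseline before investing in a trained classifier or embedding lookup.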

Cost savings can be large (a 50-80% reduction is common for chat workloads), but they come at the price of building and maintaining the router.

Examples

A support bot routes FAQ-style queries to Haiku ($0.25/Mtok) and complex multi-step ones to Sonnet ($3/Mtok); average cost drops 70% compared with sending every query to Sonnet.
RouteLLM (LMSYS / UC Berkeley): open-source routers trained on Chatbot Arena preference data.
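The 70% figure in the support-bot example can be sanity-checked with blended-cost arithmetic. The sketch below assumes every request uses the same number of tokens, ignores the cost of running the router itself, and uses only the two per-Mtok prices quoted in the example; the function names are illustrative.

```python
def blended_cost(share_small: float, price_small: float, price_large: float) -> float:
    """Average price per Mtok when `share_small` of traffic goes to the small model."""
    if not 0.0 <= share_small <= 1.0:
        raise ValueError("share_small must be between 0 and 1")
    return share_small * price_small + (1.0 - share_small) * price_large


def share_needed(target_reduction: float, price_small: float, price_large: float) -> float:
    """Traffic share the small model must take to cut cost by `target_reduction`,
    relative to sending everything to the large model.

    Derived from: reduction = share * (price_large - price_small) / price_large
    """
    return target_reduction * price_large / (price_large - price_small)


HAIKU, SONNET = 0.25, 3.00  # $/Mtok, as quoted in the example

share = share_needed(0.70, HAIKU, SONNET)
print(f"share routed to Haiku: {share:.1%}")                              # 76.4%
print(f"blended price: ${blended_cost(share, HAIKU, SONNET):.2f}/Mtok")   # $0.90/Mtok
```

Under these assumptions, roughly three quarters of the traffic has to be handled by Haiku to reach a 70% saving, which is why routers pay off mainly on workloads dominated by easy queries.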

When to use model router

Cost-sensitive applications with diverse query difficulty. Skip for narrow, uniformly hard workloads.

Frequently asked

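What is Model Router?

A model router picks the cheapest model that's likely to handle a given request well, using a small classifier, embedding similarity, or rule-based filters, so you don't pay frontier prices for trivial queries.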

What is an example of model router?

A support bot routes FAQ-style queries to Haiku ($0.25/Mtok) and complex multi-step ones to Sonnet ($3/Mtok); average cost drops 70% compared with sending every query to Sonnet.

How is Model Router related to LLM Gateway?

Model Router and LLM Gateway are both infrastructure concepts. An LLM gateway is a proxy layer that sits between application code and one or more LLM providers, handling auth, rate-limit retries, cost tracking, observability, prompt caching, model routing, and PII redaction. Model routing is therefore one of the functions a gateway can provide.

When should I use model router?

Cost-sensitive applications with diverse query difficulty. Skip for narrow, uniformly hard workloads.

Is Model Router considered intermediate?

Model Router is generally considered intermediate-level material in the AI and LLM space.

LLM Gateway · Infrastructure

An LLM gateway is a proxy layer that sits between application code and one or more LLM providers — handling auth, rate-limit retries, cost tracking, observability, prompt caching, model routing, and PII redaction.

Inference · Inference

Inference is what happens when you actually run a trained model on new input. For LLMs that means generating tokens one at a time, with sampling and a KV cache.

LLM-as-Judge · Evaluation

LLM-as-judge uses a strong LLM to score or compare outputs from other LLMs. It is how most production teams evaluate quality at scale when human review is too slow.

Large Language Model · Foundations

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

Sources

RouteLLM paper (arXiv)