ModelTerms

Infrastructure · intermediate

Model Router (LLM router)

A model router picks the cheapest model that's likely to handle a given request well — based on a small classifier, embedding similarity, or rule-based filters — so you don't pay frontier prices for trivial queries.

Explanation

Not every query needs your most expensive model. "What time is it?" doesn't justify Opus or GPT-4. A router classifies each incoming query by difficulty, topic, or required capability, sends simple ones to a small, cheap model, and escalates to a frontier model only when the query calls for it.
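A minimal rule-based sketch of that dispatch step, assuming two hypothetical model identifiers (`small-model`, `frontier-model`) and illustrative escalation rules: code markers, prompts over 60 words, and a short keyword list. None of these thresholds come from a particular product; in practice they would be tuned against logged traffic.

```python
import re
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider exposes.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Illustrative signals of a harder request. Patterns, keywords, and the word
# limit are assumptions for this sketch, not values from any real router.
CODE_PATTERN = re.compile(r"`{3}|\bdef |\bclass |\bimport |;\s*$", re.MULTILINE)
HARD_KEYWORDS = ("prove", "step by step", "refactor", "debug", "analyze", "compare")
MAX_EASY_WORDS = 60


@dataclass
class RoutingDecision:
    model: str
    reason: str


def route(prompt: str) -> RoutingDecision:
    """Send a prompt to the cheap model unless an escalation rule fires."""
    if CODE_PATTERN.search(prompt):
        return RoutingDecision(FRONTIER_MODEL, "contains code")
    if len(prompt.split()) > MAX_EASY_WORDS:
        return RoutingDecision(FRONTIER_MODEL, "long prompt")
    text = prompt.lower()
    for kw in HARD_KEYWORDS:
        if kw in text:
            return RoutingDecision(FRONTIER_MODEL, f"keyword: {kw!r}")
    return RoutingDecision(CHEAP_MODEL, "no escalation rule fired")


if __name__ == "__main__":
    for p in [
        "What are your opening hours?",
        "Debug this: def f(x): return x +",
        "Compare three pricing plans and recommend one step by step.",
    ]:
        d = route(p)
        print(f"{d.model:15} {d.reason:28} {p}")
```

Rules like these cost almost nothing to run and are easy to audit, but they miss hard queries that look simple; learned routers aim to close that gap.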

Common approaches:
  • Train a tiny classifier on past (query, "did the smallest model's answer suffice?") labels.
  • Compare the query's embedding with those of past queries that needed escalation.
  • Call a Haiku-tier model that explicitly returns a routing decision.
  • Apply rules based on prompt length, presence of code, or detected complexity.
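One way to combine the learned-label and embedding-similarity ideas is a k-nearest-neighbour vote: find the past queries most similar to the new one and route to the cheap model only if it usually sufficed for them. The sketch below assumes a labeled history of `(query, small_model_ok)` pairs and uses a word-count vector as a stand-in for a real embedding model, purely so it runs offline; `KnnRouter`, the labels, and the model names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in "embedding": lowercase word counts. A real router would call an
    # embedding model here; this substitute is an assumption so the sketch runs offline.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnnRouter:
    """Route by vote of the k past queries most similar to the new one.

    `history` holds hypothetical (query, small_model_ok) labels: whether the
    cheapest model's answer was judged acceptable for that past query.
    """

    def __init__(self, history, k=3, min_ok_fraction=0.5):
        self.history = [(embed(q), ok) for q, ok in history]
        self.k = k
        self.min_ok_fraction = min_ok_fraction

    def route(self, query: str) -> str:
        if not self.history:
            return "frontier-model"  # no evidence yet, so play it safe
        q = embed(query)
        nearest = sorted(self.history, key=lambda h: cosine(q, h[0]), reverse=True)[: self.k]
        ok_fraction = sum(ok for _, ok in nearest) / len(nearest)
        return "small-model" if ok_fraction >= self.min_ok_fraction else "frontier-model"


history = [
    ("how do I reset my password", True),
    ("what are your opening hours", True),
    ("where can I download my invoice", True),
    ("migrate my account data between two regions and keep audit logs", False),
    ("explain why my api integration returns intermittent timeouts", False),
    ("debug intermittent timeouts in my api integration", False),
]
router = KnnRouter(history, k=3)
print(router.route("I forgot my password, how do I reset it"))
print(router.route("my api integration keeps timing out intermittently"))
```

Raising `min_ok_fraction` makes the router more conservative (more escalations, higher cost, fewer misroutes); lowering it does the opposite.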

Savings can be large: cost reductions of 50-80% are commonly reported for chat workloads. The price is building and maintaining the router, and accepting that some hard queries will be misrouted to a weaker model.

Examples

  • A support bot routes FAQ-style queries to Haiku ($0.25/Mtok input) and complex multi-step ones to Sonnet ($3/Mtok input); average cost drops by about 70%.
  • RouteLLM (LMSYS): open-source routers trained on Chatbot Arena preference data.
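The support-bot savings figure follows from blended-cost arithmetic. A sketch, assuming prices of $0.25 and $3 per million tokens, equal token counts per request on either model, and no router overhead: if a share s of traffic goes to the cheap model priced c and the rest to the frontier model priced f, savings are s(f - c)/f, so roughly 76% of traffic must be routed cheap to cut cost by 70%.

```python
def blended_cost(cheap_share: float, cheap_price: float, frontier_price: float) -> float:
    """Average price per million tokens when `cheap_share` of traffic uses the cheap model."""
    return cheap_share * cheap_price + (1 - cheap_share) * frontier_price


def savings(cheap_share: float, cheap_price: float, frontier_price: float) -> float:
    """Fractional saving versus sending every request to the frontier model."""
    return 1 - blended_cost(cheap_share, cheap_price, frontier_price) / frontier_price


def share_needed(target_savings: float, cheap_price: float, frontier_price: float) -> float:
    """Cheap-model traffic share required for a target saving: s = target * f / (f - c)."""
    if cheap_price >= frontier_price:
        raise ValueError("cheap model must be cheaper than the frontier model")
    return target_savings * frontier_price / (frontier_price - cheap_price)


if __name__ == "__main__":
    HAIKU, SONNET = 0.25, 3.0  # $/Mtok input, as in the support-bot example
    print(f"share needed for 70% savings: {share_needed(0.70, HAIKU, SONNET):.1%}")
    for share in (0.5, 0.8, 0.9):
        print(f"{share:.0%} routed cheap -> {savings(share, HAIKU, SONNET):.1%} saved")
```

A `share_needed` result above 1.0 means the target is unreachable with those two prices: even routing everything cheap saves only (f - c)/f.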

When to use a model router

Cost-sensitive applications where query difficulty varies widely. Skip it for narrow, uniformly hard workloads, where nearly every query would be escalated anyway.

Frequently asked

How is Model Router related to LLM Gateway?

A model router is often one feature of an LLM gateway. The gateway is a proxy layer that sits between application code and one or more LLM providers, handling auth, rate-limit retries, cost tracking, observability, prompt caching, model routing, and PII redaction. Both are infrastructure concepts, and a router can also run on its own, for example inside application code, without a full gateway.
