Infrastructure · intermediate
Model Router (LLM router)
A model router picks the cheapest model that's likely to handle a given request well — based on a small classifier, embedding similarity, or rule-based filters — so you don't pay frontier prices for trivial queries.
Explanation
Not every query needs your most expensive model. "What time is it?" doesn't justify Opus or GPT-4. A router classifies incoming queries by difficulty, topic, or required capability, dispatches the simple ones to a small, cheap model, and escalates to a frontier model only when needed.
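The classify-then-dispatch flow can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the tier names, the toy `classify` heuristic, and the stub model functions are all hypothetical stand-ins for a real classifier and real API clients.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for real model clients: each takes a prompt, returns text.
def cheap_model(prompt: str) -> str:
    return f"[cheap] {prompt}"

def frontier_model(prompt: str) -> str:
    return f"[frontier] {prompt}"

MODELS: Dict[str, Callable[[str], str]] = {
    "cheap": cheap_model,
    "frontier": frontier_model,
}

def classify(prompt: str) -> str:
    """Toy difficulty classifier (assumption): long prompts, or prompts that
    look like they contain code, are treated as hard."""
    if len(prompt.split()) > 50 or "def " in prompt:
        return "frontier"
    return "cheap"

def route(prompt: str, classifier: Callable[[str], str] = classify) -> str:
    """Pick a tier with the classifier, then call the matching model."""
    tier = classifier(prompt)
    # Unknown labels fall back to the frontier tier, so a classifier bug
    # costs money rather than answer quality.
    model = MODELS.get(tier, frontier_model)
    return model(prompt)
```

Passing the classifier as a parameter keeps the dispatch logic unchanged if the heuristic is later swapped for a trained model.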
Approaches: train a tiny classifier on past (query, "did the smallest model work?") labels; compare a query's embedding against past escalations; ask a Haiku-tier model to return an explicit routing decision; or apply rules based on prompt length, presence of code, or other complexity signals.
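The embedding-similarity approach can be sketched like this. To stay self-contained, the example assumes a toy bag-of-words vector in place of a real embedding model; the `PAST_ESCALATIONS` examples and the 0.5 threshold are hypothetical and would need tuning on real traffic.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption): a real router would
    call an embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical queries that previously had to be escalated.
PAST_ESCALATIONS: List[str] = [
    "debug this race condition in my async worker pool",
    "compare these two contract clauses and list the legal risks",
]

def should_escalate(query: str, threshold: float = 0.5) -> bool:
    """Escalate when the query resembles any past escalation closely enough."""
    q = embed(query)
    return any(cosine(q, embed(p)) >= threshold for p in PAST_ESCALATIONS)
```

For example, `should_escalate("debug a race condition in my worker")` returns `True` because the query shares most of its words with the first stored escalation, while `should_escalate("what time is it")` returns `False`.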
The savings can be large: reductions of 50-80% are commonly reported for chat workloads. The trade-off is the effort of building and maintaining the router.
Examples
- A support bot routing FAQ-style queries to Haiku ($0.25/Mtok) and complex multi-step ones to Sonnet ($3/Mtok); average cost drops by about 70%.
- RouteLLM (LMSYS): open-source routers trained on Chatbot Arena preference data.
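The arithmetic behind the support-bot figure can be checked with a short sketch. It assumes a single price per million tokens for each model (ignoring any input/output price split), equal token counts per query across tiers, and a baseline where everything is sent to the more expensive model; the function names are illustrative.

```python
def blended_cost(cheap_share: float,
                 cheap_price: float = 0.25,
                 frontier_price: float = 3.0) -> float:
    """Average price per million tokens when `cheap_share` of the
    traffic goes to the cheap model."""
    return cheap_share * cheap_price + (1 - cheap_share) * frontier_price

def savings(cheap_share: float,
            cheap_price: float = 0.25,
            frontier_price: float = 3.0) -> float:
    """Fractional cost reduction versus sending everything to the
    expensive model."""
    return 1 - blended_cost(cheap_share, cheap_price, frontier_price) / frontier_price

def share_needed(target_savings: float,
                 cheap_price: float = 0.25,
                 frontier_price: float = 3.0) -> float:
    """Share of traffic that must go to the cheap model to hit a
    target cost reduction."""
    return target_savings / (1 - cheap_price / frontier_price)
```

Under these assumptions, `share_needed(0.70)` is roughly 0.76, so a 70% drop corresponds to about three quarters of traffic being handled by the cheaper model.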
When to use a model router
Use one in cost-sensitive applications where query difficulty varies widely. Skip it for narrow, uniformly hard workloads, where nearly every query would be escalated anyway.
Frequently asked
What is a model router?
A model router picks the cheapest model that's likely to handle a given request well — based on a small classifier, embedding similarity, or rule-based filters — so you don't pay frontier prices for trivial queries.
What is an example of a model router?
A support bot routing FAQ-style queries to Haiku ($0.25/Mtok) and complex multi-step ones to Sonnet ($3/Mtok); average cost drops by about 70%.
How is a model router related to an LLM gateway?
Both are infrastructure concepts, and they often overlap. An LLM gateway is a proxy layer that sits between application code and one or more LLM providers, handling auth, rate-limit retries, cost tracking, observability, prompt caching, model routing, and PII redaction. Model routing is therefore one of the functions a gateway can provide, though a router can also run on its own.
When should I use a model router?
Use one in cost-sensitive applications where query difficulty varies widely. Skip it for narrow, uniformly hard workloads, where nearly every query would be escalated anyway.
Is a model router considered intermediate?
Model routing is generally considered intermediate-level material in the AI and LLM space.