Skip to main content
ModelTerms

Infrastructure · advanced

Drift Detection (model drift, data drift)

Drift detection watches for changes in the statistical distribution of inputs, outputs, or quality scores over time — so you can catch a model degrading in production before users complain.

Explanation

An LLM that was great at launch can degrade silently as user behavior shifts, your prompts drift through edits, or an upstream model provider quietly changes their weights. Drift detection sets a baseline and alerts when current production traffic looks materially different.

Three drift types matter for LLMs. **Input drift**: users are asking different things — new topics, new languages, longer prompts. **Output drift**: the model is producing different distributions of responses — more refusals, more JSON parse failures, more code blocks. **Quality drift**: LLM-as-judge faithfulness scores or human ratings are dropping vs the historical mean.

Detection methods include population stability index (PSI), KL divergence between embedding distributions, and rolling-window quality-metric comparison.

Arize specifically built its first product around this exact problem in classic ML and extended it to embeddings and LLM outputs.

Examples

  • A support bot's P50 latency jumps 30%: a drift alert fires on the input distribution, revealing users started pasting in entire emails instead of short queries.
  • Faithfulness score on the last 7 days is down 8 points vs. the trailing 28 days — drift signal triggers an investigation.

When to use drift detection

Once you have stable traffic and a baseline; drift detection is most valuable in mature production systems.

Frequently asked

What is Drift Detection?

Drift detection watches for changes in the statistical distribution of inputs, outputs, or quality scores over time — so you can catch a model degrading in production before users complain.

What is an example of drift detection?

A support bot's P50 latency jumps 30%: a drift alert fires on the input distribution, revealing users started pasting in entire emails instead of short queries.

How is Drift Detection related to LLM Observability?

Drift Detection and LLM Observability are both infrastructure concepts. LLM observability is the practice of capturing, analyzing, and acting on every LLM call in a production system — inputs, outputs, latencies, costs, errors, and quality scores — so you can debug regressions and improve quality over time.

When should I use drift detection?

Once you have stable traffic and a baseline; drift detection is most valuable in mature production systems.

Is Drift Detection considered advanced?

Drift Detection is generally considered advanced-level material in the AI and LLM space.

LLM ObservabilityInfrastructure

LLM observability is the practice of capturing, analyzing, and acting on every LLM call in a production system — inputs, outputs, latencies, costs, errors, and quality scores — so you can debug regressions and improve quality over time.

Embedding DriftInfrastructure

Embedding drift is a specific kind of drift detection — comparing the distribution of input or response embeddings between two time windows to surface semantic shifts that simple statistics would miss.

Online EvaluationEvaluation

Online evaluation runs scoring functions over live production traffic — usually a sample of recent traces — to monitor quality continuously instead of relying solely on a fixed offline dataset.

TracingInfrastructure

Tracing captures the full causal tree of an LLM request — the user input, retrieval calls, tool calls, intermediate prompts, and the final response — as a hierarchy of timed spans you can replay and inspect.

Side-by-side comparisons

Sources