ModelTerms

Infrastructure · advanced

Embedding Drift

Embedding drift is a specific kind of drift detection: it compares the distribution of input or response embeddings between two time windows to surface semantic shifts that token-level or summary statistics would miss.

Explanation

If your users start asking about a new product feature, their text might look superficially similar to earlier traffic (same length, same vocabulary) but live in a different region of embedding space. Comparing the centroid or the full distribution of embeddings from this week against the previous four weeks catches semantic drift that token-level stats miss.
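The centroid comparison described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `centroid_cosine_distance`, the 16-dimensional synthetic vectors, and the window sizes are all assumptions made for the example, and real embeddings would come from whatever embedding model you already use.

```python
import numpy as np

def centroid_cosine_distance(current: np.ndarray, baseline: np.ndarray) -> float:
    """Cosine distance between the mean embeddings of two windows.

    Each argument is a 2-D array with one embedding per row.
    Returns 0.0 when the centroids point the same way, up to 2.0 when opposite.
    """
    c = current.mean(axis=0)
    b = baseline.mean(axis=0)
    cosine = float(np.dot(c, b) / (np.linalg.norm(c) * np.linalg.norm(b)))
    return 1.0 - cosine

# Synthetic stand-ins for stored embeddings (hypothetical data).
rng = np.random.default_rng(0)
baseline = rng.normal(loc=1.0, size=(400, 16))      # "previous four weeks"
same_topic = rng.normal(loc=1.0, size=(100, 16))    # "this week", no drift

# Simulate a semantic shift by flipping the mean of half the dimensions.
shifted_mean = np.ones(16)
shifted_mean[:8] = -1.0
new_topic = rng.normal(loc=shifted_mean, size=(100, 16))

dist_same = centroid_cosine_distance(same_topic, baseline)
dist_shifted = centroid_cosine_distance(new_topic, baseline)
print(f"no drift: {dist_same:.4f}  drift: {dist_shifted:.4f}")
```

A centroid only captures a shift in the mean; two windows with the same centroid but different spread or cluster structure would look identical here, which is why distribution-level distances are also used.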

Typical workflow: embed every input and every output, store the vectors alongside traces, and periodically compute a distributional distance (for example Wasserstein distance, maximum mean discrepancy (MMD), or cosine distance to a baseline centroid) between time windows, or between production traffic and your eval set. Alert when the distance exceeds a threshold.

The same stored embeddings also support "find similar production failures": embed the failing example, retrieve its nearest neighbors, and prioritize the fix by the size of the resulting bucket of similar failures.
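The nearest-neighbor lookup mentioned above can be sketched as a brute-force cosine search. The function name `nearest_neighbors` and the random `trace_embeddings` array are hypothetical; a production system would more likely query a vector index than scan every stored vector.

```python
import numpy as np

def nearest_neighbors(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Return (indices, similarities) of the k rows of corpus most similar
    to query by cosine similarity, most similar first."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Hypothetical stored trace embeddings, one row per production trace.
rng = np.random.default_rng(0)
trace_embeddings = rng.normal(size=(50, 8))

# A failing example that is a near-duplicate of trace 3.
failing = trace_embeddings[3] + 0.01 * rng.normal(size=8)

idx, sims = nearest_neighbors(failing, trace_embeddings, k=3)
print(idx, sims)
```

The number of neighbors above some similarity cutoff gives a rough size for the failure bucket, which is one way to rank which failure to fix first.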

Examples

  • A weekly embedding-drift report surfaces that 12% of new traffic is in a region the eval set never covered.
  • After a marketing campaign, input embeddings cluster around a new topic; teams add eval cases for that topic.
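A figure like "12% of new traffic is in a region the eval set never covered" could be produced in several ways. One simple approach, sketched here under assumptions, counts production embeddings whose best cosine similarity to any eval-set embedding falls below a cutoff. The function name `uncovered_fraction`, the `min_similarity` value of 0.8, and the toy 3-dimensional vectors are illustrative, not taken from any particular tool.

```python
import numpy as np

def uncovered_fraction(production: np.ndarray, eval_set: np.ndarray,
                       min_similarity: float = 0.8) -> float:
    """Fraction of production embeddings with no eval-set embedding within
    the given cosine similarity."""
    p = production / np.linalg.norm(production, axis=1, keepdims=True)
    e = eval_set / np.linalg.norm(eval_set, axis=1, keepdims=True)
    best = (p @ e.T).max(axis=1)  # best match in the eval set for each row
    return float((best < min_similarity).mean())

# Toy data: the eval set covers two directions; one production row is new.
eval_set = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
production = np.array([[0.9, 0.1, 0.0],
                       [0.1, 0.95, 0.0],
                       [1.0, 0.0, 0.05],
                       [0.0, 0.0, 1.0]])   # not close to any eval case

frac = uncovered_fraction(production, eval_set, min_similarity=0.8)
print(f"uncovered: {frac:.0%}")
```

The uncovered rows are natural candidates for new eval cases, as in the marketing-campaign example.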

Frequently asked

How is Embedding Drift related to Drift Detection?

Embedding drift is a specialization of drift detection. Drift detection watches for changes in the statistical distribution of inputs, outputs, or quality scores over time, so you can catch a model degrading in production before users complain. Embedding drift applies the same idea to embedding vectors in order to catch semantic shifts.

Is Embedding Drift considered advanced?

Embedding Drift is generally considered advanced-level material in the AI and LLM space.
