ModelTerms

Architecture · intermediate

Embedding (vector embedding)

An embedding is a list of numbers (a vector) that represents a piece of input — a word, a sentence, an image — in a space where similar things end up close together.

Explanation

Inside an LLM, every token is converted to a vector by an embedding lookup table before the first layer runs. Each subsequent layer then transforms these vectors, mixing in context from the surrounding tokens. The final-layer vectors carry the model's "understanding" of the input.
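The lookup step can be sketched as plain row indexing into a matrix. This is a minimal illustration, not any particular model's implementation: the five-word vocabulary, the 4-dimensional size, and the randomly initialised `embedding_table` are all assumptions made up for the example (a real model learns the table during training).

```python
import numpy as np

# Hypothetical toy setup: a 5-token vocabulary and 4-dimensional embeddings.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
rng = np.random.default_rng(seed=0)
# One row per token id; random values stand in for learned weights.
embedding_table = rng.normal(size=(len(vocab), 4))

def embed(tokens):
    """Look up one embedding row per token; returns shape (len(tokens), dim)."""
    token_ids = np.array([vocab[t] for t in tokens])
    return embedding_table[token_ids]  # integer-array indexing selects rows

vectors = embed(["the", "cat", "sat"])
print(vectors.shape)
```

Each input token simply selects its own row, so the same token always starts from the same vector regardless of context.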

Separately, embedding models are encoders trained specifically to produce useful sentence- or document-level vectors. These power semantic search and retrieval-augmented generation: encode your documents once, encode a query, return the documents whose vectors are closest.
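The encode-once, query-later flow can be sketched as follows. The vectors here are hand-made 3-dimensional stand-ins for real encoder output, and `search` and `normalize` are hypothetical helper names; a real system would obtain the vectors from an embedding model and would usually use an approximate index rather than this brute-force scan.

```python
import numpy as np

# Hand-made vectors standing in for the output of a real embedding model.
doc_texts = ["cheap notebooks", "gourmet pasta recipes", "budget laptops"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
])

def normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Documents are encoded and normalized once, ahead of query time.
doc_index = normalize(doc_vectors)

def search(query_vector, k=2):
    """Return the k documents most similar to the query, best first."""
    scores = doc_index @ normalize(query_vector)
    top = np.argsort(-scores)[:k]  # indices of the highest scores
    return [(doc_texts[i], float(scores[i])) for i in top]

# Stand-in for the encoded query "affordable laptops".
query_vector = np.array([1.0, 0.2, 0.0])
results = search(query_vector)
print(results)
```

Normalizing up front means each query costs one matrix-vector product, which is the operation a vector database is built to speed up.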

Embedding dimensionality typically ranges from 256 to 4,096. Closeness is usually measured by cosine similarity: the cosine of the angle between two vectors, which ignores their lengths.
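As a small sketch of the measure itself, cosine similarity is the dot product of two vectors divided by the product of their lengths. The function name `cosine_similarity` and the example vectors are chosen for illustration only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ranges from -1 to 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 1])             # unrelated directions
opposite = cosine_similarity([1, 0], [-1, 0])              # opposite directions
print(same_direction, orthogonal, opposite)
```

Because the result depends only on direction, a vector and a scaled copy of it count as maximally similar.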

Examples

  • OpenAI's text-embedding-3-large produces 3,072-dimensional vectors by default.
  • "king" - "man" + "woman" approximately equals "queen" in classic word2vec embeddings.
  • Semantic search: return the documents that are the query's nearest neighbors in embedding space.
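The word-analogy arithmetic can be illustrated with a toy example. The vectors below are hand-constructed for the illustration (their three axes loosely mean royalty, male, female) and are not real word2vec values; `nearest_word` is a hypothetical helper. With real embeddings the match is only approximate.

```python
import numpy as np

# Hand-made toy vectors, NOT real word2vec output.
# Axes (by construction): royalty, male, female.
toy_vectors = {
    "king":   np.array([1.0, 1.0, 0.0]),
    "queen":  np.array([1.0, 0.0, 1.0]),
    "man":    np.array([0.0, 1.0, 0.0]),
    "woman":  np.array([0.0, 0.0, 1.0]),
    "prince": np.array([0.8, 1.0, 0.0]),
}

def nearest_word(target, exclude):
    """Return the word whose vector has the highest cosine similarity to target."""
    best_word, best_score = None, -2.0
    for word, vec in toy_vectors.items():
        if word in exclude:
            continue  # the query words themselves are conventionally skipped
        score = np.dot(vec, target) / (np.linalg.norm(vec) * np.linalg.norm(target))
        if score > best_score:
            best_word, best_score = word, float(score)
    return best_word

target = toy_vectors["king"] - toy_vectors["man"] + toy_vectors["woman"]
answer = nearest_word(target, exclude={"king", "man", "woman"})
print(answer)
```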

Frequently asked

How is an embedding related to a vector database?

Embedding and Vector Database are both architecture concepts. A vector database stores high-dimensional embeddings and answers "find the K nearest vectors to this query" extremely fast. It is the retrieval engine behind most RAG systems.

Is embedding considered an intermediate topic?

Embeddings are generally considered intermediate-level material in the AI and LLM space.

Vector Database · Agents & Tools

A vector database stores high-dimensional embeddings and answers "find the K nearest vectors to this query" extremely fast. It is the retrieval engine behind most RAG systems.

Semantic Search · Agents & Tools

Semantic search ranks documents by meaning rather than keyword match, using embedding similarity. "Affordable laptops" can match "cheap notebooks" even with no overlapping words.

Retrieval-Augmented Generation · Agents & Tools

RAG retrieves relevant documents from a corpus at query time and includes them in the prompt, letting an LLM answer with up-to-date, source-cited, private information without retraining.

Transformer · Architecture

The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel.

Token · Inference

A token is the basic unit an LLM reads and writes, usually a word piece averaging roughly 3-4 characters of English text. LLM pricing and context-window limits are measured in tokens, not words.

Embedding Drift · Infrastructure

Embedding drift detection is a specific kind of drift detection: it compares the distribution of input or response embeddings between two time windows to surface semantic shifts that simple statistics would miss.
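One crude way to compare two windows, sketched under stated assumptions: take the mean embedding of each window and measure the cosine distance between the two means. This centroid comparison is only a simple proxy chosen for illustration (production systems often use richer distribution tests), and the synthetic windows and the name `centroid_drift` are made up for the example.

```python
import numpy as np

def centroid_drift(window_a, window_b):
    """Cosine distance between the mean embeddings of two windows (0 = same direction)."""
    mean_a = np.mean(window_a, axis=0)
    mean_b = np.mean(window_b, axis=0)
    cos = np.dot(mean_a, mean_b) / (np.linalg.norm(mean_a) * np.linalg.norm(mean_b))
    return float(1.0 - cos)

# Synthetic 3-dimensional "embeddings": two windows on one topic, one on another.
rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.1, size=(200, 3))
same_topic = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.1, size=(200, 3))
new_topic = rng.normal(loc=[0.0, 1.0, 0.0], scale=0.1, size=(200, 3))

stable_score = centroid_drift(baseline, same_topic)   # expected to be near 0
shifted_score = centroid_drift(baseline, new_topic)   # expected to be near 1
print(stable_score, shifted_score)
```

A monitoring job would compute such a score per time window and alert when it crosses a threshold tuned on historical data.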

Semantic Chunking · Agents & Tools

Semantic chunking embeds each sentence and inserts a chunk boundary wherever consecutive embeddings diverge sharply — producing chunks that respect topic boundaries rather than character counts.
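The boundary rule can be sketched as a single pass over consecutive sentence embeddings. The 2-dimensional vectors are hand-made stand-ins for real sentence embeddings, and the 0.5 similarity threshold and the name `semantic_chunks` are assumptions for the example; real pipelines tune the threshold or use a percentile of the observed similarities.

```python
import numpy as np

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Start a new chunk wherever consecutive embeddings have cosine similarity below threshold."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        similarity = float(unit[i - 1] @ unit[i])
        if similarity < threshold:
            chunks.append([])  # sharp divergence: open a new chunk
        chunks[-1].append(sentences[i])
    return chunks

sentences = [
    "Embeddings map text to vectors.",
    "Similar texts get nearby vectors.",
    "Pasta should be cooked in salted water.",
    "Drain it while it is still firm.",
]
# Hand-made stand-ins for sentence embeddings (two topics, two directions).
embeddings = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])

chunks = semantic_chunks(sentences, embeddings)
print(chunks)
```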

Hybrid Search · Agents & Tools

Hybrid search combines vector (semantic) and keyword (BM25) retrieval and fuses their results — usually via Reciprocal Rank Fusion — to get the best of both: semantic recall and exact-match precision.
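Reciprocal Rank Fusion can be sketched in a few lines: each document earns 1 / (k + rank) from every ranking it appears in, and the sums decide the fused order. The document ids and the two input rankings are invented for the example, and k = 60 is the commonly used constant, assumed here rather than taken from the text.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings; returns document ids, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents missing from a ranking simply earn nothing from it.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of a vector retriever and a BM25 retriever.
vector_ranking = ["doc_b", "doc_a", "doc_c"]
keyword_ranking = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale.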
