Inference · beginner

Token

A token is the basic unit an LLM reads and writes, usually a word piece of about 3-4 characters. LLM pricing and context limits are measured in tokens, not words.

Published May 29, 2026

Explanation

Before text reaches the model, a tokenizer splits it into tokens and maps each one to an integer ID using a fixed vocabulary, typically 30,000-200,000 entries. Common English words tend to be a single token; rare or long words are split into several word-piece tokens, and punctuation marks usually count as tokens of their own.
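To make the text-to-integers step concrete, here is a minimal sketch. It uses greedy longest-match against a tiny made-up vocabulary, which is a simplification and not the learned BPE procedure real tokenizers use. The names VOCAB, encode, and decode are hypothetical and exist only for this illustration.

```python
# Toy vocabulary: every entry is made up for illustration.
# Real vocabularies are learned from data and hold tens of thousands of entries.
VOCAB = {"token": 0, "ization": 1, "the": 2, " ": 3, "s": 4, ".": 5}

# Fall back to single letters so any lowercase word can still be encoded.
for ch in "abcdefghijklmnopqrstuvwxyz":
    VOCAB.setdefault(ch, len(VOCAB))


def encode(text: str) -> list[int]:
    """Greedy longest-match: at each position, take the longest vocabulary entry."""
    ids = []
    i = 0
    max_len = max(len(piece) for piece in VOCAB)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += size
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Map IDs back to their text pieces and join them."""
    pieces = {idx: piece for piece, idx in VOCAB.items()}
    return "".join(pieces[i] for i in ids)


print(encode("the tokens"))    # common pieces: one ID each
print(encode("tokenization"))  # two word pieces
print(len(encode("zebra")))    # a word missing from the vocabulary falls back to letters
```

In this toy setup a word the vocabulary knows costs one token, while an unknown word costs one token per letter. Real tokenizers show the same effect, only less extreme.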

A rough rule of thumb for English: 1 token is about 4 characters or 0.75 words, so a 1,000-word essay is roughly 1,300 tokens. Non-English languages and code can use noticeably more tokens for the same content.
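The rule of thumb can be turned into a quick estimator. This is a sketch for rough planning only: the two constants are the approximate English ratios quoted above, the function names are hypothetical, and exact counts require running the model's actual tokenizer.

```python
CHARS_PER_TOKEN = 4     # rule-of-thumb value for English text
WORDS_PER_TOKEN = 0.75  # rule-of-thumb value for English text


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count (English only, approximate)."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens from the character length of a string (approximate)."""
    return round(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens_from_words(1000))     # 1333, i.e. roughly 1,300
print(estimate_tokens_from_text("x" * 400)) # 100
```

Treat the result as a budget estimate, and expect it to undercount for code and for non-English text.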

The maximum number of tokens (input plus output) a model can handle in a single call is its context window. API pricing is typically quoted per million tokens, with separate rates for input and output.
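A small sketch of both checks: whether a call fits the context window, and what it costs. The window size and the per-million prices below are placeholder numbers chosen for the arithmetic, not any provider's real limits or rates, and the function names are hypothetical.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Input and output share the same window, so their sum must fit."""
    return input_tokens + max_output_tokens <= context_window


def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one call when prices are quoted per million tokens."""
    total = input_tokens * input_price_per_million + output_tokens * output_price_per_million
    return total / 1_000_000


# Placeholder values for illustration only.
WINDOW = 128_000
print(fits_in_context(100_000, 4_000, WINDOW))   # True: 104,000 <= 128,000
print(fits_in_context(126_000, 4_000, WINDOW))   # False: 130,000 > 128,000
print(call_cost(100_000, 4_000, 2.00, 8.00))     # 0.232
```

Reserving room for the output before sending the prompt avoids responses that are cut off when the window fills up.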

Examples

"Hello, world!" tokenizes to roughly 4 GPT-4o tokens.
A 100K-token document is about 75,000 English words, roughly a 300-page book at 250 words per page.

When to think in tokens

Always think in tokens, not characters, when planning prompts, budgets, and context windows.

Frequently asked

How is Token related to Tokenization?

Token and Tokenization are both inference concepts. Tokenization is the process of splitting raw text into the discrete tokens an LLM consumes. Most modern LLMs use a learned byte-pair-encoding (BPE) tokenizer.

Is Token a beginner-level concept?

Yes. Token is generally considered beginner-level material in the AI and LLM space.

Tokenization · Inference

Tokenization is the process of splitting raw text into the discrete tokens an LLM consumes. Most modern LLMs use a learned byte-pair-encoding (BPE) tokenizer.

Context Window · Inference

The context window is the maximum number of tokens an LLM can consider in a single call — prompt plus generated output combined.

Prompt · Prompting

A prompt is the text you send to an LLM to elicit a response. It typically includes a system message, optional examples, and the user's query.

Large Language Model · Foundations

A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.

Sources

OpenAI Tokenizer