Architecture · beginner
Parameter Count (model size)
Parameter count is the total number of learnable weights in a model — "7B" means 7 billion parameters. It is the most cited model-size metric, though not always the most informative.
Explanation
Each weight is a number adjusted during training. Today's open models span 1B-400B+ parameters; frontier closed models are presumed to be larger, often as Mixture-of-Experts where the "total" and "active" counts differ.
Parameter count is correlated with capability but only roughly. A well-trained 70B can beat a poorly-trained 175B. The Chinchilla finding showed many older "huge" models were undertrained — too many parameters per token.
For inference, parameters directly determine memory: a 70B BF16 model needs ~140 GB of GPU memory just to hold weights.
Examples
- Llama 3 family: 8B, 70B, 405B.
- Mixtral 8x7B: 47B total / ~13B active per token.
Frequently asked
What is Parameter Count?
Parameter count is the total number of learnable weights in a model — "7B" means 7 billion parameters. It is the most cited model-size metric, though not always the most informative.
What is an example of parameter count?
Llama 3 family: 8B, 70B, 405B.
How is Parameter Count related to Large Language Model?
Parameter Count and Large Language Model are both architecture concepts. A large language model is a neural network trained on huge amounts of text to predict the next token in a sequence. GPT-4, Claude, and Gemini are all LLMs.
Is Parameter Count considered beginner?
Parameter Count is generally considered beginner-level material in the AI and LLM space.