Comparison
Learning Rate vs Pretraining
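Learning Rate and Pretraining are both common AI/LLM terms, but they are different kinds of thing: one is a hyperparameter that controls weight updates, the other is a phase of training. Here is a quick side-by-side.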
When you would reach for Learning Rate
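Learning Rate comes up when the question is how large each weight update should be: choosing a peak value, picking a decay schedule, or diagnosing a run that diverges or stalls.
Example: a pretraining run with a peak learning rate around 1e-4 and cosine decay.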
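To make "peak LR around 1e-4 with cosine decay" concrete, here is a minimal sketch of such a schedule. The function name cosine_lr, the min_lr floor, and the total step count in the usage note are illustrative assumptions, not settings from any particular training run.

```python
import math

def cosine_lr(step, total_steps, peak_lr=1e-4, min_lr=0.0):
    """Learning rate at `step` under cosine decay from peak_lr to min_lr.

    Hypothetical helper for illustration; real schedules often add a
    warmup phase before the decay, which is omitted here.
    """
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / max(1, total_steps), 0.0), 1.0)
    # Cosine goes from 1 (start) to -1 (end), so the factor goes 1 -> 0.
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * factor
```

With an assumed total of 1000 steps, cosine_lr(0, 1000) returns the peak, cosine_lr(500, 1000) is roughly half the peak, and cosine_lr(1000, 1000) reaches min_lr.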
When you would reach for Pretraining
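Pretraining comes up when the question is about the initial, large-scale training phase that produces a base model: how much data the model sees, and what it knows before any later adaptation.
Example: GPT-3 was pretrained on ~300B tokens.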
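A token budget like ~300B translates into a number of optimizer steps once a batch size is fixed, and that step count is what a learning rate schedule is defined over. The sketch below does this arithmetic. The batch size (512 sequences) and sequence length (2048 tokens) are hypothetical values chosen for illustration, not GPT-3's actual configuration.

```python
def optimizer_steps(total_tokens, batch_size_sequences, sequence_length):
    """Number of optimizer steps needed to consume `total_tokens` once."""
    tokens_per_step = batch_size_sequences * sequence_length
    # Integer ceiling division: a final partial batch still counts as a step.
    return -(-total_tokens // tokens_per_step)

# Hypothetical setup: 512 sequences of 2048 tokens per step.
steps = optimizer_steps(300_000_000_000, 512, 2048)
print(steps)  # 286103
```

Under these assumed settings, each step consumes about one million tokens, so the run takes roughly 286 thousand steps.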
Frequently asked
What is the difference between Learning Rate and Pretraining?
Learning Rate: The learning rate is the step size used to update weights during training. Too high and training diverges; too low and it crawls or gets stuck.
Pretraining: Pretraining is the initial training phase where an LLM learns to predict the next token on a very large corpus of general text, from hundreds of billions to trillions of tokens. It produces a base model that can be adapted later.
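The effect of step size can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2, whose minimum is at w = 0. The function name and the three learning rates are illustrative choices, and a one-parameter quadratic is only a stand-in for real training.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with a fixed learning rate; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # the learning rate scales each update
    return w

too_high = gradient_descent(lr=1.1)    # each step overshoots and |w| grows
moderate = gradient_descent(lr=0.3)    # shrinks quickly toward 0
too_low = gradient_descent(lr=0.001)   # barely moves away from w0 = 1.0
```

Each update multiplies w by (1 - 2 * lr), so lr = 1.1 gives a factor of -1.2 (diverging), lr = 0.3 gives 0.4 (fast convergence), and lr = 0.001 gives 0.998 (very slow progress).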
When should I use Learning Rate vs Pretraining?
Learning Rate is the right concept when you are tuning how training updates the weights. Pretraining applies when you are asking which stage of training a model is in, or how its base model was built.
Are Learning Rate and Pretraining the same thing?
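No. Learning Rate is a hyperparameter that sets the size of each weight update; Pretraining is a stage of training. They are related, since every pretraining run has to choose a learning rate, but they address different parts of the AI stack.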