ModelTerms

Training · intermediate

Fine-tuning

Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.

Explanation

Pretraining gives you a generally capable base model. Fine-tuning specializes it: on instruction-following data (instruction tuning), on a particular domain (medical, legal), or on a particular style (a brand voice, a fictional character).

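Full fine-tuning updates every parameter, so GPU memory must hold the weights plus a gradient and optimizer state for each one; with Adam in mixed precision this is commonly estimated at around 16 bytes per parameter, before activations. Parameter-efficient fine-tuning methods like LoRA freeze the original weights and train only a small number of additional parameters, making fine-tuning cheap enough to run on a single GPU for many models.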
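The gap between the two approaches can be sketched with rough arithmetic. The 16 bytes per parameter figure assumes mixed-precision training with Adam and ignores activations; the 7B model size, the 4096 x 4096 layer, and rank 8 are illustrative assumptions rather than values from any particular model.

```python
# Rough, illustrative arithmetic only; real usage depends on the framework,
# precision settings, sequence length, and batch size.

# Assumption: fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + two fp32 Adam moment buffers (4 + 4) = 16 bytes per parameter.
BYTES_PER_PARAM_FULL = 16


def full_finetune_gib(num_params: int) -> float:
    """Approximate weight, gradient, and optimizer memory for full fine-tuning, in GiB."""
    return num_params * BYTES_PER_PARAM_FULL / 2**30


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix."""
    # A is (rank x d_in) and B is (d_out x rank); the original matrix stays frozen.
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    gib = full_finetune_gib(7_000_000_000)
    print(f"7B full fine-tuning: ~{gib:.0f} GiB before activations")

    frozen = 4096 * 4096
    trainable = lora_trainable_params(4096, 4096, rank=8)
    print(f"4096x4096 layer: {frozen:,} frozen weights, "
          f"{trainable:,} LoRA weights ({trainable / frozen:.2%})")
```

The point of the sketch is the ratio: the adapter for one layer is a small fraction of the layer it modifies, and only the adapter needs gradients and optimizer state.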

Fine-tuning is most useful when prompting alone cannot reliably produce the behavior you want: for instance, very specific output formats, niche jargon, or a consistent persona.

Examples

  • Fine-tuning Llama 3 on medical Q&A for a clinical assistant.
  • LoRA fine-tuning to teach a model a company's internal vocabulary.

When to use fine-tuning

Use fine-tuning after you've exhausted prompting and retrieval, and once you have a few hundred to a few thousand clean, labeled examples.
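What "clean" means depends on the task, but a first pass often just drops empty and duplicated examples. The sketch below assumes each example is a dict with hypothetical `prompt` and `response` keys; the field names and the normalization rules are illustrative choices, not a standard.

```python
def clean_examples(examples):
    """Drop empty and exact-duplicate (prompt, response) pairs, preserving order."""
    seen = set()
    cleaned = []
    for ex in examples:
        # Collapse runs of whitespace so trivially different copies compare equal.
        prompt = " ".join(ex.get("prompt", "").split())
        response = " ".join(ex.get("response", "").split())
        if not prompt or not response:
            continue  # an example without an input or a label cannot supervise anything
        key = (prompt.lower(), response.lower())
        if key in seen:
            continue  # case-insensitive exact duplicate
        seen.add(key)
        cleaned.append({"prompt": prompt, "response": response})
    return cleaned


if __name__ == "__main__":
    raw = [
        {"prompt": "Define  LoRA", "response": "A low-rank adapter method."},
        {"prompt": "define lora", "response": "A low-rank adapter method."},
        {"prompt": "Define QLoRA", "response": ""},
    ]
    print(len(clean_examples(raw)), "usable example(s)")
```

Near-duplicate detection and label review usually need more than this; the sketch only covers the mechanical part.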

Pretraining · Training

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.
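Next-token prediction turns any text into many training examples, since every prefix is paired with the token that follows it. A minimal sketch, using whitespace splitting as a stand-in for a real tokenizer (an assumption made only to keep the example self-contained):

```python
def next_token_pairs(tokens):
    """Return (context, target) pairs: each prefix predicts the token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    tokens = "the model predicts the next token".split()
    for context, target in next_token_pairs(tokens):
        print(context, "->", target)
```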

LoRA · Training

LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. This makes fine-tuning cheap enough to run on a single GPU for many models.
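The mechanism can be sketched for a single linear layer: the frozen weight matrix W is left untouched, and the layer output becomes W x plus a scaled low-rank term B (A x). The class below is a plain-Python illustration with hypothetical names, not the API of any library; the zero initialization of B and the alpha / rank scaling follow the usual description of the method.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights, never updated
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so B @ A is zero at
        # initialization and the layer initially matches the base model.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
    print(layer.forward([1.0, 1.0]))  # equals the base layer's output until B is trained
```

During training only A and B would receive gradient updates, which is where the memory savings come from.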

Instruction Tuning · Training

Instruction tuning is fine-tuning on examples of (instruction, desired response) pairs so a base model learns to follow natural-language directions.
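In practice each (instruction, desired response) pair is rendered into a single training string with a fixed template. The template below is a hypothetical one chosen for illustration; real projects should use whatever prompt format their base model and tooling expect.

```python
# Hypothetical template; the section markers are an assumption, not a standard.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def format_example(instruction, response):
    """Render one (instruction, response) pair into prompt and full training text."""
    prompt = TEMPLATE.format(instruction=instruction.strip())
    return {"prompt": prompt, "text": prompt + response.strip()}


if __name__ == "__main__":
    example = format_example(
        "Summarize the note in one sentence.",
        "The patient reports mild symptoms and no new medication.",
    )
    print(example["text"])
```

Keeping the prompt separately from the full text makes it easy to tell later which part of the sequence is the instruction and which part is the response to learn.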

Supervised Fine-Tuning · Training

Supervised fine-tuning (SFT) is fine-tuning where each training example has an explicit input and a desired output, supervised by a loss that penalizes deviation from that output.
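For language models that loss is typically cross-entropy: the negative log of the probability the model assigned to each correct token. A common choice, assumed here, is to average it over the desired output only and to skip the input tokens. The function takes precomputed probabilities rather than a real model, purely to keep the sketch self-contained.

```python
import math


def sft_loss(token_probs, is_response):
    """Mean negative log-likelihood over response tokens only.

    token_probs[i] is the probability the model gave to the correct token at
    position i; is_response[i] marks positions that belong to the desired output.
    """
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    if not losses:
        raise ValueError("no response tokens to supervise")
    return sum(losses) / len(losses)


if __name__ == "__main__":
    probs = [0.10, 0.50, 0.50]    # first position is part of the input
    mask = [False, True, True]    # only the last two positions are supervised
    print(round(sft_loss(probs, mask), 4))
```

A confident, correct prediction (probability near 1) contributes almost nothing to the loss, while a low probability on the desired token is penalized heavily.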

Reinforcement Learning from Human Feedback · Training

RLHF fine-tunes an LLM to maximize a reward model that was itself trained on human preference judgments between candidate responses.
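The reward model half of this can be sketched with a pairwise loss: given scores for the response a human preferred and the one they rejected, the loss is small when the preferred response scores higher. The Bradley-Terry style formulation below is one common choice and is an assumption here; the reinforcement learning step that then optimizes the LLM against the reward model is not shown.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    print(preference_loss(2.0, 0.0))  # preferred response scored higher: small loss
    print(preference_loss(0.0, 2.0))  # preferred response scored lower: large loss
```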

Distillation · Training

Distillation trains a smaller "student" model to imitate the outputs of a larger "teacher" model. The student becomes much cheaper to run while retaining much of the teacher's quality.
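One common way to express "imitate the outputs" is to compare the teacher's and student's probability distributions over the next token, softened with a temperature, and penalize the difference with a KL divergence. The sketch below assumes that formulation and raw logits as plain lists; the temperature value is illustrative, and the temperature-squared scaling some recipes apply is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss(teacher, [1.0, 2.0, 3.0]))  # student matches the teacher
    print(distillation_loss(teacher, [3.0, 2.0, 1.0]))  # student disagrees
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.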

QLoRA · Training

QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters. Its authors report fine-tuning a 65B-parameter model on a single 48 GB GPU with quality close to full 16-bit fine-tuning.
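The reason 4-bit storage matters is simple arithmetic on the frozen weights. The sketch below counts weight storage only, for model sizes in the 65B to 70B range; it ignores quantization metadata, the LoRA adapters, optimizer state, and activations, so real memory use is higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Storage for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for size in (65e9, 70e9):
        fp16 = weight_memory_gb(size, 16)
        int4 = weight_memory_gb(size, 4)
        print(f"{size / 1e9:.0f}B params: {fp16:.1f} GB at 16-bit, {int4:.1f} GB at 4-bit")
```

At 16 bits the weights alone exceed a 48 GB card several times over; at 4 bits they fit, leaving the remaining memory for adapters and activations.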

Learning Rate · Training

The learning rate is the step size used to update weights during training. Too high and training diverges; too low and it crawls or gets stuck.
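Both failure modes show up even on a toy problem. The sketch below runs plain gradient descent on f(x) = x², whose gradient is 2x; the three learning rates are illustrative values chosen to land in each regime for this function, not recommendations for real models.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 with fixed-step gradient descent; return the final x."""
    x = start
    for _ in range(steps):
        grad = 2 * x              # derivative of x**2
        x -= learning_rate * grad
    return x


if __name__ == "__main__":
    for lr in (1.1, 0.1, 0.001):
        print(f"lr={lr}: x after 50 steps = {gradient_descent(lr):.6g}")
```

Each step multiplies x by (1 - 2 * learning_rate). At 1.1 that factor has magnitude above 1, so x grows without bound; at 0.1 it shrinks quickly toward the minimum at 0; at 0.001 it shrinks so slowly that x has barely moved after 50 steps.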
