ModelTerms

Training · intermediate

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. This often makes fine-tuning cheap enough to run on a single GPU.

Explanation

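Instead of updating all of an LLM's billions of weights, LoRA adds a pair of tiny "adapter" matrices A and B (with low rank r, often 8-64) next to selected weight matrices, most commonly the attention projections in each layer. Their product BA has the same shape as the frozen weight matrix W and acts as a learned update to it. Only the adapters are trained, and the original weights stay frozen. The adapters typically total under 1% of the full model's parameters.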
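The mechanism can be sketched in a few lines of dependency-free Python. This is an illustrative sketch, not a production implementation: the helper names (matmul, lora_forward, lora_param_counts), the tiny dimensions, and the alpha / r scaling factor are assumptions added here, and a real setup would use a tensor library and train B by gradient descent. Starting B at zero means the adapter initially leaves the base model's output unchanged.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    """Multiply every entry of X by the scalar s."""
    return [[s * x for x in row] for row in X]

def lora_forward(W, A, B, x, alpha):
    """Return W x + (alpha / r) * B (A x) for a column vector x.

    W is the frozen d_out x d_in weight; A is r x d_in; B is d_out x r.
    The alpha / r scaling is an assumption of this sketch.
    """
    r = len(A)  # rank = number of rows of A
    base = matmul(W, x)               # frozen path
    update = matmul(B, matmul(A, x))  # low-rank path; never forms a d_out x d_in matrix
    return add(base, scale(update, alpha / r))

def lora_param_counts(d_in, d_out, r):
    """Parameters in the full weight matrix vs. in its LoRA adapter pair."""
    return d_in * d_out, r * (d_in + d_out)

random.seed(0)
d_in, d_out, r = 4, 3, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, starts at zero
x = [[1.0] for _ in range(d_in)]

# With B = 0 the adapter contributes nothing yet.
print(lora_forward(W, A, B, x, alpha=16) == matmul(W, x))

# A 4096 x 4096 projection with r = 8: 16,777,216 vs. 65,536 parameters.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"{lora / full:.2%}")  # about 0.39%
```

For a single 4096 x 4096 matrix at rank 8, the adapter pair holds roughly 0.39% as many parameters as the matrix it adapts, which is in line with the "under 1%" figure above.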

The advantages: lower memory use, faster training, and the ability to swap adapters at runtime to switch behaviors without reloading the whole model. QLoRA combines LoRA with 4-bit quantization of the frozen base weights, which enables fine-tuning of models in the 65B-70B range on a single 48 GB GPU.
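A rough back-of-the-envelope calculation shows why 4-bit quantization matters at this scale. The sketch below counts weight storage only. It deliberately ignores quantization constants, the adapter weights, gradients, optimizer state, and activations, so real memory use is higher. The function name weight_memory_gb is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 70_000_000_000  # a 70B-parameter model

sixteen_bit = weight_memory_gb(n_params, 16)  # 140.0 GB
four_bit = weight_memory_gb(n_params, 4)      # 35.0 GB

print(f"16-bit weights: {sixteen_bit:.0f} GB")
print(f"4-bit weights:  {four_bit:.0f} GB")
```

At 16 bits the frozen weights alone exceed the memory of any single current GPU, while at 4 bits they leave some headroom on a 48 GB card for the adapters and training state.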

LoRA adapters can be merged back into the base weights for deployment, or kept separate for multi-tenant serving.
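Both deployment options can be illustrated with small matrices. This is a sketch under stated assumptions: the alpha / r scaling, the helper names (merge, serve), and the tenant names are hypothetical and chosen only for illustration. Merging folds the update into one matrix so inference costs the same as the base model, while keeping adapters separate lets many tenants share one copy of the frozen weights.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    """Multiply every entry of X by the scalar s."""
    return [[s * x for x in row] for row in X]

def merge(W, A, B, alpha):
    """Fold the adapter into the base weight: W + (alpha / r) * B A."""
    r = len(A)
    return add(W, scale(matmul(B, A), alpha / r))

def serve(W, adapters, tenant, x, alpha):
    """Apply the shared frozen W plus the adapter selected for this tenant."""
    A, B = adapters[tenant]
    r = len(A)
    return add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / r))

W = [[1, 0], [0, 1]]  # shared frozen base weight (2 x 2)
adapters = {
    "legal":   ([[1, 2]], [[3], [4]]),  # (A, B) with rank r = 1
    "support": ([[0, 1]], [[1], [0]]),
}
x = [[1], [1]]

# Option 1: merge one adapter for deployment.
A, B = adapters["legal"]
W_merged = merge(W, A, B, alpha=1)
print(matmul(W_merged, x))  # [[10.0], [13.0]]

# Option 2: keep adapters separate and pick one per request.
print(serve(W, adapters, "legal", x, alpha=1))    # [[10.0], [13.0]]
print(serve(W, adapters, "support", x, alpha=1))  # [[2.0], [1.0]]
```

The merged and unmerged paths give the same output for the same adapter. The trade-off is that a merged model is tied to one adapter, whereas the separate form pays a small extra cost per request in exchange for switching adapters freely.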

Examples

  • Fine-tuning Llama-3-8B for a domain on a single A100 with LoRA.
  • QLoRA fine-tuning Llama-3-70B on one 48 GB GPU.

When to use LoRA

When full fine-tuning is too expensive or you want swappable specialized adapters.

Frequently asked

What is LoRA?

LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. This often makes fine-tuning cheap enough to run on a single GPU.

What is an example of LoRA?

Fine-tuning Llama-3-8B for a domain on a single A100 with LoRA.

How is LoRA related to fine-tuning?

LoRA is a form of fine-tuning. Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge. Full fine-tuning updates every weight, whereas LoRA keeps the original weights frozen and trains only the small adapter matrices.

When should I use LoRA?

When full fine-tuning is too expensive or you want swappable specialized adapters.

Is LoRA considered intermediate?

LoRA is generally considered intermediate-level material in the AI and LLM space.
