Training · intermediate

LoRA (Low-Rank Adaptation)

LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. Cheap fine-tuning on a single GPU.

Published May 29, 2026

Explanation

Instead of updating an LLM's billions of weights, LoRA adds tiny "adapter" matrices A and B (with low rank r, often 8-64) to each attention layer. Only these adapters are trained, and the original weights stay frozen. Adapter sizes are typically under 1% of the full model.

The advantages: lower memory, faster training, and the ability to swap adapters at runtime to switch behaviors without reloading the whole model. QLoRA combines LoRA with 4-bit quantization to enable fine-tuning of 70B+ models on a single consumer GPU.

LoRA adapters can be merged back into the base weights for deployment, or kept separate for multi-tenant serving.

Examples

Fine-tuning Llama-3-8B for a domain on a single A100 with LoRA.
QLoRA fine-tuning Llama-3-70B on one 48 GB GPU.

When to use lora

When full fine-tuning is too expensive or you want swappable specialized adapters.

Frequently asked

What is LoRA?

LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. Cheap fine-tuning on a single GPU.

What is an example of lora?

Fine-tuning Llama-3-8B for a domain on a single A100 with LoRA.

How is LoRA related to Fine-tuning?

LoRA and Fine-tuning are both training concepts. Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.

When should I use lora?

When full fine-tuning is too expensive or you want swappable specialized adapters.

Is LoRA considered intermediate?

LoRA is generally considered intermediate-level material in the AI and LLM space.

Fine-tuningTraining

Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.

QuantizationInfrastructure

Quantization reduces model weights from 16- or 32-bit floats to lower-precision types (INT8, INT4) so the model needs less memory and runs faster, usually with minor quality loss.

Instruction TuningTraining

Instruction tuning is fine-tuning on examples of (instruction, desired response) pairs so a base model learns to follow natural-language directions.

LoRA (Low-Rank Adaptation)

Explanation

Examples

When to use lora

Frequently asked

What is LoRA?

What is an example of lora?

How is LoRA related to Fine-tuning?

When should I use lora?

Is LoRA considered intermediate?

Side-by-side comparisons

Sources

Explanation

Examples

When to use lora

Frequently asked

What is LoRA?

What is an example of lora?

How is LoRA related to Fine-tuning?

When should I use lora?

Is LoRA considered intermediate?

Related terms

Side-by-side comparisons

Sources