ModelTerms

Training · advanced

QLoRA

QLoRA trains LoRA adapters on top of a frozen, 4-bit quantized base model, which makes it possible to fine-tune 65B to 70B-class models on a single 48 GB GPU with quality close to full 16-bit fine-tuning.

Explanation

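LoRA freezes the base model and trains small low-rank adapter matrices; quantization shrinks the frozen weights to 4-bit NormalFloat (NF4). QLoRA combines the two and adds double quantization (the quantization constants are themselves quantized) and paged optimizers, typically alongside gradient checkpointing. The frozen weights take roughly a quarter of the memory they would need in 16-bit precision, and gradients and optimizer state exist only for the adapters.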
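The two ingredients can be illustrated in plain Python. This is a conceptual sketch, not the kernels used by any real library: the function names are hypothetical, each row is treated as one quantization block (real implementations typically use blocks of 64 weights and also quantize the per-block scales), and the NF4 levels are rounded to four decimals.

```python
# Conceptual sketch of QLoRA's forward pass; names are illustrative only.
# The 16 NF4 code values, rounded to 4 decimals.
NF4_LEVELS = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]

def quantize_block(block):
    """Store a block as one absmax scale plus a 4-bit index per weight."""
    scale = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
    codes = [min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / scale))
             for w in block]
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate weights from the scale and the 4-bit indices."""
    return [scale * NF4_LEVELS[c] for c in codes]

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def qlora_linear(x, quantized_rows, A, B, alpha, r):
    """y = dequant(W) x + (alpha / r) * B (A x); only A and B are trained."""
    W = [dequantize_block(scale, codes) for scale, codes in quantized_rows]
    base = matvec(W, x)                 # frozen 4-bit path
    update = matvec(B, matvec(A, x))    # trainable low-rank path
    return [b + (alpha / r) * u for b, u in zip(base, update)]

W = [[0.5, -0.25, 0.1, 0.0], [-1.0, 0.3, 0.7, -0.6]]
quantized_rows = [quantize_block(row) for row in W]  # one block per row here
r, alpha = 1, 2
A = [[0.1, 0.2, 0.3, 0.4]]   # shape r x in_features
B = [[0.0], [0.0]]           # shape out_features x r, zero at initialisation
y = qlora_linear([1.0, 1.0, 1.0, 1.0], quantized_rows, A, B, alpha, r)
```

Because B starts at zero, the adapter contributes nothing at the first step and the layer reproduces the quantized base model; training then moves only A and B.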

Result: Llama-3-70B can be fine-tuned on a single 80 GB H100, or on a 48 GB A6000 if sequence length and batch size are kept small, with quality close to a full BF16 fine-tune. QLoRA did a great deal to make fine-tuning of large open models accessible outside well-funded labs.
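A back-of-the-envelope calculation shows why the weights fit. The sketch below assumes exactly 70 billion parameters, a quantization block size of 64 with 8-bit double-quantized scales (grouped 256 at a time under a 32-bit constant), and an illustrative 8192 x 8192 projection with LoRA rank 16; none of these values come from this page, and the helper names are hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

def lora_params(d_in, d_out, r):
    """Trainable parameters in one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

N_PARAMS = 70e9  # assumed size of a 70B-class model

bf16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4)

# Extra bits per weight for the quantization constants after double
# quantization: one 8-bit scale per 64 weights, plus one 32-bit constant
# per 256 of those scales.
overhead_bits = 8 / 64 + 32 / (64 * 256)
nf4_total_gb = weight_memory_gb(N_PARAMS, 4 + overhead_bits)

# Adapter size for one assumed 8192 x 8192 projection at rank 16.
full_matrix = 8192 * 8192
adapter = lora_params(8192, 8192, 16)

print(f"BF16 weights: {bf16_gb:.1f} GB")
print(f"NF4 weights:  {nf4_gb:.1f} GB ({nf4_total_gb:.1f} GB with constants)")
print(f"Adapter is {adapter / full_matrix:.2%} of the full matrix")
```

Under these assumptions the 16-bit weights need about 140 GB while the NF4 weights need about 36 GB. Activations, adapter gradients, optimizer state and framework overhead come on top of that, which is why a 48 GB card is a tight fit and an 80 GB card is more comfortable.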

Adapters can be merged into the base or kept separate for hot-swappable behaviors.
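Merging and hot-swapping both follow from the standard LoRA update, W' = W + (alpha / r) * B A. The sketch below uses small pure-Python matrices and hypothetical adapter names ("legal", "support") to show the two options; it assumes W is already in full precision, so with a 4-bit base the weights would have to be dequantized before merging.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, folding the adapter into the weights."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 2.0], [3.0, 4.0]]          # base weights (full precision here)

# Option 1: keep adapters separate and pick one per request (hot-swapping).
adapters = {
    "legal":   {"A": [[1.0, 0.0]], "B": [[0.5], [-0.5]], "alpha": 2, "r": 1},
    "support": {"A": [[0.0, 1.0]], "B": [[1.0], [1.0]],  "alpha": 2, "r": 1},
}

def weights_for(name):
    """Effective weights when the named adapter is active."""
    a = adapters[name]
    return merge_lora(W, a["A"], a["B"], a["alpha"], a["r"])

# Option 2: merge once and ship a single set of weights.
merged = weights_for("legal")
```

Keeping adapters separate costs a small extra matrix product per layer at inference time but lets one copy of the base model serve many behaviors; merging removes that overhead at the price of one full set of weights per behavior.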

Examples

  • Fine-tuning Llama-3-70B on a domain corpus on a single 80 GB A100.
  • Open-source projects that release 70B-class fine-tunes trained on a single workstation GPU rather than a multi-GPU cluster.

When to use QLoRA

When you want to fine-tune a large open model (tens of billions of parameters) on a single GPU that could not hold its weights in 16-bit precision.

Frequently asked

How is QLoRA related to LoRA?

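QLoRA builds directly on LoRA. LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. QLoRA trains the same kind of adapters, but stores the frozen base weights in 4-bit NF4 instead of 16-bit, which lowers memory use further and lets larger models be fine-tuned on a single GPU.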

Is QLoRA considered advanced?

QLoRA is generally considered advanced-level material in the AI and LLM space.
