Training · advanced
QLoRA
QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters, letting you train 70B-class models on a single 48 GB GPU at near-full fine-tuning quality.
Explanation
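LoRA freezes the base model and trains small low-rank adapter matrices; quantization shrinks the frozen weights to 4-bit NormalFloat (NF4). QLoRA combines both: the frozen weights are stored in NF4 and dequantized to BF16 on the fly for each matrix multiply, while only the adapters receive weight updates. It adds double quantization (quantizing the per-block scale factors themselves), paged optimizers (which move optimizer state out to CPU memory during memory spikes), and gradient checkpointing to push training memory down further.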
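The forward pass described above can be sketched in plain Python with no ML libraries. This is a minimal illustration under stated assumptions, not the QLoRA implementation: uniform signed 4-bit levels stand in for NF4 (which places its 16 levels at quantiles of a standard normal distribution), each block of 64 weights keeps one full-precision scale (no double quantization), and there is no training loop. The names `quantize_4bit`, `dequantize_4bit`, `QLoRALinear` and the `rank`/`alpha` defaults are hypothetical.

```python
import random

BLOCK = 64  # weights per quantization block (assumed for this sketch)

def quantize_4bit(flat_weights, block=BLOCK):
    """Blockwise absmax quantization to integer codes in [-7, 7] (uniform stand-in for NF4)."""
    codes, scales = [], []
    for start in range(0, len(flat_weights), block):
        chunk = flat_weights[start:start + block]
        scale = max(abs(w) for w in chunk) or 1.0  # avoid dividing by zero on an all-zero block
        scales.append(scale)
        codes.extend(round(w / scale * 7) for w in chunk)
    return codes, scales

def dequantize_4bit(codes, scales, block=BLOCK):
    """Map each 4-bit code back to a float using its block's scale."""
    return [c / 7 * scales[i // block] for i, c in enumerate(codes)]

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

class QLoRALinear:
    """Hypothetical layer: frozen 4-bit weight W plus trainable update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in = len(weight), len(weight[0])
        # Frozen base: kept only as 4-bit codes plus one scale per block.
        self.codes, self.scales = quantize_4bit([w for row in weight for w in row])
        self.scaling = alpha / rank
        # Trainable adapters: A is rank x d_in (small random), B is d_out x rank (zeros),
        # so B @ A = 0 and the layer starts out identical to the quantized base.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def base_weight(self):
        """Dequantize the frozen weight for one forward pass; the float copy is not kept on the layer."""
        flat = dequantize_4bit(self.codes, self.scales)
        return [flat[r * self.d_in:(r + 1) * self.d_in] for r in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.base_weight(), x)        # W_dequant @ x
        delta = matvec(self.B, matvec(self.A, x))   # B @ (A @ x), never forming a d_out x d_in update
        return [b + self.scaling * d for b, d in zip(base, delta)]

# Usage: a toy 8 x 16 weight matrix and an all-ones input.
rng = random.Random(42)
W = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(8)]
layer = QLoRALinear(W)
x = [1.0] * 16
print(layer.forward(x)[:3])
```

Initializing `B` to zero follows the original LoRA recipe: training starts from exactly the (quantized) base model's behavior, and only `A` and `B` would be handed to an optimizer.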
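Result: a 70B model's weights take about 140 GB in BF16 but about 35 GB in 4-bit, so Llama-3-70B can be fine-tuned on a single H100 (or even a 48 GB A6000) with quality close to a full BF16 fine-tune. QLoRA is widely credited with doing more than any other technique to democratize fine-tuning of open models.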
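The weight-memory arithmetic behind the 70B-on-48 GB claim can be checked with a few lines of exact fraction math. This covers the frozen weights and their quantization scales only; the block sizes (64 weights per FP32 scale, and for double quantization 8-bit scales grouped 256 at a time under one FP32 constant) are assumptions matching the settings reported for QLoRA, and `weight_gb` is a hypothetical helper.

```python
from fractions import Fraction

GB = 10**9  # decimal gigabytes, as GPU memory is usually quoted

def weight_gb(n_params, bits_per_param):
    """Storage for n_params weights at an average of bits_per_param bits each, in GB."""
    return Fraction(n_params) * Fraction(bits_per_param) / 8 / GB

N = 70 * 10**9  # a 70B-parameter model

bf16_weights = weight_gb(N, 16)
nf4_weights = weight_gb(N, 4)
# Assumption: one FP32 scale per block of 64 weights -> 32/64 = 0.5 extra bits per weight.
scales_plain = weight_gb(N, Fraction(32, 64))
# Assumption: double quantization stores those scales in 8 bits, plus one FP32
# constant per second-level block of 256 scales.
scales_double = weight_gb(N, Fraction(8, 64) + Fraction(32, 64 * 256))

for label, size in [
    ("BF16 weights", bf16_weights),
    ("4-bit weights", nf4_weights),
    ("scales, plain", scales_plain),
    ("scales, double-quantized", scales_double),
]:
    print(f"{label:<26}{float(size):8.2f} GB")
```

The 4-bit weights plus double-quantized scales come to roughly 36 GB, which leaves headroom on a 48 GB card for the LoRA parameters, their optimizer state, and activations; those terms depend on rank, sequence length, and batch size and are not modeled here.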
Adapters can be merged into the base or kept separate for hot-swappable behaviors.
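Both options can be illustrated on a toy layer. The sketch below assumes floating-point base weights (a 4-bit base would have to be dequantized before the addition, since the low-rank update is not itself 4-bit). The adapter names "legal" and "medical", their values, and the helper functions are hypothetical. Merging folds `(alpha / r) * B @ A` into `W` once; keeping adapters separate lets one shared base serve whichever adapter a request selects.

```python
def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def merge(weight, adapter):
    """Fold an adapter into the base: W' = W + (alpha / r) * B @ A."""
    delta = matmul(adapter["B"], adapter["A"])
    s = adapter["alpha"] / adapter["rank"]
    return [[w + s * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]

def forward_with_adapter(weight, adapter, x):
    """Keep the adapter separate: y = W x + (alpha / r) * B (A x)."""
    s = adapter["alpha"] / adapter["rank"]
    base = matvec(weight, x)
    delta = matvec(adapter["B"], matvec(adapter["A"], x))
    return [b + s * d for b, d in zip(base, delta)]

W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # shared base weight, d_out = 2, d_in = 3
adapters = {  # two hypothetical rank-1 adapters trained for different behaviors
    "legal": {"rank": 1, "alpha": 2, "A": [[0.5, 0.0, 0.5]], "B": [[1.0], [0.0]]},
    "medical": {"rank": 1, "alpha": 2, "A": [[0.0, 1.0, 0.0]], "B": [[0.0], [0.5]]},
}
x = [1.0, 2.0, 3.0]

# Hot-swapping: one base in memory, the adapter is chosen per request.
for name, adapter in adapters.items():
    print(name, forward_with_adapter(W, adapter, x))

# Merging: bake one adapter in; the merged weight alone reproduces that adapter's output.
merged = merge(W, adapters["legal"])
print("merged legal", matvec(merged, x))
```

Merging removes the extra low-rank matmul at inference time but fixes the model to one behavior; keeping adapters separate costs a little compute per call and lets many small adapters share a single copy of the large base.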
Examples
- Fine-tuning Llama-3-70B on a domain corpus on a single 80 GB A100.
- Open-source projects that train and release 70B+ fine-tunes on workstation or consumer GPUs instead of large multi-GPU clusters.
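When to use QLoRA
When you want to fine-tune a large open model (tens of billions of parameters, such as a 70B-class model) on a single GPU whose memory could not hold it for full fine-tuning or 16-bit LoRA.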
Frequently asked
What is QLoRA?
QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters, letting you train 70B-class models on a single 48 GB GPU at near-full fine-tuning quality.
What is an example of QLoRA?
Fine-tuning Llama-3-70B on a domain corpus on a single 80 GB A100.
How is QLoRA related to LoRA?
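QLoRA is LoRA applied to a quantized base model. LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them, which makes fine-tuning cheap enough for a single GPU at moderate model sizes. QLoRA keeps the same adapters but stores the frozen weights in 4-bit NF4 instead of 16-bit, cutting the memory for those weights roughly fourfold so that much larger models fit on the same GPU.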
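To make "small low-rank update matrices" concrete, the snippet below counts trainable parameters for one weight matrix: a full update trains every entry of a `d_out x d_in` matrix, while LoRA trains `A` (`rank x d_in`) and `B` (`d_out x rank`). The 8192 x 8192 size and the ranks are hypothetical values chosen for illustration.

```python
def full_update_params(d_out, d_in):
    # Full fine-tuning updates every entry of W.
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    # A is rank x d_in, B is d_out x rank.
    return rank * d_in + d_out * rank

d = 8192  # hypothetical square projection size
for r in (8, 16, 64):
    lora = lora_params(d, d, r)
    full = full_update_params(d, d)
    print(f"rank {r:>2}: {lora:>9,} trainable vs {full:,} full ({lora / full:.2%})")
```

At rank 16 the adapter is 1/256 of the full matrix, which is why LoRA's optimizer state and gradients stay small; QLoRA then attacks the remaining cost, the frozen weights themselves.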
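When should I use QLoRA?
When you want to fine-tune a large open model (tens of billions of parameters, such as a 70B-class model) on a single GPU whose memory could not hold it for full fine-tuning or 16-bit LoRA.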
Is QLoRA considered advanced?
QLoRA is generally considered advanced-level material in the AI and LLM space.