Training · advanced

QLoRA

QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters, letting you train 70B-class models on a single 48 GB GPU at near-full fine-tuning quality.

Published May 30, 2026

Explanation

LoRA freezes the base model and trains tiny adapter matrices; quantization shrinks the frozen weights to 4-bit NF4. QLoRA combines both, plus paged optimizers and gradient checkpointing, to drive training memory down dramatically.

Result: Llama-3-70B can be fine-tuned on a single H100 (or even a 48 GB A6000) with quality close to a full BF16 fine-tune. Democratized open-model fine-tuning more than any other technique.

Adapters can be merged into the base or kept separate for hot-swappable behaviors.

Examples

Fine-tuning Llama-3-70B on a domain corpus on a single A100.
Open-source projects that ship 70B+ fine-tunes on consumer hardware.

When to use qlora

When you want to fine-tune a frontier-sized open model on a single GPU.

Frequently asked

What is QLoRA?

QLoRA fine-tunes a 4-bit quantized base model with LoRA adapters, letting you train 70B-class models on a single 48 GB GPU at near-full fine-tuning quality.

What is an example of qlora?

Fine-tuning Llama-3-70B on a domain corpus on a single A100.

How is QLoRA related to LoRA?

QLoRA and LoRA are both training concepts. LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. Cheap fine-tuning on a single GPU.

When should I use qlora?

When you want to fine-tune a frontier-sized open model on a single GPU.

Is QLoRA considered advanced?

QLoRA is generally considered advanced-level material in the AI and LLM space.

LoRATraining

LoRA is a parameter-efficient fine-tuning method that freezes a model's original weights and learns small low-rank update matrices alongside them. Cheap fine-tuning on a single GPU.

QuantizationInfrastructure

Quantization reduces model weights from 16- or 32-bit floats to lower-precision types (INT8, INT4) so the model needs less memory and runs faster, usually with minor quality loss.

Fine-tuningTraining

Fine-tuning continues training a pretrained model on a smaller, task-specific dataset, adjusting its weights to specialize behavior or knowledge.

BFloat16Infrastructure

BFloat16 is a 16-bit floating-point format with FP32's exponent range but only 8 bits of mantissa. The default precision for LLM training and most inference.

Side-by-side comparisons

Sources

QLoRA paper (arXiv)