
Reward Model vs Supervised Fine-Tuning

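Reward models and supervised fine-tuning (SFT) are both used to post-train LLMs, but they do different jobs. SFT trains the model directly on example input/output pairs. A reward model is a separate model that learns to score outputs from human preference comparisons, and that score then becomes the training signal for reinforcement learning (RLHF). Here is a quick side-by-side.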

When you would reach for Reward Model

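Reach for a reward model when your data consists of preference comparisons (two responses to the same prompt, with a human marking which one is better) and you need a learned score to optimize against, as in RLHF.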

Example: Anthropic's preference model, trained on the HH-RLHF comparison data.
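To make "trained on preference data" concrete, here is a minimal sketch of a common objective for comparison data: a pairwise (Bradley-Terry style) loss, -log sigmoid(r_chosen - r_rejected), which pushes the chosen response's score above the rejected one's. Everything here is illustrative: the two-number feature vectors in `PAIRS` are hypothetical stand-ins for (prompt, response) inputs, and the linear `score` function stands in for what would really be a transformer with a scalar output head. It is not Anthropic's implementation.

```python
import math

# Hypothetical toy data: each pair is (features_of_chosen, features_of_rejected).
# A real reward model would read the prompt and response text instead.
PAIRS = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.8, 0.1], [0.3, 0.7]),
    ([0.9, 0.4], [0.2, 0.8]),
]


def score(weights, features):
    """Linear stand-in for a reward model: one scalar score per response."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, pairs):
    """Mean Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected)."""
    total = 0.0
    for chosen, rejected in pairs:
        margin = score(weights, chosen) - score(weights, rejected)
        total += math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return total / len(pairs)


def train(pairs, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise loss."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for chosen, rejected in pairs:
            margin = score(weights, chosen) - score(weights, rejected)
            # Derivative of -log sigmoid(margin) w.r.t. margin is -1 / (1 + e^margin).
            coeff = -1.0 / (1.0 + math.exp(margin))
            for i, (c, r) in enumerate(zip(chosen, rejected)):
                grad[i] += coeff * (c - r)
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


weights = train(PAIRS)
```

With all-zero weights every margin is 0 and the loss is log 2; after training, each chosen response scores higher than its rejected partner. Only the difference between the two scores enters the loss, so the scores are meaningful relative to one another rather than on an absolute scale.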

When you would reach for Supervised Fine-Tuning

Reach for supervised fine-tuning when you have demonstrations: prompts paired with the exact responses you want the model to produce.

Example: training Llama-3-Base on the "chosen" responses from Anthropic's HH-RLHF dataset as a first pass, before any preference-based training.
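In that example SFT uses only the "chosen" response of each pair, while a reward model would use both the chosen and the rejected responses. A minimal sketch of the SFT loss: mean token-level cross-entropy on the desired response, with prompt positions masked out so only the demonstration is supervised (a common choice, assumed here). It assumes `logits[t]` is already aligned with `targets[t]` (the usual one-position shift of a causal LM is taken as done), and the tiny three-token vocabulary and numbers are made up for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one vocabulary-sized row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def sft_loss(logits, targets, loss_mask):
    """Mean cross-entropy over positions where loss_mask is 1.

    logits[t]    : model scores for the token at position t
    targets[t]   : the desired token id at position t
    loss_mask[t] : 0 for prompt tokens, 1 for response tokens
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, loss_mask):
        if keep:
            total -= log_softmax(row)[target]
            count += 1
    return total / count


# Hypothetical example: 2 prompt tokens followed by 2 response tokens.
logits = [
    [2.0, 0.0, 0.0],  # prompt position (ignored by the mask)
    [0.0, 2.0, 0.0],  # prompt position (ignored by the mask)
    [0.0, 0.0, 3.0],  # response position, target id 2 (confident and correct)
    [1.0, 1.0, 1.0],  # response position, target id 0 (uniform scores)
]
targets = [0, 1, 2, 0]
loss_mask = [0, 0, 1, 1]
loss = sft_loss(logits, targets, loss_mask)
```

The confident, correct position contributes a small loss and the uniform position contributes log 3; the prompt positions contribute nothing. In practice a framework's built-in cross-entropy with an ignore index does the same masking.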

Frequently asked

What is the difference between Reward Model and Supervised Fine-Tuning?

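Reward Model: A reward model assigns a scalar score to a model output that approximates how humans would rate it; it is learned from preference data. RLHF then optimizes the policy LLM to maximize the reward model's score. Supervised Fine-Tuning: SFT is fine-tuning where each training example has an explicit input and a desired output, with a loss (typically token-level cross-entropy) that penalizes deviation from that output. In short, SFT teaches the model by imitation, while a reward model supplies a learned judgment that a later optimization step trains against.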
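The step "RLHF then optimizes the policy LLM to maximize the reward model's score" can be illustrated with a heavily simplified sketch. Assumptions, all labeled: the "policy" is a softmax distribution over three fixed candidate responses rather than an LLM, the `rewards` list holds hypothetical numbers standing in for reward-model scores, and the update is exact gradient ascent on expected reward. Real RLHF samples responses from the LLM and commonly uses PPO, usually with a penalty that keeps the policy close to its starting model; both are omitted here.

```python
import math


def softmax(logits):
    """Convert policy logits into response probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(logits, rewards):
    """E_{y ~ pi}[r(y)] for the toy policy."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def optimize_policy(logits, rewards, lr=1.0, steps=100):
    """Gradient ascent on expected reward for a softmax policy.

    For softmax logits, dE[r]/dlogit_i = pi_i * (r_i - E[r]), so responses
    scoring above the current average reward gain probability.
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        logits = [
            logit + lr * p * (r - baseline)
            for logit, p, r in zip(logits, probs, rewards)
        ]
    return logits


# Hypothetical reward-model scores for three candidate responses to one prompt.
rewards = [0.1, 0.9, -0.5]
start = [0.0, 0.0, 0.0]  # uniform policy before optimization
tuned = optimize_policy(start, rewards)
```

After optimization the policy puts most of its probability on the response the reward model scores highest, and its expected reward rises above the uniform policy's. This is also why reward model quality matters: the policy chases whatever the scores favor.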

When should I use Reward Model vs Supervised Fine-Tuning?

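Use supervised fine-tuning when you can write down good responses directly and want the model to imitate them. Use a reward model when it is easier for people to compare responses than to write ideal ones, and you want to optimize the model against those judgments. In a typical RLHF pipeline you do both: SFT first, then reward model training, then reinforcement learning against the reward model.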

Are Reward Model and Supervised Fine-Tuning the same thing?

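No. Both belong to the training stage, but they play different roles. SFT updates the LLM itself by having it imitate demonstrations. A reward model is a separate model that scores outputs rather than generating text, and it provides the training signal when RLHF optimizes the LLM. They are often successive stages of the same post-training pipeline.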