
Reinforcement Learning from Human Feedback vs Supervised Fine-Tuning

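Reinforcement Learning from Human Feedback (RLHF) and Supervised Fine-Tuning (SFT) are both ways to fine-tune a pretrained large language model (LLM), but they learn from different signals: SFT imitates example outputs, while RLHF optimizes against human preference judgments. Here is a quick side-by-side.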

When you would reach for Reinforcement Learning from Human Feedback

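Reach for RLHF when the behavior you want is easier for people to judge than to write down: annotators compare candidate responses, and the model is trained to produce the kinds of responses they prefer.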

Example: ChatGPT was trained with RLHF, in part to refuse unsafe requests.
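The human comparisons are usually turned into a reward model by training it with a pairwise, Bradley-Terry style loss. The sketch below assumes the reward model has already produced a scalar score for the preferred ("chosen") and the dispreferred ("rejected") response; the function names and the scores are hypothetical, and a real implementation would compute this loss over batches of tensors.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    ("chosen") response above the "rejected" one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); split on the sign of m so that
    # exp() never receives a large positive argument and overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over a batch of (chosen, rejected) scores."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical reward-model scores for three human comparisons.
batch = [(2.0, 0.5), (0.3, 0.3), (-1.0, 1.0)]
print(mean_preference_loss(batch))
```

Equal scores give a loss of log 2, and the loss keeps growing as the reward model ranks the rejected response further above the chosen one, which is what pushes it toward agreeing with the annotators.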

When you would reach for Supervised Fine-Tuning

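Reach for Supervised Fine-Tuning (SFT) when you already have examples of the exact outputs you want, such as demonstrations written or approved by people, and you want the model to imitate them directly.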

Example: fine-tuning Llama-3-Base on the "chosen" responses from Anthropic's HH-RLHF dataset as a first pass, before any preference-based training.
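Training on "chosen" responses like this amounts to minimizing the negative log-likelihood of each response given its prompt. The sketch below assumes the model's per-token log-probabilities are already available as plain floats, and that prompt tokens are masked out so only the desired output is learned (a common choice, not the only one); `sft_loss` and the numbers are hypothetical.

```python
import math
from typing import Sequence


def sft_loss(token_logprobs: Sequence[float], is_response: Sequence[bool]) -> float:
    """Mean negative log-likelihood over response tokens only.

    token_logprobs[i] is the model's log-probability of target token i given
    all tokens before it. Prompt tokens (is_response[i] == False) are masked
    out, so the model is trained only to reproduce the desired output.
    """
    if len(token_logprobs) != len(is_response):
        raise ValueError("token_logprobs and is_response must have the same length")
    kept = [lp for lp, keep in zip(token_logprobs, is_response) if keep]
    if not kept:
        raise ValueError("no response tokens to train on")
    return -sum(kept) / len(kept)


# Hypothetical example: 3 prompt tokens followed by a 2-token "chosen" response.
logprobs = [math.log(0.9), math.log(0.5), math.log(0.8), math.log(0.5), math.log(0.25)]
mask = [False, False, False, True, True]
print(sft_loss(logprobs, mask))  # ≈ 1.0397, the mean of -log(0.5) and -log(0.25)
```

Because the prompt tokens are masked, changing how likely the model finds the prompt does not change the loss; only the response tokens are supervised.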

Frequently asked

What is the difference between Reinforcement Learning from Human Feedback and Supervised Fine-Tuning?

Reinforcement Learning from Human Feedback: RLHF fine-tunes an LLM to maximize the score of a reward model, which is itself trained on human judgments of which candidate response is better.

Supervised Fine-Tuning: SFT fine-tunes an LLM on examples that each pair an explicit input with a desired output, using a loss (typically token-level cross-entropy) that penalizes deviation from that output.
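In the common PPO-based formulation of RLHF (for example, OpenAI's InstructGPT), the quantity the RL step maximizes is not the raw reward-model score but that score minus a penalty for drifting away from the SFT starting model. The sketch below assumes the sequence-level log-probabilities of a sampled response are already computed; the function name and the `beta=0.1` default are illustrative assumptions, and the PPO update that would consume this value is not shown.

```python
def kl_penalized_reward(
    reward: float,
    logprob_policy: float,
    logprob_ref: float,
    beta: float = 0.1,
) -> float:
    """Per-sample reward that the RL step maximizes in a common RLHF setup.

    reward:          reward-model score r(x, y) for response y to prompt x.
    logprob_policy:  log pi(y | x) under the policy being fine-tuned.
    logprob_ref:     log pi_ref(y | x) under the frozen reference (SFT) model.
    beta:            penalty strength; 0.1 is an illustrative default.

    The term (logprob_policy - logprob_ref) is a single-sample estimate of the
    KL divergence from the reference model, so large drifts are penalized.
    """
    if beta < 0:
        raise ValueError("beta must be non-negative")
    return reward - beta * (logprob_policy - logprob_ref)


# Hypothetical numbers: the policy assigns the sampled response more
# probability than the SFT model did, so part of the reward is given back.
print(kl_penalized_reward(reward=2.0, logprob_policy=-10.0, logprob_ref=-12.0))  # ≈ 1.8
```

The penalty keeps the policy close to the SFT model, which limits how far it can exploit weaknesses in the learned reward model.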

When should I use Reinforcement Learning from Human Feedback vs Supervised Fine-Tuning?

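Use SFT when you can show the model the outputs you want. Use RLHF when it is easier to rank or compare outputs than to write ideal ones, or when you want to refine a model's behavior beyond what demonstrations alone provide. In practice the two are often combined, with SFT first and RLHF afterward.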

Are Reinforcement Learning from Human Feedback and Supervised Fine-Tuning the same thing?

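No. Both are fine-tuning (post-training) methods, but SFT learns directly from target outputs, while RLHF learns from human preferences through a reward model. They are complementary rather than interchangeable: a typical pipeline, such as the one used for InstructGPT, applies SFT first and then RLHF.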