ModelTerms

Comparison

Alignment vs Reinforcement Learning from Human Feedback

Alignment and Reinforcement Learning from Human Feedback are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.

When you would reach for Alignment

Alignment comes up when the question is about safety & alignment: does the model's behavior match what humans actually want?

Example: tuning a model to refuse to help with bioweapon synthesis.

When you would reach for Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback comes up when the question is about training: how do you fine-tune a model using human preference judgments?

Example: ChatGPT was trained with RLHF, in part so that it refuses unsafe requests.

Frequently asked

What is the difference between Alignment and Reinforcement Learning from Human Feedback?

Alignment: Alignment is the problem of making an AI system pursue what humans actually want rather than the literal letter of its training objective. RLHF and Constitutional AI are alignment techniques.

Reinforcement Learning from Human Feedback: RLHF fine-tunes an LLM to maximize the score assigned by a reward model, which was itself trained on human preference judgments between candidate responses.
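To make the reward-model step concrete, here is a minimal sketch of a pairwise preference loss. It assumes a Bradley-Terry style objective, which is a common choice but is not specified on this page, and the function name `preference_loss` and its parameters are hypothetical. The loss is small when the reward model scores the human-preferred response above the rejected one.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: a Bradley-Terry style objective; real RLHF pipelines may differ.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The reward model agrees with the human label: low loss.
agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
# The reward model disagrees with the human label: high loss.
disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agree < disagree)
```

Minimizing this loss over many labeled pairs is what trains the reward model. The LLM is then fine-tuned with reinforcement learning to produce responses that the reward model scores highly.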

When should I use Alignment vs Reinforcement Learning from Human Feedback?

Alignment is the right concept when you are asking what behavior a model should have and whether it is safe. Reinforcement Learning from Human Feedback applies when you are asking how to train a model toward that behavior using human preference data.

Are Alignment and Reinforcement Learning from Human Feedback the same thing?

No. Alignment is a goal that belongs to safety & alignment; Reinforcement Learning from Human Feedback is a training technique, and one of several used to pursue that goal. They are related but address different parts of the AI stack.