Alignment vs Reinforcement Learning from Human Feedback
Alignment and Reinforcement Learning from Human Feedback (RLHF) are related AI/LLM terms that are often confused. Alignment names a goal, while RLHF is one training technique used to pursue it. Here is a quick side-by-side.
When you would reach for Alignment
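Alignment is the relevant term when the question is whether a model's behavior matches what humans actually intend, including safety requirements, regardless of which method achieves it.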
Tuning a model to refuse to help with bioweapon synthesis.
When you would reach for Reinforcement Learning from Human Feedback
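RLHF is the relevant term when the question is about a specific training procedure: collecting human preference judgments, fitting a reward model to them, and fine-tuning the model to maximize that reward.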
ChatGPT trained with RLHF to refuse unsafe requests.
Frequently asked
What is the difference between Alignment and Reinforcement Learning from Human Feedback?
Alignment: the problem of making an AI system pursue what humans actually want rather than the literal letter of its training objective. RLHF and Constitutional AI are techniques for working on this problem.
Reinforcement Learning from Human Feedback: RLHF fine-tunes an LLM to maximize the score of a reward model, which is itself trained on human preference judgments between candidate responses.
In short, alignment is the goal and RLHF is one method for pursuing it.
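To make the two RLHF stages concrete, here is a minimal sketch under loudly stated assumptions. It fits a toy linear reward model on hand-made feature vectors using the Bradley-Terry pairwise loss commonly used for reward modeling. It then shows the KL-penalized reward that a policy optimizer (often PPO) would maximize. The features, `beta`, learning rate, and every function name here are hypothetical illustrations, not any library's API, and the policy-update step itself is omitted.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    # log(1 + exp(x)), computed without overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def reward(w, features):
    # Hypothetical linear reward model: r = w . features.
    return sum(wi * fi for wi, fi in zip(w, features))


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    # Small when the human-preferred response gets the higher reward.
    return softplus(-(r_chosen - r_rejected))


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit linear reward weights on (chosen, rejected) feature pairs by gradient descent."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = reward(w, diff)  # equals r_chosen - r_rejected for a linear model
            # d(loss)/d(margin) = -sigmoid(-margin), so descent moves w along +diff.
            step = lr * sigmoid(-margin)
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


def shaped_reward(rm_score, policy_logprobs, ref_logprobs, beta=0.1):
    # Reward-model score minus beta times the summed per-token log-ratio
    # log pi(y|x) - log pi_ref(y|x), a single-sample estimate of the KL penalty
    # that keeps the fine-tuned policy close to the reference model.
    log_ratio = sum(p - q for p, q in zip(policy_logprobs, ref_logprobs))
    return rm_score - beta * log_ratio


if __name__ == "__main__":
    # Toy features: [helpfulness, unsafe_content]; the first item of each pair was preferred.
    pairs = [([1.0, 0.0], [0.0, 1.0]), ([1.0, 0.0], [1.0, 1.0])]
    w = train_reward_model(pairs, dim=2)
    print(reward(w, [1.0, 0.0]) > reward(w, [0.0, 1.0]))  # True
    print(shaped_reward(1.0, [-1.0, -2.0], [-1.5, -2.5]))  # 0.9
```

The KL term is included because, without it, a policy can drift toward outputs that exploit quirks of the learned reward model. That failure mode is exactly the gap between the literal objective and what humans want that the alignment definition describes.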
When should I use Alignment vs Reinforcement Learning from Human Feedback?
Use Alignment when your concern is the outcome: whether a system reliably does what people intend and avoids harm. Use RLHF when your concern is the mechanism: how human preference data and a reward model are used to fine-tune a model.
Are Alignment and Reinforcement Learning from Human Feedback the same thing?
No. Alignment is the broader goal; RLHF is one training technique for pursuing it, alongside others such as Constitutional AI. RLHF is a means toward alignment, not a synonym for it.