Comparison

Constitutional AI vs Reinforcement Learning from Human Feedback

Constitutional AI and Reinforcement Learning from Human Feedback (RLHF) are both methods for shaping how a large language model behaves, but they differ in where the feedback comes from. Here is a quick side-by-side.

When you would reach for Constitutional AI

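Constitutional AI comes up when the question is about safety and alignment, specifically how to steer a model with written principles and AI-generated feedback rather than relying entirely on human labels.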

Example of a constitutional principle: "Choose the response that is least harmful and most helpful."
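
To make the role of such a principle concrete, here is a minimal sketch of how it could be used to collect AI feedback on a pair of candidate responses. Everything in it is an assumption for illustration: the prompt template, the PreferencePair type, the function names, and especially toy_judge, which stands in for a real feedback model and always gives the same answer. It is not Anthropic's implementation.

```python
from dataclasses import dataclass
from typing import Callable

PRINCIPLE = "Choose the response that is least harmful and most helpful."


@dataclass
class PreferencePair:
    """One labeled comparison (hypothetical record type)."""
    prompt: str
    chosen: str
    rejected: str


def build_judge_prompt(principle: str, prompt: str, a: str, b: str) -> str:
    """Format a comparison question for a feedback model (hypothetical template)."""
    return (
        f"Consider this request:\n{prompt}\n\n"
        f"{principle}\n\n"
        f"(A) {a}\n(B) {b}\n\n"
        "Answer with A or B."
    )


def label_with_ai_feedback(
    judge: Callable[[str], str],
    principle: str,
    prompt: str,
    a: str,
    b: str,
) -> PreferencePair:
    """Ask the judge which response better satisfies the principle."""
    answer = judge(build_judge_prompt(principle, prompt, a, b)).strip().upper()
    if answer.startswith("A"):
        return PreferencePair(prompt, chosen=a, rejected=b)
    if answer.startswith("B"):
        return PreferencePair(prompt, chosen=b, rejected=a)
    raise ValueError(f"Judge returned an unusable answer: {answer!r}")


def toy_judge(judge_prompt: str) -> str:
    """Stand-in for a real model call: always answers B. Illustration only."""
    return "B"


pair = label_with_ai_feedback(
    toy_judge,
    PRINCIPLE,
    prompt="How do I pick a lock?",
    a="Sure, here are step-by-step instructions for breaking into a house.",
    b="If you are locked out of your own home, a licensed locksmith can help.",
)
print(pair.chosen)
```

In a real pipeline the judge would be a language model, and the resulting pairs would play the role that human comparisons play in RLHF.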

When you would reach for Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback comes up when the question is about training, specifically fine-tuning a model against a reward model learned from human preference judgments.

Example: ChatGPT was trained with RLHF, in part so that it declines unsafe requests.

Frequently asked

What is the difference between Constitutional AI and Reinforcement Learning from Human Feedback?

Constitutional AI is Anthropic's alignment technique that uses a written set of principles (a "constitution") plus AI feedback to shape model behavior instead of relying entirely on human labels.

Reinforcement Learning from Human Feedback (RLHF) fine-tunes an LLM to maximize a reward model that was itself trained on human preference judgments between candidate responses.

The key difference is where the preference signal comes from: AI feedback guided by written principles in Constitutional AI, human judgments in RLHF.
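
The phrase "a reward model trained on preference judgments" can be illustrated with the pairwise loss that is commonly used for this purpose. The sketch below is an assumption, not something stated in this comparison: it uses the Bradley-Terry style objective, -log(sigmoid(r_chosen - r_rejected)), with plain floats in place of a neural reward model, and the function names are hypothetical.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)), computed stably."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(scored_pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (reward_chosen, reward_rejected) tuples."""
    return sum(pairwise_preference_loss(c, r) for c, r in scored_pairs) / len(scored_pairs)


# The loss is small when the preferred response scores higher, large otherwise.
print(pairwise_preference_loss(2.0, 0.0))
print(pairwise_preference_loss(0.0, 2.0))
```

The loss itself does not care who produced the "chosen" and "rejected" labels. Under this sketch, the same objective could be fed human comparisons (RLHF) or comparisons produced by AI feedback against a constitution.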

When should I use Constitutional AI vs Reinforcement Learning from Human Feedback?

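Constitutional AI is the right concept when you are focused on safety and alignment, for example steering a model toward less harmful responses without relying entirely on human labels. Reinforcement Learning from Human Feedback applies when you are focused on the training process itself, in particular fine-tuning a model on human preference data.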

Are Constitutional AI and Reinforcement Learning from Human Feedback the same thing?

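No. Constitutional AI is an alignment technique built around written principles and AI feedback; Reinforcement Learning from Human Feedback is a training technique built around human preference judgments. They are related, since both shape model behavior through feedback on candidate responses, but the feedback comes from different sources.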