Skip to main content
ModelTerms

Comparison

Alignment vs Red-Teaming

Alignment and Red-Teaming are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.

When you would reach for Alignment

Alignment comes up when the question is fundamentally about safety & alignment.

Tuning a model to refuse to help with bioweapon synthesis.

When you would reach for Red-Teaming

Red-Teaming comes up when the question is fundamentally about safety & alignment.

OpenAI's pre-release red team for GPT-4.

Frequently asked

What is the difference between Alignment and Red-Teaming?

Alignment: Alignment is the problem of making an AI system pursue what humans actually want rather than the literal letter of its training objective. RLHF and Constitutional AI are alignment techniques. Red-Teaming: Red-teaming is the practice of deliberately trying to elicit dangerous, biased, or otherwise undesired behavior from an AI system, to surface problems before deployment.

When should I use Alignment vs Red-Teaming?

Alignment is the right concept when you are focused on safety & alignment. Red-Teaming applies when you are focused on safety & alignment.

Are Alignment and Red-Teaming the same thing?

No. Alignment is safety & alignment; Red-Teaming is safety & alignment. They are related but address different parts of the AI stack.