ModelTerms

Comparison

Alignment vs Jailbreak

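Alignment and Jailbreak are both common AI/LLM terms in the safety & alignment category, but they describe different things: alignment is the problem of making a model pursue what humans actually want, while a jailbreak is a prompt that bypasses the model's safety training. Here is a quick side-by-side.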

When you would reach for Alignment

Alignment comes up when the question is how to make a model pursue what humans actually want, for example when applying training techniques such as RLHF or Constitutional AI.

Tuning a model to refuse to help with bioweapon synthesis.

When you would reach for Jailbreak

Jailbreak comes up when the question is about prompts that bypass a model's safety training and get it to produce content it would normally refuse.

"DAN" ("Do Anything Now") prompts — early ChatGPT jailbreaks.

Frequently asked

What is the difference between Alignment and Jailbreak?

Alignment is the problem of making an AI system pursue what humans actually want rather than the literal letter of its training objective. RLHF and Constitutional AI are alignment techniques.

A jailbreak is a prompt that bypasses an LLM's safety training, getting it to produce content it would normally refuse. Finding and patching jailbreaks is a perennial cat-and-mouse game with model providers.

When should I use Alignment vs Jailbreak?

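Both terms belong to the safety & alignment category, so the choice depends on which side of it you are looking at. Alignment is the right concept when you are focused on how a model is trained to behave as humans intend. Jailbreak applies when you are focused on prompts that get around that training.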

Are Alignment and Jailbreak the same thing?

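No. Both fall under safety & alignment, but they are not interchangeable. Alignment is the work of making a model behave as humans intend, typically through training. A jailbreak is a prompt that circumvents the safety behavior that training produces.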