Comparison

Jailbreak vs Red-Teaming

Jailbreak and Red-Teaming are both common AI/LLM terms but cover different ideas. Here is a quick side-by-side.

When you would reach for Jailbreak

Jailbreak comes up when the question is fundamentally about safety & alignment.

"DAN" ("Do Anything Now") prompts — early ChatGPT jailbreaks.

When you would reach for Red-Teaming

Red-Teaming comes up when the question is fundamentally about safety & alignment.

OpenAI's pre-release red team for GPT-4.

Frequently asked

What is the difference between Jailbreak and Red-Teaming?

Jailbreak: A jailbreak is a prompt that bypasses an LLM's safety training, getting it to produce content it would normally refuse. A perennial cat-and-mouse game with model providers. Red-Teaming: Red-teaming is the practice of deliberately trying to elicit dangerous, biased, or otherwise undesired behavior from an AI system, to surface problems before deployment.

When should I use Jailbreak vs Red-Teaming?

Jailbreak is the right concept when you are focused on safety & alignment. Red-Teaming applies when you are focused on safety & alignment.

Are Jailbreak and Red-Teaming the same thing?

No. Jailbreak is safety & alignment; Red-Teaming is safety & alignment. They are related but address different parts of the AI stack.