ModelTerms

Comparison

Attention vs Self-Attention

Attention and Self-Attention are both common AI/LLM terms, and they are closely related: self-attention is one specific form of attention. Here is a quick side-by-side.

When you would reach for Attention

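Attention comes up when the question is about how a model weighs one set of positions against another, whether inside one sequence or across two (for example, a translation decoder looking back at the source sentence).

Translating "the bank by the river": attention lets "bank" place most of its weight on "river", which points to the riverbank sense of the word rather than the financial one.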

When you would reach for Self-Attention

Self-Attention comes up when the question is about how the tokens of a single sequence relate to one another.

In a sentence containing a pronoun, self-attention can link "it" to its antecedent elsewhere in the same sentence.

Frequently asked

What is the difference between Attention and Self-Attention?

Attention: the mechanism a transformer uses to decide which other tokens matter most when computing the representation at each position. It mixes information across positions by taking a weighted sum. In a decoder-only language model, each new token attends only to earlier tokens.

Self-Attention: attention applied within a single sequence. Each token attends to every token in the same input, including itself.
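The "weighted sum" idea can be made concrete with a minimal sketch of scaled dot-product attention, written in plain Python so it runs without extra libraries. The function names, the toy 2-d embeddings, and the decision to skip learned projection matrices and multiple heads are all assumptions made for illustration; real transformer layers include those pieces. The sketch shows that self-attention is simply the general function called with the same sequence in all three roles.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Each query yields a weighted sum of `values`; the weights are the
    softmax of that query's scaled dot products with every key.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

def self_attention(x):
    """Self-attention: queries, keys, and values are the same sequence.

    Learned projections are omitted here (an illustrative simplification).
    """
    return attention(x, x, x)

# Toy 2-d embeddings for three tokens (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)

# General attention: one query from elsewhere attends over the same tokens.
cross_outputs, cross_weights = attention([[0.0, 2.0]], tokens, tokens)
```

In the toy data, the first token's weights favor itself and the third token over the second, because its dot product with the second token is zero. The last call illustrates the more general case, where the query does not come from the sequence being attended to.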

When should I use Attention vs Self-Attention?

Attention is the right concept when you mean the general mechanism, including cases where one sequence attends to another. Self-Attention applies when a sequence attends to itself.

Are Attention and Self-Attention the same thing?

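No. Both are architecture concepts, but Self-Attention is a special case of Attention: the sequence attends to itself rather than to a separate input. Every self-attention layer is attention, but not every attention layer is self-attention.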