Comparison
Attention vs Self-Attention
Attention and Self-Attention are closely related AI/LLM terms: self-attention is a specific form of attention. Here is a quick side-by-side.
When you would reach for Attention
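Attention is the broader term. Reach for it when discussing how a model weights some positions more than others while computing an output, whether or not those positions belong to the same sequence as the output.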
When translating "the bank by the river", attention lets "bank" put heavy weight on "river", which points to the riverbank sense rather than the financial one.
When you would reach for Self-Attention
Self-Attention is the narrower term. Reach for it when the tokens of a single sequence attend to one another, as they do inside each transformer layer.
In a sentence containing a pronoun, self-attention can link "it" to its antecedent elsewhere in the same sentence.
Frequently asked
What is the difference between Attention and Self-Attention?
Attention: the mechanism a transformer uses to decide which tokens matter most when computing each output. In a text-generating decoder, these are the earlier tokens that inform each new one. It mixes information across positions by taking a weighted sum.
Self-Attention: attention applied within a single sequence. Each token attends to every token in the same input, including itself.
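The weighted sum described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a transformer implementation: it uses scaled dot-product scoring and omits the learned query, key, and value projections, multiple heads, and masking. The function names (softmax, attention, self_attention) and the toy embeddings are hypothetical, chosen only for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights): one output vector and one row of
    weights per query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

def self_attention(x):
    """Self-attention: queries, keys, and values are the same sequence."""
    return attention(x, x, x)

# Toy 2-dimensional embeddings (made-up numbers, not from a trained model).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Calling attention with queries taken from a different sequence than the keys and values gives the general case; self_attention is the same call with one sequence playing all three roles.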
When should I use Attention vs Self-Attention?
Use Attention for the general mechanism, including cases where one sequence attends to another, such as a translation decoder attending to the encoded source sentence. Use Self-Attention when a sequence attends to itself.
Are Attention and Self-Attention the same thing?
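No, but they are not separate things either. Self-Attention is one kind of Attention: every self-attention layer performs attention, but attention also covers cases where the sequence being attended to differs from the one being produced (often called cross-attention).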