Multimodal · intermediate
Diffusion Model (diffusion)
Diffusion models generate images (and, more recently, video and audio) by learning to reverse a step-by-step noising process. Starting from pure noise, they remove noise a little at a time until a coherent sample emerges.
Explanation
Training: take a clean image, add noise to it at a randomly chosen step of a many-step noising schedule, and train a network to predict the noise that was added. Inference: start from pure noise and apply the denoiser step by step, from the noisiest step back to the cleanest, optionally conditioned on a text prompt.
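The training and inference steps above can be sketched in a few lines of plain Python on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not how any production system is implemented: the linear noise schedule, the step count T, and every function name here are hypothetical, and toy_denoiser is a closed-form stand-in for the trained network that only works because every "image" in this toy dataset is the single number mu. The update inside sample follows the standard DDPM-style reverse step.

```python
import math
import random

# Assumed linear noise schedule; real systems use many variants.
T = 50
betas = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # cumulative product of alphas: how much signal survives at step t
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)


def add_noise(x0, t, eps):
    """Forward process in closed form: jump from clean x0 to noisy x_t."""
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps


def toy_denoiser(x_t, t, mu):
    """Hypothetical stand-in for the network: exact noise when all data equals mu."""
    return (x_t - math.sqrt(alpha_bars[t]) * mu) / math.sqrt(1.0 - alpha_bars[t])


def training_loss(x0, denoiser, rng):
    """One training example: noise x0 at a random step, score the noise prediction."""
    t = rng.randrange(T)
    eps = rng.gauss(0.0, 1.0)
    x_t = add_noise(x0, t, eps)
    return (denoiser(x_t, t) - eps) ** 2  # squared error on the predicted noise


def sample(denoiser, rng):
    """Inference: start from pure noise and walk the steps in reverse."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t)
        # Remove the predicted noise for this step (DDPM-style mean update).
        x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alphas[t])
        if t > 0:
            # Re-inject a small amount of fresh noise on all but the last step.
            x += math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
mu = 3.0  # the only "image" in the toy dataset


def perfect(x_t, t):
    return toy_denoiser(x_t, t, mu)


loss = training_loss(mu, perfect, rng)
generated = sample(perfect, rng)
print(f"loss={loss:.3g} generated={generated:.6f}")
```

Because the stand-in denoiser is exact, the loss is zero up to floating-point error and the sampler lands on mu. With a real dataset the denoiser would be a neural network trained by minimizing training_loss over many images and steps, and a text prompt would be passed to it as an extra input.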
Stable Diffusion, DALL-E (from version 2 onward), Midjourney, FLUX, and most image-generation services use diffusion (or a close relative like flow matching). Recent work has scaled diffusion to video (Sora, Veo) and audio (reportedly Suno and Udio, whose architectures are not fully documented).
Examples
- Stable Diffusion generating an image from "a photo of an astronaut on a horse."
- Sora generating short video clips from a text description.
Frequently asked
How are diffusion models related to generative AI?
Diffusion models are one family of generative AI models. Generative AI refers to models that produce new content (text, images, audio, video, or code) rather than classifying or predicting from a fixed set of labels.
Are diffusion models considered intermediate-level material?
Yes. Diffusion models are generally considered intermediate-level material in the AI and LLM space.